Changes

Temporal Difference Learning

4,205 bytes added, 08:53, 15 April 2021

: [[Meep#RootStrap|RootStrap]]

: [[Meep#TreeStrap|TreeStrap]]

* [[Merlin (HU)|Merlin]]

* [[Morph]]

* [[NeuroChess]]

* [[A. Harry Klopf]] ('''1972'''). ''Brain Function and Adaptive Systems - A Heterostatic Theory''. [https://en.wikipedia.org/wiki/Air_Force_Cambridge_Research_Laboratories Air Force Cambridge Research Laboratories], Special Reports, No. 133, [http://www.dtic.mil/dtic/tr/fulltext/u2/742259.pdf pdf]

* [[Mathematician#Holland|John H. Holland]] ('''1975'''). ''Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence''. [http://www.amazon.com/Adaptation-Natural-Artificial-Systems-Introductory/dp/0262581116 amazon.com]

* [[Ian H. Witten]] ('''1977'''). ''An Adaptive Optimal Controller for Discrete-Time Markov Environments''. [https://en.wikipedia.org/wiki/Information_and_Computation Information and Control], Vol. 34, No. 4, [https://core.ac.uk/download/pdf/82451748.pdf pdf]

==1980 ...==

* [[Richard Sutton]] ('''1984'''). ''[http://scholarworks.umass.edu/dissertations/AAI8410337/ Temporal Credit Assignment in Reinforcement Learning]''. Ph.D. dissertation, [https://en.wikipedia.org/wiki/University_of_Massachusetts University of Massachusetts]

* [[Richard Sutton]], [[Andrew Barto]] ('''1990'''). ''Time-Derivative Models of Pavlovian Reinforcement''. in [http://node.realityspline.net/ari/work/neuro/people/showpeople.php?person=faculty/mgabriel.php Michael Gabriel], [http://people.umass.edu/jwmoore/people.htm#JWMoore John Moore] (eds.) ('''1990'''). ''Learning and Computational Neuroscience: Foundations of Adaptive Networks''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press], [https://webdocs.cs.ualberta.ca/~sutton/papers/sutton-barto-90.pdf pdf]

* [http://dblp.uni-trier.de/pers/hd/y/Yee:Richard_C= Richard C. Yee], [http://dblp.uni-trier.de/pers/hd/s/Saxena:Sharad Sharad Saxena], [[Paul E. Utgoff]], [[Andrew Barto]] ('''1990'''). ''Explaining Temporal Differences to Create Useful Concepts for Evaluating States''. [http://dblp.uni-trier.de/db/conf/aaai/aaai90.html#YeeSUB90 AAAI 1990], [http://www.aaai.org/Papers/AAAI/1990/AAAI90-132.pdf pdf]

* [[Peter Dayan]] ('''1990'''). ''[https://papers.nips.cc/paper/428-navigating-through-temporal-difference Navigating Through Temporal Difference]''. [https://papers.nips.cc/book/advances-in-neural-information-processing-systems-3-1990 NIPS 1990]

* [[Gerald Tesauro]] ('''1992'''). ''Temporal Difference Learning of Backgammon Strategy''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/ml1992.html#Tesauro92 ML 1992]

* [[Peter Dayan]] ('''1992'''). ''[https://www.researchgate.net/publication/227208155_The_Convergence_of_TDl_for_General_l The convergence of TD (λ) for general λ]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 8, No. 3

* [[Gerald Tesauro]] ('''1992'''). ''[http://dl.acm.org/citation.cfm?id=139616 Practical Issues in Temporal Difference Learning]''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 8, Nos. 3-4

* [[Michael Gherrity]] ('''1993'''). ''A Game Learning Machine''. Ph.D. thesis, [https://de.wikipedia.org/wiki/University_of_California,_San_Diego University of California, San Diego], advisor [[Mathematician#PKube|Paul Kube]], [http://www.gherrity.org/thesis.pdf pdf], [http://www.top-5000.nl/ps/A%20game%20learning%20machine.pdf pdf]

* [[Peter Dayan]] ('''1993'''). ''Improving generalisation for temporal difference learning: The successor representation''. [https://en.wikipedia.org/wiki/Neural_Computation_(journal) Neural Computation], Vol. 5, [http://www.gatsby.ucl.ac.uk/~dayan/papers/sr93.pdf pdf]

* [[Nicol N. Schraudolph]], [[Peter Dayan]], [[Terrence J. Sejnowski]] ('''1993'''). ''[https://papers.nips.cc/paper/820-temporal-difference-learning-of-position-evaluation-in-the-game-of-go Temporal Difference Learning of Position Evaluation in the Game of Go]''. [https://papers.nips.cc/book/advances-in-neural-information-processing-systems-6-1993 NIPS 1993] <ref>[http://satirist.org/learn-game/systems/go-net.html Nici Schraudolph’s go networks], review by [[Jay Scott]]</ref>

* [[Peter Dayan]], [[Terrence J. Sejnowski]] ('''1994'''). ''TD(λ) converges with Probability 1''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 14, No. 1, [https://www.researchgate.net/profile/Terrence_Sejnowski/publication/228392650_TD_X_Converges_with_Probability/links/54a4afea0cf256bf8bb327a9.pdf?origin=publication_detail pdf]

==1995 ...==

* [[Jonathan Schaeffer]], [[Markian Hlynka]], [[Vili Jussila]] ('''2001'''). ''Temporal Difference Learning Applied to a High-Performance Game-Playing Program''. [http://www.informatik.uni-trier.de/~ley/db/conf/ijcai/ijcai2001.html#SchaefferHJ01 IJCAI 2001]

* [[Don Beal]], [[Martin C. Smith]] ('''2001'''). ''[http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V1G-41MJ1SV-7&_user=10&_coverDate=02%2F06%2F2001&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1436661548&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=d855cbad10953476dbb92258347c8e94 Temporal difference learning applied to game playing and the results of application to Shogi]''. [https://en.wikipedia.org/wiki/Theoretical_Computer_Science_%28journal%29 Theoretical Computer Science], Vol. 252, Nos. 1-2

* [[Nicol N. Schraudolph]], [[Peter Dayan]], [[Terrence J. Sejnowski]] ('''2001'''). ''[https://link.springer.com/chapter/10.1007/978-3-7908-1833-8_4 Learning to Evaluate Go Positions via Temporal Difference Methods]''. in [[Norio Baba]], [[Lakhmi C. Jain]] (eds.) ('''2001'''). ''[http://jasss.soc.surrey.ac.uk/7/1/reviews/takama.html Computational Intelligence in Games, Studies in Fuzziness and Soft Computing]''. [http://www.springer.com/economics?SGWID=1-165-6-73481-0 Physica-Verlag], [https://papers.cnl.salk.edu/PDFs/Learning%20to%20Evaluate%20Go%20Positions%20Via%20Temporal%20Difference%20Methods%202001-3244.pdf pdf]

* [[Lex Weaver]], [[Jonathan Baxter]] ('''2001'''). ''STD (λ): learning state differences with TD (λ)''. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.7737 CiteSeerX]

'''2002'''

* [[Mark Winands]], [[Levente Kocsis]], [[Jos Uiterwijk]], [[Jaap van den Herik]] ('''2002'''). ''Temporal difference learning and the Neural MoveMap heuristic in the game of Lines of Action''. GAME-ON 2002 » [[Neural MoveMap Heuristic]]

* [[James Swafford]] ('''2002'''). ''Optimizing Parameter Learning using Temporal Differences''. [http://www.aaai.org/Conferences/AAAI/aaai02.php AAAI-02], Student Abstracts, [https://www.aaai.org/Papers/AAAI/2002/AAAI02-150.pdf pdf]

* [[Justin A. Boyan]] ('''2002'''). ''[https://link.springer.com/article/10.1023%2FA%3A1017936530646 Technical Update: Least-Squares Temporal Difference Learning]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 49, [http://research.cs.rutgers.edu/~lihong/project/ahlp/boyan02least.pdf pdf]

* [[Don Beal]] ('''2002'''). ''[https://www.researchgate.net/publication/221556841_TD_mu_A_Modificaiton_of_TD_lambda_That_Enables_a_Program_to_Learn_Weights_for_Good_Play_Even_if_It_Observes_Only_Bad_Play TD(µ): A Modification of TD(λ) That Enables a Program to Learn Weights for Good Play Even if It Observes Only Bad Play]''. [https://dblp.org/db/conf/jcis/jcis2002 JCIS 2002]

'''2003'''

* [[Henk Mannen]] ('''2003'''). ''Learning to play chess using reinforcement learning with database games''. Master’s thesis, Cognitive Artificial Intelligence, [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.810&rep=rep1&type=pdf pdf]

* [[Henk Mannen]], [[Marco Wiering]] ('''2004'''). ''[https://www.semanticscholar.org/paper/Learning-to-Play-Chess-using-TD(lambda)-learning-Mannen-Wiering/00a6f81c8ebe8408c147841f26ed27eb13fb07f3 Learning to play chess using TD(λ)-learning with database games]''. Cognitive Artificial Intelligence, [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], Benelearn’04, [https://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/learning-chess.pdf pdf]

* [[Marco Block-Berlitz|Marco Block]] ('''2004'''). ''Verwendung von Temporale-Differenz-Methoden im Schachmotor FUSc#''. Diplomarbeit, Betreuer: [[Raúl Rojas]], [[Free University of Berlin]], [http://page.mi.fu-berlin.de/block/Skripte/diplomarbeit.pdf pdf] (German)

* [[Jacek Mańdziuk]], [[Daniel Osman]] ('''2004'''). ''Temporal Difference Approach to Playing Give-Away Checkers''. [http://www.informatik.uni-trier.de/~ley/db/conf/icaisc/icaisc2004.html#MandziukO04 ICAISC 2004], [http://www.mini.pw.edu.pl/~mandziuk/PRACE/ICAISC04-3.pdf pdf]

==2005 ...==

* [[Marco Wiering]], [http://dblp.uni-trier.de/pers/hd/p/Patist:Jan_Peter Jan Peter Patist], [[Henk Mannen]] ('''2005'''). ''Learning to Play Board Games using Temporal Difference Methods''. Technical Report, [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], UU-CS-2005-048, [http://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/learning_games_TR.pdf pdf]

* [[Thomas Philip Runarsson]], [[Simon Lucas]] ('''2005'''). ''Coevolution versus self-play temporal difference learning for acquiring position evaluation in small-board go''. [[IEEE#EC|IEEE Transactions on Evolutionary Computation]], Vol. 9, No. 6

'''2006'''

* [[Simon Lucas]], [[Thomas Philip Runarsson]] ('''2006'''). ''[http://scholar.google.is/citations?view_op=view_citation&hl=en&user=4eWdc_sAAAAJ&citation_for_view=4eWdc_sAAAAJ:qjMakFHDy7sC Temporal Difference Learning versus Co-Evolution for Acquiring Othello Position Evaluation]''. [[IEEE#CIG|CIG 2006]]

'''2007'''

* [[Edward P. Manning]] ('''2007'''). ''[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4219046 Temporal Difference Learning of an Othello Evaluation Function for a Small Neural Network with Shared Weights]''. [[IEEE#CIG|IEEE Symposium on Computational Intelligence and AI in Games]]

'''2008'''

* [[Yasuhiro Osaki]], [[Kazutomo Shibahara]], [[Yasuhiro Tajima]], [[Yoshiyuki Kotani]] ('''2008'''). ''An Othello Evaluation Function Based on Temporal Difference Learning using Probability of Winning''. [http://www.csse.uwa.edu.au/cig08/Proceedings/toc.html CIG'08], [http://www.csse.uwa.edu.au/cig08/Proceedings/papers/8010.pdf pdf]

* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2008.html#SuttonSM08 NIPS 2008], [https://proceedings.neurips.cc/paper/2008/file/e0c641195b27425bb056ac56f8953d24-Paper.pdf pdf]

* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning of Piece Values for Chess Variants.'' Technical Report TUD–KE–2008-07, Knowledge Engineering Group, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf pdf]

* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning the Piece Values for three Chess Variants''. [[ICGA Journal#31_4|ICGA Journal, Vol. 31, No. 4]]

* [[Marco Block-Berlitz|Marco Block]], Maro Bader, [http://page.mi.fu-berlin.de/tapia/ Ernesto Tapia], Marte Ramírez, Ketill Gunnarsson, Erik Cuevas, Daniel Zaldivar, [[Raúl Rojas]] ('''2008'''). ''Using Reinforcement Learning in Chess Engines''. Concibe Science 2008, [http://www.micai.org/rcs/ Research in Computing Science]: Special Issue in Electronics and Biomedical Engineering, Computer Science and Informatics, Vol. 35, [http://page.mi.fu-berlin.de/block/concibe2008.pdf pdf]

* [[Albrecht Fiebiger]] ('''2008'''). ''Einsatz von allgemeinen Evaluierungsheuristiken in Verbindung mit der Reinforcement-Learning-Strategie in der Schachprogrammierung''. [https://de.wikipedia.org/wiki/Besondere_Lernleistung Besondere Lernleistung] im [https://de.wikipedia.org/wiki/Fachgebiet Fachbereich] [https://de.wikipedia.org/wiki/Informatik Informatik], [https://en.wikipedia.org/wiki/Federal_School_of_Saxony%E2%80%93Saint_Afra Sächsisches Landesgymnasium Sankt Afra], Internal advisor: Ralf Böttcher, External advisors: [[Stefan Meyer-Kahlen]], [[Marco Block-Berlitz|Marco Block]], [http://page.mi.fu-berlin.de/block/abschlussarbeiten/Fiebiger_BeLL.pdf pdf] (German)

'''2009'''

* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2009.html#MaeiSBPSS09 NIPS 2009], [https://papers.nips.cc/paper/2009/file/3a15c7d0bbe60300a39f76f8a5ba6896-Paper.pdf pdf]

* [[Joel Veness]], [[David Silver]], [[William Uther]], [[Alan Blair]] ('''2009'''). ''[http://papers.nips.cc/paper/3722-bootstrapping-from-game-tree-search Bootstrapping from Game Tree Search]''. NIPS 2009, [http://jveness.info/publications/nips2009%20-%20bootstrapping%20from%20game%20tree%20search.pdf pdf]

* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]] ('''2009'''). ''[https://dl.acm.org/doi/10.1145/1553374.1553501 Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation]''. [https://dblp.uni-trier.de/db/conf/icml/icml2009.html#SuttonMPBSSW09 ICML 2009]

* [[Simon Lucas]] ('''2009'''). ''[https://ieeexplore.ieee.org/document/5286496 Temporal difference learning with interpolated table value functions]''. [[IEEE#CIG|CIG 2009]]

* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2009'''). ''Coevolutionary Temporal Difference Learning for Othello''. [[IEEE#CIG|CIG 2009]], [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert09coevolutionary.pdf pdf]

* [[Marcin Szubert]] ('''2009'''). ''Coevolutionary Reinforcement Learning and its Application to Othello''. M.Sc. thesis, [https://en.wikipedia.org/wiki/Pozna%C5%84_University_of_Technology Poznań University of Technology], supervisor [[Krzysztof Krawiec]], [https://mszubert.github.io/papers/Szubert_2009_MSC.pdf pdf]

* [http://www.cs.cmu.edu/~zkolter/ J. Zico Kolter], [[Andrew Ng]] ('''2009'''). ''Regularization and Feature Selection in Least-Squares Temporal Difference Learning''. [http://www.machinelearning.org/archive/icml2009/ ICML 2009], [http://www.cs.cmu.edu/~zkolter/pubs/kolter-icml09b-full.pdf pdf]

==2010 ...==

* [[Marco Wiering]] ('''2010'''). ''Self-play and using an expert to learn to play backgammon with temporal difference learning''. [http://www.scirp.org/journal/jilsa/ Journal of Intelligent Learning Systems and Applications], Vol. 2, No. 2

* [[Hamid Reza Maei]], [[Richard Sutton]] ('''2010'''). ''[https://www.researchgate.net/publication/215990384_GQlambda_A_general_gradient_algorithm_for_temporal-difference_prediction_learning_with_eligibility_traces GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces]''. [https://agi-conf.org/2010/ AGI 2010]

* [[Hamid Reza Maei]] ('''2011'''). ''[https://era.library.ualberta.ca/items/fd55edcb-ce47-4f84-84e2-be281d27b16a Gradient Temporal-Difference Learning Algorithms]''. Ph.D. thesis, [[University of Alberta]], advisor [[Richard Sutton]]

* [[Joel Veness]] ('''2011'''). ''Approximate Universal Artificial Intelligence and Self-Play Learning for Games''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_New_South_Wales University of New South Wales], supervisors: [[Kee Siong Ng]], [[Marcus Hutter]], [[Alan Blair]], [[William Uther]], [[John Lloyd]]; [http://jveness.info/publications/veness_phd_thesis_final.pdf pdf]

* [[I-Chen Wu]], [[Hsin-Ti Tsai]], [[Hung-Hsuan Lin]], [[Yi-Shan Lin]], [[Chieh-Min Chang]], [[Ping-Hung Lin]] ('''2011'''). ''[https://www.conftool.net/acg13/index.php?page=browseSessions&form_session=5 Temporal Difference Learning for Connect6]''. [[Advances in Computer Games 13]]

* [[Nikolaos Papahristou]], [[Ioannis Refanidis]] ('''2011'''). ''[https://www.conftool.net/acg13/index.php?page=browseSessions&form_session=5 Improving Temporal Difference Performance in Backgammon Variants]''. [[Advances in Computer Games 13]], [http://ai.uom.gr/nikpapa/publications/Improving%20Temporal%20Difference%20Learning%20in%20Backgammon%20Variants_ACG13.pdf pdf]

* [[Krzysztof Krawiec]], [[Wojciech Jaśkowski]], [[Marcin Szubert]] ('''2011'''). ''[http://www.degruyter.com/view/j/amcs.2011.21.issue-4/v10006-011-0057-3/v10006-011-0057-3.xml Evolving small-board Go players using Coevolutionary Temporal Difference Learning with Archives]''. [http://www.degruyter.com/view/j/amcs Applied Mathematics and Computer Science], Vol. 21, No. 4

* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2011'''). ''Learning Board Evaluation Function for Othello by Hybridizing Coevolution with Temporal Difference Learning''. [http://control.ibspan.waw.pl:3000/mainpage Control and Cybernetics], Vol. 40, No. 3, [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert2011learning.pdf pdf]

* [[István Szita]] ('''2012'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-642-27645-3_17 Reinforcement Learning in Games]''. in [[Marco Wiering]], [http://martijnvanotterlo.nl/ Martijn Van Otterlo] (eds.). ''Reinforcement learning: State-of-the-art''. [http://link.springer.com/book/10.1007/978-3-642-27645-3 Adaptation, Learning, and Optimization, Vol. 12], [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]

* [[David Silver]], [[Richard Sutton]], [[Martin Müller|Martin Mueller]] ('''2013'''). ''Temporal-Difference Search in Computer Go''. Proceedings of the [http://icaps13.icaps-conference.org/technical-program/workshop-program/planning-and-learning/ ICAPS-13 Workshop on Planning and Learning], [http://webdocs.cs.ualberta.ca/~sutton/papers/SSM-ICAPS-13.pdf pdf]

* [[Florian Kunz]] ('''2013'''). ''An Introduction to Temporal Difference Learning''. Seminar on Autonomous Learning Systems, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ausy.informatik.tu-darmstadt.de/uploads/Teaching/AutonomousLearningSystems/Kunz_ALS_2013.pdf pdf]

* [[I-Chen Wu]], [[Kun-Hao Yeh]], [[Chao-Chin Liang]], [[Chia-Chuan Chang]], [[Han Chiang]] ('''2014'''). ''Multi-Stage Temporal Difference Learning for 2048''. [[TAAI 2014]]

* [[Wojciech Jaśkowski]], [[Marcin Szubert]], [[Paweł Liskowski]] ('''2014'''). ''Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello''. [http://www.evostar.org/2014/ EvoApplications 2014], [http://www.springer.com/computer/theoretical+computer+science/book/978-3-662-45522-7 Springer, volume 8602]

* [[James L. McClelland]] ('''2015'''). ''[https://web.stanford.edu/group/pdplab/pdphandbook/handbook3.html#handbookch10.html Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises]''. Second Edition, [https://web.stanford.edu/group/pdplab/pdphandbook/handbookli1.html Contents], [https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html Temporal-Difference Learning]

* [[Matthew Lai]] ('''2015'''). ''Giraffe: Using Deep Reinforcement Learning to Play Chess''. M.Sc. thesis, [https://en.wikipedia.org/wiki/Imperial_College_London Imperial College London], [http://arxiv.org/abs/1509.01549v1 arXiv:1509.01549v1] » [[Giraffe]]

* [[Markus Thill]] ('''2015'''). ''Temporal Difference Learning Methods with Automatic Step-Size Adaption for Strategic Board Games: Connect-4 and Dots-and-Boxes''. Master thesis, [https://en.wikipedia.org/wiki/Technical_University_of_Cologne Technical University of Cologne], Campus Gummersbach, [http://www.gm.fh-koeln.de/~konen/research/PaperPDF/MT-Thill2015-final.pdf pdf]

* [[Kazuto Oka]], [[Kiminori Matsuzaki]] ('''2016'''). ''Systematic Selection of N-tuple Networks for 2048''. [[CG 2016]]

* [[Huizhen Yu]], [[A. Rupam Mahmood]], [[Richard Sutton]] ('''2017'''). ''On Generalized Bellman Equations and Temporal-Difference Learning''. Canadian Conference on AI 2017, [https://arxiv.org/abs/1704.04463 arXiv:1704.04463]

* [[William Uther]] ('''2017'''). ''[https://link.springer.com/referenceworkentry/10.1007/978-1-4899-7687-1_817 Temporal Difference Learning]''. in [https://en.wikipedia.org/wiki/Claude_Sammut Claude Sammut], [https://en.wikipedia.org/wiki/Geoff_Webb Geoffrey I. Webb] (eds) ('''2017'''). ''[https://link.springer.com/referencework/10.1007%2F978-1-4899-7687-1 Encyclopedia of Machine Learning and Data Mining]''. [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]

==2020 ...==

* [https://scholar.google.ca/citations?user=yVtSOt8AAAAJ&hl=en Emmanuel Bengio], [[Joelle Pineau]], [[Doina Precup]] ('''2020'''). ''Interference and Generalization in Temporal Difference Learning''. [https://arxiv.org/abs/2003.06350 arXiv:2003.06350]

* [https://scholar.google.ca/citations?user=4C5wrXIAAAAJ&hl=en Joshua Romoff], [https://scholar.google.com/citations?user=dy_JBs0AAAAJ&hl=en Peter Henderson], [https://scholar.google.com/citations?user=HUmLDxcAAAAJ&hl=en David Kanaa], [https://scholar.google.ca/citations?user=yVtSOt8AAAAJ&hl=en Emmanuel Bengio], [https://scholar.google.com/citations?user=D4LT5xAAAAAJ&hl=en Ahmed Touati], [https://scholar.google.ca/citations?user=9H77FYYAAAAJ&hl=en Pierre-Luc Bacon], [[Joelle Pineau]] ('''2020'''). ''TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?'' [https://arxiv.org/abs/2007.02786 arXiv:2007.02786]

=Forum Posts=

: [https://www.stmintz.com/ccc/index.php?id=28819 Re: Parameter Tuning] by [[Don Beal]], [[CCC]], October 02, 1998

==2000 ...==

* [https://www.stmintz.com/ccc/index.php?id=117970 Pseudo-code for TD learning] by [[Daniel Homan]], [[CCC]], July 06, 2000

* [https://www.stmintz.com/ccc/index.php?id=147377 any good experiences with genetic algos or temporal difference learning?] by [[Rafael B. Andrist]], [[CCC]], January 01, 2001

* [https://www.stmintz.com/ccc/index.php?id=148342 Temporal Difference] by [[Bas Hamstra]], [[CCC]], January 05, 2001

* [http://www.talkchess.com/forum/viewtopic.php?t=57860 td-leaf] by [[Alexandru Mosoi]], [[CCC]], October 06, 2015 » [[Automated Tuning]]

* [http://www.talkchess.com/forum/viewtopic.php?t=62053 TD-leaf(lambda)] by [[Robert Pope]], [[CCC]], November 09, 2016

* [https://www.game-ai-forum.org/viewtopic.php?f=21&t=695 TD(1)] by [[Rémi Coulom]], [[Computer Chess Forums|Game-AI Forum]], November 20, 2019 » [[Automated Tuning]]

==2020 ...==

* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72810 Board adaptive / tuning evaluation function - no NN/AI] by Moritz Gedig, [[CCC]], January 14, 2020

* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=77053 TD learning by self play (TD-Gammon)] by [[Chris Whittington]], [[CCC]], April 10, 2021

=External Links=

: [https://en.wiktionary.org/wiki/temporal temporal - Wiktionary]

* [https://en.wikipedia.org/wiki/Reinforcement_learning#Temporal_difference_methods Reinforcement learning - Temporal difference methods from Wikipedia]

* [https://en.chessbase.com/post/standing-on-the-shoulders-of-giants Standing on the shoulders of giants] by [[Albert Silver]], [[ChessBase|ChessBase News]], September 18, 2019

* [[:Category:Shawn Lane|Shawn Lane]], [[:Category:Jonas Hellborg|Jonas Hellborg]], [[:Category:Jeff Sipe|Jeff Sipe]] - [https://en.wikipedia.org/wiki/Temporal_Analogues_of_Paradise Temporal Analogues of Paradise], [https://en.wikipedia.org/wiki/YouTube YouTube] Video

: {{#evu:https://www.youtube.com/watch?v=eN7oj7ejlow|alignment=left|valignment=top}}

=References=

'''[[Learning|Up one level]]'''

[[Category:Quotes]]

[[Category:Dailey Quotes]]

[[Category:Jonas Hellborg]]

[[Category:Shawn Lane]]

[[Category:Jeff Sipe]]

GerdIsenberg
