Changes

Jump to: navigation, search

Temporal Difference Learning

1,637 bytes added, 09:53, 15 April 2021
no edit summary
* [[Don Beal]] ('''2002'''). ''[https://www.researchgate.net/publication/221556841_TD_mu_A_Modificaiton_of_TD_lambda_That_Enables_a_Program_to_Learn_Weights_for_Good_Play_Even_if_It_Observes_Only_Bad_Play TD(µ): A Modification of TD(λ) That Enables a Program to Learn Weights for Good Play Even if It Observes Only Bad Play]''. [https://dblp.org/db/conf/jcis/jcis2002 JCIS 2002]
'''2003'''
* [[Henk Mannen]] ('''2003'''). ''Learning to play chess using reinforcement learning with database games''. Master’s thesis, [http://students.uu.nl/en/hum/cognitive-artificial-intelligence Cognitive Artificial Intelligence], [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University]'''2004''', [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.810&rep=rep1&type=pdf pdf]* [[Henk Mannen]], [[Marco Wiering]] ('''2004'''). ''[https://www.semanticscholar.org/paper/Learning-to-Play-Chess-using-TD(lambda)-learning-Mannen-Wiering/00a6f81c8ebe8408c147841f26ed27eb13fb07f3 Learning to play chess using TD(λ)-learning with database games]''. [http://students.uu.nl/en/hum/cognitive-artificial-intelligence Cognitive Artificial Intelligence], [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], Benelearn’04, [https://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/learning-chess.pdf pdf]
* [[Marco Block-Berlitz|Marco Block]] ('''2004'''). ''Verwendung von Temporale-Differenz-Methoden im Schachmotor FUSc#''. Diplomarbeit, Betreuer: [[Raúl Rojas]], [[Free University of Berlin]], [http://page.mi.fu-berlin.de/block/Skripte/diplomarbeit.pdf pdf] (German)
* [[Jacek Mańdziuk]], [[Daniel Osman]] ('''2004'''). ''Temporal Difference Approach to Playing Give-Away Checkers''. [http://www.informatik.uni-trier.de/~ley/db/conf/icaisc/icaisc2004.html#MandziukO04 ICAISC 2004], [http://www.mini.pw.edu.pl/~mandziuk/PRACE/ICAISC04-3.pdf pdf]
'''2008'''
* [[Yasuhiro Osaki]], [[Kazutomo Shibahara]], [[Yasuhiro Tajima]], [[Yoshiyuki Kotani]] ('''2008'''). ''An Othello Evaluation Function Based on Temporal Difference Learning using Probability of Winning''. [http://www.csse.uwa.edu.au/cig08/Proceedings/toc.html CIG'08], [http://www.csse.uwa.edu.au/cig08/Proceedings/papers/8010.pdf pdf]
* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''. [httphttps://wwwdblp.sztakiuni-trier.hude/%7Eszcsabadb/papersconf/gtdnips08nips/nips2008.html#SuttonSM08 NIPS 2008], [https://proceedings.neurips.cc/paper/2008/file/e0c641195b27425bb056ac56f8953d24-Paper.pdf pdf] (draft)
* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning of Piece Values for Chess Variants.'' Technical Report TUD–KE–2008-07, Knowledge Engineering Group, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf pdf]
* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning the Piece Values for three Chess Variants''. [[ICGA Journal#31_4|ICGA Journal, Vol. 31, No. 4]]
* [[Albrecht Fiebiger]] ('''2008'''). ''Einsatz von allgemeinen Evaluierungsheuristiken in Verbindung mit der Reinforcement-Learning-Strategie in der Schachprogrammierung''. [https://de.wikipedia.org/wiki/Besondere_Lernleistung Besondere Lernleistung] im [https://de.wikipedia.org/wiki/Fachgebiet Fachbereich] [https://de.wikipedia.org/wiki/Informatik Informatik], [https://en.wikipedia.org/wiki/Federal_School_of_Saxony%E2%80%93Saint_Afra Sächsischees Landesgymnasium Sankt Afra], Internal advisor: Ralf Böttcher, External advisors: [[Stefan Meyer-Kahlen]], [[Marco Block-Berlitz|Marco Block]], [http://page.mi.fu-berlin.de/block/abschlussarbeiten/Fiebiger_BeLL.pdf pdf] (German)
'''2009'''
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.'' . [https://dblp.uni-trier.de/db/conf/nips/nips2009.html#MaeiSBPSS09 NIPS 2009], [httphttps://bookspapers.nips.cc/paperspaper/files2009/nips22file/NIPS2009_11213a15c7d0bbe60300a39f76f8a5ba6896-Paper.pdf pdf]
* [[Joel Veness]], [[David Silver]], [[William Uther]], [[Alan Blair]] ('''2009'''). ''[http://papers.nips.cc/paper/3722-bootstrapping-from-game-tree-search Bootstrapping from Game Tree Search]''. NIPS 2009, [http://jveness.info/publications/nips2009%20-%20bootstrapping%20from%20game%20tree%20search.pdf pdf]
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]]. ('''2009'''). ''[https://dl.acm.org/doi/10.1145/1553374.1553501 Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation]''. ICML-09, [httphttps://wwwdblp.sztakiuni-trier.hude/db/~szcsabaconf/papersicml/GTD-ICML09icml2009.pdf pdfhtml#SuttonMPBSSW09 ICML 2009]
* [[Simon Lucas]] ('''2009'''). ''[https://ieeexplore.ieee.org/document/5286496 Temporal difference learning with interpolated table value functions]''. [[IEEE#CIG|CIG 2009]]
* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2009'''). ''Coevolutionary Temporal Difference Learning for Othello''. [[IEEE#CIG|GIG 2009]], [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert09coevolutionary.pdf pdf]
==2010 ...==
* [[Marco Wiering]] ('''2010'''). ''Self-play and using an expert to learn to play backgammon with temporal difference learning''. [http://www.scirp.org/journal/jilsa/ Journal of Intelligent Learning Systems and Applications], Vol. 2, No. 2
* [[Hamid Reza Maei]], [[Richard Sutton]] ('''2010'''). ''[httphttps://www.incompleteideasresearchgate.net/suttonpublication/publications.html#GQ 215990384_GQlambda_A_general_gradient_algorithm_for_temporal-difference_prediction_learning_with_eligibility_traces GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces]''. In Proceedings of the Third Conference on Artificial General Intelligence'''2011'''[https://agi-conf.org/2010/ AGI 2010]* [[Hamid Reza Maei]] ('''2011'''). ''[https://era.library.ualberta.ca/items/fd55edcb-ce47-4f84-84e2-be281d27b16a Gradient Temporal-Difference Learning Algorithms]''. Ph.D. thesis, [[University of Alberta]], advisor [[Richard Sutton]], [http://webdocs.cs.ualberta.ca/~sutton/papers/maei-thesis-2011.pdf pdf]
* [[Joel Veness]] ('''2011'''). ''Approximate Universal Artificial Intelligence and Self-Play Learning for Games''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_New_South_Wales University of New South Wales], supervisors: [[Kee Siong Ng]], [[Marcus Hutter]], [[Alan Blair]], [[William Uther]], [[John Lloyd]]; [http://jveness.info/publications/veness_phd_thesis_final.pdf pdf]
* [[I-Chen Wu]], [[Hsin-Ti Tsai]], [[Hung-Hsuan Lin]], [[Yi-Shan Lin]], [[Chieh-Min Chang]], [[Ping-Hung Lin]] ('''2011'''). ''[https://www.conftool.net/acg13/index.php?page=browseSessions&form_session=5 Temporal Difference Learning for Connect6]''. [[Advances in Computer Games 13]]
* [[Krzysztof Krawiec]], [[Wojciech Jaśkowski]], [[Marcin Szubert]] ('''2011'''). ''[http://www.degruyter.com/view/j/amcs.2011.21.issue-4/v10006-011-0057-3/v10006-011-0057-3.xml Evolving small-board Go players using Coevolutionary Temporal Difference Learning with Archives]''. [http://www.degruyter.com/view/j/amcs Applied Mathematics and Computer Science], Vol. 21, No. 4
* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2011'''). ''Learning Board Evaluation Function for Othello by Hybridizing Coevolution with Temporal Difference Learning''. [http://control.ibspan.waw.pl:3000/mainpage Control and Cybernetics], Vol. 40, No. 3, [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert2011learning.pdf pdf]
'''2012'''
* [[István Szita]] ('''2012'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-642-27645-3_17 Reinforcement Learning in Games]''. in [[Marco Wiering]], [http://martijnvanotterlo.nl/ Martijn Van Otterlo] (eds.). ''Reinforcement learning: State-of-the-art''. [http://link.springer.com/book/10.1007/978-3-642-27645-3 Adaptation, Learning, and Optimization, Vol. 12], [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
'''2013'''
* [[David Silver]], [[Richard Sutton]], [[Martin Müller|Martin Mueller]] ('''2013'''). ''Temporal-Difference Search in Computer Go''. Proceedings of the [http://icaps13.icaps-conference.org/technical-program/workshop-program/planning-and-learning/ ICAPS-13 Workshop on Planning and Learning], [http://webdocs.cs.ualberta.ca/~sutton/papers/SSM-ICAPS-13.pdf pdf]
* [[Florian Kunz]] ('''2013'''). ''An Introduction to Temporal Difference Learning''. Seminar on Autonomous Learning Systems, [[Darmstadt University of Technology|TU Darmstad]], [http://www.ausy.informatik.tu-darmstadt.de/uploads/Teaching/AutonomousLearningSystems/Kunz_ALS_2013.pdf pdf]
'''2014'''
* [[I-Chen Wu]], [[Kun-Hao Yeh]], [[Chao-Chin Liang]], [[Chia-Chuan Chang]], [[Han Chiang]] ('''2014'''). ''Multi-Stage Temporal Difference Learning for 2048''. [[TAAI 2014]]
* [[Wojciech Jaśkowski]], [[Marcin Szubert]], [[Paweł Liskowski]] ('''2014'''). ''Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello''. [http://www.evostar.org/2014/ EvoApplications 2014], [http://www.springer.com/computer/theoretical+computer+science/book/978-3-662-45522-7 Springer, volume 8602]
* [[Matthew Lai]] ('''2015'''). ''Giraffe: Using Deep Reinforcement Learning to Play Chess''. M.Sc. thesis, [https://en.wikipedia.org/wiki/Imperial_College_London Imperial College London], [http://arxiv.org/abs/1509.01549v1 arXiv:1509.01549v1] » [[Giraffe]]
* [[Markus Thill]] ('''2015'''). ''Temporal Difference Learning Methods with Automatic Step-Size Adaption for Strategic Board Games: Connect-4 and Dots-and-Boxes''. Master thesis, [https://en.wikipedia.org/wiki/Technical_University_of_Cologne Technical University of Cologne], Campus Gummersbach, [http://www.gm.fh-koeln.de/~konen/research/PaperPDF/MT-Thill2015-final.pdf pdf]
'''2016'''
* [[Kazuto Oka]], [[Kiminori Matsuzaki]] ('''2016'''). ''Systematic Selection of N-tuple Networks for 2048''. [[CG 2016]]
* [[Huizhen Yu]], [[A. Rupam Mahmood]], [[Richard Sutton]] ('''2017'''). ''On Generalized Bellman Equations and Temporal-Difference Learning''. Canadian Conference on AI 2017, [https://arxiv.org/abs/1704.04463 arXiv:1704.04463]
'''2017'''
* [[William Uther]] ('''2017'''). ''[https://link.springer.com/referenceworkentry/10.1007/978-1-4899-7687-1_817 Temporal Difference Learning]''. in [https://en.wikipedia.org/wiki/Claude_Sammut Claude Sammut], [https://en.wikipedia.org/wiki/Geoff_Webb Geoffrey I. Webb] (eds) ('''2017'''). ''[https://link.springer.com/referencework/10.1007%2F978-1-4899-7687-1 Encyclopedia of Machine Learning and Data Mining]''. [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
==2020 ...==
* [https://scholar.google.ca/citations?user=yVtSOt8AAAAJ&hl=en Emmanuel Bengio], [[Joelle Pineau]], [[Doina Precup]] ('''2020'''). ''Interference and Generalization in Temporal Difference Learning''. [https://arxiv.org/abs/2003.06350 arXiv:2003.06350]
* [https://scholar.google.ca/citations?user=4C5wrXIAAAAJ&hl=en Joshua Romoff], [https://scholar.google.com/citations?user=dy_JBs0AAAAJ&hl=en Peter Henderson], [https://scholar.google.com/citations?user=HUmLDxcAAAAJ&hl=en David Kanaa], [https://scholar.google.ca/citations?user=yVtSOt8AAAAJ&hl=en Emmanuel Bengio], [https://scholar.google.com/citations?user=D4LT5xAAAAAJ&hl=en Ahmed Touati], [https://scholar.google.ca/citations?user=9H77FYYAAAAJ&hl=en Pierre-Luc Bacon], [[Joelle Pineau]] ('''2020'''). ''TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?'' [https://arxiv.org/abs/2007.02786 arXiv:2007.02786]
=Forum Posts=
* [http://www.talkchess.com/forum/viewtopic.php?t=57860 td-leaf] by [[Alexandru Mosoi]], [[CCC]], October 06, 2015 » [[Automated Tuning]]
* [http://www.talkchess.com/forum/viewtopic.php?t=62053 TD-leaf(lambda)] by [[Robert Pope]], [[CCC]], November 09, 2016
* [https://www.game-ai-forum.org/viewtopic.php?f=21&t=695 TD(1)] by [[Rémi Coulom]], [[Computer Chess Forums|Game-AI Forum]], November 20, 2019 » [[Automated Tuning]]
==2020 ...==
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72810 Board adaptive / tuning evaluation function - no NN/AI] by Moritz Gedig, [[CCC]], January 14, 2020
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=77053 TD learning by self play (TD-Gammon)] by [[Chris Whittington]], [[CCC]], April 10, 2021
=External Links=

Navigation menu