Changes

Temporal Difference Learning

6 bytes removed, 09:55, 23 June 2018
no edit summary
<span id="TDLeaf"></span>
=TDLeaf(λ)=
In games like chess or [[Othello]], due to their [[Tactics|tactical]] nature, [[Search|deep searches]] are necessary for expert performance. The problem had already been recognized and solved by [[Arthur Samuel]], but seems to have been forgotten later on <ref>[[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning of Piece Values for Chess Variants.'' Technical Report TUD–KE–2008-07, Knowledge Engineering Group, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf pdf]</ref>. It was rediscovered independently by [[Don Beal]] and [[Martin C. Smith]] in 1997 <ref>[[Don Beal]], [[Martin C. Smith]] ('''1997'''). ''Learning Piece Values Using Temporal Differences''. [[ICGA Journal#20_3|ICCA Journal, Vol. 20, No. 3]]</ref>, and by [[Jonathan Baxter]], [[Andrew Tridgell]], and [[Lex Weaver]] in 1998 <ref>[[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Experiments in Parameter Learning Using Temporal Differences''. [[ICGA Journal#21_2|ICCA Journal, Vol. 21, No. 2]], [http://cs.anu.edu.au/%7ELex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf pdf]</ref>, who coined the term TD-Leaf. TD-Leaf is the adaptation of TD(λ) to [[Minimax|minimax]] search: the weight adjustments consider the [[Leaf Node|leaf nodes]] of the [[Principal Variation|principal variation]] instead of the corresponding [[Chess Position|positions]] at the [[Root|root]]. TD-Leaf was successfully used in [[Automated Tuning|evaluation tuning]] of chess programs <ref>[[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''Learning Piece-Square Values using Temporal Differences.'' [[ICGA Journal#22_4|ICCA Journal, Vol. 22, No. 4]]</ref>, with [[KnightCap]] <ref>[[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Knightcap: A chess program that learns by combining td(λ) with game-tree search''. Proceedings of the 15th International Conference on Machine Learning, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8263&rep=rep1&type=pdf pdf] via [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.8263 citeseerX]</ref> and [[CilkChess]] as the most prominent examples; the latter used the improved '''Temporal Coherence Learning''' <ref>[http://supertech.csail.mit.edu/chess/ The Cilkchess Parallel Chess Program]</ref>, which automatically adjusts α and λ <ref>[[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''[http://portal.acm.org/citation.cfm?id=1624299 Temporal Coherence and Prediction Decay in TD Learning]''. [[Conferences#IJCAI1999|IJCAI 1999]], [http://ijcai.org/Past%20Proceedings/IJCAI-99-VOL-1/PDF/081.pdf pdf]</ref>.
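The weight adjustment over one game can be illustrated with a minimal sketch. It is not taken from any of the cited programs. It assumes a plain linear evaluation (a dot product of weights and features, so the gradient is simply the feature vector), and the names <code>tdleaf_update</code>, <code>leaf_features</code>, <code>alpha</code> and <code>lam</code> are hypothetical. The search that produces the principal-variation leaf for each root position is assumed to have been run already.

```python
def tdleaf_update(weights, leaf_features, outcome, alpha=0.1, lam=0.7):
    """Return adjusted weights after one game (illustrative sketch only).

    leaf_features[t] is the feature vector of the principal-variation leaf
    found by searching the t-th root position; outcome is the game result
    expressed on the same scale as the evaluation.
    """
    def evaluate(features):
        # Assumed linear evaluation: gradient w.r.t. weights == features.
        return sum(w * f for w, f in zip(weights, features))

    # Leaf evaluations, followed by the known outcome as the final target.
    values = [evaluate(f) for f in leaf_features] + [outcome]
    # Temporal differences between successive leaf evaluations.
    deltas = [values[t + 1] - values[t] for t in range(len(leaf_features))]

    new_weights = list(weights)
    discounted = 0.0
    # Walking backwards keeps discounted == sum_j lam**(j - t) * deltas[j].
    for t in reversed(range(len(leaf_features))):
        discounted = deltas[t] + lam * discounted
        for i, f in enumerate(leaf_features[t]):
            new_weights[i] += alpha * f * discounted
    return new_weights


# Two positions, two features, game won (outcome 1.0).
print(tdleaf_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 1.0,
                    alpha=0.5, lam=0.5))  # [0.25, 0.5]
```

With λ = 0 only the immediately following evaluation influences each adjustment, while λ = 1 propagates the final outcome back to every position with full weight.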
=Quotes=
* [[Mathematician#JNTsitsiklis|John N. Tsitsiklis]], [[Mathematician#BVanRoy|Benjamin Van Roy]] ('''1997'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=bWTPrLEAAAAJ&citation_for_view=bWTPrLEAAAAJ:2osOgNQ5qMEC An Analysis of Temporal Difference Learning with Function Approximation]''. [[IEEE#TAC|IEEE Transactions on Automatic Control]], Vol. 42, No. 5
* [[Don Beal]], [[Martin C. Smith]] ('''1997'''). ''Learning Piece Values Using Temporal Differences''. [[ICGA Journal#20_3|ICCA Journal, Vol. 20, No. 3]]
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1997'''). ''Knightcap: A chess program that learns by combining td(λ) with minimax search''. 15th International Conference on Machine Learning, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8263&rep=rep1&type=pdf pdf] via [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.8263 citeseerX]
'''1998'''
* [[Don Beal]], [[Martin C. Smith]] ('''1998'''). ''[http://www.springerlink.com/content/l9f4ngc2tqgnac9e/ First Results from Using Temporal Difference Learning in Shogi]''. [[CG 1998]]
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Experiments in Parameter Learning Using Temporal Differences''. [[ICGA Journal#21_2|ICCA Journal, Vol. 21, No. 2]], [http://cs.anu.edu.au/%7ELex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf pdf]
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Knightcap: A chess program that learns by combining td(λ) with game-tree search''. Proceedings of the 15th International Conference on Machine Learning, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8263&rep=rep1&type=pdf pdf] via [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.8263 citeseerX]
* [[Richard Sutton]], [[Andrew Barto]] ('''1998'''). ''[http://www.incompleteideas.net/sutton/book/the-book.html Reinforcement Learning: An Introduction]''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press], [https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node60.html 6. Temporal-Difference Learning]
* [[Justin A. Boyan]] ('''1998'''). ''Least-Squares Temporal Difference Learning''. [[Carnegie Mellon University]], CMU-CS-98-152, [http://www.research.rutgers.edu/~lihong/project/ahlp/boyan99least.pdf pdf]
'''1999'''
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1999'''). ''TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search''. [https://www.chatbots.org/journal/australian_journal_of_intelligent_information_processing_systems/ Australian Journal of Intelligent Information Processing Systems], Vol. 5 No. 1, [http://arxiv.org/abs/cs/9901001 arXiv:cs/9901001]
* [[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''[http://portal.acm.org/citation.cfm?id=1624299 Temporal Coherence and Prediction Decay in TD Learning]''. [[Conferences#IJCAI1999|IJCAI 1999]], [http://ijcai.org/Past%20Proceedings/IJCAI-99-VOL-1/PDF/081.pdf pdf]
* [[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''Learning Piece-Square Values using Temporal Differences.'' [[ICGA Journal#22_4|ICCA Journal, Vol. 22, No. 4]]
