
Temporal Difference Learning

No change in size, 22:53, 22 June 2018
<span id="TDLeaf"></span>
=TDLeaf(λ)=
In games like chess or [[Othello]], due to their [[Tactics|tactical]] nature, [[Search|deep searches]] are necessary for expert performance. The problem had already been recognized and solved by [[Arthur Samuel]], but seems to have been forgotten later on <ref>[[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning of Piece Values for Chess Variants.'' Technical Report TUD–KE–2008-07, Knowledge Engineering Group, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf pdf]</ref>, until it was rediscovered independently by [[Don Beal]] and [[Martin C. Smith]] in 1997 <ref>[[Don Beal]], [[Martin C. Smith]] ('''1997'''). ''Learning Piece Values Using Temporal Differences''. [[ICGA Journal#20_3|ICCA Journal, Vol. 20, No. 3]]</ref>, and by [[Jonathan Baxter]], [[Andrew Tridgell]], and [[Lex Weaver]] in 1998 <ref>[[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Experiments in Parameter Learning Using Temporal Differences''. [[ICGA Journal#21_2|ICCA Journal, Vol. 21, No. 2]], [http://cs.anu.edu.au/%7ELex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf pdf]</ref>, who coined the term TD-Leaf. TD-Leaf is the adaptation of TD(λ) to [[Minimax|minimax]] search, where the weight adjustments consider the [[Leaf Node|leaf nodes]] of the [[Principal Variation|principal variation]] instead of the corresponding [[Chess Position|positions]] at the [[Root|root]]. TD-Leaf has been used successfully in [[Automated Tuning|evaluation tuning]] of chess programs <ref>[[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''Learning Piece-Square Values using Temporal Differences.'' [[ICGA Journal#22_4|ICCA Journal, Vol. 22, No. 4]]</ref>, with [[KnightCap]] <ref>[[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Knightcap: A chess program that learns by combining td(λ) with game-tree search''. Proceedings of the 15th International Conference on Machine Learning, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8263&rep=rep1&type=pdf pdf] via [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.8263 citeseerX]</ref> and [[CilkChess]] as the most prominent examples. The latter used the improved '''Temporal Coherence Learning''' <ref>[http://supertech.csail.mit.edu/chess/ The Cilkchess Parallel Chess Program]</ref>, which automatically adjusts the learning rate α and the decay parameter λ <ref>[[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''[http://portal.acm.org/citation.cfm?id=1624299 Temporal Coherence and Prediction Decay in TD Learning]''. [[Conferences#IJCAI1999|IJCAI 1999]], [http://ijcai.org/Past%20Proceedings/IJCAI-99-VOL-1/PDF/081.pdf pdf]</ref>.
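
The weight adjustment described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited programs and rests on several assumptions: the evaluation is a linear combination of features squashed by tanh, the search itself is omitted, and the caller supplies the feature vector of the principal variation leaf for each position of one game. The names <code>evaluate</code>, <code>gradient</code> and <code>td_leaf_update</code>, as well as the default values of <code>alpha</code> and <code>lam</code>, are hypothetical.

```python
import math


def evaluate(weights, features):
    """Linear evaluation squashed into (-1, 1); the tanh squashing is an assumption."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)))


def gradient(weights, features):
    """Gradient of evaluate() with respect to the weights."""
    v = evaluate(weights, features)
    return [(1.0 - v * v) * f for f in features]


def td_leaf_update(weights, leaf_features, outcome, alpha=0.1, lam=0.7):
    """Return the weights after one game of TD-Leaf style learning.

    leaf_features: one feature vector per game position, taken from the
                   leaf of the principal variation found by the search
                   (hypothetical input, the search is not shown here).
    outcome:       final game result in [-1, 1], used as the last target.
    """
    # Evaluations of the PV leaves, followed by the actual game result.
    values = [evaluate(weights, f) for f in leaf_features] + [outcome]
    # Temporal differences between successive predictions.
    diffs = [values[t + 1] - values[t] for t in range(len(leaf_features))]

    new_weights = list(weights)
    discounted = 0.0
    # Walk backwards so the lambda-discounted sum of future differences
    # can be accumulated in a single pass.
    for t in reversed(range(len(leaf_features))):
        discounted = diffs[t] + lam * discounted
        grad = gradient(weights, leaf_features[t])
        for i, g in enumerate(grad):
            new_weights[i] += alpha * g * discounted
    return new_weights
```

With <code>lam = 0</code> each leaf evaluation is only pulled towards the evaluation of the next leaf, while <code>lam = 1</code> pulls every leaf evaluation towards the final game result.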
=Quotes=
