Automated Tuning

* [https://en.wikipedia.org/wiki/Time_complexity Time complexity] issues with an increasing number of weights to tune
<span id="ReinformentLearning"></span>
=Reinforcement Learning=
[[Reinforcement Learning|Reinforcement learning]], in particular [[Temporal Difference Learning|temporal difference learning]], has a long history in tuning evaluation weights in game programming, first seen in the late 1950s by [[Arthur Samuel]] in his [[Checkers]] player <ref>[[Arthur Samuel]] ('''1959'''). ''[http://domino.watson.ibm.com/tchjr/journalindex.nsf/600cc5649e2871db852568150060213c/39a870213169f45685256bfa00683d74!OpenDocument Some Studies in Machine Learning Using the Game of Checkers]''. IBM Journal July 1959</ref>. In self-play against a stable copy of itself, after each move the weights of the evaluation function were adjusted so that the [[Score|score]] of the [[Root|root position]] after a [[Quiescence Search|quiescence search]] moved closer to the score of the full search. This TD method was generalized and formalized by [[Richard Sutton]] in 1988 <ref>[[Richard Sutton]] ('''1988'''). ''Learning to Predict by the Methods of Temporal Differences''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 3, No. 1, [http://webdocs.cs.ualberta.ca/~sutton/papers/sutton-88.pdf pdf]</ref>, who introduced the decay parameter '''λ''', which controls how much of the target score comes from the outcome of [https://en.wikipedia.org/wiki/Monte_Carlo_method Monte Carlo] simulated games, interpolating between pure [https://en.wikipedia.org/wiki/Bootstrapping#Artificial_intelligence_and_machine_learning bootstrapping] (λ = 0) and pure Monte Carlo (λ = 1). [[Temporal Difference Learning#TDLamba|TD-λ]] was famously applied by [[Gerald Tesauro]] in his [[Backgammon]] program [https://en.wikipedia.org/wiki/TD-Gammon TD-Gammon] <ref>[[Gerald Tesauro]] ('''1992'''). ''Temporal Difference Learning of Backgammon Strategy''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/ml1992.html#Tesauro92 ML 1992]</ref> <ref>[[Gerald Tesauro]] ('''1994'''). ''TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play''. [http://www.informatik.uni-trier.de/~ley/db/journals/neco/neco6.html#Tesauro94 Neural Computation Vol. 6, No. 2]</ref>. Its [[Minimax|minimax]] adaptation [[Temporal Difference Learning#TDLeaf|TD-Leaf]] was successfully used to tune the evaluation of chess programs <ref>[[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''Learning Piece-Square Values using Temporal Differences.'' [[ICGA Journal#22_4|ICCA Journal, Vol. 22, No. 4]]</ref>, with [[KnightCap]] <ref>[[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Experiments in Parameter Learning Using Temporal Differences''. [[ICGA Journal#21_2|ICCA Journal, Vol. 21, No. 2]], [http://cs.anu.edu.au/%7ELex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf pdf]</ref> and [[CilkChess]] <ref>[http://supertech.csail.mit.edu/chess/ The Cilkchess Parallel Chess Program]</ref> as prominent examples.
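
The underlying weight update is simple. Below is a minimal sketch of a TD(λ) update for a linear evaluation V(s) = w·φ(s) over one self-play game; all names (tdLambdaUpdate, phi, etc.) are illustrative assumptions and not taken from any of the programs above. TD-Leaf differs only in recording the feature vector and prediction at the leaf of the principal variation returned by the search rather than at the root.

<pre>
#include <cstddef>
#include <vector>

// Minimal TD(lambda) sketch for a linear evaluation V(s) = w . phi(s).
// Assumptions (not from any engine above): phi[t] is the feature vector of
// position t and v[t] the corresponding prediction, both recorded during one
// self-play game; the game outcome can be appended as the final entry of v.
void tdLambdaUpdate(std::vector<double>& w,                       // evaluation weights to tune
                    const std::vector<std::vector<double>>& phi,  // feature vectors phi(s_t)
                    const std::vector<double>& v,                 // predictions V(s_t)
                    double alpha,                                 // learning rate
                    double lambda) {                              // decay, 0 = bootstrap, 1 = Monte Carlo
  std::vector<double> trace(w.size(), 0.0);       // eligibility trace: decayed sum of past gradients
  for (std::size_t t = 0; t + 1 < v.size(); ++t) {
    double delta = v[t + 1] - v[t];               // temporal difference between successive predictions
    for (std::size_t i = 0; i < w.size(); ++i) {
      trace[i] = lambda * trace[i] + phi[t][i];   // gradient of a linear eval is just phi(s_t)
      w[i] += alpha * delta * trace[i];           // nudge earlier predictions toward later ones
    }
  }
}
</pre>

With λ = 0 only the immediately following prediction influences each weight, while with λ = 1 every later prediction, and thus ultimately the game outcome, contributes in full.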
