[[Reinforcement Learning|Reinforcement learning]], in particular [[Temporal Difference Learning|temporal difference learning]], has a long history in tuning evaluation weights in game programming, first applied in the late 1950s by [[Arthur Samuel]] in his [[Checkers]] player <ref>[[Arthur Samuel]] ('''1959'''). ''[http://domino.watson.ibm.com/tchjr/journalindex.nsf/600cc5649e2871db852568150060213c/39a870213169f45685256bfa00683d74!OpenDocument Some Studies in Machine Learning Using the Game of Checkers]''. IBM Journal July 1959</ref>. In self play against a stable copy of itself, after each move, the weights of the evaluation function were adjusted so that the [[Score|score]] of the [[Root|root position]] after a [[Quiescence Search|quiescence search]] became closer to the score of the full search. This TD method was generalized and formalized by [[Richard Sutton]] in 1988 <ref>[[Richard Sutton]] ('''1988'''). ''Learning to Predict by the Methods of Temporal Differences''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 3, No. 1, [http://webdocs.cs.ualberta.ca/~sutton/papers/sutton-88.pdf pdf]</ref>, who introduced the decay parameter '''λ''', which sets what proportion of the training signal comes from the final outcome, as in [https://en.wikipedia.org/wiki/Monte_Carlo_method Monte Carlo] simulated games, rather than from the next prediction, tapering between pure [https://en.wikipedia.org/wiki/Bootstrapping#Artificial_intelligence_and_machine_learning bootstrapping] (λ = 0) and pure Monte Carlo (λ = 1). [[Temporal Difference Learning#TDLamba|TD-λ]] was famously applied by [[Gerald Tesauro]] in his [[Backgammon]] program [https://en.wikipedia.org/wiki/TD-Gammon TD-Gammon] <ref>[[Gerald Tesauro]] ('''1992'''). ''Temporal Difference Learning of Backgammon Strategy''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/ml1992.html#Tesauro92 ML 1992]</ref> <ref>[[Gerald Tesauro]] ('''1994'''). ''TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play''. 
[http://www.informatik.uni-trier.de/~ley/db/journals/neco/neco6.html#Tesauro94 Neural Computation Vol. 6, No. 2]</ref>. Its [[Minimax|minimax]] adaption [[Temporal Difference Learning#TDLeaf|TD-Leaf]] was successfully used in eval tuning of chess programs <ref>[[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''Learning Piece-Square Values using Temporal Differences.'' [[ICGA Journal#22_4|ICCA Journal, Vol. 22, No. 4]]</ref>, with [[KnightCap]] <ref>[[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Experiments in Parameter Learning Using Temporal Differences''. [[ICGA Journal#21_2|ICCA Journal, Vol. 21, No. 2]], [http://cs.anu.edu.au/%7ELex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf pdf]</ref> and [[CilkChess]] <ref>[http://supertech.csail.mit.edu/chess/ The Cilkchess Parallel Chess Program]</ref> as prominent examples.
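
The role of λ can be illustrated with a minimal sketch of an episodic TD(λ) weight update. It assumes a purely linear evaluation (score = weights · features), so the gradient of a prediction is simply its feature vector. The function name <code>td_lambda_update</code>, the learning rate <code>alpha</code>, and the data layout are hypothetical choices for illustration, not the method of any program cited above.

```python
def td_lambda_update(weights, features, outcome, alpha=0.1, lam=0.7):
    """Return new weights after one game, using an offline TD(lambda) update.

    weights  -- list of floats, the current linear evaluation weights
    features -- list of feature vectors, one per position of the game
    outcome  -- final result of the game, the target for the last position
    alpha    -- learning rate (assumed hyperparameter)
    lam      -- decay parameter lambda in [0, 1]
    """
    n = len(weights)
    # Predictions are computed once with the weights held fixed for the episode.
    preds = [sum(w * x for w, x in zip(weights, fv)) for fv in features]
    # Each prediction is trained toward its successor; the last one toward the outcome.
    targets = preds[1:] + [outcome]

    trace = [0.0] * n    # eligibility trace: decayed sum of past gradients
    delta = [0.0] * n    # accumulated weight change
    for fv, pred, target in zip(features, preds, targets):
        # For a linear evaluation the gradient w.r.t. the weights is the feature vector.
        trace = [lam * e + x for e, x in zip(trace, fv)]
        td_error = target - pred
        delta = [d + alpha * td_error * e for d, e in zip(delta, trace)]
    return [w + d for w, d in zip(weights, delta)]
```

With <code>lam=0.0</code> only the position immediately preceding each temporal difference is adjusted (pure bootstrapping), while with <code>lam=1.0</code> the accumulated update equals the one obtained by training every position directly toward the final outcome.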
==Move Adaption==
<span id="MoveAdaption"></span>One [[Supervised Learning|supervised learning]] method considers desired moves from a set of positions, likely from grandmaster games, and tries to adjust the evaluation weights so that, for instance, a one-ply search agrees with the desired move. Move adaption was described in 1967 by [[Arthur Samuel]], who had pioneered reinforcement learning some years before, as used in the second version of his checkers player <ref>[[Arthur Samuel]] ('''1967'''). ''Some Studies in Machine Learning Using the Game of Checkers. II-Recent Progress''. [http://researcher.watson.ibm.com/researcher/files/us-beygel/samuel-checkers.pdf pdf]</ref>, where a structure of stacked linear evaluation functions was trained by computing a correlation measure based on the number of times a feature rated an alternative move higher than the desired move played by an expert <ref>[[Johannes Fürnkranz]] ('''2000'''). ''Machine Learning in Games: A Survey''. [https://en.wikipedia.org/wiki/Austrian_Research_Institute_for_Artificial_Intelligence Austrian Research Institute for Artificial Intelligence], OEFAI-TR-2000-3, [http://www.ofai.at/cgi-bin/get-tr?download=1&paper=oefai-tr-2000-31.pdf pdf]</ref>. In chess, move adaption was first described by [[Thomas Nitsche]] in 1982 <ref>[[Thomas Nitsche]] ('''1982'''). ''A Learning Chess Program.'' [[Advances in Computer Chess 3]]</ref>, and with some extensions by [[Tony Marsland]] in 1985 <ref>[[Tony Marsland]] ('''1985'''). ''Evaluation-Function Factors''. [[ICGA Journal#8_2|ICCA Journal, Vol. 8, No. 2]], [http://webdocs.cs.ualberta.ca/~tony/OldPapers/evaluation.pdf pdf]</ref>. [[Eval Tuning in Deep Thought]], as mentioned by [[Feng-hsiung Hsu]] et al. in 1990 <ref>[[Feng-hsiung Hsu]], [[Thomas Anantharaman]], [[Murray Campbell]], [[Andreas Nowatzyk]] ('''1990'''). ''[http://www.disi.unige.it/person/DelzannoG/AI2/hsu.html A Grandmaster Chess Machine]''. [[Scientific American]], Vol. 263, No. 4, pp. 44-50. 
ISSN 0036-8733.</ref>, and later published by [[Andreas Nowatzyk]], is also based on an extended form of move adaption <ref>see ''2.1 Learning from Desired Moves in Chess'' in [[Kunihito Hoki]], [[Tomoyuki Kaneko]] ('''2014'''). ''[https://www.jair.org/papers/paper4217.html Large-Scale Optimization for Evaluation Functions with Minimax Search]''. [https://www.jair.org/vol/vol49.html JAIR Vol. 49]</ref>. [[Jonathan Schaeffer|Jonathan Schaeffer's]] and [[Paul Lu|Paul Lu's]] efforts to make Deep Thought's approach work for [https://en.wikipedia.org/wiki/Chinook_%28draughts_player%29 Chinook] in 1990 failed <ref>[[Jonathan Schaeffer]], [[Joe Culberson]], [[Norman Treloar]], [[Brent Knight]], [[Paul Lu]], [[Duane Szafron]] ('''1992'''). ''A World Championship Caliber Checkers Program''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_%28journal%29 Artificial Intelligence], Vol. 53, Nos. 2-3, [http://webdocs.cs.ualberta.ca/%7Ejonathan/Papers/Papers/chinook.ps ps]</ref>, as nothing seemed to produce results that were as good as their hand-tuned effort <ref>[[Jonathan Schaeffer]] ('''1997, 2009'''). ''[http://www.springer.com/computer/ai/book/978-0-387-76575-4 One Jump Ahead]''. 7. The Case for the Prosecution, pp. 111-114</ref>.
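
The basic idea of making a one-ply search agree with a desired move can be sketched as follows. This is a simple perceptron-style update chosen for illustration; it is not Samuel's correlation measure, nor the cost functions used by Nitsche, Marsland or the Deep Thought team. The name <code>move_adaption_epoch</code>, the learning rate <code>lr</code> and the training-set layout are hypothetical, and a linear evaluation from the mover's point of view is assumed.

```python
def move_adaption_epoch(weights, training_set, lr=0.1):
    """Run one pass over the training set; return (new_weights, disagreements).

    training_set -- list of (successors, desired) pairs, where successors
                    holds one feature vector per legal move (the position
                    after that move) and desired is the index of the move
                    actually played by the expert.
    """
    w = list(weights)
    disagreements = 0
    for successors, desired in training_set:
        # One-ply search: score every successor with the linear evaluation.
        scores = [sum(wi * xi for wi, xi in zip(w, fv)) for fv in successors]
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != desired:
            disagreements += 1
            # Shift weights toward the desired move's features and away
            # from those of the move the program currently prefers.
            w = [wi + lr * (d - b)
                 for wi, d, b in zip(w, successors[desired], successors[best])]
    return w, disagreements
```

Repeating such passes until the number of disagreements stops falling gives a crude measure of how well the feature set can explain the expert's choices.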
==Value Adaption==
<span id="ValueAdaption"></span>A second supervised learning approach used to tune evaluation weights is based on [https://en.wikipedia.org/wiki/Regression regression] toward a desired value, e.g. the final outcome of the game for huge sets of positions from quality games, or other information supplied by a supervisor, e.g. annotations in the form of [https://en.wikipedia.org/wiki/Chess_annotation_symbols#Position_evaluation_symbols position evaluation symbols]. Often, value adaption is reinforced by determining an expected outcome by self play <ref>[[Bruce Abramson]] ('''1990'''). ''[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=44404 Expected-Outcome: A General Model of Static Evaluation]''. [[IEEE#TPAMI|IEEE Transactions on Pattern Analysis and Machine Intelligence]], Vol. 12, No. 2</ref>.
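
As a minimal sketch of value adaption, the following fits a linear evaluation to desired values by batch gradient descent on the mean squared error. The function names, the learning rate <code>lr</code> and the sample layout are assumptions made for illustration; practical tuners typically use far larger data sets and often map the evaluation to an expected score first.

```python
def mean_squared_error(weights, samples):
    """Mean squared difference between the linear evaluation and the desired value."""
    total = 0.0
    for fv, desired in samples:
        pred = sum(w * x for w, x in zip(weights, fv))
        total += (pred - desired) ** 2
    return total / len(samples)


def value_adaption_step(weights, samples, lr=0.01):
    """Return weights after one batch gradient-descent step on the mean squared error.

    samples -- list of (features, desired_value) pairs, e.g. the desired
               value being the final game outcome for that position.
    """
    n = len(weights)
    grad = [0.0] * n
    for fv, desired in samples:
        err = sum(w * x for w, x in zip(weights, fv)) - desired
        for i in range(n):
            # d/dw_i of (pred - desired)^2, averaged over all samples
            grad[i] += 2.0 * err * fv[i] / len(samples)
    return [w - lr * g for w, g in zip(weights, grad)]
```

Calling <code>value_adaption_step</code> repeatedly and monitoring <code>mean_squared_error</code> on held-out positions is one simple way to decide when to stop.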
| style="vertical-align:top;" | Regression applied to [[Automated Tuning#MoveAdaption|move adaption]] was used by [[Thomas Nitsche]] in 1982, who minimized the [https://en.wikipedia.org/wiki/Mean_squared_error mean squared error] of a cost function comparing the program’s and a grandmaster’s choice of moves; as mentioned, this was extended by [[Tony Marsland]] in 1985, and later by the [[Deep Thought]] team. Regression used to [[Automated Tuning#ValueAdaption|adapt desired values]] was described by [[Donald H. Mitchell]] in his 1984 master's thesis on evaluation features in [[Othello]], cited by [[Michael Buro]] <ref>[[Michael Buro]] ('''1995'''). ''[http://www.jair.org/papers/paper179.html Statistical Feature Combination for the Evaluation of Game Positions]''. [https://en.wikipedia.org/wiki/Journal_of_Artificial_Intelligence_Research JAIR], Vol. 3</ref> <ref>[[Donald H. Mitchell]] ('''1984'''). ''Using Features to Evaluate Positions in Experts' and Novices' Othello Games''. Master's thesis, Department of Psychology, [[Northwestern University]], Evanston, IL</ref>. [[Jens Christensen]] applied [https://en.wikipedia.org/wiki/Linear_regression linear regression] to chess in 1986 to learn [[Point Value|point values]] in the domain of [[Temporal Difference Learning|temporal difference learning]] <ref>[[Jens Christensen]] ('''1986'''). ''[http://link.springer.com/chapter/10.1007/978-1-4613-2279-5_9?no-access=true Learning Static Evaluation Functions by Linear Regression]''. In [[Tom Mitchell]], [[Jaime Carbonell]], [[Ryszard Michalski]] ('''1986'''). ''[http://link.springer.com/book/10.1007/978-1-4613-2279-5 Machine Learning: A Guide to Current Research]''. The Kluwer International Series in Engineering and Computer Science, Vol. 12</ref>.
| [[FILE:Linear regression.svg|border|left|thumb|baseline|300px|[https://en.wikipedia.org/wiki/Linear_regression Linear Regression] <ref>Random data points and their [https://en.wikipedia.org/wiki/Linear_regression linear regression]. [https://commons.wikimedia.org/wiki/File:Linear_regression.svg Created] with [https://en.wikipedia.org/wiki/Sage_%28mathematics_software%29 Sage] by Sewaqu, November 5, 2010, [https://en.wikipedia.org/wiki/Wikimedia_Commons Wikimedia Commons]</ref> ]]