
Automated Tuning

<span id="ReinformentLearning"></span>
=Reinforcement Learning=
[[Reinforcement Learning|Reinforcement learning]], in particular [[Temporal Difference Learning|temporal difference learning]], has a long history in tuning evaluation weights in game programming, first seen in the late 1950s in [[Arthur Samuel|Arthur Samuel's]] [[Checkers]] player <ref>[[Arthur Samuel]] ('''1959'''). ''Some Studies in Machine Learning Using the Game of Checkers''. IBM Journal, July 1959</ref>. In self-play against a stable copy of itself, the weights of the evaluation function were adjusted after each move so that the [[Score|score]] of the [[Root|root position]] after a [[Quiescence Search|quiescence search]] came closer to the score of the full search. This TD method was generalized and formalized by [[Richard Sutton]] in 1988 <ref>[[Richard Sutton]] ('''1988'''). ''Learning to Predict by the Methods of Temporal Differences''. Machine Learning, Vol. 3, No. 1</ref>, who introduced the decay parameter '''λ''' controlling what proportion of the target score comes from the outcome of Monte Carlo simulated games, tapering between bootstrapping (λ = 0) and pure Monte Carlo (λ = 1). [[Temporal Difference Learning#TDLamba|TD-λ]] was famously applied by [[Gerald Tesauro]] in his [[Backgammon]] program TD-Gammon <ref>[[Gerald Tesauro]] ('''1992'''). ''Temporal Difference Learning of Backgammon Strategy''. ML 1992</ref> <ref>[[Gerald Tesauro]] ('''1994'''). ''TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play''. Neural Computation, Vol. 6, No. 2</ref>; its [[Minimax|minimax]] adaptation [[Temporal Difference Learning#TDLeaf|TD-Leaf]] was successfully used in evaluation tuning of chess programs <ref>[[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''Learning Piece-Square Values using Temporal Differences''. [[ICGA Journal#22_4|ICCA Journal, Vol. 22, No. 4]]</ref>, with [[KnightCap]] <ref>[[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Experiments in Parameter Learning Using Temporal Differences''. [[ICGA Journal#21_2|ICCA Journal, Vol. 21, No. 2]]</ref> and [[CilkChess]] <ref>The Cilkchess Parallel Chess Program</ref> as prominent examples.
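As a rough sketch of the underlying update, the following illustrates a TD(λ)-style weight adjustment for a linear evaluation; the function and parameter names are hypothetical and not taken from any of the programs above.

<pre>
# Minimal TD(lambda)-style update for a linear evaluation
# eval(pos) = sum(w[i] * f[i]).  'feature_vectors' holds the feature
# vector of each root position of one self-play game, 'scores' the
# corresponding search scores, all from the same side's point of view.

def td_lambda_update(weights, feature_vectors, scores, alpha=0.01, lam=0.7):
    for t in range(len(scores) - 1):
        # Temporal difference between successive predictions.
        delta = scores[t + 1] - scores[t]
        # Credit every earlier position, decayed by lambda per step;
        # for a linear evaluation the gradient is the feature vector.
        for k in range(t + 1):
            decay = lam ** (t - k)
            for i, f in enumerate(feature_vectors[k]):
                weights[i] += alpha * decay * delta * f
    return weights
</pre>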
<span id="SupervisedLearning"></span>
=Supervised Learning=
<span id="MoveAdaption"></span>
==Move Adaptation==
One [[Supervised Learning|supervised learning]] method considers desired moves from a set of positions, likely from grandmaster games, and tries to adjust the evaluation weights so that, for instance, a one-ply search agrees with the desired move. Having already pioneered reinforcement learning some years before, [[Arthur Samuel]] described move adaptation in 1967 as used in the second version of his checkers player <ref>[[Arthur Samuel]] ('''1967'''). ''Some Studies in Machine Learning Using the Game of Checkers. II - Recent Progress''</ref>, where a structure of stacked linear evaluation functions was trained by computing a correlation measure based on the number of times a feature rated an alternative move higher than the desired move played by an expert <ref>[[Johannes Fürnkranz]] ('''2000'''). ''Machine Learning in Games: A Survey''. Austrian Research Institute for Artificial Intelligence, OEFAI-TR-2000-3</ref>. In chess, move adaptation was first described by [[Thomas Nitsche]] in 1982 <ref>[[Thomas Nitsche]] ('''1982'''). ''A Learning Chess Program''. [[Advances in Computer Chess 3]]</ref>, and with some extensions by [[Tony Marsland]] in 1985 <ref>[[Tony Marsland]] ('''1985'''). ''Evaluation-Function Factors''. [[ICGA Journal#8_2|ICCA Journal, Vol. 8, No. 2]]</ref>. [[Eval Tuning in Deep Thought]], as mentioned by [[Feng-hsiung Hsu]] et al. in 1990 <ref>[[Feng-hsiung Hsu]], [[Thomas Anantharaman]], [[Murray Campbell]], [[Andreas Nowatzyk]] ('''1990'''). ''A Grandmaster Chess Machine''. [[Scientific American]], Vol. 263, No. 4, pp. 44-50. ISSN 0036-8733</ref>, and later published by [[Andreas Nowatzyk]], is also based on an extended form of move adaptation <ref>see ''2.1 Learning from Desired Moves in Chess'' in [[Kunihito Hoki]], [[Tomoyuki Kaneko]] ('''2014'''). ''Large-Scale Optimization for Evaluation Functions with Minimax Search''. JAIR, Vol. 49</ref>. [[Jonathan Schaeffer|Jonathan Schaeffer's]] and [[Paul Lu|Paul Lu's]] efforts to make Deep Thought's approach work for Chinook in 1990 failed <ref>[[Jonathan Schaeffer]], [[Joe Culberson]], [[Norman Treloar]], [[Brent Knight]], [[Paul Lu]], [[Duane Szafron]] ('''1992'''). ''A World Championship Caliber Checkers Program''. Artificial Intelligence, Vol. 53, Nos. 2-3</ref>; nothing they tried produced results as good as their hand-tuned effort <ref>[[Jonathan Schaeffer]] ('''1997, 2009'''). ''One Jump Ahead''. 7. The Case for the Prosecution, pp. 111-114</ref>.
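A minimal sketch of the idea, assuming a linear evaluation and hypothetical helpers ''features'' and ''apply_move'': whenever a one-ply comparison rates some alternative at least as high as the expert's choice, the weights are shifted toward the expert move's features.

<pre>
# Sketch of move adaptation for a linear evaluation: nudge the weights
# whenever an alternative move scores at least as high as the expert
# move.  'features' and 'apply_move' are hypothetical helpers; scores
# are taken from the root side's point of view for simplicity.

def evaluate(weights, pos):
    return sum(w * f for w, f in zip(weights, features(pos)))

def move_adaptation_step(weights, pos, expert_move, legal_moves, eta=0.001):
    expert_pos = apply_move(pos, expert_move)
    expert_feats = features(expert_pos)
    for move in legal_moves:
        if move == expert_move:
            continue
        successor = apply_move(pos, move)
        # One-ply disagreement: a rival move scores at least as high.
        if evaluate(weights, successor) >= evaluate(weights, expert_pos):
            alt_feats = features(successor)
            # Raise the expert move's score, lower the rival's.
            for i in range(len(weights)):
                weights[i] += eta * (expert_feats[i] - alt_feats[i])
    return weights
</pre>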
<span id="ValueAdaption"></span>
==Value Adaptation==
A second supervised learning approach used to tune evaluation weights is based on regression of the desired value, e.g. the final outcome from huge sets of positions from quality games, or other information supplied by a supervisor, such as annotations in the form of position evaluation symbols. Often, value adaptation is reinforced by determining an expected outcome by self-play <ref>[[Bruce Abramson]] ('''1990'''). ''Expected-Outcome: A General Model of Static Evaluation''. [[IEEE#TPAMI|IEEE Transactions on Pattern Analysis and Machine Intelligence]], Vol. 12, No. 2</ref>.
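A common way to cast this as regression, sketched here under the assumption of a linear evaluation, is to fit the weights so that a sigmoid of the score predicts the game result; the data layout and the scaling constant are illustrative choices, not prescribed by the literature above.

<pre>
import math

# Sketch of value adaptation as regression: fit a linear evaluation so
# that a sigmoid of its score predicts the game result (1.0 win,
# 0.5 draw, 0.0 loss).  'data' is a hypothetical list of
# (feature_vector, result) pairs extracted from quality games.

def sigmoid(score, k=1.0 / 400.0):
    # Maps a centipawn score to an expected result in [0, 1];
    # the scaling constant k is an illustrative choice.
    return 1.0 / (1.0 + math.exp(-k * score))

def regression_epoch(weights, data, eta=0.1):
    for feats, result in data:
        score = sum(w * f for w, f in zip(weights, feats))
        p = sigmoid(score)
        # Gradient of the squared error for this logistic-linear
        # model (constant factors absorbed into eta).
        grad = (p - result) * p * (1.0 - p)
        for i, f in enumerate(feats):
            weights[i] -= eta * grad * f
    return weights
</pre>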
The supervised problem of regression applied to [[Automated Tuning#MoveAdaption|move adaptation]] was used by [[Thomas Nitsche]] in 1982, minimizing the mean squared error of a cost function considering the program’s and a grandmaster’s choice of moves, as mentioned, extended by [[Tony Marsland]] in 1985, and later by the [[Deep Thought]] team. Regression used to [[Automated Tuning#ValueAdaption|adapt desired values]] was described by [[Donald H. Mitchell]] in his 1984 master's thesis on evaluation features in [[Othello]], cited by [[Michael Buro]] <ref>[[Michael Buro]] ('''1995'''). ''Statistical Feature Combination for the Evaluation of Game Positions''. JAIR, Vol. 3</ref> <ref>[[Donald H. Mitchell]] ('''1984'''). ''Using Features to Evaluate Positions in Experts' and Novices' Othello Games''. Master's thesis, Department of Psychology, [[Northwestern University]], Evanston, IL</ref>. [[Jens Christensen]] applied linear regression to chess in 1986 to learn [[Point Value|point values]] in the domain of [[Temporal Difference Learning|temporal difference learning]] <ref>[[Jens Christensen]] ('''1986'''). ''Learning Static Evaluation Functions by Linear Regression''. in [[Tom Mitchell]], [[Jaime Carbonell]], [[Ryszard Michalski]] ('''1986'''). ''Machine Learning: A Guide to Current Research''. The Kluwer International Series in Engineering and Computer Science, Vol. 12</ref>.

[[FILE:Linear regression.svg|border|left|thumb|baseline|300px|Linear Regression <ref>Random data points and their linear regression. Created with Sage by Sewaqu, November 5, 2010, Wikimedia Commons</ref> ]]

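As a toy illustration of learning point values by linear regression, the following least-squares fit uses made-up material-difference features and target scores; real experiments regress over large sets of positions.

<pre>
import numpy as np

# Toy least-squares fit of point values from material differences.
# Each row of X counts material imbalances (pawn, knight, rook: own
# minus opponent's); y holds made-up target scores in centipawns.

X = np.array([[1, 0, 0],    # up one pawn
              [0, 1, 0],    # up one knight
              [1, 0, 1]])   # up one pawn and one rook
y = np.array([100.0, 310.0, 620.0])

values, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print(values)  # fitted point values: pawn 100, knight 310, rook 520
</pre>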