Home * Engines * Meep

Strapped woman [1]

Meep,
an experimental chess engine and subject of research on machine learning techniques and automated tuning, written by Joel Veness, supported by David Silver, William Uther, and Alan Blair, as elaborated in their 2009 research paper this page is based on [2], and in Joel Veness' Ph.D. thesis [3]. Meep is based on the tournament chess engine Bodo, whose hand-crafted evaluation function is replaced by a weighted linear combination of 1812 features. Given a position s, a feature vector Φ(s) is constructed from the 1812 numeric values of each feature. The majority of these features are binary, and Φ(s) is typically sparse, with approximately 100 features active in any given position. Five well-known, chess-specific feature construction concepts (material, piece square tables, pawn structure, mobility, and king safety) were used to generate the 1812 distinct features. In training mode, Meep learns from self-play within various search frameworks, adjusting the weights of its evaluation function towards the value of a deep search. A tournament mode is later used to verify the strength of a trained weight configuration.
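
The evaluation thus reduces to a sparse dot product over the active features. A minimal C++ sketch of this representation follows; the type and function names (ActiveFeature, Evaluate) are illustrative assumptions for this page, not code from Meep or Bodo:

#include <cstddef>
#include <vector>

// Number of distinct evaluation features, as described above.
constexpr std::size_t kNumFeatures = 1812;

// One entry of the sparse feature vector Phi(s): only the roughly 100
// features active in a position are stored as (index, value) pairs.
struct ActiveFeature {
    std::size_t index;  // which of the 1812 features
    double value;       // 1.0 for the (majority) binary features, a count otherwise
};

// H_theta(s) = Phi(s) . theta, computed over the active features only.
// A full weight vector theta would be a std::vector<double>(kNumFeatures).
double Evaluate(const std::vector<ActiveFeature>& phi,
                const std::vector<double>& theta) {
    double h = 0.0;
    for (const ActiveFeature& f : phi)
        h += theta[f.index] * f.value;
    return h;
}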

BootStrap

In contrast to temporal difference methods such as TD-Leaf [4] as used in KnightCap, where the target search is performed at subsequent time-steps, after a real move and response have been played, Meep performs various bootstrapping techniques during training, dubbed RootStrap and TreeStrap, which adjust the weights at every time-step inside either a minimax or alpha-beta search. With the heuristic evaluation function as a linear combination of features,

Hθ(s) = Φ(s) · θ

where Φ(s) is a vector of features of position s, and θ is a parameter vector specifying the weight of each feature in the linear combination, the following backup rules are given, using V as the backed-up value of the minimax or alpha-beta search (←θ denotes the operator that updates the heuristic function towards some target value):

Algorithm            Backup
TD                   Hθ(s_t) ←θ Hθ(s_t+1)
TD-Root              Hθ(s_t) ←θ V(s_t+1)
TD-Leaf              Hθ(l_t) ←θ V(s_t+1), with l_t the leaf of the principal variation from s_t
RootStrap(minimax)   Hθ(s_t) ←θ V(s_t)
TreeStrap(minimax)   Hθ(s) ←θ V(s), for all s in the search tree of s_t
TreeStrap(αβ)        Hθ(s) ←θ [αβ bounds of s], for all s in the search tree of s_t
[Figure: TD, TD-Root and TD-Leaf backups (left); RootStrap and TreeStrap(minimax) backups (right) [5]]

RootStrap

RootStrap(minimax), or the identical RootStrap(αβ), adjusts the weights by stochastic gradient descent [6] on the squared error between the static evaluation and the minimax (or alpha-beta) search value of the root.

Δθ = −η/2 ∇θ (V(s_t) − Hθ(s_t))²
   = η (V(s_t) − Hθ(s_t)) Φ(s_t)

where η is a step size constant, and the second line follows since ∇θHθ(s_t) = Φ(s_t).
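
As a minimal sketch of this update, reusing the illustrative ActiveFeature and Evaluate definitions assumed in the earlier sketch (again not Meep's actual code), only the weights of active features change, since the gradient of Hθ is Φ(s_t):

// One RootStrap step: stochastic gradient descent on the squared error
// between the static evaluation H_theta(s_t) and the root search value V(s_t).
void RootStrapUpdate(std::vector<double>& theta,
                     const std::vector<ActiveFeature>& phi,  // Phi(s_t)
                     double searchValue,                     // V(s_t)
                     double eta) {                           // step size
    const double delta = searchValue - Evaluate(phi, theta); // V - H
    // theta += eta * delta * Phi(s_t): only active features move.
    for (const ActiveFeature& f : phi)
        theta[f.index] += eta * delta * f.value;
}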

TreeStrap

TreeStrap(minimax) also considers all interior nodes of the search tree for the stochastic gradient descent on the squared error. The minimax algorithm used for TreeStrap(minimax) keeps the entire tree in memory. TreeStrap(αβ) uses a generic alpha-beta implementation with only a few enhancements: transposition table, killer and history heuristics, and check extensions. The bounds computed by alpha-beta are exploited by a one-sided loss function: a fail-low establishes alpha as an upper bound on a node's value, so if the static evaluation is larger than alpha, it is reduced towards alpha; a fail-high establishes beta as a lower bound, so if the static evaluation is smaller than beta, it is increased towards beta.
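
A sketch of this one-sided update for a single tree node follows, again reusing the illustrative definitions from the sketches above; the convention of passing a bound the search did not establish as ±infinity (std::numeric_limits<double>::infinity() from <limits>) is an assumption of this example:

// TreeStrap(alpha-beta) update for one node s of the search tree. A fail-low
// makes alpha an upper bound on the true value, a fail-high makes beta a
// lower bound; H_theta(s) is pulled back inside [lowerBound, upperBound].
void TreeStrapAlphaBetaUpdate(std::vector<double>& theta,
                              const std::vector<ActiveFeature>& phi,  // Phi(s)
                              double lowerBound, double upperBound,
                              double eta) {                           // step size
    const double h = Evaluate(phi, theta);
    double delta = 0.0;
    if (h > upperBound)
        delta = upperBound - h;   // reduce towards the upper (fail-low) bound
    else if (h < lowerBound)
        delta = lowerBound - h;   // increase towards the lower (fail-high) bound
    for (const ActiveFeature& f : phi)
        theta[f.index] += eta * delta * f.value;
}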

Results

A tournament of about 16,000 games was played between differently trained Meep versions and a reference player with randomly initialised weights and an arbitrarily assigned rating of 250, at a time control of 1 minute per game plus 1 second per move Fischer increment. Training had previously been done by self-play at the same time control, using a small opening book to maintain diversity. The target values were determined by at least one ply of full-width search, plus a varying amount of quiescence search. All Elo values are calculated relative to the reference player, with the best performance of each algorithm given with 95% confidence intervals [7]:

Algorithm        Elo
Untrained         250 ± 63
TD-Leaf          1068 ± 36
RootStrap(αβ)    1362 ± 59
TreeStrap(mm)    1807 ± 32
TreeStrap(αβ)    2157 ± 31

See also

Publications

Joel Veness, David Silver, William Uther, Alan Blair (2009). Bootstrapping from Game Tree Search. pdf [8]
Joel Veness (2011). Approximate Universal Artificial Intelligence and Self-Play Learning for Games. Ph.D. thesis, University of New South Wales, supervisors: Kee Siong Ng, Marcus Hutter, Alan Blair, William Uther, John Lloyd; pdf
István Szita (2012). Reinforcement Learning in Games. in Marco Wiering, Martijn Van Otterlo (eds.). Reinforcement learning: State-of-the-art. Adaptation, Learning, and Optimization, Vol. 12, Springer

Forum Posts

Re: A paper about parameter tuning by Joel Veness, CCC, January 15, 2010

External Links

Wearing suspenders or garter belts from Wikipedia
feat: Eberhard Weber, John Marshall, Stan Sulzmann, John Taylor, Cees See, Zbigniew Seifert

References

  1. Cover of Bizarre, Vol. 3 from 1946, depicting a chained woman, and a devil with tailoring tools promising torments due to the wasp-waisted "Fashions of 1946", by John Willie, Wikimedia Commons
  2. Joel Veness, David Silver, William Uther, Alan Blair (2009). Bootstrapping from Game Tree Search. pdf
  3. Joel Veness (2011). Approximate Universal Artificial Intelligence and Self-Play Learning for Games. Ph.D. thesis, University of New South Wales, supervisors: Kee Siong Ng, Marcus Hutter, Alan Blair, William Uther, John Lloyd; pdf
  4. Jonathan Baxter, Andrew Tridgell, Lex Weaver (1998). TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search. Australian Journal of Intelligent Information Processing Systems, Vol. 5 No. 1, arXiv:cs/9901001
  5. Images cropped from Joel Veness, David Silver, William Uther, Alan Blair (2009). Bootstrapping from Game Tree Search. pdf, Figure 1, p. 2
  6. Nabla operator from Wikipedia
  7. Joel Veness, David Silver, William Uther, Alan Blair (2009). Bootstrapping from Game Tree Search. pdf
  8. A paper about parameter tuning by Rémi Coulom, CCC, January 12, 2010
