UCT

1,494 bytes added, 12:38, 11 August 2018
* C is a constant that adjusts the amount of exploration; it incorporates the sqrt(2) from the UCB1 formula
The first component of the UCB1 formula above corresponds to exploitation, as it is high for moves with a high average win ratio. The second component corresponds to exploration, since it is high for moves with few simulations.

==RAVE==
Most contemporary implementations of MCTS are based on some variant of UCT <ref>[https://en.wikipedia.org/wiki/Monte_Carlo_tree_search#Exploration_and_exploitation Exploration and exploitation - in Monte Carlo tree search from Wikipedia]</ref>. Modifications have been proposed with the aim of shortening the time to find good moves. They can be divided into improvements based on expert knowledge and domain-independent improvements, applied either in the playouts or in building the tree by modifying the exploitation part of the UCB1 formula, for instance in [https://en.wikipedia.org/wiki/Monte_Carlo_tree_search#Improvements Rapid Action Value Estimation] ('''RAVE''') <ref>[[Sylvain Gelly]], [[David Silver]] ('''2011'''). ''Monte-Carlo tree search and rapid action value estimation in computer Go''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_%28journal%29 Artificial Intelligence], Vol. 175, No. 11, [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/mcrave.pdf preprint as pdf]</ref>, which considers [[Transposition|transpositions]].

==PUCT==
[[Christopher D. Rosin|Chris Rosin's]] [[Christopher D. Rosin#PUCT|PUCT]] modifies the original UCB1 multi-armed bandit policy by approximately predicting good arms at the start of a sequence of multi-armed bandit trials ('Predictor' + UCB = PUCB) <ref>[[Christopher D. Rosin]] ('''2011'''). ''[https://link.springer.com/article/10.1007/s10472-011-9258-6 Multi-armed bandits with episode context]''. Annals of Mathematics and Artificial Intelligence, Vol. 61, No. 3, [http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf ISAIM 2010 pdf]</ref>. A variation of PUCT was used in the [[AlphaGo]] and [[AlphaZero]] projects <ref>[[David Silver]], [[Shih-Chieh Huang|Aja Huang]], [[Chris J. Maddison]], [[Arthur Guez]], [[Laurent Sifre]], [[George van den Driessche]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Veda Panneershelvam]], [[Marc Lanctot]], [[Sander Dieleman]], [[Dominik Grewe]], [[John Nham]], [[Nal Kalchbrenner]], [[Ilya Sutskever]], [[Timothy Lillicrap]], [[Madeleine Leach]], [[Koray Kavukcuoglu]], [[Thore Graepel]], [[Demis Hassabis]] ('''2016'''). ''[http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html Mastering the game of Go with deep neural networks and tree search]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 529</ref>, and subsequently also in [[Leela Zero]] and [[Leela Chess Zero]] <ref>[https://github.com/LeelaChessZero/lc0/wiki/FAQ FAQ · LeelaChessZero/lc0 Wiki · GitHub]</ref>.
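As an illustration of the UCB1 selection rule discussed above, the following sketch scores children by win ratio plus exploration bonus and picks the maximum. It is not taken from any particular engine; the `(wins, visits)` tuple layout and the default C = sqrt(2) are assumptions for the example.

```python
import math

def ucb1(child_wins, child_visits, parent_visits, C=math.sqrt(2)):
    """UCB1 value: exploitation term plus exploration term."""
    if child_visits == 0:
        return float('inf')  # unvisited children are always tried first
    exploitation = child_wins / child_visits           # average win ratio
    exploration = C * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration

def select_child(children, parent_visits, C=math.sqrt(2)):
    # children: list of (wins, visits) tuples; return index of max-UCB1 child
    return max(range(len(children)),
               key=lambda i: ucb1(children[i][0], children[i][1],
                                  parent_visits, C))
```

With C = 0 the rule degenerates to pure exploitation (greedy win-ratio selection); increasing C shifts simulations toward rarely tried moves.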
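The RAVE idea can be sketched by blending the node's Monte-Carlo value with an all-moves-as-first (AMAF) estimate, weighted so that the AMAF value dominates while the node has few simulations. The schedule below, beta = sqrt(k / (3N + k)), follows the hand-selected schedule described in the Gelly and Silver paper; the equivalence parameter k = 1000 is an illustrative value, not a recommendation.

```python
import math

def rave_value(q_mc, n_mc, q_rave, k=1000):
    """Blend the Monte-Carlo value q_mc (from n_mc simulations) with the
    AMAF/RAVE estimate q_rave; beta -> 1 when n_mc is small, -> 0 as it grows."""
    beta = math.sqrt(k / (3 * n_mc + k))
    return beta * q_rave + (1 - beta) * q_mc
```

As the simulation count grows, the blended value converges to the unbiased Monte-Carlo estimate, so RAVE only steers the early, high-variance phase of a node's life.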
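The AlphaZero-style PUCT variant mentioned above scores each move as Q(s,a) + c_puct * P(s,a) * sqrt(N_parent) / (1 + N(s,a)), where P is a prior from the policy network. A minimal sketch follows; the constant c_puct, the tuple layout, and the priors are illustrative assumptions, not the actual AlphaGo/AlphaZero parameters.

```python
import math

def puct_score(q, prior, child_visits, parent_visits, c_puct=1.5):
    """AlphaZero-style PUCT: value estimate plus prior-weighted exploration."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def select_puct(stats, c_puct=1.5):
    # stats: list of (q, prior, visits); parent visits = sum of child visits
    parent_visits = max(1, sum(v for _, _, v in stats))
    return max(range(len(stats)),
               key=lambda i: puct_score(stats[i][0], stats[i][1], stats[i][2],
                                        parent_visits, c_puct))
```

Unlike plain UCB1, an unvisited move is attractive only in proportion to its prior, which is how the predictor ('Predictor' + UCB) biases the search toward moves the network already likes.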
=Quotes=
