'''Monte-Carlo Tree Search''' (MCTS)<br/>
is a [[Best-First|Best-First search]] algorithm, historically based on [https://en.wikipedia.org/wiki/Randomness random] playouts. In conjunction with [[UCT]] ('''U'''pper '''C'''onfidence bounds applied to '''T'''rees), Monte-Carlo Tree Search yielded a breakthrough in [[Go|Computer Go]] <ref>[[Rémi Coulom]] ('''2009'''). ''The Monte-Carlo Revolution in Go''. JFFoS'2008: Japanese-French Frontiers of Science Symposium, [http://remi.coulom.free.fr/JFFoS/JFFoS.pdf slides as pdf]</ref>, and has also been successful in [[Amazons]] <ref>[[Julien Kloetzer]], [[Hiroyuki Iida]], [[Bruno Bouzy]] ('''2007'''). ''The Monte-Carlo approach in Amazons''. [[CGW 2007]]</ref> <ref>[[Richard J. Lorentz]] ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_2 Amazons Discover Monte-Carlo]''. [[CG 2008]]</ref>, [[Lines of Action]] <ref>[[Mark Winands]], [[Yngvi Björnsson]] ('''2009'''). ''Evaluation Function Based Monte-Carlo LOA''. [http://www.ru.is/faculty/yngvi/pdf/WinandsB09.pdf pdf]</ref>, [[Havannah]] <ref>[[Richard J. Lorentz]] ('''2010'''). ''[http://www.springerlink.com/content/p4x16832317r1214/ Improving Monte-Carlo Tree Search in Havannah]''. [[CG 2010]]</ref>, [[Hex]] <ref>[[Broderick Arneson]], [[Ryan Hayward]], [[Philip Henderson]] ('''2010'''). ''Monte Carlo Tree Search in Hex''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 2, No. 4, [http://webdocs.cs.ualberta.ca/~hayward/papers/mcts-hex.pdf pdf]</ref>, [[Checkers]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=38554 UCT surprise for checkers !] by [[Daniel Shawul]], [[CCC]], March 25, 2011</ref>, and other [[Games]] whose positions are difficult to evaluate, but not in [[Chess]] <ref>[[Raghuram Ramanujan]], [[Ashish Sabharwal]], [[Bart Selman]] ('''2010'''). ''[http://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/view/1458 On Adversarial Search Spaces and Sampling-Based Planning]''. [http://www.aaai.org/Press/Proceedings/icaps10.php ICAPS 2010]</ref> <ref>[[Oleg Arenz]] ('''2012'''). ''[http://www.ke.tu-darmstadt.de/bibtex/publications/show/2321 Monte Carlo Chess]''. B.Sc. thesis, [[Darmstadt University of Technology]], advisor [[Johannes Fürnkranz]], [http://www.ke.tu-darmstadt.de/lehre/arbeiten/bachelor/2012/Arenz_Oleg.pdf pdf]</ref> until December 2017, when a [[Google]] [[DeepMind]] team reported on [[AlphaZero]] <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>.
MCTS is based on randomized explorations of the [[Search Space|search space]]. Using the results of previous explorations, the algorithm gradually grows a [[Search Tree|game tree]] in [[Memory|memory]], and successively becomes better at accurately estimating the values of the most promising moves <ref>[[Guillaume Chaslot]], [[Mark Winands]], [[Jaap van den Herik]] ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_6 Parallel Monte-Carlo Tree Search]''. [[CG 2008]], [https://dke.maastrichtuniversity.nl/m.winands/documents/multithreadedMCTS2.pdf pdf]</ref>.
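The following is a minimal Python sketch of the four MCTS phases (selection, expansion, playout, backpropagation), not taken from any particular engine: the game-state interface (<code>legal_moves</code>, <code>play</code>, <code>is_terminal</code>, <code>result</code>), the reward convention and the exploration constant are illustrative assumptions.
<pre>
import math, random

def uct_score(child, c):
    # UCB1 applied to trees: mean reward plus an exploration bonus;
    # unvisited children are tried first
    if child.visits == 0:
        return float('inf')
    return (child.wins / child.visits
            + c * math.sqrt(math.log(child.parent.visits) / child.visits))

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.untried = list(state.legal_moves())  # moves not yet expanded
        self.visits, self.wins = 0, 0.0

def mcts(root_state, iterations=10000, c=1.414):
    root = Node(root_state)  # assumed non-terminal
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded
        while not node.untried and node.children:
            node = max(node.children, key=lambda n: uct_score(n, c))
        # 2. Expansion: create one child for a random untried move
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state.play(move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Playout: random moves until the game ends
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        # 4. Backpropagation: update statistics along the path to the root
        reward = state.result()  # assumed 1.0 if the side that moved into `node` won
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1.0 - reward  # flip perspective each ply up the tree
            node = node.parent
    return max(root.children, key=lambda n: n.visits).move  # most visited move
</pre>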
=UCT=
[[UCT]] ('''U'''pper '''C'''onfidence bounds applied to '''T'''rees) addresses a flaw of plain Monte-Carlo Tree Search: a program may favor a losing move that has only one or a few forced refutations, because random playouts rarely find them, so the vast majority of playouts gives that move a better score than other, actually better moves <ref>[[Levente Kocsis]], [[Csaba Szepesvári]] ('''2006'''). ''[http://www.computer-go.info/resources/bandit.html Bandit based Monte-Carlo Planning]''. ECML-06, LNCS/LNAI 4212, pp. 282-293, introducing [[UCT]], [http://www.sztaki.hu/%7Eszcsaba/papers/ecml06.pdf pdf]</ref>.
In UCT, upper [https://en.wikipedia.org/wiki/Confidence_interval confidence bounds] guide the selection of a node, treating move selection as a [https://en.wikipedia.org/wiki/Multi-armed_bandit multi-armed bandit] problem. [[Christopher D. Rosin#PUCT|PUCT]] modifies the original policy by approximately predicting good arms at the start of a sequence of multi-armed bandit trials <ref>[[Christopher D. Rosin]] ('''2011'''). ''[https://link.springer.com/article/10.1007/s10472-011-9258-6 Multi-armed bandits with episode context]''. Annals of Mathematics and Artificial Intelligence, Vol. 61, No. 3, [http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf ISAIM 2010 pdf]</ref>.
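As a sketch, the two selection rules can be written side by side; the node fields (<code>visits</code>, <code>wins</code>, <code>prior</code>) and the constants are assumptions, and the PUCT form shown is the AlphaZero-style variant rather than Rosin's original formulation.
<pre>
import math

def ucb1(child, parent_visits, c=1.414):
    # UCB1: win rate plus an exploration bonus that shrinks as the
    # child is visited; unvisited children are tried first
    if child.visits == 0:
        return float('inf')
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def puct(child, parent_visits, c_puct=1.5):
    # PUCT (AlphaZero-style variant): the exploration term is weighted
    # by a prior probability that predicts good arms before any trials
    q = child.wins / child.visits if child.visits else 0.0
    return q + c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
</pre>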
 
=Playouts by NN=
Historically, MCTS was based on random, noisy playouts, and many such playouts were necessary to evaluate a state accurately. Since [[AlphaGo]] and [[AlphaZero]], this is no longer the case: strong policies and evaluations are provided by [[Neural Networks|neural networks]] trained with [[Reinforcement Learning]]. In AlphaGo and its descendants, the policy is used as a prior in the [[Christopher D. Rosin#PUCT|PUCT]] bandit to explore the most promising moves advised by the neural network first, while the network evaluations replace the playouts, as sketched below <ref>[[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700]</ref>.
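A sketch of how such a network can drive the search, assuming a hypothetical <code>net.predict</code> that returns a dictionary of move priors and a scalar value estimate; the value is backed up through the tree in place of a playout result.
<pre>
import math

class Child:
    def __init__(self, move, prior):
        self.move, self.prior = move, prior
        self.visits, self.value_sum = 0, 0.0
        self.children, self.state = [], None  # state is set when reached

    def q(self):  # mean backed-up value of this child
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # PUCT selection: the policy prior steers the search toward
    # moves the network considers promising
    return max(node.children,
               key=lambda ch: ch.q()
               + c_puct * ch.prior * math.sqrt(node.visits) / (1 + ch.visits))

def expand_and_evaluate(node, net):
    # One network call replaces the random playout: it yields move
    # priors for expansion and a value estimate to back up the tree
    priors, value = net.predict(node.state)  # hypothetical interface
    node.children = [Child(m, p) for m, p in priors.items()]
    return value
</pre>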
=See also=
* [[Tobias Joppen]], [[Johannes Fürnkranz]] ('''2019'''). ''[https://www.groundai.com/project/ordinal-monte-carlo-tree-search/ Ordinal Monte Carlo Tree Search]''. [[Darmstadt University of Technology|TU Darmstadt]], [https://arxiv.org/abs/1901.04274 arXiv:1901.04274]
* [[Herilalaina Rakotoarison]], [[Marc Schoenauer]], [[Michèle Sebag]] ('''2019'''). ''Automated Machine Learning with Monte-Carlo Tree Search''. [https://arxiv.org/abs/1906.00170 arXiv:1906.00170]
* [[Aline Hufschmitt]], [[Jean-Noël Vittaut]], [[Nicolas Jouandeau]] ('''2019'''). ''Exploiting Game Decompositions in Monte Carlo Tree Search''. [[Advances in Computer Games 16]]
==2020 ...==
* [[Johannes Czech]], [[Patrick Korus]], [[Kristian Kersting]] ('''2020'''). ''Monte-Carlo Graph Search for AlphaZero''. [https://arxiv.org/abs/2012.11045 arXiv:2012.11045] » [[AlphaZero]], [[CrazyAra]]
* [[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700]
* [[Johannes Czech]], [[Patrick Korus]], [[Kristian Kersting]] ('''2021'''). ''[https://ojs.aaai.org/index.php/ICAPS/article/view/15952 Improving AlphaZero Using Monte-Carlo Graph Search]''. [https://ojs.aaai.org/index.php/ICAPS/issue/view/380 Proceedings of the Thirty-First International Conference on Automated Planning and Scheduling], Vol. 31, [https://www.ml.informatik.tu-darmstadt.de/papers/czech2021icaps_mcgs.pdf pdf]
* [[Maximilian Langer]] ('''2021'''). ''Evaluation of Monte-Carlo Tree Search for Xiangqi''. B.Sc. thesis, [[Darmstadt University of Technology|TU Darmstadt]], [https://ml-research.github.io/papers/langer2021xiangqi.pdf pdf] » [[Chinese Chess|Xiangqi]]
