'''Monte-Carlo Tree Search''' (MCTS)<br/>
is a [[Best-First|Best-First search]] algorithm, historically based on [https://en.wikipedia.org/wiki/Randomness random] playouts. In conjunction with [[UCT]] ('''U'''pper '''C'''onfidence bounds applied to '''T'''rees), Monte-Carlo Tree Search yielded a breakthrough in [[Go|Computer Go]] <ref>[[Rémi Coulom]] ('''2009'''). ''The Monte-Carlo Revolution in Go''. JFFoS'2008: Japanese-French Frontiers of Science Symposium, [http://remi.coulom.free.fr/JFFoS/JFFoS.pdf slides as pdf]</ref>, and has also been successful in [[Amazons]] <ref>[[Julien Kloetzer]], [[Hiroyuki Iida]], [[Bruno Bouzy]] ('''2007'''). ''The Monte-Carlo approach in Amazons''. [[CGW 2007]]</ref> <ref>[[Richard J. Lorentz]] ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_2 Amazons Discover Monte-Carlo]''. [[CG 2008]]</ref>, [[Lines of Action]] <ref>[[Mark Winands]], [[Yngvi Björnsson]] ('''2009'''). ''Evaluation Function Based Monte-Carlo LOA''. [http://www.ru.is/faculty/yngvi/pdf/WinandsB09.pdf pdf]</ref>, [[Havannah]] <ref>[[Richard J. Lorentz]] ('''2010'''). ''[http://www.springerlink.com/content/p4x16832317r1214/ Improving Monte-Carlo Tree Search in Havannah]''. [[CG 2010]]</ref>, [[Hex]] <ref>[[Broderick Arneson]], [[Ryan Hayward]], [[Philip Henderson]] ('''2010'''). ''Monte Carlo Tree Search in Hex''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 2, No. 4, [http://webdocs.cs.ualberta.ca/~hayward/papers/mcts-hex.pdf pdf]</ref>, [[Checkers]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=38554 UCT surprise for checkers !] by [[Daniel Shawul]], [[CCC]], March 25, 2011</ref> and other [[Games]] whose positions are difficult to evaluate, but not in [[Chess]] <ref>[[Raghuram Ramanujan]], [[Ashish Sabharwal]], [[Bart Selman]] ('''2010'''). ''[http://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/view/1458 On Adversarial Search Spaces and Sampling-Based Planning]''. [http://www.aaai.org/Press/Proceedings/icaps10.php ICAPS 2010]</ref> <ref>[[Oleg Arenz]] ('''2012'''). ''[http://www.ke.tu-darmstadt.de/bibtex/publications/show/2321 Monte Carlo Chess]''. B.Sc. thesis, [[Darmstadt University of Technology]], advisor [[Johannes Fürnkranz]], [http://www.ke.tu-darmstadt.de/lehre/arbeiten/bachelor/2012/Arenz_Oleg.pdf pdf]</ref> until December 2017, when a [[Google]] [[DeepMind]] team reported on [[AlphaZero]] <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>.
MCTS is based on randomized explorations of the [[Search Space|search space]]. Using the results of previous explorations, the algorithm gradually grows a [[Search Tree|game tree]] in [[Memory|memory]], and successively becomes better at accurately estimating the values of the most promising moves <ref>[[Guillaume Chaslot]], [[Mark Winands]], [[Jaap van den Herik]] ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_6 Parallel Monte-Carlo Tree Search]''. [[CG 2008]], [https://dke.maastrichtuniversity.nl/m.winands/documents/multithreadedMCTS2.pdf pdf]</ref>.
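Each iteration of MCTS is commonly described in four phases: selection down the tree, expansion of a leaf, a random playout, and backpropagation of the result. The following Python sketch only illustrates these phases; the <code>GameState</code> interface (<code>player_to_move</code>, <code>legal_moves</code>, <code>play</code>, <code>is_terminal</code>, <code>result</code>) is a hypothetical placeholder, not the API of any particular engine.
<pre>
# Minimal UCT/MCTS sketch (illustrative only, not from a particular engine).
# Assumes a hypothetical GameState with:
#   player_to_move, legal_moves(), play(move) -> new GameState,
#   is_terminal(), result(player) -> 1.0 win / 0.5 draw / 0.0 loss for player
import math, random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.just_moved = parent.state.player_to_move if parent else None
        self.children = []
        self.untried = list(state.legal_moves())
        self.visits, self.wins = 0, 0.0

    def ucb1(self, c=1.414):
        # exploitation (win rate of the player who moved here) + exploration
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_state, iterations=10000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded, maximizing UCB1
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one child for a random untried move
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            node.children.append(Node(node.state.play(move), node, move))
            node = node.children[-1]
        # 3. Playout: play random moves until the game ends
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        # 4. Backpropagation: credit each node from the point of view
        #    of the player who made the move leading to it
        while node.parent is not None:
            node.visits += 1
            node.wins += state.result(node.just_moved)
            node = node.parent
        root.visits += 1
    return max(root.children, key=lambda n: n.visits).move  # most visited move
</pre>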
=UCT=
[[UCT]] ('''U'''pper '''C'''onfidence bounds applied to '''T'''rees) addresses a flaw of plain Monte-Carlo Tree Search: a program may favor a losing move that has only one or a few forced refutations, because the vast majority of the other replies yield a better random playout score than other, better moves <ref>[[Levente Kocsis]], [[Csaba Szepesvári]] ('''2006'''). ''[http://www.computer-go.info/resources/bandit.html Bandit based Monte-Carlo Planning]''. ECML-06, LNCS/LNAI 4212, pp. 282-293, introducing [[UCT]], [http://www.sztaki.hu/%7Eszcsaba/papers/ecml06.pdf pdf]</ref>.
In UCT, upper [https://en.wikipedia.org/wiki/Confidence_interval confidence bounds] guide the selection of a node, treating move selection as a [https://en.wikipedia.org/wiki/Multi-armed_bandit multi-armed bandit] problem. [[Christopher D. Rosin#PUCT|PUCT]] modifies the original policy by incorporating a predictor that suggests good arms at the start of a sequence of multi-armed bandit trials <ref>[[Christopher D. Rosin]] ('''2011'''). ''[https://link.springer.com/article/10.1007/s10472-011-9258-6 Multi-armed bandits with episode context]''. Annals of Mathematics and Artificial Intelligence, Vol. 61, No. 3, [http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf ISAIM 2010 pdf]</ref>.
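As a sketch in common textbook notation (not tied to a specific implementation), plain UCT selects at each node the child <math>i</math> maximizing

<math>\frac{w_i}{n_i} + C \sqrt{\frac{\ln N}{n_i}}</math>

where <math>w_i</math> and <math>n_i</math> are the child's accumulated score and visit count, <math>N</math> is the parent's visit count and <math>C</math> an exploration constant. PUCT in its AlphaZero-style form weights the exploration term with a prior probability <math>P(s,a)</math> supplied by a predictor:

<math>Q(s,a) + c_{puct} \, P(s,a) \, \frac{\sqrt{N(s)}}{1 + N(s,a)}</math>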
 
=Playouts by NN=
Historically, MCTS was built on random and noisy playouts, and many such playouts were necessary to evaluate a state accurately. Since [[AlphaGo]] and [[AlphaZero]] this is no longer the case: strong policies and evaluations are now provided by [[Neural Networks|neural networks]] trained with [[Reinforcement Learning]]. In AlphaGo and its descendants, the policy is used as a prior in the [[Christopher D. Rosin#PUCT|PUCT]] bandit to explore the most promising moves first, while the network evaluations replace the playouts <ref>[[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700]</ref>.
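To illustrate how the policy prior and the value head replace random playouts, here is a small Python sketch of PUCT selection, expansion and backup; the network interface (<code>net.evaluate(state)</code> returning a move-to-prior dictionary and a scalar value) is a hypothetical placeholder, not AlphaZero's actual API.
<pre>
# PUCT-style search step with a neural network (illustrative sketch).
import math

C_PUCT = 1.5  # exploration constant, tuned per engine

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s,a) from the policy head
        self.visits = 0           # N(s,a)
        self.value_sum = 0.0      # accumulated backed-up values
        self.children = {}        # move -> Node

    def q(self):                  # mean action value Q(s,a)
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node):
    # argmax over children of Q + c_puct * P * sqrt(N_parent) / (1 + N_child)
    n_parent = sum(child.visits for child in node.children.values())
    return max(node.children.items(),
               key=lambda item: item[1].q() + C_PUCT * item[1].prior
                                * math.sqrt(n_parent) / (1 + item[1].visits))

def expand_and_evaluate(node, state, net):
    # The network replaces the random playout: its policy becomes the prior
    # of the new children, and its value estimate is what gets backed up.
    priors, value = net.evaluate(state)      # hypothetical interface
    for move, p in priors.items():
        node.children[move] = Node(p)
    return value

def backup(path, value):
    # Update statistics along the visited path, negating the value at each
    # ply (zero-sum, two-player convention).
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value
</pre>
One simulation then repeatedly calls <code>select_child</code> until an unexpanded node is reached, expands and evaluates it once with the network, and backs the returned value up along the visited path; after a fixed number of simulations the most visited root move is played.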
=See also=
* [[:Category:MCTS]]
* [[Deep Learning]]
* [[MCαβ]]
=Publications=
==2010 ...==
* [[Ben Ruijl]], [[Jos Vermaseren]], [[Aske Plaat]], [[Jaap van den Herik]] ('''2014'''). ''HEPGAME and the Simplification of Expressions''. [http://arxiv.org/abs/1405.6369 CoRR abs/1405.6369]
* [[Marc Lanctot]], [[Mark Winands]], [[Tom Pepels]], [[Nathan Sturtevant]] ('''2014'''). ''Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups''. [https://dblp.uni-trier.de/db/conf/cig/cig2014.html#LanctotWPS14 CIG 2014], [https://arxiv.org/abs/1406.0486 arXiv:1406.0486]
* [[S. Ali Mirsoleimani]], [[Aske Plaat]], [[Jaap van den Herik]], [[Jos Vermaseren]] ('''2014'''). ''Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi''. [http://arxiv.org/abs/1409.4297 CoRR abs/1409.4297] » [[Go]], [[Parallel Search]], [[x86-64]]
* [[Ben Ruijl]], [[Jos Vermaseren]], [[Aske Plaat]], [[Jaap van den Herik]] ('''2014'''). ''Why Local Search Excels in Expression Simplification''. [http://arxiv.org/abs/1409.5223 CoRR abs/1409.5223]
* [[Nobuo Araki]], [[Masakazu Muramatsu]], [[Kunihito Hoki]], [[Satoshi Takahashi]] ('''2014'''). ''[https://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8263 Monte-Carlo Simulation Adjusting]''. [[Conferences#AAAI-2014|AAAI-2014]]
* [[Johannes Heinrich]], [[David Silver]] ('''2014'''). ''[https://www.aaai.org/ocs/index.php/WS/AAAIW14/paper/view/8811 Self-Play Monte-Carlo Tree Search in Computer Poker]''. [[Conferences#AAAI-2014|AAAI-2014]]
* [[Simon Lucas]], [[Spyridon Samothrakis]], [[Diego Perez]] ('''2014'''). ''[https://link.springer.com/chapter/10.1007/978-3-662-45523-4_29 Fast Evolutionary Adaptation for Monte Carlo Tree Search]''. [https://dblp.uni-trier.de/db/conf/evoW/evoappl2014.html EvoApplications 2014], [http://www.diego-perez.net/papers/FastEvoMCTS.pdf pdf]
==2015 ...==
* [[Richard J. Lorentz]] ('''2015'''). ''Early Playout Termination in MCTS''. [[Advances in Computer Games 14]]
* [[Chao Gao]], [[Martin Müller]], [[Ryan Hayward]] ('''2018'''). ''Three-Head Neural Network Architecture for Monte Carlo Tree Search''. [[Conferences#IJCAI2018|IJCAI 2018]]
* [[Tobias Joppen]], [[Christian Wirth]], [[Johannes Fürnkranz]] ('''2018'''). ''Preference-Based Monte Carlo Tree Search''. [https://arxiv.org/abs/1807.06286 arXiv:1807.06286]
* [https://dblp.org/pers/hd/b/Ba:Seydou Seydou Ba], [[Takuya Hiraoka]], [https://dblp.org/pers/hd/o/Onishi:Takashi Takashi Onishi], [https://dblp.org/pers/hd/n/Nakata:Toru Toru Nakata], [https://dblp.org/pers/hd/t/Tsuruoka:Yoshimasa Yoshimasa Tsuruoka] ('''2018'''). ''Monte Carlo Tree Search with Scalable Simulation Periods for Continuously Running Tasks''. [https://arxiv.org/abs/1809.02378 arXiv:1809.02378]
* [[S. Ali Mirsoleimani]], [[Jaap van den Herik]], [[Aske Plaat]], [[Jos Vermaseren]] ('''2018'''). ''Pipeline Pattern for Parallel MCTS''. [https://dblp.uni-trier.de/db/conf/icaart/icaart2018-2.html ICAART 2018], [http://liacs.leidenuniv.nl/~plaata1/papers/paper_ICAART18_pos.pdf pdf]
* [[S. Ali Mirsoleimani]], [[Jaap van den Herik]], [[Aske Plaat]], [[Jos Vermaseren]] ('''2018'''). ''A Lock-free Algorithm for Parallel MCTS''. [https://dblp.uni-trier.de/db/conf/icaart/icaart2018-2.html ICAART 2018], [http://liacs.leidenuniv.nl/~plaata1/papers/paper_ICAART18.pdf pdf]
* [[Tobias Joppen]], [[Johannes Fürnkranz]] ('''2019'''). ''[https://www.groundai.com/project/ordinal-monte-carlo-tree-search/ Ordinal Monte Carlo Tree Search]''. [[Darmstadt University of Technology|TU Darmstadt]], [https://arxiv.org/abs/1901.04274 arXiv:1901.04274]
* [[Herilalaina Rakotoarison]], [[Marc Schoenauer]], [[Michèle Sebag]] ('''2019'''). ''Automated Machine Learning with Monte-Carlo Tree Search''. [https://arxiv.org/abs/1906.00170 arXiv:1906.00170]
* [[Aline Hufschmitt]], [[Jean-Noël Vittaut]], [[Nicolas Jouandeau]] ('''2019'''). ''Exploiting Game Decompositions in Monte Carlo Tree Search''. [[Advances in Computer Games 16]]
==2020 ...==
* [[Johannes Czech]], [[Patrick Korus]], [[Kristian Kersting]] ('''2020'''). ''Monte-Carlo Graph Search for AlphaZero''. [https://arxiv.org/abs/2012.11045 arXiv:2012.11045] » [[AlphaZero]], [[CrazyAra]]
* [[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700]
* [[Johannes Czech]], [[Patrick Korus]], [[Kristian Kersting]] ('''2021'''). ''[https://ojs.aaai.org/index.php/ICAPS/article/view/15952 Improving AlphaZero Using Monte-Carlo Graph Search]''. [https://ojs.aaai.org/index.php/ICAPS/issue/view/380 Proceedings of the Thirty-First International Conference on Automated Planning and Scheduling], Vol. 31, [https://www.ml.informatik.tu-darmstadt.de/papers/czech2021icaps_mcgs.pdf pdf]
* [[Maximilian Langer]] ('''2021'''). ''Evaluation of Monte-Carlo Tree Search for Xiangqi''. B.Sc. thesis, [[Darmstadt University of Technology|TU Darmstadt]], [https://ml-research.github.io/papers/langer2021xiangqi.pdf pdf] » [[Chinese Chess|Xiangqi]]
=Forum Posts=
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=70605 Question to Remi about CrazyZero] by [[Harm Geert Muller]], [[CCC]], April 28, 2019 » [[CrazyZero]]
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=70611 SL vs RL] by [[Chris Whittington]], [[CCC]], April 28, 2019
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=70788 How to get the "usual" Multi PV with MCTS engines?] by [[Kai Laskos]], [[CCC]], May 21, 2019 » [[Principal Variation#MultiPV|MultiPV]]
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=71301 A question to MCTS + NN experts] by [[Maksim Korzh]], [[CCC]], July 17, 2019 » [[Deep Learning]]
: [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=71301&start=3 Re: A question to MCTS + NN experts] by [[Daniel Shawul]], [[CCC]], July 17, 2019
==2020 ...==
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75658 MCTS evaluation question] by [[Maksim Korzh]], [[CCC]], November 02, 2020
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=77670&start=1 Re: Has CrazyAra really improved because of MTGS ?] by [[Johannes Czech]], [[CCC]], July 08, 2021 » [[CrazyAra]]
=External Links=
==Monte Carlo Tree Search==
* [https://en.wikipedia.org/wiki/Monte_Carlo_tree_search Monte Carlo tree search from Wikipedia]
* [https://int8.io/monte-carlo-tree-search-beginners-guide/ Monte Carlo Tree Search - beginners guide int8.io] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75658&start=4 Re: MCTS evaluation question] by [[Joerg Oster]], [[CCC]], November 03, 2020</ref>
* [https://senseis.xmp.net/?MonteCarlo Sensei's Library: Monte Carlo Tree Search]
* [https://senseis.xmp.net/?UCT Sensei's Library: UCT]
* [https://mcts.ai/index.html Monte Carlo Tree Search - Home] by [[Cameron Browne]]
* [http://www.althofer.de/lange-nacht-jena.html Lange Nacht der Wissenschaften - Long Night of Sciences Jena - 2007] by [[Ingo Althöfer]], [[Jakob Erdmann#UCT|MC and UCT poster]] by [[Jakob Erdmann]]
* [http://web.stanford.edu/~surag/posts/alphazero.html A Simple Alpha(Go) Zero Tutorial] by [[Surag Nair]], [[Stanford University]], December 29, 2017 » [[AlphaZero]], [[Deep Learning]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=66179 A Simple Alpha(Go) Zero Tutorial] by Oliver Roese, [[CCC]], December 30, 2017</ref>
: [https://github.com/suragnair/alpha-zero-general GitHub - suragnair/alpha-zero-general: A clean and simple implementation of a self-play learning algorithm based on AlphaGo Zero (any game, any framework!)]
* Monte Carlo Tree Search by [https://www.strath.ac.uk/staff/levinejohndr/ John Levine], [https://en.wikipedia.org/wiki/University_of_Strathclyde University of Strathclyde], March 05, 2017, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: {{#evu:https://www.youtube.com/watch?v=UXW2yZndl7U|alignment=left|valignment=top}}
* [https://en.chessbase.com/post/monte-carlo-instead-of-alpha-beta Monte Carlo instead of Alpha-Beta?] by [[Stephan Oliver Platz]], [[ChessBase|ChessBase News]], January 30, 2019 » [[Komodo#MCTS|Komodo MCTS]]

=References=
<references />