Difference between revisions of "Monte-Carlo Tree Search"

From Chessprogramming wiki
Jump to: navigation, search
 
(17 intermediate revisions by the same user not shown)
Line 4: Line 4:
  
 
'''Monte Carlo Tree Search''', (Monte-Carlo Tree Search, MCTS)<br/>
 
'''Monte Carlo Tree Search''', (Monte-Carlo Tree Search, MCTS)<br/>
is a [[Best-First|Best-First search]] algorithm based on [https://en.wikipedia.org/wiki/Randomness random] playouts. In conjunction with [[UCT]] ('''U'''pper '''C'''onfidence bounds applied to '''T'''rees) Monte-Carlo Tree Search has yielded in a breakthrough in [[Go|Computer Go]] <ref>[[Rémi Coulom]] ('''2009'''). ''The Monte-Carlo Revolution in Go''. JFFoS'2008: Japanese-French Frontiers of Science Symposium, [http://remi.coulom.free.fr/JFFoS/JFFoS.pdf slides as pdf]</ref>, and is also successful in [[Amazons]] <ref>[[Julien Kloetzer]], [[Hiroyuki Iida]], [[Bruno Bouzy]] ('''2007'''). ''The Monte-Carlo approach in Amazons''. [[CGW 2007]]</ref> <ref>[[Richard J. Lorentz]] ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_2 Amazons Discover Monte-Carlo]''. [[CG 2008]]</ref>, [[Lines of Action]] <ref>[[Mark Winands]], [[Yngvi Björnsson]] ('''2009'''). ''Evaluation Function Based Monte-Carlo LOA''. [http://www.ru.is/faculty/yngvi/pdf/WinandsB09.pdf pdf]</ref>, [[Havannah]] <ref>[[Richard J. Lorentz]] ('''2010'''). ''[http://www.springerlink.com/content/p4x16832317r1214/ Improving Monte-Carlo Tree Search in Havannah]''. [[CG 2010]]</ref>, [[Hex]] <ref>[[Broderick Arneson]], [[Ryan Hayward]], [[Philip Henderson]] ('''2010'''). ''Monte Carlo Tree Search in Hex''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 2, No. 4, [http://webdocs.cs.ualberta.ca/~hayward/papers/mcts-hex.pdf pdf]</ref>, [[Checkers]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=38554 UCT surprise for checkers !] by [[Daniel Shawul]], [[CCC]], March 25, 2011</ref> and other [[Games]] with some difficulties in position evaluation, but until December 2017, when a  [[Google]] [[DeepMind]] team reported on [[AlphaZero]] <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>, not for [[Chess]] <ref>[[Raghuram Ramanujan]], [[Ashish Sabharwal]], [[Bart Selman]] ('''2010'''). ''[http://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/view/1458 On Adversarial Search Spaces and Sampling-Based Planning]''. [http://www.aaai.org/Press/Proceedings/icaps10.php ICAPS 2010]</ref> <ref>[[Oleg Arenz]] ('''2012'''). ''[http://www.ke.tu-darmstadt.de/bibtex/publications/show/2321 Monte Carlo Chess]''.  B.Sc. thesis, [[Darmstadt University of Technology]], advisor [[Johannes Fürnkranz]], [http://www.ke.tu-darmstadt.de/lehre/arbeiten/bachelor/2012/Arenz_Oleg.pdf pdf]</ref>.  
+
is a [[Best-First|Best-First search]] algorithm, historically based on [https://en.wikipedia.org/wiki/Randomness random] playouts. In conjunction with [[UCT]] ('''U'''pper '''C'''onfidence bounds applied to '''T'''rees) Monte-Carlo Tree Search has yielded in a breakthrough in [[Go|Computer Go]] <ref>[[Rémi Coulom]] ('''2009'''). ''The Monte-Carlo Revolution in Go''. JFFoS'2008: Japanese-French Frontiers of Science Symposium, [http://remi.coulom.free.fr/JFFoS/JFFoS.pdf slides as pdf]</ref>, and is also successful in [[Amazons]] <ref>[[Julien Kloetzer]], [[Hiroyuki Iida]], [[Bruno Bouzy]] ('''2007'''). ''The Monte-Carlo approach in Amazons''. [[CGW 2007]]</ref> <ref>[[Richard J. Lorentz]] ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_2 Amazons Discover Monte-Carlo]''. [[CG 2008]]</ref>, [[Lines of Action]] <ref>[[Mark Winands]], [[Yngvi Björnsson]] ('''2009'''). ''Evaluation Function Based Monte-Carlo LOA''. [http://www.ru.is/faculty/yngvi/pdf/WinandsB09.pdf pdf]</ref>, [[Havannah]] <ref>[[Richard J. Lorentz]] ('''2010'''). ''[http://www.springerlink.com/content/p4x16832317r1214/ Improving Monte-Carlo Tree Search in Havannah]''. [[CG 2010]]</ref>, [[Hex]] <ref>[[Broderick Arneson]], [[Ryan Hayward]], [[Philip Henderson]] ('''2010'''). ''Monte Carlo Tree Search in Hex''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 2, No. 4, [http://webdocs.cs.ualberta.ca/~hayward/papers/mcts-hex.pdf pdf]</ref>, [[Checkers]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=38554 UCT surprise for checkers !] by [[Daniel Shawul]], [[CCC]], March 25, 2011</ref> and other [[Games]] with some difficulties in position evaluation, but until December 2017, when a  [[Google]] [[DeepMind]] team reported on [[AlphaZero]] <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>, not for [[Chess]] <ref>[[Raghuram Ramanujan]], [[Ashish Sabharwal]], [[Bart Selman]] ('''2010'''). ''[http://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/view/1458 On Adversarial Search Spaces and Sampling-Based Planning]''. [http://www.aaai.org/Press/Proceedings/icaps10.php ICAPS 2010]</ref> <ref>[[Oleg Arenz]] ('''2012'''). ''[http://www.ke.tu-darmstadt.de/bibtex/publications/show/2321 Monte Carlo Chess]''.  B.Sc. thesis, [[Darmstadt University of Technology]], advisor [[Johannes Fürnkranz]], [http://www.ke.tu-darmstadt.de/lehre/arbeiten/bachelor/2012/Arenz_Oleg.pdf pdf]</ref>.  
 
 
 
MCTS is based on randomized explorations of the [[Search Space|search space]]. Using the results of previous explorations, the algorithm gradually grows a [[Search Tree|game tree]] in [[Memory|memory]], and successively becomes better at accurately estimating the values of the most promising moves <ref>[[Guillaume Chaslot]], [[Mark Winands]], [[Jaap van den Herik]] ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_6 Parallel Monte-Carlo Tree Search]''. [[CG 2008]], [https://dke.maastrichtuniversity.nl/m.winands/documents/multithreadedMCTS2.pdf pdf]</ref>.   
 
MCTS is based on randomized explorations of the [[Search Space|search space]]. Using the results of previous explorations, the algorithm gradually grows a [[Search Tree|game tree]] in [[Memory|memory]], and successively becomes better at accurately estimating the values of the most promising moves <ref>[[Guillaume Chaslot]], [[Mark Winands]], [[Jaap van den Herik]] ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_6 Parallel Monte-Carlo Tree Search]''. [[CG 2008]], [https://dke.maastrichtuniversity.nl/m.winands/documents/multithreadedMCTS2.pdf pdf]</ref>.   
  
Line 23: Line 22:
 
=UCT=
 
=UCT=
 
[[UCT]] ('''U'''pper '''C'''onfidence bounds applied to '''T'''rees) deals with the flaw of Monte-Carlo Tree Search, when a program may favor a losing move with only one or a few forced refutations, but due to the vast majority of other moves provides a better random playout score than other, better moves <ref>[[Levente Kocsis]], [[Csaba Szepesvári]] ('''2006'''). ''[http://www.computer-go.info/resources/bandit.html Bandit based Monte-Carlo Planning]'' ECML-06, LNCS/LNAI 4212, pp. 282-293. introducing [[UCT]], [http://www.sztaki.hu/%7Eszcsaba/papers/ecml06.pdf pdf]</ref>.
 
[[UCT]] ('''U'''pper '''C'''onfidence bounds applied to '''T'''rees) deals with the flaw of Monte-Carlo Tree Search, when a program may favor a losing move with only one or a few forced refutations, but due to the vast majority of other moves provides a better random playout score than other, better moves <ref>[[Levente Kocsis]], [[Csaba Szepesvári]] ('''2006'''). ''[http://www.computer-go.info/resources/bandit.html Bandit based Monte-Carlo Planning]'' ECML-06, LNCS/LNAI 4212, pp. 282-293. introducing [[UCT]], [http://www.sztaki.hu/%7Eszcsaba/papers/ecml06.pdf pdf]</ref>.
 +
In UCT, upper [https://en.wikipedia.org/wiki/Confidence_interval confidence bounds] guide the selection of a node, treating selection as a {https://en.wikipedia.org/wiki/Multi-armed_bandit multi-armed bandit] problem. [[Christopher D. Rosin#PUCT|PUCT]] modifies the original policy by approximately predicting good arms at the start of a sequence of multi-armed bandit trials  <ref>[[Christopher D. Rosin]] ('''2011'''). ''[https://link.springer.com/article/10.1007/s10472-011-9258-6 Multi-armed bandits with episode context]''. Annals of Mathematics and Artificial Intelligence, Vol. 61, No. 3, [http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf ISAIM 2010 pdf]</ref>.
 +
 +
=Playouts by NN=
 +
Historically, at the root of MCTS were random and noisy playouts. Many such playouts were necessary to accurately evaluate a state. Since [[AlphaGo]] and [[AlphaZero]] it is not the case anymore.  Strong policies and evaluations are now provided by [[Neural Networks|neural networks]] that are trained with [[Reinforcement Learning]]. In AlphaGo and its descendants the policy is used as a prior in the [[Christopher D. Rosin#PUCT|PUCT]] bandit to explore first the most promising moves advised by the neural network policy and the evaluations replace the playouts <ref>[[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700]</ref>.
  
 
=See also=  
 
=See also=  
 +
* [[:Category:MCTS]]
 
* [[Deep Learning]]
 
* [[Deep Learning]]
 
* [[MCαβ]]
 
* [[MCαβ]]
Line 168: Line 172:
 
* [[Joel Veness]], [[Kee Siong Ng]], [[Marcus Hutter]], [[William Uther]] , [[David Silver]] ('''2011'''). ''A Monte-Carlo AIXI Approximation''. [https://en.wikipedia.org/wiki/Journal_of_Artificial_Intelligence_Research JAIR], Vol. 40, [http://www.aaai.org/Papers/JAIR/Vol40/JAIR-4004.pdf pdf]
 
* [[Joel Veness]], [[Kee Siong Ng]], [[Marcus Hutter]], [[William Uther]] , [[David Silver]] ('''2011'''). ''A Monte-Carlo AIXI Approximation''. [https://en.wikipedia.org/wiki/Journal_of_Artificial_Intelligence_Research JAIR], Vol. 40, [http://www.aaai.org/Papers/JAIR/Vol40/JAIR-4004.pdf pdf]
 
* [[Christopher D. Rosin]] ('''2011'''). '' Nested Rollout Policy Adaptation for Monte Carlo Tree Search''. [[Conferences#IJCAI2011|IJCAI 2011]], [http://www.chrisrosin.com/rosin-ijcai11.pdf pdf]
 
* [[Christopher D. Rosin]] ('''2011'''). '' Nested Rollout Policy Adaptation for Monte Carlo Tree Search''. [[Conferences#IJCAI2011|IJCAI 2011]], [http://www.chrisrosin.com/rosin-ijcai11.pdf pdf]
 +
* [[David Tolpin]], [[Solomon Eyal Shimony]] ('''2011'''). ''Doing Better Than UCT: Rational Monte Carlo Sampling in Trees''. [https://arxiv.org/abs/1108.3711 arXiv:1108.3711] » [[UCT]]
 
'''2012'''
 
'''2012'''
 
* [[Michael L. Littman]] ('''2012'''). ''Technical Perspective: A New Way to Search Game Trees''. [[ACM#Communications|Communications of the ACM]], Vol. 55, No. 3
 
* [[Michael L. Littman]] ('''2012'''). ''Technical Perspective: A New Way to Search Game Trees''. [[ACM#Communications|Communications of the ACM]], Vol. 55, No. 3
Line 173: Line 178:
 
* [[Oleg Arenz]] ('''2012'''). ''Monte Carlo Chess''.  B.Sc. thesis, [[Darmstadt University of Technology]], advisor [[Johannes Fürnkranz]], [http://www.ke.tu-darmstadt.de/lehre/arbeiten/bachelor/2012/Arenz_Oleg.pdf pdf] » [[Stockfish]]
 
* [[Oleg Arenz]] ('''2012'''). ''Monte Carlo Chess''.  B.Sc. thesis, [[Darmstadt University of Technology]], advisor [[Johannes Fürnkranz]], [http://www.ke.tu-darmstadt.de/lehre/arbeiten/bachelor/2012/Arenz_Oleg.pdf pdf] » [[Stockfish]]
 
* [[Jeff Rollason]] ('''2012'''). ''[http://www.aifactory.co.uk/newsletter/2012_01_tuning_spades.htm Tuning Spades]''. [[AI Factory]], Summer 2012 » [[UCT]]
 
* [[Jeff Rollason]] ('''2012'''). ''[http://www.aifactory.co.uk/newsletter/2012_01_tuning_spades.htm Tuning Spades]''. [[AI Factory]], Summer 2012 » [[UCT]]
 +
* [[David Tolpin]], [[Solomon Eyal Shimony]] ('''2012'''). ''MCTS Based on Simple Regret''. [[Conferences#AAAI-2012|AAAI-2012]], [https://arxiv.org/abs/1207.5536 arXiv:1207.5536]
 +
* [[David Tolpin]], [[Solomon Eyal Shimony]] ('''2012'''). ''VOI-aware MCTS''. [https://dblp.uni-trier.de/db/conf/ecai/ecai2012.html ECAI 2012], [https://arxiv.org/abs/1207.5589  arXiv:1207.5589] <ref>[https://en.wikipedia.org/wiki/Value_of_information Value of information (VOI) from Wikipedia]</ref>
 
* [[Tristan Cazenave]], [[Fabien Teytaud]] ('''2012'''). ''Beam Nested Rollout Policy Adaptation''. [[ECAI CGW 2012]]
 
* [[Tristan Cazenave]], [[Fabien Teytaud]] ('''2012'''). ''Beam Nested Rollout Policy Adaptation''. [[ECAI CGW 2012]]
 
* [[André Fabbri]], [[Frédéric Armetta]], [[Eric Duchêne]], [[Salima Hassas]] ('''2012'''). ''A new self-acquired knowledge process for Monte Carlo Tree Search''. [[ECAI CGW 2012]]
 
* [[André Fabbri]], [[Frédéric Armetta]], [[Eric Duchêne]], [[Salima Hassas]] ('''2012'''). ''A new self-acquired knowledge process for Monte Carlo Tree Search''. [[ECAI CGW 2012]]
Line 180: Line 187:
 
* [[Truong-Huy Dinh Nguyen]], [[Wee Sun Lee]], [[Tze-Yun Leong]] ('''2012'''). ''Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic''. [https://arxiv.org/abs/1206.5940 arXiv:1206.5940]
 
* [[Truong-Huy Dinh Nguyen]], [[Wee Sun Lee]], [[Tze-Yun Leong]] ('''2012'''). ''Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic''. [https://arxiv.org/abs/1206.5940 arXiv:1206.5940]
 
* [[Jan Kuipers (Nikhef)|Jan Kuipers]], [[Aske Plaat]], [[Jos Vermaseren]], [[Jaap van den Herik]] ('''2012'''). ''Improving multivariate Horner schemes with Monte Carlo tree search''. [http://arxiv.org/abs/1207.7079 CoRR abs/1207.7079]
 
* [[Jan Kuipers (Nikhef)|Jan Kuipers]], [[Aske Plaat]], [[Jos Vermaseren]], [[Jaap van den Herik]] ('''2012'''). ''Improving multivariate Horner schemes with Monte Carlo tree search''. [http://arxiv.org/abs/1207.7079 CoRR abs/1207.7079]
* [[Cameron Browne]], [[Edward Powley]], [[Daniel Whitehouse]], [[Simon Lucas]], [[Peter Cowling]], [[Philipp Rohlfshagen]], [[Stephen Tavener]], [[Diego Perez]], [[Spyridon Samothrakis]], [[Simon Colton]] ('''2012'''). ''[http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6145622&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F4804728%2F6169178%2F06145622.pdf%3Farnumber%3D6145622 A Survey of Monte Carlo Tree Search Methods]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 4, No. 1
+
* [[Cameron Browne]], [[Edward Powley]], [[Daniel Whitehouse]], [[Simon Lucas]], [[Peter Cowling]], [[Philipp Rohlfshagen]], [[Stephen Tavener]], [[Diego Perez]], [[Spyridon Samothrakis]], [[Simon Colton]] ('''2012'''). ''[https://ieeexplore.ieee.org/document/6145622 A Survey of Monte Carlo Tree Search Methods]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 4, No. 1, [http://ccg.doc.gold.ac.uk/ccg_old/papers/browne_tciaig12_1.pdf pdf]
 
* [[Pim Nijssen]], [[Mark Winands]] ('''2012'''). ''[http://ieeexplore.ieee.org/document/6266709/ Monte-Carlo Tree Search for the Hide-and-Seek Game Scotland Yard]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 4, No. 4 <ref>[https://en.wikipedia.org/wiki/Scotland_Yard_%28board_game%29 Scotland Yard (board game) from Wikipedia]</ref>
 
* [[Pim Nijssen]], [[Mark Winands]] ('''2012'''). ''[http://ieeexplore.ieee.org/document/6266709/ Monte-Carlo Tree Search for the Hide-and-Seek Game Scotland Yard]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 4, No. 4 <ref>[https://en.wikipedia.org/wiki/Scotland_Yard_%28board_game%29 Scotland Yard (board game) from Wikipedia]</ref>
 
* [[Hendrik Baier]], [[Mark Winands]] ('''2012'''). ''Nested Monte-Carlo Tree Search for Online Planning in Large MDPs''. [http://www2.lirmm.fr/ecai2012/ ECAI 2012], [https://dke.maastrichtuniversity.nl/m.winands/documents/ecai2012.pdf pdf]
 
* [[Hendrik Baier]], [[Mark Winands]] ('''2012'''). ''Nested Monte-Carlo Tree Search for Online Planning in Large MDPs''. [http://www2.lirmm.fr/ecai2012/ ECAI 2012], [https://dke.maastrichtuniversity.nl/m.winands/documents/ecai2012.pdf pdf]
Line 214: Line 221:
 
'''2014'''
 
'''2014'''
 
* [[Ben Ruijl]], [[Jos Vermaseren]], [[Aske Plaat]], [[Jaap van den Herik]] ('''2014'''). ''HEPGAME and the Simplification of Expressions''. [http://arxiv.org/abs/1405.6369 CoRR abs/1405.6369]
 
* [[Ben Ruijl]], [[Jos Vermaseren]], [[Aske Plaat]], [[Jaap van den Herik]] ('''2014'''). ''HEPGAME and the Simplification of Expressions''. [http://arxiv.org/abs/1405.6369 CoRR abs/1405.6369]
 +
* [[Marc Lanctot]], [[Mark Winands]], [[Tom Pepels]], [[Nathan Sturtevant]] ('''2014'''). ''Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups''. [https://dblp.uni-trier.de/db/conf/cig/cig2014.html#LanctotWPS14 CIG 2014], [https://arxiv.org/abs/1406.0486 arXiv:1406.0486]
 
* [[S. Ali Mirsoleimani]], [[Aske Plaat]], [[Jaap van den Herik]], [[Jos Vermaseren]] ('''2014'''). ''Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi''. [http://arxiv.org/abs/1409.4297 CoRR abs/1409.4297] » [[Go]], [[Parallel Search]], [[x86-64]]
 
* [[S. Ali Mirsoleimani]], [[Aske Plaat]], [[Jaap van den Herik]], [[Jos Vermaseren]] ('''2014'''). ''Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi''. [http://arxiv.org/abs/1409.4297 CoRR abs/1409.4297] » [[Go]], [[Parallel Search]], [[x86-64]]
 
* [[Ben Ruijl]], [[Jos Vermaseren]], [[Aske Plaat]], [[Jaap van den Herik]] ('''2014'''). ''Why Local Search Excels in Expression Simplification''. [http://arxiv.org/abs/1409.5223 CoRR abs/1409.5223]
 
* [[Ben Ruijl]], [[Jos Vermaseren]], [[Aske Plaat]], [[Jaap van den Herik]] ('''2014'''). ''Why Local Search Excels in Expression Simplification''. [http://arxiv.org/abs/1409.5223 CoRR abs/1409.5223]
Line 227: Line 235:
 
* [[Nobuo Araki]], [[Masakazu Muramatsu]], [[Kunihito Hoki]], [[Satoshi Takahashi]] ('''2014'''). ''[https://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8263 Monte-Carlo Simulation Adjusting]''. [[Conferences#AAAI-2014|AAAI-2014]]
 
* [[Nobuo Araki]], [[Masakazu Muramatsu]], [[Kunihito Hoki]], [[Satoshi Takahashi]] ('''2014'''). ''[https://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8263 Monte-Carlo Simulation Adjusting]''. [[Conferences#AAAI-2014|AAAI-2014]]
 
* [[Johannes Heinrich]], [[David Silver]] ('''2014'''). ''[https://www.aaai.org/ocs/index.php/WS/AAAIW14/paper/view/8811 Self-Play Monte-Carlo Tree Search in Computer Poker]''. [[Conferences#AAAI-2014|AAAI-2014]]
 
* [[Johannes Heinrich]], [[David Silver]] ('''2014'''). ''[https://www.aaai.org/ocs/index.php/WS/AAAIW14/paper/view/8811 Self-Play Monte-Carlo Tree Search in Computer Poker]''. [[Conferences#AAAI-2014|AAAI-2014]]
 +
* [[Simon Lucas]], [[Spyridon Samothrakis]], [[Diego Perez]] ('''2014'''). ''[https://link.springer.com/chapter/10.1007/978-3-662-45523-4_29 Fast Evolutionary Adaptation for Monte Carlo Tree Search]''. [https://dblp.uni-trier.de/db/conf/evoW/evoappl2014.html EvoApplications 2014], [http://www.diego-perez.net/papers/FastEvoMCTS.pdf pdf]
 
==2015 ...==
 
==2015 ...==
 
* [[Richard J. Lorentz]] ('''2015'''). ''Early Playout Termination in MCTS''. [[Advances in Computer Games 14]]
 
* [[Richard J. Lorentz]] ('''2015'''). ''Early Playout Termination in MCTS''. [[Advances in Computer Games 14]]
Line 278: Line 287:
 
* [[Chao Gao]], [[Martin Müller]], [[Ryan Hayward]] ('''2018'''). ''Three-Head Neural Network Architecture for Monte Carlo Tree Search''. [[Conferences#IJCAI2018|IJCAI 2018]]
 
* [[Chao Gao]], [[Martin Müller]], [[Ryan Hayward]] ('''2018'''). ''Three-Head Neural Network Architecture for Monte Carlo Tree Search''. [[Conferences#IJCAI2018|IJCAI 2018]]
 
* [[Tobias Joppen]], [[Christian Wirth]], [[Johannes Fürnkranz]] ('''2018'''). ''Preference-Based Monte Carlo Tree Search''. [https://arxiv.org/abs/1807.06286 arXiv:1807.06286]
 
* [[Tobias Joppen]], [[Christian Wirth]], [[Johannes Fürnkranz]] ('''2018'''). ''Preference-Based Monte Carlo Tree Search''. [https://arxiv.org/abs/1807.06286 arXiv:1807.06286]
 +
* [https://dblp.org/pers/hd/b/Ba:Seydou Seydou Ba], [[Takuya Hiraoka]], [https://dblp.org/pers/hd/o/Onishi:Takashi Takashi Onishi], [https://dblp.org/pers/hd/n/Nakata:Toru Toru Nakata], [https://dblp.org/pers/hd/t/Tsuruoka:Yoshimasa Yoshimasa Tsuruoka] ('''2018'''). ''Monte Carlo Tree Search with Scalable Simulation Periods for Continuously Running Tasks''. [https://arxiv.org/abs/1809.02378 arXiv:1809.02378]
 
* [[S. Ali Mirsoleimani]], [[Jaap van den Herik]], [[Aske Plaat]], [[Jos Vermaseren]] ('''2018'''). ''Pipeline Pattern for Parallel MCTS''. [https://dblp.uni-trier.de/db/conf/icaart/icaart2018-2.html ICAART 2018], [http://liacs.leidenuniv.nl/~plaata1/papers/paper_ICAART18_pos.pdf pdf]
 
* [[S. Ali Mirsoleimani]], [[Jaap van den Herik]], [[Aske Plaat]], [[Jos Vermaseren]] ('''2018'''). ''Pipeline Pattern for Parallel MCTS''. [https://dblp.uni-trier.de/db/conf/icaart/icaart2018-2.html ICAART 2018], [http://liacs.leidenuniv.nl/~plaata1/papers/paper_ICAART18_pos.pdf pdf]
 
* [[S. Ali Mirsoleimani]], [[Jaap van den Herik]], [[Aske Plaat]], [[Jos Vermaseren]] ('''2018'''). ''A Lock-free Algorithm for Parallel MCTS''. [https://dblp.uni-trier.de/db/conf/icaart/icaart2018-2.html ICAART 2018], [http://liacs.leidenuniv.nl/~plaata1/papers/paper_ICAART18.pdf pdf]
 
* [[S. Ali Mirsoleimani]], [[Jaap van den Herik]], [[Aske Plaat]], [[Jos Vermaseren]] ('''2018'''). ''A Lock-free Algorithm for Parallel MCTS''. [https://dblp.uni-trier.de/db/conf/icaart/icaart2018-2.html ICAART 2018], [http://liacs.leidenuniv.nl/~plaata1/papers/paper_ICAART18.pdf pdf]
Line 288: Line 298:
 
* [[Tobias Joppen]], [[Johannes Fürnkranz]] ('''2019'''). ''[https://www.groundai.com/project/ordinal-monte-carlo-tree-search/ Ordinal Monte Carlo Tree Search]''. [[Darmstadt University of Technology|TU Darmstadt]], [https://arxiv.org/abs/1901.04274 arXiv:1901.04274]
 
* [[Tobias Joppen]], [[Johannes Fürnkranz]] ('''2019'''). ''[https://www.groundai.com/project/ordinal-monte-carlo-tree-search/ Ordinal Monte Carlo Tree Search]''. [[Darmstadt University of Technology|TU Darmstadt]], [https://arxiv.org/abs/1901.04274 arXiv:1901.04274]
 
* [[Herilalaina Rakotoarison]], [[Marc Schoenauer]], [[Michèle Sebag]] ('''2019'''). ''Automated Machine Learning with Monte-Carlo Tree Search''. [https://arxiv.org/abs/1906.00170 arXiv:1906.00170]
 
* [[Herilalaina Rakotoarison]], [[Marc Schoenauer]], [[Michèle Sebag]] ('''2019'''). ''Automated Machine Learning with Monte-Carlo Tree Search''. [https://arxiv.org/abs/1906.00170 arXiv:1906.00170]
 +
* [[Aline Hufschmitt]], [[Jean-Noël Vittaut]], [[Nicolas Jouandeau]] ('''2019'''). ''Exploiting Game Decompositions in Monte Carlo Tree Search''. [[Advances in Computer Games 16]]
 +
==2020 ...==
 +
* [[Johannes Czech]], [[Patrick Korus]], [[Kristian Kersting]] ('''2020'''). ''Monte-Carlo Graph Search for AlphaZero''. [https://arxiv.org/abs/2012.11045 arXiv:2012.11045] » [[AlphaZero]], [[CrazyAra]]
 +
* [[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700]
 +
* [[Johannes Czech]], [[Patrick Korus]], [[Kristian Kersting]] ('''2021'''). ''[https://ojs.aaai.org/index.php/ICAPS/article/view/15952 Improving AlphaZero Using Monte-Carlo Graph Search]''. [https://ojs.aaai.org/index.php/ICAPS/issue/view/380 Proceedings of the Thirty-First International Conference on Automated Planning and Scheduling], Vol. 31, [https://www.ml.informatik.tu-darmstadt.de/papers/czech2021icaps_mcgs.pdf pdf]
 +
* [[Maximilian Langer]] ('''2021'''). ''Evaluation of Monte-Carlo Tree Search for Xiangqi''. B.Sc. thesis, [[Darmstadt University of Technology|TU Darmstadt]], [https://ml-research.github.io/papers/langer2021xiangqi.pdf pdf] » [[Chinese Chess|Xiangqi]]
  
 
=Forum Posts=  
 
=Forum Posts=  
Line 320: Line 336:
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=70605 Question to Remi about CrazyZero] by [[Harm Geert Muller]], [[CCC]], April 28, 2019 » [[CrazyZero]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=70605 Question to Remi about CrazyZero] by [[Harm Geert Muller]], [[CCC]], April 28, 2019 » [[CrazyZero]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=70611 SL vs RL] by [[Chris Whittington]], [[CCC]], April 28, 2019
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=70611 SL vs RL] by [[Chris Whittington]], [[CCC]], April 28, 2019
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=70788 How to get the "usual" Multi PV with MCTS engines?] by [[Kai Laskos]], [[CCC]], May 21, 2019 » [[Principal Variation#MultiPV|MultiPV]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=71301 A question to MCTS + NN experts] by [[Maksim Korzh]], [[CCC]], July 17, 2019 » [[Deep Learning]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=71301 A question to MCTS + NN experts] by [[Maksim Korzh]], [[CCC]], July 17, 2019 » [[Deep Learning]]
 
: [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=71301&start=3 Re: A question to MCTS + NN experts] by [[Daniel Shawul]], [[CCC]], July 17, 2019  
 
: [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=71301&start=3 Re: A question to MCTS + NN experts] by [[Daniel Shawul]], [[CCC]], July 17, 2019  
 +
==2020 ...==
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75658 MCTS evaluation question] by [[Maksim Korzh]], [[CCC]], November 02, 2020
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=77670&start=1 Re: Has CrazyAra really improved because of MTGS ?] by [[Johannes Czech]], [[CCC]], July 08, 2021  » [[CrazyAra]]
  
 
=External Links=  
 
=External Links=  
 
==Monte Carlo Tree Search==
 
==Monte Carlo Tree Search==
 
* [https://en.wikipedia.org/wiki/Monte_Carlo_tree_search Monte Carlo tree search from Wikipedia]
 
* [https://en.wikipedia.org/wiki/Monte_Carlo_tree_search Monte Carlo tree search from Wikipedia]
* [http://senseis.xmp.net/?MonteCarlo Sensei's Library: Monte Carlo Tree Search]
+
* [https://int8.io/monte-carlo-tree-search-beginners-guide/ Monte Carlo Tree Search - beginners guide int8.io] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75658&start=4 Re: MCTS evaluation question] by [[Joerg Oster]], [[CCC]], November 03, 2020</ref>
* [http://senseis.xmp.net/?UCT Sensei's Library: UCT]
+
* [https://senseis.xmp.net/?MonteCarlo Sensei's Library: Monte Carlo Tree Search]
* [http://mcts.ai/index.html Monte Carlo Tree Search - Home] by [[Cameron Browne]]
+
* [https://senseis.xmp.net/?UCT Sensei's Library: UCT]
* [http://www.althofer.de/lange-nacht-jena.html Lange Nacht der Wissenschaften - Long Night of Sciences Jena - 2007] by [[Ingo Althöfer]], [[Jakob Erdmann#UCT|MC and UCT poster]] by [[Jakob Erdmann]]
+
* [https://althofer.de/lange-nacht-jena.html Lange Nacht der Wissenschaften - Long Night of Sciences Jena - 2007] by [[Ingo Althöfer]], [[Jakob Erdmann#UCT|MC and UCT poster]] by [[Jakob Erdmann]]
* [http://web.stanford.edu/~surag/posts/alphazero.html A Simple Alpha(Go) Zero Tutorial] by [[Surag Nair]], [[Stanford University]], December 29, 2017 » [[AlphaZero]], [[Deep Learning]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=66179 A Simple Alpha(Go) Zero Tutorial] by Oliver Roese, [[CCC]], December 30, 2017</ref>
+
* [http://web.stanford.edu/~surag/posts/alphazero.html A Simple Alpha(Go) Zero Tutorial] by [[Surag Nair]], [[Stanford University]], December 29, 2017 » [[AlphaZero]], [[Deep Learning]]
 
: [https://github.com/suragnair/alpha-zero-general GitHub - suragnair/alpha-zero-general: A clean and simple implementation of a self-play learning algorithm based on AlphaGo Zero (any game, any framework!)]
 
: [https://github.com/suragnair/alpha-zero-general GitHub - suragnair/alpha-zero-general: A clean and simple implementation of a self-play learning algorithm based on AlphaGo Zero (any game, any framework!)]
* Monte Carlo Tree Search by [https://www.strath.ac.uk/staff/levinejohndr/ John Levine], [https://en.wikipedia.org/wiki/University_of_Strathclyde University of Strathclyde], March 05, 2017, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
+
* Monte Carlo Tree Search by [[John Levine]], [https://en.wikipedia.org/wiki/University_of_Strathclyde University of Strathclyde], March 05, 2017, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
 
: {{#evu:https://www.youtube.com/watch?v=UXW2yZndl7U|alignment=left|valignment=top}}
 
: {{#evu:https://www.youtube.com/watch?v=UXW2yZndl7U|alignment=left|valignment=top}}
 
* [https://en.chessbase.com/post/monte-carlo-instead-of-alpha-beta Monte Carlo instead of Alpha-Beta?] by [[Stephan Oliver Platz]], [[ChessBase|ChessBase News]], January 30, 2019 » [[Komodo#MCTS|Komodo MCTS]]
 
* [https://en.chessbase.com/post/monte-carlo-instead-of-alpha-beta Monte Carlo instead of Alpha-Beta?] by [[Stephan Oliver Platz]], [[ChessBase|ChessBase News]], January 30, 2019 » [[Komodo#MCTS|Komodo MCTS]]

Latest revision as of 15:36, 1 December 2021

Home * Search * Monte-Carlo Tree Search

Edvard Munch - At the Roulette Table in Monte Carlo [1]

Monte Carlo Tree Search, (Monte-Carlo Tree Search, MCTS)
is a Best-First search algorithm, historically based on random playouts. In conjunction with UCT (Upper Confidence bounds applied to Trees) Monte-Carlo Tree Search has yielded in a breakthrough in Computer Go [2], and is also successful in Amazons [3] [4], Lines of Action [5], Havannah [6], Hex [7], Checkers [8] and other Games with some difficulties in position evaluation, but until December 2017, when a Google DeepMind team reported on AlphaZero [9], not for Chess [10] [11]. MCTS is based on randomized explorations of the search space. Using the results of previous explorations, the algorithm gradually grows a game tree in memory, and successively becomes better at accurately estimating the values of the most promising moves [12].

Four Phases

MCTS consists of four strategic phases, repeated as long as there is time left [13] :

  1. In the Selection phase the tree is traversed from the root node until it selects a leaf node that is not added to the tree yet
  2. The Expansion strategy adds the leaf node to the tree
  3. The Simulation strategy plays moves in self-play until the end of the game. The result is either 1, 0 ,-1
  4. The Backpropagation strategy propagates the results through the tree
MCTS (English).svg

Steps of Monte Carlo Tree Search [14]

Pure Monte-Carlo search

Pure Monte-Carlo search with parameter T means that for each feasible move T random games are generated. The move with the best average score is played. A game is called “Monte Carlo perfect” when this procedure converges to perfect play for each position, when T goes to infinity. However, with limited time per move, increasing T does not guarantee to find a better move [15].

UCT

UCT (Upper Confidence bounds applied to Trees) deals with the flaw of Monte-Carlo Tree Search, when a program may favor a losing move with only one or a few forced refutations, but due to the vast majority of other moves provides a better random playout score than other, better moves [16]. In UCT, upper confidence bounds guide the selection of a node, treating selection as a {https://en.wikipedia.org/wiki/Multi-armed_bandit multi-armed bandit] problem. PUCT modifies the original policy by approximately predicting good arms at the start of a sequence of multi-armed bandit trials [17].

Playouts by NN

Historically, at the root of MCTS were random and noisy playouts. Many such playouts were necessary to accurately evaluate a state. Since AlphaGo and AlphaZero it is not the case anymore. Strong policies and evaluations are now provided by neural networks that are trained with Reinforcement Learning. In AlphaGo and its descendants the policy is used as a prior in the PUCT bandit to explore first the most promising moves advised by the neural network policy and the evaluations replace the playouts [18].

See also

Publications

1987

1990 ...

2000 ...

2005 ...

2006

2007

2008

2009

2010 ...

2011

2012

2013

2014

2015 ...

2016

2017

2018

2019

2020 ...

Forum Posts

2010 ...

2015 ...

2017

2018

2019

Re: Training the trainer: how is it done for Stockfish? by Graham Jones, CCC, March 03, 2019 » Leela Chess Zero
Re: A question to MCTS + NN experts by Daniel Shawul, CCC, July 17, 2019

2020 ...

External Links

Monte Carlo Tree Search

GitHub - suragnair/alpha-zero-general: A clean and simple implementation of a self-play learning algorithm based on AlphaGo Zero (any game, any framework!)

Monte Carlo Misc

Monte Carlo algorithm
Monte Carlo method
Monte Carlo
Monte Carlo Casino
feat. Raúl Midón, Roy Hargrove and the Monte-Carlo Philharmonic Orchestra, November 29, 2008, Monte-Carlo Jazz Festival

References

  1. Edvard Munch, At the Roulette Table in Monte Carlo, 1893, Edvard Munch from Wikipedia
  2. Rémi Coulom (2009). The Monte-Carlo Revolution in Go. JFFoS'2008: Japanese-French Frontiers of Science Symposium, slides as pdf
  3. Julien Kloetzer, Hiroyuki Iida, Bruno Bouzy (2007). The Monte-Carlo approach in Amazons. CGW 2007
  4. Richard J. Lorentz (2008). Amazons Discover Monte-Carlo. CG 2008
  5. Mark Winands, Yngvi Björnsson (2009). Evaluation Function Based Monte-Carlo LOA. pdf
  6. Richard J. Lorentz (2010). Improving Monte-Carlo Tree Search in Havannah. CG 2010
  7. Broderick Arneson, Ryan Hayward, Philip Henderson (2010). Monte Carlo Tree Search in Hex. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 2, No. 4, pdf
  8. UCT surprise for checkers ! by Daniel Shawul, CCC, March 25, 2011
  9. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815
  10. Raghuram Ramanujan, Ashish Sabharwal, Bart Selman (2010). On Adversarial Search Spaces and Sampling-Based Planning. ICAPS 2010
  11. Oleg Arenz (2012). Monte Carlo Chess. B.Sc. thesis, Darmstadt University of Technology, advisor Johannes Fürnkranz, pdf
  12. Guillaume Chaslot, Mark Winands, Jaap van den Herik (2008). Parallel Monte-Carlo Tree Search. CG 2008, pdf
  13. Guillaume Chaslot, Mark Winands, Jaap van den Herik (2008). Parallel Monte-Carlo Tree Search. CG 2008, pdf
  14. Steps of Monte-Carlo Tree Search by Mciura, March 31, 2013, Wikimedia Commons, Monte Carlo tree search from Wikipedia
  15. Ingo Althöfer (2011). On Board-Filling Games with Random-Turn Order and Monte Carlo Perfectness. Advances in Computer Games 13
  16. Levente Kocsis, Csaba Szepesvári (2006). Bandit based Monte-Carlo Planning ECML-06, LNCS/LNAI 4212, pp. 282-293. introducing UCT, pdf
  17. Christopher D. Rosin (2011). Multi-armed bandits with episode context. Annals of Mathematics and Artificial Intelligence, Vol. 61, No. 3, ISAIM 2010 pdf
  18. Quentin Cohen-Solal, Tristan Cazenave (2020). Minimax Strikes Back. arXiv:2012.10700
  19. Morpion solitaire from Wikipedia
  20. Jeff Rollason (2015). Mixing the Immiscible - MCTS and evaluation. AI Factory, October 2015
  21. Monte Carlo in LOA by Mark Watkins, Open Chess Forum, December 30, 2010
  22. The Settlers of Catan from Wikipedia
  23. Blokus Duo from Wikipedia
  24. Search traps in MCTS and chess by Daniel Shawul, CCC, December 25, 2017
  25. NoGo (ICGA Tournaments)
  26. Ms. Pac-Man from Wikipedia
  27. Value of information (VOI) from Wikipedia
  28. Scotland Yard (board game) from Wikipedia
  29. Morpion solitaire from Wikipedia
  30. Morpion Solitaire - Record Grids (5T game)
  31. Re: MC methods by Daniel Shawul, CCC, April 13, 2013
  32. Crossings from Wikipedia
  33. Epaminondas from Wikipedia
  34. 7 Wonders (board game) from WIkipedia
  35. Guillaume Chaslot, Mark Winands, Jos Uiterwijk, Jaap van den Herik, Bruno Bouzy (2008). Progressive Strategies for Monte-Carlo Tree Search. New Mathematics and Natural Computation, Vol. 4, No. 3, pdf
  36. Dap Hartmann (2017). Let's Catch the Train to Monte-Carlo. ICGA Journal, Vol. 39, No. 1
  37. Re: Minmax backup operator for MCTS by Brahim Hamadicharef, CCC, December 30, 2017
  38. Re: Announcing lczero by Daniel Shawul, CCC, January 21, 2018 » Leela Chess Zero
  39. GitHub - suragnair/alpha-zero-general: A clean and simple implementation of a self-play learning algorithm based on AlphaGo Zero (any game, any framework!)
  40. "Exact-Win Strategy for Overcoming AlphaZero" · Issue #799 · LeelaChessZero/lc0 · GitHub
  41. GitHub - dkappe/leela_lite: A toolkit for experimenting with UCT and Leela Chess nets in Python by Dietrich Kappe
  42. Re: MCTS evaluation question by Joerg Oster, CCC, November 03, 2020
  43. David Silver, Gerald Tesauro (2009). Monte-Carlo Simulation Balancing. ICML 2009, pdf

Up one level