Changes

Jump to: navigation, search

UCT

4,521 bytes added, 20:04, 16 July 2020
no edit summary
* C is a constant to adjust the amount of exploration and incorporates the sqrt(2) from the UCB1 formula
The first component of the UCB1 formula above corresponds to exploitation, as it is high for moves with high average win ratio. The second component corresponds to exploration, since it is high for moves with few simulations.  ==RAVE==Most contemporary implementations of MCTS are based on some variant of UCT <ref>[https://en.wikipedia.org/wiki/Monte_Carlo_tree_search#Exploration_and_exploitation Exploration and exploitation - in Monte Carlo tree search from Wikipedia]</ref>. Modifications have been proposed, with the aim of shortening the time to find good moves. They can be divided into improvements based on expert knowledge and into domain-independent improvements in the playouts, and in building the tree in modifying the exploitation part of the UCB1 formula, for instance in [https://en.wikipedia.org/wiki/Monte_Carlo_tree_search#Improvements Rapid Action Value Estimation] ('''RAVE''') <ref> [[Sylvain Gelly]], [[David Silver]] ('''2011'''). ''Monte-Carlo tree search and rapid action value estimation in computer Go''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_%28journal%29 Artificial Intelligence], Vol. 175, No. 11, [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/mcrave.pdf preprint as pdf]</ref> considering [[Transposition|transpositions]]. ==PUCT==[[Christopher D. Rosin|Chris Rosin's]] [[Christopher D. Rosin#PUCT|PUCT]] modifies the original UCB1 multi-armed bandit policy by approximately predicting good arms at the start of a sequence of multi-armed bandit trials ('Predictor' + UCB = PUCB) <ref>[[Christopher D. Rosin]] ('''2011'''). ''[https://link.springer.com/article/10.1007/s10472-011-9258-6 Multi-armed bandits with episode context]''. Annals of Mathematics and Artificial Intelligence, Vol. 61, No. 3, [http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf ISAIM 2010 pdf]</ref>. A variation of PUCT was used in the [[AlphaGo]] and [[AlphaZero]] projects <ref>[[David Silver]], [[Shih-Chieh Huang|Aja Huang]], [[Chris J. Maddison]], [[Arthur Guez]], [[Laurent Sifre]], [[George van den Driessche]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Veda Panneershelvam]], [[Marc Lanctot]], [[Sander Dieleman]], [[Dominik Grewe]], [[John Nham]], [[Nal Kalchbrenner]], [[Ilya Sutskever]], [[Timothy Lillicrap]], [[Madeleine Leach]], [[Koray Kavukcuoglu]], [[Thore Graepel]], [[Demis Hassabis]] ('''2016'''). ''[http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html Mastering the game of Go with deep neural networks and tree search]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 529</ref> , and subsequently also in [[Leela Zero]] and [[Leela Chess Zero]] <ref>[https://github.com/LeelaChessZero/lc0/wiki/FAQ FAQ · LeelaChessZero/lc0 Wiki · GitHub]</ref>. 
=Quotes=
==Raghuram Ramanujan et al.==
Quote by [[Raghuram Ramanujan]], [[Ashish Sabharwal]], and [[Bart Selman]] from their abstract ''On Adversarial Search Spaces and Sampling-Based Planning'' <ref>[[Raghuram Ramanujan]], [[Ashish Sabharwal]], [[Bart Selman]] ('''2010'''). ''[http://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/view/1458 On Adversarial Search Spaces and Sampling-Based Planning]''. [http://www.aaai.org/Press/Proceedings/icaps10.php ICAPS 2010]</ref>:
UCT has been shown to outperform traditional [[Minimax|minimax]] based approaches in several challenging domains such as [[Go]] and [[KriegspielKriegSpiel]], although minimax search still prevails in other domains such as [[Chess]]. This work provides insights into the properties of adversarial search spaces that play a key role in the success or failure of UCT and similar sampling-based approaches. We show that certain "early loss" or "shallow trap" configurations, while unlikely in Go, occur surprisingly often in games like Chess (even in grandmaster games). We provide evidence that UCT, unlike minimax search, is unable to identify such traps in Chess and spends a great deal of time exploring much deeper game play than needed.
=See also=
* [[Ari Weinstein#FSSS-Minimax|FSSS-Minimax]]
* [[Match Statistics]]
* [[Jakob Erdmann#UCT|MC and UCT poster]] by [[Jakob Erdmann]]
* [[MCαβ]]
* [[Christopher D. Rosin#PUCT|PUCT]]
* [[Reinforcement Learning]]
* [[Damien Pellier]], [[Bruno Bouzy]], [[Marc Métivier]] ('''2010'''). ''An UCT Approach for Anytime Agent-based Planning''. [http://www.springer.com/de/book/9783642123832 PAAMS'10], [http://www.mi.parisdescartes.fr/~bouzy/publications/pellier-bouzy-metivier-paams10.pdf pdf]
* [[Thomas J. Walsh]], [[Sergiu Goschin]], [[Michael L. Littman]] ('''2010'''). ''Integrating sample-based planning and model-based reinforcement learning.'' [[AAAI]], [https://pdfs.semanticscholar.org/bdc9/bfb6ecc6fb5afb684df03d7220c46ebdbf4e.pdf pdf] » [[Reinforcement Learning]]
* [[Takayuki Yajima]], [[Tsuyoshi Hashimoto]], [[Toshiki Matsui]], [[Junichi Hashimoto]], [[Kristian Spoerer]] ('''2010'''). ''[https://link.springer.com/chapter/10.1007%2F978-3-642-17928-0_11 Node-Expansion Operators for the UCT Algorithm]''. [[CG 2010]]
'''2011'''
* [[Junichi Hashimoto]], [[Akihiro Kishimoto]], [[Kazuki Yoshizoe]], [[Kokolo Ikeda]] ('''2011'''). ''[http://link.springer.com/chapter/10.1007/978-3-642-31866-5_1 Accelerated UCT and Its Application to Two-Player Games]''. [[Advances in Computer Games 13]]
* [[Lars Schaefers]], [[Marco Platzner]], [[Ulf Lorenz]] ('''2011'''). ''UCT-Treesplit - Parallel MCTS on Distributed Memory''. [http://www.aaai.org/Press/Proceedings/icaps11.php ICAPS 2011], [http://www.cs.uni-paderborn.de/fileadmin/Informatik/AG-Platzner/People/Schaefers/TreesplitICAPS.pdf pdf] » [[Parallel Search]]
* [[Raghuram Ramanujan]], [[Bart Selman]] ('''2011'''). ''[http://aaai.org/ocs/index.php/ICAPS/ICAPS11/paper/view/2708 Trade-Offs in Sampling-Based Adversarial Planning]''. [http://www.aaai.org/Press/Proceedings/icaps11.php ICAPS 2011]
* [[Christopher D. Rosin]] ('''2011'''). ''[https://link.springer.com/article/10.1007/s10472-011-9258-6 Multi-armed bandits with episode context]''. Annals of Mathematics and Artificial Intelligence, Vol. 61, No. 3, [http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf ISAIM 2010 pdf] » [[Christopher D. Rosin#PUCT|PUCT]]
* [[Adrien Couëtoux]], [[Jean-Baptiste Hoock]], [[Nataliya Sokolovska]], [[Olivier Teytaud]], [[Nicolas Bonnard]] ('''2011'''). ''Continuous Upper Confidence Trees''. [https://dblp.uni-trier.de/db/conf/lion/lion2011.html LION 2011], [https://hal.archives-ouvertes.fr/hal-00542673v1/document pdf]
* [[David Tolpin]], [[Solomon Eyal Shimony]] ('''2011'''). ''Doing Better Than UCT: Rational Monte Carlo Sampling in Trees''. [https://arxiv.org/abs/1108.3711 arXiv:1108.3711]
'''2012'''
* [[Oleg Arenz]] ('''2012'''). ''Monte Carlo Chess''. B.Sc. thesis, [[Darmstadt University of Technology]], advisor [[Johannes Fürnkranz]], [http://www.ke.tu-darmstadt.de/lehre/arbeiten/bachelor/2012/Arenz_Oleg.pdf pdf] » [[Stockfish]]
* [[Cameron Browne]], [[Edward Powley]], [[Daniel Whitehouse]], [[Simon Lucas]], et al. [[Peter Cowling]], [[Philipp Rohlfshagen]], [[Stephen Tavener]], [[Diego Perez]], [[Spyridon Samothrakis]], [[Simon Colton]] ('''2012'''). ''[httphttps://ieeexplore.ieee.org/xpldocument/login.jsp?tp=&arnumber=6145622&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F4804728%2F6169178%2F06145622.pdf%3Farnumber%3D6145622 A Survey of Monte Carlo Tree Search Methods]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 4, No. 1, [http://wwwccg.doc.cameroniusgold.comac.uk/ccg_old/cvpapers/mcts-survey-masterbrowne_tciaig12_1.pdf pdf]
* [[Sylvain Gelly]], [[Marc Schoenauer]], [[Michèle Sebag]], [[Olivier Teytaud]], [[Levente Kocsis]], [[David Silver]], [[Csaba Szepesvári]] ('''2012'''). ''[http://dl.acm.org/citation.cfm?id=2093548.2093574 The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions]''. [[ACM#Communications|Communications of the ACM]], Vol. 55, No. 3, [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf pdf preprint]
* [[Adrien Couetoux]], [[Olivier Teytaud]], [[Hassen Doghmen]] ('''2012'''). ''Learning a Move-Generator for Upper Confidence Trees''. [http://ics2012.ndhu.edu.tw/ ICS 2012], [https://en.wikipedia.org/wiki/Hualien_City Hualien], [https://en.wikipedia.org/wiki/Taiwan Taiwan], December 2012 » [[Learning]], [[Move Generation]]
* [[Ashish Sabharwal]], [[Horst Samulowitz]], [[Chandra Reddy]] ('''2012'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-642-29828-8_23 Guiding Combinatorial Optimization with UCT]''. [http://dblp.uni-trier.de/db/conf/cpaior/cpaior2012.html#SabharwalSR12 CPAIOR 2012], [http://web.emn.fr/x-info/cpaior-2012/uploads/Abstracts/CPAIOR%20-%2072980356.pdf abstract as pdf], [http://www.cs.toronto.edu/~horst/cogrobo/papers/uctmip.pdf draft as pdf]
* [[Raghuram Ramanujan]], [[Ashish Sabharwal]], [[Bart Selman]] ('''2012'''). ''Understanding Sampling Style Adversarial Search Methods''. [http://arxiv.org/abs/1203.4011 arXiv:1203.4011]
* [[Cheng-Wei Chou]], [[Ping-Chiang Chou]], [[Chang-Shing Lee]], [[David L. Saint-Pierre]], [[Olivier Teytaud]], [[Mei-Hui Wang]], [[Li-Wen Wu]], [[Shi-Jim Yen]] ('''2012'''). ''Strategic Choices: Small Budgets and Simple Regret''. [[TAAI 2012]], [http://www.csie.ndhu.edu.tw/csieweb/en/node/685 Excellent Paper Award], [https://hal.inria.fr/hal-00753145v2/document pdf]
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2012'''). ''Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search''. [https://arxiv.org/abs/1205.3109 arXiv:1205.3109]
* [[Truong-Huy Dinh Nguyen]], [[Wee Sun Lee]], [[Tze-Yun Leong]] ('''2012'''). ''Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic''. [https://arxiv.org/abs/1206.5940 arXiv:1206.5940]
* [[David Tolpin]], [[Solomon Eyal Shimony]] ('''2012'''). ''MCTS Based on Simple Regret''. [[Conferences#AAAI-2012|AAAI-2012]], [https://arxiv.org/abs/1207.5536 arXiv:1207.5536]
'''2013'''
* [[Cheng-Wei Chou]], [[Ping-Chiang Chou]], [[Chang-Shing Lee]], [[David L. Saint-Pierre]], [[Olivier Teytaud]], [[Mei-Hui Wang]], [[Li-Wen Wu]], [[Shi-Jim Yen]] ('''2013'''). ''Strategic Choices: Small Budgets and Simple Regret''. [[TAAI|TAAI 2012]], [http://www.csie.ndhu.edu.tw/csieweb/en/node/685 Excellent Paper Award], [https://hal.inria.fr/hal-00753145v2/document pdf]
* [[Simon Viennot]], [[Kokolo Ikeda]] ('''2013'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-319-09165-5_3 Efficiency of Static Knowledge Bias in Monte-Carlo Tree Search]''. [[CG 2013]]
* [[Rui Li]], [[Yueqiu Wu]], [[Andi Zhang]], [[Chen Ma]], [[Bo Chen]], [[Shuliang Wang]] ('''2013'''). ''[http://cpfd.cnki.com.cn/Article/CPFDTOTAL-KZJC201309001170.htm Technique Analysis and Designing of Program with UCT Algorithm for NoGo]''. [http://www.ccdc.neu.edu.cn/CCDC2013/ CCDC2013]
* [[Ari Weinstein]], [[Michael L. Littman]], [[Sergiu Goschin]] ('''2013'''). ''[http://proceedings.mlr.press/v24/weinstein12a.html Rollout-based Game-tree Search Outprunes Traditional Alpha-beta]''. [http://proceedings.mlr.press/ PMLR], Vol. 24 » [[Ari Weinstein#FSSS-Minimax|FSSS-Minimax]], [[Monte-Carlo Tree Search|MCTS]]
'''2014'''
* [[Rémi Munos]] ('''2014'''). ''From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning''. [http://dblp.uni-trier.de/db/journals/ftml/ftml7.html#Munos14 Foundations and Trends in Machine Learning, Vol. 7, No 1], [https://hal.archives-ouvertes.fr/hal-00747575 hal-00747575v5], [http://chercheurs.lille.inria.fr/~munos/papers/files/AAAI2013_slides.pdf slides as pdf]
* [[David W. King]] ('''2014'''). ''Complexity, Heuristic, and Search Analysis for the Games of Crossings and Epaminondas''. Masters thesis, [https://en.wikipedia.org/wiki/Air_Force_Institute_of_Technology Air Force Institute of Technology], [http://www.afit.edu/docs/AFIT-ENG-14-M-44_King.pdf pdf] <ref>[https://en.wikipedia.org/wiki/Crossings_%28game%29 Crossings from Wikipedia]</ref>
* [[David W. King]], [[Gilbert L. Peterson]] ('''2014'''). ''Epaminondas: Exploring Combat Tactics''. [[ICGA Journal#37_3|ICGA Journal, Vol. 37, No. 3]] <ref> [https://en.wikipedia.org/wiki/Epaminondas_%28game%29 Epaminondas from Wikipedia]</ref>
* [[Truong-Huy Dinh Nguyen]], [[Tomi Silander]], [[Wee Sun Lee]], [[Tze-Yun Leong]] ('''2014'''). ''[https://www.aaai.org/ocs/index.php/ICAPS/ICAPS14/paper/view/7934 Bootstrapping Simulation-Based Algorithms with a Suboptimal Policy]''. [https://dblp.uni-trier.de/db/conf/aips/icaps2014.html ICAPS 2014], [https://youtu.be/ccwcZZj5vKM YouTube Video]
==2015 ...==
* [[Xi Liang]], [[Ting-Han Wei]], [[I-Chen Wu]] ('''2015'''). ''Job-level UCT search for solving Hex''. [http://dblp.uni-trier.de/db/conf/cig/cig2015.html#LiangWW15 CIG 2015]
* [[Xi Liang]], [[Ting-Han Wei]], [[I-Chen Wu]] ('''2015'''). ''Solving Hex Openings Using Job-Level UCT Search''. [[ICGA Journal#38_3|ICGA Journal, Vol. 38, No. 3]]
* [[Naoki Mizukami]], [[Jun Suzuki]], [[Hirotaka Kameko]], [[Yoshimasa Tsuruoka]] ('''2017'''). ''Exploration Bonuses Based on Upper Confidence Bounds for Sparse Reward Games''. [[Advances in Computer Games 15]]
* [[Jacek Mańdziuk]] ('''2018'''). ''MCTS/UCT in Solving Real-Life Problems''. [https://www.springer.com/us/book/9783319679457 Advances in Data Analysis with Computational Intelligence Methods], [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
* [[Kiminori Matsuzaki]] ('''2018'''). ''Empirical Analysis of PUCT Algorithm with Evaluation Functions of Different Quality''. [[TAAI 2018]]
=Forum Posts=
* [http://www.talkchess.com/forum/viewtopic.php?t=66125 Search traps in MCTS and chess] by [[Daniel Shawul]], [[CCC]], December 25, 2017 » [[Raghuram Ramanujan#UCT|Sampling-Based Planning]]
* [http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=32429 MCTS weakness wrt AB (via Daniel Shawul)] by [[Chris Whittington]], [[Computer Chess Forums|Rybka Forum]], December 25, 2017
* [http://www.talkchess.com/forum/viewtopic.php?t=66280 Announcing lczero] by [[Gary Linscott|Gary]], [[CCC]], January 09, 2018 » [[LCZeroLeela Chess Zero]]
=External Links=
* [https://en.wikipedia.org/wiki/Confidence_interval Confidence interval from Wikipedia]
* [https://en.wikipedia.org/wiki/Multi-armed_bandit Multi-armed bandit from Wikipedia]
* [httphttps://mctsgithub.aicom/abouttheKGS/index.html Monte Carlo Tree Search] by [[Cameron Browne]] et al.MCTS GitHub - theKGS/MCTS: [http://mcts.ai/code/python.html Java implementation of UCT Monte Carlo Tree Search algorithm] in [[Python|Python 2.7]] by [[Peter Cowling]], [[Ed Powley]], based MCTS and [[Daniel Whitehouse]]: [http://mcts.ai/code/java.html UCT Flat MCTS implementation] in [[Java]] by [[Simon Lucas]]* [httphttps://senseis.xmp.net/?UCT Sensei's Library: UCT]* [httphttps://www.althofer.de/lange-nacht-jena.html Lange Nacht der Wissenschaften - Long Night of Sciences Jena - 2007] by [[Ingo Althöfer]], [[Jakob Erdmann#UCT|MC and UCT poster]] by [[Jakob Erdmann]]* [[:Category:Weather Report|Weather Report]] - [https://en.wikipedia.org/wiki/Sweetnighter Boogie Woogie Waltz], 1974, [https://en.wikipedia.org/wiki/YouTube YouTube] Video: [[:Category:Joe Zawinul|Joe Zawinul]], [[:Category:Wayne Shorter|Wayne Shorter]], [[:Category:Alphonso Johnson|Alphonso Johnson]], [https://www.allmusic.com/artist/darryl-brown-mn0001213839 Darryl Brown], [[:Category:Dom Um Romão|Dom Um Romão]]: {{#evu:https://www.youtube.com/watch?v=MRs5ROwfgW4|alignment=left|valignment=top}}
=References=
'''[[Monte-Carlo Tree Search|Up one level]]'''
[[Category:Quotes]]
[[Category:Weather Report]]
[[Category:Alphonso Johnson]]
[[Category:Dom Um Romão]]
[[Category:Wayne Shorter]]
[[Category:Joe Zawinul]]

Navigation menu