UCT

UCT (Upper Confidence bounds applied to Trees) is a popular algorithm that addresses a flaw of plain Monte-Carlo Tree Search: a program may favor a losing move that has only one or a few forced refutations, because the vast majority of its other replies yield a better random playout score than those of other, objectively better moves. UCT was introduced by Levente Kocsis and Csaba Szepesvári in 2006 [1] and accelerated the Monte-Carlo revolution in computer Go [2] and in other games that are difficult to evaluate statically. Given infinite time and memory, UCT theoretically converges to Minimax.

Selection

In UCT, upper confidence bounds (UCB1) guide the selection of a node [3], treating move selection as a multi-armed bandit problem, in which the gambler at each trial faces a tradeoff between exploitation of the slot machine with the highest expected payoff and exploration to gain more information about the expected payoffs of the other machines. The child node j that maximizes the UCT value is selected:

UCT_j = X_j + C * sqrt(ln(n) / n_j)

where:

  • X_j is the win ratio of the child
  • n is the number of times the parent has been visited
  • n_j is the number of times the child has been visited
  • C is a constant to adjust the amount of exploration and incorporates the sqrt(2) from the UCB1 formula
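
As a purely illustrative calculation with C = 1: a child with 6 wins from 10 visits under a parent visited 100 times scores 0.6 + sqrt(ln 100 / 10) ≈ 0.6 + 0.68 = 1.28, while a sibling with 30 wins from 50 visits scores 0.6 + sqrt(ln 100 / 50) ≈ 0.6 + 0.30 = 0.90; despite equal win ratios, the less explored child is tried first.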

The first component of the UCB1 formula above corresponds to exploitation, as it is high for moves with a high average win ratio. The second component corresponds to exploration, since it is high for moves with few simulations. Most contemporary implementations of MCTS are based on some variant of UCT [4]. Modifications have been proposed with the aim of shortening the time to find good moves. They can be divided into improvements based on expert knowledge and domain-independent improvements, applied either in the playouts or in building the tree, for instance by modifying the exploitation part of the UCB1 formula as in Rapid Action Value Estimation (RAVE) [5], or by considering transpositions.
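
To make the interplay of selection, expansion, playout, and backpropagation concrete, the following Python sketch applies UCT to a trivial take-away game. It is purely illustrative and not taken from any engine: the NimState toy game, the Node bookkeeping, the uct_search driver, and the choice C = 1.4 are assumptions made for this example; only the computation in uct_value follows the UCB1 expression above, and the playouts are uniformly random without any knowledge bias.

import math
import random


class NimState:
    """Toy game: players alternately take 1-3 chips; whoever takes the last chip wins."""
    def __init__(self, chips=15):
        self.chips = chips
        self.player_to_move = 1

    def copy(self):
        s = NimState(self.chips)
        s.player_to_move = self.player_to_move
        return s

    def legal_moves(self):
        return list(range(1, min(3, self.chips) + 1))

    def play(self, move):
        self.chips -= move
        self.player_to_move = 3 - self.player_to_move

    def winner(self):
        # the side that took the last chip is the one that just moved
        return 3 - self.player_to_move if self.chips == 0 else None


class Node:
    def __init__(self, move=None, parent=None, player_just_moved=2):
        self.move = move                        # move that led to this node
        self.parent = parent
        self.player_just_moved = player_just_moved
        self.children = []
        self.untried = []                       # moves not yet expanded
        self.visits = 0                         # n_j
        self.wins = 0.0                         # playout wins for player_just_moved

    def uct_value(self, c):
        # X_j + C * sqrt(ln(n) / n_j); unvisited children would be tried first
        if self.visits == 0:
            return float('inf')
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))


def uct_search(root_state, iterations=10000, c=1.4):
    root = Node(player_just_moved=3 - root_state.player_to_move)
    root.untried = root_state.legal_moves()
    for _ in range(iterations):
        node, state = root, root_state.copy()
        # 1. Selection: while fully expanded, descend to the child maximizing UCT
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.uct_value(c))
            state.play(node.move)
        # 2. Expansion: add one child for a move not yet tried from this node
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            state.play(move)
            child = Node(move, node, player_just_moved=3 - state.player_to_move)
            child.untried = state.legal_moves()
            node.children.append(child)
            node = child
        # 3. Playout: finish the game with uniformly random moves
        while state.legal_moves():
            state.play(random.choice(state.legal_moves()))
        # 4. Backpropagation: credit the result to every node on the selected path
        while node is not None:
            node.visits += 1
            if state.winner() == node.player_just_moved:
                node.wins += 1.0
            node = node.parent
    # report the most visited move at the root
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    # from 15 chips the side to move should learn to take 3, leaving a multiple of 4
    print(uct_search(NimState(15)))

Storing the win count from the point of view of the player who just moved lets the same selection formula be applied at alternating levels of the tree; the most visited root child, rather than the one with the best win ratio, is returned as the final move, a common and more stable convention.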

Quotes

Gian-Carlo Pascutto

Quote by Gian-Carlo Pascutto in 2010 [6]:

There is no significant difference between an alpha-beta search with heavy LMR  and a static evaluator (current state of the art in chess) and an UCT searcher with a small exploration constant that does playouts (state of the art in go).
The shape of the tree they search is very similar. The main breakthrough in Go the last few years was how to backup an uncertain Monte Carlo score. This was solved. For chess this same problem was solved around the time quiescent search was developed.
Both are producing strong programs and we've proven for both the methods that they scale in strength as hardware speed goes up.
So I would say that we've successfully adopted the simple, brute force methods for chess to Go and they already work without increases in computer speed. The increases will make them progressively stronger though, and with further software tweaks they will eventually surpass humans. 

Raghuram Ramanujan et al.

Quote by Raghuram Ramanujan, Ashish Sabharwal, and Bart Selman from the abstract of their paper On Adversarial Search Spaces and Sampling-Based Planning [7]:

UCT has been shown to outperform traditional minimax based approaches in several challenging domains such as Go and KriegSpiel, although minimax search still prevails in other domains such as Chess. This work provides insights into the properties of adversarial search spaces that play a key role in the success or failure of UCT and similar sampling-based approaches. We show that certain "early loss" or "shallow trap" configurations, while unlikely in Go, occur surprisingly often in games like Chess (even in grandmaster games). We provide evidence that UCT, unlike minimax search, is unable to identify such traps in Chess and spends a great deal of time exploring much deeper game play than needed. 

See also

FSSS-Minimax
Match Statistics
MC and UCT poster by Jakob Erdmann

Publications

2000 ...

2005 ...

2006

2007

2008

2009

2010 ...

2011

2012

2013

Simon Viennot, Kokolo Ikeda (2013). Efficiency of Static Knowledge Bias in Monte-Carlo Tree Search. CG 2013
Rui Li, Yueqiu Wu, Andi Zhang, Chen Ma, Bo Chen, Shuliang Wang (2013). Technique Analysis and Designing of Program with UCT Algorithm for NoGo. CCDC 2013
Ari Weinstein, Michael L. Littman, Sergiu Goschin (2013). Rollout-based Game-tree Search Outprunes Traditional Alpha-beta. PMLR, Vol. 24 » FSSS-Minimax, MCTS

2014

Rémi Munos (2014). From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. Foundations and Trends in Machine Learning, Vol. 7, No. 1, hal-00747575v5, slides as pdf

2015 ...

Forum Posts

External Links

UCT Monte Carlo Tree Search algorithm in Python 2.7 by Peter Cowling, Ed Powley, and Daniel Whitehouse
UCT MCTS implementation in Java by Simon Lucas

References
