Changes

Jump to: navigation, search

Monte-Carlo Tree Search

4 bytes added, 16:54, 17 November 2021
no edit summary
=Playouts by NN=
Historically, at the root of MCTS were random and noisy playouts. Many such playouts were necessary to accurately evaluate a state. Since [[AlphaGo]] and [[AlphaZero]] it is not the case anymore. Strong policies and evaluations are now provided by [[Neural Networks|neural networks]] that are trained with [[Reinforcement Learning]]. In AlphaGo and its descendants the policy is used as a prior in the [[Christopher D. Rosin#PUCT|PUCT]] bandit to explore first the most promising moves advised by the neural network policy and the evaluations replace the playouts <ref>[[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700]</ref>.
=See also=

Navigation menu