AlphaZero

4,038 bytes added, 22:18, 9 December 2018
'''AlphaZero''',<br/>
a chess and [[Go]] playing entity by [[Google]] [[DeepMind]] based on a general [[Reinforcement Learning|reinforcement learning]] algorithm with the same name. On [https://en.wikipedia.org/wiki/December_5#Holidays_and_observances December 5], [https://en.wikipedia.org/wiki/Portal:Current_events/2017_December_5 2017], the DeepMind team around [[David Silver]], [[Thomas Hubert]], and [[Julian Schrittwieser]] along with former [[Giraffe]] author [[Matthew Lai]], reported on their generalized algorithm, combining [[Deep Learning|Deep learning]] with [[Monte-Carlo Tree Search]] (MCTS) <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>. The final [https://en.wikipedia.org/wiki/Peer_review peer reviewed] paper with various clarifications was published almost one year later in the [https://en.wikipedia.org/wiki/Science_(journal) Science magazine] under the title ''A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play'' <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419</ref>.
=Description=
==Training==
AlphaZero was trained for 700,000 steps (mini-batches of size 4096 each), starting from randomly initialized parameters, using 5,000 [https://en.wikipedia.org/wiki/Tensor_processing_unit#First_generation first-generation TPUs] <ref>[https://www.nextplatform.com/2017/04/05/first-depth-look-googles-tpu-architecture/ First In-Depth Look at Google’s TPU Architecture] by [https://www.nextplatform.com/author/nicole/ Nicole Hemsoth], [https://www.nextplatform.com/ The Next Platform], April 05, 2017</ref> to generate self-play games and 64 [https://en.wikipedia.org/wiki/Tensor_processing_unit#Second_generation second-generation TPUs] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=65945 Photo of Google Cloud TPU cluster] by [[Norman Schmidt]], [[CCC]], December 09, 2017</ref> <ref>[https://www.nextplatform.com/2017/05/17/first-depth-look-googles-new-second-generation-tpu/ First In-Depth Look at Google’s New Second-Generation TPU] by [https://www.nextplatform.com/author/nicole/ Nicole Hemsoth], [https://www.nextplatform.com/ The Next Platform], May 17, 2017</ref> <ref>[https://www.nextplatform.com/2017/05/22/hood-googles-tpu2-machine-learning-clusters/ Under The Hood Of Google’s TPU2 Machine Learning Clusters] by Paul Teich, [https://www.nextplatform.com/ The Next Platform], May 22, 2017</ref> to train the neural networks <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>.
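
To give a feeling for the scale of these figures, the following minimal sketch multiplies the number of training steps by the mini-batch size. The function name <code>samples_processed</code> is hypothetical and chosen for illustration only. The result counts training samples drawn during optimization, with possible repetition, not distinct positions or games.

```python
# Figures quoted in the Training section above.
TRAINING_STEPS = 700_000
MINI_BATCH_SIZE = 4096


def samples_processed(steps: int, batch_size: int) -> int:
    """Number of training samples drawn over all optimization steps.

    Hypothetical helper: one mini-batch of `batch_size` samples per step.
    Samples may repeat, so this is not a count of distinct positions.
    """
    return steps * batch_size


if __name__ == "__main__":
    total = samples_processed(TRAINING_STEPS, MINI_BATCH_SIZE)
    print(f"{total:,} samples")
```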
 
=Stockfish Match=
As mentioned in the December 2017 paper <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>,
a 100-game match versus [[Stockfish|Stockfish 8]] using 64 [[Thread|threads]] and a [[Transposition Table|transposition table]] size of 1 GiB,
was won by AlphaZero using a single machine with 4 [https://en.wikipedia.org/wiki/Tensor_processing_unit#First_generation first-generation TPUs] with +28=72-0; 10 games were published. Despite a possible hardware advantage of AlphaZero and criticized playing conditions <ref>[http://www.open-chess.org/viewtopic.php?f=5&t=3153 Alpha Zero] by [[Mark Watkins|BB+]], [[Computer Chess Forums|OpenChess Forum]], December 06, 2017</ref>, this is a tremendous achievement.
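
The +28=72-0 result can be turned into a score fraction and a rough rating difference. The sketch below assumes the usual logistic Elo model, which is not taken from the papers. The function names are hypothetical, and the figure it yields is only an estimate for these particular match conditions.

```python
import math


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Points scored divided by games played (a draw counts as half a point)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Rating difference implied by a score fraction under the logistic Elo model.

    Assumption: expected score = 1 / (1 + 10 ** (-diff / 400)).
    Undefined for scores of exactly 0 or 1.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    score = score_fraction(wins=28, draws=72, losses=0)
    print(f"score: {score:.2f}")
    print(f"implied Elo difference: {elo_difference(score):.1f}")
```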
 
In the final [https://en.wikipedia.org/wiki/Peer_review peer reviewed] paper, published in [https://en.wikipedia.org/wiki/Science_(journal) Science magazine] in December 2018 <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419</ref> along with supplementary materials <ref>[http://science.sciencemag.org/content/suppl/2018/12/05/362.6419.1140.DC1 Supplementary Materials]</ref>, a 1000-game match was reported, with about 200 games published. The opponents were the most recent Stockfish versions available at the time of the matches: Stockfish 8, a development version as of January 13, 2018 close to Stockfish 9, [[Brainfish]] with [[Cerebellum]] book, and Stockfish 9. In total, AlphaZero won 155 games and lost 6 games.
 
Stockfish was configured according to its [[TCEC Season 9#Superfinal|2016 TCEC Season 9 superfinal]] settings: 44 threads on 44 cores (two 2.2 GHz [[Intel]] [https://en.wikipedia.org/wiki/Xeon#E3-12xx_v4_series_%22Broadwell-WS%22 Xeon Broadwell] [[x86-64]] CPUs with 22 cores, running [[Linux]]), a transposition table size of 32 GiB, and 6-men [[Syzygy Bases|Syzygy bases]]. Time control was 3 hours per game for each side plus a 15-second increment per move. AlphaZero used a simple time control strategy: it thought for 1/20th of the remaining time and selected moves greedily with respect to the root visit count. Each MCTS was executed on a single machine with 4 [https://en.wikipedia.org/wiki/Tensor_processing_unit#First_generation first-generation TPUs].
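
The time control strategy and the greedy move choice described above can be sketched as follows. This is an illustration under stated assumptions, not DeepMind's code: the function names, the move strings, and the visit counts are hypothetical, and ties between equally visited moves are broken arbitrarily in favour of the first entry.

```python
from typing import Dict


def think_time(remaining_seconds: float, divisor: int = 20) -> float:
    """Time budget for the next move: 1/20th of the remaining clock time."""
    return remaining_seconds / divisor


def select_move(root_visit_counts: Dict[str, int]) -> str:
    """Greedy selection: return the root move with the highest visit count.

    Ties go to the first such move in dictionary order (an assumption).
    """
    if not root_visit_counts:
        raise ValueError("no moves available at the root")
    return max(root_visit_counts, key=root_visit_counts.get)


if __name__ == "__main__":
    # 3 hours on the clock at the start of the game.
    remaining = 3 * 60 * 60
    print(think_time(remaining))  # budget in seconds for the first move

    # Hypothetical visit counts gathered at the root after a search.
    visits = {"e2e4": 500, "d2d4": 320, "g1f3": 180}
    print(select_move(visits))
```

Selecting by visit count rather than by raw value estimate is a common choice for MCTS, since the most visited move is the one the search examined most thoroughly.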
 
AlphaZero and Stockfish (except Brainfish) used no [[Opening Book|opening book]]. In addition, 12 common human positions as well as the [[TCEC Season 9#Superfinal|2016 TCEC Season 9 superfinal]] positions, originally selected by [[Jeroen Noomen]], were played <ref>[http://science.sciencemag.org/content/suppl/2018/12/05/362.6419.1140.DC1 Supplementary Materials] S4</ref>. To ensure diversity against opponents (Brainfish) with a deterministic opening book, AlphaZero used a small amount of randomization in its opening moves. This avoided duplicate games but also resulted in more losses by AlphaZero.
=See also=
===Round 2, 3===
* [https://chess24.com/en/watch/live-tournaments/alphazero-vs-stockfish AlphaZero vs. Stockfish] from [[chess24]]
* [https://youtu.be/nPexHaFL1uo AlphaZero's Attacking Chess] by [https://en.wikipedia.org/wiki/Anna_Rudolf Anna Rudolf], December 06, 2018, [https://en.wikipedia.org/wiki/YouTube YouTube] Video<ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=69181 Anna Rudolf analyzes a game of AlphaZero's] by [[Stuart Cracraft]], [[CCC]], December 07, 2018</ref>
: {{#evu:https://www.youtube.com/watch?v=nPexHaFL1uo|alignment=left|valignment=top}}
* [https://youtu.be/bo5plUo86BU "Exactly How to Attack" | DeepMind's AlphaZero vs. Stockfish] by [https://en.wikipedia.org/wiki/Matthew_Sadler Matthew Sadler], December 06, 2018, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
