'''AlphaZero''',<br/>

a chess and [[Go]] playing entity by [[Google]] [[DeepMind]] based on a general [[Reinforcement Learning|reinforcement learning]] algorithm with the same name. On [https://en.wikipedia.org/wiki/December_5#Holidays_and_observances December 5], [https://en.wikipedia.org/wiki/Portal:Current_events/2017_December_5 2017], the DeepMind team around [[David Silver]], [[Thomas Hubert]], and [[Julian Schrittwieser]], along with former [[Giraffe]] author [[Matthew Lai]], reported on their generalized algorithm, which combines [[Deep Learning|deep learning]] with [[Monte-Carlo Tree Search]] (MCTS) <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>. The final [https://en.wikipedia.org/wiki/Peer_review peer-reviewed] paper, with various clarifications, was published almost one year later in [https://en.wikipedia.org/wiki/Science_(journal) Science magazine] under the title ''A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play'' <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419</ref>.
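The combination of a neural network with MCTS can be illustrated by the move-selection rule used inside the tree: each child edge stores a prior probability P(s,a) from the network's policy output, a visit count N(s,a) and a mean value Q(s,a), and the search descends into the child maximizing Q(s,a) + U(s,a), where the exploration term U grows with the prior and shrinks as the edge gets visited. The Python below is a minimal sketch of such a PUCT-style rule under stated assumptions: the names `Edge` and `select_action` are hypothetical, the exploration constant `c_puct = 1.5` is an arbitrary illustrative value (the published algorithm uses an exploration rate that grows slowly with the parent's visit count rather than a constant), and unvisited edges are given Q = 0.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one move at a tree node (hypothetical structure)."""
    prior: float            # P(s, a): policy-network probability for this move
    visits: int = 0         # N(s, a): how often the search went through this edge
    value_sum: float = 0.0  # W(s, a): sum of values backed up through this edge

    @property
    def q(self) -> float:
        # Q(s, a) = W / N; unvisited edges are treated as 0 (an assumption).
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the move maximizing Q(s, a) + U(s, a), a simplified PUCT rule."""
    parent_visits = sum(e.visits for e in edges.values())  # N(s)
    sqrt_parent = math.sqrt(parent_visits)

    def puct_score(edge: Edge) -> float:
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * edge.prior * sqrt_parent / (1 + edge.visits)
        return edge.q + u

    return max(edges, key=lambda move: puct_score(edges[move]))


edges = {
    "e2e4": Edge(prior=0.6, visits=10, value_sum=5.0),
    "d2d4": Edge(prior=0.3, visits=1, value_sum=0.6),
    "g1f3": Edge(prior=0.1),
}
print(select_action(edges))  # d2d4
```

With these made-up statistics, `d2d4` is chosen: its higher mean value and low visit count outweigh the larger prior of the already well-explored `e2e4`.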

=Description=

==Training==

AlphaZero was trained for 700,000 steps (mini-batches of size 4,096 each), starting from randomly initialized parameters. It used 5,000 [https://en.wikipedia.org/wiki/Tensor_processing_unit#First_generation first-generation TPUs] <ref>[https://www.nextplatform.com/2017/04/05/first-depth-look-googles-tpu-architecture/ First In-Depth Look at Google’s TPU Architecture] by [https://www.nextplatform.com/author/nicole/ Nicole Hemsoth], [https://www.nextplatform.com/ The Next Platform], April 05, 2017</ref> to generate self-play games and 64 [https://en.wikipedia.org/wiki/Tensor_processing_unit#Second_generation second-generation TPUs] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=65945 Photo of Google Cloud TPU cluster] by [[Norman Schmidt]], [[CCC]], December 09, 2017</ref> <ref>[https://www.nextplatform.com/2017/05/17/first-depth-look-googles-new-second-generation-tpu/ First In-Depth Look at Google’s New Second-Generation TPU] by [https://www.nextplatform.com/author/nicole/ Nicole Hemsoth], [https://www.nextplatform.com/ The Next Platform], May 17, 2017</ref> <ref>[https://www.nextplatform.com/2017/05/22/hood-googles-tpu2-machine-learning-clusters/ Under The Hood Of Google’s TPU2 Machine Learning Clusters] by Paul Teich, [https://www.nextplatform.com/ The Next Platform], May 22, 2017</ref> to train the neural networks <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>.
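For a sense of scale, the training figures quoted above multiply out as follows. This is plain arithmetic on the stated numbers, nothing more; the total counts training examples processed over all steps, which need not all be distinct positions.

```python
# Arithmetic on the training description: 700,000 optimizer steps,
# each consuming one mini-batch of 4,096 training examples.
STEPS = 700_000
BATCH_SIZE = 4_096

total_examples = STEPS * BATCH_SIZE
print(f"{total_examples:,} training examples processed")  # 2,867,200,000 training examples processed
```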

=Stockfish Match=

As reported in the December 2017 paper <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>, AlphaZero won a 100-game match against [[Stockfish|Stockfish 8]], which ran with 64 [[Thread|threads]] and a [[Transposition Table|transposition table]] size of 1 GiB, while AlphaZero used a single machine with 4 [https://en.wikipedia.org/wiki/Tensor_processing_unit#First_generation first-generation TPUs]. The result was +28=72-0, and 10 games were published. Despite a possible hardware advantage of AlphaZero and criticized playing conditions <ref>[http://www.open-chess.org/viewtopic.php?f=5&t=3153 Alpha Zero] by [[Mark Watkins|BB+]], [[Computer Chess Forums|OpenChess Forum]], December 06, 2017</ref>, this is a tremendous achievement.

In the final [https://en.wikipedia.org/wiki/Peer_review peer-reviewed] paper, published in [https://en.wikipedia.org/wiki/Science_(journal) Science magazine] in December 2018 <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419</ref> along with supplementary materials <ref>[http://science.sciencemag.org/content/suppl/2018/12/05/362.6419.1140.DC1 Supplementary Materials]</ref>, a 1000-game match was reported, with about 200 games published. AlphaZero played against the most recent Stockfish versions available at the time of the matches: Stockfish 8, a development version as of January 13, 2018 (close to Stockfish 9), [[Brainfish]] with the [[Cerebellum]] book, and Stockfish 9. In total, AlphaZero won 155 games and lost 6.
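To put the match results on a common scale, the sketch below converts each result into a score fraction (a draw counting as half a point) and then into the rating difference implied by the logistic Elo model, D = -400 · log10(1/s - 1). This is a rough illustration, not a figure from either paper: the logistic model is an assumption, no error bars are computed, the 2018 total pools games against several Stockfish versions and opening setups, and the 839 draws are derived here as 1000 - 155 - 6. The helper names are hypothetical.

```python
import math


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Rating difference implied by an expected score under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# 2017 match: +28 =72 -0
s_2017 = score_fraction(28, 72, 0)
# 2018 match: 1000 games, 155 wins, 6 losses, hence 839 draws
s_2018 = score_fraction(155, 1000 - 155 - 6, 6)

print(f"{s_2017:.3f} -> {elo_difference(s_2017):+.0f} Elo")  # 0.640 -> +100 Elo
print(f"{s_2018:.4f} -> {elo_difference(s_2018):+.0f} Elo")  # 0.5745 -> +52 Elo
```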

Stockfish was configured according to its [[TCEC Season 9#Superfinal|2016 TCEC Season 9 superfinal]] settings: 44 threads on 44 cores (two 2.2 GHz [[Intel]] [https://en.wikipedia.org/wiki/Xeon#E3-12xx_v4_series_%22Broadwell-WS%22 Xeon Broadwell] [[x86-64]] CPUs with 22 cores each, running [[Linux]]), a transposition table size of 32 GiB, and 6-men [[Syzygy Bases|Syzygy bases]]. The time control was 3 hours per side plus a 15-second increment per move. AlphaZero used a simple time control strategy: it thought for 1/20th of its remaining time and selected moves greedily with respect to the root visit count. Each MCTS was executed on a single machine with 4 [https://en.wikipedia.org/wiki/Tensor_processing_unit#First_generation first-generation TPUs].
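The time-control strategy and greedy move choice described above can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, each move is assumed to use exactly its budget of remaining/20, and the increment is added only after the move, which is not necessarily how AlphaZero's time management was implemented.

```python
from __future__ import annotations


def think_time(remaining_s: float, divisor: int = 20) -> float:
    """Budget for the next move: 1/20th of the remaining clock time."""
    return remaining_s / divisor


def pick_move(root_visits: dict[str, int]) -> str:
    """Greedy selection: the root move with the highest visit count."""
    return max(root_visits, key=lambda move: root_visits[move])


def clock_after(moves: int, base_s: float = 3 * 3600, increment_s: float = 15.0) -> float:
    """Remaining time after `moves` moves, assuming each move uses its full budget."""
    remaining = base_s
    for _ in range(moves):
        remaining -= think_time(remaining)  # spend the budget
        remaining += increment_s            # then receive the increment
    return remaining


print(think_time(3 * 3600))                               # 540.0 (9 minutes on move 1)
print(pick_move({"e2e4": 700, "d2d4": 250, "c2c4": 50}))  # e2e4
print(round(clock_after(200)))                            # 300
```

Under this model the clock approaches the fixed point r = 19r/20 + 15, i.e. r = 300 seconds, at which point the per-move budget equals the 15-second increment.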

AlphaZero and Stockfish (except Brainfish) used no [[Opening Book|opening book]]. Games were started from 12 common human opening positions as well as from the [[TCEC Season 9#Superfinal|2016 TCEC Season 9 superfinal]] positions, the latter originally selected by [[Jeroen Noomen]] <ref>[http://science.sciencemag.org/content/suppl/2018/12/05/362.6419.1140.DC1 Supplementary Materials] S4</ref>. To ensure game diversity against Brainfish, whose opening book is deterministic, AlphaZero used a small amount of randomization in its opening moves. This avoided duplicate games but also resulted in more losses for AlphaZero.
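The papers quoted here only say that a small amount of randomization was used in the opening. One common way to randomize an MCTS player, shown here purely as a hypothetical scheme, is to sample moves in proportion to their root visit counts for the first few plies and then switch to greedy selection. The cutoff `random_plies = 8` and the function name `choose_move` are illustrative assumptions, not values or names from the source.

```python
from __future__ import annotations

import random


def choose_move(root_visits: dict[str, int], ply: int,
                random_plies: int, rng: random.Random) -> str:
    """Sample in proportion to visit counts for the first `random_plies` plies,
    then fall back to greedy selection by visit count (hypothetical scheme)."""
    moves = list(root_visits)
    if ply < random_plies:
        weights = [root_visits[m] for m in moves]
        return rng.choices(moves, weights=weights, k=1)[0]
    return max(moves, key=lambda m: root_visits[m])


rng = random.Random(2018)  # fixed seed so the run is reproducible
visits = {"e2e4": 600, "d2d4": 300, "c2c4": 100}

# Early in the game: moves vary, roughly in a 6:3:1 ratio over many draws.
early = [choose_move(visits, ply=0, random_plies=8, rng=rng) for _ in range(1000)]
# After the cutoff: always the most-visited move.
late = choose_move(visits, ply=20, random_plies=8, rng=rng)
print(late)  # e2e4
```

Restricting the sampling to the opening keeps games diverse against a deterministic book while leaving later play greedy, though any sampled move that is not the most-visited one can be weaker, which is consistent with the extra losses noted above.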

=See also=

===Round 2, 3===

* [https://chess24.com/en/watch/live-tournaments/alphazero-vs-stockfish AlphaZero vs. Stockfish] from [[chess24]]

* [https://youtu.be/nPexHaFL1uo AlphaZero's Attacking Chess] by [https://en.wikipedia.org/wiki/Anna_Rudolf Anna Rudolf], December 06, 2018, [https://en.wikipedia.org/wiki/YouTube YouTube] Video<ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=69181 Anna Rudolf analyzes a game of AlphaZero's] by [[Stuart Cracraft]], [[CCC]], December 07, 2018</ref>

: {{#evu:https://www.youtube.com/watch?v=nPexHaFL1uo|alignment=left|valignment=top}}

* [https://youtu.be/bo5plUo86BU "Exactly How to Attack" | DeepMind's AlphaZero vs. Stockfish] by [https://en.wikipedia.org/wiki/Matthew_Sadler Matthew Sadler], December 06, 2018, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
