Revision as of 13:03, 9 December 2018

Home * Engines * AlphaZero

The Krampus has come ^[1] ^[2]

AlphaZero,
a chess and Go playing entity by Google DeepMind based on a general reinforcement learning algorithm with the same name. On December 5, 2017, the DeepMind team around David Silver, Thomas Hubert, and Julian Schrittwieser along with former Giraffe author Matthew Lai, reported on their generalized algorithm, combining Deep learning with Monte-Carlo Tree Search (MCTS) ^[3] .

Stockfish Match

A 100 game match versus Stockfish 8 using 64 threads and a transposition table size of 1GiB, was won by AlphaZero using a single machine with 4 Tensor processing units (TPUs) with +28=72-0. Despite a possible hardware advantage of AlphaZero and criticized playing conditions ^[4], this seems a tremendous achievement.

Description

Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved a superhuman level of play in the games of chess and Shogi as well as in Go. The algorithm is a more generic version of the AlphaGo Zero algorithm that was first introduced in the domain of Go ^[5]. AlphaZero evaluates positions using non-linear function approximation based on a deep neural network, rather than the linear function approximation as used in classical chess programs. This neural network takes the board position as input and outputs a vector of move probabilities (policy) and a position evaluation. Once trained, these network is combined with a Monte-Carlo Tree Search (MCTS) using the policy to narrow down the search to high probability moves, and using the value in conjunction with a fast rollout policy to evaluate positions in the tree. The selection is done by a variation of Rosin's UCT improvement dubbed PUCT.

Network Architecture

The deep neural network consists of a “body” with input and hidden layers of spatial NxN planes, 8x8 board arrays for chess, followed by both policy and value “heads” ^[6] ^[7]. Each square cell of the input plane contains 6x2 piece-type and color bits of the current chess position from the current player's point of view, plus two bits of a repetition counter concerning the draw rule, and to further address graph history and path-dependency issues - these 14 bits times eight, that is up to seven predecessor positions as well - so that en passant, or some sense of progress is implicit. Additional 7 input bits consider castling rights, total move count and side to move, yielding in 119 bits per square cell for chess.

The body consists of a rectified batch-normalized convolutional layer followed by 19 residual blocks. Each such block consists of two rectified batch-normalized residual convolutional layers with a skip connection ^[8] ^[9]. Each convolution applies 256 filters (shared weight vectors) of kernel size 3x3 with stride 1. These layers connect the pieces on different squares to each other due to consecutive convolutions, where a cell of a layer is connected to the correspondent 3x3 receptive field of the previous layer, so that after 4 convolutions, each square is connected to every other cell in the original input layer ^[10].

The policy head applies an additional rectified, batch-normalized convolutional layer, followed by a final convolution of 73 filters for chess, with the final policy output represented as an 8x8 board array as well, for every origin square up to 73 target square possibilities (NRayDirs x MaxRayLength + NKnightDirs + NPawnDirs * NMinorPromotions), encoding a probability distribution over 64x73 = 4,672 possible moves, where illegal moves were masked out by setting their probabilities to zero, re-normalising the probabilities for remaining moves. The value head applies an additional rectified, batch-normalized convolution of 1 filter of kernel size 1x1 with stride 1, followed by a rectified linear layer of size 256 and a tanh-linear layer of size 1.

Training

AlphaZero was trained in 700,000 steps or mini-batches of size 4096 each, starting from randomly initialized parameters, using 5,000 first-generation TPUs ^[11] to generate self-play games and 64 second-generation TPUs ^[12] ^[13] ^[14] to train the neural networks ^[15] .

Publications

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis (2017). Mastering the game of Go without human knowledge. Nature, Vol. 550, pdf
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, Vol. 362, No. 6419 ^[16]
Garry Kasparov (2018). Chess, a Drosophila of reasoning. Science, Vol. 362, No. 6419
Murray Campbell (2018). Mastering board games. Science, Vol. 362, No. 6419

Forum Posts

2017

Google's AlphaGo team has been working on chess by Peter Kappler, CCC, December 06, 2017
Historic Milestone: AlphaZero by Miguel Castanuela, CCC, December 06, 2017
AlphaZero beats AlphaGo Zero, Stockfish, and Elmo by Carl Lumma, CCC, December 06, 2017
AlphaZero vs Stockfish by Bigler, CCC, December 06, 2017
Deepmind drops the bomb by Leebot, FishCooking, December 06, 2017
AlphaZero beats Stockfish 8 by 64-36 by Venator, Rybka Forum, December 06, 2017
Alpha Zero by BB+, OpenChess Forum, December 06, 2017
AlphaGo Zero And AlphaZero, RomiChess done better by Michael Sherwin, CCC, December 07, 2017 » RomiChess
BBC News; 'Google's ... DeepMind AI claims chess crown' by pennine22, Hiarcs Forum, December 07, 2017
Press Release Stockfish vs. AlphaZero by Michael Whiteley, FishCooking, December 08, 2017
AlphaZero reinvents mobility and romanticism by Chris Whittington, Rybka Forum, December 08, 2017 » Alpha Zero's "Immortal Zugzwang Game"
Reactions about AlphaZero from top GMs... by Norman Schmidt, CCC, December 08, 2017 » Reactions From Top GMs, Stockfish Author
AlphaZero is not like other chess programs by Dann Corbit, CCC, December 08, 2017

Re: AlphaZero is not like other chess programs by Rein Halbersma, CCC, December 09, 2017

Photo of Google Cloud TPU cluster by Norman Schmidt, CCC, December 09, 2017
Cerebellum analysis of the AlphaZero - Stockfish Games by Thomas Zipproth, CCC, December 11, 2017 » Cerebellum
Open letter to Google DeepMind by Michael Stembera, FishCooking, December 12, 2017
recent article on alphazero ... 12/11/2017 ... by Dan Ellwein, CCC, December 14, 2017
An AlphaZero inspired project by Truls Edvard Stokke, CCC, December 14, 2017
AlphaZero - Tactical Abilities by David Rasmussen, CCC, December 16, 2017
In chess,AlphaZero outperformed Stockfish after just 4 hours by Ed Schroder, CCC, December 18, 2017
AlphaZero - Youtube Videos by Christoph Fieberg, CSS Forum, December 18, 2017
AlphaZero Chess is not that strong ... by Vincent Lejeune, CCC, December 19, 2017
David Silver (Deepmind) inaccuracies by Ed Schroder, CCC, December 21, 2017
AZ vs SF - game 99 by Rebel, Rybka Forum, December 23, 2017
AlphaZero performance by Martin Sedlak, CCC, December 25, 2017
A Simple Alpha(Go) Zero Tutorial by Oliver Roese, CCC, December 30, 2017
AlphaZero: The 10 Top Shots by Walter Eigenmann, CCC, December 30, 2017

2018

SF was more seriously handicapped than I thought by Kai Laskos, CCC, January 02, 2018
Chess World to Google Deep Mind..Prove You beat Stockfish 8! by AA Ross, CCC, January 11, 2018
Article:"How Alpha Zero Sees/Wins" by AA Ross, CCC, January 17, 2018 » How AlphaZero Wins
Connect 4 AlphaZero implemented using Python... by Steve Maughan, CCC, January 29, 2018 » Connect Four, Python
Seeing Alphazero in perspective ... by Dan Ellwein, CCC, February 10, 2018
Matthew Sadler analysis of A0 vs SF [Edit: A0 published in Science?] by trulses, CCC, December 06, 2018
Alphazero news by arunsoorya1309, CCC, December 06, 2018

Re: Alphazero news by Matthew Lai, CCC, December 07, 2018

Re: Alphazero news by Larry Kaufman, CCC, December 07, 2018

Re: Alphazero news by Kai Laskos, CCC, December 07, 2018

Re: Alphazero news by Matthew Lai, CCC, December 07, 2018

Re: Alphazero news by crem, CCC, December 07, 2018

Re: Alphazero news by Matthew Lai, CCC, December 07, 2018

Re: Alphazero news by crem, CCC, December 07, 2018

Re: Alphazero news by Matthew Lai, CCC, December 07, 2018

Re: Alphazero news by Gian-Carlo Pascutto, CCC, December 07, 2018 » Leela Chess Zero

Re: Alphazero news by Matthew Lai, CCC, December 07, 2018

Re: Alphazero news by Matthew Lai, CCC, December 07, 2018 » Giraffe

Re: Alphazero news by Matthew Lai, CCC, December 08, 2018

External Links

AlphaZero from Wikipedia
AlphaGo Zero - AlphaZero from Wikipedia
Keynote David Silver NIPS 2017 Deep Reinforcement Learning Symposium AlphaZero, December 06, 2017, YouTube Video ^[17]

A Simple Alpha(Go) Zero Tutorial by Surag Nair, Stanford University, December 29, 2017 ^[18]

GitHub - suragnair/alpha-zero-general: A clean and simple implementation of a self-play learning algorithm based on AlphaGo Zero (any game, any framework!)

AlphaZero: Shedding new light on the grand games of chess, shogi and Go by David Silver, Thomas Hubert, Julian Schrittwieser and Demis Hassabis, DeepMind, December 03, 2018
AlphaZero: Shedding new light on the grand games of chess, shogi and Go, December 06, 2018, YouTube Video

Reports

2017

DeepMind’s AI became a superhuman chess player in a few hours, just for fun by James Vincent, The Verge, December 06, 2017
Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours by Sarah Knapton, and Leon Watson, The Telegraph, December 06, 2017
Google's 'superhuman' DeepMind AI claims chess crown, BBC News, December 06, 2017 ^[19]
DeepMind’s AlphaZero crushes chess by Colin McGourty, chess24, December 06, 2017
One Small Step for Computers, One Giant Leap for Mankind by Dana Mackenzie, Dana Blogs Chess, December 06, 2017
Google's AlphaZero Destroys Stockfish In 100-Game Match by Mike Klein, Chess.com, December 06, 2017
The future is here – AlphaZero learns chess by Albert Silver, ChessBase News, December 06, 2017
AlphaZero: Reactions From Top GMs, Stockfish Author by Peter Doggers, Chess.com, December 08, 2017 » Stockfish, Tord Romstad ^[20]
Is AlphaZero really a scientific breakthrough in AI? by Jose Camacho Collados, Medium, December 11, 2017 ^[21]
Alpha Zero: Comparing "Orangutans and Apples" by André Schulz, ChessBase News, December 13, 2017
Kasparov on Deep Learning in chess by Frederic Friedel, ChessBase News, December 13, 2017

2018

AlphaZero really is that good by Colin McGourty, chess24, December 06, 2018
Inside the (deep) mind of AlphaZero by Albert Silver, ChessBase News, December 07, 2018

Stockfish Match

Round 1

The chess games of AlphaZero (Computer) from chessgames.com
Cerebellum AlphaZero Analysis » Cerebellum ^[22]
Deep Mind Alpha Zero's "Immortal Zugzwang Game" against Stockfish by Antonio Radic, December 07, 2017, YouTube Video ^[23] ^[24] » Zugzwang
Deep Mind AI Alpha Zero Dismantles Stockfish's French Defense by Antonio Radic, December 08, 2017, YouTube Video
How AlphaZero Wins by Dana Mackenzie, Dana Blogs Chess, December 15, 2017 ^[25]

Round 2, 3

AlphaZero vs. Stockfish from chess24
AlphaZero's Attacking Chess by Anna Rudolf, December 06, 2018, YouTube Video

"Exactly How to Attack" | DeepMind's AlphaZero vs. Stockfish by Matthew Sadler, December 06, 2018, YouTube Video
"Bold Sir Lancelot" | DeepMind's AlphaZero vs. Stockfish by Matthew Sadler, December 06, 2018, YouTube Video
"All-in Defence" | DeepMind's AlphaZero vs. Stockfish by Matthew Sadler, December 06, 2018, YouTube Video
"Long-term Sacrifice" | DeepMind's AlphaZero vs. Stockfish by Matthew Sadler, December 06, 2018, YouTube Video
"Endgame Class" | DeepMind's AlphaZero vs. Stockfish by Matthew Sadler, December 06, 2018, YouTube Video

Misc

How to build your own AlphaZero AI using Python and Keras by David Foster, January 26, 2018 » Connect Four, Python ^[26]
Can - Halleluwah, from Tago Mago 1971, YouTube Video

lineup: Irmin Schmidt, Michael Karoli, Holger Czukay, Damo Suzuki, Jaki Liebezeit

References

↑ Krampus, figure used in threatening children, Image from the 1900s, source: Historie čertů Krampus, Category:Krampus, Wikimedia Commons
↑ "5th of December - The Krampus has come" was suggested by Michael Scheidl in AlphaZero by Peter Martan, CSS Forum, December 06, 2017, with further comments by Ingo Althöfer
↑ David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815
↑ Alpha Zero by BB+, OpenChess Forum, December 06, 2017
↑ AlphaGo Zero: Learning from scratch by Demis Hassabis and David Silver, DeepMind, October 18, 2017
↑ David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, Vol. 362, No. 6419, Supplementary Materials - Architecture
↑ Re: Alphazero news by Matthew Lai, CCC, December 08, 2018
↑ The principle of residual nets is to add the input of the layer to the output of each layer. With this simple modification training is faster and enables deeper networks, see Tristan Cazenave (2017). Residual Networks for Computer Go. IEEE Transactions on Computational Intelligence and AI in Games, Vol. PP, No. 99, pdf
↑ Residual Networks for Computer Go by Brahim Hamadicharef, CCC, December 07, 2017
↑ Re: AlphaZero is not like other chess programs by Rein Halbersma, CCC, December 09, 2017
↑ First In-Depth Look at Google’s TPU Architecture by Nicole Hemsoth, The Next Platform, April 05, 2017
↑ Photo of Google Cloud TPU cluster by Norman Schmidt, CCC, December 09, 2017
↑ First In-Depth Look at Google’s New Second-Generation TPU by Nicole Hemsoth, The Next Platform, May 17, 2017
↑ Under The Hood Of Google’s TPU2 Machine Learning Clusters by Paul Teich, The Next Platform, May 22, 2017
↑ David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815
↑ AlphaZero: Shedding new light on the grand games of chess, shogi and Go by David Silver, Thomas Hubert, Julian Schrittwieser and Demis Hassabis, DeepMind, December 03, 2018
↑ AlphaZero explained by one creator by Mario Carbonell Martinez, CCC, December 19, 2017
↑ A Simple Alpha(Go) Zero Tutorial by Oliver Roese, CCC, December 30, 2017
↑ BBC News; 'Google's ... DeepMind AI claims chess crown' by pennine22, Hiarcs Forum, December 07, 2017
↑ Reactions about AlphaZero from top GMs... by Norman Schmidt, CCC, December 08, 2017
↑ recent article on alphazero ... 12/11/2017 ... by Dan Ellwein, CCC, December 14, 2017
↑ Cerebellum analysis of the AlphaZero - Stockfish Games by Thomas Zipproth, CCC, December 11, 2017
↑ AlphaZero reinvents mobility and romanticism by Chris Whittington, Rybka Forum, December 08, 2017
↑ Immortal Zugzwang Game from Wikipedia
↑ Article:"How Alpha Zero Sees/Wins" by AA Ross, CCC, January 17, 2018
↑ Connect 4 AlphaZero implemented using Python... by Steve Maughan, CCC, January 29, 2018

Up one Level

[1] Krampus, figure used in threatening children, Image from the 1900s, source: Historie čertů Krampus, Category:Krampus, Wikimedia Commons

[2] "5th of December - The Krampus has come" was suggested by Michael Scheidl in AlphaZero by Peter Martan, CSS Forum, December 06, 2017, with further comments by Ingo Althöfer

[3] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815

[4] Alpha Zero by BB+, OpenChess Forum, December 06, 2017

[5] AlphaGo Zero: Learning from scratch by Demis Hassabis and David Silver, DeepMind, October 18, 2017

[6] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, Vol. 362, No. 6419, Supplementary Materials - Architecture

[7] Re: Alphazero news by Matthew Lai, CCC, December 08, 2018

[8] The principle of residual nets is to add the input of the layer to the output of each layer. With this simple modification training is faster and enables deeper networks, see Tristan Cazenave (2017). Residual Networks for Computer Go. IEEE Transactions on Computational Intelligence and AI in Games, Vol. PP, No. 99, pdf

[9] Residual Networks for Computer Go by Brahim Hamadicharef, CCC, December 07, 2017

[10] Re: AlphaZero is not like other chess programs by Rein Halbersma, CCC, December 09, 2017

[11] First In-Depth Look at Google’s TPU Architecture by Nicole Hemsoth, The Next Platform, April 05, 2017

[12] Photo of Google Cloud TPU cluster by Norman Schmidt, CCC, December 09, 2017

[13] First In-Depth Look at Google’s New Second-Generation TPU by Nicole Hemsoth, The Next Platform, May 17, 2017

[14] Under The Hood Of Google’s TPU2 Machine Learning Clusters by Paul Teich, The Next Platform, May 22, 2017

[15] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815

[16] AlphaZero: Shedding new light on the grand games of chess, shogi and Go by David Silver, Thomas Hubert, Julian Schrittwieser and Demis Hassabis, DeepMind, December 03, 2018

[17] AlphaZero explained by one creator by Mario Carbonell Martinez, CCC, December 19, 2017

[18] A Simple Alpha(Go) Zero Tutorial by Oliver Roese, CCC, December 30, 2017

[19] BBC News; 'Google's ... DeepMind AI claims chess crown' by pennine22, Hiarcs Forum, December 07, 2017

[20] Reactions about AlphaZero from top GMs... by Norman Schmidt, CCC, December 08, 2017

[21] recent article on alphazero ... 12/11/2017 ... by Dan Ellwein, CCC, December 14, 2017

[22] Cerebellum analysis of the AlphaZero - Stockfish Games by Thomas Zipproth, CCC, December 11, 2017

[23] AlphaZero reinvents mobility and romanticism by Chris Whittington, Rybka Forum, December 08, 2017

[24] Immortal Zugzwang Game from Wikipedia

[25] Article:"How Alpha Zero Sees/Wins" by AA Ross, CCC, January 17, 2018

[26] Connect 4 AlphaZero implemented using Python... by Steve Maughan, CCC, January 29, 2018

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

@@ Line 10: / Line 10: @@
 =Description=
-Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved a superhuman level of play in the games of chess and [[Shogi]] as well as in [[Go]]. The algorithm is a more generic version of the [[AlphaGo#Zero|AlphaGo Zero]] algorithm that was first introduced in the domain of Go <ref>[https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo Zero: Learning from scratch] by [[Demis Hassabis]] and [[David Silver]], [[DeepMind]], October 18, 2017</ref> . AlphaZero [[Evaluation|evaluates]] [[Chess Position|positions]] using non-linear function approximation based on a [[Neural Networks|deep neural network]], rather than the [[Evaluation#Linear|linear function approximation]] as used in classical chess programs. This neural network takes the board position as input and outputs a vector of move probabilities. The MCTS consists of a series of simulated games of self-play whose move selection is controlled by the neural network. The search returns a vector representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state.
+Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved a superhuman level of play in the games of chess and [[Shogi]] as well as in [[Go]]. The algorithm is a more generic version of the [[AlphaGo#Zero|AlphaGo Zero]] algorithm that was first introduced in the domain of Go <ref>[https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo Zero: Learning from scratch] by [[Demis Hassabis]] and [[David Silver]], [[DeepMind]], October 18, 2017</ref>. AlphaZero [[Evaluation|evaluates]] [[Chess Position|positions]] using non-linear function approximation based on a [[Neural Networks|deep neural network]], rather than the [[Evaluation#Linear|linear function approximation]] as used in classical chess programs.
+This neural network takes the board position as input and outputs a vector of move probabilities (policy) and a position evaluation. Once trained, these network is combined with a [[Monte-Carlo Tree Search]] (MCTS) using the policy to narrow down the search to high probability moves, and using the value in conjunction with a fast rollout policy to evaluate positions in the tree. The selection is done by a variation of [[Christopher D. Rosin|Rosin's]] [[UCT]] improvement dubbed [[Christopher D. Rosin#PUCT|PUCT]].
 ==Network Architecture==
-The network is a [[Neural Networks#Deep|deep]] [[Neural Networks#Residual|residual]] [[Neural Networks#Convolutional|convolutional neural network]] <ref>The principle of residual nets is to add the input of the layer to the output of each layer. With this simple modification training is faster and enables deeper networks, see  [[Tristan Cazenave]] ('''2017'''). ''[http://ieeexplore.ieee.org/document/7875402/ Residual Networks for Computer Go]''.  [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. PP, No. 99, [http://www.lamsade.dauphine.fr/~cazenave/papers/resnet.pdf pdf]</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=65923 Residual Networks for Computer Go] by Brahim Hamadicharef, [[CCC]], December 07, 2017</ref> with many layers of spatial NxN planes - [[8x8 Board|8x8 board]] arrays for chess. The input describes the [[Chess Position|chess position]] from [[Side to move|side's to move]] point of view - that is [[Color Flipping|color flipped]] for black to move. Each square cell consists of 12 [[Pieces#PieceTypeCoding|piece-type]] and [[Color|color]] bits, e.g. from the current [[Bitboard Board-Definition|bitboard board definition]], and to address [[Graph History Interaction|graph history]] and [[Path-Dependency|path-dependency]] - times eight, that is up to seven predecessor positions as well - so that [[En passant|en passant]], immediate [[Repetitions|repetitions]], or some sense of progress is implicit. Additional inputs, redundant inside each square cell  to be conform to the convolution net, consider [[Castling Rights|castling rights]], [[Halfmove Clock|halfmove clock]], total move count and side to move.
+The [[Neural Networks#Deep|deep neural network]] consists of a “body” with input and hidden layers of spatial NxN planes, [[8x8 Board|8x8 board]] arrays for chess, followed by both policy and value “heads” <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419, [http://science.sciencemag.org/content/suppl/2018/12/05/362.6419.1140.DC1 Supplementary Materials] - Architecture</ref> <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=93 Re: Alphazero news] by [[Matthew Lai]], [[CCC]], December 08, 2018</ref>.
+Each square cell of the input plane contains 6x2 [[Pieces#PieceTypeCoding|piece-type]] and [[Color|color]] bits of the current [[Chess Position|chess position]] from the current player's point of view, plus two bits of a [[Repetitions|repetition counter]] concerning the [[Draw|draw]] rule,
+and to further address [[Graph History Interaction|graph history]] and [[Path-Dependency|path-dependency]] issues - these 14 bits times eight, that is up to seven predecessor positions as well - so that [[En passant|en passant]], or some sense of progress is implicit.
+Additional 7 input bits consider [[Castling Rights|castling rights]], total move count and [[Side to move|side to move]], yielding in 119 bits per square cell for chess.
-The deep hidden layers connect the pieces on different squares to each other due to consecutive 3x3 convolutions, where a cell of a layer is connected to the correspondent 3x3 [https://en.wikipedia.org/wiki/Receptive_field receptive field] of the previous layer, so that after 4 layers, each square is connected to every other cell in the original input layer <ref>[http://www.talkchess.com/forum/viewtopic.php?t=65937&start=10 Re: AlphaZero is not like other chess programs] by [[Rein Halbersma]], [[CCC]], December 09, 2017</ref>. The output of the neural network is finally represented as an 8x8 board array as well, for every [[Origin Square|origin square]] up to 73 [[Target Square|target square]] possibilities ([[Direction#MoveDirections|NRayDirs]] x [[Rays|MaxRayLength]] + [[Direction#KnightDirections|NKnightDirs]] + NPawnDirs * [[Promotions|NMinorPromotions]]), encoding a probability distribution over 64x73 = 4,672 possible moves, where illegal moves were masked out by setting their probabilities to zero, re-normalising the probabilities for remaining moves.
+The body consists of a [https://en.wikipedia.org/wiki/Rectifier_(neural_networks) rectified] [https://en.wikipedia.org/wiki/Batch_normalization batch-normalized] [[Neural Networks#Convolutional|convolutional layer]] followed by 19 [[Neural Networks#Residual|residual]] blocks. Each such block consists of two rectified batch-normalized residual convolutional layers with a skip connection <ref>The principle of residual nets is to add the input of the layer to the output of each layer. With this simple modification training is faster and enables deeper networks,  see [[Tristan Cazenave]] ('''2017'''). ''[http://ieeexplore.ieee.org/document/7875402/ Residual Networks for Computer Go]''.  [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. PP, No. 99, [http://www.lamsade.dauphine.fr/~cazenave/papers/resnet.pdf pdf]</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=65923 Residual Networks for Computer Go] by Brahim Hamadicharef, [[CCC]], December 07, 2017</ref>. Each convolution applies 256 filters (shared weight vectors) of kernel size 3x3 with [https://en.wikipedia.org/wiki/Stride_of_an_array stride] 1.
+These layers connect the pieces on different squares to each other due to consecutive convolutions, where a cell of a layer is connected to the correspondent 3x3 [https://en.wikipedia.org/wiki/Receptive_field receptive field] of the previous layer,
+so that after 4 convolutions, each square is connected to every other cell in the original input layer <ref>[http://www.talkchess.com/forum/viewtopic.php?t=65937&start=10 Re: AlphaZero is not like other chess programs] by [[Rein Halbersma]], [[CCC]], December 09, 2017</ref>.
+The policy head applies an additional rectified, batch-normalized convolutional layer, followed by a final convolution of 73 filters for chess,
+with the final policy output represented as an 8x8 board array as well, for every [[Origin Square|origin square]] up to 73 [[Target Square|target square]] possibilities ([[Direction#MoveDirections|NRayDirs]] x [[Rays|MaxRayLength]] + [[Direction#KnightDirections|NKnightDirs]] + NPawnDirs * [[Promotions|NMinorPromotions]]), encoding a probability distribution over 64x73 = 4,672 possible moves, where illegal moves were masked out by setting their probabilities to zero, re-normalising the probabilities for remaining moves. The value head applies an additional rectified, batch-normalized convolution of 1 filter of kernel size 1x1 with stride 1, followed by a rectified linear layer of size 256 and a [https://en.wikipedia.org/wiki/Hyperbolic_function tanh]-linear layer of size 1.
 ==Training==
@@ Line 85: / Line 94: @@
 : [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=82 Re: Alphazero news] by [[Matthew Lai]], [[CCC]], December 07, 2018
 : [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=86 Re: Alphazero news] by [[Matthew Lai]], [[CCC]], December 07, 2018 » [[Giraffe]]
+: [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=93 Re: Alphazero news] by [[Matthew Lai]], [[CCC]], December 08, 2018
 =External Links=

Difference between revisions of "AlphaZero"

Revision as of 13:03, 9 December 2018

Contents

Stockfish Match

Description

Network Architecture

Training

See also

Publications

Forum Posts

2017

2018

External Links

Reports

2017

2018

Stockfish Match

Round 1

Round 2, 3

Misc

References

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools