AlphaZero
=Description=
Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved a superhuman level of play in the games of chess and [[Shogi]] as well as in [[Go]]. The algorithm is a more generic version of the [[AlphaGo#Zero|AlphaGo Zero]] algorithm that was first introduced in the domain of Go <ref>[https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo Zero: Learning from scratch] by [[Demis Hassabis]] and [[David Silver]], [[DeepMind]], October 18, 2017</ref>. AlphaZero [[Evaluation|evaluates]] [[Chess Position|positions]] using non-linear function approximation based on a [[Neural Networks|deep neural network]], rather than the [[Evaluation#Linear|linear function approximation]] used in classical chess programs. This neural network takes the board position as input and outputs a vector of move probabilities (policy) and a scalar position evaluation (value). Once trained, the network is combined with a [[Monte-Carlo Tree Search]] (MCTS) that performs a series of simulated games of self-play: the policy narrows the search down to high-probability moves, while the value evaluates positions in the tree. The search returns a vector representing a probability distribution over moves, from which the played move is chosen either proportionally or greedily with respect to the visit counts at the root state. The selection within the tree is done by a variation of [[Christopher D. Rosin|Rosin's]] [[UCT]] improvement dubbed [[Christopher D. Rosin#PUCT|PUCT]].
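The in-tree move selection can be illustrated with a minimal Python sketch of the PUCT rule, assuming a hypothetical node structure with per-child visit count ''n'', accumulated value ''w'' and prior ''p'' taken from the policy head; the constant ''c_puct'' and all names are illustrative, as the 2018 paper actually lets the exploration rate grow slowly with the parent's visit count:
<pre>
import math

def puct_select(node, c_puct=1.5):
    """Pick the child maximizing Q(s,a) + U(s,a), the PUCT selection rule.

    node.children maps moves to child nodes with fields:
      n - visit count, w - accumulated value, p - prior from the policy head.
    c_puct is a hypothetical constant; AlphaZero grows it slowly with search.
    """
    sqrt_parent = math.sqrt(sum(c.n for c in node.children.values()) + 1)

    def score(child):
        q = child.w / child.n if child.n else 0.0            # exploitation: mean value
        u = c_puct * child.p * sqrt_parent / (1 + child.n)   # exploration bonus
        return q + u

    # return the (move, child) pair with the highest PUCT score
    return max(node.children.items(), key=lambda mc: score(mc[1]))
</pre>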
==Network Architecture==
The network consists of a “body” with input and hidden layers of spatial NxN planes, [[8x8 Board|8x8 board]] arrays for chess, followed by both policy and value “heads” <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419, [http://science.sciencemag.org/content/suppl/2018/12/05/362.6419.1140.DC1 Supplementary Materials] - Architecture</ref> <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=93 Re: Alphazero news] by [[Matthew Lai]], [[CCC]], December 08, 2018</ref>. Each square cell of the input plane contains 6x2 [[Pieces#PieceTypeCoding|piece-type]] and [[Color|color]] bits of the current [[Chess Position|chess position]] from the current player's point of view - that is [[Color Flipping|color flipped]] for black to move - plus two bits of a [[Repetitions|repetition counter]] concerning the current [[Draw|draw]] rule. To further address [[Graph History Interaction|graph history]] and [[Path-Dependency|path-dependency]] issues, these 14 bits appear eight times, covering up to seven predecessor positions as well, so that [[En passant|en passant]], immediate [[Repetitions|repetitions]], or some sense of progress is implicit. Additional inputs, kept redundant inside each square cell to conform to the convolutional net, consist of 7 bits for [[Castling Rights|castling rights]], [[Halfmove Clock|halfmove clock]], total move count and [[Side to move|side to move]], yielding 119 bits per square cell for chess.
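As a concrete illustration of this encoding, here is a small NumPy sketch that assembles the 119x8x8 input tensor (8 history steps of 14 planes plus 7 broadcast scalar planes); the plane ordering and the helper names are assumptions for illustration - the authoritative layout is given in the paper's supplementary materials:
<pre>
import numpy as np

HISTORY = 8          # current position plus up to seven predecessors
PIECE_PLANES = 12    # 6 piece types x 2 colors
REP_PLANES = 2       # repetition counter bits

def encode_position(history, stm_features):
    """Hypothetical encoder for the 119-plane AlphaZero chess input.

    history      - list of up to 8 (piece_planes, repetition_planes) snapshots,
                   newest first, already oriented from the side to move's view.
    stm_features - dict with color, total move count, castling rights (4 bits)
                   and halfmove clock.  Ordering here is illustrative only.
    """
    planes = np.zeros((119, 8, 8), dtype=np.float32)
    for t, (pieces, reps) in enumerate(history[:HISTORY]):
        base = t * (PIECE_PLANES + REP_PLANES)                # 14 planes per step
        planes[base:base + PIECE_PLANES] = pieces             # piece-type/color bits
        planes[base + PIECE_PLANES:base + 14] = reps          # repetition bits
    # 7 scalar features, each redundantly broadcast over all 64 squares
    planes[112] = stm_features["color"]
    planes[113] = stm_features["total_move_count"]
    planes[114:118] = np.array(stm_features["castling"]).reshape(4, 1, 1)
    planes[118] = stm_features["halfmove_clock"]
    return planes
</pre>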
The deep hidden body consists of a [https://en.wikipedia.org/wiki/Rectifier_(neural_networks) rectified] [https://en.wikipedia.org/wiki/Batch_normalization batch-normalized] [[Neural Networks#Convolutional|convolutional layer]] followed by 19 [[Neural Networks#Residual|residual]] blocks. Each such block consists of two rectified batch-normalized convolutional layers with a skip connection <ref>The principle of residual nets is to add the input of the layer to the output of each layer. With this simple modification training is faster and enables deeper networks, see [[Tristan Cazenave]] ('''2017'''). ''[http://ieeexplore.ieee.org/document/7875402/ Residual Networks for Computer Go]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. PP, No. 99, [http://www.lamsade.dauphine.fr/~cazenave/papers/resnet.pdf pdf]</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=65923 Residual Networks for Computer Go] by Brahim Hamadicharef, [[CCC]], December 07, 2017</ref>. Each convolution applies 256 filters (shared weight vectors) of kernel size 3x3 with [https://en.wikipedia.org/wiki/Stride_of_an_array stride] 1. These layers connect the pieces on different squares to each other through consecutive 3x3 convolutions, where a cell of a layer is connected to the corresponding 3x3 [https://en.wikipedia.org/wiki/Receptive_field receptive field] of the previous layer, so that after 4 convolutions each square is connected to every other cell in the original input layer <ref>[http://www.talkchess.com/forum/viewtopic.php?t=65937&start=10 Re: AlphaZero is not like other chess programs] by [[Rein Halbersma]], [[CCC]], December 09, 2017</ref>. The policy head applies an additional rectified, batch-normalized convolutional layer, followed by a final convolution of 73 filters for chess, with the final policy output represented as an 8x8 board array as well: for every [[Origin Square|origin square]] there are up to 73 [[Target Square|target square]] possibilities ([[Direction#MoveDirections|NRayDirs]] x [[Rays|MaxRayLength]] + [[Direction#KnightDirections|NKnightDirs]] + NPawnDirs * [[Promotions|NMinorPromotions]]), encoding a probability distribution over 64x73 = 4,672 possible moves. Illegal moves are masked out by setting their probabilities to zero, and the probabilities of the remaining moves are re-normalised. The value head applies an additional rectified, batch-normalized convolution of 1 filter of kernel size 1x1 with stride 1, followed by a rectified linear layer of size 256 and a [https://en.wikipedia.org/wiki/Hyperbolic_function tanh]-linear layer of size 1.
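The whole tower can be summarized in a PyTorch-style sketch - a minimal, illustrative model of the body and the two heads, where the class names, the 3x3 kernel of the final policy convolution, and the omission of the legal-move masking (applied outside the network) are assumptions:
<pre>
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two rectified batch-normalized 3x3 convolutions with a skip connection."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(x + y)                  # add the block input to its output

class AlphaZeroNet(nn.Module):
    """Body of 19 residual blocks plus policy (8x8x73) and value heads."""
    def __init__(self, in_planes=119, channels=256, blocks=19):
        super().__init__()
        self.stem = nn.Sequential(                # rectified batch-normalized conv
            nn.Conv2d(in_planes, channels, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
        self.policy = nn.Sequential(              # -> 73 move-type planes per square
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, 73, 3, padding=1))  # 64 x 73 = 4,672 move logits
        self.value = nn.Sequential(               # -> scalar evaluation in [-1, 1]
            nn.Conv2d(channels, 1, 1, stride=1, bias=False),
            nn.BatchNorm2d(1), nn.ReLU(), nn.Flatten(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh())

    def forward(self, x):                         # x: (batch, 119, 8, 8)
        h = self.body(self.stem(x))
        return self.policy(h).flatten(1), self.value(h)
</pre>
A softmax over the 4,672 policy logits, restricted to the legal moves as described above, then yields the move probability distribution used by the search.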
==Training==
: [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=82 Re: Alphazero news] by [[Matthew Lai]], [[CCC]], December 07, 2018
: [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=86 Re: Alphazero news] by [[Matthew Lai]], [[CCC]], December 07, 2018 » [[Giraffe]]
: [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=93 Re: Alphazero news] by [[Matthew Lai]], [[CCC]], December 08, 2018
=External Links=
