Leela Chess Zero
Concerning [[Nodes per Second|nodes per second]] of the MCTS, smaller models are faster to evaluate than larger ones. They are also faster to train and show progress earlier, but they saturate earlier as well, so that at some point further training no longer improves the engine. Larger and deeper network models improve the capacity, that is, the amount of knowledge and patterns to extract from the training samples, with the potential for a [[Playing Strength|stronger]] engine.
As a further improvement, Leela Chess Zero applies the ''Squeeze and Excite'' (SE) extension to the residual block architecture <ref>[https://github.com/LeelaChessZero/lc0/wiki/Technical-Explanation-of-Leela-Chess-Zero Technical Explanation of Leela Chess Zero · LeelaChessZero/lc0 Wiki · GitHub]</ref> <ref>[https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7 Squeeze-and-Excitation Networks – Towards Data Science] by [http://plpp.de/ Paul-Louis Pröve], October 17, 2017</ref>. The body is fully connected to both the policy "head" for the move probability distribution, and the value "head" for the evaluation score aka [[Pawn Advantage, Win Percentage, and Elo|winning probability]] of the current position; the input planes encode the current position and up to seven predecessor positions.
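The following is a minimal sketch of such an SE residual block in TensorFlow; the filter count and SE reduction ratio are illustrative placeholders, not Lc0's actual configuration:

<pre>
# Minimal sketch of an SE residual block; filter count (256) and
# reduction ratio (8) are illustrative, not Lc0's actual configuration.
import tensorflow as tf

def se_residual_block(x, filters=256, se_ratio=8):
    # Assumes the input x already has `filters` channels.
    skip = x
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    # Squeeze: global average pooling reduces each 8x8 plane to one value.
    s = tf.keras.layers.GlobalAveragePooling2D()(y)
    # Excite: a small bottleneck MLP yields per-channel scaling weights.
    s = tf.keras.layers.Dense(filters // se_ratio, activation="relu")(s)
    s = tf.keras.layers.Dense(filters, activation="sigmoid")(s)
    s = tf.keras.layers.Reshape((1, 1, filters))(s)
    y = y * s  # re-weight the channels
    return tf.keras.layers.ReLU()(y + skip)
</pre>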
==Training==
Like in [[AlphaZero]], the '''Zero''' suffix implies no initial knowledge other than the rules of the game. To build a superhuman player, it starts with truly random self-play games and applies [[Reinforcement Learning|reinforcement learning]] based on the outcome of those games. However, there are derived approaches, such as [[Albert Silver|Albert Silver's]] [[Deus X]], which try to take a short-cut by initially using [[Supervised Learning|supervised learning]] techniques, such as feeding in high quality games played by other strong chess-playing entities, or huge records of positions with a given preferred move.
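As a rough illustration of how self-play games become training data in this AlphaZero-style scheme (the function and data layout below are hypothetical), each position is labeled with the final game outcome from the side to move's perspective and with a policy target derived from the root visit counts of the search:

<pre>
# Hypothetical sketch of deriving training targets from one self-play game,
# following the AlphaZero scheme; names and data layout are illustrative.
def game_to_training_samples(positions, visit_counts, outcome):
    """positions: encoded positions of a game from the initial position,
    so white is to move at even plies.
    visit_counts: for each position, a dict of move -> MCTS visit count.
    outcome: +1 white win, 0 draw, -1 black win."""
    samples = []
    for ply, (pos, visits) in enumerate(zip(positions, visit_counts)):
        total = sum(visits.values())
        # Policy target: normalized visit counts of the root's children.
        policy = {move: n / total for move, n in visits.items()}
        # Value target: game result from the side to move's perspective.
        z = outcome if ply % 2 == 0 else -outcome
        samples.append((pos, policy, z))
    return samples
</pre>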
The unsupervised training of the NN aims to minimize the sum of the [https://en.wikipedia.org/wiki/Mean_squared_error mean squared error] of the value output, the cross-entropy policy loss, and an [https://en.wikipedia.org/wiki/Norm_(mathematics)#Euclidean_norm L2-norm] regularization term over the network weights. Further, there are experiments to train the value head not against the game outcome, but against the accumulated value for a position after exploring some number of nodes with [[UCT]] <ref>[https://medium.com/oracledevs/lessons-from-alphazero-part-4-improving-the-training-target-6efba2e71628 Lessons From AlphaZero (part 4): Improving the Training Target] by [https://blogs.oracle.com/author/vish-abrams Vish Abrams], [https://blogs.oracle.com/ Oracle Blog], June 27, 2018</ref>.
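In AlphaZero notation this is the loss l = (z - v)^2 - pi^T log p + c * ||theta||^2, where z is the game outcome, v the value output, pi the search policy, p the policy output, and theta the network weights. A minimal TensorFlow sketch, with the regularization constant c chosen purely for illustration:

<pre>
# Minimal sketch of the AlphaZero-style combined loss; the weighting
# constant c is illustrative.
import tensorflow as tf

def combined_loss(z, v, pi, p_logits, weights, c=1e-4):
    # (z - v)^2 : mean squared error of the value head vs. game outcome.
    value_loss = tf.reduce_mean(tf.square(z - v))
    # -pi^T log p : cross-entropy between search policy and policy head.
    policy_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=pi, logits=p_logits))
    # c * ||theta||^2 : L2 regularization over the network weights.
    l2 = c * tf.add_n([tf.nn.l2_loss(w) for w in weights])
    return value_loss + policy_loss + l2
</pre>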
The distributed training is realized with a sophisticated [https://en.wikipedia.org/wiki/Client%E2%80%93server_model client-server model]. The client, written entirely in the [[Go (Programming Language)|Go programming language]], incorporates Lc0 to produce self-play games. Controlled by the server, the client downloads the latest network, plays self-play games, and uploads them to the server, which in turn regularly produces and distributes new neural network weights once a certain number of games from contributors is available. The training software consists of [[Python]] code; the pipeline requires [https://en.wikipedia.org/wiki/NumPy NumPy] and [https://en.wikipedia.org/wiki/TensorFlow TensorFlow] running on [[Linux]] <ref>[https://github.com/LeelaChessZero/lczero-training GitHub - LeelaChessZero/lczero-training: For code etc relating to the network training process.]</ref>.
The server is written in [[Go (Programming Language)|Go]] along with [[Python]] and [https://en.wikipedia.org/wiki/Shell_script shell scripts].
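A schematic of the client's work loop may help to picture the protocol; note that the real client is written in Go, and the server address, endpoint paths, and lc0 flags below are hypothetical illustrations in Python:

<pre>
# Schematic of the client's work loop; the real client is written in Go,
# and the server address, endpoint paths, and lc0 flags below are
# hypothetical illustrations.
import subprocess
import urllib.request

SERVER = "https://training.example.org"  # hypothetical server address

def fetch_latest_network(path="latest.pb.gz"):
    # Download the newest network weights published by the server.
    with urllib.request.urlopen(f"{SERVER}/networks/latest") as resp:
        with open(path, "wb") as f:
            f.write(resp.read())
    return path

def play_one_game(network_path):
    # Invoke lc0 in self-play mode; the exact flags are illustrative.
    result = subprocess.run(["lc0", "selfplay", f"--weights={network_path}"],
                            capture_output=True)
    return result.stdout

def upload_game(game_data):
    # Send the finished training game back to the server.
    req = urllib.request.Request(f"{SERVER}/games", data=game_data,
                                 method="POST")
    urllib.request.urlopen(req)

def client_loop():
    while True:
        net = fetch_latest_network()
        upload_game(play_one_game(net))
</pre>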
