Changes

AlphaZero

698 bytes added, 11:01, 20 February 2021

no edit summary


{|
|- style="vertical-align:top;"
| '''AlphaZero''',<br/>
a chess and [[Go]] playing entity by [[Google]] [[DeepMind]] based on a general [[Reinforcement Learning|reinforcement learning]] algorithm with the same name. On [https://en.wikipedia.org/wiki/December_5#Holidays_and_observances December 5], [https://en.wikipedia.org/wiki/Portal:Current_events/2017_December_5 2017] <ref>"5th of December - The [https://en.wikipedia.org/wiki/Krampus Krampus] has come", suggested by [[Michael Scheidl]] in [http://forum.computerschach.de/cgi-bin/mwf/topic_show.pl?tid=9635 AlphaZero] by Peter Martan, [[Computer Chess Forums|CSS Forum]], December 06, 2017, with further comments by [[Ingo Althöfer]]</ref>, the DeepMind team around [[David Silver]], [[Thomas Hubert]], and [[Julian Schrittwieser]], along with former [[Giraffe]] author [[Matthew Lai]], reported on their generalized algorithm, which combines [[Deep Learning|deep learning]] with [[Monte-Carlo Tree Search]] (MCTS) <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815]</ref>. The final [https://en.wikipedia.org/wiki/Peer_review peer-reviewed] paper with various clarifications was published almost one year later in [https://en.wikipedia.org/wiki/Science_(journal) Science magazine] under the title ''A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play'' <ref>[[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419</ref>.

=Description=

Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved a superhuman level of play in the games of chess and [[Shogi]] as well as in [[Go]]. The algorithm is a more generic version of the [[AlphaGo#Zero|AlphaGo Zero]] algorithm that was first introduced in the domain of Go <ref>[https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo Zero: Learning from scratch] by [[Demis Hassabis]] and [[David Silver]], [[DeepMind]], October 18, 2017</ref>. AlphaZero [[Evaluation|evaluates]] [[Chess Position|positions]] using non-linear function approximation based on a [[Neural Networks|deep neural network]], rather than the [[Evaluation#Linear|linear function approximation]] as used in classical chess programs.

This neural network takes the board position as input and outputs a vector of move probabilities (policy) and a position evaluation (value). Once trained, the network is combined with a [[Monte-Carlo Tree Search]] (MCTS), which uses the policy to narrow the search down to high-probability moves and uses the value to evaluate positions in the tree, in place of the random rollouts of a conventional MCTS. The selection is done by a variation of [[Christopher D. Rosin|Rosin's]] [[UCT]] improvement dubbed [[Christopher D. Rosin#PUCT|PUCT]].

| style="width: 30%" | „Zero ist die Stille. Zero ist der Anfang. Zero ist rund. Zero dreht sich. Zero ist der Mond. Die Sonne ist Zero. Zero ist weiss. Die Wüste Zero. Der Himmel über Zero. Die Nacht –, Zero fließt. Das Auge Zero. Nabel. Mund. Kuß. Die Milch ist rund. Die Blume Zero der Vogel. Schweigend. Schwebend. Ich esse Zero, ich trinke Zero, ich schlafe Zero, ich wache Zero, ich liebe Zero. Zero ist schön, dynamo, dynamo, dynamo. Die Bäume im Frühling, der Schnee, Feuer, Wasser, Meer. Rot orange gelb grün indigo blau violett Zero Zero Regenbogen. 4 3 2 1 Zero. Gold und Silber, Schall und Rauch. Wanderzirkus Zero. Zero ist die Stille. Zero ist der Anfang. Zero ist rund. Zero ist Zero.“ <ref>[https://de.wikipedia.org/wiki/ZERO#Manifest Zero Manifesto] by [https://en.wikipedia.org/wiki/G%C3%BCnther_Uecker Günther Uecker], [https://en.wikipedia.org/wiki/Heinz_Mack Heinz Mack] and [https://en.wikipedia.org/wiki/Otto_Piene Otto Piene] of the [https://en.wikipedia.org/wiki/Zero_(art) ZERO] [https://en.wikipedia.org/wiki/Art_movement Art group] 1963, Translation by [https://en.wikipedia.org/wiki/Google_Translate Google Translate] "Zero is silence. Zero is the beginning. Zero is round. Zero turns. Zero is the moon. The sun is zero. Zero is white. The desert zero. The sky over zero. The night -, Zero flows. The eye zero. Navel. Mouth. Kiss. The milk is round. The flower zero the bird. Silently. Pending. I eat Zero, I drink Zero, I sleep Zero, I watch Zero, I love Zero. Zero is beautiful, dynamo, dynamo, dynamo. The trees in spring, the snow, fire, water, sea. Red orange yellow green indigo blue violet zero zero rainbow. 4 3 2 1 Zero. Gold and silver, noise and smoke. Zero circus. Zero is silence. Zero is the beginning. Zero is round. Zero is zero. " Zero the new Idealism</ref>

|}
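The PUCT selection step described above can be illustrated with a minimal Python sketch. It assumes the commonly cited form of the rule, Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)), where P is the policy prior and Q the mean backed-up value. The names <code>Edge</code>, <code>puct_score</code>, <code>select_move</code> and <code>backup</code>, the exploration constant 1.5, and the floor of 1 on the parent visit count are illustrative assumptions, not DeepMind's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one move (edge) out of a search-tree node."""
    prior: float            # P(s, a): move probability from the policy output
    visits: int = 0         # N(s, a): how often the move was selected
    value_sum: float = 0.0  # W(s, a): sum of values backed up through the move

    @property
    def q(self) -> float:
        # Mean value Q(s, a); unvisited moves default to 0 (an assumption).
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5) -> float:
    # Exploration bonus grows with the prior and shrinks as the move is visited.
    # The floor of 1 keeps priors effective at an unvisited node (an assumption).
    bonus = c_puct * edge.prior * math.sqrt(max(1, parent_visits)) / (1 + edge.visits)
    return edge.q + bonus


def select_move(edges: dict, c_puct: float = 1.5) -> str:
    # Pick the move with the highest Q + U score.
    parent_visits = sum(e.visits for e in edges.values())
    return max(edges, key=lambda m: puct_score(edges[m], parent_visits, c_puct))


def backup(edge: Edge, value: float) -> None:
    # Record one simulation result, given from the mover's point of view.
    edge.visits += 1
    edge.value_sum += value


# Hypothetical priors for three opening moves.
edges = {"e2e4": Edge(prior=0.6), "d2d4": Edge(prior=0.3), "a2a3": Edge(prior=0.1)}
first = select_move(edges)   # the highest prior wins while nothing is visited
backup(edges[first], -1.0)   # pretend the value output disliked the result
second = select_move(edges)  # the search now turns to another move
```

In this sketch the policy prior dominates early selections, while a poor backed-up value quickly shifts visits to other moves, which is how the policy narrows the search and the value steers it.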

==Network Architecture==

GerdIsenberg
