Difference between revisions of "Reinforcement Learning"

From Chessprogramming wiki
 
(19 intermediate revisions by the same user not shown)
Line 72: Line 72:
 
* [http://www.ilsp.gr/homepages/papavasiliou_eng.html Vassilis Papavassiliou], [[Stuart Russell]] ('''1999'''). ''Convergence of reinforcement learning with general function approximators.'' In Proc. IJCAI-99, Stockholm, [http://www.cs.berkeley.edu/~russell/papers/ijcai99-bridge.ps ps]
 
* [[Marco Wiering]] ('''1999'''). ''Explorations in Efficient Reinforcement Learning''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_Amsterdam University of Amsterdam], advisors [[Mathematician#FGroen|Frans Groen]] and [[Jürgen Schmidhuber]]
 
 +
* [[Richard Sutton]], [[Doina Precup]], [[Mathematician#SSingh|Satinder Singh]] ('''1999'''). ''Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_(journal) Artificial Intelligence], Vol. 112,  [https://people.cs.umass.edu/~barto/courses/cs687/Sutton-Precup-Singh-AIJ99.pdf pdf]
 
==2000 ...==
 
* [[Sebastian Thrun]], [[Michael L. Littman]] ('''2000'''). ''A Review of Reinforcement Learning''. [http://www.informatik.uni-trier.de/~ley/db/journals/aim/aim21.html#ThrunL00 AI Magazine, Vol. 21], No. 1
 
Line 81: Line 82:
 
* [[Robert Levinson]], [[Ryan Weber]] ('''2001'''). ''Chess Neighborhoods, Function Combinations and Reinforcement Learning''. In Computers and Games (eds. [[Tony Marsland]] and I. Frank). [https://en.wikipedia.org/wiki/Lecture_Notes_in_Computer_Science Lecture Notes in Computer Science], Springer. [http://users.soe.ucsc.edu/~levinson/Papers/CNFCRL.pdf pdf]
 
* [[Marco Block-Berlitz]] ('''2003'''). ''Reinforcement Learning in der Schachprogrammierung''. Studienarbeit, Freie Universität Berlin, Dozent: [[Raúl Rojas|Prof. Dr. Raúl Rojas]], [http://page.mi.fu-berlin.de/block/Skripte/Reinforcement.pdf pdf] (German)
 
* [[Henk Mannen]] ('''2003'''). ''Learning to play chess using reinforcement learning with database games''. Master’s thesis, [http://students.uu.nl/en/hum/cognitive-artificial-intelligence Cognitive Artificial Intelligence], [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University]
+
* [[Henk Mannen]] ('''2003'''). ''Learning to play chess using reinforcement learning with database games''. Master’s thesis, Cognitive Artificial Intelligence, [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.810&rep=rep1&type=pdf pdf]
 
* [[Joelle Pineau]], [[Geoffrey Gordon]], [[Sebastian Thrun]] ('''2003'''). ''Point-based value iteration: An anytime algorithm for POMDPs''. [[Conferences#IJCAI2003|IJCAI]], [http://www.fore.robot.cc/papers/Pineau03a.pdf pdf]
 
* [https://dblp.uni-trier.de/pers/hd/k/Kerr:Amy_J= Amy J. Kerr], [[Todd W. Neller]], [https://dblp.uni-trier.de/pers/hd/p/Pilla:Christopher_J=_La Christopher J. La Pilla], [https://dblp.uni-trier.de/pers/hd/s/Schompert:Michael_D= Michael D. Schompert] ('''2003'''). ''[https://www.semanticscholar.org/paper/Java-Resources-for-Teaching-Reinforcement-Learning-Kerr-Neller/3d84018eb8b8668c13d1d4f6efca4442af2915b4 Java Resources for Teaching Reinforcement Learning]''. [https://dblp.uni-trier.de/db/conf/pdpta/pdpta2003-3.html PDPTA 2003]
Line 99: Line 100:
 
* [[David Silver]] ('''2009'''). ''Reinforcement Learning and Simulation-Based Search''. Ph.D. thesis, [[University of Alberta]]. [http://webdocs.cs.ualberta.ca/~silver/David_Silver/Publications_files/thesis.pdf pdf]
 
* [[Marcin Szubert]] ('''2009'''). ''Coevolutionary Reinforcement Learning and its Application to Othello''. M.Sc. thesis, [https://en.wikipedia.org/wiki/Pozna%C5%84_University_of_Technology Poznań University of Technology], supervisor [[Krzysztof Krawiec]], [https://mszubert.github.io/papers/Szubert_2009_MSC.pdf pdf]
 
 +
* [[Joelle Pineau]], [[Geoffrey Gordon]], [[Sebastian Thrun]] ('''2006, 2011'''). ''Anytime Point-Based Approximations for Large POMDPs''. [https://en.wikipedia.org/wiki/Journal_of_Artificial_Intelligence_Research Journal of Artificial Intelligence Research], Vol. 27, [https://arxiv.org/abs/1110.0027 arXiv:1110.0027]
 
==2010 ...==
 
* [[Joel Veness]], [[Kee Siong Ng]], [[Marcus Hutter]], [[David Silver]] ('''2010'''). ''Reinforcement Learning via AIXI Approximation''. Association for the Advancement of Artificial Intelligence (AAAI), [http://jveness.info/publications/veness_rl_via_aixi_approx.pdf pdf]
 
Line 136: Line 138:
 
* [[Max Jaderberg]], [[Volodymyr Mnih]], [[Wojciech Marian Czarnecki]], [[Tom Schaul]], [[Joel Z. Leibo]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Reinforcement Learning with Unsupervised Auxiliary Tasks''. [https://arxiv.org/abs/1611.05397v1 arXiv:1611.05397v1]
 
* [[Jane X Wang]], [[Zeb Kurth-Nelson]], [[Dhruva Tirumala]], [[Hubert Soyer]], [[Joel Z Leibo]], [[Rémi Munos]], [[Charles Blundell]], [[Dharshan Kumaran]], [[Matthew Botvinick]] ('''2016'''). ''Learning to reinforcement learn''. [https://arxiv.org/abs/1611.05763 arXiv:1611.05763]
 
 +
* [[Zacharias Georgiou]], [[Evangelos Karountzos]], [[Yaroslav Shkarupa]], [[Matthia Sabatelli]] ('''2016'''). ''A Reinforcement Learning Approach for Solving KRK Chess Endgames''. [https://github.com/paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames/blob/master/project_papers/final_paper/reinforcement-learning-approach(2).pdf pdf] <ref>[https://github.com/paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames GitHub - paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames: Machine Learning - Reinforcement Learning]</ref>
 
'''2017'''
 
* [[Hirotaka Kameko]], [[Jun Suzuki]], [[Naoki Mizukami]], [[Yoshimasa Tsuruoka]] ('''2017'''). ''Deep Reinforcement Learning with Hidden Layers on Future States''. [[Conferences#IJCA2017|Computer Games Workshop at IJCAI 2017]], [http://www.lamsade.dauphine.fr/~cazenave/cgw2017/Kameko.pdf pdf]
 
Line 142: Line 145:
 
* [http://www.peterhenderson.co/ Peter Henderson], [https://scholar.google.ca/citations?user=2_4Rs44AAAAJ&hl=en Riashat Islam], [[Philip Bachman]], [[Joelle Pineau]], [[Doina Precup]], [https://scholar.google.ca/citations?user=gFwEytkAAAAJ&hl=en David Meger] ('''2017'''). ''Deep Reinforcement Learning that Matters''. [https://arxiv.org/abs/1709.06560 arXiv:1709.06560]
 
* [https://scholar.google.com/citations?user=tiE4g64AAAAJ&hl=en Maithra Raghu], [https://scholar.google.com/citations?user=ZZNxNAYAAAAJ&hl=en Alex Irpan], [[Mathematician#JAndreas|Jacob Andreas]], [[Mathematician#RKleinberg|Robert Kleinberg]], [[Quoc V. Le]], [[Jon Kleinberg]] ('''2017'''). ''Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?'' [https://arxiv.org/abs/1711.02301 arXiv:1711.02301]
 
 
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]
 
* [[Kei Takada]], [[Hiroyuki Iizuka]], [[Masahito Yamamoto]] ('''2017'''). ''Reinforcement Learning for Creating Evaluation Function Using Convolutional Neural Network in Hex''. [[TAAI 2017]] » [[Hex]], [[Neural Networks#Convolutional|CNN]]
 
Line 151: Line 153:
 
'''2018'''
 
* [[Hui Wang]], [[Michael Emmerich]], [[Aske Plaat]] ('''2018'''). ''Monte Carlo Q-learning for General Game Playing''. [https://arxiv.org/abs/1802.05944 arXiv:1802.05944] » [[Monte-Carlo Tree Search|MCTS]], [[General Game Playing]]
 
 +
* [[Vinícius Flores Zambaldi]], [[David Raposo]], [[Adam Santoro]], [[Victor Bapst]], [[Yujia Li]], [[Igor Babuschkin]], [[Karl Tuyls]], [[David P. Reichert]], [[Timothy Lillicrap]], [[Edward Lockhart]], [[Murray Shanahan]], [[Victoria Langston]], [[Razvan Pascanu]], [[Matthew Botvinick]], [[Oriol Vinyals]], [[Peter W. Battaglia]] ('''2018'''). ''Relational Deep Reinforcement Learning''. [https://arxiv.org/abs/1806.01830 arXiv:1806.01830]
 
* [[Hui Wang]], [[Michael Emmerich]], [[Aske Plaat]] ('''2018'''). ''Assessing the Potential of Classical Q-learning in General Game Playing''. [https://arxiv.org/abs/1810.06078 arXiv:1810.06078]
 
 +
* [https://scholar.google.com/citations?user=n12uNYcAAAAJ&hl=en Vincent Francois-Lavet], [https://scholar.google.com/citations?user=dy_JBs0AAAAJ&hl=en Peter Henderson], [https://scholar.google.ca/citations?user=2_4Rs44AAAAJ&hl=en Riashat Islam], [https://scholar.google.com/citations?user=uyYPun0AAAAJ&hl=en Marc G. Bellemare], [[Joelle Pineau]] ('''2018'''). ''An Introduction to Deep Reinforcement Learning''. [https://arxiv.org/abs/1811.12560 arXiv:1811.12560]
 
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419 <ref>[https://deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/ AlphaZero: Shedding new light on the grand games of chess, shogi and Go] by [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]] and [[Demis Hassabis]], [[DeepMind]], December 03, 2018</ref>
 
* [[Tianhe Wang]], [[Tomoyuki Kaneko]] ('''2018'''). ''Application of Deep Reinforcement Learning in Werewolf Game Agents''. [[TAAI 2018]]
 
Line 160: Line 164:
 
* [https://scholar.google.co.in/citations?user=1QlrvHkAAAAJ&hl=en Chandramouli Kamanchi], [https://scholar.google.co.in/citations?user=nx4NlpsAAAAJ&hl=en Raghuram Bharadwaj Diddigi], [[Shalabh Bhatnagar]] ('''2019'''). ''Successive Over Relaxation Q-Learning''. [https://arxiv.org/abs/1903.03812 arXiv:1903.03812]
 
* [https://scholar.google.co.in/citations?user=1QlrvHkAAAAJ&hl=en Chandramouli Kamanchi], [https://scholar.google.co.in/citations?user=nx4NlpsAAAAJ&hl=en Raghuram Bharadwaj Diddigi], [[Shalabh Bhatnagar]] ('''2019'''). ''Second Order Value Iteration in Reinforcement Learning''. [https://arxiv.org/abs/1905.03927 arXiv:1905.03927]
 
* [[Marc Lanctot]], [[Edward Lockhart]], [[Jean-Baptiste Lespiau]], [[Vinicius Zambaldi]], [[Satyaki Upadhyay]], [[Julien Pérolat]], [[Sriram Srinivasan]], [[Finbarr Timbers]], [[Karl Tuyls]], [[Shayegan Omidshafiei]], [[Daniel Hennes]], [[Dustin Morrill]], [[Paul Muller]], [[Timo Ewalds]], [[Ryan Faulkner]], [[János Kramár]], [[Bart De Vylder]], [[Brennan Saeta]], [[James Bradbury]], [[David Ding]], [[Sebastian Borgeaud]], [[Matthew Lai]], [[Julian Schrittwieser]], [[Thomas Anthony]], [[Edward Hughes]], [[Ivo Danihelka]], [[Jonah Ryan-Davis]] ('''2019'''). ''OpenSpiel: A Framework for Reinforcement Learning in Games''. [https://arxiv.org/abs/1908.09453 arXiv:1908.09453] <ref>[https://github.com/deepmind/open_spiel/blob/master/docs/contributing.md open_spiel/contributing.md at master · deepmind/open_spiel · GitHub]</ref>
+
* [[Marc Lanctot]], [[Edward Lockhart]], [[Jean-Baptiste Lespiau]], [[Vinícius Flores Zambaldi]], [[Satyaki Upadhyay]], [[Julien Pérolat]], [[Sriram Srinivasan]], [[Finbarr Timbers]], [[Karl Tuyls]], [[Shayegan Omidshafiei]], [[Daniel Hennes]], [[Dustin Morrill]], [[Paul Muller]], [[Timo Ewalds]], [[Ryan Faulkner]], [[János Kramár]], [[Bart De Vylder]], [[Brennan Saeta]], [[James Bradbury]], [[David Ding]], [[Sebastian Borgeaud]], [[Matthew Lai]], [[Julian Schrittwieser]], [[Thomas Anthony]], [[Edward Hughes]], [[Ivo Danihelka]], [[Jonah Ryan-Davis]] ('''2019'''). ''OpenSpiel: A Framework for Reinforcement Learning in Games''. [https://arxiv.org/abs/1908.09453 arXiv:1908.09453] <ref>[https://github.com/deepmind/open_spiel/blob/master/docs/contributing.md open_spiel/contributing.md at master · deepmind/open_spiel · GitHub]</ref>
 
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2019'''). ''Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model''. [https://arxiv.org/abs/1911.08265 arXiv:1911.08265] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=2&t=72381 New DeepMind paper] by GregNeto, [[CCC]], November 21, 2019</ref>
 
* [[Mathematician#SrbhBose|Sourabh Bose]] ('''2019'''). ''[https://rc.library.uta.edu/uta-ir/handle/10106/28094 Learning Representations Using Reinforcement Learning]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_Texas_at_Arlington University of Texas at Arlington], advisor [[Mathematician#MHuber|Manfred Huber]] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72810&start=6 Re: Board adaptive / tuning evaluation function - no NN/AI] by Tony P., [[CCC]], January 15, 2020</ref>
 +
* [[Johannes Czech]] ('''2019'''). ''Deep Reinforcement Learning for Crazyhouse''. Master thesis, [[Darmstadt University of Technology|TU Darmstadt]], [https://ml-research.github.io/papers/czech2019deep.pdf pdf] » [[CrazyAra]]
 
==2020 ...==
 
* [[Hung Guei]], [[Ting-Han Wei]], [[I-Chen Wu]] ('''2020'''). ''2048-like games for teaching reinforcement learning''. [[ICGA Journal#42_1|ICGA Journal, Vol. 42, No. 1]]
 
* [https://dblp.org/pid/233/8144.html Indu John], [https://scholar.google.co.in/citations?user=1QlrvHkAAAAJ&hl=en Chandramouli Kamanchi], [[Shalabh Bhatnagar]] ('''2020'''). ''Generalized Speedy Q-Learning''. [[IEEE#CSL|IEEE Control Systems Letters]], Vol. 4, No. 3, [https://arxiv.org/abs/1911.00397 arXiv:1911.00397]
 
* [[Takuya Hiraoka]], [https://dblp.org/pers/hd/i/Imagawa:Takahisa Takahisa Imagawa], [https://dblp.org/pers/hd/t/Tangkaratt:Voot Voot Tangkaratt], [https://dblp.org/pers/hd/o/Osa:Takayuki Takayuki Osa], [https://dblp.org/pers/hd/o/Onishi:Takashi Takashi Onishi], [https://dblp.org/pers/hd/t/Tsuruoka:Yoshimasa Yoshimasa Tsuruoka]  ('''2020'''). ''Meta-Model-Based Meta-Policy Optimization''. [https://arxiv.org/abs/2006.02608 arXiv:2006.02608]
 
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhar]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2020'''). ''[https://www.nature.com/articles/s41586-020-03051-4 Mastering Atari, Go, chess and shogi by planning with a learned model]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 588 <ref>[https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules?fbclid=IwAR3mSwrn1YXDKr9uuGm2GlFKh76wBilex7f8QvBiQecwiVmAvD6Bkyjx-rE MuZero: Mastering Go, chess, shogi and Atari without rules]</ref>
+
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2020'''). ''[https://www.nature.com/articles/s41586-020-03051-4 Mastering Atari, Go, chess and shogi by planning with a learned model]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 588 <ref>[https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules?fbclid=IwAR3mSwrn1YXDKr9uuGm2GlFKh76wBilex7f8QvBiQecwiVmAvD6Bkyjx-rE MuZero: Mastering Go, chess, shogi and Atari without rules]</ref> <ref>[https://github.com/koulanurag/muzero-pytorch GitHub - koulanurag/muzero-pytorch: Pytorch Implementation of MuZero]</ref>
 
* [[Tristan Cazenave]], [[Yen-Chi Chen]], [[Guan-Wei Chen]], [[Shi-Yu Chen]], [[Xian-Dong Chiu]], [[Julien Dehos]], [[Maria Elsa]], [[Qucheng Gong]], [[Hengyuan Hu]], [[Vasil Khalidov]], [[Cheng-Ling Li]], [[Hsin-I Lin]], [[Yu-Jin Lin]], [[Xavier Martinet]], [[Vegard Mella]], [[Jeremy Rapin]], [[Baptiste Roziere]], [[Gabriel Synnaeve]], [[Fabien Teytaud]], [[Olivier Teytaud]], [[Shi-Cheng Ye]], [[Yi-Jun Ye]], [[Shi-Jim Yen]], [[Sergey Zagoruyko]] ('''2020'''). ''Polygames: Improved zero learning''. [[ICGA Journal#42_4|ICGA Journal, Vol. 42, No. 4]], [https://arxiv.org/abs/2001.09832 arXiv:2001.09832]
 +
* [[Matthia Sabatelli]], [https://github.com/glouppe Gilles Louppe], [https://scholar.google.com/citations?user=tyFTsmIAAAAJ&hl=en Pierre Geurts], [[Marco Wiering]] ('''2020'''). ''The Deep Quality-Value Family of Deep Reinforcement Learning Algorithms''. [https://dblp.org/db/conf/ijcnn/ijcnn2020.html#SabatelliLGW20 IJCNN 2020] <ref>[https://github.com/paintception/Deep-Quality-Value-DQV-Learning- GitHub - paintception/Deep-Quality-Value-DQV-Learning-: DQV-Learning: a novel faster synchronous Deep Reinforcement Learning algorithm]</ref>
 +
* [[Quentin Cohen-Solal]] ('''2020'''). ''Learning to Play Two-Player Perfect-Information Games without Knowledge''. [https://arxiv.org/abs/2008.01188 arXiv:2008.01188]
 +
* [[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700]
 +
'''2021'''
 +
* [[Maximilian Alexander Gehrke]] ('''2021'''). ''Assessing Popular Chess Variants Using Deep Reinforcement Learning''. Master thesis, [[Darmstadt University of Technology|TU Darmstadt]], [https://ml-research.github.io/papers/gehrke2021assessing.pdf pdf] » [[CrazyAra]]
 +
* [[Dominik Klein]] ('''2021'''). ''[https://github.com/asdfjkl/neural_network_chess Neural Networks For Chess]''. [https://github.com/asdfjkl/neural_network_chess/releases/tag/v1.1 Release Version 1.1 · GitHub] <ref>[https://www.talkchess.com/forum3/viewtopic.php?f=2&t=78283 Book about Neural Networks for Chess] by dkl, [[CCC]], September 29, 2021</ref>
 +
* [[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2021'''). ''DESCENT wins five gold medals at the Computer Olympiad''. [[ICGA Journal#43_2|ICGA Journal, Vol. 43, No. 2]]
 +
* [[Boris Doux]], [[Benjamin Negrevergne]], [[Tristan Cazenave]] ('''2021'''). ''Deep Reinforcement Learning for Morpion Solitaire''. [[Advances in Computer Games 17]]
 +
* [[Weirui Ye]], [[Shaohuai Liu]], [[Thanard Kurutach]], [[Pieter Abbeel]], [[Yang Gao]] ('''2021'''). ''Mastering Atari Games with Limited Data''. [https://arxiv.org/abs/2111.00210 arXiv:2111.00210] <ref>[https://github.com/YeWR/EfficientZero GitHub - YeWR/EfficientZero: Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021]</ref> <ref>[https://www.talkchess.com/forum3/viewtopic.php?f=7&t=78790 Want to train nets faster?] by [[Dann Corbit]], [[CCC]], December 01, 2021</ref>
 +
* [[Dennis Soemers]], [[Vegard Mella]], [[Cameron Browne]], [[Olivier Teytaud]] ('''2021'''). ''Deep learning for general game playing with Ludii and Polygames''. [[ICGA Journal#43_3|ICGA Journal, Vol. 43, No. 3]]
  
 
=Postings=
 
Line 188: Line 203:
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75411 Unsupervised reinforcement tuning from zero] by Madeleine Birchfield, [[CCC]], October 16, 2020 » [[Automated Tuning]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=75606 Transhuman Chess with NN and RL...] by [[Srdja Matovic]], [[CCC]], October 30, 2020 » [[Neural Networks|NN]]
 
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=76465 Reinforcement learning project] by [[Harm Geert Muller]], [[CCC]], January 31, 2021 » [[Texel's Tuning Method]]
  
 
=External Links=  
 
Line 220: Line 236:
 
* [http://videolectures.net/deeplearning2016_pineau_reinforcement_learning/ Introduction to Reinforcement Learning] by [[Joelle Pineau]], [[McGill University]], 2016, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
 
: {{#evu:https://www.youtube.com/watch?v=O_1Z63EDMvQ|alignment=left|valignment=top}}
 
==OpenSpiel==
+
==GitHub==
* [https://github.com/deepmind/open_spiel GitHub - deepmind/open_spiel: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games] <ref>[[Marc Lanctot]], [[Edward Lockhart]], [[Jean-Baptiste Lespiau]], [[Vinicius Zambaldi]], [[Satyaki Upadhyay]], [[Julien Pérolat]], [[Sriram Srinivasan]], [[Finbarr Timbers]], [[Karl Tuyls]], [[Shayegan Omidshafiei]], [[Daniel Hennes]], [[Dustin Morrill]], [[Paul Muller]], [[Timo Ewalds]], [[Ryan Faulkner]], [[János Kramár]], [[Bart De Vylder]], [[Brennan Saeta]], [[James Bradbury]], [[David Ding]], [[Sebastian Borgeaud]], [[Matthew Lai]], [[Julian Schrittwieser]], [[Thomas Anthony]], [[Edward Hughes]], [[Ivo Danihelka]], [[Jonah Ryan-Davis]] ('''2019'''). ''OpenSpiel: A Framework for Reinforcement Learning in Games''. [https://arxiv.org/abs/1908.09453 arXiv:1908.09453]</ref>
+
* [https://github.com/deepmind/open_spiel GitHub - deepmind/open_spiel: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games] <ref>[[Marc Lanctot]], [[Edward Lockhart]], [[Jean-Baptiste Lespiau]], [[Vinícius Flores Zambaldi]], [[Satyaki Upadhyay]], [[Julien Pérolat]], [[Sriram Srinivasan]], [[Finbarr Timbers]], [[Karl Tuyls]], [[Shayegan Omidshafiei]], [[Daniel Hennes]], [[Dustin Morrill]], [[Paul Muller]], [[Timo Ewalds]], [[Ryan Faulkner]], [[János Kramár]], [[Bart De Vylder]], [[Brennan Saeta]], [[James Bradbury]], [[David Ding]], [[Sebastian Borgeaud]], [[Matthew Lai]], [[Julian Schrittwieser]], [[Thomas Anthony]], [[Edward Hughes]], [[Ivo Danihelka]], [[Jonah Ryan-Davis]] ('''2019'''). ''OpenSpiel: A Framework for Reinforcement Learning in Games''. [https://arxiv.org/abs/1908.09453 arXiv:1908.09453]</ref>
 
** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/algorithms open_spiel/open_spiel/algorithms at master · deepmind/open_spiel · GitHub]
 
 
*** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/algorithms/alpha_zero open_spiel/open_spiel/algorithms/alpha_zero at master · deepmind/open_spiel · GitHub]
 
 
** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/games open_spiel/open_spiel/games at master · deepmind/open_spiel · GitHub]
 
 
*** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/games/chess open_spiel/open_spiel/games/chess at master · deepmind/open_spiel · GitHub]
 
 +
* [https://github.com/koulanurag/muzero-pytorch GitHub - koulanurag/muzero-pytorch: Pytorch Implementation of MuZero] <ref>[[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2020'''). ''[https://www.nature.com/articles/s41586-020-03051-4 Mastering Atari, Go, chess and shogi by planning with a learned model]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 588</ref>
 +
* [https://github.com/YeWR/EfficientZero GitHub - YeWR/EfficientZero: Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021] <ref>[[Weirui Ye]], [[Shaohuai Liu]], [[Thanard Kurutach]], [[Pieter Abbeel]], [[Yang Gao]] ('''2021'''). ''Mastering Atari Games with Limited Data''. [https://arxiv.org/abs/2111.00210 arXiv:2111.00210]</ref>
 +
* [https://github.com/facebookarchive/Polygames GitHub - facebookarchive/Polygames: The project is a platform of zero learning with a library of games]
  
 
=References=  
 

Latest revision as of 11:47, 14 March 2022

Reinforcement Learning,
a learning paradigm inspired by behaviourist psychology and classical conditioning: learning by trial and error while interacting with an environment, mapping situations to actions so that some notion of cumulative reward is maximized. In computer games, reinforcement learning adjusts feature weights based on game results, or on subsequent predictions of those results, during self play.

Reinforcement learning is indebted to the idea of Markov decision processes (MDPs) from the field of optimal control, which utilizes dynamic programming techniques. It also faces the crucial tradeoff between exploitation and exploration known from multi-armed bandit problems, and likewise considered in UCT of Monte-Carlo Tree Search: "exploitation" of the machine that has the highest expected payoff versus "exploration" to get more information about the expected payoffs of the other machines.
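The exploitation and exploration tradeoff can be illustrated with UCB1, the bandit rule that UCT builds on. The following is only a minimal sketch under stated assumptions: the function names `ucb1_select` and `run_bandit`, the Bernoulli payoff probabilities and the exploration constant are all hypothetical choices made for illustration, not taken from any particular program.

```python
import math
import random


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the index of the arm to pull next under UCB1."""
    # Pull every arm once before trusting the confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Mean payoff (exploitation) plus a bonus that shrinks with visits (exploration).
    scores = [
        values[arm] + c * math.sqrt(math.log(total) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(probs, pulls=5000, seed=0):
    """Play Bernoulli arms with UCB1; return pull counts and mean rewards."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)
    for _ in range(pulls):
        arm = ucb1_select(counts, values)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean payoff of this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values
```

Calling `run_bandit([0.2, 0.5, 0.8])` should concentrate most pulls on the last machine while still sampling the others occasionally.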

Q-Learning

Q-Learning, introduced by Chris Watkins in 1989, is a simple way for agents to learn how to act optimally in controlled Markovian domains [2]. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely [3]. Q-learning was successfully combined with deep learning by a Google DeepMind team to play some Atari 2600 games, as published in Nature in 2015 and dubbed deep reinforcement learning or deep Q-networks [4], soon followed by the spectacular AlphaGo and AlphaZero breakthroughs.
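The incremental improvement of action-values described above can be sketched with a table-based learner. This is a minimal illustration, assuming a hypothetical one-dimensional corridor environment; the function name `q_learning`, the reward scheme and all parameter values (learning rate `alpha`, discount `gamma`, exploration rate `epsilon`) are assumptions for the example.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0 .. n_states-1; action 0 moves left, action 1 moves right.
    Reaching the rightmost state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy: mostly exploit, sometimes explore; ties are random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The terminal state has no future value to bootstrap from.
            target = reward
            if next_state != goal:
                target += gamma * max(q[next_state])
            # Move the estimate a fraction alpha towards the target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

After training, the table returned by `q_learning()` should rate moving right above moving left in every non-terminal state.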

Temporal Difference Learning

see main page Temporal Difference Learning

Q-learning at its simplest uses tables to store data. This quickly becomes infeasible as the state/action space of the system being monitored or controlled grows. One solution to this problem is to use an (adapted) artificial neural network as a function approximator, as demonstrated by Gerald Tesauro in his Backgammon playing temporal difference learning research [5] [6].

Temporal Difference Learning is a prediction method primarily used for reinforcement learning. In the domain of computer games and computer chess, TD learning is applied through self play: the probability of winning the game is predicted at each position along the sequence of moves from the initial position until the end, and the weights are adjusted for a more reliable prediction.
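As a rough illustration of adjusting weights from successive predictions, the sketch below applies TD(0) to a linear predictor on a simple random walk rather than a real game. Everything here is an assumption made for the example: the function names `features` and `td0_random_walk`, the one-hot features standing in for real evaluation terms, and the step size `alpha`.

```python
import random


def features(state, n_states):
    """One-hot feature vector; a stand-in for real evaluation features."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]


def td0_random_walk(n_states=5, episodes=20000, alpha=0.01, seed=0):
    """Learn win-probability predictions for a random walk with TD(0).

    Non-terminal states are 0 .. n_states-1. Stepping off the right end
    counts as a win (outcome 1), stepping off the left end as a loss (0).
    """
    rng = random.Random(seed)
    weights = [0.5] * n_states

    def predict(state):
        x = features(state, n_states)
        return sum(w * xi for w, xi in zip(weights, x))

    for _ in range(episodes):
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                target = 0.0  # lost: the final outcome is the target
            elif next_state >= n_states:
                target = 1.0  # won: the final outcome is the target
            else:
                target = predict(next_state)  # bootstrap from the next prediction
            error = target - predict(state)
            # For a linear predictor the gradient w.r.t. the weights
            # is simply the feature vector of the current state.
            x = features(state, n_states)
            for i, xi in enumerate(x):
                weights[i] += alpha * error * xi
            if next_state < 0 or next_state >= n_states:
                break
            state = next_state
    return weights
```

For this walk the true winning probability of state s is (s + 1) / (n_states + 1), so the learned weights can be compared directly against those values.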

See also

UCT

Selected Publications

1954 ...

1960 ...

1970 ...

1980 ...

1990 ...

1995 ...

2000 ...

2005 ...

2010 ...

2011

2012

István Szita (2012). Reinforcement Learning in Games. Chapter 17

2013

2014

2015 ...

2016

2017

2018

2019

2020 ...

2021

Postings

1995 ...

Re: Parameter Tuning by Don Beal, CCC, October 02, 1998

2000 ...

2010 ...

2020 ...

External Links

Reinforcement Learning

MDP

Q-Learning

Courses

  1. Lecture 1: Introduction to Reinforcement Learning
  2. Lecture 2: Markov Decision Process
  3. Lecture 3: Planning by Dynamic Programming
  4. Lecture 4: Model-Free Prediction
  5. Lecture 5: Model Free Control
  6. Lecture 6: Value Function Approximation
  7. Lecture 7: Policy Gradient Methods
  8. Lecture 8: Integrating Learning and Planning
  9. Lecture 9: Exploration and Exploitation
  10. Lecture 10: Classic Games

GitHub

References

  1. Example of a simple Markov decision process with three states (green circles) and two actions (orange circles), with two rewards (orange arrows), image by waldoalvarez CC BY-SA 4.0, Wikimedia Commons
  2. Q-learning from Wikipedia
  3. Chris Watkins, Peter Dayan (1992). Q-learning. Machine Learning, Vol. 8, No. 2
  4. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis (2015). Human-level control through deep reinforcement learning. Nature, Vol. 518
  5. Q-learning from Wikipedia
  6. Gerald Tesauro (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM, Vol. 38, No. 3
  7. University of Bristol - Department of Computer Science - Technical Reports
  8. Ms. Pac-Man from Wikipedia
  9. Demystifying Deep Reinforcement Learning by Tambet Matiisen, Nervana, December 22, 2015
  10. Patent US20150100530 - Methods and apparatus for reinforcement learning - Google Patents
  11. DeepChess: Another deep-learning based chess program by Matthew Lai, CCC, October 17, 2016
  12. ICANN 2016 | Recipients of the best paper awards
  13. GitHub - paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames: Machine Learning - Reinforcement Learning
  14. AlphaGo Zero: Learning from scratch by Demis Hassabis and David Silver, DeepMind, October 18, 2017
  15. AlphaZero: Shedding new light on the grand games of chess, shogi and Go by David Silver, Thomas Hubert, Julian Schrittwieser and Demis Hassabis, DeepMind, December 03, 2018
  16. open_spiel/contributing.md at master · deepmind/open_spiel · GitHub
  17. New DeepMind paper by GregNeto, CCC, November 21, 2019
  18. Re: Board adaptive / tuning evaluation function - no NN/AI by Tony P., CCC, January 15, 2020
  19. MuZero: Mastering Go, chess, shogi and Atari without rules
  20. GitHub - koulanurag/muzero-pytorch: Pytorch Implementation of MuZero
  21. GitHub - paintception/Deep-Quality-Value-DQV-Learning-: DQV-Learning: a novel faster synchronous Deep Reinforcement Learning algorithm
  22. Book about Neural Networks for Chess by dkl, CCC, September 29, 2021
  23. GitHub - YeWR/EfficientZero: Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021
  24. Want to train nets faster? by Dann Corbit, CCC, December 01, 2021
  25. Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinícius Flores Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis (2019). OpenSpiel: A Framework for Reinforcement Learning in Games. arXiv:1908.09453
  26. Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, Vol. 588
  27. Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao (2021). Mastering Atari Games with Limited Data. arXiv:2111.00210
