Reinforcement Learning

'''2016'''
* [[Max Jaderberg]], [[Volodymyr Mnih]], [[Wojciech Marian Czarnecki]], [[Tom Schaul]], [[Joel Z. Leibo]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Reinforcement Learning with Unsupervised Auxiliary Tasks''. [https://arxiv.org/abs/1611.05397v1 arXiv:1611.05397v1]
* [[Jane X Wang]], [[Zeb Kurth-Nelson]], [[Dhruva Tirumala]], [[Hubert Soyer]], [[Joel Z Leibo]], [[Rémi Munos]], [[Charles Blundell]], [[Dharshan Kumaran]], [[Matthew Botvinick]] ('''2016'''). ''Learning to reinforcement learn''. [https://arxiv.org/abs/1611.05763 arXiv:1611.05763]
* [[Zacharias Georgiou]], [[Evangelos Karountzos]], [[Yaroslav Shkarupa]], [[Matthia Sabatelli]] ('''2016'''). ''A Reinforcement Learning Approach for Solving KRK Chess Endgames''. [https://github.com/paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames/blob/master/project_papers/final_paper/reinforcement-learning-approach(2).pdf pdf] <ref>[https://github.com/paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames GitHub - paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames: Machine Learning - Reinforcement Learning]</ref>
'''2017'''
* [[Hirotaka Kameko]], [[Jun Suzuki]], [[Naoki Mizukami]], [[Yoshimasa Tsuruoka]] ('''2017'''). ''Deep Reinforcement Learning with Hidden Layers on Future States''. [[Conferences#IJCA2017|Computer Games Workshop at IJCAI 2017]], [http://www.lamsade.dauphine.fr/~cazenave/cgw2017/Kameko.pdf pdf]
* [http://www.peterhenderson.co/ Peter Henderson], [https://scholar.google.ca/citations?user=2_4Rs44AAAAJ&hl=en Riashat Islam], [[Philip Bachman]], [[Joelle Pineau]], [[Doina Precup]], [https://scholar.google.ca/citations?user=gFwEytkAAAAJ&hl=en David Meger] ('''2017'''). ''Deep Reinforcement Learning that Matters''. [https://arxiv.org/abs/1709.06560 arXiv:1709.06560]
* [https://scholar.google.com/citations?user=tiE4g64AAAAJ&hl=en Maithra Raghu], [https://scholar.google.com/citations?user=ZZNxNAYAAAAJ&hl=en Alex Irpan], [[Mathematician#JAndreas|Jacob Andreas]], [[Mathematician#RKleinberg|Robert Kleinberg]], [[Quoc V. Le]], [[Jon Kleinberg]] ('''2017'''). ''Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?'' [https://arxiv.org/abs/1711.02301 arXiv:1711.02301]
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]
* [[Kei Takada]], [[Hiroyuki Iizuka]], [[Masahito Yamamoto]] ('''2017'''). ''Reinforcement Learning for Creating Evaluation Function Using Convolutional Neural Network in Hex''. [[TAAI 2017]] » [[Hex]], [[Neural Networks#Convolutional|CNN]]
'''2019'''
* [https://scholar.google.co.in/citations?user=1QlrvHkAAAAJ&hl=en Chandramouli Kamanchi], [https://scholar.google.co.in/citations?user=nx4NlpsAAAAJ&hl=en Raghuram Bharadwaj Diddigi], [[Shalabh Bhatnagar]] ('''2019'''). ''Successive Over Relaxation Q-Learning''. [https://arxiv.org/abs/1903.03812 arXiv:1903.03812]
* [https://scholar.google.co.in/citations?user=1QlrvHkAAAAJ&hl=en Chandramouli Kamanchi], [https://scholar.google.co.in/citations?user=nx4NlpsAAAAJ&hl=en Raghuram Bharadwaj Diddigi], [[Shalabh Bhatnagar]] ('''2019'''). ''Second Order Value Iteration in Reinforcement Learning''. [https://arxiv.org/abs/1905.03927 arXiv:1905.03927] (a tabular Q-learning sketch follows this list)
* [[Marc Lanctot]], [[Edward Lockhart]], [[Jean-Baptiste Lespiau]], [[Vinícius Flores Zambaldi]], [[Satyaki Upadhyay]], [[Julien Pérolat]], [[Sriram Srinivasan]], [[Finbarr Timbers]], [[Karl Tuyls]], [[Shayegan Omidshafiei]], [[Daniel Hennes]], [[Dustin Morrill]], [[Paul Muller]], [[Timo Ewalds]], [[Ryan Faulkner]], [[János Kramár]], [[Bart De Vylder]], [[Brennan Saeta]], [[James Bradbury]], [[David Ding]], [[Sebastian Borgeaud]], [[Matthew Lai]], [[Julian Schrittwieser]], [[Thomas Anthony]], [[Edward Hughes]], [[Ivo Danihelka]], [[Jonah Ryan-Davis]] ('''2019'''). ''OpenSpiel: A Framework for Reinforcement Learning in Games''. [https://arxiv.org/abs/1908.09453 arXiv:1908.09453] <ref>[https://github.com/deepmind/open_spiel/blob/master/docs/contributing.md open_spiel/contributing.md at master · deepmind/open_spiel · GitHub]</ref>
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2019'''). ''Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model''. [https://arxiv.org/abs/1911.08265 arXiv:1911.08265] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=2&t=72381 New DeepMind paper] by GregNeto, [[CCC]], November 21, 2019</ref>
* [[Mathematician#SrbhBose|Sourabh Bose]] ('''2019'''). ''[https://rc.library.uta.edu/uta-ir/handle/10106/28094 Learning Representations Using Reinforcement Learning]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_Texas_at_Arlington University of Texas at Arlington], advisor [[Mathematician#MHuber|Manfred Huber]] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72810&start=6 Re: Board adaptive / tuning evaluation function - no NN/AI] by Tony P., [[CCC]], January 15, 2020</ref>
* [[Johannes Czech]] ('''2019'''). ''Deep Reinforcement Learning for Crazyhouse''. Master thesis, [[Darmstadt University of Technology|TU Darmstadt]], [https://ml-research.github.io/papers/czech2019deep.pdf pdf] » [[CrazyAra]]
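The two Kamanchi et al. entries above modify classic tabular methods by reweighting the update step. Below is a minimal sketch in Python, assuming a hypothetical environment object exposing <code>reset()</code>, <code>actions(s)</code> and <code>step(s, a)</code>; with w = 1.0 it reduces to standard Q-learning, and the over-relaxation factor here is a simplified illustration, not the exact SOR operator derived in the paper.
<pre>
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1, w=1.0):
    """Tabular Q-learning; w = 1.0 gives the classic update, w > 1.0
    over-relaxes the temporal-difference step (illustrative only)."""
    Q = defaultdict(float)  # maps (state, action) -> estimated value
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda b: Q[(s, b)])
            s2, r, done = env.step(s, a)
            # bootstrap target: reward plus discounted best next-state value
            target = r if done else r + gamma * max(Q[(s2, b)] for b in env.actions(s2))
            Q[(s, a)] += alpha * w * (target - Q[(s, a)])
            s = s2
    return Q
</pre>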
==2020 ...==
* [[Hung Guei]], [[Ting-Han Wei]], [[I-Chen Wu]] ('''2020'''). ''2048-like games for teaching reinforcement learning''. [[ICGA Journal#42_1|ICGA Journal, Vol. 42, No. 1]]
* [https://dblp.org/pid/233/8144.html Indu John], [https://scholar.google.co.in/citations?user=1QlrvHkAAAAJ&hl=en Chandramouli Kamanchi], [[Shalabh Bhatnagar]] ('''2020'''). ''Generalized Speedy Q-Learning''. [[IEEE#CSL|IEEE Control Systems Letters]], Vol. 4, No. 3, [https://arxiv.org/abs/1911.00397 arXiv:1911.00397]
* [[Takuya Hiraoka]], [https://dblp.org/pers/hd/i/Imagawa:Takahisa Takahisa Imagawa], [https://dblp.org/pers/hd/t/Tangkaratt:Voot Voot Tangkaratt], [https://dblp.org/pers/hd/o/Osa:Takayuki Takayuki Osa], [https://dblp.org/pers/hd/o/Onishi:Takashi Takashi Onishi], [https://dblp.org/pers/hd/t/Tsuruoka:Yoshimasa Yoshimasa Tsuruoka] ('''2020'''). ''Meta-Model-Based Meta-Policy Optimization''. [https://arxiv.org/abs/2006.02608 arXiv:2006.02608]
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2020'''). ''[https://www.nature.com/articles/s41586-020-03051-4 Mastering Atari, Go, chess and shogi by planning with a learned model]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 588 <ref>[https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules?fbclid=IwAR3mSwrn1YXDKr9uuGm2GlFKh76wBilex7f8QvBiQecwiVmAvD6Bkyjx-rE MuZero: Mastering Go, chess, shogi and Atari without rules]</ref> <ref>[https://github.com/koulanurag/muzero-pytorch GitHub - koulanurag/muzero-pytorch: Pytorch Implementation of MuZero]</ref> (see the model-unroll sketch following this list)
* [[Tristan Cazenave]], [[Yen-Chi Chen]], [[Guan-Wei Chen]], [[Shi-Yu Chen]], [[Xian-Dong Chiu]], [[Julien Dehos]], [[Maria Elsa]], [[Qucheng Gong]], [[Hengyuan Hu]], [[Vasil Khalidov]], [[Cheng-Ling Li]], [[Hsin-I Lin]], [[Yu-Jin Lin]], [[Xavier Martinet]], [[Vegard Mella]], [[Jeremy Rapin]], [[Baptiste Roziere]], [[Gabriel Synnaeve]], [[Fabien Teytaud]], [[Olivier Teytaud]], [[Shi-Cheng Ye]], [[Yi-Jun Ye]], [[Shi-Jim Yen]], [[Sergey Zagoruyko]] ('''2020'''). ''Polygames: Improved zero learning''. [[ICGA Journal#42_4|ICGA Journal, Vol. 42, No. 4]], [https://arxiv.org/abs/2001.09832 arXiv:2001.09832]
* [[Matthia Sabatelli]], [https://github.com/glouppe Gilles Louppe], [https://scholar.google.com/citations?user=tyFTsmIAAAAJ&hl=en Pierre Geurts], [[Marco Wiering]] ('''2020'''). ''The Deep Quality-Value Family of Deep Reinforcement Learning Algorithms''. [https://dblp.org/db/conf/ijcnn/ijcnn2020.html#SabatelliLGW20 IJCNN 2020] <ref>[https://github.com/paintception/Deep-Quality-Value-DQV-Learning- GitHub - paintception/Deep-Quality-Value-DQV-Learning-: DQV-Learning: a novel faster synchronous Deep Reinforcement Learning algorithm]</ref>
* [[Quentin Cohen-Solal]] ('''2020'''). ''Learning to Play Two-Player Perfect-Information Games without Knowledge''. [https://arxiv.org/abs/2008.01188 arXiv:2008.01188]
* [[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700]
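MuZero (Schrittwieser et al. 2020, above) plans without knowing the game rules by learning a model composed of three networks: a representation function mapping the observation to a latent state, a dynamics function predicting the next latent state and reward for an action, and a prediction function giving a policy prior and value. The sketch below shows one model unroll with placeholder callables standing in for the trained networks; it is a simplified illustration, since MuZero itself embeds such unrolls inside Monte-Carlo tree search.
<pre>
from typing import Callable, List, Tuple

# Placeholder signatures; in MuZero these are jointly trained deep networks.
Represent = Callable[[object], object]                    # observation -> latent state
Dynamics = Callable[[object, int], Tuple[object, float]]  # (latent, action) -> (next latent, reward)
Predict = Callable[[object], Tuple[List[float], float]]   # latent -> (policy prior, value)

def rollout_value(obs, actions, f, g, phi, gamma=0.997):
    """Unroll the learned model along a fixed action sequence and return a
    discounted return estimate bootstrapped by the predicted leaf value."""
    h = f(obs)                  # representation: encode the observation once
    ret, discount = 0.0, 1.0
    for a in actions:
        h, r = g(h, a)          # dynamics: next latent state and predicted reward
        ret += discount * r
        discount *= gamma
    _, v = phi(h)               # prediction: bootstrap with the leaf value
    return ret + discount * v
</pre>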
'''2021'''
* [[Maximilian Alexander Gehrke]] ('''2021'''). ''Assessing Popular Chess Variants Using Deep Reinforcement Learning''. Master thesis, [[Darmstadt University of Technology|TU Darmstadt]], [https://ml-research.github.io/papers/gehrke2021assessing.pdf pdf] » [[CrazyAra]]
* [[Dominik Klein]] ('''2021'''). ''[https://github.com/asdfjkl/neural_network_chess Neural Networks For Chess]''. [https://github.com/asdfjkl/neural_network_chess/releases/tag/v1.1 Release Version 1.1 · GitHub] <ref>[https://www.talkchess.com/forum3/viewtopic.php?f=2&t=78283 Book about Neural Networks for Chess] by dkl, [[CCC]], September 29, 2021</ref>
* [[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2021'''). ''DESCENT wins five gold medals at the Computer Olympiad''. [[ICGA Journal#43_2|ICGA Journal, Vol. 43, No. 2]]
* [[Boris Doux]], [[Benjamin Negrevergne]], [[Tristan Cazenave]] ('''2021'''). ''Deep Reinforcement Learning for Morpion Solitaire''. [[Advances in Computer Games 17]]
* [[Weirui Ye]], [[Shaohuai Liu]], [[Thanard Kurutach]], [[Pieter Abbeel]], [[Yang Gao]] ('''2021'''). ''Mastering Atari Games with Limited Data''. [https://arxiv.org/abs/2111.00210 arXiv:2111.00210] <ref>[https://github.com/YeWR/EfficientZero GitHub - YeWR/EfficientZero: Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021]</ref> <ref>[https://www.talkchess.com/forum3/viewtopic.php?f=7&t=78790 Want to train nets faster?] by [[Dann Corbit]], [[CCC]], December 01, 2021</ref>
=Postings=
* [http://videolectures.net/deeplearning2016_pineau_reinforcement_learning/ Introduction to Reinforcement Learning] by [[Joelle Pineau]], [[McGill University]], 2016, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: {{#evu:https://www.youtube.com/watch?v=O_1Z63EDMvQ|alignment=left|valignment=top}}
==GitHub==
* [https://github.com/deepmind/open_spiel GitHub - deepmind/open_spiel: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games] <ref>[[Marc Lanctot]], [[Edward Lockhart]], [[Jean-Baptiste Lespiau]], [[Vinícius Flores Zambaldi]], [[Satyaki Upadhyay]], [[Julien Pérolat]], [[Sriram Srinivasan]], [[Finbarr Timbers]], [[Karl Tuyls]], [[Shayegan Omidshafiei]], [[Daniel Hennes]], [[Dustin Morrill]], [[Paul Muller]], [[Timo Ewalds]], [[Ryan Faulkner]], [[János Kramár]], [[Bart De Vylder]], [[Brennan Saeta]], [[James Bradbury]], [[David Ding]], [[Sebastian Borgeaud]], [[Matthew Lai]], [[Julian Schrittwieser]], [[Thomas Anthony]], [[Edward Hughes]], [[Ivo Danihelka]], [[Jonah Ryan-Davis]] ('''2019'''). ''OpenSpiel: A Framework for Reinforcement Learning in Games''. [https://arxiv.org/abs/1908.09453 arXiv:1908.09453]</ref> (see the usage example below)
** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/algorithms open_spiel/open_spiel/algorithms at master · deepmind/open_spiel · GitHub]
*** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/algorithms/alpha_zero open_spiel/open_spiel/algorithms/alpha_zero at master · deepmind/open_spiel · GitHub]
** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/games open_spiel/open_spiel/games at master · deepmind/open_spiel · GitHub]
*** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/games/chess open_spiel/open_spiel/games/chess at master · deepmind/open_spiel · GitHub]
* [https://github.com/koulanurag/muzero-pytorch GitHub - koulanurag/muzero-pytorch: Pytorch Implementation of MuZero] <ref>[[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2020'''). ''[https://www.nature.com/articles/s41586-020-03051-4 Mastering Atari, Go, chess and shogi by planning with a learned model]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 588</ref>
* [https://github.com/YeWR/EfficientZero GitHub - YeWR/EfficientZero: Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021] <ref>[[Weirui Ye]], [[Shaohuai Liu]], [[Thanard Kurutach]], [[Pieter Abbeel]], [[Yang Gao]] ('''2021'''). ''Mastering Atari Games with Limited Data''. [https://arxiv.org/abs/2111.00210 arXiv:2111.00210]</ref>
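A minimal usage sketch of the OpenSpiel Python bindings, assuming they are installed and importable as <code>pyspiel</code>; it plays one game of chess with a uniform random policy and prints the terminal returns.
<pre>
import random
import pyspiel  # OpenSpiel Python bindings

game = pyspiel.load_game("chess")      # load a registered game by name
state = game.new_initial_state()
while not state.is_terminal():
    action = random.choice(state.legal_actions())  # uniform random policy
    state.apply_action(action)
print(state.returns())  # e.g. [1.0, -1.0] for a White win, [0.0, 0.0] for a draw
</pre>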
=References=
