Reinforcement Learning
=Q-Learning=
Q-Learning, introduced by [[Chris Watkins]] in 1989, is a simple way for [https://en.wikipedia.org/wiki/Intelligent_agent agents] to learn how to act optimally in controlled Markovian domains <ref>[https://en.wikipedia.org/wiki/Q-learning Q-learning from Wikipedia]</ref>. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely <ref>[[Chris Watkins]], [[Peter Dayan]] ('''1992'''). ''[http://www.gatsby.ucl.ac.uk/~dayan/papers/wd92.html Q-learning]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 8, No. 2</ref>. Q-learning has been successfully applied to [[Deep Learning|deep learning]] by a [[Google]] [[DeepMind]] team in playing some [[Atari 8-bit|Atari 2600]] [https://en.wikipedia.org/wiki/List_of_Atari_2600_games games] as published in [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], 2015, dubbed ''deep reinforcement learning'' or ''deep Q-networks'' <ref>[[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]], [[Mathematician#AARusu|Andrei A. Rusu]], [[Joel Veness]], [[Marc G. Bellemare]], [[Alex Graves]], [[Martin Riedmiller]], [[Andreas K. Fidjeland]], [[Georg Ostrovski]], [[Stig Petersen]], [[Charles Beattie]], [[Amir Sadik]], [[Ioannis Antonoglou]], [[Helen King]], [[Dharshan Kumaran]], [[Daan Wierstra]], [[Shane Legg]], [[Demis Hassabis]] ('''2015'''). ''[http://www.nature.com/nature/journal/v518/n7540/abs/nature14236.html Human-level control through deep reinforcement learning]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 518</ref>, soon followed by the spectacular [[AlphaGo]] and [[AlphaZero]] breakthroughs.
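At its core is the simple tabular update Q(s,a) ← Q(s,a) + α·(r + γ·max<sub>a'</sub> Q(s',a') − Q(s,a)) applied after every transition, where α is the learning rate and γ the discount factor. Below is a minimal Python sketch of tabular Q-learning on a toy corridor [https://en.wikipedia.org/wiki/Markov_decision_process MDP]; the environment, the ε-greedy exploration scheme and all hyperparameters are illustrative assumptions, not taken from the cited papers:
<syntaxhighlight lang="python">
import random
from collections import defaultdict

# Toy corridor MDP (assumed for illustration): states 0..5 in a row,
# start at state 0, episode ends with reward +1 on reaching state 5.
N_STATES = 6
ACTIONS = (-1, +1)                     # step left / step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Deterministic transition; reward only on entering the goal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

Q = defaultdict(float)                 # Q[(state, action)], 0.0 for unseen pairs

def greedy(state):
    """Best action under the current Q, ties broken at random."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy keeps sampling all actions in all
        # states, matching the convergence condition quoted above
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS) * (not done)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

print({s: greedy(s) for s in range(N_STATES - 1)})  # learned policy: always +1
</syntaxhighlight>
Deep Q-networks replace the table by a [[Neural Networks|neural network]] Q(s,a;θ) trained toward the same target r + γ·max<sub>a'</sub> Q(s',a'), which is what the DeepMind Atari work scaled up.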
=Temporal Difference Learning=
=See also=
* [[AlphaZero]]
* [[Automated Tuning]]
* [[Deep Learning]]
* [[Dynamic Programming]]
=Selected Publications=
==1970 ...==
* [[A. Harry Klopf]] ('''1972'''). ''Brain Function and Adaptive Systems - A Heterostatic Theory''. [https://en.wikipedia.org/wiki/Air_Force_Cambridge_Research_Laboratories Air Force Cambridge Research Laboratories], Special Reports, No. 133, [http://www.dtic.mil/dtic/tr/fulltext/u2/742259.pdf pdf]
* [[Mathematician#Holland|John H. Holland]] ('''1975'''). ''Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence''. [http://www.amazon.com/Adaptation-Natural-Artificial-Systems-Introductory/dp/0262581116 amazon.com]
* [[Ian H. Witten]] ('''1977'''). ''An Adaptive Optimal Controller for Discrete-Time Markov Environments''. [https://en.wikipedia.org/wiki/Information_and_Computation Information and Control], Vol. 34, No. 4, [https://core.ac.uk/download/pdf/82451748.pdf pdf]
==1980 ...==
* [[Richard Sutton]] ('''1984'''). ''[http://scholarworks.umass.edu/dissertations/AAI8410337/ Temporal Credit Assignment in Reinforcement Learning]''. Ph.D. dissertation, [https://en.wikipedia.org/wiki/University_of_Massachusetts University of Massachusetts]
==1990 ...==
* [[Richard Sutton]], [[Andrew Barto]] ('''1990'''). ''Time Derivative Models of Pavlovian Reinforcement''. Learning and Computational Neuroscience: Foundations of Adaptive Networks: 497-537
* [[Jürgen Schmidhuber]] ('''1990'''). ''Reinforcement Learning in Markovian and Non-Markovian Environments''. [https://dblp.uni-trier.de/db/conf/nips/nips1990.html NIPS 1990], [ftp://ftp.idsia.ch/pub/juergen/nipsnonmarkov.pdf pdf]
* [[Peter Dayan]] ('''1991'''). ''[https://www.era.lib.ed.ac.uk/handle/1842/14754 Reinforcing Connectionism: Learning the Statistical Way]''. Ph.D. thesis, [[University of Edinburgh]]
* [[Chris Watkins]], [[Peter Dayan]] ('''1992'''). ''[http://www.gatsby.ucl.ac.uk/~dayan/papers/wd92.html Q-learning]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 8, No. 2
* [[Gerald Tesauro]] ('''1992'''). ''Temporal Difference Learning of Backgammon Strategy''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/ml1992.html#Tesauro92 ML 1992]
* [[Michael L. Littman]] ('''1994'''). ''Markov Games as a Framework for Multi-Agent Reinforcement Learning''. International Conference on Machine Learning, [http://www.cs.duke.edu/courses/spring07/cps296.3/littman94markov.pdf pdf]
==1995 ...==
* [[Marco Wiering]] ('''1995'''). ''TD Learning of Game Evaluation Functions with Hierarchical Neural Architectures''. Master's thesis, [https://en.wikipedia.org/wiki/University_of_Amsterdam University of Amsterdam], [http://webber.physik.uni-freiburg.de/~hon/vorlss02/Literatur/reinforcement/GameEvaluationWithNeuronal.pdf pdf]
* [[Gerald Tesauro]] ('''1995'''). ''Temporal Difference Learning and TD-Gammon''. [[ACM#Communications|Communications of the ACM]], Vol. 38, No. 3
* [http://dblp.uni-trier.de/pers/hd/b/Baird_III:Leemon_C= Leemon C. Baird III], [http://dblp.uni-trier.de/pers/hd/h/Harmon:Mance_E= Mance E. Harmon], [[A. Harry Klopf]] ('''1996'''). ''Reinforcement Learning: An Alternative Approach to Machine Intelligence''. [http://www.leemon.com/papers/1996bhk.pdf pdf]
* [[Mathematician#LPKaelbling|Leslie Pack Kaelbling]], [[Michael L. Littman]], [[Mathematician#AWMoore|Andrew W. Moore]] ('''1996'''). ''[http://www.cs.washington.edu/research/jair/volume4/kaelbling96a-html/rl-survey.html Reinforcement Learning: A Survey]''. [http://www.jair.org/vol/vol4.html JAIR Vol. 4], [http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a.pdf pdf]
* [[Robert Levinson]] ('''1996'''). ''[http://onlinelibrary.wiley.com/doi/10.1111/j.1467-8640.1996.tb00257.x/abstract General Game-Playing and Reinforcement Learning]''. [http://dblp.uni-trier.de/db/journals/ci/ci12.html#PellEL96 Computational Intelligence, Vol. 12], No. 1
* [[David E. Moriarty]], [[Risto Miikkulainen]] ('''1996'''). ''[https://link.springer.com/article/10.1023/A:1018004120707 Efficient Reinforcement Learning through Symbiotic Evolution]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 22
* [[Ronald Parr]], [[Stuart Russell]] ('''1997'''). ''Reinforcement Learning with Hierarchies of Machines''. In Advances in Neural Information Processing Systems 10, MIT Press, [http://www.cs.berkeley.edu/~russell/papers/nips97-ham.ps.gz zipped ps]
* [[William Uther]], [[Manuela Veloso|Manuela M. Veloso]] ('''1997'''). ''Adversarial Reinforcement Learning''. [[Carnegie Mellon University]], [http://www.cse.unsw.edu.au/~willu/w/papers/Uther97a.ps ps]
* [[William Uther]], [[Manuela Veloso|Manuela M. Veloso]] ('''1997'''). ''Generalizing Adversarial Reinforcement Learning''. [[Carnegie Mellon University]], [http://www.cse.unsw.edu.au/~willu/w/papers/Uther97b.ps ps]
* [[Marco Wiering]], [[Jürgen Schmidhuber]] ('''1997'''). ''HQ-learning''. [https://en.wikipedia.org/wiki/Adaptive_Behavior_%28journal%29 Adaptive Behavior], Vol. 6, No. 2
* [[Csaba Szepesvári]] ('''1998'''). ''Reinforcement Learning: Theory and Practice''. Proceedings of the 2nd Slovak Conference on Artificial Neural Networks, [http://www.sztaki.hu/%7Eszcsaba/papers/scann98.ps.gz zipped ps]
* [[Richard Sutton]], [[Andrew Barto]] ('''1998'''). ''[https://mitpress.mit.edu/books/reinforcement-learning Reinforcement Learning: An Introduction]''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press]
* [http://www.ilsp.gr/homepages/papavasiliou_eng.html Vassilis Papavassiliou], [[Stuart Russell]] ('''1999'''). ''Convergence of reinforcement learning with general function approximators''. In Proc. IJCAI-99, Stockholm, [http://www.cs.berkeley.edu/~russell/papers/ijcai99-bridge.ps ps]
* [[Marco Wiering]] ('''1999'''). ''Explorations in Efficient Reinforcement Learning''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_Amsterdam University of Amsterdam], advisors [[Mathematician#FGroen|Frans Groen]] and [[Jürgen Schmidhuber]]
* [[Richard Sutton]], [[Doina Precup]], [[Mathematician#SSingh|Satinder Singh]] ('''1999'''). ''Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_(journal) Artificial Intelligence], Vol. 112, [https://people.cs.umass.edu/~barto/courses/cs687/Sutton-Precup-Singh-AIJ99.pdf pdf]
==2000 ...==
* [[Sebastian Thrun]], [[Michael L. Littman]] ('''2000'''). ''A Review of Reinforcement Learning''. [http://www.informatik.uni-trier.de/~ley/db/journals/aim/aim21.html#ThrunL00 AI Magazine, Vol. 21], No. 1
* [[Andrew Ng]], [[Stuart Russell]] ('''2000'''). ''Algorithms for inverse reinforcement learning''. In Proceedings of the Seventeenth International Conference on Machine Learning, Stanford, California: Morgan Kaufmann, [http://www.cs.berkeley.edu/~russell/papers/ml00-irl.pdf pdf]
* [http://www.cs.ou.edu/~hougen/ Dean F. Hougen], [http://www-users.cs.umn.edu/~gini/ Maria Gini], [[James R. Slagle]] ('''2000'''). ''[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.2633 An Integrated Connectionist Approach to Reinforcement Learning for Robotic Control]''. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
* [[Jonathan Baxter]], [[Mathematician#PBartlett|Peter Bartlett]] ('''2000'''). ''Reinforcement Learning on POMDPs via Direct Gradient Ascent''. [http://dblp.uni-trier.de/db/conf/icml/icml2000.html ICML 2000], [https://pdfs.semanticscholar.org/b874/98f0879d312c308889135203b17069aa0486.pdf pdf]
* [[Doina Precup]] ('''2000'''). ''Temporal Abstraction in Reinforcement Learning''. Ph.D. dissertation, Department of Computer Science, [https://en.wikipedia.org/wiki/University_of_Massachusetts_Amherst University of Massachusetts], [https://en.wikipedia.org/wiki/Amherst,_Massachusetts Amherst]
* [[Robert Levinson]], [[Ryan Weber]] ('''2001'''). ''Chess Neighborhoods, Function Combinations and Reinforcement Learning''. In Computers and Games (eds. [[Tony Marsland]] and I. Frank), [https://en.wikipedia.org/wiki/Lecture_Notes_in_Computer_Science Lecture Notes in Computer Science], Springer, [http://users.soe.ucsc.edu/~levinson/Papers/CNFCRL.pdf pdf]
* [[Marco Block-Berlitz]] ('''2003'''). ''Reinforcement Learning in der Schachprogrammierung''. Studienarbeit, Freie Universität Berlin, advisor [[Raúl Rojas|Prof. Dr. Raúl Rojas]], [http://page.mi.fu-berlin.de/block/Skripte/Reinforcement.pdf pdf] (German)
* [[Henk Mannen]] ('''2003'''). ''Learning to play chess using reinforcement learning with database games''. Master's thesis, Cognitive Artificial Intelligence, [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.810&rep=rep1&type=pdf pdf]
* [[Joelle Pineau]], [[Geoffrey Gordon]], [[Sebastian Thrun]] ('''2003'''). ''Point-based value iteration: An anytime algorithm for POMDPs''. [[Conferences#IJCAI2003|IJCAI]], [http://www.fore.robot.cc/papers/Pineau03a.pdf pdf]
* [https://dblp.uni-trier.de/pers/hd/k/Kerr:Amy_J= Amy J. Kerr], [[Todd W. Neller]], [https://dblp.uni-trier.de/pers/hd/p/Pilla:Christopher_J=_La Christopher J. La Pilla], [https://dblp.uni-trier.de/pers/hd/s/Schompert:Michael_D= Michael D. Schompert] ('''2003'''). ''[https://www.semanticscholar.org/paper/Java-Resources-for-Teaching-Reinforcement-Learning-Kerr-Neller/3d84018eb8b8668c13d1d4f6efca4442af2915b4 Java Resources for Teaching Reinforcement Learning]''. [https://dblp.uni-trier.de/db/conf/pdpta/pdpta2003-3.html PDPTA 2003]
* [[Yngvi Björnsson]], Vignir Hafsteinsson, Ársæll Jóhannsson, Einar Jónsson ('''2004'''). ''Efficient Use of Reinforcement Learning in a Computer Game''. In Computer Games: Artificial Intelligence, Design and Education (CGAIDE'04), pp. 379–383, [http://www.ru.is/faculty/yngvi/pdf/BjornssonHJJ04.pdf pdf]
* [http://imranontech.com/ Imran Ghory] ('''2004'''). ''Reinforcement learning in board games''. CSTR-04-004, [http://www.cs.bris.ac.uk/ Department of Computer Science], [https://en.wikipedia.org/wiki/University_of_Bristol University of Bristol], [http://www.cs.bris.ac.uk/Publications/Papers/2000100.pdf pdf] <ref>[http://www.cs.bris.ac.uk/Publications/pub_master.jsp?type=117 University of Bristol - Department of Computer Science - Technical Reports]</ref>
==2005 ...==
* [http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/d/Duan:Yong.html Yong Duan], [http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/c/Cui:Baoxia.html Baoxia Cui], [[Xinhe Xu]] ('''2007'''). ''State Space Partition for Reinforcement Learning Based on Fuzzy Min-Max Neural Network''. [http://www.informatik.uni-trier.de/~ley/db/conf/isnn/isnn2007-2.html#DuanCX07 ISNN 2007]
* [[Yasuhiro Osaki]], [[Kazutomo Shibahara]], [[Yasuhiro Tajima]], [[Yoshiyuki Kotani]] ('''2007'''). ''Reinforcement Learning of Evaluation Functions Using Temporal Difference-Monte Carlo learning method''. [[Conferences#GPW|12th Game Programming Workshop]]
* [[David Silver]], [[Richard Sutton]], [[Martin Müller]] ('''2007'''). ''Reinforcement learning of local shape in the game of Go''. [[Conferences#IJCAI2007|20th IJCAI]], [http://webdocs.cs.ualberta.ca/~mmueller/ps/silver-ijcai2007.pdf pdf]
* [[Marco Block-Berlitz|Marco Block]], Maro Bader, [http://page.mi.fu-berlin.de/tapia/ Ernesto Tapia], Marte Ramírez, Ketill Gunnarsson, Erik Cuevas, Daniel Zaldivar, [[Raúl Rojas]] ('''2008'''). ''Using Reinforcement Learning in Chess Engines''. CONCIBE SCIENCE 2008, [http://www.micai.org/rcs/ Research in Computing Science]: Special Issue in Electronics and Biomedical Engineering, Computer Science and Informatics, ISSN:1870-4069, Vol. 35, pp. 31-40, [https://en.wikipedia.org/wiki/Guadalajara Guadalajara], Mexico, [http://page.mi.fu-berlin.de/block/concibe2008.pdf pdf]
* [[Cécile Germain-Renaud]], [[Julien Pérez]], [[Balázs Kégl]], [[Charles Loomis]] ('''2008'''). ''Grid Differentiated Services: a Reinforcement Learning Approach''. In 8th [[IEEE]] Symposium on Cluster Computing and the Grid. Lyon, [http://hal.inria.fr/docs/00/28/78/26/PDF/RLccg08.pdf pdf]
* [[Balázs Csanád Csáji]], [https://dblp.dagstuhl.de/pers/hd/m/Monostori:L=aacute=szl=oacute= László Monostori] ('''2008'''). ''Value function based reinforcement learning in changing Markovian environments''. [https://en.wikipedia.org/wiki/Journal_of_Machine_Learning_Research Journal of Machine Learning Research], Vol. 9, [http://www.jmlr.org/papers/volume9/csaji08a/csaji08a.pdf pdf]
* [[David Silver]] ('''2009'''). ''Reinforcement Learning and Simulation-Based Search''. Ph.D. thesis, [[University of Alberta]]. [http://webdocs.cs.ualberta.ca/~silver/David_Silver/Publications_files/thesis.pdf pdf]
* [[Marcin Szubert]] ('''2009'''). ''Coevolutionary Reinforcement Learning and its Application to Othello''. M.Sc. thesis, [https://en.wikipedia.org/wiki/Pozna%C5%84_University_of_Technology Poznań University of Technology], supervisor [[Krzysztof Krawiec]], [https://mszubert.github.io/papers/Szubert_2009_MSC.pdf pdf]
* [[Joelle Pineau]], [[Geoffrey Gordon]], [[Sebastian Thrun]] ('''2006, 2011'''). ''Anytime Point-Based Approximations for Large POMDPs''. [https://en.wikipedia.org/wiki/Journal_of_Artificial_Intelligence_Research Journal of Artificial Intelligence Research], Vol. 27, [https://arxiv.org/abs/1110.0027 arXiv:1110.0027]
==2010 ...==
* [[Joel Veness]], [[Kee Siong Ng]], [[Marcus Hutter]], [[David Silver]] ('''2010'''). ''Reinforcement Learning via AIXI Approximation''. Association for the Advancement of Artificial Intelligence (AAAI), [http://jveness.info/publications/veness_rl_via_aixi_approx.pdf pdf]
* [[Julien Pérez]], [[Cécile Germain-Renaud]], [[Balázs Kégl]], [[Charles Loomis]] ('''2010'''). ''Multi-objective Reinforcement Learning for Responsive Grids''. In The Journal of Grid Computing. [http://hal.archives-ouvertes.fr/docs/00/49/15/60/PDF/RLGrid_JGC09_V7.pdf pdf]
* [[Csaba Szepesvári]] ('''2010'''). ''[https://sites.ualberta.ca/~szepesva/RLBook.html Algorithms for Reinforcement Learning]''. Morgan & Claypool
* [https://dblp.org/pers/hd/z/Zaragoza:Julio_H= Julio H. Zaragoza], [[Eduardo F. Morales]] ('''2010'''). ''Relational Reinforcement Learning with Continuous Actions by Combining Behavioral Cloning and Locally Weighted Regression''. Journal of Intelligent Systems and Applications, Vol. 2
'''2011'''
* [[Peter Auer]] ('''2011'''). ''Exploration and Exploitation in Online Learning''. [http://dblp.uni-trier.de/db/conf/icais/icais2011.html#Auer11 ICAIS 2011]
* [[Charles Elkan]] ('''2011'''). ''Reinforcement Learning with a Bilinear Q Function''. [http://www.informatik.uni-trier.de/~ley/db/conf/ewrl/ewrl2011.html#Elkan11 EWRL 2011]
'''2012'''
* [[Marco Wiering]], [http://martijnvanotterlo.nl/ Martijn Van Otterlo] ('''2012'''). ''Reinforcement Learning: State-of-the-Art''. [http://link.springer.com/book/10.1007/978-3-642-27645-3 Adaptation, Learning, and Optimization, Vol. 12], [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
: [[István Szita]] ('''2012'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-642-27645-3_17 Reinforcement Learning in Games]''. Chapter 17
* [[Thomas J. Walsh]], [[István Szita]], [[Carlos Diuk]], [[Michael L. Littman]] ('''2012'''). ''Exploring compact reinforcement-learning representations with linear regression''. [https://arxiv.org/abs/1205.2606 arXiv:1205.2606]
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2012'''). ''[https://papers.nips.cc/paper/4767-efficient-bayes-adaptive-reinforcement-learning-using-sample-based-search Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search]''. [https://papers.nips.cc/book/advances-in-neural-information-processing-systems-25-2012 NIPS 2012]
'''2013'''
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2013'''). ''Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search''. [https://en.wikipedia.org/wiki/Journal_of_Artificial_Intelligence_Research Journal of Artificial Intelligence Research], Vol. 48, [https://www.jair.org/media/4117/live-4117-7507-jair.pdf pdf]
* [http://dblp.uni-trier.de/pers/hd/r/Ree:M=_van_der Michiel van der Ree], [[Marco Wiering]] ('''2013'''). ''Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play''. [http://dblp.uni-trier.de/db/conf/adprl/adprl2013.html#ReeW13 ADPRL 2013]
* [http://dblp.uni-trier.de/pers/hd/b/Bom:Luuk Luuk Bom], [http://dblp.uni-trier.de/pers/hd/h/Henken:Ruud Ruud Henken], [[Marco Wiering]] ('''2013'''). ''Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs''. [http://dblp.uni-trier.de/db/conf/adprl/adprl2013.html#BomHW13 ADPRL 2013] <ref>[https://en.wikipedia.org/wiki/Ms._Pac-Man Ms. Pac-Man from Wikipedia]</ref>
* [[Peter Auer]], [[Marcus Hutter]], [[Laurent Orseau]] ('''2013'''). ''[http://drops.dagstuhl.de/opus/volltexte/2013/4340/ Reinforcement Learning]''. [http://dblp.uni-trier.de/db/journals/dagstuhl-reports/dagstuhl-reports3.html#AuerHO13 Dagstuhl Reports, Vol. 3, No. 8], DOI: [http://drops.dagstuhl.de/opus/volltexte/2013/4340/ 10.4230/DagRep.3.8.1], URN: [http://drops.dagstuhl.de/opus/volltexte/2013/4340/ urn:nbn:de:0030-drops-43409]
* [[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]], [[Alex Graves]], [[Ioannis Antonoglou]], [[Daan Wierstra]], [[Martin Riedmiller]] ('''2013'''). ''Playing Atari with Deep Reinforcement Learning''. [http://arxiv.org/abs/1312.5602 arXiv:1312.5602] <ref>[http://www.nervanasys.com/demystifying-deep-reinforcement-learning/ Demystifying Deep Reinforcement Learning] by [http://www.nervanasys.com/author/tambet/ Tambet Matiisen], [http://www.nervanasys.com/ Nervana], December 22, 2015</ref> <ref>[http://www.google.com/patents/US20150100530 Patent US20150100530 - Methods and apparatus for reinforcement learning - Google Patents]</ref>
'''2014'''
* [[Marcin Szubert]] ('''2014'''). ''Coevolutionary Shaping for Reinforcement Learning''. Ph.D. thesis, [https://en.wikipedia.org/wiki/Pozna%C5%84_University_of_Technology Poznań University of Technology], supervisor [[Krzysztof Krawiec]], co-supervisor [[Wojciech Jaśkowski]], [http://www.cs.put.poznan.pl/mszubert/pub/phdthesis.pdf pdf]
==2015 ...==
* [[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]], [[Mathematician#AARusu|Andrei A. Rusu]], [[Joel Veness]], [[Marc G. Bellemare]], [[Alex Graves]], [[Martin Riedmiller]], [[Andreas K. Fidjeland]], [[Georg Ostrovski]], [[Stig Petersen]], [[Charles Beattie]], [[Amir Sadik]], [[Ioannis Antonoglou]], [[Helen King]], [[Dharshan Kumaran]], [[Daan Wierstra]], [[Shane Legg]], [[Demis Hassabis]] ('''2015'''). ''[http://www.nature.com/nature/journal/v518/n7540/abs/nature14236.html Human-level control through deep reinforcement learning]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 518
* [[Tobias Graf]], [[Marco Platzner]] ('''2015'''). ''Adaptive Playouts in Monte Carlo Tree Search with Policy Gradient Reinforcement Learning''. [[Advances in Computer Games 14]]
* [[Arun Nair]], [[Praveen Srinivasan]], [[Sam Blackwell]], [[Cagdas Alcicek]], [[Rory Fearon]], [[Alessandro De Maria]], [[Veda Panneershelvam]], [[Mustafa Suleyman]], [[Charles Beattie]], [[Stig Petersen]], [[Shane Legg]], [[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]] ('''2015'''). ''Massively Parallel Methods for Deep Reinforcement Learning''. [http://arxiv.org/abs/1507.04296 arXiv:1507.04296]
'''2016'''
* [[David Silver]], [[Shih-Chieh Huang|Aja Huang]], [[Chris J. Maddison]], [[Arthur Guez]], [[Laurent Sifre]], [[George van den Driessche]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Veda Panneershelvam]], [[Marc Lanctot]], [[Sander Dieleman]], [[Dominik Grewe]], [[John Nham]], [[Nal Kalchbrenner]], [[Ilya Sutskever]], [[Timothy Lillicrap]], [[Madeleine Leach]], [[Koray Kavukcuoglu]], [[Thore Graepel]], [[Demis Hassabis]] ('''2016'''). ''[http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html Mastering the game of Go with deep neural networks and tree search]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 529 » [[AlphaGo]]
* [[Hung Guei]], [[Tinghan Wei]], [[Jin-Bo Huang]], [[I-Chen Wu]] ('''2016'''). ''An Empirical Study on Applying Deep Reinforcement Learning to the Game 2048''. [[CG 2016]]
* [[Eli David|Omid E. David]], [[Nathan S. Netanyahu]], [[Lior Wolf]] ('''2016'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-319-44781-0_11 DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess]''. [http://icann2016.org/ ICANN 2016], [https://en.wikipedia.org/wiki/Lecture_Notes_in_Computer_Science Lecture Notes in Computer Science], Vol. 9887, [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer], [http://www.cs.tau.ac.il/~wolf/papers/deepchess.pdf pdf preprint] » [[DeepChess]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=61748 DeepChess: Another deep-learning based chess program] by [[Matthew Lai]], [[CCC]], October 17, 2016</ref> <ref>[http://icann2016.org/index.php/conference-programme/recipients-of-the-best-paper-awards/ ICANN 2016 | Recipients of the best paper awards]</ref>
* [[Volodymyr Mnih]], [[Adrià Puigdomènech Badia]], [[Mehdi Mirza]], [[Alex Graves]], [[Timothy Lillicrap]], [[Tim Harley]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Asynchronous Methods for Deep Reinforcement Learning''. [https://arxiv.org/abs/1602.01783 arXiv:1602.01783v2]
* [[Shixiang Gu]], [[Ethan Holly]], [[Timothy Lillicrap]], [[Sergey Levine]] ('''2016'''). ''Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates''. [https://arxiv.org/abs/1610.00633 arXiv:1610.00633]
* [[Max Jaderberg]], [[Volodymyr Mnih]], [[Wojciech Marian Czarnecki]], [[Tom Schaul]], [[Joel Z. Leibo]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Reinforcement Learning with Unsupervised Auxiliary Tasks''. [https://arxiv.org/abs/1611.05397v1 arXiv:1611.05397v1]
* [[Jane X Wang]], [[Zeb Kurth-Nelson]], [[Dhruva Tirumala]], [[Hubert Soyer]], [[Joel Z Leibo]], [[Rémi Munos]], [[Charles Blundell]], [[Dharshan Kumaran]], [[Matthew Botvinick]] ('''2016'''). ''Learning to reinforcement learn''. [https://arxiv.org/abs/1611.05763 arXiv:1611.05763]
* [[Zacharias Georgiou]], [[Evangelos Karountzos]], [[Yaroslav Shkarupa]], [[Matthia Sabatelli]] ('''2016'''). ''A Reinforcement Learning Approach for Solving KRK Chess Endgames''. [https://github.com/paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames/blob/master/project_papers/final_paper/reinforcement-learning-approach(2).pdf pdf] <ref>[https://github.com/paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames GitHub - paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames: Machine Learning - Reinforcement Learning]</ref>
'''2017'''
* [[Hirotaka Kameko]], [[Jun Suzuki]], [[Naoki Mizukami]], [[Yoshimasa Tsuruoka]] ('''2017'''). ''Deep Reinforcement Learning with Hidden Layers on Future States''. [[Conferences#IJCA2017|Computer Games Workshop at IJCAI 2017]], [http://www.lamsade.dauphine.fr/~cazenave/cgw2017/Kameko.pdf pdf]
* [[Marc Lanctot]], [[Vinícius Flores Zambaldi]], [[Audrunas Gruslys]], [[Angeliki Lazaridou]], [[Karl Tuyls]], [[Julien Pérolat]], [[David Silver]], [[Thore Graepel]] ('''2017'''). ''A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning''. [https://arxiv.org/abs/1711.00832 arXiv:1711.00832]
* [[David Silver]], [[Julian Schrittwieser]], [[Karen Simonyan]], [[Ioannis Antonoglou]], [[Shih-Chieh Huang|Aja Huang]], [[Arthur Guez]], [[Thomas Hubert]], [[Lucas Baker]], [[Matthew Lai]], [[Adrian Bolton]], [[Yutian Chen]], [[Timothy Lillicrap]], [[Fan Hui]], [[Laurent Sifre]], [[George van den Driessche]], [[Thore Graepel]], [[Demis Hassabis]] ('''2017'''). ''[https://www.nature.com/nature/journal/v550/n7676/full/nature24270.html Mastering the game of Go without human knowledge]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 550, [https://www.gwern.net/docs/rl/2017-silver.pdf pdf] <ref>[https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo Zero: Learning from scratch] by [[Demis Hassabis]] and [[David Silver]], [[DeepMind]], October 18, 2017</ref>
* [http://www.peterhenderson.co/ Peter Henderson], [https://scholar.google.ca/citations?user=2_4Rs44AAAAJ&hl=en Riashat Islam], [[Philip Bachman]], [[Joelle Pineau]], [[Doina Precup]], [https://scholar.google.ca/citations?user=gFwEytkAAAAJ&hl=en David Meger] ('''2017'''). ''Deep Reinforcement Learning that Matters''. [https://arxiv.org/abs/1709.06560 arXiv:1709.06560]
* [https://scholar.google.com/citations?user=tiE4g64AAAAJ&hl=en Maithra Raghu], [https://scholar.google.com/citations?user=ZZNxNAYAAAAJ&hl=en Alex Irpan], [[Mathematician#JAndreas|Jacob Andreas]], [[Mathematician#RKleinberg|Robert Kleinberg]], [[Quoc V. Le]], [[Jon Kleinberg]] ('''2017'''). ''Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?'' [https://arxiv.org/abs/1711.02301 arXiv:1711.02301]
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]
* [[Kei Takada]], [[Hiroyuki Iizuka]], [[Masahito Yamamoto]] ('''2017'''). ''Reinforcement Learning for Creating Evaluation Function Using Convolutional Neural Network in Hex''. [[TAAI 2017]] » [[Hex]], [[Neural Networks#Convolutional|CNN]]
* [[Ari Weinstein]], [[Matthew Botvinick]] ('''2017'''). ''Structure Learning in Motor Control: A Deep Reinforcement Learning Model''. [https://arxiv.org/abs/1706.06827 arXiv:1706.06827]
* [[Takuya Hiraoka]], [https://dblp.org/pers/hd/t/Tsuchida:Masaaki Masaaki Tsuchida], [https://dblp.org/pers/hd/w/Watanabe:Yotaro Yotaro Watanabe] ('''2017'''). ''Deep Reinforcement Learning for Inquiry Dialog Policies with Logical Formula Embeddings''. [https://arxiv.org/abs/1708.00667 arXiv:1708.00667]
* [[William Uther]] ('''2017'''). ''[https://link.springer.com/referenceworkentry/10.1007/978-1-4899-7687-1_512 Markov Decision Processes]''. in [https://en.wikipedia.org/wiki/Claude_Sammut Claude Sammut], [https://en.wikipedia.org/wiki/Geoff_Webb Geoffrey I. Webb] (eds) ('''2017'''). ''[https://link.springer.com/referencework/10.1007%2F978-1-4899-7687-1 Encyclopedia of Machine Learning and Data Mining]''. [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
* [https://scholar.google.com/citations?user=zLksndkAAAAJ&hl=en Jayvant Anantpur], [https://dblp.org/pid/09/10702.html Nagendra Gulur Dwarakanath], [https://dblp.org/pid/16/4410.html Shivaram Kalyanakrishnan], [[Shalabh Bhatnagar]], [https://dblp.org/pid/45/3592.html R. Govindarajan] ('''2017'''). ''RLWS: A Reinforcement Learning based GPU Warp Scheduler''. [https://arxiv.org/abs/1712.04303 arXiv:1712.04303]
'''2018'''
* [[Hui Wang]], [[Michael Emmerich]], [[Aske Plaat]] ('''2018'''). ''Monte Carlo Q-learning for General Game Playing''. [https://arxiv.org/abs/1802.05944 arXiv:1802.05944] » [[Monte-Carlo Tree Search|MCTS]], [[General Game Playing]]
* [[Vinícius Flores Zambaldi]], [[David Raposo]], [[Adam Santoro]], [[Victor Bapst]], [[Yujia Li]], [[Igor Babuschkin]], [[Karl Tuyls]], [[David P. Reichert]], [[Timothy Lillicrap]], [[Edward Lockhart]], [[Murray Shanahan]], [[Victoria Langston]], [[Razvan Pascanu]], [[Matthew Botvinick]], [[Oriol Vinyals]], [[Peter W. Battaglia]] ('''2018'''). ''Relational Deep Reinforcement Learning''. [https://arxiv.org/abs/1806.01830 arXiv:1806.01830]
* [[Hui Wang]], [[Michael Emmerich]], [[Aske Plaat]] ('''2018'''). ''Assessing the Potential of Classical Q-learning in General Game Playing''. [https://arxiv.org/abs/1810.06078 arXiv:1810.06078]
* [https://scholar.google.com/citations?user=n12uNYcAAAAJ&hl=en Vincent Francois-Lavet], [https://scholar.google.com/citations?user=dy_JBs0AAAAJ&hl=en Peter Henderson], [https://scholar.google.ca/citations?user=2_4Rs44AAAAJ&hl=en Riashat Islam], [https://scholar.google.com/citations?user=uyYPun0AAAAJ&hl=en Marc G. Bellemare], [[Joelle Pineau]] ('''2018'''). ''An Introduction to Deep Reinforcement Learning''. [https://arxiv.org/abs/1811.12560 arXiv:1811.12560]
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419 <ref>[https://deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/ AlphaZero: Shedding new light on the grand games of chess, shogi and Go] by [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]] and [[Demis Hassabis]], [[DeepMind]], December 03, 2018</ref>
* [[Tianhe Wang]], [[Tomoyuki Kaneko]] ('''2018'''). ''Application of Deep Reinforcement Learning in Werewolf Game Agents''. [[TAAI 2018]]
* [[Taichi Nakayashiki]], [[Tomoyuki Kaneko]] ('''2018'''). ''Learning of Evaluation Functions via Self-Play Enhanced by Checkmate Search''. [[TAAI 2018]]
* [[Hung Guei]], [[Ting-Han Wei]], [[I-Chen Wu]] ('''2018'''). ''Using 2048-like games as a pedagogical tool for reinforcement learning''. [[CG 2018]], [[ICGA Journal#40_3|ICGA Journal, Vol. 40, No. 3]]
'''2019'''
* [https://scholar.google.co.uk/citations?user=OAkRr-YAAAAJ&hl=en Sanjeevan Ahilan], [[Peter Dayan]] ('''2019'''). ''Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning''. [https://arxiv.org/abs/1901.08492 arXiv:1901.08492]
* [https://scholar.google.co.in/citations?user=1QlrvHkAAAAJ&hl=en Chandramouli Kamanchi], [https://scholar.google.co.in/citations?user=nx4NlpsAAAAJ&hl=en Raghuram Bharadwaj Diddigi], [[Shalabh Bhatnagar]] ('''2019'''). ''Successive Over Relaxation Q-Learning''. [https://arxiv.org/abs/1903.03812 arXiv:1903.03812]
* [https://scholar.google.co.in/citations?user=1QlrvHkAAAAJ&hl=en Chandramouli Kamanchi], [https://scholar.google.co.in/citations?user=nx4NlpsAAAAJ&hl=en Raghuram Bharadwaj Diddigi], [[Shalabh Bhatnagar]] ('''2019'''). ''Second Order Value Iteration in Reinforcement Learning''. [https://arxiv.org/abs/1905.03927 arXiv:1905.03927]
* [[Marc Lanctot]], [[Edward Lockhart]], [[Jean-Baptiste Lespiau]], [[Vinícius Flores Zambaldi]], [[Satyaki Upadhyay]], [[Julien Pérolat]], [[Sriram Srinivasan]], [[Finbarr Timbers]], [[Karl Tuyls]], [[Shayegan Omidshafiei]], [[Daniel Hennes]], [[Dustin Morrill]], [[Paul Muller]], [[Timo Ewalds]], [[Ryan Faulkner]], [[János Kramár]], [[Bart De Vylder]], [[Brennan Saeta]], [[James Bradbury]], [[David Ding]], [[Sebastian Borgeaud]], [[Matthew Lai]], [[Julian Schrittwieser]], [[Thomas Anthony]], [[Edward Hughes]], [[Ivo Danihelka]], [[Jonah Ryan-Davis]] ('''2019'''). ''OpenSpiel: A Framework for Reinforcement Learning in Games''. [https://arxiv.org/abs/1908.09453 arXiv:1908.09453] <ref>[https://github.com/deepmind/open_spiel/blob/master/docs/contributing.md open_spiel/contributing.md at master · deepmind/open_spiel · GitHub]</ref>
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2019'''). ''Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model''. [https://arxiv.org/abs/1911.08265 arXiv:1911.08265] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=2&t=72381 New DeepMind paper] by GregNeto, [[CCC]], November 21, 2019</ref>
* [[Mathematician#SrbhBose|Sourabh Bose]] ('''2019'''). ''[https://rc.library.uta.edu/uta-ir/handle/10106/28094 Learning Representations Using Reinforcement Learning]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_Texas_at_Arlington University of Texas at Arlington], advisor [[Mathematician#MHuber|Manfred Huber]] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72810&start=6 Re: Board adaptive / tuning evaluation function - no NN/AI] by Tony P., [[CCC]], January 15, 2020</ref>
* [[Johannes Czech]] ('''2019'''). ''Deep Reinforcement Learning for Crazyhouse''. Master thesis, [[Darmstadt University of Technology|TU Darmstadt]], [https://ml-research.github.io/papers/czech2019deep.pdf pdf] » [[CrazyAra]]
==2020 ...==
* [[Hung Guei]], [[Ting-Han Wei]], [[I-Chen Wu]] ('''2020'''). ''2048-like games for teaching reinforcement learning''. [[ICGA Journal#42_1|ICGA Journal, Vol. 42, No. 1]]
* [https://dblp.org/pid/233/8144.html Indu John], [https://scholar.google.co.in/citations?user=1QlrvHkAAAAJ&hl=en Chandramouli Kamanchi], [[Shalabh Bhatnagar]] ('''2020'''). ''Generalized Speedy Q-Learning''. [[IEEE#CSL|IEEE Control Systems Letters]], Vol. 4, No. 3, [https://arxiv.org/abs/1911.00397 arXiv:1911.00397]
* [[Takuya Hiraoka]], [https://dblp.org/pers/hd/i/Imagawa:Takahisa Takahisa Imagawa], [https://dblp.org/pers/hd/t/Tangkaratt:Voot Voot Tangkaratt], [https://dblp.org/pers/hd/o/Osa:Takayuki Takayuki Osa], [https://dblp.org/pers/hd/o/Onishi:Takashi Takashi Onishi], [https://dblp.org/pers/hd/t/Tsuruoka:Yoshimasa Yoshimasa Tsuruoka] ('''2020'''). ''Meta-Model-Based Meta-Policy Optimization''. [https://arxiv.org/abs/2006.02608 arXiv:2006.02608]
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2020'''). ''[https://www.nature.com/articles/s41586-020-03051-4 Mastering Atari, Go, chess and shogi by planning with a learned model]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 588 <ref>[https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules?fbclid=IwAR3mSwrn1YXDKr9uuGm2GlFKh76wBilex7f8QvBiQecwiVmAvD6Bkyjx-rE MuZero: Mastering Go, chess, shogi and Atari without rules]</ref> <ref>[https://github.com/koulanurag/muzero-pytorch GitHub - koulanurag/muzero-pytorch: Pytorch Implementation of MuZero]</ref>
+ | * [[Tristan Cazenave]], [[Yen-Chi Chen]], [[Guan-Wei Chen]], [[Shi-Yu Chen]], [[Xian-Dong Chiu]], [[Julien Dehos]], [[Maria Elsa]], [[Qucheng Gong]], [[Hengyuan Hu]], [[Vasil Khalidov]], [[Cheng-Ling Li]], [[Hsin-I Lin]], [[Yu-Jin Lin]], [[Xavier Martinet]], [[Vegard Mella]], [[Jeremy Rapin]], [[Baptiste Roziere]], [[Gabriel Synnaeve]], [[Fabien Teytaud]], [[Olivier Teytaud]], [[Shi-Cheng Ye]], [[Yi-Jun Ye]], [[Shi-Jim Yen]], [[Sergey Zagoruyko]] ('''2020'''). ''Polygames: Improved zero learning''. [[ICGA Journal#42_4|ICGA Journal, Vol. 42, No. 4]], [https://arxiv.org/abs/2001.09832 arXiv:2001.09832], [https://arxiv.org/abs/2001.09832 arXiv:2001.09832] | ||
+ | * [[Matthia Sabatelli]], [https://github.com/glouppe Gilles Louppe], [https://scholar.google.com/citations?user=tyFTsmIAAAAJ&hl=en Pierre Geurts], [[Marco Wiering]] ('''2020'''). ''The Deep Quality-Value Family of Deep Reinforcement Learning Algorithms''. [https://dblp.org/db/conf/ijcnn/ijcnn2020.html#SabatelliLGW20 IJCNN 2020] <ref>[https://github.com/paintception/Deep-Quality-Value-DQV-Learning- GitHub - paintception/Deep-Quality-Value-DQV-Learning-: DQV-Learning: a novel faster synchronous Deep Reinforcement Learning algorithm]</ref> | ||
+ | * [[Quentin Cohen-Solal]] ('''2020'''). ''Learning to Play Two-Player Perfect-Information Games without Knowledge''. [https://arxiv.org/abs/2008.01188 arXiv:2008.01188] | ||
+ | * [[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2020'''). ''Minimax Strikes Back''. [https://arxiv.org/abs/2012.10700 arXiv:2012.10700] | ||
+ | '''2021''' | ||
+ | * [[Maximilian Alexander Gehrke]] ('''2021'''). ''Assessing Popular Chess Variants Using Deep Reinforcement Learning''. Master thesis, [[Darmstadt University of Technology|TU Darmstadt]], [https://ml-research.github.io/papers/gehrke2021assessing.pdf pdf] » [[CrazyAra]] | ||
+ | * [[Dominik Klein]] ('''2021'''). ''[https://github.com/asdfjkl/neural_network_chess Neural Networks For Chess]''. [https://github.com/asdfjkl/neural_network_chess/releases/tag/v1.1 Release Version 1.1 · GitHub] <ref>[https://www.talkchess.com/forum3/viewtopic.php?f=2&t=78283 Book about Neural Networks for Chess] by dkl, [[CCC]], September 29, 2021</ref> | ||
+ | * [[Quentin Cohen-Solal]], [[Tristan Cazenave]] ('''2021'''). ''DESCENT wins five gold medals at the Computer Olympiad''. [[ICGA Journal#43_2|ICGA Journal, Vol. 43, No. 2]] | ||
+ | * [[Boris Doux]], [[Benjamin Negrevergne]], [[Tristan Cazenave]] ('''2021'''). ''Deep Reinforcement Learning for Morpion Solitaire''. [[Advances in Computer Games 17]] | ||
+ | * [[Weirui Ye]], [[Shaohuai Liu]], [[Thanard Kurutach]], [[Pieter Abbeel]], [[Yang Gao]] ('''2021'''). ''Mastering Atari Games with Limited Data''. [https://arxiv.org/abs/2111.00210 arXiv:2111.00210] <ref>[https://github.com/YeWR/EfficientZero GitHub - YeWR/EfficientZero: Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021]</ref> <ref>[https://www.talkchess.com/forum3/viewtopic.php?f=7&t=78790 Want to train nets faster?] by [[Dann Corbit]], [[CCC]], December 01, 2021</ref> | ||
+ | * [[Dennis Soemers]], [[Vegard Mella]], [[Cameron Browne]], [[Olivier Teytaud]] ('''2021'''). ''Deep learning for general game playing with Ludii and Polygames''. [[ICGA Journal#43_3|ICGA Journal, Vol. 43, No. 3]] | ||
=Postings= | =Postings= | ||
+ | ==1995 ...== | ||
* [https://www.stmintz.com/ccc/index.php?id=28584 Parameter Tuning] by [[Jonathan Baxter]], [[CCC]], October 01, 1998 » [[KnightCap]] | * [https://www.stmintz.com/ccc/index.php?id=28584 Parameter Tuning] by [[Jonathan Baxter]], [[CCC]], October 01, 1998 » [[KnightCap]] | ||
+ | : [https://www.stmintz.com/ccc/index.php?id=28819 Re: Parameter Tuning] by [[Don Beal]], [[CCC]], October 02, 1998 | ||
+ | ==2000 ...== | ||
+ | * [https://www.stmintz.com/ccc/index.php?id=117970 Pseudo-code for TD learning] by [[Daniel Homan]], [[CCC]], July 06, 2000 » [[Temporal Difference Learning]] | ||
* [https://www.stmintz.com/ccc/index.php?id=147377 any good experiences with genetic algos or temporal difference learning?] by [[Rafael B. Andrist]], [[CCC]], January 01, 2001 | * [https://www.stmintz.com/ccc/index.php?id=147377 any good experiences with genetic algos or temporal difference learning?] by [[Rafael B. Andrist]], [[CCC]], January 01, 2001 | ||
+ | * [https://www.stmintz.com/ccc/index.php?id=401974 Temporal Differences] by [[Peter Fendrich]], [[CCC]], December 21, 2004 | ||
+ | ==2010 ...== | ||
* [http://talkchess.com/forum/viewtopic.php?t=56913 *First release* Giraffe, a new engine based on deep learning] by [[Matthew Lai]], [[CCC]], July 08, 2015 » [[Deep Learning]], [[Giraffe]] | * [http://talkchess.com/forum/viewtopic.php?t=56913 *First release* Giraffe, a new engine based on deep learning] by [[Matthew Lai]], [[CCC]], July 08, 2015 » [[Deep Learning]], [[Giraffe]] | ||
* [http://www.nervanasys.com/demystifying-deep-reinforcement-learning/ Demystifying Deep Reinforcement Learning] by [http://www.nervanasys.com/author/tambet/ Tambet Matiisen], [http://www.nervanasys.com/ Nervana], December 22, 2015 | * [http://www.nervanasys.com/demystifying-deep-reinforcement-learning/ Demystifying Deep Reinforcement Learning] by [http://www.nervanasys.com/author/tambet/ Tambet Matiisen], [http://www.nervanasys.com/ Nervana], December 22, 2015 | ||
Line 142: | Line 199: | ||
* [http://www.talkchess.com/forum/viewtopic.php?t=65909 Google's AlphaGo team has been working on chess] by [[Peter Kappler]], [[CCC]], December 06, 2017 » [[AlphaZero]] | * [http://www.talkchess.com/forum/viewtopic.php?t=65909 Google's AlphaGo team has been working on chess] by [[Peter Kappler]], [[CCC]], December 06, 2017 » [[AlphaZero]] | ||
* [http://www.talkchess.com/forum/viewtopic.php?t=65990 Understanding the power of reinforcement learning] by [[Michael Sherwin]], [[CCC]], December 12, 2017 | * [http://www.talkchess.com/forum/viewtopic.php?t=65990 Understanding the power of reinforcement learning] by [[Michael Sherwin]], [[CCC]], December 12, 2017 | ||
+ | ==2020 ...== | ||
+ | * [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72810 Board adaptive / tuning evaluation function - no NN/AI] by Moritz Gedig, [[CCC]], January 14, 2020 | ||
+ | * [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75411 Unsupervised reinforcement tuning from zero] by Madeleine Birchfield, [[CCC]], October 16, 2020 » [[Automated Tuning]] | ||
+ | * [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=75606 Transhuman Chess with NN and RL...] by [[Srdja Matovic]], [[CCC]], October 30, 2020 » [[Neural Networks|NN]] | ||
+ | * [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=76465 Reinforcement learning project] by [[Harm Geert Muller]], [[CCC]], January 31, 2021 » [[Texel's Tuning Method]] | ||
=External Links= | =External Links= | ||
Line 174: | Line 236: | ||
* [http://videolectures.net/deeplearning2016_pineau_reinforcement_learning/ Introduction to Reinforcement Learning] by [[Joelle Pineau]], [[McGill University]], 2016, [https://en.wikipedia.org/wiki/YouTube YouTube] Video | * [http://videolectures.net/deeplearning2016_pineau_reinforcement_learning/ Introduction to Reinforcement Learning] by [[Joelle Pineau]], [[McGill University]], 2016, [https://en.wikipedia.org/wiki/YouTube YouTube] Video | ||
: {{#evu:https://www.youtube.com/watch?v=O_1Z63EDMvQ|alignment=left|valignment=top}} | : {{#evu:https://www.youtube.com/watch?v=O_1Z63EDMvQ|alignment=left|valignment=top}} | ||
+ | ==GitHub== | ||
+ | * [https://github.com/deepmind/open_spiel GitHub - deepmind/open_spiel: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games] <ref>[[Marc Lanctot]], [[Edward Lockhart]], [[Jean-Baptiste Lespiau]], [[Vinícius Flores Zambaldi]], [[Satyaki Upadhyay]], [[Julien Pérolat]], [[Sriram Srinivasan]], [[Finbarr Timbers]], [[Karl Tuyls]], [[Shayegan Omidshafiei]], [[Daniel Hennes]], [[Dustin Morrill]], [[Paul Muller]], [[Timo Ewalds]], [[Ryan Faulkner]], [[János Kramár]], [[Bart De Vylder]], [[Brennan Saeta]], [[James Bradbury]], [[David Ding]], [[Sebastian Borgeaud]], [[Matthew Lai]], [[Julian Schrittwieser]], [[Thomas Anthony]], [[Edward Hughes]], [[Ivo Danihelka]], [[Jonah Ryan-Davis]] ('''2019'''). ''OpenSpiel: A Framework for Reinforcement Learning in Games''. [https://arxiv.org/abs/1908.09453 arXiv:1908.09453]</ref> | ||
+ | ** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/algorithms open_spiel/open_spiel/algorithms at master · deepmind/open_spiel · GitHub] | ||
+ | *** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/algorithms/alpha_zero open_spiel/open_spiel/algorithms/alpha_zero at master · deepmind/open_spiel · GitHub] | ||
+ | ** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/games open_spiel/open_spiel/games at master · deepmind/open_spiel · GitHub] | ||
+ | *** [https://github.com/deepmind/open_spiel/tree/master/open_spiel/games/chess open_spiel/open_spiel/games/chess at master · deepmind/open_spiel · GitHub] | ||
+ | * [https://github.com/koulanurag/muzero-pytorch GitHub - koulanurag/muzero-pytorch: Pytorch Implementation of MuZero] <ref>[[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2020'''). ''[https://www.nature.com/articles/s41586-020-03051-4 Mastering Atari, Go, chess and shogi by planning with a learned model]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 588</ref> | ||
+ | * [https://github.com/YeWR/EfficientZero GitHub - YeWR/EfficientZero: Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021] <ref>[[Weirui Ye]], [[Shaohuai Liu]], [[Thanard Kurutach]], [[Pieter Abbeel]], [[Yang Gao]] ('''2021'''). ''Mastering Atari Games with Limited Data''. [https://arxiv.org/abs/2111.00210 arXiv:2111.00210]</ref> | ||
+ | * [https://github.com/facebookarchive/Polygames GitHub - facebookarchive/Polygames: The project is a platform of zero learning with a library of games] | ||
=References= | =References= | ||
<references /> | <references /> | ||
− | |||
'''[[Learning|Up one Level]]''' | '''[[Learning|Up one Level]]''' | ||
+ | [[Category:Videos]] |
Latest revision as of 11:47, 14 March 2022
Home * Learning * Reinforcement Learning
Reinforcement Learning is a learning paradigm inspired by behaviourist psychology and classical conditioning - learning by trial and error, interacting with an environment to map situations to actions in such a way that some notion of cumulative reward is maximized. In computer games, reinforcement learning deals with adjusting feature weights based on game results, or on subsequent predictions, during self-play.
Reinforcement learning is indebted to the idea of Markov decision processes (MDPs) in the field of optimal control, utilizing dynamic programming techniques. The crucial exploration-exploitation tradeoff known from multi-armed bandit problems - choosing between "exploitation" of the machine with the highest expected payoff and "exploration" to gain more information about the expected payoffs of the other machines - is also faced in reinforcement learning, and is likewise addressed by UCT in Monte-Carlo Tree Search.
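As an aside, the UCB1 rule often used to manage this tradeoff fits in a few lines. The following Python sketch is illustrative only; the function name and interface are assumptions, not taken from any of the works cited on this page.

```python
import math

# Minimal sketch of the UCB1 selection rule from the bandit literature,
# as used (in modified form) by UCT. Illustrative assumptions throughout.
def ucb1_select(counts, values, c=math.sqrt(2)):
    """counts[a]: plays of arm a; values[a]: empirical mean payoff of arm a."""
    total = sum(counts)
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before comparing bonuses
    # Pick the arm maximizing mean payoff plus an exploration bonus that
    # shrinks as the arm is sampled more often.
    return max(range(len(counts)),
               key=lambda a: values[a] + c * math.sqrt(math.log(total) / counts[a]))
```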
Q-Learning
Q-Learning, introduced by Chris Watkins in 1989, is a simple way for agents to learn how to act optimally in controlled Markovian domains [2]. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely [3]. Q-learning has been successfully applied to deep learning by a Google DeepMind team in playing some Atari 2600 games as published in Nature, 2015, dubbed deep reinforcement learning or deep Q-networks [4], soon followed by the spectacular AlphaGo and AlphaZero breakthroughs.
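To make the update rule concrete, here is a minimal tabular Q-learning sketch in Python on a toy chain-walk problem; the environment, constants, and names are illustrative assumptions rather than code from the papers above.

```python
import random

# Tabular Q-learning sketch on a toy deterministic chain:
# states 0..4, actions 0 (left) and 1 (right), reward 1.0 only on
# reaching the terminal state 4. All constants are illustrative.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic dynamics: move left or right along the chain."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def greedy(state):
    """Greedy action with random tie-breaking."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2, r = step(s, a)
        # Core update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

print({s: greedy(s) for s in range(N_STATES - 1)})  # learned policy: always right
```

The essential line is the update toward r + γ·max Q(s′, a′); everything else is scaffolding for the toy domain.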
Temporal Difference Learning
see main page Temporal Difference Learning
Q-learning at its simplest uses tables to store data. This very quickly loses viability as the state/action space of the system being monitored or controlled grows. One solution to this problem is to use an (adapted) artificial neural network as a function approximator, as demonstrated by Gerald Tesauro in his Backgammon playing temporal difference learning research [5] [6].
Temporal Difference Learning is a prediction method primarily used for reinforcement learning. In the domain of computer games and computer chess, TD learning is applied through self-play: the program predicts the probability of winning at each position along the sequence of moves from the initial position to the end of the game, and adjusts its weights so that earlier predictions agree more closely with later, more reliable ones.
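As a rough illustration of such weight adjustment, the following Python sketch implements TD(λ) updates for a linear evaluation squashed to a winning probability. The features() placeholder, the constants, and the interface are assumptions for exposition, not any engine's actual code.

```python
import numpy as np

ALPHA, LAMBDA = 0.01, 0.7  # learning rate and trace-decay parameter

def features(position):
    # Placeholder feature extraction: here positions are already feature vectors.
    return np.asarray(position, dtype=float)

def predict(w, position):
    # Win-probability estimate: squash the linear evaluation into (0, 1).
    return 1.0 / (1.0 + np.exp(-w @ features(position)))

def td_lambda_update(w, positions, outcome):
    """One game's worth of TD(lambda) updates: nudge each prediction toward the next."""
    trace = np.zeros_like(w)
    for t in range(len(positions)):
        p_t = predict(w, positions[t])
        # Target: the next prediction, or the actual game result at the end.
        p_next = predict(w, positions[t + 1]) if t + 1 < len(positions) else outcome
        grad = features(positions[t]) * p_t * (1.0 - p_t)  # d p_t / d w
        trace = LAMBDA * trace + grad                      # eligibility trace
        w = w + ALPHA * (p_next - p_t) * trace             # TD weight update
    return w

# Example: a three-position "game" with two features, won by the first player.
w = td_lambda_update(np.zeros(2), [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], outcome=1.0)
```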
See also
- AlphaZero
- Automated Tuning
- Deep Learning
- Dynamic Programming
- Markov Models by Michael L. Littman
- MENACE by Donald Michie
- Monte-Carlo Tree Search
Selected Publications
1954 ...
- Richard E. Bellman (1954). On a new Iterative Algorithm for Finding the Solutions of Games and Linear Programming Problems. Technical Report P-473, RAND Corporation, U. S. Air Force Project RAND
- Arthur Samuel (1959). Some Studies in Machine Learning Using the Game of Checkers. IBM Journal July 1959
1960 ...
- Richard E. Bellman (1960). Sequential Machines, Ambiguity, and Dynamic Programming. Journal of the ACM, Vol. 7, No. 1
- Ronald A. Howard (1960). Dynamic Programming and Markov Processes. MIT Press, amazon
- Donald Michie (1961). Trial and Error. Penguin Science Survey
- Donald Michie, Roger A. Chambers (1968). Boxes: An experiment on adaptive control. Machine Intelligence 2, Edinburgh: Oliver & Boyd, pdf
1970 ...
- A. Harry Klopf (1972). Brain Function and Adaptive Systems - A Heterostatic Theory. Air Force Cambridge Research Laboratories, Special Reports, No. 133, pdf
- John H. Holland (1975). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. amazon.com
- Ian H. Witten (1977). An Adaptive Optimal Controller for Discrete-Time Markov Environments. Information and Control, Vol. 34, No. 4, pdf
1980 ...
- Richard Sutton (1984). Temporal Credit Assignment in Reinforcement Learning. Ph.D. dissertation, University of Massachusetts
- Leslie Valiant (1984). A Theory of the Learnable. Communications of the ACM, Vol. 27, No. 11, pdf
- Chris Watkins (1989). Learning from Delayed Rewards. Ph.D. thesis, Cambridge University, pdf
1990 ...
- Richard Sutton, Andrew Barto (1990). Time Derivative Models of Pavlovian Reinforcement. Learning and Computational Neuroscience: Foundations of Adaptive Networks: 497-537
- Jürgen Schmidhuber (1990). Reinforcement Learning in Markovian and Non-Markovian Environments. NIPS 1990, pdf
- Peter Dayan (1991). Reinforcing Connectionism: Learning the Statistical Way. Ph.D. thesis, University of Edinburgh
- Chris Watkins, Peter Dayan (1992). Q-learning. Machine Learning, Vol. 8, No. 2
- Gerald Tesauro (1992). Temporal Difference Learning of Backgammon Strategy. ML 1992
- Justin A. Boyan, Michael L. Littman (1993). Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach. NIPS 1993, pdf
- Michael L. Littman (1994). Markov Games as a Framework for Multi-Agent Reinforcement Learning. International Conference on Machine Learning, pdf
1995 ...
- Marco Wiering (1995). TD Learning of Game Evaluation Functions with Hierarchical Neural Architectures. Master's thesis, University of Amsterdam, pdf
- Gerald Tesauro (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM, Vol. 38, No. 3
- Leemon C. Baird III, Mance E. Harmon, A. Harry Klopf (1996). Reinforcement Learning: An Alternative Approach to Machine Intelligence. pdf
- Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore (1996). Reinforcement Learning: A Survey. JAIR Vol. 4, pdf
- Robert Levinson (1996). General Game-Playing and Reinforcement Learning. Computational Intelligence, Vol. 12, No. 1
- David E. Moriarty, Risto Miikkulainen (1996). Efficient Reinforcement Learning through Symbiotic Evolution. Machine Learning, Vol. 22
- Ronald Parr, Stuart Russell (1997). Reinforcement Learning with Hierarchies of Machines. In Advances in Neural Information Processing Systems 10, MIT Press, zipped ps
- William Uther, Manuela M. Veloso (1997). Adversarial Reinforcement Learning. Carnegie Mellon University, ps
- William Uther, Manuela M. Veloso (1997). Generalizing Adversarial Reinforcement Learning. Carnegie Mellon University, ps
- Marco Wiering, Jürgen Schmidhuber (1997). HQ-learning. Adaptive Behavior, Vol. 6, No 2
- Csaba Szepesvári (1998). Reinforcement Learning: Theory and Practice. Proceedings of the 2nd Slovak Conference on Artificial Neural Networks, zipped ps
- Richard Sutton, Andrew Barto (1998). Reinforcement Learning: An Introduction. MIT Press
- Vassilis Papavassiliou, Stuart Russell (1999). Convergence of reinforcement learning with general function approximators. In Proc. IJCAI-99, Stockholm, ps
- Marco Wiering (1999). Explorations in Efficient Reinforcement Learning. Ph.D. thesis, University of Amsterdam, advisors Frans Groen and Jürgen Schmidhuber
- Richard Sutton, Doina Precup, Satinder Singh (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, Vol. 112, pdf
2000 ...
- Sebastian Thrun, Michael L. Littman (2000). A Review of Reinforcement Learning. AI Magazine, Vol. 21, No. 1
- Robert Levinson, Ryan Weber (2000). Chess Neighborhoods, Function Combination, and Reinforcement Learning. CG 2000
- Andrew Ng, Stuart Russell (2000). Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning, Stanford, California: Morgan Kaufmann, pdf
- Dean F. Hougen, Maria Gini, James R. Slagle (2000). An Integrated Connectionist Approach to Reinforcement Learning for Robotic Control. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
- Jonathan Baxter, Peter Bartlett (2000). Reinforcement Learning on POMDPs via Direct Gradient Ascent. ICML 2000, pdf
- Doina Precup (2000). Temporal Abstraction in Reinforcement Learning. Ph.D. Dissertation, Department of Computer Science, University of Massachusetts, Amherst.
- Robert Levinson, Ryan Weber (2001). Chess Neighborhoods, Function Combination, and Reinforcement Learning. In Computers and Games (eds. Tony Marsland and I. Frank), Lecture Notes in Computer Science, Springer, pdf
- Marco Block-Berlitz (2003). Reinforcement Learning in der Schachprogrammierung. Studienarbeit, Freie Universität Berlin, advisor Raúl Rojas, pdf (German)
- Henk Mannen (2003). Learning to play chess using reinforcement learning with database games. Master’s thesis, Cognitive Artificial Intelligence, Utrecht University, pdf
- Joelle Pineau, Geoffrey Gordon, Sebastian Thrun (2003). Point-based value iteration: An anytime algorithm for POMDPs. IJCAI, pdf
- Amy J. Kerr, Todd W. Neller, Christopher J. La Pilla, Michael D. Schompert (2003). Java Resources for Teaching Reinforcement Learning. PDPTA 2003
- Yngvi Björnsson, Vignir Hafsteinsson, Ársæll Jóhannsson, Einar Jónsson (2004). Efficient Use of Reinforcement Learning in a Computer Game. In Computer Games: Artificial Intelligence, Design and Education (CGAIDE'04), pp. 379-383, pdf
- Imran Ghory (2004). Reinforcement learning in board games. CSTR-04-004, Department of Computer Science, University of Bristol. pdf [7]
- Eric Wiewiora (2004). Efficient Exploration for Reinforcement Learning. MSc thesis, pdf
- Albert Xin Jiang (2004). Multiagent Reinforcement Learning in Stochastic Games with Continuous Action Spaces. pdf
2005 ...
- Sylvain Gelly, Jérémie Mary, Olivier Teytaud (2006). Learning for stochastic dynamic programming. pdf, pdf
- Sylvain Gelly (2007). A Contribution to Reinforcement Learning; Application to Computer Go. Ph.D. thesis, pdf
- Yong Duan, Baoxia Cui, Xinhe Xu (2007). State Space Partition for Reinforcement Learning Based on Fuzzy Min-Max Neural Network. ISNN 2007
- Yasuhiro Osaki, Kazutomo Shibahara, Yasuhiro Tajima, Yoshiyuki Kotani (2007). Reinforcement Learning of Evaluation Functions Using Temporal Difference-Monte Carlo learning method. 12th Game Programming Workshop
- David Silver, Richard Sutton, Martin Müller (2007). Reinforcement learning of local shape in the game of Go. 20th IJCAI, pdf
- Marco Block, Maro Bader, Ernesto Tapia, Marte Ramírez, Ketill Gunnarsson, Erik Cuevas, Daniel Zaldivar, Raúl Rojas (2008). Using Reinforcement Learning in Chess Engines. CONCIBE SCIENCE 2008, Research in Computing Science: Special Issue in Electronics and Biomedical Engineering, Computer Science and Informatics, ISSN:1870-4069, Vol. 35, pp. 31-40, Guadalajara, Mexico, pdf
- Cécile Germain-Renaud, Julien Pérez, Balázs Kégl, Charles Loomis (2008). Grid Differentiated Services: a Reinforcement Learning Approach. In 8th IEEE Symposium on Cluster Computing and the Grid. Lyon, pdf
- Balázs Csanád Csáji, László Monostori (2008). Value function based reinforcement learning in changing Markovian environments. Journal of Machine Learning Research, Vol. 9, pdf
- David Silver (2009). Reinforcement Learning and Simulation-Based Search. Ph.D. thesis, University of Alberta. pdf
- Marcin Szubert (2009). Coevolutionary Reinforcement Learning and its Application to Othello. M.Sc. thesis, Poznań University of Technology, supervisor Krzysztof Krawiec, pdf
- Joelle Pineau, Geoffrey Gordon, Sebastian Thrun (2006, 2011). Anytime Point-Based Approximations for Large POMDPs. Journal of Artificial Intelligence Research, Vol. 27, arXiv:1110.0027
2010 ...
- Joel Veness, Kee Siong Ng, Marcus Hutter, David Silver (2010). Reinforcement Learning via AIXI Approximation. Association for the Advancement of Artificial Intelligence (AAAI), pdf
- Julien Pérez, Cécile Germain-Renaud, Balázs Kégl, Charles Loomis (2010). Multi-objective Reinforcement Learning for Responsive Grids. In The Journal of Grid Computing. pdf
- Csaba Szepesvári (2010). Algorithms for Reinforcement Learning. Morgan & Claypool
- Julio H. Zaragoza, Eduardo F. Morales (2010). Relational Reinforcement Learning with Continuous Actions by Combining Behavioral Cloning and Locally Weighted Regression. Journal of Intelligent Systems and Applications, Vol. 2
2011
- Peter Auer (2011). Exploration and Exploitation in Online Learning. ICAIS 2011
- Charles Elkan (2011). Reinforcement Learning with a Bilinear Q Function. EWRL 2011
2012
- Marco Wiering, Martijn Van Otterlo (2012). Reinforcement learning: State-of-the-art. Adaptation, Learning, and Optimization, Vol. 12, Springer
- István Szita (2012). Reinforcement Learning in Games. Chapter 17
- Thomas J. Walsh, István Szita, Carlos Diuk, Michael L. Littman (2012). Exploring compact reinforcement-learning representations with linear regression. arXiv:1205.2606
- Arthur Guez, David Silver, Peter Dayan (2012). Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search. NIPS 2012
2013
- Arthur Guez, David Silver, Peter Dayan (2013). Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search. Journal of Artificial Intelligence Research, Vol. 48, pdf
- Michiel van der Ree, Marco Wiering (2013). Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play. ADPRL 2013
- Luuk Bom, Ruud Henken, Marco Wiering (2013). Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs. ADPRL 2013 [8]
- Peter Auer, Marcus Hutter, Laurent Orseau (2013). Reinforcement Learning. Dagstuhl Reports, Vol. 3, No. 8, DOI: 10.4230/DagRep.3.8.1, URN: urn:nbn:de:0030-drops-43409
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller (2013). Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602 [9] [10]
- Ari Weinstein (2013). Local Planning For Continuous Markov Decision Processes. Ph.D. thesis, Rutgers University, advisor Michael L. Littman, pdf
2014
- Marcin Szubert (2014). Coevolutionary Shaping for Reinforcement Learning. Ph.D. thesis, Poznań University of Technology, supervisor Krzysztof Krawiec, co-supervisor Wojciech Jaśkowski, pdf
2015 ...
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis (2015). Human-level control through deep reinforcement learning. Nature, Vol. 518
- Tobias Graf, Marco Platzner (2015). Adaptive Playouts in Monte Carlo Tree Search with Policy Gradient Reinforcement Learning. Advances in Computer Games 14
- Arun Nair, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, Alessandro De Maria, Veda Panneershelvam, Mustafa Suleyman, Charles Beattie, Stig Petersen, Shane Legg, Volodymyr Mnih, Koray Kavukcuoglu, David Silver (2015). Massively Parallel Methods for Deep Reinforcement Learning. arXiv:1507.04296
- Matthew Lai (2015). Giraffe: Using Deep Reinforcement Learning to Play Chess. M.Sc. thesis, Imperial College London, arXiv:1509.01549v1 » Giraffe
- Hado van Hasselt, Arthur Guez, David Silver (2015). Deep Reinforcement Learning with Double Q-learning. arXiv:1509.06461
2016
- Ziyu Wang, Nando de Freitas, Marc Lanctot (2016). Dueling Network Architectures for Deep Reinforcement Learning. arXiv:1511.06581
- David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, Vol. 529 » AlphaGo
- Hung Guei, Ting-Han Wei, Jin-Bo Huang, I-Chen Wu (2016). An Empirical Study on Applying Deep Reinforcement Learning to the Game 2048. CG 2016
- Omid E. David, Nathan S. Netanyahu, Lior Wolf (2016). DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess. ICANN 2016, Lecture Notes in Computer Science, Vol. 9887, Springer, pdf preprint » DeepChess [11] [12]
- Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu (2016). Asynchronous Methods for Deep Reinforcement Learning. arXiv:1602.01783v2
- Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine (2016). Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates. arXiv:1610.00633
- Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu (2016). Reinforcement Learning with Unsupervised Auxiliary Tasks. arXiv:1611.05397v1
- Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Rémi Munos, Charles Blundell, Dharshan Kumaran, Matthew Botvinick (2016). Learning to reinforcement learn. arXiv:1611.05763
- Zacharias Georgiou, Evangelos Karountzos, Yaroslav Shkarupa, Matthia Sabatelli (2016). A Reinforcement Learning Approach for Solving KRK Chess Endgames. pdf [13]
2017
- Hirotaka Kameko, Jun Suzuki, Naoki Mizukami, Yoshimasa Tsuruoka (2017). Deep Reinforcement Learning with Hidden Layers on Future States. Computer Games Workshop at IJCAI 2017, pdf
- Marc Lanctot, Vinícius Flores Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Pérolat, David Silver, Thore Graepel (2017). A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. arXiv:1711.00832
- David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis (2017). Mastering the game of Go without human knowledge. Nature, Vol. 550, pdf [14]
- Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, David Meger (2017). Deep Reinforcement Learning that Matters. arXiv:1709.06560
- Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc V. Le, Jon Kleinberg (2017). Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? arXiv:1711.02301
- David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815 » AlphaZero
- Kei Takada, Hiroyuki Iizuka, Masahito Yamamoto (2017). Reinforcement Learning for Creating Evaluation Function Using Convolutional Neural Network in Hex. TAAI 2017 » Hex, CNN
- Ari Weinstein, Matthew Botvinick (2017). Structure Learning in Motor Control: A Deep Reinforcement Learning Model. arXiv:1706.06827
- Takuya Hiraoka, Masaaki Tsuchida, Yotaro Watanabe (2017). Deep Reinforcement Learning for Inquiry Dialog Policies with Logical Formula Embeddings. arXiv:1708.00667
- William Uther (2017). Markov Decision Processes. in Claude Sammut, Geoffrey I. Webb (eds) (2017). Encyclopedia of Machine Learning and Data Mining. Springer
- Jayvant Anantpur, Nagendra Gulur Dwarakanath, Shivaram Kalyanakrishnan, Shalabh Bhatnagar, R. Govindarajan (2017). RLWS: A Reinforcement Learning based GPU Warp Scheduler. arXiv:1712.04303
2018
- Hui Wang, Michael Emmerich, Aske Plaat (2018). Monte Carlo Q-learning for General Game Playing. arXiv:1802.05944 » MCTS, General Game Playing
- Vinícius Flores Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David P. Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter W. Battaglia (2018). Relational Deep Reinforcement Learning. arXiv:1806.01830
- Hui Wang, Michael Emmerich, Aske Plaat (2018). Assessing the Potential of Classical Q-learning in General Game Playing. arXiv:1810.06078
- Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, Joelle Pineau (2018). An Introduction to Deep Reinforcement Learning. arXiv:1811.12560
- David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, Vol. 362, No. 6419 [15]
- Tianhe Wang, Tomoyuki Kaneko (2018). Application of Deep Reinforcement Learning in Werewolf Game Agents. TAAI 2018
- Taichi Nakayashiki, Tomoyuki Kaneko (2018). Learning of Evaluation Functions via Self-Play Enhanced by Checkmate Search. TAAI 2018
- Hung Guei, Ting-Han Wei, I-Chen Wu (2018). Using 2048-like games as a pedagogical tool for reinforcement learning. CG 2018, ICGA Journal, Vol. 40, No. 3
2019
- Sanjeevan Ahilan, Peter Dayan (2019). Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning. arXiv:1901.08492
- Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar (2019). Successive Over Relaxation Q-Learning. arXiv:1903.03812
- Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar (2019). Second Order Value Iteration in Reinforcement Learning. arXiv:1905.03927
- Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinícius Flores Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis (2019). OpenSpiel: A Framework for Reinforcement Learning in Games. arXiv:1908.09453 [16]
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver (2019). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. arXiv:1911.08265 [17]
- Sourabh Bose (2019). Learning Representations Using Reinforcement Learning. Ph.D. thesis, University of Texas at Arlington, advisor Manfred Huber [18]
- Johannes Czech (2019). Deep Reinforcement Learning for Crazyhouse. Master thesis, TU Darmstadt, pdf » CrazyAra
2020 ...
- Hung Guei, Ting-Han Wei, I-Chen Wu (2020). 2048-like games for teaching reinforcement learning. ICGA Journal, Vol. 42, No. 1
- Indu John, Chandramouli Kamanchi, Shalabh Bhatnagar (2020). Generalized Speedy Q-Learning. IEEE Control Systems Letters, Vol. 4, No. 3, arXiv:1911.00397
- Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka (2020). Meta-Model-Based Meta-Policy Optimization. arXiv:2006.02608
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, Vol. 588 [19] [20]
- Tristan Cazenave, Yen-Chi Chen, Guan-Wei Chen, Shi-Yu Chen, Xian-Dong Chiu, Julien Dehos, Maria Elsa, Qucheng Gong, Hengyuan Hu, Vasil Khalidov, Cheng-Ling Li, Hsin-I Lin, Yu-Jin Lin, Xavier Martinet, Vegard Mella, Jeremy Rapin, Baptiste Roziere, Gabriel Synnaeve, Fabien Teytaud, Olivier Teytaud, Shi-Cheng Ye, Yi-Jun Ye, Shi-Jim Yen, Sergey Zagoruyko (2020). Polygames: Improved zero learning. ICGA Journal, Vol. 42, No. 4, arXiv:2001.09832
- Matthia Sabatelli, Gilles Louppe, Pierre Geurts, Marco Wiering (2020). The Deep Quality-Value Family of Deep Reinforcement Learning Algorithms. IJCNN 2020 [21]
- Quentin Cohen-Solal (2020). Learning to Play Two-Player Perfect-Information Games without Knowledge. arXiv:2008.01188
- Quentin Cohen-Solal, Tristan Cazenave (2020). Minimax Strikes Back. arXiv:2012.10700
2021
- Maximilian Alexander Gehrke (2021). Assessing Popular Chess Variants Using Deep Reinforcement Learning. Master thesis, TU Darmstadt, pdf » CrazyAra
- Dominik Klein (2021). Neural Networks For Chess. Release Version 1.1 · GitHub [22]
- Quentin Cohen-Solal, Tristan Cazenave (2021). DESCENT wins five gold medals at the Computer Olympiad. ICGA Journal, Vol. 43, No. 2
- Boris Doux, Benjamin Negrevergne, Tristan Cazenave (2021). Deep Reinforcement Learning for Morpion Solitaire. Advances in Computer Games 17
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao (2021). Mastering Atari Games with Limited Data. arXiv:2111.00210 [23] [24]
- Dennis Soemers, Vegard Mella, Cameron Browne, Olivier Teytaud (2021). Deep learning for general game playing with Ludii and Polygames. ICGA Journal, Vol. 43, No. 3
Postings
1995 ...
- Parameter Tuning by Jonathan Baxter, CCC, October 01, 1998 » KnightCap
- Re: Parameter Tuning by Don Beal, CCC, October 02, 1998
2000 ...
- Pseudo-code for TD learning by Daniel Homan, CCC, July 06, 2000 » Temporal Difference Learning
- any good experiences with genetic algos or temporal difference learning? by Rafael B. Andrist, CCC, January 01, 2001
- Temporal Differences by Peter Fendrich, CCC, December 21, 2004
2010 ...
- *First release* Giraffe, a new engine based on deep learning by Matthew Lai, CCC, July 08, 2015 » Deep Learning, Giraffe
- Demystifying Deep Reinforcement Learning by Tambet Matiisen, Nervana, December 22, 2015
- Deep Reinforcement Learning with Neon by Tambet Matiisen, Nervana, December 29, 2015
- Google's AlphaGo team has been working on chess by Peter Kappler, CCC, December 06, 2017 » AlphaZero
- Understanding the power of reinforcement learning by Michael Sherwin, CCC, December 12, 2017
2020 ...
- Board adaptive / tuning evaluation function - no NN/AI by Moritz Gedig, CCC, January 14, 2020
- Unsupervised reinforcement tuning from zero by Madeleine Birchfield, CCC, October 16, 2020 » Automated Tuning
- Transhuman Chess with NN and RL... by Srdja Matovic, CCC, October 30, 2020 » NN
- Reinforcement learning project by Harm Geert Muller, CCC, January 31, 2021 » Texel's Tuning Method
External Links
Reinforcement Learning
- Reinforcement learning from Wikipedia
- Reinforcement Learning: An Introduction ebook by Richard Sutton and Andrew Barto
- Reinforcement Learning in Classic Board Games (pdf) by David Silver
- Category: Reinforcement Learning - Scholarpedia
- Reinforcement Learning - Scholarpedia
- Reinforcement Learning and Artificial Intelligence – Faculty of Science, University of Alberta
MDP
- Markov decision process from Wikipedia
- Partially observable Markov decision process
- Reinforcement Learning and POMDPs by Jürgen Schmidhuber
Q-Learning
- Q-learning from Wikipedia
- A Painless Q-Learning Tutorial
- State–action–reward–state–action from Wikipedia
- Probably approximately correct learning from Wikipedia
Courses
- Reinforcement Learning Course by David Silver, University College London, 2015, YouTube Videos
- Lecture 1: Introduction to Reinforcement Learning
- Lecture 2: Markov Decision Process
- Lecture 3: Planning by Dynamic Programming
- Lecture 4: Model-Free Prediction
- Lecture 5: Model Free Control
- Lecture 6: Value Function Approximation
- Lecture 7: Policy Gradient Methods
- Lecture 8: Integrating Learning and Planning
- Lecture 9: Exploration and Exploitation
- Lecture 10: Classic Games
- Introduction to Reinforcement Learning by Joelle Pineau, McGill University, 2016, YouTube Video
GitHub
- GitHub - deepmind/open_spiel: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games [25]
- GitHub - koulanurag/muzero-pytorch: Pytorch Implementation of MuZero [26]
- GitHub - YeWR/EfficientZero: Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021 [27]
- GitHub - facebookarchive/Polygames: The project is a platform of zero learning with a library of games
References
- ↑ Example of a simple Markov decision process with three states (green circles) and two actions (orange circles), with two rewards (orange arrows), image by waldoalvarez CC BY-SA 4.0, Wikimedia Commons
- ↑ Q-learning from Wikipedia
- ↑ Chris Watkins, Peter Dayan (1992). Q-learning. Machine Learning, Vol. 8, No. 2
- ↑ Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis (2015). Human-level control through deep reinforcement learning. Nature, Vol. 518
- ↑ Q-learning from Wikipedia
- ↑ Gerald Tesauro (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM, Vol. 38, No. 3
- ↑ University of Bristol - Department of Computer Science - Technical Reports
- ↑ Ms. Pac-Man from Wikipedia
- ↑ Demystifying Deep Reinforcement Learning by Tambet Matiisen, Nervana, December 22, 2015
- ↑ Patent US20150100530 - Methods and apparatus for reinforcement learning - Google Patents
- ↑ DeepChess: Another deep-learning based chess program by Matthew Lai, CCC, October 17, 2016
- ↑ ICANN 2016 | Recipients of the best paper awards
- ↑ GitHub - paintception/A-Reinforcement-Learning-Approach-for-Solving-Chess-Endgames: Machine Learning - Reinforcement Learning
- ↑ AlphaGo Zero: Learning from scratch by Demis Hassabis and David Silver, DeepMind, October 18, 2017
- ↑ AlphaZero: Shedding new light on the grand games of chess, shogi and Go by David Silver, Thomas Hubert, Julian Schrittwieser and Demis Hassabis, DeepMind, December 03, 2018
- ↑ open_spiel/contributing.md at master · deepmind/open_spiel · GitHub
- ↑ New DeepMind paper by GregNeto, CCC, November 21, 2019
- ↑ Re: Board adaptive / tuning evaluation function - no NN/AI by Tony P., CCC, January 15, 2020
- ↑ MuZero: Mastering Go, chess, shogi and Atari without rules
- ↑ GitHub - koulanurag/muzero-pytorch: Pytorch Implementation of MuZero
- ↑ GitHub - paintception/Deep-Quality-Value-DQV-Learning-: DQV-Learning: a novel faster synchronous Deep Reinforcement Learning algorithm
- ↑ Book about Neural Networks for Chess by dkl, CCC, September 29, 2021
- ↑ GitHub - YeWR/EfficientZero: Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021
- ↑ Want to train nets faster? by Dann Corbit, CCC, December 01, 2021
- ↑ Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinícius Flores Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis (2019). OpenSpiel: A Framework for Reinforcement Learning in Games. arXiv:1908.09453
- ↑ Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, Vol. 588
- ↑ Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao (2021). Mastering Atari Games with Limited Data. arXiv:2111.00210