Difference between revisions of "Reinforcement Learning"

From Chessprogramming wiki
 
* [[A. Harry Klopf]] ('''1972'''). ''Brain Function and Adaptive Systems - A Heterostatic Theory''. [https://en.wikipedia.org/wiki/Air_Force_Cambridge_Research_Laboratories Air Force Cambridge Research Laboratories], Special Reports, No. 133, [http://www.dtic.mil/dtic/tr/fulltext/u2/742259.pdf pdf]
 
* [[Mathematician#Holland|John H. Holland]] ('''1975'''). ''Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence''. [http://www.amazon.com/Adaptation-Natural-Artificial-Systems-Introductory/dp/0262581116 amazon.com]
 
* [[Ian H. Witten]] ('''1977'''). ''An Adaptive Optimal Controller for Discrete-Time Markov Environments''. [https://en.wikipedia.org/wiki/Information_and_Computation Information and Control], Vol. 34, No. 4, [https://core.ac.uk/download/pdf/82451748.pdf pdf]
 
==1980 ...==
 
* [[Richard Sutton]] ('''1984'''). ''[http://scholarworks.umass.edu/dissertations/AAI8410337/ Temporal Credit Assignment in Reinforcement Learning]''. Ph.D. dissertation, [https://en.wikipedia.org/wiki/University_of_Massachusetts University of Massachusetts]
 
==1990 ...==
 
* [[Richard Sutton]], [[Andrew Barto]] ('''1990'''). ''Time Derivative Models of Pavlovian Reinforcement''. Learning and Computational Neuroscience: Foundations of Adaptive Networks: 497-537
 
* [[Jürgen Schmidhuber]] ('''1990'''). ''Reinforcement Learning in Markovian and Non-Markovian Environments''. [https://dblp.uni-trier.de/db/conf/nips/nips1990.html NIPS 1990], [ftp://ftp.idsia.ch/pub/juergen/nipsnonmarkov.pdf pdf]
* [[Peter Dayan]] ('''1991'''). ''[https://www.era.lib.ed.ac.uk/handle/1842/14754 Reinforcing Connectionism: Learning the Statistical Way]''. Ph.D. thesis, [[University of Edinburgh]]
 
* [[Chris Watkins]], [[Peter Dayan]] ('''1992'''). ''[http://www.gatsby.ucl.ac.uk/~dayan/papers/wd92.html Q-learning]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 8, No. 2
 
* [[Gerald Tesauro]] ('''1992'''). ''Temporal Difference Learning of Backgammon Strategy''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/ml1992.html#Tesauro92 ML 1992]
 
* [[Michael L. Littman]] ('''1994'''). ''Markov Games as a Framework for Multi-Agent Reinforcement Learning''. International Conference on Machine Learning, [http://www.cs.duke.edu/courses/spring07/cps296.3/littman94markov.pdf pdf]
 
==1995 ...==
 
* [[Marco Wiering]] ('''1995'''). ''TD Learning of Game Evaluation Functions with Hierarchical Neural Architectures''. Master's thesis, [https://en.wikipedia.org/wiki/University_of_Amsterdam University of Amsterdam], [http://webber.physik.uni-freiburg.de/~hon/vorlss02/Literatur/reinforcement/GameEvaluationWithNeuronal.pdf pdf]
 
* [[Gerald Tesauro]] ('''1995'''). ''Temporal Difference Learning and TD-Gammon''. [[ACM#Communications|Communications of the ACM]], Vol. 38, No. 3
 
* [http://dblp.uni-trier.de/pers/hd/b/Baird_III:Leemon_C= Leemon C. Baird III], [http://dblp.uni-trier.de/pers/hd/h/Harmon:Mance_E= Mance E. Harmon], [[A. Harry Klopf]] ('''1996'''). ''Reinforcement Learning: An Alternative Approach to Machine Intelligence''. [http://www.leemon.com/papers/1996bhk.pdf pdf]
 
* [[William Uther]], [[Manuela Veloso|Manuela M. Veloso]] ('''1997'''). ''Adversarial Reinforcement Learning''. [[Carnegie Mellon University]], [http://www.cse.unsw.edu.au/~willu/w/papers/Uther97a.ps ps]
 
* [[William Uther]], [[Manuela Veloso|Manuela M. Veloso]] ('''1997'''). ''Generalizing Adversarial Reinforcement Learning''. [[Carnegie Mellon University]], [http://www.cse.unsw.edu.au/~willu/w/papers/Uther97b.ps ps]
 
* [[Marco Wiering]], [[Jürgen Schmidhuber]] ('''1997'''). ''HQ-learning''. [https://en.wikipedia.org/wiki/Adaptive_Behavior_%28journal%29 Adaptive Behavior], Vol. 6, No. 2
 
* [[Csaba Szepesvári]] ('''1998'''). ''Reinforcement Learning: Theory and Practice''. Proceedings of the 2nd Slovak Conference on Artificial Neural Networks, [http://www.sztaki.hu/%7Eszcsaba/papers/scann98.ps.gz zipped ps]
 
* [[Richard Sutton]], [[Andrew Barto]] ('''1998'''). ''[https://mitpress.mit.edu/books/reinforcement-learning Reinforcement Learning: An Introduction]''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press]
 
* [http://www.ilsp.gr/homepages/papavasiliou_eng.html Vassilis Papavassiliou], [[Stuart Russell]] ('''1999'''). ''Convergence of reinforcement learning with general function approximators.'' In Proc. IJCAI-99, Stockholm, [http://www.cs.berkeley.edu/~russell/papers/ijcai99-bridge.ps ps]
 
* [[Marco Wiering]] ('''1999'''). ''Explorations in Efficient Reinforcement Learning''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_Amsterdam University of Amsterdam], advisors [[Mathematician#FGroen|Frans Groen]] and [[Jürgen Schmidhuber]]
 
==2000 ...==
 
* [[Sebastian Thrun]], [[Michael L. Littman]] ('''2000'''). ''A Review of Reinforcement Learning''. [http://www.informatik.uni-trier.de/~ley/db/journals/aim/aim21.html#ThrunL00 AI Magazine, Vol. 21], No. 1
 
* [[Andrew Ng]], [[Stuart Russell]] ('''2000'''). ''Algorithms for inverse reinforcement learning.'' In Proceedings of the Seventeenth International Conference on Machine Learning, Stanford, California: Morgan Kaufmann, [http://www.cs.berkeley.edu/~russell/papers/ml00-irl.pdf pdf]
 
* [http://www.cs.ou.edu/~hougen/ Dean F. Hougen], [http://www-users.cs.umn.edu/~gini/ Maria Gini], [[James R. Slagle]] ('''2000'''). ''[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.2633 An Integrated Connectionist Approach to Reinforcement Learning for Robotic Control]''. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
 
* [[Jonathan Baxter]], [[Mathematician#PBartlett|Peter Bartlett]] ('''2000'''). ''Reinforcement Learning on POMDPs via Direct Gradient Ascent''. [http://dblp.uni-trier.de/db/conf/icml/icml2000.html ICML 2000], [https://pdfs.semanticscholar.org/b874/98f0879d312c308889135203b17069aa0486.pdf pdf]
 
* [[Doina Precup]] ('''2000'''). ''Temporal Abstraction in Reinforcement Learning''. Ph.D. Dissertation, Department of Computer Science, [https://en.wikipedia.org/wiki/University_of_Massachusetts_Amherst University of Massachusetts], [https://en.wikipedia.org/wiki/Amherst,_Massachusetts Amherst].
 
* [[Robert Levinson]], [[Ryan Weber]] ('''2001'''). ''Chess Neighborhoods, Function Combinations and Reinforcements Learning''. In Computers and Games (eds. [[Tony Marsland]] and I. Frank). [https://en.wikipedia.org/wiki/Lecture_Notes_in_Computer_Science Lecture Notes in Computer Science], Springer, [http://users.soe.ucsc.edu/~levinson/Papers/CNFCRL.pdf pdf]
 
* [[Henk Mannen]] ('''2003'''). ''Learning to play chess using reinforcement learning with database games''. Master’s thesis, [http://students.uu.nl/en/hum/cognitive-artificial-intelligence Cognitive Artificial Intelligence], [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University]
 
* [[Joelle Pineau]], [[Geoffrey Gordon]], [[Sebastian Thrun]] ('''2003'''). ''Point-based value iteration: An anytime algorithm for POMDPs''. [[Conferences#IJCAI2003|IJCAI]], [http://www.fore.robot.cc/papers/Pineau03a.pdf pdf]
 
* [https://dblp.uni-trier.de/pers/hd/k/Kerr:Amy_J= Amy J. Kerr], [[Todd W. Neller]], [https://dblp.uni-trier.de/pers/hd/p/Pilla:Christopher_J=_La Christopher J. La Pilla], [https://dblp.uni-trier.de/pers/hd/s/Schompert:Michael_D= Michael D. Schompert] ('''2002'''). ''[https://www.semanticscholar.org/paper/Java-Resources-for-Teaching-Reinforcement-Learning-Kerr-Neller/3d84018eb8b8668c13d1d4f6efca4442af2915b4 Java Resources for Teaching Reinforcement Learning]''. [https://dblp.uni-trier.de/db/conf/pdpta/pdpta2003-3.html PDPTA 2003]
 
* [[Yngvi Björnsson]], Vignir Hafsteinsson, Ársæll Jóhannsson, Einar Jónsson ('''2004'''). ''Efficient Use of Reinforcement Learning in a Computer Game''. In Computer Games: Artificial Intelligence, Design and Education (CGAIDE'04), pp. 379–383, 2004. [http://www.ru.is/faculty/yngvi/pdf/BjornssonHJJ04.pdf pdf]
 
* [http://imranontech.com/ Imran Ghory] ('''2004'''). ''Reinforcement learning in board games''. CSTR-04-004, [http://www.cs.bris.ac.uk/ Department of Computer Science], [https://en.wikipedia.org/wiki/University_of_Bristol University of Bristol]. [http://www.cs.bris.ac.uk/Publications/Papers/2000100.pdf pdf] <ref>[http://www.cs.bris.ac.uk/Publications/pub_master.jsp?type=117 University of Bristol - Department of Computer Science - Technical Reports]</ref>
 
* [http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/d/Duan:Yong.html Yong Duan], [http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/c/Cui:Baoxia.html Baoxia Cui], [[Xinhe Xu]] ('''2007'''). ''State Space Partition for Reinforcement Learning Based on Fuzzy Min-Max Neural Network''. [http://www.informatik.uni-trier.de/~ley/db/conf/isnn/isnn2007-2.html#DuanCX07 ISNN 2007]
 
* [[Yasuhiro Osaki]], [[Kazutomo Shibahara]], [[Yasuhiro Tajima]], [[Yoshiyuki Kotani]] ('''2007'''). ''Reinforcement Learning of Evaluation Functions Using Temporal Difference-Monte Carlo learning method''. [[Conferences#GPW|12th Game Programming Workshop]]
 
* [[David Silver]], [[Richard Sutton]], [[Martin Müller]] ('''2007'''). ''Reinforcement learning of local shape in the game of Go''. [[Conferences#IJCAI2007|20th IJCAI]], [http://webdocs.cs.ualberta.ca/~mmueller/ps/silver-ijcai2007.pdf pdf]
 
* [[Marco Block-Berlitz|Marco Block]], Maro Bader, [http://page.mi.fu-berlin.de/tapia/ Ernesto Tapia], Marte Ramírez, Ketill Gunnarsson, Erik Cuevas, Daniel Zaldivar, [[Raúl Rojas]] ('''2008'''). ''Using Reinforcement Learning in Chess Engines''. CONCIBE SCIENCE 2008, [http://www.micai.org/rcs/ Research in Computing Science]: Special Issue in Electronics and Biomedical Engineering, Computer Science and Informatics, ISSN:1870-4069, Vol. 35, pp. 31-40, [https://en.wikipedia.org/wiki/Guadalajara Guadalajara], Mexico, [http://page.mi.fu-berlin.de/block/concibe2008.pdf pdf]
 
* [[Cécile Germain-Renaud]], [[Julien Pérez]], [[Balázs Kégl]], [[Charles Loomis]] ('''2008'''). ''Grid Differentiated Services: a Reinforcement Learning Approach''. In 8th [[IEEE]] Symposium on Cluster Computing and the Grid. Lyon, [http://hal.inria.fr/docs/00/28/78/26/PDF/RLccg08.pdf pdf]
 
* [[David Silver]] ('''2009'''). ''Reinforcement Learning and Simulation-Based Search''. Ph.D. thesis, [[University of Alberta]]. [http://webdocs.cs.ualberta.ca/~silver/David_Silver/Publications_files/thesis.pdf pdf]
 
* [[Balázs Csanád Csáji]], [https://dblp.dagstuhl.de/pers/hd/m/Monostori:L=aacute=szl=oacute= László Monostori] ('''2008'''). ''Value function based reinforcement learning in changing Markovian environments''. [https://en.wikipedia.org/wiki/Journal_of_Machine_Learning_Research Journal of Machine Learning Research], Vol. 9, [http://www.jmlr.org/papers/volume9/csaji08a/csaji08a.pdf pdf]
 
==2010 ...==
 
* [[Joel Veness]], [[Kee Siong Ng]], [[Marcus Hutter]], [[David Silver]] ('''2010'''). ''Reinforcement Learning via AIXI Approximation''. Association for the Advancement of Artificial Intelligence (AAAI), [http://jveness.info/publications/veness_rl_via_aixi_approx.pdf pdf]
 
* [[Julien Pérez]], [[Cécile Germain-Renaud]], [[Balázs Kégl]], [[Charles Loomis]] ('''2010'''). ''Multi-objective Reinforcement Learning for Responsive Grids''. In The Journal of Grid Computing. [http://hal.archives-ouvertes.fr/docs/00/49/15/60/PDF/RLGrid_JGC09_V7.pdf pdf]
 
* [[Csaba Szepesvári]] ('''2010'''). ''[https://sites.ualberta.ca/~szepesva/RLBook.html Algorithms for Reinforcement Learning]''. Morgan & Claypool
* [https://dblp.org/pers/hd/z/Zaragoza:Julio_H= Julio H. Zaragoza], [[Eduardo F. Morales]] ('''2010'''). ''Relational Reinforcement Learning with Continuous Actions by Combining Behavioral Cloning and Locally Weighted Regression''. Journal of Intelligent Systems and Applications, Vol. 2
 
'''2011'''
 
* [[Peter Auer]] ('''2011'''). ''Exploration and Exploitation in Online Learning''. [http://dblp.uni-trier.de/db/conf/icais/icais2011.html#Auer11 ICAIS 2011]
 
* [[Charles Elkan]] ('''2011'''). ''Reinforcement Learning with a Bilinear Q Function''. [http://www.informatik.uni-trier.de/~ley/db/conf/ewrl/ewrl2011.html#Elkan11 EWRL 2011]
 
'''2012'''
 
* [[Marco Wiering]], [http://martijnvanotterlo.nl/ Martijn Van Otterlo] ('''2012'''). ''Reinforcement learning: State-of-the-art''. [http://link.springer.com/book/10.1007/978-3-642-27645-3 Adaptation, Learning, and Optimization, Vol. 12], [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
 
: [[István Szita]] ('''2012'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-642-27645-3_17 Reinforcement Learning in Games]''. Chapter 17
 
* [[Thomas J. Walsh]], [[István Szita]], [[Carlos Diuk]], [[Michael L. Littman]] ('''2012'''). ''Exploring compact reinforcement-learning representations with linear regression''. [https://arxiv.org/abs/1205.2606 arXiv:1205.2606]
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2012'''). ''[https://papers.nips.cc/paper/4767-efficient-bayes-adaptive-reinforcement-learning-using-sample-based-search Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search]''. [https://papers.nips.cc/book/advances-in-neural-information-processing-systems-25-2012 NIPS 2012]
 
'''2013'''
 
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2013'''). ''Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search''. [https://en.wikipedia.org/wiki/Journal_of_Artificial_Intelligence_Research Journal of Artificial Intelligence Research], Vol. 48, [https://www.jair.org/media/4117/live-4117-7507-jair.pdf pdf]
 
* [http://dblp.uni-trier.de/pers/hd/r/Ree:M=_van_der Michiel van der Ree], [[Marco Wiering]] ('''2013'''). ''Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play''. [http://dblp.uni-trier.de/db/conf/adprl/adprl2013.html#ReeW13 ADPRL 2013]
* [http://dblp.uni-trier.de/pers/hd/b/Bom:Luuk Luuk Bom], [http://dblp.uni-trier.de/pers/hd/h/Henken:Ruud Ruud Henken], [[Marco Wiering]] ('''2013'''). ''Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs''. [http://dblp.uni-trier.de/db/conf/adprl/adprl2013.html#BomHW13 ADPRL 2013] <ref>[https://en.wikipedia.org/wiki/Ms._Pac-Man Ms. Pac-Man from Wikipedia]</ref>
 
* [[Peter Auer]], [[Marcus Hutter]], [[Laurent Orseau]] ('''2013'''). ''[http://drops.dagstuhl.de/opus/volltexte/2013/4340/ Reinforcement Learning]''. [http://dblp.uni-trier.de/db/journals/dagstuhl-reports/dagstuhl-reports3.html#AuerHO13 Dagstuhl Reports, Vol. 3, No. 8], DOI: [http://drops.dagstuhl.de/opus/volltexte/2013/4340/ 10.4230/DagRep.3.8.1], URN: [http://drops.dagstuhl.de/opus/volltexte/2013/4340/ urn:nbn:de:0030-drops-43409]
 
* [[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]], [[Alex Graves]], [[Ioannis Antonoglou]], [[Daan Wierstra]], [[Martin Riedmiller]] ('''2013'''). ''Playing Atari with Deep Reinforcement Learning''. [http://arxiv.org/abs/1312.5602 arXiv:1312.5602] <ref>[http://www.nervanasys.com/demystifying-deep-reinforcement-learning/ Demystifying Deep Reinforcement Learning] by [http://www.nervanasys.com/author/tambet/ Tambet Matiisen], [http://www.nervanasys.com/ Nervana], December 22, 2015</ref> <ref>[http://www.google.com/patents/US20150100530 Patent US20150100530 - Methods and apparatus for reinforcement learning - Google Patents]</ref>
 
* [[Ari Weinstein]] ('''2013'''). ''Local Planning For Continuous Markov Decision Processes''. Ph.D. thesis, [https://en.wikipedia.org/wiki/Rutgers_University Rutgers University], advisor [[Michael L. Littman]], [http://cs.brown.edu/~mlittman/theses/weinstein.pdf pdf]
 
'''2014'''
 
* [[Marcin Szubert]] ('''2014'''). ''Coevolutionary Shaping for Reinforcement Learning''. Ph.D. thesis, [https://en.wikipedia.org/wiki/Pozna%C5%84_University_of_Technology Poznań University of Technology], supervisor [[Krzysztof Krawiec]], co-supervisor [[Wojciech Jaśkowski]], [http://www.cs.put.poznan.pl/mszubert/pub/phdthesis.pdf pdf]
 
* [[David Silver]], [[Shih-Chieh Huang|Aja Huang]], [[Chris J. Maddison]], [[Arthur Guez]], [[Laurent Sifre]], [[George van den Driessche]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Veda Panneershelvam]], [[Marc Lanctot]], [[Sander Dieleman]], [[Dominik Grewe]], [[John Nham]], [[Nal Kalchbrenner]], [[Ilya Sutskever]], [[Timothy Lillicrap]], [[Madeleine Leach]], [[Koray Kavukcuoglu]], [[Thore Graepel]], [[Demis Hassabis]] ('''2016'''). ''[http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html Mastering the game of Go with deep neural networks and tree search]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 529 » [[AlphaGo]]
 
* [[Hung Guei]], [[Tinghan Wei]], [[Jin-Bo Huang]], [[I-Chen Wu]] ('''2016'''). ''An Empirical Study on Applying Deep Reinforcement Learning to the Game 2048''. [[CG 2016]]
 
* [[Omid David|Omid E. David]], [[Nathan S. Netanyahu]], [[Lior Wolf]] ('''2016'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-319-44781-0_11 DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess]''. [http://icann2016.org/ ICAAN 2016], [https://en.wikipedia.org/wiki/Lecture_Notes_in_Computer_Science Lecture Notes in Computer Science], Vol. 9887, [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer], [http://www.cs.tau.ac.il/~wolf/papers/deepchess.pdf pdf preprint] » [[DeepChess]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=61748 DeepChess: Another deep-learning based chess program] by [[Matthew Lai]], [[CCC]], October 17, 2016</ref> <ref>[http://icann2016.org/index.php/conference-programme/recipients-of-the-best-paper-awards/ ICANN 2016 | Recipients of the best paper awards]</ref>
+
* [[Eli David|Omid E. David]], [[Nathan S. Netanyahu]], [[Lior Wolf]] ('''2016'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-319-44781-0_11 DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess]''. [http://icann2016.org/ ICAAN 2016], [https://en.wikipedia.org/wiki/Lecture_Notes_in_Computer_Science Lecture Notes in Computer Science], Vol. 9887, [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer], [http://www.cs.tau.ac.il/~wolf/papers/deepchess.pdf pdf preprint] » [[DeepChess]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=61748 DeepChess: Another deep-learning based chess program] by [[Matthew Lai]], [[CCC]], October 17, 2016</ref> <ref>[http://icann2016.org/index.php/conference-programme/recipients-of-the-best-paper-awards/ ICANN 2016 | Recipients of the best paper awards]</ref>
 
* [[Volodymyr Mnih]], [[Adrià Puigdomènech Badia]], [[Mehdi Mirza]], [[Alex Graves]], [[Timothy Lillicrap]], [[Tim Harley]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Asynchronous Methods for Deep Reinforcement Learning''.  [https://arxiv.org/abs/1602.01783 arXiv:1602.01783v2]
 
* [[Volodymyr Mnih]], [[Adrià Puigdomènech Badia]], [[Mehdi Mirza]], [[Alex Graves]], [[Timothy Lillicrap]], [[Tim Harley]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Asynchronous Methods for Deep Reinforcement Learning''.  [https://arxiv.org/abs/1602.01783 arXiv:1602.01783v2]
 
* [[Shixiang Gu]], [[Ethan Holly]], [[Timothy Lillicrap]], [[Sergey Levine]] ('''2016'''). ''Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates''. [https://arxiv.org/abs/1610.00633 arXiv:1610.00633]
 
* [[Shixiang Gu]], [[Ethan Holly]], [[Timothy Lillicrap]], [[Sergey Levine]] ('''2016'''). ''Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates''. [https://arxiv.org/abs/1610.00633 arXiv:1610.00633]
 
* [[Max Jaderberg]], [[Volodymyr Mnih]], [[Wojciech Marian Czarnecki]], [[Tom Schaul]], [[Joel Z. Leibo]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Reinforcement Learning with Unsupervised Auxiliary Tasks''. [https://arxiv.org/abs/1611.05397v1 arXiv:1611.05397v1]
 
* [[Max Jaderberg]], [[Volodymyr Mnih]], [[Wojciech Marian Czarnecki]], [[Tom Schaul]], [[Joel Z. Leibo]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Reinforcement Learning with Unsupervised Auxiliary Tasks''. [https://arxiv.org/abs/1611.05397v1 arXiv:1611.05397v1]
* [[Jane X Wang]], [[Zeb Kurth-Nelson]], [[Dhruva Tirumala]], [[Hubert Soyer]], [[Joel Z Leibo]], [[Rémi Munos]], [[Charles Blundell]], [[Dharshan Kumaran]], [[Matt Botvinick]] ('''2016'''). ''Learning to reinforcement learn''. [https://arxiv.org/abs/1611.05763 arXiv:1611.05763]
+
* [[Jane X Wang]], [[Zeb Kurth-Nelson]], [[Dhruva Tirumala]], [[Hubert Soyer]], [[Joel Z Leibo]], [[Rémi Munos]], [[Charles Blundell]], [[Dharshan Kumaran]], [[Matthew Botvinick]] ('''2016'''). ''Learning to reinforcement learn''. [https://arxiv.org/abs/1611.05763 arXiv:1611.05763]
 
'''2017'''
 
'''2017'''
 
* [[Hirotaka Kameko]], [[Jun Suzuki]], [[Naoki Mizukami]], [[Yoshimasa Tsuruoka]] ('''2017'''). ''Deep Reinforcement Learning with Hidden Layers on Future States''. [[Conferences#IJCA2017|Computer Games Workshop at IJCAI 2017]], [http://www.lamsade.dauphine.fr/~cazenave/cgw2017/Kameko.pdf pdf]
 
* [[Hirotaka Kameko]], [[Jun Suzuki]], [[Naoki Mizukami]], [[Yoshimasa Tsuruoka]] ('''2017'''). ''Deep Reinforcement Learning with Hidden Layers on Future States''. [[Conferences#IJCA2017|Computer Games Workshop at IJCAI 2017]], [http://www.lamsade.dauphine.fr/~cazenave/cgw2017/Kameko.pdf pdf]
Line 129: Line 139:
 
* [http://www.peterhenderson.co/ Peter Henderson], [https://scholar.google.ca/citations?user=2_4Rs44AAAAJ&hl=en Riashat Islam], [[Philip Bachman]], [[Joelle Pineau]], [[Doina Precup]], [https://scholar.google.ca/citations?user=gFwEytkAAAAJ&hl=en David Meger] ('''2017'''). ''Deep Reinforcement Learning that Matters''. [https://arxiv.org/abs/1709.06560 arXiv:1709.06560]  
 
* [http://www.peterhenderson.co/ Peter Henderson], [https://scholar.google.ca/citations?user=2_4Rs44AAAAJ&hl=en Riashat Islam], [[Philip Bachman]], [[Joelle Pineau]], [[Doina Precup]], [https://scholar.google.ca/citations?user=gFwEytkAAAAJ&hl=en David Meger] ('''2017'''). ''Deep Reinforcement Learning that Matters''. [https://arxiv.org/abs/1709.06560 arXiv:1709.06560]  
 
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]
 
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]
* [[Kei Takada]], [[Hiroyuki Iizuka]], [[Masahito Yamamoto]] ('''2017'''). ''Reinforcement Learning for Creating Evaluation Function Using Convolutional Neural Network in Hex''. TAAI 2017 » [[Hex]], [[Neural Networks#Convolutional|CNN]]
+
* [[Kei Takada]], [[Hiroyuki Iizuka]], [[Masahito Yamamoto]] ('''2017'''). ''Reinforcement Learning for Creating Evaluation Function Using Convolutional Neural Network in Hex''. [[TAAI 2017]] » [[Hex]], [[Neural Networks#Convolutional|CNN]]
 +
* [[Ari Weinstein]], [[Matthew Botvinick]] ('''2017'''). ''Structure Learning in Motor Control: A Deep Reinforcement Learning Model''. [https://arxiv.org/abs/1706.06827 arXiv:1706.06827]
 +
* [[William Uther]] ('''2017'''). ''[https://link.springer.com/referenceworkentry/10.1007/978-1-4899-7687-1_512 Markov Decision Processes]''. in [https://en.wikipedia.org/wiki/Claude_Sammut Claude Sammut], [https://en.wikipedia.org/wiki/Geoff_Webb Geoffrey I. Webb] (eds) ('''2017'''). ''[https://link.springer.com/referencework/10.1007%2F978-1-4899-7687-1 Encyclopedia of Machine Learning and Data Mining]''. [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
 +
'''2018'''
 +
* [[Hui Wang]], [[Michael Emmerich]], [[Aske Plaat]] ('''2018'''). ''Monte Carlo Q-learning for General Game Playing''. [https://arxiv.org/abs/1802.05944 arXiv:1802.05944] » [[Monte-Carlo Tree Search|MCTS]], [[General Game Playing]]
 +
* [[Hui Wang]], [[Michael Emmerich]], [[Aske Plaat]] ('''2018'''). ''Assessing the Potential of Classical Q-learning in General Game Playing''. [https://arxiv.org/abs/1810.06078 arXiv:1810.06078]
 +
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419 <ref>[https://deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/ AlphaZero: Shedding new light on the grand games of chess, shogi and Go] by [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]] and [[Demis Hassabis]], [[DeepMind]], December 03, 2018</ref>
 +
* [[Tianhe Wang]], [[Tomoyuki Kaneko]] ('''2018'''). ''Application of Deep Reinforcement Learning in Werewolf Game Agents''. [[TAAI 2018]]
 +
* [[Taichi Nakayashiki]], [[Tomoyuki Kaneko]] ('''2018'''). ''Learning of Evaluation Functions via Self-Play Enhanced by Checkmate Search''. [[TAAI 2018]]
 +
* [[Hung Guei]], [[Ting-Han Wei]], [[I-Chen Wu]] ('''2018'''). ''Using 2048-like games as a pedagogical tool for reinforcement learning''. [[CG 2018]], [[ICGA Journal#40_3|ICGA Journal, Vol. 40, No. 3]]
 +
'''2019'''
 +
* [https://scholar.google.co.uk/citations?user=OAkRr-YAAAAJ&hl=en Sanjeevan Ahilan], [[Peter Dayan]] ('''2019'''). ''Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning''. [https://arxiv.org/abs/1901.08492 arXiv:1901.08492]
 +
* [[Marc Lanctot]], [[Edward Lockhart]], [[Jean-Baptiste Lespiau]], [[Vinicius Zambaldi]], [[Satyaki Upadhyay]], [[Julien Pérolat]], [[Sriram Srinivasan]], [[Finbarr Timbers]], [[Karl Tuyls]], [[Shayegan Omidshafiei]], [[Daniel Hennes]], [[Dustin Morrill]], [[Paul Muller]], [[Timo Ewalds]], [[Ryan Faulkner]], [[János Kramár]], [[Bart De Vylder]], [[Brennan Saeta]], [[James Bradbury]], [[David Ding]], [[Sebastian Borgeaud]], [[Matthew Lai]], [[Julian Schrittwieser]], [[Thomas Anthony]], [[Edward Hughes]], [[Ivo Danihelka]], [[Jonah Ryan-Davis]] ('''2019'''). ''OpenSpiel: A Framework for Reinforcement Learning in Games''. [https://arxiv.org/abs/1908.09453 arXiv:1908.09453] <ref>[https://github.com/deepmind/open_spiel/blob/master/docs/contributing.md open_spiel/contributing.md at master · deepmind/open_spiel · GitHub]</ref>
 +
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2019'''). ''Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model''. [https://arxiv.org/abs/1911.08265 arXiv:1911.08265] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=2&t=72381 New DeepMind paper] by GregNeto, [[CCC]], November 21, 2019</ref>
 +
* [[Mathematician#SrbhBose|Sourabh Bose]] ('''2019'''). ''[https://rc.library.uta.edu/uta-ir/handle/10106/28094 Learning Representations Using Reinforcement Learning]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_Texas_at_Arlington University of Texas at Arlington], advisor [[Mathematician#MHuber|Manfred Huber]] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72810&start=6 e: Board adaptive / tuning evaluation function - no NN/AI] by Tony P., [[CCC]], January 15, 2020</ref>
  
 
=Postings=
 
=Postings=
Line 139: Line 163:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=65909 Google's AlphaGo team has been working on chess] by [[Peter Kappler]], [[CCC]], December 06, 2017 » [[AlphaZero]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=65909 Google's AlphaGo team has been working on chess] by [[Peter Kappler]], [[CCC]], December 06, 2017 » [[AlphaZero]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=65990 Understanding the power of reinforcement learning] by [[Michael Sherwin]], [[CCC]], December 12, 2017
 
* [http://www.talkchess.com/forum/viewtopic.php?t=65990 Understanding the power of reinforcement learning] by [[Michael Sherwin]], [[CCC]], December 12, 2017
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72810 Board adaptive / tuning evaluation function - no NN/AI] by Moritz Gedig, [[CCC]], January 14, 2020
  
 
=External Links=  
 
=External Links=  
Line 174: Line 199:
 
=References=  
 
=References=  
 
<references />
 
<references />
 
 
'''[[Learning|Up one Level]]'''
 
'''[[Learning|Up one Level]]'''
 +
[[Category:Videos]]

Revision as of 22:01, 16 January 2020


Reinforcement Learning,
a learning paradigm inspired by behaviourist psychology and classical conditioning - learning by trial and error, interacting with an environment to map situations to actions in such a way that some notion of cumulative reward is maximized. In computer games, reinforcement learning adjusts feature weights based on game results or on their subsequent predictions during self-play.

Reinforcement learning is indebted to the idea of Markov decision processes (MDPs) in the field of optimal control utilizing dynamic programming techniques. It also faces the crucial tradeoff between exploitation and exploration known from multi-armed bandit problems, and likewise considered in UCT of Monte-Carlo Tree Search: "exploitation" of the machine that has the highest expected payoff versus "exploration" to get more information about the expected payoffs of the other machines.
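The exploitation and exploration tradeoff can be illustrated with the UCB1 rule, which underlies UCT. The following is only a minimal sketch under stated assumptions: the function names `ucb1_select` and `run_bandit`, the Bernoulli payoff probabilities, the number of pulls and the seed are hypothetical choices made for illustration and are not taken from any cited publication.

```python
import math
import random


def ucb1_select(counts, sums, c=math.sqrt(2)):
    """Return the index of the arm to pull next under the UCB1 rule."""
    # Every arm is pulled once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        # Exploitation term (mean payoff) plus exploration bonus.
        sums[arm] / counts[arm] + c * math.sqrt(math.log(total) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(probs, pulls=2000, seed=0):
    """Play a Bernoulli bandit whose (hypothetical) payoff chances are `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    sums = [0.0] * len(probs)
    for _ in range(pulls):
        arm = ucb1_select(counts, sums)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling `run_bandit([0.2, 0.5, 0.8])` returns how often each machine was played. The exploration bonus shrinks as a machine is sampled more often, so play tends to concentrate on the machine with the highest observed mean while the others are still tried occasionally.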

Q-Learning

Q-Learning, introduced by Chris Watkins in 1989, is a simple way for agents to learn how to act optimally in controlled Markovian domains [2]. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely [3]. A Google DeepMind team successfully combined Q-learning with deep learning to play some Atari 2600 games, as published in Nature, 2015; the approach was dubbed deep reinforcement learning or deep Q-networks [4], and was soon followed by the spectacular AlphaGo and AlphaZero breakthroughs.
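The successive improvement of action-values can be sketched with a tabular Q-learning loop. This is a minimal illustration, not Watkins' original formulation: the corridor environment, the function name `q_learning`, and the values of the learning rate `alpha`, discount `gamma` and exploration rate `epsilon` are all assumptions chosen for the example.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical one-dimensional corridor.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the rightmost state ends the episode with reward 1.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy behaviour policy; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The terminal state has no future value.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            # Move Q(s, a) toward the one-step bootstrapped target.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q
```

In this sketch the table returned by `q_learning()` ends up preferring the move to the right in every non-terminal state, and the value of stepping into the goal approaches the reward of 1. The table has one entry per state and action, which is the discrete representation the convergence result assumes.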

Temporal Difference Learning

see main page Temporal Difference Learning

Q-learning at its simplest uses tables to store data. This quickly becomes infeasible as the state/action space of the system being monitored or controlled grows. One solution to this problem is to use an (adapted) artificial neural network as a function approximator, as demonstrated by Gerald Tesauro in his Backgammon playing temporal difference learning research [5] [6].

Temporal Difference Learning is a prediction method primarily used for reinforcement learning. In the domain of computer games and computer chess, TD learning is applied through self-play: the probability of winning is predicted successively over the sequence of moves from the initial position until the end of the game, and the weights are adjusted for a more reliable prediction.
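The weight adjustment from successive predictions can be sketched as a TD(0) pass over the positions of one finished game. This is a simplified illustration under several assumptions: the evaluation is a plain linear function of hypothetical feature vectors, there is no discounting, no squashing function and no eligibility trace, and the names `predict` and `td0_episode` are invented for the example.

```python
def predict(weights, features):
    """Linear evaluation: dot product of weights and feature values."""
    return sum(w * x for w, x in zip(weights, features))


def td0_episode(weights, positions, outcome, alpha=0.1):
    """Apply one pass of TD(0) updates over the positions of a finished game.

    `positions` is a list of feature vectors, one per position, and `outcome`
    is the final result (for example 1.0 for a win, 0.0 for a loss).
    """
    weights = list(weights)
    for t, features in enumerate(positions):
        if t + 1 < len(positions):
            # Train each prediction toward the prediction that follows it.
            target = predict(weights, positions[t + 1])
        else:
            # The last position is trained toward the actual result.
            target = outcome
        error = target - predict(weights, features)
        # For a linear evaluation the gradient is the feature vector itself.
        weights = [w + alpha * error * x for w, x in zip(weights, features)]
    return weights
```

With two positions `[[1, 0], [0, 1]]`, a win as outcome and `alpha=0.5`, the first pass starting from zero weights gives `[0.0, 0.5]` and a second pass gives `[0.25, 0.75]`: the result first corrects the final prediction, and only on the next pass does that correction propagate back to the earlier position.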

See also

UCT

Selected Publications

1954 ...

1960 ...

1970 ...

1980 ...

1990 ...

1995 ...

2000 ...

2005 ...

2010 ...

2011

2012

István Szita (2012). Reinforcement Learning in Games. Chapter 17

2013

2014

2015 ...

2016

2017

2018

2019

Postings

External Links

Reinforcement Learning

MDP

Q-Learning

Courses

  1. Lecture 1: Introduction to Reinforcement Learning
  2. Lecture 2: Markov Decision Process
  3. Lecture 3: Planning by Dynamic Programming
  4. Lecture 4: Model-Free Prediction
  5. Lecture 5: Model-Free Control
  6. Lecture 6: Value Function Approximation
  7. Lecture 7: Policy Gradient Methods
  8. Lecture 8: Integrating Learning and Planning
  9. Lecture 9: Exploration and Exploitation
  10. Lecture 10: Classic Games

References

  1. Example of a simple Markov decision process with three states (green circles) and two actions (orange circles), with two rewards (orange arrows), image by waldoalvarez CC BY-SA 4.0, Wikimedia Commons
  2. Q-learning from Wikipedia
  3. Chris Watkins, Peter Dayan (1992). Q-learning. Machine Learning, Vol. 8, No. 2
  4. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis (2015). Human-level control through deep reinforcement learning. Nature, Vol. 518
  5. Q-learning from Wikipedia
  6. Gerald Tesauro (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM, Vol. 38, No. 3
  7. University of Bristol - Department of Computer Science - Technical Reports
  8. Ms. Pac-Man from Wikipedia
  9. Demystifying Deep Reinforcement Learning by Tambet Matiisen, Nervana, December 22, 2015
  10. Patent US20150100530 - Methods and apparatus for reinforcement learning - Google Patents
  11. DeepChess: Another deep-learning based chess program by Matthew Lai, CCC, October 17, 2016
  12. ICANN 2016 | Recipients of the best paper awards
  13. AlphaGo Zero: Learning from scratch by Demis Hassabis and David Silver, DeepMind, October 18, 2017
  14. AlphaZero: Shedding new light on the grand games of chess, shogi and Go by David Silver, Thomas Hubert, Julian Schrittwieser and Demis Hassabis, DeepMind, December 03, 2018
  15. open_spiel/contributing.md at master · deepmind/open_spiel · GitHub
  16. New DeepMind paper by GregNeto, CCC, November 21, 2019
  17. Re: Board adaptive / tuning evaluation function - no NN/AI by Tony P., CCC, January 15, 2020

Up one Level