Reinforcement Learning

=Q-Learning=
Q-Learning, introduced by [[Chris Watkins]] in 1989, is a simple way for [https://en.wikipedia.org/wiki/Intelligent_agent agents] to learn how to act optimally in controlled Markovian domains <ref>[https://en.wikipedia.org/wiki/Q-learning Q-learning from Wikipedia]</ref>. It amounts to an incremental method for dynamic programming that imposes limited computational demands, working by successively improving its evaluations of the quality of particular actions at particular states. Q-learning converges to the optimal action-values with probability 1, so long as all actions are repeatedly sampled in all states and the action-values are represented discretely <ref>[[Chris Watkins]], [[Peter Dayan]] ('''1992'''). ''[http://www.gatsby.ucl.ac.uk/~dayan/papers/wd92.html Q-learning]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 8, No. 2</ref>. Q-learning was successfully combined with [[Deep Learning|deep learning]] by a [[Google]] [[DeepMind]] team to play several [[Atari 8-bit|Atari 2600]] [https://en.wikipedia.org/wiki/List_of_Atari_2600_games games], an approach dubbed ''deep reinforcement learning'' or ''deep Q-networks'' and published in [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature] in 2015 <ref>[[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]], [[Mathematician#AARusu|Andrei A. Rusu]], [[Joel Veness]], [[Marc G. Bellemare]], [[Alex Graves]], [[Martin Riedmiller]], [[Andreas K. Fidjeland]], [[Georg Ostrovski]], [[Stig Petersen]], [[Charles Beattie]], [[Amir Sadik]], [[Ioannis Antonoglou]], [[Helen King]], [[Dharshan Kumaran]], [[Daan Wierstra]], [[Shane Legg]], [[Demis Hassabis]] ('''2015'''). ''[http://www.nature.com/nature/journal/v518/n7540/abs/nature14236.html Human-level control through deep reinforcement learning]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 518</ref>, soon followed by the spectacular [[AlphaGo]] and [[AlphaZero]] breakthroughs.
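A minimal sketch of the tabular Q-learning loop in Python may help make the incremental update concrete. The environment interface (<code>env.reset</code>, <code>env.step</code>, <code>env.actions</code>) and the hyperparameter values are illustrative assumptions, not taken from the cited papers:

<pre>
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q = defaultdict(float)  # Q[(state, action)] -> value estimate, zero-initialized

    for _ in range(episodes):
        state = env.reset()          # assumed: returns a hashable (discrete) state
        done = False
        while not done:
            # epsilon-greedy exploration over the discrete action set,
            # so that all actions keep being sampled in all states
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)  # assumed interface

            # one-step backup toward the greedy (off-policy) target
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
</pre>

Because the update bootstraps from <code>max Q(s', a')</code> rather than from the action actually taken next, the learned values estimate the optimal policy regardless of the exploratory behavior, which is what makes Q-learning off-policy.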
=Temporal Difference Learning=
==2010 ...==
* [[Marcin Szubert]] ('''2014'''). ''Coevolutionary Shaping for Reinforcement Learning''. Ph.D. thesis, [https://en.wikipedia.org/wiki/Pozna%C5%84_University_of_Technology Poznań University of Technology], supervisor [[Krzysztof Krawiec]], co-supervisor [[Wojciech Jaśkowski]], [http://www.cs.put.poznan.pl/mszubert/pub/phdthesis.pdf pdf]
==2015 ...==
* [[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]], [[Mathematician#AARusu|Andrei A. Rusu]], [[Joel Veness]], [[Marc G. Bellemare]], [[Alex Graves]], [[Martin Riedmiller]], [[Andreas K. Fidjeland]], [[Georg Ostrovski]], [[Stig Petersen]], [[Charles Beattie]], [[Amir Sadik]], [[Ioannis Antonoglou]], [[Helen King]], [[Dharshan Kumaran]], [[Daan Wierstra]], [[Shane Legg]], [[Demis Hassabis]] ('''2015'''). ''[http://www.nature.com/nature/journal/v518/n7540/abs/nature14236.html Human-level control through deep reinforcement learning]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 518
* [[Tobias Graf]], [[Marco Platzner]] ('''2015'''). ''Adaptive Playouts in Monte Carlo Tree Search with Policy Gradient Reinforcement Learning''. [[Advances in Computer Games 14]]
* [[Arun Nair]], [[Praveen Srinivasan]], [[Sam Blackwell]], [[Cagdas Alcicek]], [[Rory Fearon]], [[Alessandro De Maria]], [[Veda Panneershelvam]], [[Mustafa Suleyman]], [[Charles Beattie]], [[Stig Petersen]], [[Shane Legg]], [[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]] ('''2015'''). ''Massively Parallel Methods for Deep Reinforcement Learning''. [http://arxiv.org/abs/1507.04296 arXiv:1507.04296]
