Latest revision as of 20:35, 11 June 2019
Peter Dayan is a British mathematician, computer scientist and neuroscientist, and director at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany. Since early 2019 he has also been affiliated with the SMARTStart training program of the Bernstein Network Computational Neuroscience [2] [3]. From 1998 until 2018, he was professor of computational neuroscience at University College London, and director of UCL's Gatsby Computational Neuroscience Unit [4].
Peter Dayan obtained a B.Sc. in mathematics from the University of Cambridge and a Ph.D. in artificial intelligence from the University of Edinburgh under David Wallace; his doctoral work focused on Bayesian network and neural network models of machine learning [5]. He was a postdoctoral researcher at the Salk Institute for Biological Studies, working with Terrence J. Sejnowski, and at the University of Toronto with Geoffrey E. Hinton, and was then assistant professor at MIT before relocating to UCL.
Work
Peter Dayan's work has been influential in several fields impinging on cognitive science, including machine learning, mathematical statistics, neuroscience and psychology. He has articulated a view in which neural computation is akin to a Bayesian inference process [6]. His research centers on self-supervised learning, reinforcement learning, temporal difference learning, population coding and Monte-Carlo tree search. He researched and published on Q-learning with Chris Watkins [7], and provided a proof of convergence of TD(λ) for arbitrary λ [8].
Learning Go
Along with Nicol N. Schraudolph and Terrence J. Sejnowski, Peter Dayan worked and published on the use of temporal difference learning to evaluate positions in Go [9] [10].
Selected Publications
1990 ...
- Peter Dayan (1990). Navigating Through Temporal Difference. NIPS 1990
- Peter Dayan (1991). Reinforcing Connectionism: Learning the Statistical Way. Ph.D. thesis, University of Edinburgh
- Chris Watkins, Peter Dayan (1992). Q-learning. Machine Learning, Vol. 8, No. 2
- Peter Dayan (1992). The convergence of TD (λ) for general λ. Machine Learning, Vol. 8, No. 3
- Peter Dayan, Geoffrey E. Hinton (1992). Feudal reinforcement learning. NIPS 1992
- Peter Dayan (1993). Improving generalisation for temporal difference learning: The successor representation. Neural Computation, Vol. 5, pdf
- Nicol N. Schraudolph, Peter Dayan, Terrence J. Sejnowski (1993). Temporal Difference Learning of Position Evaluation in the Game of Go. NIPS 1993
- Peter Dayan, Terrence J. Sejnowski (1994). TD(λ) converges with Probability 1. Machine Learning, Vol. 14, No. 1, pdf
- Peter Dayan, Terrence J. Sejnowski (1996). Exploration Bonuses and Dual Control. Machine Learning, Vol. 25, No. 1, pdf
- Peter Dayan (1999). Recurrent Sampling Models for the Helmholtz Machine. Neural Computation, Vol. 11, No. 3, pdf [12]
2000 ...
- Nicol N. Schraudolph, Peter Dayan, Terrence J. Sejnowski (2001). Learning to Evaluate Go Positions via Temporal Difference Methods. Computational Intelligence in Games, Studies in Fuzziness and Soft Computing. Physica-Verlag, pdf
- Peter Dayan, Laurence F. Abbott (2001, 2005). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press
- Peter Dayan (2008). Load and Attentional Bayes. NIPS 2008
2010 ...
- Peter Dayan (2012). How to set the switches on this thing. Current Opinion in Neurobiology, Vol. 22, pdf
- Arthur Guez, David Silver, Peter Dayan (2012). Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search. NIPS 2012
- Arthur Guez, David Silver, Peter Dayan (2012). Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search. arXiv:1205.3109
- Arthur Guez, David Silver, Peter Dayan (2013). Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search. Journal of Artificial Intelligence Research, Vol. 48
- Arthur Guez, David Silver, Peter Dayan (2014). Better Optimism By Bayes: Adaptive Planning with Rich Models. arXiv:1402.1958v1
- Arthur Guez, Nicolas Heess, David Silver, Peter Dayan (2014). Bayes-Adaptive Simulation-based Search with Value Function Approximation. NIPS 2014, pdf
- Jack W. Rae, Chris Dyer, Peter Dayan, Timothy Lillicrap (2018). Fast Parametric Learning with Activation Memorization. arXiv:1803.10049
- Sanjeevan Ahilan, Peter Dayan (2018). Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning. arXiv:1901.08492
External Links
- Peter Dayan from Wikipedia
- Peter Dayan and Li Zhaoping join the faculty — SMART START, January 28, 2019
- Peter Dayan and Li Zhaoping appointed to the Max Planck Institute for Biological Cybernetics | Max Planck Society, September 25, 2018
- Gatsby Computational Neuroscience Unit | Professor Peter Dayan
References
1. Peter Dayan at the Royal Society admissions day, London, July 13, 2018, by Duncan.Hull, Wikimedia Commons
2. Peter Dayan and Li Zhaoping join the faculty — SMART START, January 28, 2019
3. SMARTStart — Bernstein Netzwerk Computational Neuroscience
4. Gatsby Computational Neuroscience Unit | Professor Peter Dayan
5. Peter Dayan (1991). Reinforcing Connectionism: Learning the Statistical Way. Ph.D. thesis, University of Edinburgh
6. 2012 Recipient Peter Dayan | The David E. Rumelhart Prize 2012
7. Chris Watkins, Peter Dayan (1992). Q-learning. Machine Learning, Vol. 8, No. 2
8. Peter Dayan (1992). The convergence of TD (λ) for general λ. Machine Learning, Vol. 8, No. 3
9. Nicol N. Schraudolph, Peter Dayan, Terrence J. Sejnowski (1993). Temporal Difference Learning of Position Evaluation in the Game of Go. NIPS 1993
10. Nici Schraudolph's go networks, review by Jay Scott
11. dblp: Peter Dayan
12. Helmholtz machine from Wikipedia