Difference between revisions of "Richard Sutton"

From Chessprogramming wiki
 
(One intermediate revision by the same user not shown)
Line 4: Line 4:
  
 
'''Richard Stuart Sutton''',<br/>
 
'''Richard Stuart Sutton''',<br/>
an American computer scientist and [[Artificial Intelligence|AI]]-researcher. Since 2003, Richard S. Sutton is a professor in the Department of Computing Science <ref>[http://www.cs.ualberta.ca/ Home | Department of Computing Science]</ref> at the [[University of Alberta]] and is principal investigator of the [[Reinforcement Learning]] and [[Artificial Intelligence]] (RLAI) <ref>[http://rlai.cs.ualberta.ca/RLAI/ualberta.html Reinforcement Learning and Artificial Intelligence (RLAI)]</ref> group. Rich's research interests center on the [[Learning|learning]] problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is the author of the original paper on [[Temporal Difference Learning]] <ref>[[Richard Sutton]] ('''1988'''). ''Learning to Predict by the Methods of Temporal Differences''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 3, No. 1</ref> and, with [[Andrew Barto]], of the textbook ''Reinforcement Learning: An Introduction'' <ref>[http://incompleteideas.net/book/the-book.html Reinforcement Learning: An Introduction] ebook by Richard Sutton and [[Andrew Barto]]</ref> . He is also interested in animal learning psychology, in [https://en.wikipedia.org/wiki/Connectionism connectionist] networks, and generally in systems that continually improve their representations and models of the world <ref>[http://incompleteideas.net/BriefBio.html Brief Biography for Richard Sutton]</ref> .  
+
an American computer scientist and [[Artificial Intelligence|AI]]-researcher. Since 2003, Richard S. Sutton is a professor in the Department of Computing Science <ref>[http://www.cs.ualberta.ca/ Home | Department of Computing Science]</ref> at the [[University of Alberta]] and is principal investigator of the [[Reinforcement Learning]] and [[Artificial Intelligence]] (RLAI) <ref>[http://rlai.cs.ualberta.ca/RLAI/ualberta.html Reinforcement Learning and Artificial Intelligence (RLAI)]</ref> group. Rich's research interests center on the [[Learning|learning]] problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is the author of the original paper on [[Temporal Difference Learning]] <ref>[[Richard Sutton]] ('''1988'''). ''Learning to Predict by the Methods of Temporal Differences''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 3, No. 1</ref> and, with [[Andrew Barto]], of the textbook ''Reinforcement Learning: An Introduction'' <ref>[http://incompleteideas.net/book/the-book.html Reinforcement Learning: An Introduction] ebook by Richard Sutton and [[Andrew Barto]]</ref> . He is also interested in animal learning psychology, in [https://en.wikipedia.org/wiki/Connectionism connectionist] networks, and generally in systems that continually improve their representations and models of the world <ref>[http://incompleteideas.net/BriefBio.html Brief Biography for Richard Sutton]</ref> .  
  
 
=Selected Publications=  
 
=Selected Publications=  
Line 16: Line 16:
 
==1990 ...==  
 
==1990 ...==  
 
* [[Richard Sutton]], [[Andrew Barto]] ('''1990'''). ''Time-Derivative Models of Pavlovian Reinforcement''. in [http://node.realityspline.net/ari/work/neuro/people/showpeople.php?person=faculty/mgabriel.php Michael Gabriel], [http://people.umass.edu/jwmoore/people.htm#JWMoore John Moore] (eds.) ('''1990'''). ''Learning and Computational Neuroscience: Foundations of Adaptive Networks''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press]
 
* [[Richard Sutton]], [[Andrew Barto]] ('''1990'''). ''Time-Derivative Models of Pavlovian Reinforcement''. in [http://node.realityspline.net/ari/work/neuro/people/showpeople.php?person=faculty/mgabriel.php Michael Gabriel], [http://people.umass.edu/jwmoore/people.htm#JWMoore John Moore] (eds.) ('''1990'''). ''Learning and Computational Neuroscience: Foundations of Adaptive Networks''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press]
* [[Doina Precup]], [[Richard Sutton]] ('''1997'''). ''Multi-time Models for Temporally Abstract Planning''. [http://dblp.uni-trier.de/db/conf/nips/nips1997.html#PrecupS97 NIPS 1997]
+
* [[Doina Precup]], [[Richard Sutton]] ('''1997'''). ''Multi-time Models for Temporally Abstract Planning''. [https://dblp.uni-trier.de/db/conf/nips/nips1997.html#PrecupS97 NIPS 1997], [https://papers.nips.cc/paper/1997/file/a9be4c2a4041cadbf9d61ae16dd1389e-Paper.pdf pdf]
 
* [[Richard Sutton]], [[Andrew Barto]] ('''1998'''). ''[http://incompleteideas.net/book/the-book.html Reinforcement Learning: An Introduction]''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press]
 
* [[Richard Sutton]], [[Andrew Barto]] ('''1998'''). ''[http://incompleteideas.net/book/the-book.html Reinforcement Learning: An Introduction]''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press]
 +
* [[Richard Sutton]], [[Doina Precup]], [[Mathematician#SSingh|Satinder Singh]] ('''1999'''). ''Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_(journal) Artificial Intelligence], Vol. 112,  [https://people.cs.umass.edu/~barto/courses/cs687/Sutton-Precup-Singh-AIJ99.pdf pdf]
 
==2000 ...==  
 
==2000 ...==  
 
* [[Michael L. Littman]], [[Richard Sutton]], [[Mathematician#SSingh|Satinder Singh]] ('''2001'''). ''Predictive Representations of State''. [http://dblp.uni-trier.de/db/conf/nips/nips2001.html#LittmanSS01 NIPS 2001], [http://web.eecs.umich.edu/~baveja/Papers/psr.pdf pdf]
 
* [[Michael L. Littman]], [[Richard Sutton]], [[Mathematician#SSingh|Satinder Singh]] ('''2001'''). ''Predictive Representations of State''. [http://dblp.uni-trier.de/db/conf/nips/nips2001.html#LittmanSS01 NIPS 2001], [http://web.eecs.umich.edu/~baveja/Papers/psr.pdf pdf]
 
* [[Richard Sutton]], [http://dblp.uni-trier.de/pers/hd/t/Tanner:Brian Brian Tanner] ('''2005'''). ''Temporal-Difference Networks''. Advances in Neural Information Processing Systems 17
 
* [[Richard Sutton]], [http://dblp.uni-trier.de/pers/hd/t/Tanner:Brian Brian Tanner] ('''2005'''). ''Temporal-Difference Networks''. Advances in Neural Information Processing Systems 17
 
* [[David Silver]], [[Richard Sutton]], [[Martin Müller]] ('''2007'''). ''Reinforcement learning of local shape in the game of Go''. [[Conferences#IJCAI2007|20th IJCAI]], [http://webdocs.cs.ualberta.ca/~mmueller/ps/silver-ijcai2007.pdf pdf]
 
* [[David Silver]], [[Richard Sutton]], [[Martin Müller]] ('''2007'''). ''Reinforcement learning of local shape in the game of Go''. [[Conferences#IJCAI2007|20th IJCAI]], [http://webdocs.cs.ualberta.ca/~mmueller/ps/silver-ijcai2007.pdf pdf]
* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''.
+
* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2008.html#SuttonSM08 NIPS 2008], [https://proceedings.neurips.cc/paper/2008/file/e0c641195b27425bb056ac56f8953d24-Paper.pdf pdf]
 
* [[David Silver]], [[Richard Sutton]], [[Martin Müller]] ('''2008'''). ''Sample-Based Learning and Search with Permanent and Transient Memories''. In Proceedings of the 25th International Conference on Machine Learning, [http://icml2008.cs.helsinki.fi/papers/564.pdf pdf]
 
* [[David Silver]], [[Richard Sutton]], [[Martin Müller]] ('''2008'''). ''Sample-Based Learning and Search with Permanent and Transient Memories''. In Proceedings of the 25th International Conference on Machine Learning, [http://icml2008.cs.helsinki.fi/papers/564.pdf pdf]
 
* [[Maria Cutumisu]], [[Michael Bowling]], [[Duane Szafron]], [[Richard Sutton]] ('''2008'''). ''Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games''. [https://www.aaai.org/Library/AIIDE/aiide08contents.php Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference], [https://webdocs.cs.ualberta.ca/~duane/publications/pdf/2008aiide.pdf pdf]
 
* [[Maria Cutumisu]], [[Michael Bowling]], [[Duane Szafron]], [[Richard Sutton]] ('''2008'''). ''Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games''. [https://www.aaai.org/Library/AIIDE/aiide08contents.php Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference], [https://webdocs.cs.ualberta.ca/~duane/publications/pdf/2008aiide.pdf pdf]
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.'' NIPS 22, MIT Press
+
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2009.html#MaeiSBPSS09 NIPS 2009], [https://papers.nips.cc/paper/2009/file/3a15c7d0bbe60300a39f76f8a5ba6896-Paper.pdf pdf]
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]]. ('''2009'''). ''Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation''. ICML-09, [http://www.sztaki.hu/~szcsaba/papers/GTD-ICML09.pdf pdf]
+
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]]. ('''2009'''). ''[https://dl.acm.org/doi/10.1145/1553374.1553501 Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation]''. [https://dblp.uni-trier.de/db/conf/icml/icml2009.html#SuttonMPBSSW09 ICML 2009]
 
==2010==
 
==2010==
* [[Hamid Reza Maei]], [[Richard Sutton]] ('''2010'''). ''[http://www.incompleteideas.net/sutton/publications.html#GQ GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces]''. In Proceedings of the Third Conference on Artificial General Intelligence
+
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Richard Sutton]] ('''2010'''). ''Toward Off-Policy Learning Control with Function Approximation''. [https://dblp.uni-trier.de/db/conf/icml/icml2010.html#MaeiSBS10 ICML 2010], [https://icml.cc/Conferences/2010/papers/627.pdf pdf]
 +
* [[Hamid Reza Maei]], [[Richard Sutton]] ('''2010'''). ''[https://www.researchgate.net/publication/215990384_GQlambda_A_general_gradient_algorithm_for_temporal-difference_prediction_learning_with_eligibility_traces GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces]''. [https://agi-conf.org/2010/ AGI 2010]
 
* [[David Silver]], [[Richard Sutton]], [[Martin Müller|Martin Mueller]] ('''2013'''). ''Temporal-Difference Search in Computer Go''. Proceedings of the [http://icaps13.icaps-conference.org/technical-program/workshop-program/planning-and-learning/ ICAPS-13 Workshop on Planning and Learning], [http://webdocs.cs.ualberta.ca/~sutton/papers/SSM-ICAPS-13.pdf pdf]
 
* [[David Silver]], [[Richard Sutton]], [[Martin Müller|Martin Mueller]] ('''2013'''). ''Temporal-Difference Search in Computer Go''. Proceedings of the [http://icaps13.icaps-conference.org/technical-program/workshop-program/planning-and-learning/ ICAPS-13 Workshop on Planning and Learning], [http://webdocs.cs.ualberta.ca/~sutton/papers/SSM-ICAPS-13.pdf pdf]
 
* [[Huizhen Yu]], [[A. Rupam Mahmood]], [[Richard Sutton]] ('''2017'''). ''On Generalized Bellman Equations and Temporal-Difference Learning''. Canadian Conference on AI 2017, [https://arxiv.org/abs/1704.04463 arXiv:1704.04463]
 
* [[Huizhen Yu]], [[A. Rupam Mahmood]], [[Richard Sutton]] ('''2017'''). ''On Generalized Bellman Equations and Temporal-Difference Learning''. Canadian Conference on AI 2017, [https://arxiv.org/abs/1704.04463 arXiv:1704.04463]
Line 44: Line 46:
 
=References=  
 
=References=  
 
<references />
 
<references />
 
 
'''[[People|Up one level]]'''
 
'''[[People|Up one level]]'''
 +
[[Category:Researcher|Sutton]]

Latest revision as of 13:52, 12 April 2021

Richard Sutton [1]

Richard Stuart Sutton,
an American computer scientist and AI researcher. Since 2003, Richard S. Sutton has been a professor in the Department of Computing Science [2] at the University of Alberta, where he is principal investigator of the Reinforcement Learning and Artificial Intelligence (RLAI) [3] group. Sutton's research interests center on the learning problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is the author of the original paper on Temporal Difference Learning [4] and, with Andrew Barto, of the textbook Reinforcement Learning: An Introduction [5]. He is also interested in animal learning psychology, in connectionist networks, and generally in systems that continually improve their representations and models of the world [6].

Selected Publications

[7][8]

1978

  • Richard Sutton (1978). Single channel theory: A neuronal theory of learning. Brain Theory Newsletter 3, No. 3/4

1980 ...

1990 ...

2000 ...

2010

External Links

References
