Temporal Difference Learning

130 bytes added, 19:57, 11 June 2019
* [[Richard Sutton]], [[Andrew Barto]] ('''1990'''). ''Time-Derivative Models of Pavlovian Reinforcement''. in [http://node.realityspline.net/ari/work/neuro/people/showpeople.php?person=faculty/mgabriel.php Michael Gabriel], [http://people.umass.edu/jwmoore/people.htm#JWMoore John Moore] (eds.) ('''1990'''). ''Learning and Computational Neuroscience: Foundations of Adaptive Networks''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press], [https://webdocs.cs.ualberta.ca/~sutton/papers/sutton-barto-90.pdf pdf]
* [http://dblp.uni-trier.de/pers/hd/y/Yee:Richard_C= Richard C. Yee], [http://dblp.uni-trier.de/pers/hd/s/Saxena:Sharad Sharad Saxena], [[Paul E. Utgoff]], [[Andrew Barto]] ('''1990'''). ''Explaining Temporal Differences to Create Useful Concepts for Evaluating States''. [http://dblp.uni-trier.de/db/conf/aaai/aaai90.html#YeeSUB90 AAAI 1990], [http://www.aaai.org/Papers/AAAI/1990/AAAI90-132.pdf pdf]
* [[Peter Dayan]] ('''1990'''). ''[https://papers.nips.cc/paper/428-navigating-through-temporal-difference Navigating Through Temporal Difference]''. [https://papers.nips.cc/book/advances-in-neural-information-processing-systems-3-1990 NIPS 1990], [https://papers.nips.cc/paper/428-navigating-through-temporal-difference.pdf pdf]
* [[Gerald Tesauro]] ('''1992'''). ''Temporal Difference Learning of Backgammon Strategy''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/ml1992.html#Tesauro92 ML 1992]
* [[Peter Dayan]] ('''1992'''). ''[https://www.researchgate.net/publication/227208155_The_Convergence_of_TDl_for_General_l The convergence of TD (λ) for general λ]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 8, No. 3
* [[Gerald Tesauro]] ('''1992'''). ''[http://dl.acm.org/citation.cfm?id=139616 Practical Issues in Temporal Difference Learning]''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 8, Nos. 3-4
* [[Michael Gherrity]] ('''1993'''). ''A Game Learning Machine''. Ph.D. thesis, [https://de.wikipedia.org/wiki/University_of_California,_San_Diego University of California, San Diego], advisor [[Mathematician#PKube|Paul Kube]], [http://www.gherrity.org/thesis.pdf pdf], [http://www.top-5000.nl/ps/A%20game%20learning%20machine.pdf pdf]
* [[Peter Dayan]] ('''1993'''). ''Improving generalisation for temporal difference learning: The successor representation''. [https://en.wikipedia.org/wiki/Neural_Computation_(journal) Neural Computation], Vol. 5, [http://www.gatsby.ucl.ac.uk/~dayan/papers/sr93.pdf pdf]
* [[Nicol N. Schraudolph]], [[Peter Dayan]], [[Terrence J. Sejnowski]] ('''1993'''). ''[https://papers.nips.cc/paper/820-temporal-difference-learning-of-position-evaluation-in-the-game-of-go Temporal Difference Learning of Position Evaluation in the Game of Go]''. [https://papers.nips.cc/book/advances-in-neural-information-processing-systems-6-1993 NIPS 1993] <ref>[http://satirist.org/learn-game/systems/go-net.html Nici Schraudolph’s go networks], review by [[Jay Scott]]</ref>
* [[Peter Dayan]], [[Terrence J. Sejnowski]] ('''1994'''). ''TD(λ) converges with Probability 1''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 14, No. 1, [https://www.researchgate.net/profile/Terrence_Sejnowski/publication/228392650_TD_X_Converges_with_Probability/links/54a4afea0cf256bf8bb327a9.pdf?origin=publication_detail pdf]
==1995 ...==
* [[Jonathan Schaeffer]], [[Markian Hlynka]], [[Vili Jussila]] ('''2001'''). ''Temporal Difference Learning Applied to a High-Performance Game-Playing Program''. [http://www.informatik.uni-trier.de/~ley/db/conf/ijcai/ijcai2001.html#SchaefferHJ01 IJCAI 2001]
* [[Don Beal]], [[Martin C. Smith]] ('''2001'''). ''[http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V1G-41MJ1SV-7&_user=10&_coverDate=02%2F06%2F2001&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1436661548&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=d855cbad10953476dbb92258347c8e94 Temporal difference learning applied to game playing and the results of application to Shogi]''. [https://en.wikipedia.org/wiki/Theoretical_Computer_Science_%28journal%29 Theoretical Computer Science], Vol. 252, Nos. 1-2
* [[Nicol N. Schraudolph]], [[Peter Dayan]], [[Terrence J. Sejnowski]] ('''2001'''). ''[https://link.springer.com/chapter/10.1007/978-3-7908-1833-8_4 Learning to Evaluate Go Positions via Temporal Difference Methods]''. [http://jasss.soc.surrey.ac.uk/7/1/reviews/takama.html Computational Intelligence in Games, Studies in Fuzziness and Soft Computing], [http://www.springer.com/economics?SGWID=1-165-6-73481-0 Physica-Verlag], [https://papers.cnl.salk.edu/PDFs/Learning%20to%20Evaluate%20Go%20Positions%20Via%20Temporal%20Difference%20Methods%202001-3244.pdf pdf]
* [[Lex Weaver]], [[Jonathan Baxter]] ('''2001'''). ''STD (λ): learning state differences with TD (λ)''. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.7737 CiteSeerX]
'''2002'''
=References=
<references />
 
'''[[Learning|Up one level]]'''
[[Category:Jonas Hellborg]]
[[Category:Shawn Lane]]
[[Category:Jeff Sipe]]
