'''[[Main Page|Home]] * [[Learning]] * Temporal Difference Learning'''

'''Temporal Difference Learning''' (TD learning)<br/>
is a machine learning method applied to multi-step prediction problems. As a prediction method primarily used for [[Reinforcement Learning|reinforcement learning]], TD learning exploits the fact that successive predictions are often correlated, whereas in [[Supervised Learning|supervised learning]] one learns only from actually observed values. TD learning combines [https://en.wikipedia.org/wiki/Monte_Carlo_method Monte Carlo methods] with [[Dynamic Programming|dynamic programming]] techniques <ref>[https://en.wikipedia.org/wiki/Temporal_difference_learning Temporal difference learning from Wikipedia]</ref>. In the domain of computer games and computer chess, TD learning is applied through self-play: the [https://en.wikipedia.org/wiki/Probability probability] of winning a [[Chess Game|game]] is predicted over the sequence of [[Moves|moves]] from the [[Initial Position|initial position]] until the end, and the weights are adjusted toward more reliable predictions.

=Prediction=
Each prediction is a single number, computed from a formula over features with adjustable weights, for instance by a [[Neural Networks|neural network]], most simply a single-neuron [[Neural Networks#Perceptron|perceptron]], i.e. a linear [[Evaluation|evaluation]] function ...
[[File:TDLForula1.jpg|none|text-bottom]]
[[File:sigDeri.jpg|right|thumb]]
... with the [[Pawn Advantage, Win Percentage, and Elo|pawn advantage]] converted to a [[Pawn Advantage, Win Percentage, and Elo|winning probability]] by the standard [https://en.wikipedia.org/wiki/Sigmoid_function sigmoid squashing function], also a topic in [[Automated Tuning#LogisticRegression|logistic regression]] in the domain of [[Supervised Learning|supervised learning]] and [[Automated Tuning|automated tuning]], ...
[[File:TDLForula2.jpg|none|text-bottom]]
... which has the advantage of its simple [https://en.wikipedia.org/wiki/Derivative derivative]:
[[File:TDLForula3.jpg|none|text-bottom]]
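Written out (a reconstruction of the pictured formulas from the surrounding text rather than a transcript of the images; the feature vector '''x''' and weight vector '''w''' are notational assumptions of this sketch), the linear evaluation v in pawn units and its squashed winning probability P satisfy:
<math>v_t = \mathbf{w} \cdot \mathbf{x}_t, \qquad P_t = \sigma(v_t) = \frac{1}{1+e^{-v_t}}, \qquad \frac{dP_t}{dv_t} = P_t\,(1-P_t)</math>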
<span id="TDLamba"></span>
=TD(λ)=
Each pair of [https://en.wiktionary.org/wiki/temporal temporally] successive predictions P at time step t and t+1 gives rise to a recommendation for weight changes that moves P<span style="font-size: 90%;vertical-align: sub;">t</span> toward P<span style="font-size: 90%;vertical-align: sub;">t+1</span>, first applied in the late 50s by [[Arthur Samuel]] in his [[Checkers]] player for [[Automated Tuning|automated evaluation tuning]] <ref>[[Arthur Samuel]] ('''1959'''). ''[http://domino.watson.ibm.com/tchjr/journalindex.nsf/600cc5649e2871db852568150060213c/39a870213169f45685256bfa00683d74!OpenDocument Some Studies in Machine Learning Using the Game of Checkers]''. IBM Journal of Research and Development, Vol. 3, No. 3</ref>. This TD method was improved, generalized and formalized by [[Richard Sutton]] et al. in the 80s; the term ''Temporal Difference Learning'' was coined in 1988 <ref>[[Richard Sutton]] ('''1988'''). ''Learning to Predict by the Methods of Temporal Differences''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 3, No. 1, [https://webdocs.cs.ualberta.ca/~sutton/papers/sutton-88-with-erratum.pdf pdf]</ref>, along with the decay or recency parameter '''λ''', which controls what proportion of the score comes from the outcome of [https://en.wikipedia.org/wiki/Monte_Carlo_method Monte Carlo] simulated games, tapering between [https://en.wikipedia.org/wiki/Bootstrapping#Artificial_intelligence_and_machine_learning bootstrapping] (λ = 0) and pure Monte Carlo prediction (λ = 1), the latter equivalent to [https://en.wikipedia.org/wiki/Gradient_descent gradient descent] on the [https://en.wikipedia.org/wiki/Mean_squared_error mean squared error function]. Weight adjustments in TD(λ) are made according to ...
[[File:TDLForula4.jpg|none|text-bottom]]
... where P is the series of temporally successive predictions and w the set of adjustable weights. α is a parameter controlling the learning rate, also called step-size; ∇<span style="font-size: 90%;vertical-align: sub;">w</span>P<span style="font-size: 90%;vertical-align: sub;">k</span> <ref>[https://en.wikipedia.org/wiki/Nabla_symbol Nabla symbol from Wikipedia]</ref> is the [https://en.wikipedia.org/wiki/Gradient gradient], the vector of [https://en.wikipedia.org/wiki/Partial_derivative partial derivatives] of P<span style="font-size: 90%;vertical-align: sub;">k</span> with respect to w. The process may be applied to any initial set of weights. Learning performance depends on λ and α, which have to be chosen appropriately for the domain. In principle, TD(λ) weight adjustments may be made after each move, or at any arbitrary interval. For game playing tasks, the end of every game is a convenient point to actually alter the evaluation weights <ref>[[Don Beal]], [[Martin C. Smith]] ('''1998'''). ''[http://www.springerlink.com/content/l9f4ngc2tqgnac9e/ First Results from Using Temporal Difference Learning in Shogi]''. [[CG 1998]]</ref>.
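As a concrete illustration, the following minimal Python sketch applies the TD(λ) rule above to a linear evaluation squashed by the sigmoid, maintaining the λ-decayed sum of gradients as an eligibility trace and applying the accumulated adjustments once at the end of the game; all names (feature_vectors, alpha, lam) are illustrative assumptions, not taken from any particular engine.
<pre>
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def td_lambda_update(weights, feature_vectors, alpha=0.01, lam=0.7):
    """TD(lambda) over one game: feature_vectors holds one feature list per
    time step; implements delta_w_t = alpha * (P_{t+1} - P_t) *
    sum_{k<=t} lam^(t-k) * grad_w P_k via an eligibility trace."""
    preds = [sigmoid(sum(w * x for w, x in zip(weights, xs)))
             for xs in feature_vectors]
    # preds[-1] is normally overwritten with the actual game result (1, 0.5, 0)
    trace = [0.0] * len(weights)   # e_t = lam * e_{t-1} + grad_w P_t
    deltas = [0.0] * len(weights)  # adjustments, applied after the game
    for t in range(len(preds) - 1):
        p = preds[t]
        grad = [p * (1.0 - p) * x for x in feature_vectors[t]]  # dP/dv * dv/dw
        trace = [lam * e + g for e, g in zip(trace, grad)]
        td_error = preds[t + 1] - p        # temporal difference P_{t+1} - P_t
        deltas = [d + alpha * td_error * e for d, e in zip(deltas, trace)]
    return [w + d for w, d in zip(weights, deltas)]
</pre>
Applied once per self-play game, e.g. weights = td_lambda_update(weights, game_features), this realizes the end-of-game adjustment mentioned above.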

TD(λ) was famously applied by [[Gerald Tesauro]] in his [[Backgammon]] program [https://en.wikipedia.org/wiki/TD-Gammon TD-Gammon] <ref>[[Gerald Tesauro]] ('''1992'''). ''Temporal Difference Learning of Backgammon Strategy''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/ml1992.html#Tesauro92 ML 1992]</ref> <ref>[[Gerald Tesauro]] ('''1994'''). ''TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play''. [http://www.informatik.uni-trier.de/~ley/db/journals/neco/neco6.html#Tesauro94 Neural Computation Vol. 6, No. 2]</ref>, which selects moves in this stochastic game by picking the action whose successor state minimizes the opponent's expected reward, i.e. [[Search|looking]] one [[Ply|ply]] ahead.
<span id="TDLeaf"></span>
=TDLeaf(λ)=
In games like chess or [[Othello]], due to their [[Tactics|tactical]] nature, [[Search|deep searches]] are necessary for expert performance. The problem had already been recognized and solved by [[Arthur Samuel]], but seemed to have been forgotten later on <ref>[[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning of Piece Values for Chess Variants.'' Technical Report TUD–KE–2008-07, Knowledge Engineering Group, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf pdf]</ref> - it was rediscovered independently by [[Don Beal]] and [[Martin C. Smith]] in 1997 <ref>[[Don Beal]], [[Martin C. Smith]] ('''1997'''). ''Learning Piece Values Using Temporal Differences''. [[ICGA Journal#20_3|ICCA Journal, Vol. 20, No. 3]]</ref>, and by [[Jonathan Baxter]], [[Andrew Tridgell]], and [[Lex Weaver]] in 1998 <ref>[[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Experiments in Parameter Learning Using Temporal Differences''. [[ICGA Journal#21_2|ICCA Journal, Vol. 21, No. 2]], [http://cs.anu.edu.au/%7ELex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf pdf]</ref>, who coined the term TD-Leaf. TD-Leaf is the adaptation of TD(λ) to [[Minimax|minimax]] search: instead of the [[Chess Position|positions]] at the [[Root|root]], the [[Leaf Node|leaf nodes]] of the [[Principal variation|principal variation]] are considered in the weight adjustments. TD-Leaf was successfully used in [[Automated Tuning|evaluation tuning]] of chess programs <ref>[[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''Learning Piece-Square Values using Temporal Differences.'' [[ICGA Journal#22_4|ICCA Journal, Vol. 22, No. 4]]</ref>, with [[KnightCap]] <ref>[[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''KnightCap: A Chess Program that Learns by Combining TD(λ) with Game-Tree Search''. Proceedings of the 15th International Conference on Machine Learning, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8263&rep=rep1&type=pdf pdf] via [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.8263 citeseerX]</ref> and [[CilkChess]] as the most prominent examples; the latter used the improved '''Temporal Coherence Learning''' <ref>[http://supertech.csail.mit.edu/chess/ The Cilkchess Parallel Chess Program]</ref>, which automatically adjusts α and λ <ref>[[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''[http://portal.acm.org/citation.cfm?id=1624299 Temporal Coherence and Prediction Decay in TD Learning]''. [[Conferences#IJCAI1999|IJCAI 1999]], [http://ijcai.org/Past%20Proceedings/IJCAI-99-VOL-1/PDF/081.pdf pdf]</ref>.
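The change relative to plain TD(λ) can be sketched in a few lines under the same assumptions as the sketch above; search and extract_features are hypothetical routines, and pv_leaf is an assumed attribute holding the leaf of the returned principal variation.
<pre>
def td_leaf_update(weights, root_positions, search, depth,
                   extract_features, alpha=0.01, lam=0.7):
    """TD-Leaf sketch: search each root position of a finished game to a
    fixed depth, take the feature vector at the principal-variation leaf,
    and apply the plain TD(lambda) update to those leaf features."""
    leaf_features = [extract_features(search(pos, depth).pv_leaf)
                     for pos in root_positions]
    return td_lambda_update(weights, leaf_features, alpha, lam)
</pre>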

=Quotes=
==Don Beal==
[[Don Beal]] in a 1998 [[CCC]] discussion with [[Jonathan Baxter]] <ref> [https://www.stmintz.com/ccc/index.php?id=28819 Re: Parameter Tuning] by [[Don Beal]], [[CCC]], October 02, 1998</ref>:
With fixed learning rates (aka step size) we found [[Point Value|piece values]] settle to consistent relative ordering in around 500 self-play games. The ordering remains in place despite considerable erratic movements. But [[Piece-Square Tables|piece-square values]] can take a lot longer - more like 5000.

The learning rate is critical - it has to be as large as one dares for fast learning, but low for stable values. We've been experimenting with methods for automatically adjusting the learning rate. (Higher rates if the adjustments go in the same direction, lower if they keep changing direction.)

The other problem is learning weights for terms which only occur rarely. Then the learning process doesn't see enough examples to settle on good weights in a reasonable time. I suspect this is the main limitation of the method, but it may be possible to devise ways to generate extra games which exercise the rare conditions.

==Bas Hamstra==
[[Bas Hamstra]] in a 2002 [[CCC]] discussion on TD learning <ref>[https://www.stmintz.com/ccc/index.php?id=244085 Re: Hello from Edmonton (and on Temporal Differences)] by [[Bas Hamstra]], [[CCC]], August 05, 2002</ref>:
I have played with it. I am convinced it has possibilities, but one problem I encountered was the cause-effect problem. For say I am a piece down. After I lost the game TD will conclude that the winner had better [[Mobility|mobility]] and will tune it up. However worse mobility was not the '''cause''' of the loss, it was the '''effect''' of simply being a piece down. In my case it kept tuning mobility up and up until ridiculous values.

==Don Dailey==
[[Don Dailey]] in a reply <ref>[http://www.talkchess.com/forum/viewtopic.php?t=37062&start=2 Re: Positional learning] by [[Don Dailey]], [[CCC]], December 13, 2010</ref> to [[Ben-Hur Carlos Vieira Langoni Junior]], [[CCC]], December 2010 <ref>[http://www.talkchess.com/forum/viewtopic.php?t=37062 Positional learning] by [[Ben-Hur Carlos Vieira Langoni Junior]], [[CCC]], December 13, 2010</ref>:
Another approach that may be more in line with what you want is called "temporal difference learning", and it is based on feedback from each move to the move that precedes it. For example if you play move 35 and the program thinks the position is equal, but then on move 36 you find that you are winning a pawn, it indicates that the evaluation of move 35 is in error, the position is better than the program thought it was. Little tiny incremental adjustments are made to the evaluation function so that it is ever so slightly biased in favor of being slightly more positive in this case, or slightly more negative in the case where you find your score is dropping. This is done recursively back through the moves of the game so that winning the game gives some credit to all the positions of the game. Look on the web and read up on the "credit assignment problem" and temporal difference learning. It's probably ideal for what you are looking for. It can be done at the end of the game one time and scores then updated. If you are not using [[Float|floating point]] evaluation you may have to figure out how to modify this to be workable.

=Chess Programs=
* [[CilkChess]]
* [[EXchess]]
* [[FUSCsharp|FUSc#]]
* [[Giraffe]]
* [[Green Light Chess]]
* [[KnightCap]]
* [[Meep]]
: [[Meep#RootStrap|RootStrap]]
: [[Meep#TreeStrap|TreeStrap]]
* [[Morph]]
* [[NeuroChess]]
* [[SAL]]
* [[Tao]] <ref>[https://www.stmintz.com/ccc/index.php?id=149645 Tao update] by [[Bas Hamstra]], [[CCC]], January 12, 2001</ref>
* [[TDChess]]

=See also=
* [[Automated Tuning]]
* [[Backgammon]]
* [[Deep Learning]]
* [[Evaluation]]
* [[Neural Networks]]

=Publications=
==1959==
* [[Arthur Samuel]] ('''1959'''). ''[http://domino.watson.ibm.com/tchjr/journalindex.nsf/600cc5649e2871db852568150060213c/39a870213169f45685256bfa00683d74!OpenDocument Some Studies in Machine Learning Using the Game of Checkers]''. IBM Journal of Research and Development, Vol. 3, No. 3
==1970 ...==
* [[A. Harry Klopf]] ('''1972'''). ''Brain Function and Adaptive Systems - A Heterostatic Theory''. [https://en.wikipedia.org/wiki/Air_Force_Cambridge_Research_Laboratories Air Force Cambridge Research Laboratories], Special Reports, No. 133, [http://www.dtic.mil/dtic/tr/fulltext/u2/742259.pdf pdf]
* [[Mathematician#Holland|John H. Holland]] ('''1975'''). ''Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence''. [http://www.amazon.com/Adaptation-Natural-Artificial-Systems-Introductory/dp/0262581116 amazon.com]
==1980 ...==
* [[Richard Sutton]] ('''1984'''). ''[http://scholarworks.umass.edu/dissertations/AAI8410337/ Temporal Credit Assignment in Reinforcement Learning]''. Ph.D. dissertation, [https://en.wikipedia.org/wiki/University_of_Massachusetts University of Massachusetts]
* [[Jens Christensen]] ('''1986'''). ''[http://link.springer.com/chapter/10.1007/978-1-4613-2279-5_9?no-access=true Learning Static Evaluation Functions by Linear Regression]''. in [[Tom Mitchell]], [[Jaime Carbonell]], [[Ryszard Michalski]] ('''1986'''). ''[http://link.springer.com/book/10.1007/978-1-4613-2279-5 Machine Learning: A Guide to Current Research]''. The Kluwer International Series in Engineering and Computer Science, Vol. 12
* [[Richard Sutton]] ('''1988'''). ''Learning to Predict by the Methods of Temporal Differences''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 3, No. 1, [https://webdocs.cs.ualberta.ca/~sutton/papers/sutton-88-with-erratum.pdf pdf]
==1990 ...==
* [[Richard Sutton]], [[Andrew Barto]] ('''1990'''). ''Time-Derivative Models of Pavlovian Reinforcement''. in [http://node.realityspline.net/ari/work/neuro/people/showpeople.php?person=faculty/mgabriel.php Michael Gabriel], [http://people.umass.edu/jwmoore/people.htm#JWMoore John Moore] (eds.) ('''1990'''). ''Learning and Computational Neuroscience: Foundations of Adaptive Networks''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press], [https://webdocs.cs.ualberta.ca/~sutton/papers/sutton-barto-90.pdf pdf]
* [http://dblp.uni-trier.de/pers/hd/y/Yee:Richard_C= Richard C. Yee], [http://dblp.uni-trier.de/pers/hd/s/Saxena:Sharad Sharad Saxena], [[Paul E. Utgoff]], [[Andrew Barto]] ('''1990'''). ''Explaining Temporal Differences to Create Useful Concepts for Evaluating States''. [http://dblp.uni-trier.de/db/conf/aaai/aaai90.html#YeeSUB90 AAAI 1990], [http://www.aaai.org/Papers/AAAI/1990/AAAI90-132.pdf pdf]
* [[Peter Dayan]] ('''1990'''). ''Navigating Through Temporal Difference''. [https://papers.nips.cc/book/advances-in-neural-information-processing-systems-3-1990 NIPS 1990], [https://papers.nips.cc/paper/428-navigating-through-temporal-difference.pdf pdf]
* [[Gerald Tesauro]] ('''1992'''). ''Temporal Difference Learning of Backgammon Strategy''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/ml1992.html#Tesauro92 ML 1992]
* [[Peter Dayan]] ('''1992'''). ''[https://www.researchgate.net/publication/227208155_The_Convergence_of_TDl_for_General_l The convergence of TD (λ) for general λ]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 8, No. 3
* [[Gerald Tesauro]] ('''1992'''). ''[http://dl.acm.org/citation.cfm?id=139616 Practical Issues in Temporal Difference Learning]''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 8, Nos. 3-4
* [[Michael Gherrity]] ('''1993'''). ''A Game Learning Machine''. Ph.D. thesis, [https://de.wikipedia.org/wiki/University_of_California,_San_Diego University of California, San Diego], advisor [[Mathematician#PKube|Paul Kube]], [http://www.gherrity.org/thesis.pdf pdf], [http://www.top-5000.nl/ps/A%20game%20learning%20machine.pdf pdf]
* [[Peter Dayan]] ('''1993'''). ''Improving generalisation for temporal difference learning: The successor representation''. [https://en.wikipedia.org/wiki/Neural_Computation_(journal) Neural Computation], Vol. 5, [http://www.gatsby.ucl.ac.uk/~dayan/papers/sr93.pdf pdf]
* [[Nicol N. Schraudolph]], [[Peter Dayan]], [[Terrence J. Sejnowski]] ('''1994'''). ''[http://nic.schraudolph.org/bib2html/b2hd-SchDaySej94.html Temporal Difference Learning of Position Evaluation in the Game of Go]''. [http://papers.nips.cc/book/advances-in-neural-information-processing-systems-6-1993 Advances in Neural Information Processing Systems 6]
* [[Peter Dayan]], [[Terrence J. Sejnowski]] ('''1994'''). ''TD(λ) converges with Probability 1''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 14, No. 1, [https://www.researchgate.net/profile/Terrence_Sejnowski/publication/228392650_TD_X_Converges_with_Probability/links/54a4afea0cf256bf8bb327a9.pdf?origin=publication_detail pdf]
==1995 ...==
* [[Anton Leouski]] ('''1995'''). ''Learning of Position Evaluation in the Game of Othello''. Master's Project, [https://en.wikipedia.org/wiki/University_of_Massachusetts University of Massachusetts], [https://en.wikipedia.org/wiki/Amherst,_Massachusetts Amherst, Massachusetts], [http://people.ict.usc.edu/~leuski/publications/papers/UM-CS-1995-023.pdf pdf]
* [[Gerald Tesauro]] ('''1995'''). ''Temporal Difference Learning and TD-Gammon''. [[ACM#Communications|Communications of the ACM]], Vol. 38, No. 3
* [[Sebastian Thrun]] ('''1995'''). ''[http://robots.stanford.edu/papers/thrun.nips7.neuro-chess.html Learning to Play the Game of Chess]''. in [[Gerald Tesauro]], [https://en.wikipedia.org/wiki/David_S._Touretzky David S. Touretzky], [http://mitpress.mit.edu/authors/todd-k-leen Todd K. Leen] (eds.) Advances in Neural Information Processing Systems 7, [https://en.wikipedia.org/wiki/MIT_Press MIT Press]
'''1996'''
* [[Robert Schapire]], [[Mathematician#MKWarmuth|Manfred K. Warmuth]] ('''1996'''). ''On the Worst-Case Analysis of Temporal-Difference Learning Algorithms''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 22, Nos. 1-3, [https://users.soe.ucsc.edu/~manfred/pubs/J34.pdf pdf]
* [[Johannes Fürnkranz]] ('''1996'''). ''Machine Learning in Computer Chess: The Next Generation.'' [[ICGA Journal#19_3|ICCA Journal, Vol. 19, No. 3]], [http://www.ofai.at/cgi-bin/get-tr?download=1&paper=oefai-tr-96-11.ps.gz zipped ps]
* [[Steven Bradtke]], [[Andrew Barto]] ('''1996'''). ''Linear Least-Squares Algorithms for Temporal Difference Learning''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 22, Nos. 1-3, [http://www-anw.cs.umass.edu/pubs/1995_96/bradtke_b_ML96.pdf pdf]
'''1997'''
* [[Mathematician#JNTsitsiklis|John N. Tsitsiklis]], [[Mathematician#BVanRoy|Benjamin Van Roy]] ('''1997'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=bWTPrLEAAAAJ&citation_for_view=bWTPrLEAAAAJ:2osOgNQ5qMEC An Analysis of Temporal Difference Learning with Function Approximation]''. [[IEEE#TAC|IEEE Transactions on Automatic Control]], Vol. 42, No. 5
* [[Don Beal]], [[Martin C. Smith]] ('''1997'''). ''Learning Piece Values Using Temporal Differences''. [[ICGA Journal#20_3|ICCA Journal, Vol. 20, No. 3]]
'''1998'''
* [[Don Beal]], [[Martin C. Smith]] ('''1998'''). ''[http://www.springerlink.com/content/l9f4ngc2tqgnac9e/ First Results from Using Temporal Difference Learning in Shogi]''. [[CG 1998]]
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search''. [https://www.chatbots.org/journal/australian_journal_of_intelligent_information_processing_systems/ Australian Journal of Intelligent Information Processing Systems], Vol. 5, No. 1, [http://arxiv.org/abs/cs/9901001 arXiv:cs/9901001]
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Experiments in Parameter Learning Using Temporal Differences''. [[ICGA Journal#21_2|ICCA Journal, Vol. 21, No. 2]], [http://cs.anu.edu.au/%7ELex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf pdf]
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''KnightCap: A Chess Program that Learns by Combining TD(λ) with Game-Tree Search''. Proceedings of the 15th International Conference on Machine Learning, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8263&rep=rep1&type=pdf pdf] via [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.8263 citeseerX]
* [[Richard Sutton]], [[Andrew Barto]] ('''1998'''). ''[http://www.incompleteideas.net/sutton/book/the-book.html Reinforcement Learning: An Introduction]''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press], [https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node60.html 6. Temporal-Difference Learning]
* [[Justin A. Boyan]] ('''1998'''). ''Least-Squares Temporal Difference Learning''. [[Carnegie Mellon University]], CMU-CS-98-152, [http://www.research.rutgers.edu/~lihong/project/ahlp/boyan99least.pdf pdf]
'''1999'''
* [[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''[http://portal.acm.org/citation.cfm?id=1624299 Temporal Coherence and Prediction Decay in TD Learning]''. [[Conferences#IJCAI1999|IJCAI 1999]], [http://ijcai.org/Past%20Proceedings/IJCAI-99-VOL-1/PDF/081.pdf pdf]
* [[Don Beal]], [[Martin C. Smith]] ('''1999'''). ''Learning Piece-Square Values using Temporal Differences.'' [[ICGA Journal#22_4|ICCA Journal, Vol. 22, No. 4]]
==2000 ...==
* [[Sebastian Thrun]], [[Michael L. Littman]] ('''2000'''). ''A Review of Reinforcement Learning''. [http://www.informatik.uni-trier.de/~ley/db/journals/aim/aim21.html#ThrunL00 AI Magazine, Vol. 21], No. 1, [http://www.aistudy.com/paper/aaai_journal/AIMag21-01-001.pdf pdf]
* [[Robert Levinson]], [[Ryan Weber]] ('''2000'''). ''[http://link.springer.com/chapter/10.1007/3-540-45579-5_9 Chess Neighborhoods, Function Combination, and Reinforcement Learning]''. [[CG 2000]], [https://users.soe.ucsc.edu/~levinson/Papers/CNFCRL.pdf pdf] » [[Morph]]
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''2000'''). ''Learning to Play Chess Using Temporal Differences''. [http://www.dblp.org/db/journals/ml/ml40.html#BaxterTW00 Machine Learning, Vol 40, No. 3], [http://www.cs.princeton.edu/courses/archive/fall06/cos402/papers/chess-RL.pdf pdf]
* [[Johannes Fürnkranz]] ('''2000'''). ''Machine Learning in Games: A Survey''. [https://en.wikipedia.org/wiki/Austrian_Research_Institute_for_Artificial_Intelligence Austrian Research Institute for Artificial Intelligence], OEFAI-TR-2000-3, [http://www.ofai.at/cgi-bin/get-tr?download=1&paper=oefai-tr-2000-31.pdf pdf]
'''2001'''
* [[Jonathan Schaeffer]], [[Markian Hlynka]], [[Vili Jussila]] ('''2001'''). ''Temporal Difference Learning Applied to a High-Performance Game-Playing Program''. [http://www.informatik.uni-trier.de/~ley/db/conf/ijcai/ijcai2001.html#SchaefferHJ01 IJCAI 2001]
* [[Don Beal]], [[Martin C. Smith]] ('''2001'''). ''[http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V1G-41MJ1SV-7&_user=10&_coverDate=02%2F06%2F2001&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1436661548&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=d855cbad10953476dbb92258347c8e94 Temporal difference learning applied to game playing and the results of application to Shogi]''. [https://en.wikipedia.org/wiki/Theoretical_Computer_Science_%28journal%29 Theoretical Computer Science], Vol. 252, Nos. 1-2
* [[Nicol N. Schraudolph]], [[Peter Dayan]], [[Terrence J. Sejnowski]] ('''2001'''). ''[http://nic.schraudolph.org/bib2html/b2hd-SchDaySej01.html Learning to Evaluate Go Positions via Temporal Difference Methods]''. in [[Norio Baba]], [[Lakhmi C. Jain]] (eds.) ('''2001'''). ''[http://jasss.soc.surrey.ac.uk/7/1/reviews/takama.html Computational Intelligence in Games, Studies in Fuzziness and Soft Computing]''. [http://www.springer.com/economics?SGWID=1-165-6-73481-0 Physica-Verlag]
'''2002'''
* [[Ari Shapiro]], [[Gil Fuchs]], [[Robert Levinson]] ('''2002'''). ''[http://www.arishapiro.com/researchportfolio/Learning%20Game%20Strategy/index.htm Learning a Game Strategy Using Pattern-Weights and Self-play]''. [[CG 2002]], [http://www.arishapiro.com//ShapiroA_CG2002.pdf pdf]
* [[Mark Winands]], [[Levente Kocsis]], [[Jos Uiterwijk]], [[Jaap van den Herik]] ('''2002'''). ''Temporal difference learning and the Neural MoveMap heuristic in the game of Lines of Action''. GAME-ON 2002 » [[Neural MoveMap Heuristic]]
* [[James Swafford]] ('''2002'''). ''Optimizing Parameter Learning using Temporal Differences''. [http://www.aaai.org/Conferences/AAAI/aaai02.php AAAI-02], Student Abstracts, [https://www.aaai.org/Papers/AAAI/2002/AAAI02-150.pdf pdf]
'''2003'''
* [[Henk Mannen]] ('''2003'''). ''Learning to play chess using reinforcement learning with database games''. Master’s thesis, [http://students.uu.nl/en/hum/cognitive-artificial-intelligence Cognitive Artificial Intelligence], [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University]
'''2004'''
* [[Henk Mannen]], [[Marco Wiering]] ('''2004'''). ''[http://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&cstart=20&pagesize=80&citation_for_view=xVas0I8AAAAJ:7PzlFSSx8tAC Learning to play chess using TD(λ)-learning with database games]''. [http://students.uu.nl/en/hum/cognitive-artificial-intelligence Cognitive Artificial Intelligence], [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], Benelearn’04
* [[Marco Block-Berlitz|Marco Block]] ('''2004'''). ''Verwendung von Temporale-Differenz-Methoden im Schachmotor FUSc#''. Diplomarbeit, Betreuer: [[Raúl Rojas]], [[Free University of Berlin]], [http://page.mi.fu-berlin.de/block/Skripte/diplomarbeit.pdf pdf] (German)
* [[Jacek Mańdziuk]], [[Daniel Osman]] ('''2004'''). ''Temporal Difference Approach to Playing Give-Away Checkers''. [http://www.informatik.uni-trier.de/~ley/db/conf/icaisc/icaisc2004.html#MandziukO04 ICAISC 2004], [http://www.mini.pw.edu.pl/~mandziuk/PRACE/ICAISC04-3.pdf pdf]
==2005 ...==
* [[Marco Wiering]], [http://dblp.uni-trier.de/pers/hd/p/Patist:Jan_Peter Jan Peter Patist], [[Henk Mannen]] ('''2005'''). ''Learning to Play Board Games using Temporal Difference Methods''. Technical Report, [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], UU-CS-2005-048, [http://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/learning_games_TR.pdf pdf]
'''2006'''
* [[Simon Lucas]], [[Thomas Philip Runarsson]] ('''2006'''). ''[http://scholar.google.is/citations?view_op=view_citation&hl=en&user=4eWdc_sAAAAJ&citation_for_view=4eWdc_sAAAAJ:qjMakFHDy7sC Temporal Difference Learning versus Co-Evolution for Acquiring Othello Position Evaluation]''. [[IEEE#CIG|IEEE Symposium on Computational Intelligence and Games]]
'''2007'''
* [[Edward P. Manning]] ('''2007'''). ''[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4219046 Temporal Difference Learning of an Othello Evaluation Function for a Small Neural Network with Shared Weights]''. [[IEEE#CIG|IEEE Symposium on Computational Intelligence and AI in Games]]
* [[Daniel Osman]] ('''2007'''). ''Temporal Difference Methods for Two-player Board Games''. Ph.D. thesis, Faculty of Mathematics and Information Science, [https://en.wikipedia.org/wiki/Warsaw_University_of_Technology Warsaw University of Technology]
* [[Yasuhiro Osaki]], [[Kazutomo Shibahara]], [[Yasuhiro Tajima]], [[Yoshiyuki Kotani]] ('''2007'''). ''Reinforcement Learning of Evaluation Functions Using Temporal Difference-Monte Carlo learning method''. [[Conferences#GPW|12th Game Programming Workshop]]
'''2008'''
* [[Yasuhiro Osaki]], [[Kazutomo Shibahara]], [[Yasuhiro Tajima]], [[Yoshiyuki Kotani]] ('''2008'''). ''An Othello Evaluation Function Based on Temporal Difference Learning using Probability of Winning''. [http://www.csse.uwa.edu.au/cig08/Proceedings/toc.html CIG'08], [http://www.csse.uwa.edu.au/cig08/Proceedings/papers/8010.pdf pdf]
* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''. [http://www.sztaki.hu/%7Eszcsaba/papers/gtdnips08.pdf pdf] (draft)
* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning of Piece Values for Chess Variants.'' Technical Report TUD–KE–2008-07, Knowledge Engineering Group, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf pdf]
* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning the Piece Values for three Chess Variants''. [[ICGA Journal#31_4|ICGA Journal, Vol. 31, No. 4]]
* [[Albrecht Fiebiger]] ('''2008'''). ''Einsatz von allgemeinen Evaluierungsheuristiken in Verbindung mit der Reinforcement-Learning-Strategie in der Schachprogrammierung''. [https://de.wikipedia.org/wiki/Besondere_Lernleistung Besondere Lernleistung] im [https://de.wikipedia.org/wiki/Fachgebiet Fachbereich] [https://de.wikipedia.org/wiki/Informatik Informatik], [https://en.wikipedia.org/wiki/Federal_School_of_Saxony%E2%80%93Saint_Afra Sächsisches Landesgymnasium Sankt Afra], Internal advisor: Ralf Böttcher, External advisors: [[Stefan Meyer-Kahlen]], [[Marco Block-Berlitz|Marco Block]], [http://page.mi.fu-berlin.de/block/abschlussarbeiten/Fiebiger_BeLL.pdf pdf] (German)
'''2009'''
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation''. Advances in Neural Information Processing Systems 22 (NIPS 2009), [https://en.wikipedia.org/wiki/MIT_Press MIT Press], [http://books.nips.cc/papers/files/nips22/NIPS2009_1121.pdf pdf]
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]] ('''2009'''). ''Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation''. Proceedings of the 26th International Conference on Machine Learning (ICML-09), [http://www.sztaki.hu/~szcsaba/papers/GTD-ICML09.pdf pdf]
* [[Joel Veness]], [[David Silver]], [[William Uther]], [[Alan Blair]] ('''2009'''). ''[http://papers.nips.cc/paper/3722-bootstrapping-from-game-tree-search Bootstrapping from Game Tree Search]''. [http://jveness.info/publications/nips2009%20-%20bootstrapping%20from%20game%20tree%20search.pdf pdf]
* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2009'''). ''Coevolutionary Temporal Difference Learning for Othello''. [[IEEE#CIG|IEEE Symposium on Computational Intelligence and Games]], [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert09coevolutionary.pdf pdf]
* [http://www.cs.cmu.edu/~zkolter/ J. Zico Kolter], [[Andrew Ng]] ('''2009'''). ''Regularization and Feature Selection in Least-Squares Temporal Difference Learning''. [http://www.machinelearning.org/archive/icml2009/ ICML 2009], [http://www.cs.cmu.edu/~zkolter/pubs/kolter-icml09b-full.pdf pdf]
==2010 ...==
* [[Marco Wiering]] ('''2010'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&cstart=20&citation_for_view=xVas0I8AAAAJ:_kc_bZDykSQC Self-play and using an expert to learn to play backgammon with temporal difference learning]''. [http://www.scirp.org/journal/jilsa/ Journal of Intelligent Learning Systems and Applications], Vol. 2, No. 2
* [[Hamid Reza Maei]], [[Richard Sutton]] ('''2010'''). ''[http://www.incompleteideas.net/sutton/publications.html#GQ GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces]''. In Proceedings of the Third Conference on Artificial General Intelligence
'''2011'''
* [[Hamid Reza Maei]] ('''2011'''). ''Gradient Temporal-Difference Learning Algorithms''. Ph.D. thesis, [[University of Alberta]], advisor [[Richard Sutton]], [http://webdocs.cs.ualberta.ca/~sutton/papers/maei-thesis-2011.pdf pdf]
* [[Joel Veness]] ('''2011'''). ''Approximate Universal Artificial Intelligence and Self-Play Learning for Games''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_New_South_Wales University of New South Wales], supervisors: [[Kee Siong Ng]], [[Marcus Hutter]], [[Alan Blair]], [[William Uther]], [[John Lloyd]]; [http://jveness.info/publications/veness_phd_thesis_final.pdf pdf]
* [[I-Chen Wu]], [[Hsin-Ti Tsai]], [[Hung-Hsuan Lin]], [[Yi-Shan Lin]], [[Chieh-Min Chang]], [[Ping-Hung Lin]] ('''2011'''). ''[https://www.conftool.net/acg13/index.php?page=browseSessions&form_session=5 Temporal Difference Learning for Connect6]''. [[Advances in Computer Games 13]]
* [[Nikolaos Papahristou]], [[Ioannis Refanidis]] ('''2011'''). ''[https://www.conftool.net/acg13/index.php?page=browseSessions&form_session=5 Improving Temporal Difference Performance in Backgammon Variants]''. [[Advances in Computer Games 13]], [http://ai.uom.gr/nikpapa/publications/Improving%20Temporal%20Difference%20Learning%20in%20Backgammon%20Variants_ACG13.pdf pdf]
* [[Krzysztof Krawiec]], [[Wojciech Jaśkowski]], [[Marcin Szubert]] ('''2011'''). ''[http://www.degruyter.com/view/j/amcs.2011.21.issue-4/v10006-011-0057-3/v10006-011-0057-3.xml Evolving small-board Go players using Coevolutionary Temporal Difference Learning with Archives]''. [http://www.degruyter.com/view/j/amcs Applied Mathematics and Computer Science], Vol. 21, No. 4
* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2011'''). ''Learning Board Evaluation Function for Othello by Hybridizing Coevolution with Temporal Difference Learning''. [http://control.ibspan.waw.pl:3000/mainpage Control and Cybernetics], Vol. 40, No. 3, [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert2011learning.pdf pdf]
'''2012'''
* [[István Szita]] ('''2012'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-642-27645-3_17 Reinforcement Learning in Games]''. in [[Marco Wiering]], [http://martijnvanotterlo.nl/ Martijn Van Otterlo] (eds.). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&citation_for_view=xVas0I8AAAAJ:abG-DnoFyZgC Reinforcement learning: State-of-the-art]''. [http://link.springer.com/book/10.1007/978-3-642-27645-3 Adaptation, Learning, and Optimization, Vol. 12], [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
'''2013'''
* [[David Silver]], [[Richard Sutton]], [[Martin Müller|Martin Mueller]] ('''2013'''). ''Temporal-Difference Search in Computer Go''. Proceedings of the [http://icaps13.icaps-conference.org/technical-program/workshop-program/planning-and-learning/ ICAPS-13 Workshop on Planning and Learning], [http://webdocs.cs.ualberta.ca/~sutton/papers/SSM-ICAPS-13.pdf pdf]
* [[Florian Kunz]] ('''2013'''). ''An Introduction to Temporal Difference Learning''. Seminar on Autonomous Learning Systems, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ausy.informatik.tu-darmstadt.de/uploads/Teaching/AutonomousLearningSystems/Kunz_ALS_2013.pdf pdf]
'''2014'''
* [[I-Chen Wu]], [[Kun-Hao Yeh]], [[Chao-Chin Liang]], [[Chia-Chuan Chang]], [[Han Chiang]] ('''2014'''). ''Multi-Stage Temporal Difference Learning for 2048''. [[TAAI 2014]]
* [[Wojciech Jaśkowski]], [[Marcin Szubert]], [[Paweł Liskowski]] ('''2014'''). ''Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello''. [http://www.evostar.org/2014/ EvoApplications 2014], [http://www.springer.com/computer/theoretical+computer+science/book/978-3-662-45522-7 Springer, volume 8602]
==2015 ...==
* [[James L. McClelland]] ('''2015'''). ''[https://web.stanford.edu/group/pdplab/pdphandbook/handbook3.html#handbookch10.html Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises]''. Second Edition, [https://web.stanford.edu/group/pdplab/pdphandbook/handbookli1.html Contents], [https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html Temporal-Difference Learning]
* [[Matthew Lai]] ('''2015'''). ''Giraffe: Using Deep Reinforcement Learning to Play Chess''. M.Sc. thesis, [https://en.wikipedia.org/wiki/Imperial_College_London Imperial College London], [http://arxiv.org/abs/1509.01549v1 arXiv:1509.01549v1] » [[Giraffe]]
* [[Kazuto Oka]], [[Kiminori Matsuzaki]] ('''2016'''). ''Systematic Selection of N-tuple Networks for 2048''. [[CG 2016]]
* [[Huizhen Yu]], [[A. Rupam Mahmood]], [[Richard Sutton]] ('''2017'''). ''On Generalized Bellman Equations and Temporal-Difference Learning''. Canadian Conference on AI 2017, [https://arxiv.org/abs/1704.04463 arXiv:1704.04463]

=Forum Posts=
==1995 ...==
* [https://www.stmintz.com/ccc/index.php?id=28584 Parameter Tuning] by [[Jonathan Baxter]], [[CCC]], October 01, 1998 » [[KnightCap]]
: [https://www.stmintz.com/ccc/index.php?id=28819 Re: Parameter Tuning] by [[Don Beal]], [[CCC]], October 02, 1998
==2000 ...==
* [https://www.stmintz.com/ccc/index.php?id=147377 any good experiences with genetic algos or temporal difference learning?] by [[Rafael B. Andrist]], [[CCC]], January 01, 2001
* [https://www.stmintz.com/ccc/index.php?id=148342 Temporal Difference] by [[Bas Hamstra]], [[CCC]], January 05, 2001
* [https://www.stmintz.com/ccc/index.php?id=149645 Tao update] by [[Bas Hamstra]], [[CCC]], January 12, 2001 » [[Tao]]
* [https://www.stmintz.com/ccc/index.php?id=218650 Re: Parameter Learning Using Temporal Differences !] by [[Aaron Tay]], [[CCC]], March 19, 2002
* [https://www.stmintz.com/ccc/index.php?id=243354 Hello from Edmonton (and on Temporal Differences)] by [[James Swafford]], [[CCC]], July 30, 2002
* [https://www.stmintz.com/ccc/index.php?id=394403 Temporal Differences] by [[Stuart Cracraft]], [[CCC]], November 03, 2004
: [https://www.stmintz.com/ccc/index.php?id=394440 Re: Temporal Differences] by [[Guy Haworth]], [[CCC]], November 04, 2004 <ref>[[Guy Haworth]], [[Meel Velliste]] ('''1998'''). ''[http://centaur.reading.ac.uk/4569/ Chess Endgames and Neural Networks]''. [[ICGA Journal#21_4|ICCA Journal, Vol. 21, No. 4]]</ref>
* [https://www.stmintz.com/ccc/index.php?id=401974 Temporal Differences] by [[Peter Fendrich]], [[CCC]], December 21, 2004
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=4467&p=23234 Chess program improvement project (copy at TalkChess/ICD)] by [[Stuart Cracraft]], [[Computer Chess Forums|Winboard Forum]], March 07, 2006 » [[Win at Chess]]
==2010 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=37062 Positional learning] by [[Ben-Hur Carlos Vieira Langoni Junior]], [[CCC]], December 13, 2010
: [http://www.talkchess.com/forum/viewtopic.php?t=37062&start=2 Re: Positional learning] by [[Don Dailey]], [[CCC]], December 13, 2010
* [http://www.talkchess.com/forum/viewtopic.php?t=43323 Pawn Advantage, Win Percentage, and Elo] by [[Adam Hair]], [[CCC]], April 15, 2012
: [http://www.talkchess.com/forum/viewtopic.php?t=43323&start=3 Re: Pawn Advantage, Win Percentage, and Elo] by [[Don Dailey]], [[CCC]], April 15, 2012
==2015 ...==
* [http://talkchess.com/forum/viewtopic.php?t=56913 *First release* Giraffe, a new engine based on deep learning] by [[Matthew Lai]], [[CCC]], July 08, 2015 » [[Deep Learning]], [[Giraffe]]
* [http://www.talkchess.com/forum/viewtopic.php?t=57860 td-leaf] by [[Alexandru Mosoi]], [[CCC]], October 06, 2015 » [[Automated Tuning]]
* [http://www.talkchess.com/forum/viewtopic.php?t=62053 TD-leaf(lambda)] by [[Robert Pope]], [[CCC]], November 09, 2016

=External Links=
* [https://en.wikipedia.org/wiki/Temporal_difference_learning Temporal difference learning from Wikipedia]
: [https://en.wiktionary.org/wiki/temporal temporal - Wiktionary]
* [https://en.wikipedia.org/wiki/Reinforcement_learning#Temporal_difference_methods Reinforcement learning - Temporal difference methods from Wikipedia]
* [http://www.scholarpedia.org/article/Temporal_difference_learning Temporal difference learning - Scholarpedia]
* [https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node60.html 6. Temporal-Difference Learning] in [[Richard Sutton]], [[Andrew Barto]] ('''1998'''). ''[http://www.incompleteideas.net/sutton/book/the-book.html Reinforcement Learning: An Introduction]''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press] eBook
* [https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html Temporal-Difference Learning] (Chapter 9) in [[James L. McClelland]] ('''2015'''). ''[https://web.stanford.edu/group/pdplab/pdphandbook/handbook3.html#handbookch10.html Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises]''. Second Edition, [https://web.stanford.edu/group/pdplab/pdphandbook/handbookli1.html Contents]
* [https://www.tu-chemnitz.de/informatik/KI/scripts/ws0910/ml09_6.pdf Temporal-Difference learning], Slides as pdf from [[Chemnitz University of Technology]]
* [http://www.bcp.psych.ualberta.ca/~mike/Pearl_Street/Dictionary/contents/C/creditassign.html University of Alberta Dictionary of Cognitive Science: Credit Assignment Problem]
* [[Videos#ShawnLane|Shawn Lane]], [[Videos#JonasHellborg|Jonas Hellborg]], [[Videos#JeffSipe|Jeff Sipe]] - [https://en.wikipedia.org/wiki/Temporal_Analogues_of_Paradise Temporal Analogues of Paradise - 2nd Movement], [https://en.wikipedia.org/wiki/Atlanta Atlanta] Drums & Percussion, August 20, 1996, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: {{#evu:https://www.youtube.com/watch?v=oJ_MfyrQT8I|alignment=left|valignment=top}}

=References=
<references />

'''[[Learning|Up one level]]'''
