Changes

Temporal Difference Learning

322 bytes added, 14:06, 12 April 2021

no edit summary

'''2008'''

* [[Yasuhiro Osaki]], [[Kazutomo Shibahara]], [[Yasuhiro Tajima]], [[Yoshiyuki Kotani]] ('''2008'''). ''An Othello Evaluation Function Based on Temporal Difference Learning using Probability of Winning''. [http://www.csse.uwa.edu.au/cig08/Proceedings/toc.html CIG'08], [http://www.csse.uwa.edu.au/cig08/Proceedings/papers/8010.pdf pdf]

* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2008.html#SuttonSM08 NIPS 2008], [https://proceedings.neurips.cc/paper/2008/file/e0c641195b27425bb056ac56f8953d24-Paper.pdf pdf]

* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning of Piece Values for Chess Variants.'' Technical Report TUD–KE–2008-07, Knowledge Engineering Group, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf pdf]

* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning the Piece Values for three Chess Variants''. [[ICGA Journal#31_4|ICGA Journal, Vol. 31, No. 4]]

* [[Albrecht Fiebiger]] ('''2008'''). ''Einsatz von allgemeinen Evaluierungsheuristiken in Verbindung mit der Reinforcement-Learning-Strategie in der Schachprogrammierung''. [https://de.wikipedia.org/wiki/Besondere_Lernleistung Besondere Lernleistung] im [https://de.wikipedia.org/wiki/Fachgebiet Fachbereich] [https://de.wikipedia.org/wiki/Informatik Informatik], [https://en.wikipedia.org/wiki/Federal_School_of_Saxony%E2%80%93Saint_Afra Sächsisches Landesgymnasium Sankt Afra], Internal advisor: Ralf Böttcher, External advisors: [[Stefan Meyer-Kahlen]], [[Marco Block-Berlitz|Marco Block]], [http://page.mi.fu-berlin.de/block/abschlussarbeiten/Fiebiger_BeLL.pdf pdf] (German)

'''2009'''

* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2009.html#MaeiSBPSS09 NIPS 2009], [https://papers.nips.cc/paper/2009/file/3a15c7d0bbe60300a39f76f8a5ba6896-Paper.pdf pdf]

* [[Joel Veness]], [[David Silver]], [[William Uther]], [[Alan Blair]] ('''2009'''). ''[http://papers.nips.cc/paper/3722-bootstrapping-from-game-tree-search Bootstrapping from Game Tree Search]''. NIPS 2009, [http://jveness.info/publications/nips2009%20-%20bootstrapping%20from%20game%20tree%20search.pdf pdf]

* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]] ('''2009'''). ''[https://dl.acm.org/doi/10.1145/1553374.1553501 Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation]''. [https://dblp.uni-trier.de/db/conf/icml/icml2009.html#SuttonMPBSSW09 ICML 2009]

* [[Simon Lucas]] ('''2009'''). ''[https://ieeexplore.ieee.org/document/5286496 Temporal difference learning with interpolated table value functions]''. [[IEEE#CIG|CIG 2009]]

* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2009'''). ''Coevolutionary Temporal Difference Learning for Othello''. [[IEEE#CIG|CIG 2009]], [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert09coevolutionary.pdf pdf]

==2010 ...==

* [[Marco Wiering]] ('''2010'''). ''Self-play and using an expert to learn to play backgammon with temporal difference learning''. [http://www.scirp.org/journal/jilsa/ Journal of Intelligent Learning Systems and Applications], Vol. 2, No. 2

* [[Hamid Reza Maei]], [[Richard Sutton]] ('''2010'''). ''[https://www.researchgate.net/publication/215990384_GQlambda_A_general_gradient_algorithm_for_temporal-difference_prediction_learning_with_eligibility_traces GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces]''. [https://agi-conf.org/2010/ AGI 2010]

'''2011'''

* [[Hamid Reza Maei]] ('''2011'''). ''[https://era.library.ualberta.ca/items/fd55edcb-ce47-4f84-84e2-be281d27b16a Gradient Temporal-Difference Learning Algorithms]''. Ph.D. thesis, [[University of Alberta]], advisor [[Richard Sutton]]

* [[Joel Veness]] ('''2011'''). ''Approximate Universal Artificial Intelligence and Self-Play Learning for Games''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_New_South_Wales University of New South Wales], supervisors: [[Kee Siong Ng]], [[Marcus Hutter]], [[Alan Blair]], [[William Uther]], [[John Lloyd]]; [http://jveness.info/publications/veness_phd_thesis_final.pdf pdf]

* [[I-Chen Wu]], [[Hsin-Ti Tsai]], [[Hung-Hsuan Lin]], [[Yi-Shan Lin]], [[Chieh-Min Chang]], [[Ping-Hung Lin]] ('''2011'''). ''[https://www.conftool.net/acg13/index.php?page=browseSessions&form_session=5 Temporal Difference Learning for Connect6]''. [[Advances in Computer Games 13]]

GerdIsenberg
