Changes

Lex Weaver

61 bytes added, 22:41, 23 May 2019
no edit summary
* [[Lex Weaver]], [[Jonathan Baxter]] ('''2001'''). ''STD(λ): learning state differences with TD(λ)''. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.7737 CiteSeerX]
==2010 ...==
* [[Mathematician#PBartlett|Peter Bartlett]], [[Jonathan Baxter]], [[Lex Weaver]] ('''2011'''). ''Experiments with Infinite-Horizon, Policy-Gradient Estimation''. [https://arxiv.org/abs/1106.0666 arXiv:1106.0666]
* [[Lex Weaver]], [https://dblp.uni-trier.de/pers/hd/t/Tao:Nigel Nigel Tao] ('''2013'''). ''The Optimal Reward Baseline for Gradient-Based Reinforcement Learning''. [https://arxiv.org/abs/1301.2315 arXiv:1301.2315]
'''[[People|Up one level]]'''
[[Category:Chess Programmer|Weaver]]
