Changes

Csaba Szepesvári

448 bytes added, 13:53, 12 April 2021

* [http://www.szit.bme.hu/~gya/ András György], [[Levente Kocsis]], [http://dblp.uni-trier.de/pers/hd/s/Szab=oacute=:Ivett Ivett Szabó], [[Csaba Szepesvári]] ('''2007'''). ''Continuous Time Associative Bandit Problems'' IJCAI-07, 830-835. [http://www.sztaki.hu/~szcsaba/papers/cbandit-ijcai07.pdf pdf]

* [[Jean-Yves Audibert]], [[Rémi Munos]], [[Csaba Szepesvári]] ('''2007'''). ''Tuning Bandit Algorithms in Stochastic Environments''. [http://certis.enpc.fr/~audibert/ucb_alt.pdf pdf]

* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2008.html#SuttonSM08 NIPS 2008], [https://proceedings.neurips.cc/paper/2008/file/e0c641195b27425bb056ac56f8953d24-Paper.pdf pdf]

* [[Rémi Munos]], [[Csaba Szepesvári]] ('''2008'''). ''Finite time bounds for sampling based fitted value iteration''. Journal of Machine Learning Research, 9:815-857, 2008. [http://hal.inria.fr/docs/00/26/09/34/PDF/savi_1.5.pdf pdf], [http://www.ualberta.ca/~szepesva/papers/munos08a.pdf pdf]

* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2009.html#MaeiSBPSS09 NIPS 2009], [https://papers.nips.cc/paper/2009/file/3a15c7d0bbe60300a39f76f8a5ba6896-Paper.pdf pdf]

* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]] ('''2009'''). ''[https://dl.acm.org/doi/10.1145/1553374.1553501 Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation]''. [https://dblp.uni-trier.de/db/conf/icml/icml2009.html#SuttonMPBSSW09 ICML 2009]

* [[Jean-Yves Audibert]], [[Rémi Munos]], [[Csaba Szepesvári]] ('''2009'''). ''Exploration-exploitation trade-off using variance estimates in multi-armed bandits''. [https://en.wikipedia.org/wiki/Theoretical_Computer_Science_(journal) Theoretical Computer Science], Vol. 410, [http://www.ualberta.ca/~szepesva/papers/ucbtuned-journal.pdf pdf]

==2010 ...==

* [[Csaba Szepesvári]] ('''2010'''). ''[https://sites.ualberta.ca/~szepesva/RLBook.html Algorithms for Reinforcement Learning]''. Morgan & Claypool

* [[István Szita]], [[Csaba Szepesvári]] ('''2010'''). ''Model-based reinforcement learning with nearly tight exploration complexity bounds''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/icml2010.html#SzitaS10 ICML 2010]

* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Richard Sutton]] ('''2010'''). ''Toward Off-Policy Learning Control with Function Approximation''. [https://dblp.uni-trier.de/db/conf/icml/icml2010.html#MaeiSBS10 ICML 2010], [https://icml.cc/Conferences/2010/papers/627.pdf pdf]

* [[István Szita]], [[Csaba Szepesvári]] ('''2011'''). ''Agnostic KWIK learning and efficient approximate reinforcement learning''. [http://www.informatik.uni-trier.de/~ley/db/journals/jmlr/jmlrp19.html#SzitaS11 Journal of Machine Learning Research - Proceedings Track 19]

* [[Sylvain Gelly]], [[Marc Schoenauer]], [[Michèle Sebag]], [[Olivier Teytaud]], [[Levente Kocsis]], [[David Silver]], [[Csaba Szepesvári]] ('''2012'''). ''[http://dl.acm.org/citation.cfm?id=2093548.2093574 The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions]''. [[ACM#Communications|Communications of the ACM]], Vol. 55, No. 3, [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf pdf preprint]

GerdIsenberg
