Changes

Csaba Szepesvári

448 bytes added, 13:53, 12 April 2021
* [http://www.szit.bme.hu/~gya/ András György], [[Levente Kocsis]], [http://dblp.uni-trier.de/pers/hd/s/Szab=oacute=:Ivett Ivett Szabó], [[Csaba Szepesvári]] ('''2007'''). ''Continuous Time Associative Bandit Problems'' IJCAI-07, 830-835. [http://www.sztaki.hu/~szcsaba/papers/cbandit-ijcai07.pdf pdf]
* [[Jean-Yves Audibert]], [[Rémi Munos]], [[Csaba Szepesvári]] ('''2007'''). ''Tuning Bandit Algorithms in Stochastic Environments''. [http://certis.enpc.fr/~audibert/ucb_alt.pdf pdf]
* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2008.html#SuttonSM08 NIPS 2008], [https://proceedings.neurips.cc/paper/2008/file/e0c641195b27425bb056ac56f8953d24-Paper.pdf pdf]
* [[Rémi Munos]], [[Csaba Szepesvári]] ('''2008'''). ''Finite time bounds for sampling based fitted value iteration''. Journal of Machine Learning Research, 9:815-857, 2008. [http://hal.inria.fr/docs/00/26/09/34/PDF/savi_1.5.pdf pdf], [http://www.ualberta.ca/~szepesva/papers/munos08a.pdf pdf]
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation''. Advances in Neural Information Processing Systems 22, Vancouver, BC, December 2009. [https://dblp.uni-trier.de/db/conf/nips/nips2009.html#MaeiSBPSS09 NIPS 2009], [https://papers.nips.cc/paper/2009/file/3a15c7d0bbe60300a39f76f8a5ba6896-Paper.pdf pdf]
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]] ('''2009'''). ''[https://dl.acm.org/doi/10.1145/1553374.1553501 Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation]''. Proceedings of the 26th International Conference on Machine Learning, [https://dblp.uni-trier.de/db/conf/icml/icml2009.html#SuttonMPBSSW09 ICML 2009]
* [[Jean-Yves Audibert]], [[Rémi Munos]], [[Csaba Szepesvári]] ('''2009'''). ''Exploration-exploitation trade-off using variance estimates in multi-armed bandits''. [https://en.wikipedia.org/wiki/Theoretical_Computer_Science_(journal) Theoretical Computer Science], Vol. 410:1876-1902, 2009, [http://www.ualberta.ca/~szepesva/papers/ucbtuned-journal.pdf pdf]
==2010 ...==
* [[Csaba Szepesvári]] ('''2010'''). ''[https://sites.ualberta.ca/~szepesva/RLBook.html Algorithms for Reinforcement Learning]''. Morgan & Claypool
* [[István Szita]], [[Csaba Szepesvári]] ('''2010'''). ''Model-based reinforcement learning with nearly tight exploration complexity bounds''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/icml2010.html#SzitaS10 ICML 2010]
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Richard Sutton]] ('''2010'''). ''Toward Off-Policy Learning Control with Function Approximation''. [https://dblp.uni-trier.de/db/conf/icml/icml2010.html#MaeiSBS10 ICML 2010], [https://icml.cc/Conferences/2010/papers/627.pdf pdf]
* [[István Szita]], [[Csaba Szepesvári]] ('''2011'''). ''Agnostic KWIK learning and efficient approximate reinforcement learning''. [http://www.informatik.uni-trier.de/~ley/db/journals/jmlr/jmlrp19.html#SzitaS11 Journal of Machine Learning Research - Proceedings Track 19]
* [[Sylvain Gelly]], [[Marc Schoenauer]], [[Michèle Sebag]], [[Olivier Teytaud]], [[Levente Kocsis]], [[David Silver]], [[Csaba Szepesvári]] ('''2012'''). ''[http://dl.acm.org/citation.cfm?id=2093548.2093574 The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions]''. [[ACM#Communications|Communications of the ACM]], Vol. 55, No. 3, [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf pdf preprint]
