Difference between revisions of "Csaba Szepesvári"

From Chessprogramming wiki
 
(One intermediate revision by the same user not shown)
Line 21: Line 21:
 
* [http://www.szit.bme.hu/~gya/ András György], [[Levente Kocsis]], [http://dblp.uni-trier.de/pers/hd/s/Szab=oacute=:Ivett Ivett Szabó], [[Csaba Szepesvári]] ('''2007'''). ''Continuous Time Associative Bandit Problems'' IJCAI-07, 830-835. [http://www.sztaki.hu/~szcsaba/papers/cbandit-ijcai07.pdf pdf]
 
 
* [[Jean-Yves Audibert]], [[Rémi Munos]], [[Csaba Szepesvári]] ('''2007'''). ''Tuning Bandit Algorithms in Stochastic Environments''. [http://certis.enpc.fr/~audibert/ucb_alt.pdf pdf]
 
* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation'', [http://www.sztaki.hu/~szcsaba/papers/gtdnips08.pdf pdf] (draft)
+
* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2008.html#SuttonSM08 NIPS 2008], [https://proceedings.neurips.cc/paper/2008/file/e0c641195b27425bb056ac56f8953d24-Paper.pdf pdf]
 
* [[Rémi Munos]], [[Csaba Szepesvári]] ('''2008'''). ''Finite time bounds for sampling based fitted value iteration''. Journal of Machine Learning Research, 9:815-857, 2008. [http://hal.inria.fr/docs/00/26/09/34/PDF/savi_1.5.pdf pdf], [http://www.ualberta.ca/~szepesva/papers/munos08a.pdf pdf]
 
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.'' Accepted  in Advances in Neural Information Processing Systems 22, Vancouver, BC. December 2009. MIT Press. [http://books.nips.cc/papers/files/nips22/NIPS2009_1121.pdf pdf]
+
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2009.html#MaeiSBPSS09 NIPS 2009], [https://papers.nips.cc/paper/2009/file/3a15c7d0bbe60300a39f76f8a5ba6896-Paper.pdf pdf]
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]]. ('''2009'''). ''Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation''. In Proceedings of the 26th International Conference on Machine Learning (ICML-09). [http://www.sztaki.hu/~szcsaba/papers/GTD-ICML09.pdf pdf]
+
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]]. ('''2009'''). ''[https://dl.acm.org/doi/10.1145/1553374.1553501 Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation]''. [https://dblp.uni-trier.de/db/conf/icml/icml2009.html#SuttonMPBSSW09 ICML 2009]
* [[Jean-Yves Audibert]], [[Rémi Munos]], [[Csaba Szepesvári]] ('''2009'''). ''Exploration-exploitation trade-off using variance estimates in multi-armed bandits''. Theoretical Computer Science, 410:1876-1902, 2009, [http://www.ualberta.ca/~szepesva/papers/ucbtuned-journal.pdf pdf]
+
* [[Jean-Yves Audibert]], [[Rémi Munos]], [[Csaba Szepesvári]] ('''2009'''). ''Exploration-exploitation trade-off using variance estimates in multi-armed bandits''. [https://en.wikipedia.org/wiki/Theoretical_Computer_Science_(journal) Theoretical Computer Science], Vol. 410, [http://www.ualberta.ca/~szepesva/papers/ucbtuned-journal.pdf pdf]
 
==2010 ...==
 
 
* [[Csaba Szepesvári]] ('''2010'''). ''[https://sites.ualberta.ca/~szepesva/RLBook.html Algorithms for Reinforcement Learning]''. Morgan & Claypool
 
 
* [[István Szita]], [[Csaba Szepesvári]] ('''2010'''). ''Model-based reinforcement learning with nearly tight exploration complexity bounds''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/icml2010.html#SzitaS10 ICML 2010]
 
 +
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Richard Sutton]] ('''2010'''). ''Toward Off-Policy Learning Control with Function Approximation''. [https://dblp.uni-trier.de/db/conf/icml/icml2010.html#MaeiSBS10 ICML 2010], [https://icml.cc/Conferences/2010/papers/627.pdf pdf]
 
* [[István Szita]], [[Csaba Szepesvári]] ('''2011'''). ''Agnostic KWIK learning and efficient approximate reinforcement learning''. [http://www.informatik.uni-trier.de/~ley/db/journals/jmlr/jmlrp19.html#SzitaS11 Journal of Machine Learning Research - Proceedings Track 19]
 
 
* [[Sylvain Gelly]], [[Marc Schoenauer]], [[Michèle Sebag]], [[Olivier Teytaud]], [[Levente Kocsis]], [[David Silver]], [[Csaba Szepesvári]] ('''2012'''). ''[http://dl.acm.org/citation.cfm?id=2093548.2093574 The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions]''. [[ACM#Communications|Communications of the ACM]], Vol. 55, No. 3, [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf pdf preprint]
 
 +
* [https://scholar.google.com/citations?user=iuCvTuIAAAAJ&hl=en Mahdi Milani Fard], [[Joelle Pineau]], [[Csaba Szepesvári]] ('''2012'''). ''PAC-Bayesian Policy Evaluation for Reinforcement Learning''. [https://arxiv.org/abs/1202.3717 arXiv:1202.3717]
 
==2015 ...==
 
 
* [[Tor Lattimore]], [[Csaba Szepesvári]] ('''2017'''). ''The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits''.  [https://www.aistats.org/aistats2017/ AISTATS], [https://sites.ualberta.ca/~szepesva/papers/linbandits_aistats17.pdf pdf]
 

Latest revision as of 22:16, 12 April 2021


Csaba Szepesvári [1]

Csaba Szepesvári,
a Hungarian computer scientist with research interests in applications of statistical techniques in AI, and reinforcement learning [2]. Csaba Szepesvári worked at the Computer and Automation Research Institute of the Hungarian Academy of Sciences, and is a professor at the Department of Computing Science, University of Alberta, and principal investigator of the RLAI [3] group, currently on leave at DeepMind.

UCT

In 2006, along with Levente Kocsis, Csaba Szepesvári introduced UCT (Upper Confidence bounds applied to Trees), a new algorithm that applies bandit ideas to guide Monte-Carlo planning [4]. UCT accelerated the Monte-Carlo revolution in computer Go [5] and other domains.
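
The bandit idea behind UCT can be illustrated with a UCB1-style child selection rule, applied at each node of the search tree. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `ucb1_select`, the list-based statistics, and the default exploration constant `c` are hypothetical choices made for illustration.

```python
import math


def ucb1_select(counts, value_sums, c=math.sqrt(2)):
    """Pick the index of the child to sample next with a UCB1-style rule.

    counts[i]     -- how often child i has been visited (hypothetical layout)
    value_sums[i] -- sum of simulation rewards observed through child i
    c             -- exploration constant (assumed default, tune per domain)
    """
    # Visit every child once before the confidence bound is meaningful.
    for i, n in enumerate(counts):
        if n == 0:
            return i

    total = sum(counts)

    def score(i):
        mean = value_sums[i] / counts[i]                    # exploitation term
        bonus = c * math.sqrt(math.log(total) / counts[i])  # exploration term
        return mean + bonus

    return max(range(len(counts)), key=score)
```

With equal visit counts the child with the higher mean reward is preferred, while a rarely visited child receives a large exploration bonus and is tried again even if its mean is low.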

Selected Publications

[6] [7]

1994 ...

2005 ...

2010 ...

2015 ...

External Links

References
