Csaba Szepesvári




Csaba Szepesvári is a Hungarian computer scientist whose research interests include applications of statistical techniques in AI and reinforcement learning. He worked at the Computer and Automation Research Institute of the Hungarian Academy of Sciences, and is a professor at the Department of Computing Science, University of Alberta, and principal investigator of the RLAI group, currently on leave at DeepMind.

=UCT=

In 2006, along with Levente Kocsis, Csaba Szepesvári introduced UCT (Upper Confidence bounds applied to Trees), a new algorithm that applies bandit ideas to guide Monte-Carlo planning. UCT accelerated the Monte-Carlo revolution in computer Go and other domains.
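
The bandit idea behind UCT can be illustrated with a UCB1-style selection rule, applied at a single tree node: pick the move with the highest average reward plus an exploration bonus that shrinks as the move is tried more often. The following Python sketch is only an illustration under stated assumptions, not the authors' implementation. The function name `ucb1_select`, its parameters, and the default exploration constant `c` are hypothetical and chosen for this example.

```python
import math


def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Return the index of the move with the highest UCB1-style score.

    counts[i] -- how often move i has been tried at this node
    totals[i] -- sum of the rewards observed after trying move i
    c         -- exploration constant (assumed default; tuned per domain)
    """
    # Try every move once before comparing scores, so no count is zero.
    for i, n in enumerate(counts):
        if n == 0:
            return i

    node_visits = sum(counts)

    def score(i):
        mean = totals[i] / counts[i]  # exploitation: average reward so far
        bonus = c * math.sqrt(math.log(node_visits) / counts[i])  # exploration
        return mean + bonus

    return max(range(len(counts)), key=score)


if __name__ == "__main__":
    # Move 0 looks good after 100 trials; move 1 was tried once and lost.
    # The exploration bonus still favours re-trying the rarely visited move 1.
    print(ucb1_select([100, 1], [60.0, 0.0]))
```

In a full tree search, a rule of this kind would be applied recursively from the root to a leaf, followed by a Monte-Carlo playout whose result updates the counts and totals along the visited path. Setting `c` to 0 reduces the rule to greedy selection on the average reward.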

=Selected Publications=

1994 ...

 * Csaba Szepesvári, László Balázs, András Lőrincz (1994). Topology learning solved by extended objects: a neural network model. pdf
 * Csaba Szepesvári (1998). Reinforcement Learning: Theory and Practice. in Proceedings of the 2nd Slovak Conference on Artificial Neural Networks, zipped ps

2005 ...

 * Levente Kocsis, Csaba Szepesvári, Mark Winands (2005). RSPSA: Enhanced Parameter Optimization in Games. Advances in Computer Games 11, pdf
 * Levente Kocsis, Csaba Szepesvári (2006). Universal Parameter Optimisation in Games Based on SPSA. Machine Learning, Special Issue on Machine Learning and Games, Vol. 63, No. 3
 * Levente Kocsis, Csaba Szepesvári (2006). Bandit based Monte-Carlo Planning. ECML-06, LNCS/LNAI 4212, pp. 282-293. introducing UCT, pdf
 * Levente Kocsis, Csaba Szepesvári, Jan Willemson (2006). Improved Monte-Carlo Search. pdf
 * András György, Levente Kocsis, Ivett Szabó, Csaba Szepesvári (2007). Continuous Time Associative Bandit Problems. IJCAI-07, pp. 830-835. pdf
 * Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári (2007). Tuning Bandit Algorithms in Stochastic Environments. pdf
 * Richard Sutton, Csaba Szepesvári, Hamid Reza Maei (2008). A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation. NIPS 2008, pdf
 * Rémi Munos, Csaba Szepesvári (2008). Finite time bounds for sampling based fitted value iteration. Journal of Machine Learning Research, 9:815-857, 2008. pdf, pdf
 * Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Doina Precup, David Silver, Richard Sutton (2009). Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. NIPS 2009, pdf
 * Richard Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora (2009). Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation. ICML 2009
 * Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári (2009). Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, Vol. 410, pdf

2010 ...

 * Csaba Szepesvári (2010). Algorithms for Reinforcement Learning. Morgan & Claypool
 * István Szita, Csaba Szepesvári (2010). Model-based reinforcement learning with nearly tight exploration complexity bounds. ICML 2010
 * Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Richard Sutton (2010). Toward Off-Policy Learning Control with Function Approximation. ICML 2010, pdf
 * István Szita, Csaba Szepesvári (2011). Agnostic KWIK learning and efficient approximate reinforcement learning. Journal of Machine Learning Research - Proceedings Track 19
 * Sylvain Gelly, Marc Schoenauer, Michèle Sebag, Olivier Teytaud, Levente Kocsis, David Silver, Csaba Szepesvári (2012). The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions. Communications of the ACM, Vol. 55, No. 3, pdf preprint

2015 ...

 * Tor Lattimore, Csaba Szepesvári (2017). The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits. AISTATS, pdf
 * Tor Lattimore, Csaba Szepesvári (2018). Cleaning up the neighborhood: A full classification for adversarial partial monitoring. arXiv:1805.09247
 * Tor Lattimore, Csaba Szepesvári (2019). Bandit Algorithms. Cambridge University Press (draft), pdf

=External Links=
 * Homepage of Csaba Szepesvári from University of Alberta
 * Bandit Algorithms
 * Csaba Szepesvari - Google Scholar Citations
 * Csaba, Szepesvári, PhD. Senior Research Scientist from Hungarian Academy of Sciences
 * Introduction to Reinforcement Learning, videolecture by Csaba Szepesvári, 2008
