Csaba Szepesvári

8,272 bytes added, 19:52, 3 June 2018

[[FILE:CsabaSzepesvari.jpg|border|right|thumb|link=http://www.ualberta.ca/~szepesva/|Csaba Szepesvári <ref>[http://www.ualberta.ca/~szepesva/ Homepage of Csaba Szepesvári]</ref> ]]

'''Csaba Szepesvári''',<br/>
a Hungarian computer scientist with research interests in applications of statistical techniques in AI and in [[Reinforcement Learning]] <ref>[http://www.sztaki.hu/%7Eszcsaba/research/Interests.html Research Interests of Csaba Szepesvári]</ref>. Csaba Szepesvári worked at the ''Computer and Automation Research Institute'' of the [https://en.wikipedia.org/wiki/Hungarian_Academy_of_Sciences Hungarian Academy of Sciences], and is a professor at the [http://www.cs.ualberta.ca/ Department of Computing Science], [[University of Alberta]], and principal investigator of the RLAI <ref>[http://rlai.cs.ualberta.ca/RLAI/ualberta.html Reinforcement Learning and Artificial Intelligence (RLAI)]</ref> group, currently on leave at [[DeepMind]].

=UCT=
In 2006, along with [[Levente Kocsis]], Csaba Szepesvári introduced [[UCT]] (Upper Confidence bounds applied to Trees), a new algorithm that applies [https://en.wikipedia.org/wiki/Multi-armed_bandit bandit] ideas to guide [[Monte-Carlo Tree Search|Monte-Carlo planning]] <ref>[[Levente Kocsis]], [[Csaba Szepesvári]] ('''2006'''). ''[http://www.computer-go.info/resources/bandit.html Bandit based Monte-Carlo Planning]''</ref>. UCT accelerated the Monte-Carlo revolution in computer [[Go]] <ref>[[Sylvain Gelly]], [[Marc Schoenauer]], [[Michèle Sebag]], [[Olivier Teytaud]], [[Levente Kocsis]], [[David Silver]], [[Csaba Szepesvári]] ('''2012'''). ''[http://dl.acm.org/citation.cfm?id=2093548.2093574 The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions]''. [[ACM#Communications|Communications of the ACM]], Vol. 55, No. 3, [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf pdf preprint]</ref> and other domains.
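
The bandit idea behind UCT can be illustrated with a small sketch of a UCB1-style selection rule: each child of a node is scored by its average reward plus an exploration bonus that shrinks as the child is visited more often. This is only a minimal illustration under stated assumptions, not the authors' implementation. The function names, the <code>(total_reward, visits)</code> data layout, and the default exploration constant <code>c</code> are all hypothetical choices made for this example.

```python
import math


def ucb1_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style score: average reward plus an exploration bonus.

    The default value of c is an assumption for this sketch.
    """
    if visits == 0:
        return math.inf  # unvisited children are tried first
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest score.

    children: list of (total_reward, visits) pairs (hypothetical layout).
    The parent's visit count is taken as the sum of the child visits.
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(r, n, parent_visits, c) for r, n in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With <code>c = 0</code> the rule is purely greedy on the average reward. Larger values of <code>c</code> favour rarely visited children, which is the exploration side of the trade-off.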

=Selected Publications=
<ref>[https://sites.ualberta.ca/~szepesva/publications.html Publications of Csaba Szepesvári]</ref> <ref>[http://dblp.uni-trier.de/pers/hd/s/Szepesv=aacute=ri:Csaba dblp: Csaba Szepesvári]</ref>
==1994 ...==
* [[Csaba Szepesvári]], [http://dblp.uni-trier.de/pers/hd/b/Bal=aacute=zs:L=aacute=szl=oacute= László Balázs], [http://dblp.uni-trier.de/pers/hd/l/L=ouml=rincz:Andr=aacute=s András Lőrincz] ('''1994'''). ''Topology learning solved by extended objects: a neural network model''. [http://www.ualberta.ca/~szepesva/papers/toplearn94.ps.pdf pdf]
* [[Csaba Szepesvári]] ('''1998'''). ''Reinforcement Learning: Theory and Practice''. In Proceedings of the 2nd Slovak Conference on Artificial Neural Networks, [http://www.sztaki.hu/%7Eszcsaba/papers/scann98.ps.gz zipped ps]
==2005 ...==
* [[Levente Kocsis]], [[Csaba Szepesvári]], [[Mark Winands]] ('''2005'''). ''[http://link.springer.com/chapter/10.1007/11922155_4 RSPSA: Enhanced Parameter Optimization in Games]''. [[Advances in Computer Games 11]], [http://www.sztaki.hu/~szcsaba/papers/rspsa_acg.pdf pdf]
* [[Levente Kocsis]], [[Csaba Szepesvári]] ('''2006'''). ''[http://link.springer.com/article/10.1007/s10994-006-6888-8 Universal Parameter Optimisation in Games Based on SPSA]''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Special Issue on Machine Learning and Games, Vol. 63, No. 3
* [[Levente Kocsis]], [[Csaba Szepesvári]] ('''2006'''). ''[http://www.computer-go.info/resources/bandit.html Bandit based Monte-Carlo Planning]''. ECML-06, LNCS/LNAI 4212, pp. 282-293, introducing [[UCT]], [http://www.sztaki.hu/%7Eszcsaba/papers/ecml06.pdf pdf]
* [[Levente Kocsis]], [[Csaba Szepesvári]], [[Jan Willemson]] ('''2006'''). ''Improved Monte-Carlo Search''. [http://www.sztaki.hu/~szcsaba/papers/cg06-ext.pdf pdf]
* [http://www.szit.bme.hu/~gya/ András György], [[Levente Kocsis]], [http://dblp.uni-trier.de/pers/hd/s/Szab=oacute=:Ivett Ivett Szabó], [[Csaba Szepesvári]] ('''2007'''). ''Continuous Time Associative Bandit Problems''. IJCAI-07, pp. 830-835. [http://www.sztaki.hu/~szcsaba/papers/cbandit-ijcai07.pdf pdf]
* [[Jean-Yves Audibert]], [[Rémi Munos]], [[Csaba Szepesvári]] ('''2007'''). ''Tuning Bandit Algorithms in Stochastic Environments''. [http://certis.enpc.fr/~audibert/ucb_alt.pdf pdf]
* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation'', [http://www.sztaki.hu/~szcsaba/papers/gtdnips08.pdf pdf] (draft)
* [[Rémi Munos]], [[Csaba Szepesvári]] ('''2008'''). ''Finite time bounds for sampling based fitted value iteration''. Journal of Machine Learning Research, 9:815-857, 2008. [http://hal.inria.fr/docs/00/26/09/34/PDF/savi_1.5.pdf pdf], [http://www.ualberta.ca/~szepesva/papers/munos08a.pdf pdf]
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.'' Accepted in Advances in Neural Information Processing Systems 22, Vancouver, BC. December 2009. MIT Press. [http://books.nips.cc/papers/files/nips22/NIPS2009_1121.pdf pdf]
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]] ('''2009'''). ''Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation''. In Proceedings of the 26th International Conference on Machine Learning (ICML-09). [http://www.sztaki.hu/~szcsaba/papers/GTD-ICML09.pdf pdf]
* [[Jean-Yves Audibert]], [[Rémi Munos]], [[Csaba Szepesvári]] ('''2009'''). ''Exploration-exploitation trade-off using variance estimates in multi-armed bandits''. Theoretical Computer Science, 410:1876-1902, 2009, [http://www.ualberta.ca/~szepesva/papers/ucbtuned-journal.pdf pdf]
==2010 ...==
* [[Csaba Szepesvári]] ('''2010'''). ''[https://sites.ualberta.ca/~szepesva/RLBook.html Algorithms for Reinforcement Learning]''. Morgan & Claypool
* [[István Szita]], [[Csaba Szepesvári]] ('''2010'''). ''Model-based reinforcement learning with nearly tight exploration complexity bounds''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/icml2010.html#SzitaS10 ICML 2010]
* [[István Szita]], [[Csaba Szepesvári]] ('''2011'''). ''Agnostic KWIK learning and efficient approximate reinforcement learning''. [http://www.informatik.uni-trier.de/~ley/db/journals/jmlr/jmlrp19.html#SzitaS11 Journal of Machine Learning Research - Proceedings Track 19]
* [[Sylvain Gelly]], [[Marc Schoenauer]], [[Michèle Sebag]], [[Olivier Teytaud]], [[Levente Kocsis]], [[David Silver]], [[Csaba Szepesvári]] ('''2012'''). ''[http://dl.acm.org/citation.cfm?id=2093548.2093574 The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions]''. [[ACM#Communications|Communications of the ACM]], Vol. 55, No. 3, [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf pdf preprint]
==2015 ...==
* [[Tor Lattimore]], [[Csaba Szepesvári]] ('''2017'''). ''The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits''. [https://www.aistats.org/aistats2017/ AISTATS], [https://sites.ualberta.ca/~szepesva/papers/linbandits_aistats17.pdf pdf]
* [[Tor Lattimore]], [[Csaba Szepesvári]] ('''2018'''). ''Cleaning up the neighborhood: A full classification for adversarial partial monitoring''. [https://arxiv.org/abs/1805.09247 arXiv:1805.09247]

=External Links=
* [http://www.ualberta.ca/~szepesva/ Homepage of Csaba Szepesvári] from [[University of Alberta]]
* [http://banditalgs.com/ Bandit Algorithms]
* [https://scholar.google.com/citations?user=zvC19mQAAAAJ&hl=en Csaba Szepesvari - Google Scholar Citations]
* [http://www.sztaki.hu/%7Eszcsaba/ Csaba, Szepesvári, PhD. Senior Research Scientist] from [https://en.wikipedia.org/wiki/Hungarian_Academy_of_Sciences Hungarian Academy of Sciences]
* [http://videolectures.net/mlss08au_szepesvari_rele/ Introduction to Reinforcement Learning], [https://en.wikipedia.org/wiki/VideoLectures.net videolecture] by Csaba Szepesvári, 2008

=References=
<references />

'''[[People|Up one level]]'''
