Rémi Munos
Rémi Munos,
a French mathematician and computer scientist at Google DeepMind, from 2000 to 2006 Associate Professor at the Centre de Mathématiques Appliquées, Ecole Polytechnique and later affiliated with INRIA Lille [2]. His research interests covers reinforcement learning, multi-armed bandits, and dynamic programming. Rémi Muno was contributor of the Go playing program Mogo, using Monte-Carlo Tree Search which uses patterns in the simulations and improvements in UCT.
Contents
Selected Publications
1996
- Rémi Munos (1996). A convergent reinforcement learning algorithm in the continuous case : the finite-element reinforcement learning. In International Conference on Machine Learning. Morgan Kaufmann
2005 ...
- Sylvain Gelly, Yizao Wang, Rémi Munos, Olivier Teytaud (2006). Modification of UCT with Patterns in Monte-Carlo Go. INRIA
- Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári (2007). Tuning Bandit Algorithms in Stochastic Environments. pdf
- Yizao Wang, Jean-Yves Audibert, Rémi Munos (2008). Algorithms for Infinitely Many-Armed Bandits, , Advances in Neural Information Processing Systems, pdf, Supplemental material - pdf
- Rémi Munos, Csaba Szepesvári (2008). Finite time bounds for sampling based fitted value iteration. Journal of Machine Learning Research, 9:815-857, 2008. pdf, pdf
- Raphaël Maîtrepierre, Jérémie Mary, Rémi Munos (2008). Adaptive play in Texas Hold'em Poker. ECAI 2008
- Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári (2009). Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, 410:1876-1902, 2009, pdf
- Vincent Berthier, Amine Bourki, Matthieu Coulm, Guillaume Chaslot, Christophe Fiter, Sylvain Gelly, Jean-Baptiste Hoock, Rémi Munos, Julien Pérez, Arpad Rimmel, Philippe Rolet, Olivier Teytaud, Paul Vayssière, Yizao Wang, Ziqin Yu (et al.) (2009). Computer-Go is not only for Go. Korea, August 2009 slides as pdf
2010 ...
- Rémi Munos (2010). Approximate dynamic programming. In Olivier Sigaud and Olivier Buffet, editors, Markov Decision Processes in Artificial Intelligence, chapter 3, pages 67-98. ISTE Ltd and John Wiley & Sons Inc., pdf
- Ronald Ortner, Daniil Ryabko, Peter Auer, Rémi Munos (2014). Regret bounds for restless Markov bandits. Theoretical Computer Science 558, pdf
- Rémi Munos (2014). From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. Foundations and Trends in Machine Learning, Vol. 7, No 1, hal-00747575v5, slides as pdf
- Tor Lattimore, Rémi Munos (2014). Bounded Regret for Finite-Armed Structured Bandits. arXiv:1411.2919
2015 ...
- Audrūnas Gruslys, Rémi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves (2016). Memory-Efficient Backpropagation Through Time. arXiv:1606.03401
- Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Rémi Munos, Charles Blundell, Dharshan Kumaran, Matthew Botvinick (2016). Learning to reinforcement learn. arXiv:1611.05763
- Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, David Silver (2018). Learning to Search with MCTSnets. arXiv:1802.04697