Arthur Guez

8,197 bytes added, 22:42, 3 June 2018

[[FILE:aguez.jpg|border|right|thumb|link=http://www.gatsby.ucl.ac.uk/~aguez/| Arthur Guez <ref>Image clipped from [http://www.gatsby.ucl.ac.uk/~aguez/files/aguez.jpg aguez.jpg], [http://www.gatsby.ucl.ac.uk/~aguez/ Arthur Guez's Homepage]</ref> ]]

'''Arthur Guez''',<br/>
a Canadian computer scientist and neuroscientist, currently a researcher at [[Google]] [[DeepMind]] with expertise in [[Learning|machine learning]], in particular [[Deep Learning|deep learning]], and involved in the [[AlphaGo]] and [[AlphaZero]] projects. He holds an M.Sc. in machine learning from [[McGill University]] (2010) and a Ph.D. from the ''Gatsby Computational Neuroscience Unit'' at [https://en.wikipedia.org/wiki/University_College_London University College London] (2015). His Ph.D. thesis, titled ''Sample-based Search Methods for Bayes-Adaptive Planning'', was supervised by [[Peter Dayan]] and [[David Silver]].

In his Ph.D. thesis, Arthur Guez elaborates on [[Search|search]] and [[Planning|planning]] methods in the face of [https://en.wikipedia.org/wiki/Uncertainty uncertainty] about the environment, which induces the [https://en.wikipedia.org/wiki/Exploration exploration] versus [https://en.wikipedia.org/wiki/Exploitation exploitation] trade-off of an [https://en.wikipedia.org/wiki/Agent-based_model agent-based model]. To [https://en.wikipedia.org/wiki/Optimization_problem optimize] the return, the agent maintains a [https://en.wikipedia.org/wiki/Posterior_probability posterior distribution] over possible environments, considering all possible future paths. This optimization is equivalent to solving a [https://en.wikipedia.org/wiki/Markov_decision_process Markov decision process] (MDP) whose hyperstate comprises the agent’s beliefs about the environment as well as its current state in that environment. The corresponding process is called a [https://en.wikipedia.org/wiki/Bayes%27_theorem Bayes-Adaptive] MDP (BAMDP), which the thesis approaches with a tailored [[Monte-Carlo Tree Search|Monte-Carlo tree search]]. In ''historical notes on Bayesian Adaptive control'', Arthur Guez mentions [[Mathematician#AWald|Abraham Wald's]] [[Match Statistics#SPRT|Sequential Probability Ratio Test (SPRT)]] <ref>[[Mathematician#AWald|Abraham Wald]] ('''1945'''). ''Sequential Tests of Statistical Hypotheses''. [https://en.wikipedia.org/wiki/Annals_of_Mathematical_Statistics Annals of Mathematical Statistics], Vol. 16, No. 2, [https://en.wikipedia.org/wiki/Digital_object_identifier doi]: [http://projecteuclid.org/euclid.aoms/1177731118 10.1214/aoms/1177731118]</ref>, and notes that [[Alan Turing]], assisted by [[Jack Good]], used a similar sequential testing technique to help decipher [https://en.wikipedia.org/wiki/Enigma_machine Enigma codes] at [https://en.wikipedia.org/wiki/Bletchley_Park Bletchley Park] <ref>[[Jack Good]] ('''1979'''). ''[https://www.jstor.org/stable/2335677 Studies in the history of probability and statistics. XXXVII AM Turing’s statistical work in World War II]''. [https://en.wikipedia.org/wiki/Biometrika Biometrika], Vol. 66, No. 2</ref> <ref>[[Arthur Guez]] ('''2015'''). ''Sample-based Search Methods for Bayes-Adaptive Planning''. Ph.D. thesis, Gatsby Computational Neuroscience Unit, [https://en.wikipedia.org/wiki/University_College_London University College London], [http://www.gatsby.ucl.ac.uk/~aguez/files/guez_phdthesis2015.pdf pdf]</ref>.
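
The hyperstate idea described above can be illustrated with a small sketch. This is not the method of the thesis: it is a hypothetical toy problem (a two-armed Bernoulli bandit with Beta(1, 1) priors), and the names <code>update</code> and <code>bayes_optimal_value</code> are made up for illustration. A bandit has no physical state, so the belief alone serves as the hyperstate, and the Bayes-optimal value is computed by exhaustive recursion over all future observation paths.

```python
from functools import lru_cache

# Hypothetical toy problem (an assumption, not taken from the thesis):
# a two-armed Bernoulli bandit with unknown success probabilities and
# independent Beta(1, 1) priors. The belief is a tuple of (alpha, beta)
# counts per arm and doubles as the hyperstate of the Bayes-adaptive MDP.

def update(belief, arm, reward):
    """Return the posterior belief after observing `reward` (0 or 1) on `arm`."""
    counts = [list(c) for c in belief]
    counts[arm][0 if reward else 1] += 1  # alpha on success, beta on failure
    return tuple(tuple(c) for c in counts)

@lru_cache(maxsize=None)
def bayes_optimal_value(belief, horizon):
    """Expected total reward of the Bayes-optimal policy over `horizon` pulls."""
    if horizon == 0:
        return 0.0
    best = 0.0
    for arm, (a, b) in enumerate(belief):
        p = a / (a + b)  # posterior predictive probability of reward 1
        win = 1.0 + bayes_optimal_value(update(belief, arm, 1), horizon - 1)
        lose = bayes_optimal_value(update(belief, arm, 0), horizon - 1)
        best = max(best, p * win + (1.0 - p) * lose)
    return best

prior = ((1, 1), (1, 1))
value_h1 = bayes_optimal_value(prior, 1)
value_h2 = bayes_optimal_value(prior, 2)
```

With this prior, one pull is worth 1/2 and two pulls are worth 13/12, slightly more than twice the single-pull value, because the second pull can exploit what the first one revealed. The number of reachable hyperstates grows quickly with the horizon, which is the kind of cost that sample-based search is meant to avoid.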

=Selected Publications=
<ref>[http://dblp.uni-trier.de/pers/hc/g/Guez:Arthur DBLP: Arthur Guez]</ref>
==2010 ...==
* [[Arthur Guez]] ('''2010'''). ''Adaptive control of epileptic seizures using reinforcement learning''. M.Sc. thesis, [[McGill University]], [http://www.gatsby.ucl.ac.uk/~aguez/files/GuezMSc2010.pdf pdf]
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2012'''). ''Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search''. [http://papers.nips.cc/book/advances-in-neural-information-processing-systems-25-2012 NIPS 2012], [https://papers.nips.cc/paper/4767-efficient-bayes-adaptive-reinforcement-learning-using-sample-based-search.pdf pdf]
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2013'''). ''Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search''. [https://en.wikipedia.org/wiki/Journal_of_Artificial_Intelligence_Research Journal of Artificial Intelligence Research], Vol. 48, [https://www.jair.org/media/4117/live-4117-7507-jair.pdf pdf]
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2014'''). ''Better Optimism By Bayes: Adaptive Planning with Rich Models''. [https://arxiv.org/abs/1402.1958 arXiv:1402.1958v1]
* [[Arthur Guez]], [[Nicolas Heess]], [[David Silver]], [[Peter Dayan]] ('''2014'''). ''Bayes-Adaptive Simulation-based Search with Value Function Approximation''. [https://papers.nips.cc/book/advances-in-neural-information-processing-systems-27-2014 NIPS 2014], [https://papers.nips.cc/paper/5501-bayes-adaptive-simulation-based-search-with-value-function-approximation.pdf pdf]
==2015 ...==
* [[Hado van Hasselt]], [[Arthur Guez]], [[David Silver]] ('''2015'''). ''Deep Reinforcement Learning with Double Q-learning''. [http://arxiv.org/abs/1509.06461 arXiv:1509.06461]
* [[Arthur Guez]] ('''2015'''). ''Sample-based Search Methods for Bayes-Adaptive Planning''. Ph.D. thesis, Gatsby Computational Neuroscience Unit, [https://en.wikipedia.org/wiki/University_College_London University College London], [http://www.gatsby.ucl.ac.uk/~aguez/files/guez_phdthesis2015.pdf pdf]
* [[David Silver]], [[Shih-Chieh Huang|Aja Huang]], [[Chris J. Maddison]], [[Arthur Guez]], [[Laurent Sifre]], [[George van den Driessche]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Veda Panneershelvam]], [[Marc Lanctot]], [[Sander Dieleman]], [[Dominik Grewe]], [[John Nham]], [[Nal Kalchbrenner]], [[Ilya Sutskever]], [[Timothy Lillicrap]], [[Madeleine Leach]], [[Koray Kavukcuoglu]], [[Thore Graepel]], [[Demis Hassabis]] ('''2016'''). ''[http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html Mastering the game of Go with deep neural networks and tree search]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 529 » [[AlphaGo]]
* [[Hado van Hasselt]], [[Arthur Guez]], [[Matteo Hessel]], [[Volodymyr Mnih]], [[David Silver]] ('''2016'''). ''Learning values across many orders of magnitude''. [https://arxiv.org/abs/1602.07714 arXiv:1602.07714v2], [https://nips.cc/Conferences/2016/Schedule?type=Poster NIPS 2016]
* [[David Silver]], [[Julian Schrittwieser]], [[Karen Simonyan]], [[Ioannis Antonoglou]], [[Shih-Chieh Huang|Aja Huang]], [[Arthur Guez]], [[Thomas Hubert]], [[Lucas Baker]], [[Matthew Lai]], [[Adrian Bolton]], [[Yutian Chen]], [[Timothy Lillicrap]], [[Fan Hui]], [[Laurent Sifre]], [[George van den Driessche]], [[Thore Graepel]], [[Demis Hassabis]] ('''2017'''). ''[https://www.nature.com/nature/journal/v550/n7676/full/nature24270.html Mastering the game of Go without human knowledge]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 550 <ref>[https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo Zero: Learning from scratch] by [[Demis Hassabis]] and [[David Silver]], [[DeepMind]], October 18, 2017</ref>
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]
* [[Arthur Guez]], [[Théophane Weber]], [[Ioannis Antonoglou]], [[Karen Simonyan]], [[Oriol Vinyals]], [[Daan Wierstra]], [[Rémi Munos]], [[David Silver]] ('''2018'''). ''Learning to Search with MCTSnets''. [https://arxiv.org/abs/1802.04697 arXiv:1802.04697]

=External Links=
* [http://www.gatsby.ucl.ac.uk/~aguez/ Arthur Guez's Homepage]
* [https://scholar.google.co.uk/citations?user=iyD9aw8AAAAJ&hl=en Arthur Guez - Google Scholar Citations]
* [https://github.com/acguez acguez (Arthur Guez) · GitHub]

=References=
<references />
