Arthur Guez

From Chessprogramming wiki

'''Home * People * Arthur Guez'''

'''Arthur Guez''',
a Canadian computer scientist and neuroscientist, currently a researcher at [[Google]] [[DeepMind]] with expertise in [[Learning|machine learning]], in particular [[Deep Learning|deep learning]], and involved in the [[AlphaGo]] and [[AlphaZero]] projects. He holds an M.Sc. in machine learning from [[McGill University]] (2010) and a Ph.D. from the ''Gatsby Computational Neuroscience Unit'' at [https://en.wikipedia.org/wiki/University_College_London University College London] (2015) with the thesis ''Sample-based Search Methods for Bayes-Adaptive Planning'', supervised by [[Peter Dayan]] and [[David Silver]].

=Ph.D. Thesis=
 
In his Ph.D. thesis, Arthur Guez elaborates on [[Search|search]] and [[Planning|planning]] methods under [https://en.wikipedia.org/wiki/Uncertainty uncertainty] about the environment, which induces the [https://en.wikipedia.org/wiki/Exploration exploration] versus [https://en.wikipedia.org/wiki/Exploitation exploitation] trade-off faced by an [https://en.wikipedia.org/wiki/Agent-based_model agent]: to [https://en.wikipedia.org/wiki/Optimization_problem optimize] its return, the agent maintains a [https://en.wikipedia.org/wiki/Posterior_probability posterior distribution] over possible environments and considers all possible future paths. This optimization is equivalent to solving a [https://en.wikipedia.org/wiki/Markov_decision_process Markov decision process] (MDP) whose hyperstate comprises the agent's beliefs about the environment as well as its current state in that environment; the corresponding process is called a [https://en.wikipedia.org/wiki/Bayes%27_theorem Bayes-Adaptive] MDP (BAMDP), which is solved approximately with a tailored [[Monte-Carlo Tree Search|Monte-Carlo tree search]]. In his ''historical notes on Bayesian adaptive control'', Arthur Guez mentions [[Mathematician#AWald|Abraham Wald's]] [[Match Statistics#SPRT|Sequential Probability Ratio Test (SPRT)]] <ref>[[Mathematician#AWald|Abraham Wald]] ('''1945'''). ''Sequential Tests of Statistical Hypotheses''. [https://en.wikipedia.org/wiki/Annals_of_Mathematical_Statistics Annals of Mathematical Statistics], Vol. 16, No. 2, [https://en.wikipedia.org/wiki/Digital_object_identifier doi]: [http://projecteuclid.org/euclid.aoms/1177731118 10.1214/aoms/1177731118]</ref>, and notes that [[Alan Turing]], assisted by [[Jack Good]], used a similar sequential testing technique to help decipher [https://en.wikipedia.org/wiki/Enigma_machine Enigma codes] at [https://en.wikipedia.org/wiki/Bletchley_Park Bletchley Park] <ref>[[Jack Good]] ('''1979'''). ''[https://www.jstor.org/stable/2335677 Studies in the history of probability and statistics. XXXVII. A. M. Turing's statistical work in World War II]''. [https://en.wikipedia.org/wiki/Biometrika Biometrika], Vol. 66, No. 2</ref> <ref>[[Arthur Guez]] ('''2015'''). ''Sample-based Search Methods for Bayes-Adaptive Planning''. Ph.D. thesis, Gatsby Computational Neuroscience Unit, [https://en.wikipedia.org/wiki/University_College_London University College London], [http://www.gatsby.ucl.ac.uk/~aguez/files/guez_phdthesis2015.pdf pdf]</ref>.
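The SPRT mentioned above can be illustrated with a minimal sketch. This is the generic textbook formulation of Wald's test (not code from the thesis): accumulate the log-likelihood ratio of the observations and stop as soon as it crosses a boundary derived from the desired error rates.

```python
import math

def sprt(llr_increments, alpha=0.05, beta=0.05):
    """Wald's Sequential Probability Ratio Test.

    Accumulates the log-likelihood ratio log P(x|H1) / P(x|H0)
    observation by observation and stops as soon as it crosses a
    decision boundary derived from the error rates alpha (false
    positive) and beta (false negative).
    """
    upper = math.log((1.0 - beta) / alpha)   # crossing => accept H1
    lower = math.log(beta / (1.0 - alpha))   # crossing => accept H0
    llr = 0.0
    for n, inc in enumerate(llr_increments, start=1):
        llr += inc
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(llr_increments)

# Toy usage: test a coin for bias, H0: p = 0.5 vs. H1: p = 0.6.
# Each head contributes log(0.6/0.5) to the ratio, each tail log(0.4/0.5).
head, tail = math.log(0.6 / 0.5), math.log(0.4 / 0.5)
decision, n = sprt([head] * 100)   # a pure run of heads
print(decision, n)                 # accepts H1 after 17 observations
```

The same stopping rule, with hypotheses phrased as Elo differences, is what modern chess engine testing frameworks use to decide matches early; see [[Match Statistics#SPRT|SPRT]].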
  
=Selected Publications=
 
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]
* [[Arthur Guez]], [[Théophane Weber]], [[Ioannis Antonoglou]], [[Karen Simonyan]], [[Oriol Vinyals]], [[Daan Wierstra]], [[Rémi Munos]], [[David Silver]] ('''2018'''). ''Learning to Search with MCTSnets''. [https://arxiv.org/abs/1802.04697 arXiv:1802.04697]
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2018'''). ''[http://science.sciencemag.org/content/362/6419/1140 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play]''. [https://en.wikipedia.org/wiki/Science_(journal) Science], Vol. 362, No. 6419 <ref>[https://deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/ AlphaZero: Shedding new light on the grand games of chess, shogi and Go] by [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]] and [[Demis Hassabis]], [[DeepMind]], December 03, 2018</ref>
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2019'''). ''Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model''. [https://arxiv.org/abs/1911.08265 arXiv:1911.08265]
  
 
=External Links=
 
=References=
 
<references />
 
 
'''[[People|Up one level]]'''
[[Category:Researcher|Guez]]

Revision as of 21:06, 23 November 2019