David Silver

16,155 bytes added, 20:58, 31 May 2018


[[FILE:Dave_Silver.jpg|border|right|thumb|link=http://www.cs.ucl.ac.uk/staff/D.Silver/web/Home.html| David Silver <ref>[http://www.cs.ucl.ac.uk/staff/D.Silver/web/Home.html David Silver]</ref> ]]

'''David Silver''',<br/>
a British computer scientist at [[Google]] [[DeepMind]], and co-author of [[AlphaGo]] and [[AlphaZero]]. From 2010, he was a researcher at [https://en.wikipedia.org/wiki/University_College_London University College London]; earlier, he was a postdoc at [[Massachusetts Institute of Technology]] <ref>[http://www.cs.ucl.ac.uk/staff/D.Silver/web/Home.html Research Staff > David Silver]</ref>, a Ph.D. student and postdoc at [[University of Alberta]], and [https://en.wikipedia.org/wiki/Chief_technology_officer CTO] of [https://en.wikipedia.org/wiki/Elixir_Studios Elixir Studios] as well as lead programmer on the [[IBM PC|PC]] [https://en.wikipedia.org/wiki/Strategy_video_game strategy game] [https://en.wikipedia.org/wiki/Republic:_The_Revolution Republic: The Revolution] <ref>[http://www.cs.ucl.ac.uk/staff/D.Silver/web/Applications.html David Silver - Applications]</ref>. His research interests cover simulation-based search, [[Reinforcement Learning|reinforcement learning]], and cooperative [https://en.wikipedia.org/wiki/Pathfinding pathfinding].

=See also=
* [[AlphaGo]]
* [[AlphaZero]]
* [[Reinforcement Learning#MOOC|Reinforcement Learning Course]]

=Selected Publications=
<ref>[http://www.cs.ucl.ac.uk/staff/D.Silver/web/Publications.html David Silver - Publications]</ref> <ref>[http://academic.research.microsoft.com/Author/594626/david-m-silver David M. Silver] from [[Microsoft Academic Search]]</ref> <ref>[http://jveness.info/publications/default.html Joel Veness - Publications]</ref>
==2006 ...==
* [[David Silver]] ('''2006'''). ''Cooperative Pathfinding''. In AI Game Programming Wisdom 3, pages 99–111. Charles River Media, [http://webdocs.cs.ualberta.ca/~silver/David_Silver/Publications_files/coop-path-AIWisdom.pdf pdf]
'''2007'''
* [[David Silver]], [[Richard Sutton]], [[Martin Müller]] ('''2007'''). ''Reinforcement learning of local shape in the game of Go''. [http://www.ai.upf.edu/publications/ijcai-2007-proceedings-20th-international-joint-conference-artificial-intelligence 20th IJCAI], [http://webdocs.cs.ualberta.ca/~mmueller/ps/silver-ijcai2007.pdf pdf], [http://webdocs.cs.ualberta.ca/~silver/David_Silver/Publications_files/local-shape.pdf pdf]
* [[Sylvain Gelly]], [[David Silver]] ('''2007'''). ''Combining Online and Offline Knowledge in UCT.'' [http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf pdf]
'''2008'''
* [[David Silver]], [[Richard Sutton]] and [[Martin Müller]] ('''2008'''). ''Sample-Based Learning and Search with Permanent and Transient Memories''. In Proceedings of the 25th International Conference on Machine Learning, [http://icml2008.cs.helsinki.fi/papers/564.pdf pdf]
* [[Sylvain Gelly]], [[David Silver]] ('''2008'''). ''Achieving Master Level Play in 9 x 9 Computer Go.'' [http://www.lri.fr/~gelly/paper/MoGoNectar.pdf pdf]
'''2009'''
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]] ('''2009'''). ''Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation''. In Proceedings of the 26th International Conference on Machine Learning (ICML-09). [http://www.sztaki.hu/~szcsaba/papers/GTD-ICML09.pdf pdf]
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.'' Accepted in Advances in Neural Information Processing Systems 22, Vancouver, BC. December 2009. MIT Press. [http://books.nips.cc/papers/files/nips22/NIPS2009_1121.pdf pdf]
* [[Joel Veness]], [[David Silver]], [[William Uther]], [[Alan Blair]] ('''2009'''). ''[http://papers.nips.cc/paper/3722-bootstrapping-from-game-tree-search Bootstrapping from Game Tree Search]''. [http://webdocs.cs.ualberta.ca/~silver/David_Silver/Applications_files/bootstrapping.pdf pdf]
* [[David Silver]] ('''2009'''). ''Reinforcement Learning and Simulation-Based Search''. Ph.D. thesis, [[University of Alberta]], [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/thesis.pdf pdf]
* [[Joel Veness]], [[Kee Siong Ng]], [[Marcus Hutter]], [[David Silver]] ('''2009'''). ''A Monte Carlo AIXI Approximation'', [http://jveness.info/publications/arXive2009%20-%20a%20monte%20carlo%20aixi%20approximation.pdf pdf]
* [[David Silver]], [[Gerald Tesauro]] ('''2009'''). ''Monte-Carlo Simulation Balancing''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/icml2009.html#SilverT09 ICML 2009], [http://www.machinelearning.org/archive/icml2009/papers/500.pdf pdf]
==2010 ...==
* [[Joel Veness]], [[Kee Siong Ng]], [[Marcus Hutter]], [[David Silver]] ('''2010'''). ''Reinforcement Learning via AIXI Approximation''. Association for the Advancement of Artificial Intelligence (AAAI), [http://jveness.info/publications/veness_rl_via_aixi_approx.pdf pdf]
'''2011'''
* [[Sylvain Gelly]], [[David Silver]] ('''2011'''). ''Monte-Carlo tree search and rapid action value estimation in computer Go''. [https://en.wikipedia.org/wiki/Artificial_Intelligence_%28journal%29 Artificial Intelligence], Vol. 175, No. 11
'''2012'''
* [[Sylvain Gelly]], [[Marc Schoenauer]], [[Michèle Sebag]], [[Olivier Teytaud]], [[Levente Kocsis]], [[David Silver]], [[Csaba Szepesvári]] ('''2012'''). ''[http://dl.acm.org/citation.cfm?id=2093548.2093574 The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions]''. [[ACM#Communications|Communications of the ACM]], Vol. 55, No. 3, [http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf pdf preprint]
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2012'''). ''Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search''. [http://papers.nips.cc/book/advances-in-neural-information-processing-systems-25-2012 NIPS 2012], [https://papers.nips.cc/paper/4767-efficient-bayes-adaptive-reinforcement-learning-using-sample-based-search.pdf pdf]
'''2013'''
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2013'''). ''Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search''. [https://en.wikipedia.org/wiki/Journal_of_Artificial_Intelligence_Research Journal of Artificial Intelligence Research], Vol. 48, [https://www.jair.org/media/4117/live-4117-7507-jair.pdf pdf]
* [[David Silver]], [[Richard Sutton]], [[Martin Müller|Martin Mueller]] ('''2013'''). ''Temporal-Difference Search in Computer Go''. Proceedings of the [http://icaps13.icaps-conference.org/technical-program/workshop-program/planning-and-learning/ ICAPS-13 Workshop on Planning and Learning], [http://webdocs.cs.ualberta.ca/~sutton/papers/SSM-ICAPS-13.pdf pdf]
* [[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]], [[Alex Graves]], [[Ioannis Antonoglou]], [[Daan Wierstra]], [[Martin Riedmiller]] ('''2013'''). ''Playing Atari with Deep Reinforcement Learning''. [http://arxiv.org/abs/1312.5602 arXiv:1312.5602] <ref>[http://www.nervanasys.com/demystifying-deep-reinforcement-learning/ Demystifying Deep Reinforcement Learning] by [http://www.nervanasys.com/author/tambet/ Tambet Matiisen], [http://www.nervanasys.com/ Nervana], December 21, 2015</ref>
'''2014'''
* [[Tom Schaul]], [[Ioannis Antonoglou]], [[David Silver]] ('''2014'''). ''Unit Tests for Stochastic Optimization''. [http://arxiv.org/abs/1312.6055v3 arXiv:1312.6055v3] <ref>[https://github.com/IoannisAntonoglou/optimBench GitHub - IoannisAntonoglou/optimBench: Benchmark testbed for assessing the performance of optimisation algorithms]</ref>
* [[Arthur Guez]], [[David Silver]], [[Peter Dayan]] ('''2014'''). ''Better Optimism By Bayes: Adaptive Planning with Rich Models''. [https://arxiv.org/abs/1402.1958 arXiv:1402.1958v1]
* [[Arthur Guez]], [[Nicolas Heess]], [[David Silver]], [[Peter Dayan]] ('''2014'''). ''Bayes-Adaptive Simulation-based Search with Value Function Approximation''. [https://papers.nips.cc/book/advances-in-neural-information-processing-systems-27-2014 NIPS 2014], [https://papers.nips.cc/paper/5501-bayes-adaptive-simulation-based-search-with-value-function-approximation.pdf pdf]
* [[Chris J. Maddison]], [[Shih-Chieh Huang|Aja Huang]], [[Ilya Sutskever]], [[David Silver]] ('''2014'''). ''Move Evaluation in Go Using Deep Convolutional Neural Networks''. [http://arxiv.org/abs/1412.6564v1 arXiv:1412.6564v1] <ref>[http://computer-go.org/pipermail/computer-go/2014-December/007046.html Move Evaluation in Go Using Deep Convolutional Neural Networks] by [[Shih-Chieh Huang|Aja Huang]], [http://computer-go.org/pipermail/computer-go/ The Computer-go Archives], December 19, 2014</ref> » [[Go#CNN|DCNN in Go]]
* [[Johannes Heinrich]], [[David Silver]] ('''2014'''). ''[https://www.aaai.org/ocs/index.php/WS/AAAIW14/paper/view/8811 Self-Play Monte-Carlo Tree Search in Computer Poker]''. [[AAAI|AAAI-14 Workshop]]
==2015 ...==
* [[Johannes Heinrich]], [[Marc Lanctot]], [[David Silver]] ('''2015'''). ''Fictitious Self-Play in Extensive-Form Games''. [http://proceedings.mlr.press/v37/ JMLR: W&CP, Vol. 37], [http://proceedings.mlr.press/v37/heinrich15.pdf pdf]
* [[Johannes Heinrich]], [[David Silver]] ('''2015'''). ''Smooth UCT Search in Computer Poker''. [[Conferences#IJCA2015|IJCAI 2015]], [http://www0.cs.ucl.ac.uk/staff/d.silver/web/Publications_files/smooth_uct.pdf pdf]
* [[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]], [[Andrei A. Rusu]], [[Joel Veness]], [[Marc G. Bellemare]], [[Alex Graves]], [[Martin Riedmiller]], [[Andreas K. Fidjeland]], [[Georg Ostrovski]], [[Stig Petersen]], [[Charles Beattie]], [[Amir Sadik]], [[Ioannis Antonoglou]], [[Helen King]], [[Dharshan Kumaran]], [[Daan Wierstra]], [[Shane Legg]], [[Demis Hassabis]] ('''2015'''). ''[http://www.nature.com/nature/journal/v518/n7540/abs/nature14236.html Human-level control through deep reinforcement learning]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 518
* [[Arun Nair]], [[Praveen Srinivasan]], [[Sam Blackwell]], [[Cagdas Alcicek]], [[Rory Fearon]], [[Alessandro De Maria]], [[Veda Panneershelvam]], [[Mustafa Suleyman]], [[Charles Beattie]], [[Stig Petersen]], [[Shane Legg]], [[Volodymyr Mnih]], [[Koray Kavukcuoglu]], [[David Silver]] ('''2015'''). ''Massively Parallel Methods for Deep Reinforcement Learning''. [http://arxiv.org/abs/1507.04296 arXiv:1507.04296]
* [[Timothy Lillicrap]], [[Jonathan J. Hunt]], [[Alexander Pritzel]], [[Nicolas Heess]], [[Tom Erez]], [[Yuval Tassa]], [[David Silver]], [[Daan Wierstra]] ('''2015'''). ''Continuous Control with Deep Reinforcement Learning''. [https://arxiv.org/abs/1509.02971 arXiv:1509.02971]
* [[Hado van Hasselt]], [[Arthur Guez]], [[David Silver]] ('''2015'''). ''Deep Reinforcement Learning with Double Q-learning''. [http://arxiv.org/abs/1509.06461 arXiv:1509.06461]
* [[Tom Schaul]], [[John Quan]], [[Ioannis Antonoglou]], [[David Silver]] ('''2015'''). ''Prioritized Experience Replay''. [http://arxiv.org/abs/1511.05952 arXiv:1511.05952]
* [[Nicolas Heess]], [[Jonathan J. Hunt]], [[Timothy Lillicrap]], [[David Silver]] ('''2015'''). ''Memory-based control with recurrent neural networks''. [https://arxiv.org/abs/1512.04455 arXiv:1512.04455]
'''2016'''
* [[David Silver]], [[Shih-Chieh Huang|Aja Huang]], [[Chris J. Maddison]], [[Arthur Guez]], [[Laurent Sifre]], [[George van den Driessche]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Veda Panneershelvam]], [[Marc Lanctot]], [[Sander Dieleman]], [[Dominik Grewe]], [[John Nham]], [[Nal Kalchbrenner]], [[Ilya Sutskever]], [[Timothy Lillicrap]], [[Madeleine Leach]], [[Koray Kavukcuoglu]], [[Thore Graepel]], [[Demis Hassabis]] ('''2016'''). ''[http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html Mastering the game of Go with deep neural networks and tree search]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 529 » [[AlphaGo]]
* [[Volodymyr Mnih]], [[Adrià Puigdomènech Badia]], [[Mehdi Mirza]], [[Alex Graves]], [[Timothy Lillicrap]], [[Tim Harley]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Asynchronous Methods for Deep Reinforcement Learning''. [https://arxiv.org/abs/1602.01783 arXiv:1602.01783v2]
* [[Max Jaderberg]], [[Volodymyr Mnih]], [[Wojciech Marian Czarnecki]], [[Tom Schaul]], [[Joel Z. Leibo]], [[David Silver]], [[Koray Kavukcuoglu]] ('''2016'''). ''Reinforcement Learning with Unsupervised Auxiliary Tasks''. [https://arxiv.org/abs/1611.05397v1 arXiv:1611.05397v1]
* [[Hado van Hasselt]], [[Arthur Guez]], [[Matteo Hessel]], [[Volodymyr Mnih]], [[David Silver]] ('''2016'''). ''Learning values across many orders of magnitude''. [https://arxiv.org/abs/1602.07714 arXiv:1602.07714v2], [https://nips.cc/Conferences/2016/Schedule?type=Poster NIPS 2016]
* [[Johannes Heinrich]], [[David Silver]] ('''2016'''). ''Deep Reinforcement Learning from Self-Play in Imperfect-Information Games''. [https://arxiv.org/abs/1603.01121 arXiv:1603.01121]
'''2017'''
* [[David Silver]], [[Julian Schrittwieser]], [[Karen Simonyan]], [[Ioannis Antonoglou]], [[Shih-Chieh Huang|Aja Huang]], [[Arthur Guez]], [[Thomas Hubert]], [[Lucas Baker]], [[Matthew Lai]], [[Adrian Bolton]], [[Yutian Chen]], [[Timothy Lillicrap]], [[Fan Hui]], [[Laurent Sifre]], [[George van den Driessche]], [[Thore Graepel]], [[Demis Hassabis]] ('''2017'''). ''[https://www.nature.com/nature/journal/v550/n7676/full/nature24270.html Mastering the game of Go without human knowledge]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 550 <ref>[https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo Zero: Learning from scratch] by [[Demis Hassabis]] and [[David Silver]], [[DeepMind]], October 18, 2017</ref>
* [[Marc Lanctot]], [[Vinícius Flores Zambaldi]], [[Audrunas Gruslys]], [[Angeliki Lazaridou]], [[Karl Tuyls]], [[Julien Pérolat]], [[David Silver]], [[Thore Graepel]] ('''2017'''). ''A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning''. [https://arxiv.org/abs/1711.00832 arXiv:1711.00832]
* [[David Silver]], [[Thomas Hubert]], [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Matthew Lai]], [[Arthur Guez]], [[Marc Lanctot]], [[Laurent Sifre]], [[Dharshan Kumaran]], [[Thore Graepel]], [[Timothy Lillicrap]], [[Karen Simonyan]], [[Demis Hassabis]] ('''2017'''). ''Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm''. [https://arxiv.org/abs/1712.01815 arXiv:1712.01815] » [[AlphaZero]]

=External Links=
* [http://www.cs.ucl.ac.uk/staff/D.Silver/web/Home.html David Silver homepage]
* [https://scholar.google.com/citations?user=-8DNE4UAAAAJ&hl=en David Silver - Google Scholar Citations]
* [http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html Advanced Topics: RL] by [[David Silver]]
* [http://videolectures.net/icml09_silver_mcsb/ Monte-Carlo Simulation Balancing - videolectures.net] <ref>[[David Silver]], [[Gerald Tesauro]] ('''2009'''). ''Monte-Carlo Simulation Balancing''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/icml2009.html#SilverT09 ICML 2009], [http://www.machinelearning.org/archive/icml2009/papers/500.pdf pdf]</ref>
* [https://deepmind.com/blog/alphagos-next-move/ AlphaGo's next move] by [[Demis Hassabis]] and [[David Silver]], [[DeepMind]], May 27, 2017
* [https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo Zero: Learning from scratch] by [[Demis Hassabis]] and [[David Silver]], [[DeepMind]], October 18, 2017
: <span id="AlphaGoZeroVideo"></span>[https://youtu.be/WXHFqTvfFSw AlphaGo Zero: Discovering new knowledge] by [[David Silver]], [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: {{#evu:https://www.youtube.com/watch?v=WXHFqTvfFSw|alignment=left|valignment=top}}

=References=
<references />


GerdIsenberg
