Gerald Tesauro
Gerald Tesauro is an American physicist, computer scientist and games researcher at the IBM Watson Research Center, and a pioneer in applying Neural Networks, Reinforcement Learning and Temporal Difference Learning to stochastic games, especially Backgammon. More recently, he is a member of the team around David Ferrucci that developed Watson [2]. He explained that Watson's Jeopardy! wagers were based on its confidence level for the category and a complex regression model called the Game State Evaluator [3]. Gerald Tesauro holds a Ph.D. in theoretical physics from Princeton University. He is a section editor of the ICGA Journal.
Programs
Gerald Tesauro is the author of the Backgammon programs Neurogammon, which won the Gold medal at the 1st Computer Olympiad 1989 in London, and TD-Gammon, improved by TD-Lambda based Temporal Difference Learning [4]. Along with Martin Müller, Broderick Arneson, Richard Segal, Markus Enzenberger and Arpad Rimmel (since 2010), he is a co-author of the Go-playing program Fuego [5], which won the Gold medal in 9x9 Go as well as the Silver medal in 19x19 Go at the 14th Computer Olympiad [6] [7].
Selected Publications
1987 ...
- Gerald Tesauro, Terrence J. Sejnowski (1987). A 'Neural' Network that Learns to Play Backgammon. NIPS 1987
- Gerald Tesauro (1988). Connectionist Learning of Expert Backgammon Evaluations. ML, 1988
- Gerald Tesauro (1988). Neural network defeats creator in backgammon match. Technical report no. CCSR-88-6, Center for Complex Systems Research, University of Illinois at Urbana-Champaign
- Gerald Tesauro (1989). Connectionist learning of expert preferences by comparison training. Advances in Neural Information Processing Systems, Morgan Kaufmann.
- Gerald Tesauro (1989). NEUROGAMMON: A Neural-Network Backgammon Learning Program. Heuristic Programming in Artificial Intelligence 1
- Gerald Tesauro (1989). Neurogammon Wins Computer Olympiad. Neural Computation Vol. 1, No. 3
- Gerald Tesauro, Terrence J. Sejnowski (1989). A Parallel Network that Learns to Play Backgammon. Artificial Intelligence, Vol. 39, No. 3
1990 ...
- Gerald Tesauro (1992). Temporal Difference Learning of Backgammon Strategy. ML 1992
- Gerald Tesauro (1992). Practical Issues in Temporal Difference Learning. Machine Learning, Vol. 8, No. 3-4
- Gerald Tesauro (1994). TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation Vol. 6, No. 2
- Gerald Tesauro (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM, Vol. 38, No. 3
2000 ...
- Gerald Tesauro (2001). Comparison Training of Chess Evaluation Functions. In Johannes Fürnkranz, Miroslav Kubat (eds.) (2001). Machines that learn to play games, 117–130, Nova Science Publishers » Automated Tuning, SCP, Deep Blue
- Gerald Tesauro (2002). Programming backgammon using self-teaching neural nets. Artificial Intelligence Vol. 134 No. 1-2
- Gerald Tesauro (2007). Reinforcement Learning in Autonomic Computing: A Manifesto and Case Studies. IEEE Internet Computing Vol. 11
- David Silver, Gerald Tesauro (2009). Monte-Carlo Simulation Balancing. ICML 2009, pdf [10]
References
- [1] IBM Research: Watson’s wagering strategies by Gerald Tesauro, February 13, 2011
- [2] IBM Press room - 2011-08-29 IBM and Jeopardy! Relive History with Encore Presentation of Jeopardy!: The IBM Challenge - United States
- [3] IBM Research: Watson’s wagering strategies by Gerald Tesauro, February 13, 2011
- [4] Richard Sutton, Andrew Barto (1998). Reinforcement Learning: An Introduction. MIT Press, 11.1 TD-Gammon
- [5] Fuego from sourceforge
- [6] Markus Enzenberger, Martin Müller (2009). Fuego - An Open-source Framework for Board Games and Go Engine Based on Monte-Carlo Tree Search. Technical Report
- [7] Martin Müller (2009). Fuego at the Computer Olympiad in Pamplona 2009: A Tournament Report.
- [8] ICGA Reference Database (pdf)
- [9] DBLP: Gerald Tesauro
- [10] Monte-Carlo Simulation Balancing - videolectures.net