Difference between revisions of "Learning"

From Chessprogramming wiki
Line 224: Line 224:
 
* [[Gerald Tesauro]] ('''1995'''). ''Temporal Difference Learning and TD-Gammon''. [[ACM#Communications|Communications of the ACM]] Vol. 38, No. 3
 
* [[Gerald Tesauro]] ('''1995'''). ''Temporal Difference Learning and TD-Gammon''. [[ACM#Communications|Communications of the ACM]] Vol. 38, No. 3
 
* [[Sebastian Thrun]] ('''1995'''). ''[http://robots.stanford.edu/papers/thrun.nips7.neuro-chess.html Learning to Play the Game of Chess]''. in [[Gerald Tesauro]], [https://en.wikipedia.org/wiki/David_S._Touretzky David S. Touretzky], [http://mitpress.mit.edu/authors/todd-k-leen Todd K. Leen] (eds.) Advances in Neural Information Processing Systems 7, [https://en.wikipedia.org/wiki/MIT_Press MIT Press]  
 
* [[Sebastian Thrun]] ('''1995'''). ''[http://robots.stanford.edu/papers/thrun.nips7.neuro-chess.html Learning to Play the Game of Chess]''. in [[Gerald Tesauro]], [https://en.wikipedia.org/wiki/David_S._Touretzky David S. Touretzky], [http://mitpress.mit.edu/authors/todd-k-leen Todd K. Leen] (eds.) Advances in Neural Information Processing Systems 7, [https://en.wikipedia.org/wiki/MIT_Press MIT Press]  
* [[Marco Wiering]] ('''1995'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&cstart=20&citation_for_view=xVas0I8AAAAJ:roLk4NBRz8UC TD Learning of Game Evaluation Functions with Hierarchical Neural Architectures]''. Master's thesis, [https://en.wikipedia.org/wiki/University_of_Amsterdam University of Amsterdam], [http://webber.physik.uni-freiburg.de/~hon/vorlss02/Literatur/reinforcement/GameEvaluationWithNeuronal.pdf pdf]
+
* [[Marco Wiering]] ('''1995'''). ''TD Learning of Game Evaluation Functions with Hierarchical Neural Architectures''. Master's thesis, [https://en.wikipedia.org/wiki/University_of_Amsterdam University of Amsterdam], [http://webber.physik.uni-freiburg.de/~hon/vorlss02/Literatur/reinforcement/GameEvaluationWithNeuronal.pdf pdf]
 
* [[Mathematician#MAArbib|Michael A. Arbib]] (ed.) ('''1995, 2002'''). ''[http://mitpress.mit.edu/books/handbook-brain-theory-and-neural-networks The Handbook of Brain Theory and Neural Networks]''. [https://en.wikipedia.org/wiki/MIT_Press The MIT Press]
 
* [[Mathematician#MAArbib|Michael A. Arbib]] (ed.) ('''1995, 2002'''). ''[http://mitpress.mit.edu/books/handbook-brain-theory-and-neural-networks The Handbook of Brain Theory and Neural Networks]''. [https://en.wikipedia.org/wiki/MIT_Press The MIT Press]
 
* [[Nicol N. Schraudolph]] ('''1995'''). ''[http://nic.schraudolph.org/bib2html/b2hd-Schraudolph95 Optimization of Entropy with Neural Networks]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_California,_San_Diego University of California, San Diego]
 
* [[Nicol N. Schraudolph]] ('''1995'''). ''[http://nic.schraudolph.org/bib2html/b2hd-Schraudolph95 Optimization of Entropy with Neural Networks]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_California,_San_Diego University of California, San Diego]
Line 256: Line 256:
 
* [[William Uther]], [[Manuela Veloso|Manuela M. Veloso]] ('''1997'''). ''Adversarial Reinforcement Learning''. [[Carnegie Mellon University]], [http://www.cse.unsw.edu.au/~willu/w/papers/Uther97a.ps ps]
 
* [[William Uther]], [[Manuela Veloso|Manuela M. Veloso]] ('''1997'''). ''Adversarial Reinforcement Learning''. [[Carnegie Mellon University]], [http://www.cse.unsw.edu.au/~willu/w/papers/Uther97a.ps ps]
 
* [[William Uther]], [[Manuela Veloso|Manuela M. Veloso]] ('''1997'''). ''Generalizing Adversarial Reinforcement Learning''. [[Carnegie Mellon University]], [http://www.cse.unsw.edu.au/~willu/w/papers/Uther97b.ps ps]
 
* [[William Uther]], [[Manuela Veloso|Manuela M. Veloso]] ('''1997'''). ''Generalizing Adversarial Reinforcement Learning''. [[Carnegie Mellon University]], [http://www.cse.unsw.edu.au/~willu/w/papers/Uther97b.ps ps]
* [[Marco Wiering]],  [[Jürgen Schmidhuber]] ('''1997'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&citation_for_view=xVas0I8AAAAJ:u5HHmVD_uO8C HQ-learning]''. [https://en.wikipedia.org/wiki/Adaptive_Behavior_%28journal%29 Adaptive Behavior], Vol. 6, No 2
+
* [[Marco Wiering]],  [[Jürgen Schmidhuber]] ('''1997'''). ''HQ-learning''. [https://en.wikipedia.org/wiki/Adaptive_Behavior_%28journal%29 Adaptive Behavior], Vol. 6, No 2
 
'''1998'''
 
'''1998'''
 
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Knightcap: A chess program that learns by combining td(λ) with game-tree search'', Proceedings of the 15th International Conference on Machine Learning, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8263&rep=rep1&type=pdf pdf] via [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.8263 citeseerX]
 
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''1998'''). ''Knightcap: A chess program that learns by combining td(λ) with game-tree search'', Proceedings of the 15th International Conference on Machine Learning, [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8263&rep=rep1&type=pdf pdf] via [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.8263 citeseerX]
Line 272: Line 272:
 
* [[Ryszard Michalski]] ('''1998'''). ''Learnable Evolution: Combining Symbolic and Evolutionary Learning''. Proceedings of the Fourth International Workshop on Multistrategy Learning (MSL'98)
 
* [[Ryszard Michalski]] ('''1998'''). ''Learnable Evolution: Combining Symbolic and Evolutionary Learning''. Proceedings of the Fourth International Workshop on Multistrategy Learning (MSL'98)
 
* [[Krzysztof Krawiec]], [http://www.informatik.uni-trier.de/~ley/pers/hd/s/Slowinski:Roman.html Roman Slowinski], [http://www.informatik.uni-trier.de/~ley/pers/hd/s/Szczesniak:Irmina.html Irmina Szczesniak] ('''1998'''). ''[http://link.springer.com/chapter/10.1007%2F3-540-69115-4_60 Pedagogical Method for Extraction of Symbolic Knowledge from Neural Networks]''. [http://link.springer.com/book/10.1007%2F3-540-69115-4 Rough Sets and Current Trends in Computing 1998]
 
* [[Krzysztof Krawiec]], [http://www.informatik.uni-trier.de/~ley/pers/hd/s/Slowinski:Roman.html Roman Slowinski], [http://www.informatik.uni-trier.de/~ley/pers/hd/s/Szczesniak:Irmina.html Irmina Szczesniak] ('''1998'''). ''[http://link.springer.com/chapter/10.1007%2F3-540-69115-4_60 Pedagogical Method for Extraction of Symbolic Knowledge from Neural Networks]''. [http://link.springer.com/book/10.1007%2F3-540-69115-4 Rough Sets and Current Trends in Computing 1998]
* [[Marco Wiering]],  [[Jürgen Schmidhuber]] ('''1998'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&citation_for_view=xVas0I8AAAAJ:2osOgNQ5qMEC Fast online Q (λ)]''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 33, No. 1
+
* [[Marco Wiering]],  [[Jürgen Schmidhuber]] ('''1998'''). ''Fast online Q (λ)''. [https://en.wikipedia.org/wiki/Machine_Learning_%28journal%29 Machine Learning], Vol. 33, No. 1
 
'''1999'''
 
'''1999'''
 
* [[Robert Hyatt]] ('''1999'''). ''[http://www.craftychess.com/hyatt/learning.html Book Learning - a Methodology to Tune an Opening Book Automatically]''. [[ICGA Journal#22_1|ICCA Journal, Vol. 22, No. 1]]
 
* [[Robert Hyatt]] ('''1999'''). ''[http://www.craftychess.com/hyatt/learning.html Book Learning - a Methodology to Tune an Opening Book Automatically]''. [[ICGA Journal#22_1|ICCA Journal, Vol. 22, No. 1]]
Line 282: Line 282:
 
* [http://www.ilsp.gr/homepages/papavasiliou_eng.html Vassilis Papavassiliou], [[Stuart Russell]] ('''1999'''). ''Convergence of reinforcement learning with general function approximators.'' In Proc. IJCAI-99, Stockholm, [http://www.cs.berkeley.edu/~russell/papers/ijcai99-bridge.ps ps]
 
* [http://www.ilsp.gr/homepages/papavasiliou_eng.html Vassilis Papavassiliou], [[Stuart Russell]] ('''1999'''). ''Convergence of reinforcement learning with general function approximators.'' In Proc. IJCAI-99, Stockholm, [http://www.cs.berkeley.edu/~russell/papers/ijcai99-bridge.ps ps]
 
* [[Philip G. K. Reiser]], [[Patricia J. Riddle]] ('''1999'''). ''[http://link.springer.com/chapter/10.1007%2F3-540-48873-1_19 Evolving Logic Programs to Classify Chess-Endgame Positions]''. [http://link.springer.com/book/10.1007%2F3-540-48873-1 Simulated Evolution and Learning], [https://en.wikipedia.org/wiki/Canberra Canberra], Australia. [http://www.springer.com/series/1244 Lecture Notes in Artificial Intelligence], No. 1585, [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer], [http://stancomb.co.uk/Papers/seal98.pdf pdf] » [[Endgame]]
 
* [[Philip G. K. Reiser]], [[Patricia J. Riddle]] ('''1999'''). ''[http://link.springer.com/chapter/10.1007%2F3-540-48873-1_19 Evolving Logic Programs to Classify Chess-Endgame Positions]''. [http://link.springer.com/book/10.1007%2F3-540-48873-1 Simulated Evolution and Learning], [https://en.wikipedia.org/wiki/Canberra Canberra], Australia. [http://www.springer.com/series/1244 Lecture Notes in Artificial Intelligence], No. 1585, [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer], [http://stancomb.co.uk/Papers/seal98.pdf pdf] » [[Endgame]]
* [[Marco Wiering]] ('''1999'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&pagesize=100&citation_for_view=xVas0I8AAAAJ:9yKSN-GCB0IC Explorations in Efficient Reinforcement Learning]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_Amsterdam University of Amsterdam], advisors [[Mathematician#FGroen|Frans Groen]] and [[Jürgen Schmidhuber]]
+
* [[Marco Wiering]] ('''1999'''). ''Explorations in Efficient Reinforcement Learning''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_Amsterdam University of Amsterdam], advisors [[Mathematician#FGroen|Frans Groen]] and [[Jürgen Schmidhuber]]
 
* [[Mathematician#GEHinton|Geoffrey E. Hinton]], [[Terrence J. Sejnowski]] (eds.) ('''1999'''). ''[https://mitpress.mit.edu/books/unsupervised-learning Unsupervised Learning: Foundations of Neural Computation]''.  [https://en.wikipedia.org/wiki/MIT_Press MIT Press]
 
* [[Mathematician#GEHinton|Geoffrey E. Hinton]], [[Terrence J. Sejnowski]] (eds.) ('''1999'''). ''[https://mitpress.mit.edu/books/unsupervised-learning Unsupervised Learning: Foundations of Neural Computation]''.  [https://en.wikipedia.org/wiki/MIT_Press MIT Press]
 
==2000 ...==  
 
==2000 ...==  
Line 352: Line 352:
 
* [[Daniel Osman]], [[Jacek Mańdziuk]] ('''2004'''). ''Comparison of TDLeaf and TD learning in Game Playing Domain''. [http://www.informatik.uni-trier.de/~ley/db/conf/iconip/iconip2004.html#OsmanM04 11. ICONIP], [http://www.mini.pw.edu.pl/~mandziuk/PRACE/ICONIP04.pdf pdf]
 
* [[Daniel Osman]], [[Jacek Mańdziuk]] ('''2004'''). ''Comparison of TDLeaf and TD learning in Game Playing Domain''. [http://www.informatik.uni-trier.de/~ley/db/conf/iconip/iconip2004.html#OsmanM04 11. ICONIP], [http://www.mini.pw.edu.pl/~mandziuk/PRACE/ICONIP04.pdf pdf]
 
* [[Albert Xin Jiang]] ('''2004'''). ''Multiagent Reinforcement Learning in Stochastic Games with Continuous Action Spaces''. [http://www.cs.ubc.ca/%7Ejiang/papers/continuous.pdf pdf]
 
* [[Albert Xin Jiang]] ('''2004'''). ''Multiagent Reinforcement Learning in Stochastic Games with Continuous Action Spaces''. [http://www.cs.ubc.ca/%7Ejiang/papers/continuous.pdf pdf]
* [[Henk Mannen]], [[Marco Wiering]] ('''2004'''). ''[http://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&cstart=20&pagesize=80&citation_for_view=xVas0I8AAAAJ:7PzlFSSx8tAC Learning to play chess using TD(λ)-learning with database games]''. [http://students.uu.nl/en/hum/cognitive-artificial-intelligence Cognitive Artificial Intelligence], [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], Benelearn’04
+
* [[Henk Mannen]], [[Marco Wiering]] ('''2004'''). ''Learning to play chess using TD(λ)-learning with database games''. [http://students.uu.nl/en/hum/cognitive-artificial-intelligence Cognitive Artificial Intelligence], [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], Benelearn’04
 
==2005 ...==  
 
==2005 ...==  
 
* [[Dave Gomboc]], [[Michael Buro]], [[Tony Marsland]] ('''2005'''). ''Tuning evaluation functions by maximizing concordance'' Theoretical Computer Science, Volume 349, Issue 2, pp. 202-229, [http://www.cs.ualberta.ca/%7Emburo/ps/tcs-learn.pdf pdf]
 
* [[Dave Gomboc]], [[Michael Buro]], [[Tony Marsland]] ('''2005'''). ''Tuning evaluation functions by maximizing concordance'' Theoretical Computer Science, Volume 349, Issue 2, pp. 202-229, [http://www.cs.ualberta.ca/%7Emburo/ps/tcs-learn.pdf pdf]
Line 433: Line 433:
 
* [[Edward P. Manning]] ('''2010'''). ''[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5409565 Using Resource-Limited Nash Memory to Improve an Othello Evaluation Function]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 2, No. 1 » [[Othello]]
 
* [[Edward P. Manning]] ('''2010'''). ''[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5409565 Using Resource-Limited Nash Memory to Improve an Othello Evaluation Function]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 2, No. 1 » [[Othello]]
 
* [[Edward P. Manning]] ('''2010'''). ''[http://dl.acm.org/citation.cfm?id=1830667 Coevolution in a Large Search Space using Resource-limited Nash Memory]''. [http://www.informatik.uni-trier.de/~ley/db/conf/gecco/gecco2010.html#Manning10 GECCO '10] » [[Othello]]
 
* [[Edward P. Manning]] ('''2010'''). ''[http://dl.acm.org/citation.cfm?id=1830667 Coevolution in a Large Search Space using Resource-limited Nash Memory]''. [http://www.informatik.uni-trier.de/~ley/db/conf/gecco/gecco2010.html#Manning10 GECCO '10] » [[Othello]]
* [[Marco Wiering]] ('''2010'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&cstart=20&citation_for_view=xVas0I8AAAAJ:_kc_bZDykSQC Self-play and using an expert to learn to play backgammon with temporal difference learning]''. [http://www.scirp.org/journal/jilsa/ Journal of Intelligent Learning Systems and Applications], Vol. 2, No. 2
+
* [[Marco Wiering]] ('''2010'''). ''Self-play and using an expert to learn to play backgammon with temporal difference learning''. [http://www.scirp.org/journal/jilsa/ Journal of Intelligent Learning Systems and Applications], Vol. 2, No. 2
 
'''2011'''
 
'''2011'''
 
* [[Joel Veness]] ('''2011'''). ''Approximate Universal Artificial Intelligence and Self-Play Learning for Games''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_New_South_Wales University of New South Wales], supervisors: [[Kee Siong Ng]], [[Marcus Hutter]], [[Alan Blair]], [[William Uther]], [[John Lloyd]]; [http://jveness.info/publications/veness_phd_thesis_final.pdf pdf]
 
* [[Joel Veness]] ('''2011'''). ''Approximate Universal Artificial Intelligence and Self-Play Learning for Games''. Ph.D. thesis, [https://en.wikipedia.org/wiki/University_of_New_South_Wales University of New South Wales], supervisors: [[Kee Siong Ng]], [[Marcus Hutter]], [[Alan Blair]], [[William Uther]], [[John Lloyd]]; [http://jveness.info/publications/veness_phd_thesis_final.pdf pdf]
Line 446: Line 446:
 
* [[Hamid Reza Maei]] ('''2011'''). ''Gradient Temporal-Difference Learning Algorithms''. Ph.D. thesis, [[University of Alberta]], advisor [[Richard Sutton]], [http://webdocs.cs.ualberta.ca/~sutton/papers/maei-thesis-2011.pdf pdf]  
 
* [[Hamid Reza Maei]] ('''2011'''). ''Gradient Temporal-Difference Learning Algorithms''. Ph.D. thesis, [[University of Alberta]], advisor [[Richard Sutton]], [http://webdocs.cs.ualberta.ca/~sutton/papers/maei-thesis-2011.pdf pdf]  
 
'''2012'''
 
'''2012'''
* [[Marco Wiering]], [http://martijnvanotterlo.nl/ Martijn Van Otterlo] ('''2012'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&citation_for_view=xVas0I8AAAAJ:abG-DnoFyZgC Reinforcement learning: State-of-the-art]''. [http://link.springer.com/book/10.1007/978-3-642-27645-3 Adaptation, Learning, and Optimization, Vol. 12], [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
+
* [[Marco Wiering]], [http://martijnvanotterlo.nl/ Martijn Van Otterlo] ('''2012'''). ''Reinforcement learning: State-of-the-art''. [http://link.springer.com/book/10.1007/978-3-642-27645-3 Adaptation, Learning, and Optimization, Vol. 12], [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
 
: [[István Szita]] ('''2012'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-642-27645-3_17 Reinforcement Learning in Games]''. Chapter 17
 
: [[István Szita]] ('''2012'''). ''[http://link.springer.com/chapter/10.1007%2F978-3-642-27645-3_17 Reinforcement Learning in Games]''. Chapter 17
 
* [http://dblp.uni-trier.de/pers/hd/d/Dries:Sjoerd_van_den Sjoerd van den Dries], [[Marco Wiering]] ('''2012'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&cstart=40&citation_for_view=xVas0I8AAAAJ:P5F9QuxV20EC Neural-fitted TD-leaf learning for playing Othello with structured neural networks]''. [[IEEE#NN|IEEE Transactions on Neural Networks and Learning Systems]], Vol. 23, No. 11
 
* [http://dblp.uni-trier.de/pers/hd/d/Dries:Sjoerd_van_den Sjoerd van den Dries], [[Marco Wiering]] ('''2012'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&cstart=40&citation_for_view=xVas0I8AAAAJ:P5F9QuxV20EC Neural-fitted TD-leaf learning for playing Othello with structured neural networks]''. [[IEEE#NN|IEEE Transactions on Neural Networks and Learning Systems]], Vol. 23, No. 11
Line 459: Line 459:
 
* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Paweł Liskowski]], [[Krzysztof Krawiec]] ('''2013'''). ''Shaping Fitness Function for Evolutionary Learning of Game Strategies''. [http://www.informatik.uni-trier.de/~ley/db/conf/gecco/gecco2013.html#SzubertJLK13 GECCO 2013], [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert2013shaping.pdf pdf]
 
* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Paweł Liskowski]], [[Krzysztof Krawiec]] ('''2013'''). ''Shaping Fitness Function for Evolutionary Learning of Game Strategies''. [http://www.informatik.uni-trier.de/~ley/db/conf/gecco/gecco2013.html#SzubertJLK13 GECCO 2013], [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert2013shaping.pdf pdf]
 
* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2013'''). ''[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6504736 On Scalability, Generalization, and Hybridization of Coevolutionary Learning: a Case Study for Othello]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 5, No. 3 » [[Othello]]
 
* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2013'''). ''[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6504736 On Scalability, Generalization, and Hybridization of Coevolutionary Learning: a Case Study for Othello]''. [[IEEE#TOCIAIGAMES|IEEE Transactions on Computational Intelligence and AI in Games]], Vol. 5, No. 3 » [[Othello]]
* [http://dblp.uni-trier.de/pers/hd/r/Ree:M=_van_der Michiel van der Ree], [[Marco Wiering]] ('''2013'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&cstart=60&pagesize=80&citation_for_view=xVas0I8AAAAJ:K3LRdlH-MEoC Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play]''. [http://dblp.uni-trier.de/db/conf/adprl/adprl2013.html#ReeW13 ADPRL 2013]
+
* [http://dblp.uni-trier.de/pers/hd/r/Ree:M=_van_der Michiel van der Ree], [[Marco Wiering]] ('''2013'''). ''Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play''. [http://dblp.uni-trier.de/db/conf/adprl/adprl2013.html#ReeW13 ADPRL 2013]
* [http://dblp.uni-trier.de/pers/hd/b/Bom:Luuk Luuk Bom], [http://dblp.uni-trier.de/pers/hd/h/Henken:Ruud Ruud Henken], [[Marco Wiering]] ('''2013'''). ''[https://scholar.google.com/citations?view_op=view_citation&hl=en&user=xVas0I8AAAAJ&cstart=40&citation_for_view=xVas0I8AAAAJ:l7t_Zn2s7bgC Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs]''. [http://dblp.uni-trier.de/db/conf/adprl/adprl2013.html#BomHW13 ADPRL 2013] <ref>[https://en.wikipedia.org/wiki/Ms._Pac-Man Ms. Pac-Man from Wikipedia]</ref>
+
* [http://dblp.uni-trier.de/pers/hd/b/Bom:Luuk Luuk Bom], [http://dblp.uni-trier.de/pers/hd/h/Henken:Ruud Ruud Henken], [[Marco Wiering]] ('''2013'''). ''Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs''. [http://dblp.uni-trier.de/db/conf/adprl/adprl2013.html#BomHW13 ADPRL 2013] <ref>[https://en.wikipedia.org/wiki/Ms._Pac-Man Ms. Pac-Man from Wikipedia]</ref>
 
* [[Peter Auer]], [[Marcus Hutter]], [[Laurent Orseau]] ('''2013'''). ''[http://drops.dagstuhl.de/opus/volltexte/2013/4340/ Reinforcement Learning]''. [http://dblp.uni-trier.de/db/journals/dagstuhl-reports/dagstuhl-reports3.html#AuerHO13 Dagstuhl Reports, Vol. 3, No. 8], DOI: [http://drops.dagstuhl.de/opus/volltexte/2013/4340/ 10.4230/DagRep.3.8.1], URN: [http://drops.dagstuhl.de/opus/volltexte/2013/4340/ urn:nbn:de:0030-drops-43409]
 
* [[Peter Auer]], [[Marcus Hutter]], [[Laurent Orseau]] ('''2013'''). ''[http://drops.dagstuhl.de/opus/volltexte/2013/4340/ Reinforcement Learning]''. [http://dblp.uni-trier.de/db/journals/dagstuhl-reports/dagstuhl-reports3.html#AuerHO13 Dagstuhl Reports, Vol. 3, No. 8], DOI: [http://drops.dagstuhl.de/opus/volltexte/2013/4340/ 10.4230/DagRep.3.8.1], URN: [http://drops.dagstuhl.de/opus/volltexte/2013/4340/ urn:nbn:de:0030-drops-43409]
 
* [[Igor Roizen]], [[Judea Pearl]] ('''2013'''). ''Learning Link-Probabilities in Causal Trees.'' [https://arxiv.org/abs/1304.3103 arXiv:1304.3103]
 
* [[Igor Roizen]], [[Judea Pearl]] ('''2013'''). ''Learning Link-Probabilities in Causal Trees.'' [https://arxiv.org/abs/1304.3103 arXiv:1304.3103]
Line 498: Line 498:
 
'''2018'''
 
'''2018'''
 
* [[Arthur Guez]], [[Théophane Weber]], [[Ioannis Antonoglou]], [[Karen Simonyan]], [[Oriol Vinyals]], [[Daan Wierstra]], [[Rémi Munos]], [[David Silver]] ('''2018'''). ''Learning to Search with MCTSnets''. [https://arxiv.org/abs/1802.04697 arXiv:1802.04697] » [[Monte-Carlo Tree Search]]
 
* [[Arthur Guez]], [[Théophane Weber]], [[Ioannis Antonoglou]], [[Karen Simonyan]], [[Oriol Vinyals]], [[Daan Wierstra]], [[Rémi Munos]], [[David Silver]] ('''2018'''). ''Learning to Search with MCTSnets''. [https://arxiv.org/abs/1802.04697 arXiv:1802.04697] » [[Monte-Carlo Tree Search]]
 
+
* [[Matthia Sabatelli]], [[Francesco Bidoia]], [[Valeriu Codreanu]], [[Marco Wiering]] ('''2018'''). ''Learning to Evaluate Chess Positions with Deep Neural Networks and Limited Lookahead''. ICPRAM 2018, [http://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/ICPRAM_CHESS_DNN_2018.pdf pdf]
  
 
=Forum Posts=
 
=Forum Posts=

Revision as of 15:13, 25 August 2018


Learning [1]

Learning,
the process of acquiring new knowledge, which involves synthesizing different types of information. Machine learning, as an aspect of computer chess programming, deals with algorithms that allow a program to change its behavior based on data. Such data arises, for instance, during games against a variety of opponents, where the program considers the final outcome and/or the game record, for instance as a history score chart indexed by ply. Related to machine learning is evolutionary computation with its sub-areas of genetic algorithms and genetic programming, which mimic the process of natural evolution, as further mentioned in automated tuning. The process of learning often implies understanding, perception or reasoning. So-called rote learning avoids understanding and focuses on memorization. Inductive learning takes examples and generalizes rather than starting with existing knowledge. Deductive learning takes abstract concepts to make sense of examples [2].

Learning inside a Chess Program

Learning inside a chess program may address several disjoint issues. A persistent hash table remembers "important" positions from earlier games, together with their exact scores, for use inside the search [3], so that worse positions may be avoided in advance. Opening book learning appends successful novelties to the book or modifies the probability of already stored moves based on the outcome of a game [4]. Another application is learning evaluation weights of various features, for instance piece [5] or piece-square [6] values or mobility. Programs may also learn to control search [7] or time usage [8].
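As an illustration of the opening book case, the following is a minimal sketch, not the method of any particular program. It assumes a hypothetical book stored as a dictionary from a position key to per-move weights, and hypothetical helper names `update_book` and `pick_move`. After a game, the weight of every book move that was played is scaled up after a win and down after a loss, and moves are later chosen with probability proportional to their weight.

```python
import random

def update_book(book, played, result, step=0.1, floor=0.01):
    """Nudge the weight of each played book move according to the game result.

    book:   dict mapping a position key to {move: weight} (hypothetical layout)
    played: list of (position_key, move) pairs used in this game
    result: +1 for a win, 0 for a draw, -1 for a loss, from the program's view
    """
    for key, move in played:
        moves = book.setdefault(key, {})
        old = moves.get(move, 1.0)      # a novelty enters the book with weight 1.0
        # multiplicative update; the floor keeps every move selectable
        moves[move] = max(floor, old * (1.0 + step * result))
    return book

def pick_move(book, key, rng=random):
    """Choose a book move with probability proportional to its weight."""
    moves = book.get(key)
    if not moves:
        return None                     # position is not in the book
    names = list(moves)
    return rng.choices(names, weights=[moves[m] for m in names], k=1)[0]
```

The step size and the floor are arbitrary illustrative values. A real book learner would typically also take the strength of the opponent and the depth of the book line into account.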

Learning Paradigms

There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning. Usually any given type of neural network architecture can be employed in any of those tasks.

Supervised Learning

Supervised learning is learning from examples provided by a knowledgeable external supervisor. In machine learning, supervised learning is a technique for inferring a function from training data. The training data consist of pairs of input objects and desired outputs, for instance in computer chess a sequence of positions associated with the outcome of a game [9].
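The pairing of positions with game outcomes can be sketched as a small logistic regression. This is a hedged illustration only: the two features (material and mobility difference), the toy samples and the function names `fit_weights` and `predict` are assumptions made for the example, not data or code from any engine.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, features):
    """Expected score (0.0 loss .. 1.0 win) of a position under a linear evaluation."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)))

def fit_weights(samples, n_features, epochs=200, lr=0.1):
    """Fit evaluation weights by stochastic gradient descent on the log loss.

    samples: list of (features, outcome) with outcome 1.0 = win, 0.5 = draw, 0.0 = loss
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, outcome in samples:
            error = predict(weights, features) - outcome   # gradient of log loss w.r.t. the linear score
            for i, f in enumerate(features):
                weights[i] -= lr * error * f
    return weights

# Toy training set: (material difference, mobility difference) -> game outcome.
samples = [([1.0, 0.0], 1.0), ([-1.0, 0.0], 0.0),
           ([2.0, 0.0], 1.0), ([-2.0, 0.0], 0.0),
           ([0.0, 0.0], 0.5)]
weights = fit_weights(samples, n_features=2)
```

On this toy data the material weight becomes positive, while the mobility weight stays at zero because that feature never varies in the samples.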

Unsupervised Learning

Unsupervised machine learning seems much harder: the goal is to have the computer learn how to do something that we don't tell it how to do. The learner is given only unlabeled examples, for instance a sequence of positions from a running game whose final result is (still) unknown. A form of reinforcement learning can be used for unsupervised learning, where an agent bases its actions on the previous rewards and punishments without necessarily even learning any information about the exact ways that its actions affect the world. Clustering is another method of unsupervised learning.
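Clustering can be illustrated with a minimal one-dimensional k-means sketch. The input is assumed to be a list of unlabeled numbers, for instance hypothetical static evaluation scores of positions. The function name `kmeans_1d` and the naive initialisation from the first k values are choices made for this example only.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters; returns (centers, labels)."""
    centers = list(values[:k])              # naive initialisation: the first k values
    labels = [0] * len(values)
    for _ in range(iterations):
        # assignment step: attach every value to its nearest center
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:                     # an empty cluster keeps its old center
                centers[c] = sum(members) / len(members)
    return centers, labels

centers, labels = kmeans_1d([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], k=2)
```

No labels are supplied; the two groups emerge only from the distances between the values. Practical implementations usually use a better initialisation and stop once the assignments no longer change.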

Reinforcement Learning

see main page Reinforcement Learning

Reinforcement learning is defined not by characterizing learning methods, but by characterizing a learning problem. Reinforcement learning is learning what to do - how to map situations to actions - so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. The reinforcement learning problem is deeply indebted to the idea of Markov decision processes (MDPs) from the field of optimal control.
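The idea of discovering rewarding actions by trying them can be sketched with tabular Q-learning on a tiny Markov decision process. Everything in this example is an assumption chosen for illustration: a chain of five states, two actions (left and right), a reward of 1 for reaching the last state, and the parameter values. It is not tied to any chess program.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values on a chain MDP; q[state][action], action 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1                     # terminal state, entering it pays reward 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:      # explore: try a random action
                action = rng.randrange(2)
            else:                           # exploit: best known action, ties broken randomly
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning update: move the estimate toward reward + discounted best next value
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
```

The learner is never told that moving right is correct. The preference for it emerges from the reward signal alone, as the value of the final step is propagated back along the chain.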

Learning Topics

Programs

See also

Selected Publications

[10]

1940 ...

1950 ...

Claude Shannon, John McCarthy (eds.) (1956). Automata Studies. Annals of Mathematics Studies, No. 34
Alan Turing, Jack Copeland (editor) (2004). The Essential Turing, Seminal Writings in Computing, Logic, Philosophy, Artificial Intelligence, and Artificial Life plus The Secrets of Enigma. Oxford University Press, amazon, google books

1955 ...

  • John von Neumann (1956). Probabilistic Logic and the Synthesis of Reliable Organisms From Unreliable Components. in
Claude Shannon, John McCarthy (eds.) (1956). Automata Studies. Annals of Mathematics Studies, No. 34, pdf

1960 ...

1965 ...

1970 ...

1975 ...

  • Jacques Pitrat (1976). A Program to Learn to Play Chess. Pattern Recognition and Artificial Intelligence, pp. 399-419. Academic Press Ltd. London, UK. ISBN 0-12-170950-7.
  • Jacques Pitrat (1976). Realization of a Program Learning to Find Combinations at Chess. Computer Oriented Learning Processes (ed. J. Simon). Noordhoff, Groningen, The Netherlands.
  • Pericles Negri (1977). Inductive Learning in a Hierarchical Model for Representing Knowledge in Chess End Games. pdf
  • Ryszard Michalski, Pericles Negri (1977). An experiment on inductive learning in chess endgames. Machine Intelligence 8, pdf
  • Boris Stilman (1977). The Computer Learns. in 1976 US Computer Chess Championship, by David Levy, Computer Science Press, Woodland Hills, CA, pp. 83-90
  • Richard Sutton (1978). Single channel theory: A neuronal theory of learning. Brain Theory Newsletter 3, No. 3/4, pp. 72-75.
  • Ross Quinlan (1979). Discovering Rules by Induction from Large Collections of Examples. Expert Systems in the Micro-electronic Age, pp. 168-201. Edinburgh University Press (Introducing ID3)

1980 ...

1985 ...

1986

1987

1988

1989

1990 ...

1991

1992

1993

1994

1995 ...

1996

1997

1998

Miroslav Kubat, Ivan Bratko, Ryszard Michalski (1998). A Review of Machine Learning Methods. pdf

1999

2000 ...

2001

2002

2003

2004

2005 ...

2006

2007

2008

2009

2010 ...

2011

2012

István Szita (2012). Reinforcement Learning in Games. Chapter 17

2013

2014

2015 ...

2016

2017

2018

Forum Posts

1998 ...

2000 ...

2005 ...

2010 ...

2015 ...

External Links

Machine Learning

AI

Learning I
Learning II

Chess

Supervised Learning

AdaBoost from Wikipedia

Unsupervised Learning

Reinforcement Learning

TD Learning

Statistics

Naive Bayes classifier from Wikipedia
Probabilistic classification from Wikipedia
Outline of regression analysis from Wikipedia
Linear regression from Wikipedia
Logistic regression from Wikipedia
Normal distribution from Wikipedia
Pseudorandom number generator from Wikipedia
Pseudo-random number sampling from Wikipedia
Statistical randomness from Wikipedia

Markov Models

NNs

ANNs

Topics

RNNs

Blogs

The Single Layer Perceptron
Hidden Neurons and Feature Space
Training Neural Networks Using Back Propagation in C#
Data Mining with Artificial Neural Networks (ANN)

Courses

References

  1. A depiction of the world's oldest continually operating university, the University of Bologna, Italy, by Laurentius de Voltolina, second half of 14th century, Learning from Wikipedia
  2. Inductive learning vs Deductive learning
  3. David Slate (1987). A Chess Program that uses its Transposition Table to Learn from Experience. ICCA Journal, Vol. 10, No. 2
  4. Robert Hyatt (1999). Book Learning - a Methodology to Tune an Opening Book Automatically. ICCA Journal, Vol. 22, No. 1
  5. Don Beal, Martin C. Smith (1997). Learning Piece Values Using Temporal Differences. ICCA Journal, Vol. 20, No. 3
  6. Don Beal, Martin C. Smith (1999). Learning Piece-Square Values using Temporal Differences. ICCA Journal, Vol. 22, No. 4
  7. Yngvi Björnsson, Tony Marsland (2001). Learning Search Control in Adversary Games. Advances in Computer Games 9, pdf
  8. Levente Kocsis, Jos Uiterwijk, Jaap van den Herik (2000). Learning Time Allocation using Neural Networks. CG 2000, postscript
  9. AI Horizon: Machine Learning, Part II: Supervised and Unsupervised Learning
  10. online papers from Machine Learning in Games by Jay Scott
  11. Rosenblatt's Contributions
  12. Ratio Club from Wikipedia
  13. Royal Radar Establishment from Wikipedia
  14. see Swap-off by Helmut Richter
  15. The abandonment of connectionism in 1969 - Wikipedia
  16. Frank Rosenblatt (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books
  17. Long short term memory from Wikipedia
  18. Tsumego from Wikipedia
  19. Learnable Evolution Model from Wikipedia
  20. University of Bristol - Department of Computer Science - Technical Reports
  21. Generalized Hebbian Algorithm from Wikipedia
  22. Dap Hartmann (2010). Mimicking the Black Box - Genetically evolving evaluation functions and search algorithms. Review on Omid David's Ph.D. Thesis, ICGA Journal, Vol 33, No. 1
  23. Monte-Carlo Simulation Balancing - videolectures.net by David Silver
  24. MATLAB from Wikipedia
  25. Ms. Pac-Man from Wikipedia
  26. Demystifying Deep Reinforcement Learning by Tambet Matiisen, Nervana, December 21, 2015
  27. Patent US20150100530 - Methods and apparatus for reinforcement learning - Google Patents
  28. Jaap van den Herik wint Humies Award 2014 - LIACS - Leiden Institute of Advanced Computer Science
  29. 2048 (video game) from Wikipedia
  30. Teaching Deep Convolutional Neural Networks to Play Go by Hiroshi Yamashita, The Computer-go Archives, December 14, 2014
  31. Teaching Deep Convolutional Neural Networks to Play Go by Michel Van den Bergh, CCC, December 16, 2014
  32. Convolutional neural network from Wikipedia
  33. Best Paper Awards | TAAI 2014
  34. DeepChess: Another deep-learning based chess program by Matthew Lai, CCC, October 17, 2016
  35. ICANN 2016 | Recipients of the best paper awards
  36. Using GAN to play chess by Evgeniy Zheltonozhskiy, CCC, February 23, 2017
  37. Naive Bayes classifier from Wikipedia
  38. Amir Ban (2012). Automatic Learning of Evaluation, with Applications to Computer Chess. Discussion Paper 613, The Hebrew University of Jerusalem - Center for the Study of Rationality, Givat Ram
  39. Christopher Clark, Amos Storkey (2014). Teaching Deep Convolutional Neural Networks to Play Go. arXiv:1412.3409
  40. Chris J. Maddison, Aja Huang, Ilya Sutskever, David Silver (2014). Move Evaluation in Go Using Deep Convolutional Neural Networks. arXiv:1412.6564v1
