Difference between revisions of "Learning"

From Chessprogramming wiki
(7 intermediate revisions by the same user not shown)
 
* [[Gerald Tesauro]] ('''1992'''). ''[http://dl.acm.org/citation.cfm?id=139616 Practical Issues in Temporal Difference Learning]''. [https://en.wikipedia.org/wiki/Machine_Learning_(journal) Machine Learning], Vol. 8, No. 3-4
 
 
* [[Manuela Veloso]] ('''1992'''). ''[http://search.library.cmu.edu/vufind/Record/421096 Learning by Analogical Reasoning in General Purpose Problem Solving]''. Ph.D. thesis, [[Carnegie Mellon University]], advisor [[Jaime Carbonell]]
 
* [[Chris J. Thornton]] ('''1992'''). ''Techniques in Computational Learning: An Introduction''. [https://en.wikipedia.org/wiki/Chapman_%26_Hall Chapman & Hall]
 
'''1993'''
 
 
* [[Michael Gherrity]] ('''1993'''). ''A Game Learning Machine''. Ph.D. Thesis, [http://de.wikipedia.org/wiki/University_of_California,_San_Diego University of California, San Diego], [http://www.gherrity.org/thesis.ps.gz zipped ps]
 
 
* [[Jonathan Baxter]], [[Andrew Tridgell]], [[Lex Weaver]] ('''2000'''). ''Learning to Play Chess Using Temporal Differences''. [http://www.dblp.org/db/journals/ml/ml40.html#BaxterTW00 Machine Learning, Vol 40, No. 3], [http://www.cs.princeton.edu/courses/archive/fall06/cos402/papers/chess-RL.pdf pdf]  
 
 
* [[Michael Bain]], [[Stephen Muggleton]], [[Ashwin Srinivasan]] ('''2000'''). ''Generalising Closed World Specialisation: A Chess End Game Application''. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3499 CiteSeerX]
 
* [[Chris J. Thornton]] ('''2000'''). ''[https://www.goodreads.com/book/show/1097454.Truth_from_Trash Truth from Trash: How Learning Makes Sense]''. [https://en.wikipedia.org/wiki/MIT_Press Bradford Books] <ref>[[Jean Hayes Michie]] ('''2001'''). ''[https://www.aaai.org/ojs/index.php/aimagazine/article/view/1599/0 Machine Learning and Light Relief: A Review of Truth from Trash]''. [[AAAI#AIMAG|AI Magazine]], Vol. 22 No. 4, [http://www.aaai.org/ojs/index.php/aimagazine/article/download/1599/1498 pdf]</ref>
 
'''2001'''
 
 
* [[Nicol N. Schraudolph]], [[Peter Dayan]], [[Terrence J. Sejnowski]]  ('''2001'''). ''[http://nic.schraudolph.org/bib2html/b2hd-SchDaySej01.html Learning to Evaluate Go Positions via Temporal Difference Methods]''. [http://jasss.soc.surrey.ac.uk/7/1/reviews/takama.html Computational Intelligence in Games, Studies in Fuzziness and Soft Computing] [http://www.springer.com/economics?SGWID=1-165-6-73481-0 Physica-Verlag], revised version of [[Nicol N. Schraudolph#1993|1993 paper]]
 
 
* [[Pedro Campos]], [[Thibault Langlois]] ('''2003'''). ''[http://ilk.uvt.nl/icga/journal/contents/content26-4.htm#ABALEARN Abalearn: a Program that Learns How to Play Abalone]''. [[ICGA Journal#26_4|ICGA Journal, Vol. 26, No. 4]]
 
 
* [[David Gleich]] ('''2003'''). ''Machine Learning in Computer Chess: Genetic Programming and KRK''.  [https://en.wikipedia.org/wiki/Harvey_Mudd_College Harvey Mudd College], [http://www.cs.purdue.edu/homes/dgleich/publications/Gleich%202003%20-%20Machine%20Learning%20in%20Computer%20Chess.pdf pdf]
 
* [[Henk Mannen]] ('''2003'''). ''Learning to play chess using reinforcement learning with database games''. Master’s thesis, Cognitive Artificial Intelligence, [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.810&rep=rep1&type=pdf pdf]
 
* [[Jan Žižka]], [[Michal Mádr]] ('''2003'''). ''[https://www.muni.cz/research/publications/490371 Learning Representative Patterns from Real Chess Positions: A Case Study]''. [http://dblp.uni-trier.de/db/conf/iicai/iicai2003.html#ZizkaM03 IICAI 2003]
 
 
'''2004'''
 
 
* [[Daniel Osman]], [[Jacek Mańdziuk]] ('''2004'''). ''Comparison of TDLeaf and TD learning in Game Playing Domain''. [http://www.informatik.uni-trier.de/~ley/db/conf/iconip/iconip2004.html#OsmanM04 11. ICONIP], [http://www.mini.pw.edu.pl/~mandziuk/PRACE/ICONIP04.pdf pdf]
 
 
* [[Albert Xin Jiang]] ('''2004'''). ''Multiagent Reinforcement Learning in Stochastic Games with Continuous Action Spaces''. [http://www.cs.ubc.ca/%7Ejiang/papers/continuous.pdf pdf]
 
* [[Henk Mannen]], [[Marco Wiering]] ('''2004'''). ''[https://www.semanticscholar.org/paper/Learning-to-Play-Chess-using-TD(lambda)-learning-Mannen-Wiering/00a6f81c8ebe8408c147841f26ed27eb13fb07f3 Learning to play chess using TD(λ)-learning with database games]''. Cognitive Artificial Intelligence, [https://en.wikipedia.org/wiki/Utrecht_University Utrecht University], Benelearn’04, [https://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/learning-chess.pdf pdf]
 
==2005 ...==  
 
 
* [[Dave Gomboc]], [[Michael Buro]], [[Tony Marsland]] ('''2005'''). ''Tuning evaluation functions by maximizing concordance''. Theoretical Computer Science, Vol. 349, No. 2, pp. 202-229, [http://www.cs.ualberta.ca/%7Emburo/ps/tcs-learn.pdf pdf]
 
 
* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning of Piece Values for Chess Variants.'' Technical Report TUD–KE–2008-07, Knowledge Engineering Group, [[Darmstadt University of Technology|TU Darmstadt]], [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf pdf]
 
 
* [[Sacha Droste]], [[Johannes Fürnkranz]] ('''2008'''). ''Learning the Piece Values for three Chess Variants''. [[ICGA Journal#31_4|ICGA Journal, Vol 31, No. 4]]
 
* [[Richard Sutton]], [[Csaba Szepesvári]], [[Hamid Reza Maei]] ('''2008'''). ''A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2008.html#SuttonSM08 NIPS 2008], [https://proceedings.neurips.cc/paper/2008/file/e0c641195b27425bb056ac56f8953d24-Paper.pdf pdf]
 
* [[Matej Guid]], [[Martin Možina]], [[Jana Krivec]], [[Aleksander Sadikov]], [[Ivan Bratko]]  ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_18 Learning Positional Features for Annotating Chess Games: A Case Study]''. [[CG 2008]], [http://www.ailab.si/matej/doc/Learning_Positional_Features-Case_Study.pdf pdf]
 
 
* [[Martin Možina]], [[Matej Guid]], [[Jana Krivec]], [[Aleksander Sadikov]], [[Ivan Bratko]] ('''2008'''). ''Fighting Knowledge Acquisition Bottleneck with Argument Based Machine Learning''. 18th European Conference on Artificial Intelligence (ECAI 2008), Patras, Greece. [http://www.ailab.si/martin/abml/abml_expert_system_for_web.pdf pdf]
 
 
* [[Eli David|Omid David]] ('''2009'''). ''Genetic Algorithms Based Learning for Evolving Intelligent Organisms''. Ph.D. Thesis <ref>[[Dap Hartmann]] ('''2010'''). ''Mimicking the Black Box - Genetically evolving evaluation functions and search algorithms''. Review on Omid David's Ph.D. Thesis, [[ICGA Journal#33_1|ICGA Journal, Vol 33, No. 1]]</ref>
 
 
* [[Nur Merve Amil]], [[Nicolas Bredèche]], [[Christian Gagné]], [[Sylvain Gelly]], [[Marc Schoenauer]], [[Olivier Teytaud]] ('''2009'''). ''A Statistical Learning Perspective of Genetic Programming''. EuroGP 2009, [http://hal.inria.fr/docs/00/36/97/82/PDF/eurogp.pdf pdf]
 
* [[Richard Sutton]], [[Hamid Reza Maei]], [[Doina Precup]], [[Shalabh Bhatnagar]], [[David Silver]], [[Csaba Szepesvári]], [[Eric Wiewiora]] ('''2009'''). ''Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation''. ICML 2009, [http://www.sztaki.hu/~szcsaba/papers/GTD-ICML09.pdf pdf]
* [[Hamid Reza Maei]], [[Csaba Szepesvári]], [[Shalabh Bhatnagar]], [[Doina Precup]], [[David Silver]], [[Richard Sutton]] ('''2009'''). ''Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation''. [https://dblp.uni-trier.de/db/conf/nips/nips2009.html#MaeiSBPSS09 NIPS 2009], [https://papers.nips.cc/paper/2009/file/3a15c7d0bbe60300a39f76f8a5ba6896-Paper.pdf pdf]
 
* [[Mesut Kirci]], [[Jonathan Schaeffer]], [[Nathan Sturtevant]] ('''2009'''). ''Feature Learning Using State Differences''. [http://web.cs.du.edu/~sturtevant/papers/GGPfeatures.pdf pdf]
 
 
* [[David Silver]], [[Gerald Tesauro]] ('''2009'''). ''Monte-Carlo Simulation Balancing''. [http://www.informatik.uni-trier.de/~ley/db/conf/icml/icml2009.html#SilverT09 ICML 2009], [http://www.machinelearning.org/archive/icml2009/papers/500.pdf pdf] <ref>[http://videolectures.net/icml09_silver_mcsb/ Monte-Carlo Simulation Balancing - videolectures.net] by [[David Silver]]</ref>
 
 
* [[Julien Pérez]], [[Cécile Germain-Renaud]], [[Balázs Kégl]], [[Charles Loomis]] ('''2010'''). ''Multi-objective Reinforcement Learning for Responsive Grids''. In The Journal of Grid Computing. [http://hal.archives-ouvertes.fr/docs/00/49/15/60/PDF/RLGrid_JGC09_V7.pdf pdf]
 
 
* [[Jean-Yves Audibert]] ('''2010'''). ''PAC-Bayesian aggregation and multi-armed bandits''. Habilitation thesis, [http://fr.wikipedia.org/wiki/Universit%C3%A9_Paris-Est Université Paris Est], [http://certis.enpc.fr/~audibert/Mes%20articles/hdr.pdf pdf], [http://certis.enpc.fr/~audibert/Mes%20articles/hdrSlides.pdf slides as pdf]
 
* [[Hamid Reza Maei]], [[Richard Sutton]] ('''2010'''). ''[https://www.researchgate.net/publication/215990384_GQlambda_A_general_gradient_algorithm_for_temporal-difference_prediction_learning_with_eligibility_traces GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces]''. [https://agi-conf.org/2010/ AGI 2010]
 
* [http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/w/Waledzik:Karol.html Karol Walędzik], [[Jacek Mańdziuk]] ('''2010'''). ''The Layered Learning method and its Application to Generation of Evaluation Functions for the Game of Checkers''. [http://www.informatik.uni-trier.de/~ley/db/conf/ppsn/ppsn2010-2.html#WaledzikM10 11. PPSN], [http://www.mini.pw.edu.pl/~mandziuk/PRACE/PPSN10.pdf pdf] » [[Checkers]]
 
 
* [[Krzysztof Krawiec]], [[Marcin Szubert]] ('''2010'''). ''[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5586054 Coevolutionary Temporal Difference Learning for small-board Go]''. [[IEEE#EC|IEEE Congress on Evolutionary Computation]] » [[Go]]
 
 
* [[Krzysztof Krawiec]], [[Wojciech Jaśkowski]], [[Marcin Szubert]] ('''2011'''). ''[http://www.degruyter.com/view/j/amcs.2011.21.issue-4/v10006-011-0057-3/v10006-011-0057-3.xml Evolving small-board Go players using Coevolutionary Temporal Difference Learning with Archives]''. [http://www.degruyter.com/view/j/amcs Applied Mathematics and Computer Science], Vol. 21, No. 4
 
 
* [[Marcin Szubert]], [[Wojciech Jaśkowski]], [[Krzysztof Krawiec]] ('''2011'''). ''Learning Board Evaluation Function for Othello by Hybridizing Coevolution with Temporal Difference Learning''. [http://control.ibspan.waw.pl:3000/mainpage Control and Cybernetics], Vol. 40, No. 3, [http://www.cs.put.poznan.pl/wjaskowski/pub/papers/szubert2011learning.pdf pdf] » [[Othello]]
 
* [[Hamid Reza Maei]] ('''2011'''). ''[https://era.library.ualberta.ca/items/fd55edcb-ce47-4f84-84e2-be281d27b16a Gradient Temporal-Difference Learning Algorithms]''. Ph.D. thesis, [[University of Alberta]], advisor [[Richard Sutton]]
 
* [[Eli David|Omid David]], [[Moshe Koppel]], [[Nathan S. Netanyahu]] ('''2011'''). ''[https://link.springer.com/article/10.1007/s10710-010-9103-4 Expert-Driven Genetic Algorithms for Simulating Evaluation Functions]''. [https://www.springer.com/journal/10710 Genetic Programming and Evolvable Machines], Vol. 12, No. 1, [https://arxiv.org/abs/1711.06841 arXiv:1711.06841]
 
 
'''2012'''
 
 
* [[Matthia Sabatelli]], [[Francesco Bidoia]], [[Valeriu Codreanu]], [[Marco Wiering]] ('''2018'''). ''Learning to Evaluate Chess Positions with Deep Neural Networks and Limited Lookahead''. ICPRAM 2018, [http://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/ICPRAM_CHESS_DNN_2018.pdf pdf]
 
 
* [[Takeshi Ito]] ('''2018'''). ''Game learning support system based on future position''. [[CG 2018]], [[ICGA Journal#40_4|ICGA Journal, Vol. 40, No. 4]]
 
* [[Kristian Kersting]] ('''2018'''). ''[https://www.frontiersin.org/articles/10.3389/fdata.2018.00006/full Machine Learning and Artificial Intelligence: Two Fellow Travelers on the Quest for Intelligent Behavior in Machines]''. [https://www.frontiersin.org/journals/big-data# Frontiers in Big Data]
 
'''2019'''
 
 
* [[Herilalaina Rakotoarison]], [[Marc Schoenauer]], [[Michèle Sebag]] ('''2019'''). ''Automated Machine Learning with Monte-Carlo Tree Search''. [https://arxiv.org/abs/1906.00170 arXiv:1906.00170]
 
 
* [[Frank Hutter]], [https://dblp.org/pers/hd/k/Kotthoff:Lars Lars Kotthoff], [https://dblp.org/pers/hd/v/Vanschoren:Joaquin Joaquin Vanschoren] (eds.) ('''2019'''). ''[https://link.springer.com/book/10.1007%2F978-3-030-05318-5 Automated Machine Learning]''. [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
 
 
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2019'''). ''Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model''. [https://arxiv.org/abs/1911.08265 arXiv:1911.08265] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=2&t=72381 New DeepMind paper] by GregNeto, [[CCC]], November 21, 2019</ref>
 
==2020 ...==
* [[Julian Schrittwieser]], [[Ioannis Antonoglou]], [[Thomas Hubert]], [[Karen Simonyan]], [[Laurent Sifre]], [[Simon Schmitt]], [[Arthur Guez]], [[Edward Lockhart]], [[Demis Hassabis]], [[Thore Graepel]], [[Timothy Lillicrap]], [[David Silver]] ('''2020'''). ''[https://www.nature.com/articles/s41586-020-03051-4 Mastering Atari, Go, chess and shogi by planning with a learned model]''. [https://en.wikipedia.org/wiki/Nature_%28journal%29 Nature], Vol. 588 <ref>[https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules?fbclid=IwAR3mSwrn1YXDKr9uuGm2GlFKh76wBilex7f8QvBiQecwiVmAvD6Bkyjx-rE MuZero: Mastering Go, chess, shogi and Atari without rules]</ref>
* [[Johannes Czech]], [[Moritz Willig]], [[Alena Beyer]], [[Kristian Kersting]], [[Johannes Fürnkranz]] ('''2020'''). ''[https://www.frontiersin.org/articles/10.3389/frai.2020.00024/full Learning to Play the Chess Variant Crazyhouse Above World Champion Level With Deep Neural Networks and Human Data]''.  [https://www.frontiersin.org/journals/artificial-intelligence# Frontiers in Artificial Intelligence]  » [[CrazyAra]]
  
 
=Forum Posts=
 
 
* [http://www.talkchess.com/forum/viewtopic.php?t=56313 Position learning and opening books] by Forrest Hoch, [[CCC]], May 11, 2015
 
 
* [http://www.talkchess.com/forum/viewtopic.php?t=61861 A database for learning evaluation functions] by [[Álvaro Begué]], [[CCC]], October 28, 2016 » [[Automated Tuning]], [[Evaluation]], [[Texel's Tuning Method]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=69536 Interesting article in "Nature" on Machine Learning] by [[Kai Laskos]], [[CCC]], January 09, 2019
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72020 A book on machine learning] by Mehdi Amini, [[CCC]], October 06, 2019
 
  

Revision as of 08:58, 5 July 2021


Learning [1]

Learning, the process of acquiring new knowledge, involves synthesizing different types of information. As an aspect of computer chess programming, machine learning deals with algorithms that allow a program to change its behavior based on data, for instance data gathered while playing against a variety of opponents, considering the final outcome and/or the game record, such as a history score chart indexed by ply. Related to machine learning is evolutionary computation with its sub-areas of genetic algorithms and genetic programming, which mimic the process of natural evolution, as further mentioned in automated tuning. The process of learning often implies understanding, perception or reasoning. So-called rote learning avoids understanding and focuses on memorization. Inductive learning generalizes from examples rather than starting from existing knowledge, while deductive learning applies abstract concepts to make sense of examples [2].

Learning inside a Chess Program

Learning inside a chess program may address several disjoint issues. A persistent hash table remembers "important" positions from earlier games inside the search along with their exact scores [3], so that inferior positions can be avoided in advance. Learning opening book moves means appending successful novelties, or modifying the probabilities of moves already stored in the book, based on the outcome of a game [4]. Another application is learning the evaluation weights of various features, for instance piece [5] or piece-square [6] values, or mobility. Programs may also learn to control search [7] or time usage [8].
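The persistent hash table idea can be sketched as follows. This is a minimal illustration, assuming positions are identified by a 64-bit Zobrist key; the class and file names are hypothetical, not taken from any particular engine:

```python
# Minimal sketch of persistent position learning: remember exact scores
# of important positions across games, preferring deeper search results.
import os
import pickle


class LearnTable:
    """Maps a Zobrist key to the (score, depth) learned for that position."""

    def __init__(self, path="learn.bin"):
        self.path = path
        self.table = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.table = pickle.load(f)

    def store(self, key, score, depth):
        # Keep only the deepest (most trusted) result for a position.
        old = self.table.get(key)
        if old is None or depth >= old[1]:
            self.table[key] = (score, depth)

    def probe(self, key):
        entry = self.table.get(key)
        return entry[0] if entry else None

    def save(self):
        with open(self.path, "wb") as f:
            pickle.dump(self.table, f)
```

An engine would probe this table before (or instead of) searching a known position, so a score learned in one game steers move choice in later games.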

Learning Paradigms

There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning. Usually any given type of neural network architecture can be employed in any of those tasks.

Supervised Learning

see main page Supervised Learning

Supervised learning is learning from examples provided by a knowledgeable external supervisor. In machine learning, supervised learning is a technique for deducing a function from training data. The training data consist of pairs of input objects and desired outputs, for instance, in computer chess, a sequence of positions associated with the outcome of a game [9].
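Such position-outcome pairs can be fit, as a toy illustration, with logistic regression trained by stochastic gradient descent. The two-element feature vectors (say, pawn and knight material differences) and the data below are invented for the example and not taken from any particular program:

```python
# Toy supervised learning: fit weights so that a sigmoid of the weighted
# feature sum predicts the game outcome (1 = win, 0 = loss).
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(examples, lr=0.1, epochs=500):
    """examples: list of (features, outcome) pairs with outcome in {0, 1}."""
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for feats, outcome in examples:
            pred = sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))
            err = outcome - pred          # gradient of the log-likelihood
            w = [wi + lr * err * fi for wi, fi in zip(w, feats)]
    return w


# Invented training data: (material features, game result).
data = [([3, 1], 1), ([1, 0], 1), ([-2, -1], 0), ([-1, 0], 0)]
weights = train(data)
```

After training, positions with a material advantage score above 0.5 and those with a deficit below it, which is the essence of deducing an evaluation function from labeled games.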

Unsupervised Learning

Unsupervised machine learning seems much harder: the goal is to have the computer learn how to do something without being told how to do it. The learner is given only unlabeled examples, for instance a sequence of positions from a running game whose final result is (still) unknown. A form of reinforcement learning can be used for unsupervised learning, where an agent bases its actions on previous rewards and punishments without necessarily learning any information about the exact ways its actions affect the world. Clustering is another method of unsupervised learning.
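Clustering can be illustrated with a bare-bones k-means over one-dimensional feature values, say material balances of unlabeled positions. Everything below is a made-up toy, not an established chess application:

```python
# Toy k-means: group unlabeled scalar feature values into k clusters
# with no labels or outcomes provided.
def kmeans(points, k, iters=20):
    centers = points[:k]                  # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center.
            i = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # Move each center to the mean of its assigned points.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```

For example, `kmeans([0.1, 0.2, 0.0, 5.0, 5.2, 4.9], 2)` separates the values into a low group and a high group without ever being told which is which.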

Reinforcement Learning

see main page Reinforcement Learning

Reinforcement learning is defined not by characterizing learning methods, but by characterizing a learning problem. Reinforcement learning is learning what to do - how to map situations to actions - so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. The reinforcement learning problem is deeply indebted to the idea of Markov decision processes (MDPs) from the field of optimal control.
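The flavor of this reward-driven setting can be shown with tabular TD(0) prediction on a simple five-state random walk (the classic textbook example; the parameters here are illustrative). The learner is only given a reward of 1 when an episode ends at the right terminal state, yet it discovers the value of every state:

```python
# Tabular TD(0) on a random walk: states 1..5 with terminals 0 and 6,
# reward 1 only on reaching state 6. True values are i/6 for state i.
import random


def td0_random_walk(episodes=5000, alpha=0.05, seed=1):
    random.seed(seed)
    V = [0.0] * 7                   # value estimates; 0 and 6 are terminal
    for _ in range(episodes):
        s = 3                       # every episode starts in the middle
        while s not in (0, 6):
            s2 = s + random.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            target = r + (0.0 if s2 in (0, 6) else V[s2])
            V[s] += alpha * (target - V[s])   # TD(0) update
            s = s2
    return V
```

No state is ever labeled with its correct value; the estimates emerge purely from sampled transitions and the terminal reward, which is the core idea behind temporal difference learning in games.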

Learning Topics

Programs

See also

Selected Publications

[10]


1950 ...

Alan Turing, Jack Copeland (editor) (2004). The Essential Turing, Seminal Writings in Computing, Logic, Philosophy, Artificial Intelligence, and Artificial Life plus The Secrets of Enigma. Oxford University Press, amazon, google books

1955 ...

Claude Shannon, John McCarthy (eds.) (1956). Automata Studies. Annals of Mathematics Studies, No. 34, pdf


1975 ...

* Jacques Pitrat (1976). A Program to Learn to Play Chess. Pattern Recognition and Artificial Intelligence, pp. 399-419. Academic Press Ltd. London, UK. ISBN 0-12-170950-7.
* Jacques Pitrat (1976). Realization of a Program Learning to Find Combinations at Chess. Computer Oriented Learning Processes (ed. J. Simon). Noordhoff, Groningen, The Netherlands.
* Pericles Negri (1977). Inductive Learning in a Hierarchical Model for Representing Knowledge in Chess End Games. pdf
* Ryszard Michalski, Pericles Negri (1977). An experiment on inductive learning in chess endgames. Machine Intelligence 8, pdf
* Boris Stilman (1977). The Computer Learns. in 1976 US Computer Chess Championship, by David Levy, Computer Science Press, Woodland Hills, CA, pp. 83-90
* Richard Sutton (1978). Single channel theory: A neuronal theory of learning. Brain Theory Newsletter 3, No. 3/4, pp. 72-75.
* Ross Quinlan (1979). Discovering Rules by Induction from Large Collections of Examples. Expert Systems in the Micro-electronic Age, pp. 168-201. Edinburgh University Press (Introducing ID3)


1998

Miroslav Kubat, Ivan Bratko, Ryszard Michalski (1998). A Review of Machine Learning Methods. pdf

1999

2000 ...

2001

2002

2003

2004

2005 ...

2006

2007

2008

2009

2010 ...

2011

2012

István Szita (2012). Reinforcement Learning in Games. Chapter 17

2013

2014

2015 ...

2016

2017

2018

2019

2020 ...

Forum Posts

1998 ...

2000 ...

2005 ...

2010 ...

2015 ...

External Links

Machine Learning

AI

Learning I
Learning II

Chess

Supervised Learning

Unsupervised Learning

Reinforcement Learning

TD Learning

Statistics

Markov Models

NNs

ANNs

Topics

RNNs

Courses

References

  1. A depiction of the world's oldest continually operating university, the University of Bologna, Italy, by Laurentius de Voltolina, second half of 14th century, Learning from Wikipedia
  2. Inductive learning vs Deductive learning
  3. David Slate (1987). A Chess Program that uses its Transposition Table to Learn from Experience. ICCA Journal, Vol. 10, No. 2
  4. Robert Hyatt (1999). Book Learning - a Methodology to Tune an Opening Book Automatically. ICCA Journal, Vol. 22, No. 1
  5. Don Beal, Martin C. Smith (1997). Learning Piece Values Using Temporal Differences. ICCA Journal, Vol. 20, No. 3
  6. Don Beal, Martin C. Smith (1999). Learning Piece-Square Values using Temporal Differences. ICCA Journal, Vol. 22, No. 4
  7. Yngvi Björnsson, Tony Marsland (2001). Learning Search Control in Adversary Games. Advances in Computer Games 9, pdf
  8. Levente Kocsis, Jos Uiterwijk, Jaap van den Herik (2000). Learning Time Allocation using Neural Networks. CG 2000, postscript
  9. AI Horizon: Machine Learning, Part II: Supervised and Unsupervised Learning
  10. online papers from Machine Learning in Games by Jay Scott
  11. Rosenblatt's Contributions
  12. Ratio Club from Wikipedia
  13. Royal Radar Establishment from Wikipedia
  14. see Swap-off by Helmut Richter
  15. The abandonment of connectionism in 1969 - Wikipedia
  16. Frank Rosenblatt (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books
  17. Long short term memory from Wikipedia
  18. Tsumego from Wikipedia
  19. Learnable Evolution Model from Wikipedia
  20. Jean Hayes Michie (2001). Machine Learning and Light Relief: A Review of Truth from Trash. AI Magazine, Vol. 22 No. 4, pdf
  21. University of Bristol - Department of Computer Science - Technical Reports
  22. Generalized Hebbian Algorithm from Wikipedia
  23. Dap Hartmann (2010). Mimicking the Black Box - Genetically evolving evaluation functions and search algorithms. Review on Omid David's Ph.D. Thesis, ICGA Journal, Vol 33, No. 1
  24. Monte-Carlo Simulation Balancing - videolectures.net by David Silver
  25. MATLAB from Wikipedia
  26. Weka (machine learning) from Wikipedia
  27. Ms. Pac-Man from Wikipedia
  28. Demystifying Deep Reinforcement Learning by Tambet Matiisen, Nervana, December 21, 2015
  29. Patent US20150100530 - Methods and apparatus for reinforcement learning - Google Patents
  30. 2048 (video game) from Wikipedia
  31. Teaching Deep Convolutional Neural Networks to Play Go by Hiroshi Yamashita, The Computer-go Archives, December 14, 2014
  32. Teaching Deep Convolutional Neural Networks to Play Go by Michel Van den Bergh, CCC, December 16, 2014
  33. Convolutional neural network from Wikipedia
  34. Best Paper Awards | TAAI 2014
  35. DeepChess: Another deep-learning based chess program by Matthew Lai, CCC, October 17, 2016
  36. ICANN 2016 | Recipients of the best paper awards
  37. Using GAN to play chess by Evgeniy Zheltonozhskiy, CCC, February 23, 2017
  38. New DeepMind paper by GregNeto, CCC, November 21, 2019
  39. MuZero: Mastering Go, chess, shogi and Atari without rules
  40. Naive Bayes classifier from Wikipedia
  41. Amir Ban (2012). Automatic Learning of Evaluation, with Applications to Computer Chess. Discussion Paper 613, The Hebrew University of Jerusalem - Center for the Study of Rationality, Givat Ram
  42. Christopher Clark, Amos Storkey (2014). Teaching Deep Convolutional Neural Networks to Play Go. arXiv:1412.3409
  43. Chris J. Maddison, Aja Huang, Ilya Sutskever, David Silver (2014). Move Evaluation in Go Using Deep Convolutional Neural Networks. arXiv:1412.6564v1

Up one Level