Difference between revisions of "Match Statistics"

From Chessprogramming wiki
 
(33 intermediate revisions by the same user not shown)
Line 298: Line 298:
 
         return ''
 
         return ''
 
</pre>
 
</pre>
 +
 +
Besides the above SPRT implementation using pentanomial frequencies and a simulation tool in [[Python]] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72962&start=6 Re: Stockfish Reverts 5 Recent Patches] by [[Michel Van den Bergh]], [[CCC]], February 02, 2020</ref> <ref>[https://github.com/vdbergh/pentanomial GitHub - vdbergh/pentanomial: SPRT for pentanomial frequencies and simulation tools] by [[Michel Van den Bergh]]</ref>, [[Michel Van den Bergh]] also wrote a much faster multi-threaded simulator in [[C]] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72962&start=7 Re: Stockfish Reverts 5 Recent Patches] by [[Michel Van den Bergh]], [[CCC]], February 02, 2020</ref> <ref>[https://github.com/vdbergh/simul GitHub - vdbergh/simul: A multi-threaded pentanomial simulator] by [[Michel Van den Bergh]]</ref>.
 +
 
<span id="TournamentManager"></span>
 
<span id="TournamentManager"></span>
 
=Tournaments=
 
=Tournaments=
Line 321: Line 324:
 
=Publications=  
 
=Publications=  
 
==1920 ...==  
 
==1920 ...==  
 +
* [[Mathematician#LLThurstone|L. L. Thurstone]] ('''1927'''). ''[https://psycnet.apa.org/record/1928-00527-001 A law of comparative judgment]''. [https://en.wikipedia.org/wiki/Psychological_Review Psychological Review], Vol. 34, No. 4 <ref>[https://en.wikipedia.org/wiki/Law_of_comparative_judgment Law of comparative judgment - Wikipedia]</ref>
 
* [[Ernst Zermelo]] ('''1929'''). ''Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung''. [http://gdz.sub.uni-goettingen.de/dms/load/img/?IDDOC=82727 pdf] (German)
 
* [[Ernst Zermelo]] ('''1929'''). ''Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung''. [http://gdz.sub.uni-goettingen.de/dms/load/img/?IDDOC=82727 pdf] (German)
 +
==1930 ...==
 +
* [[Mathematician#ACAitken|Alexander Aitken]] ('''1935'''). ''[https://www.cambridge.org/core/journals/proceedings-of-the-royal-society-of-edinburgh/article/ivon-least-squares-and-linear-combination-of-observations/7106C26F19F2EBF75BCEE7FA285780B9 On Least Squares and Linear Combination of Observations]''. Proceedings of the [https://en.wikipedia.org/wiki/Royal_Society_of_Edinburgh Royal Society of Edinburgh]
 +
==1940 ...==
 
* [[Mathematician#AWald|Abraham Wald]] ('''1945'''). ''Sequential Tests of Statistical Hypotheses''. [https://en.wikipedia.org/wiki/Annals_of_Mathematical_Statistics Annals of Mathematical Statistics], Vol. 16, No. 2, [https://en.wikipedia.org/wiki/Digital_object_identifier doi]: [http://projecteuclid.org/euclid.aoms/1177731118 10.1214/aoms/1177731118]
 
* [[Mathematician#AWald|Abraham Wald]] ('''1945'''). ''Sequential Tests of Statistical Hypotheses''. [https://en.wikipedia.org/wiki/Annals_of_Mathematical_Statistics Annals of Mathematical Statistics], Vol. 16, No. 2, [https://en.wikipedia.org/wiki/Digital_object_identifier doi]: [http://projecteuclid.org/euclid.aoms/1177731118 10.1214/aoms/1177731118]
 
* [[Mathematician#AWald|Abraham Wald]] ('''1947'''). ''Sequential Analysis''. [https://en.wikipedia.org/wiki/John_Wiley_%26_Sons John Wiley and Sons], [http://www.abebooks.com/book-search/title/sequential-analysis/author/abraham-wald/ AbeBooks]
 
* [[Mathematician#AWald|Abraham Wald]] ('''1947'''). ''Sequential Analysis''. [https://en.wikipedia.org/wiki/John_Wiley_%26_Sons John Wiley and Sons], [http://www.abebooks.com/book-search/title/sequential-analysis/author/abraham-wald/ AbeBooks]
* [[Mathematician#RABradley|Ralph A. Bradley]], [[Mathematician#METerry|Milton E. Terry]] ('''1952'''). ''[http://biomet.oxfordjournals.org/content/39/3-4/324.citation Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons]''. [https://en.wikipedia.org/wiki/Biometrika Biometrika], Vol. 39, Nos. 3/4, [https://en.wikipedia.org/wiki/Digital_object_identifier doi]: 10.2307/2334029, [http://www.jstor.org/stable/2334029?seq=1#page_scan_tab_contents JSTOR 2334029]
+
==1950 ...==
 +
* [[Mathematician#Mosteller|Frederick Mosteller]] ('''1951'''). ''[https://psycnet.apa.org/record/1951-07176-001 Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations]''. [https://en.wikipedia.org/wiki/Psychometrika Psychometrika], Vol. 16, No. 1
 +
* [[Mathematician#RABradley|Ralph A. Bradley]], [[Mathematician#METerry|Milton E. Terry]] ('''1952'''). ''[http://biomet.oxfordjournals.org/content/39/3-4/324.citation Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons]''. [https://en.wikipedia.org/wiki/Biometrika Biometrika], Vol. 39, Nos. 3/4
 
==1960 ...==
 
==1960 ...==
* [[Mathematician#FNDavid|Florence Nightingale David]] ('''1962'''). ''[http://books.google.com/books/about/Games_Gods_and_Gambling.html?id=8ddP8zNx9nQC&redir_esc=y Games, Gods & Gambling: A History of Probability and Statistical Ideas]''. Dover Publications, ISBN-13: 978-0486400235
+
* [[Mathematician#WAGlenn|William A. Glenn]], [[Mathematician#HADavid|Herbert A. David]] ('''1960'''). ''[https://www.jstor.org/stable/2527957?seq=1#page_scan_tab_contents Ties in Paired-Comparison Experiments Using a Modified Thurstone-Mosteller Model]''. [https://en.wikipedia.org/wiki/Biometrics_(journal) Biometrics], Vol. 16, No. 1
 +
* [[Mathematician#FNDavid|Florence Nightingale David]] ('''1962'''). ''[http://books.google.com/books/about/Games_Gods_and_Gambling.html?id=8ddP8zNx9nQC&redir_esc=y Games, Gods & Gambling: A History of Probability and Statistical Ideas]''. [https://en.wikipedia.org/wiki/Dover_Publications Dover Publications]
 +
* [[Mathematician#PVRao|P. V. Rao]], [[Mathematician#LLKupper|L. L. Kupper]] ('''1967'''). ''[https://www.tandfonline.com/doi/abs/10.1080/01621459.1967.10482901 Ties in Paired-Comparison Experiments: A Generalization of the Bradley-Terry Model]''. [https://en.wikipedia.org/wiki/Journal_of_the_American_Statistical_Association Journal of the American Statistical Association], Vol. 62, No. 317
 +
==1970 ...==
 +
* [https://www.semanticscholar.org/author/Roger-R.-Davidson/32819150 Roger R. Davidson] ('''1970'''). ''[https://www.jstor.org/stable/2283595 On Extending the Bradley-Terry Model to Accommodate Ties in Paired Comparison Experiments]''. [https://en.wikipedia.org/wiki/Journal_of_the_American_Statistical_Association Journal of the American Statistical Association], Vol. 65, No. 329
 +
* <span id="Bloss"></span>[http://what-when-how.com/earth-scientists/bloss-f-donald-earth-scientist/ F. Donald Bloss] ('''1973'''). ''[https://www.amazon.de/Rate-Your-Chess-F-Donald-Bloss/dp/0442008295 Rate your own Chess]''. Van Nostrand Reinhold Inc.
 
* [[Tony Marsland]], [[Paul Rushton]] ('''1973'''). ''[http://dl.acm.org/citation.cfm?id=805703 Mechanisms for Comparing Chess Programs].'' [[ACM 1973|ACM Annual Conference]], [http://webdocs.cs.ualberta.ca/~tony/OldPapers/Marsland-Rushton-ACM73 pdf]
 
* [[Tony Marsland]], [[Paul Rushton]] ('''1973'''). ''[http://dl.acm.org/citation.cfm?id=805703 Mechanisms for Comparing Chess Programs].'' [[ACM 1973|ACM Annual Conference]], [http://webdocs.cs.ualberta.ca/~tony/OldPapers/Marsland-Rushton-ACM73 pdf]
 
* [[James Gillogly]] ('''1978'''). ''Performance Analysis of the Technology Chess Program''. Ph.D. Thesis. Tech. Report CMU-CS-78-189, [[Carnegie Mellon University]], [http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan/CMU-CS-77-gillogly.pdf CMU-CS-77 pdf] » [[Tech]]
 
* [[James Gillogly]] ('''1978'''). ''Performance Analysis of the Technology Chess Program''. Ph.D. Thesis. Tech. Report CMU-CS-78-189, [[Carnegie Mellon University]], [http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan/CMU-CS-77-gillogly.pdf CMU-CS-77 pdf] » [[Tech]]
Line 332: Line 346:
 
* [[David Cahlander]] ('''1979'''). ''Strength of a Chess Playing Computer''. [[ICGA Journal#2_1|ICCA Newsletter, Vol. 2, No. 1]]
 
* [[David Cahlander]] ('''1979'''). ''Strength of a Chess Playing Computer''. [[ICGA Journal#2_1|ICCA Newsletter, Vol. 2, No. 1]]
 
* [[Jack Good]] ('''1979'''). ''On the Grading of Chess Players''. [[Personal Computing#3_3|Personal Computing, Vol. 3, No. 3]], pp. 47
 
* [[Jack Good]] ('''1979'''). ''On the Grading of Chess Players''. [[Personal Computing#3_3|Personal Computing, Vol. 3, No. 3]], pp. 47
 +
* <span id="Ratliff"></span>[[Gary L. Ratliff]] ('''1979'''). ''Practical Rating Program''. [[Personal Computing#3_9|Personal Computing, Vol. 3, No. 9]], pp. 62  » [[#Bloss|Bloss]]
 +
* [[Frieder Schwenkel]] ('''1979'''). ''Berating the ratings system''. [[Personal Computing#3_11|Personal Computing, Vol. 3, No. 11]], pp. 77 » [[#Ratliff|Ratliff]], [[#Bloss|Bloss]]
 +
* [[John Shaposka]] ('''1979'''). ''"JS" Takes the Bloss Test''. [[Personal Computing#3_12|Personal Computing, Vol. 3, No. 12]], pp. 75 » [[#Ratliff|Ratliff]], [[#Bloss|Bloss]]
 
==1980 ...==  
 
==1980 ...==  
 +
* Floyd R. Kirk ('''1980'''). ''Bloss Flunks Test''. [[Personal Computing#4_8|Personal Computing, Vol. 4, No. 8]], pp. 72  » [[#Ratliff|Ratliff]], [[#Bloss|Bloss]]
 
* [[John F. White]] ('''1981'''). ''[http://yourcomputeronline.wordpress.com/2010/12/10/survey-chess-games/ Survey-Chess Games]''. [[Your Computer]], [http://yourcomputeronline.wordpress.com/2010/10/31/augustseptember-1981-contents-and-editorial/ August/September 1981] <ref>[https://en.wikipedia.org/wiki/The_Master_Game The Master Game from Wikipedia]</ref>
 
* [[John F. White]] ('''1981'''). ''[http://yourcomputeronline.wordpress.com/2010/12/10/survey-chess-games/ Survey-Chess Games]''. [[Your Computer]], [http://yourcomputeronline.wordpress.com/2010/10/31/augustseptember-1981-contents-and-editorial/ August/September 1981] <ref>[https://en.wikipedia.org/wiki/The_Master_Game The Master Game from Wikipedia]</ref>
 
* [[Ken Thompson]] ('''1982'''). ''Computer Chess Strength''. [[Advances in Computer Chess 3]]
 
* [[Ken Thompson]] ('''1982'''). ''Computer Chess Strength''. [[Advances in Computer Chess 3]]
Line 340: Line 358:
 
==1990 ...==
 
==1990 ...==
 
* [[Hans Berliner]], [[Gordon Goetsch]], [[Murray Campbell]], [[Carl Ebeling]] ('''1990'''). ''Measuring the Performance Potential of Chess Programs.'' [https://en.wikipedia.org/wiki/Artificial_Intelligence_%28journal%29 Artificial Intelligence], Vol. 43, No. 1
 
* [[Hans Berliner]], [[Gordon Goetsch]], [[Murray Campbell]], [[Carl Ebeling]] ('''1990'''). ''Measuring the Performance Potential of Chess Programs.'' [https://en.wikipedia.org/wiki/Artificial_Intelligence_%28journal%29 Artificial Intelligence], Vol. 43, No. 1
* [http://www.ics.uci.edu/~sternh/ Hal Stern] ('''1990'''). ''Are all Linear Paired Comparison Models Equivalent''. [http://www.dtic.mil/dtic/tr/fulltext/u2/a236856.pdf pdf]
+
* [[Mathematician#HSStern|Hal Stern]] ('''1990'''). ''Are all Linear Paired Comparison Models Equivalent''. [http://www.dtic.mil/dtic/tr/fulltext/u2/a236856.pdf pdf]
 
* [[Eric Hallsworth]] ('''1990'''). ''Speed, Processors and Ratings''. [[Selective Search|Computer Chess News Sheet]] 25, pp 6, [http://www.chesscomputeruk.com/SS_25.pdf pdf] hosted by [[Mike Watters]]
 
* [[Eric Hallsworth]] ('''1990'''). ''Speed, Processors and Ratings''. [[Selective Search|Computer Chess News Sheet]] 25, pp 6, [http://www.chesscomputeruk.com/SS_25.pdf pdf] hosted by [[Mike Watters]]
 
* [[Hans Berliner]], [[Danny Kopec]], [[Ed Northam]] ('''1991'''). ''A taxonomy of concepts for evaluating chess strength: examples from two difficult categories''. [[Advances in Computer Chess 6]], [http://www.sci.brooklyn.cuny.edu/%7Ekopec/Publications/Publications/O_20_C.pdf pdf]
 
* [[Hans Berliner]], [[Danny Kopec]], [[Ed Northam]] ('''1991'''). ''A taxonomy of concepts for evaluating chess strength: examples from two difficult categories''. [[Advances in Computer Chess 6]], [http://www.sci.brooklyn.cuny.edu/%7Ekopec/Publications/Publications/O_20_C.pdf pdf]
 
* [[Steve Maughan]] ('''1992'''). ''Are You Sure It's Better?'' [[Selective Search]] 40, pp. 21, [http://www.chesscomputeruk.com/SS_40.pdf pdf] hosted by [[Mike Watters]]
 
* [[Steve Maughan]] ('''1992'''). ''Are You Sure It's Better?'' [[Selective Search]] 40, pp. 21, [http://www.chesscomputeruk.com/SS_40.pdf pdf] hosted by [[Mike Watters]]
 
* [[Warren D. Smith]] ('''1993'''). ''Rating Systems for Gameplayers, and Learning''. [http://scorevoting.net/WarrenSmithPages/homepage/ratingspap.ps ps]
 
* [[Warren D. Smith]] ('''1993'''). ''Rating Systems for Gameplayers, and Learning''. [http://scorevoting.net/WarrenSmithPages/homepage/ratingspap.ps ps]
 +
* [[Mark E. Glickman]] ('''1993'''). ''Paired Comparison Models with Time Varying Parameters''. Ph.D. thesis, [[Harvard University]], advisor [[Mathematician#HSStern|Hal Stern]], [http://www.glicko.net/research/thesis.pdf pdf]
 
* [[Mark E. Glickman]] ('''1995'''). ''A Comprehensive Guide To Chess Ratings''. [http://www.glicko.net/research/acjpaper.pdf pdf]
 
* [[Mark E. Glickman]] ('''1995'''). ''A Comprehensive Guide To Chess Ratings''. [http://www.glicko.net/research/acjpaper.pdf pdf]
 
* [[Mark E. Glickman]], [[Christopher Chabris]] ('''1996'''). ''Using Chess Ratings as Data in Psychological Research''. [https://pdfs.semanticscholar.org/1ffd/3432f56476f0047426b37f7f433f5a6575b0.pdf pdf]
 
* [[Mark E. Glickman]], [[Christopher Chabris]] ('''1996'''). ''Using Chess Ratings as Data in Psychological Research''. [https://pdfs.semanticscholar.org/1ffd/3432f56476f0047426b37f7f433f5a6575b0.pdf pdf]
 
* [[Robert Hyatt]], [[Monroe Newborn]] ('''1997'''). ''CRAFTY Goes Deep''. [[ICGA Journal#20_2|ICCA Journal, Vol. 20, No. 2]] » [[Crafty]]
 
* [[Robert Hyatt]], [[Monroe Newborn]] ('''1997'''). ''CRAFTY Goes Deep''. [[ICGA Journal#20_2|ICCA Journal, Vol. 20, No. 2]] » [[Crafty]]
 +
* [[Mark E. Glickman]], [[Mathematician#ACJones|Albyn C. Jones]] ('''1999'''). ''Rating the Chess Rating System''. [http://www.glicko.net/research/chance.pdf pdf]
 
==2000 ...==  
 
==2000 ...==  
 
* [[Ernst A. Heinz]] ('''2000'''). ''[http://link.springer.com/chapter/10.1007/3-540-45579-5_18 New Self-Play Results in Computer Chess]''. [[CG 2000]]
 
* [[Ernst A. Heinz]] ('''2000'''). ''[http://link.springer.com/chapter/10.1007/3-540-45579-5_18 New Self-Play Results in Computer Chess]''. [[CG 2000]]
Line 375: Line 395:
 
* [[Diogo R. Ferreira]] ('''2012'''). ''Determining the Strength of Chess Players based on actual Play''. [[ICGA Journal#35_1|ICGA Journal, Vol. 35, No. 1]]
 
* [[Diogo R. Ferreira]] ('''2012'''). ''Determining the Strength of Chess Players based on actual Play''. [[ICGA Journal#35_1|ICGA Journal, Vol. 35, No. 1]]
 
* [[Daniel Shawul]], [[Rémi Coulom]] ('''2013'''). ''Paired Comparisons with Ties: Modeling Game Outcomes in Chess''.  <ref>[http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=471004&t=44180 Re: EloStat, Bayeselo and Ordo] by [[Rémi Coulom]], [[CCC]], June 25, 2012</ref> <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72087&start=3 Re: Understanding and Pushing the Limits of the Elo Rating Algorithm] by [[Daniel Shawul]], [[CCC]], October 15, 2019</ref>  
 
* [[Daniel Shawul]], [[Rémi Coulom]] ('''2013'''). ''Paired Comparisons with Ties: Modeling Game Outcomes in Chess''.  <ref>[http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=471004&t=44180 Re: EloStat, Bayeselo and Ordo] by [[Rémi Coulom]], [[CCC]], June 25, 2012</ref> <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72087&start=3 Re: Understanding and Pushing the Limits of the Elo Rating Algorithm] by [[Daniel Shawul]], [[CCC]], October 15, 2019</ref>  
* [[Diogo R. Ferreira]] ('''2013'''). ''The Impact of the Search Depth on Chess Playing Strength''. [[ICGA Journal#36_2|ICGA Journal, Vol. 36, No. 2]]
+
* [[Diogo R. Ferreira]] ('''2013'''). ''The Impact of the Search Depth on Chess Playing Strength''. [[ICGA Journal#36_2|ICGA Journal, Vol. 36, No. 2]] » [[Depth]], [[Depth#DiminishingReturns|Diminishing Returns]], [[Playing Strength]], [[Houdini]] <ref>[https://www.hiarcs.net/forums/viewtopic.php?t=10004 Ply versus ELO] by Greg, [[Computer Chess Forums|HIARCS Forum]], May 30, 2020 » [[Diogo R. Ferreira#Impact|Diogo R. Ferreira - Impact of the Search Depth ...]]</ref>
 
* [[Miguel A. Ballicora]] ('''2014'''). ''ORDO v0.9.6 Ratings for chess and other games''. September 2014, [https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxnYXZpb3RhY2hlc3NlbmdpbmV8Z3g6NmQ0NmNhNGM4YjA3YTc5ZQ pdf] » [[Ordo]] <ref>[https://sites.google.com/site/gaviotachessengine/ordo Ordo] by [[Miguel A. Ballicora]]</ref>
 
* [[Miguel A. Ballicora]] ('''2014'''). ''ORDO v0.9.6 Ratings for chess and other games''. September 2014, [https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxnYXZpb3RhY2hlc3NlbmdpbmV8Z3g6NmQ0NmNhNGM4YjA3YTc5ZQ pdf] » [[Ordo]] <ref>[https://sites.google.com/site/gaviotachessengine/ordo Ordo] by [[Miguel A. Ballicora]]</ref>
 
* [[Don Dailey]], [[Adam Hair]], [[Mark Watkins]] ('''2014'''). ''[http://www.sciencedirect.com/science/article/pii/S1875952113000177 Move Similarity Analysis in Chess Programs]''. [http://www.journals.elsevier.com/entertainment-computing/ Entertainment Computing], Vol. 5, No. 3, [http://magma.maths.usyd.edu.au/~watkins/papers/DHW.pdf preprint as pdf] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=38772 Pairwise Analysis of Chess Engine Move Selections] by [[Adam Hair]], [[CCC]], April 17, 2011</ref>
 
* [[Don Dailey]], [[Adam Hair]], [[Mark Watkins]] ('''2014'''). ''[http://www.sciencedirect.com/science/article/pii/S1875952113000177 Move Similarity Analysis in Chess Programs]''. [http://www.journals.elsevier.com/entertainment-computing/ Entertainment Computing], Vol. 5, No. 3, [http://magma.maths.usyd.edu.au/~watkins/papers/DHW.pdf preprint as pdf] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=38772 Pairwise Analysis of Chess Engine Move Selections] by [[Adam Hair]], [[CCC]], April 17, 2011</ref>
 
* [[Kenneth W. Regan]], [[Tamal T. Biswas]], [[Jason Zhou]] ('''2014'''). ''Human and Computer Preferences at Chess''. [http://www.cse.buffalo.edu/~regan/papers/pdf/RBZ14aaai.pdf pdf]
 
* [[Kenneth W. Regan]], [[Tamal T. Biswas]], [[Jason Zhou]] ('''2014'''). ''Human and Computer Preferences at Chess''. [http://www.cse.buffalo.edu/~regan/papers/pdf/RBZ14aaai.pdf pdf]
 
* [[Erik Varend]] ('''2014'''). ''Quality of play in chess and methods for measuring''. [http://www.chessanalysis.ee/Quality%20of%20play%20in%20chess%20and%20methods%20for%20measuring.pdf pdf] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=54571 Questions regarding rating systems of humans and engines] by [[Erik Varend]], [[CCC]], December 06, 2014</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=60721 chess statistics scientific article] by Nuno Sousa, [[CCC]], July 06, 2016</ref>
 
* [[Erik Varend]] ('''2014'''). ''Quality of play in chess and methods for measuring''. [http://www.chessanalysis.ee/Quality%20of%20play%20in%20chess%20and%20methods%20for%20measuring.pdf pdf] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=54571 Questions regarding rating systems of humans and engines] by [[Erik Varend]], [[CCC]], December 06, 2014</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=60721 chess statistics scientific article] by Nuno Sousa, [[CCC]], July 06, 2016</ref>
 +
* [[Mathematician#XiaoouLi|Xiaoou Li]], [[Mathematician#JingchenLiu|Jingchen Liu]], [[Mathematician#ZhiliangYing|Zhiliang Ying]] ('''2014'''). ''[https://www.tandfonline.com/doi/abs/10.1080/07474946.2014.961861 Generalized Sequential Probability Ratio Test for Separate Families of Hypotheses]''. [https://www.tandfonline.com/loi/lsqa20?open=36&year=2017&repitition=0 Sequential Analysis], Vol. 33, No. 4,  [http://stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf pdf]
 
==2015 ...==  
 
==2015 ...==  
 
* [[Tamal T. Biswas]], [[Kenneth W. Regan]] ('''2015'''). ''Quantifying Depth and Complexity of Thinking and Knowledge''. [http://www.icaart.org/EuropeanProjectSpace.aspx?y=2015 ICAART 2015], [http://www.cse.buffalo.edu/~regan/papers/pdf/BiReICAART15CR.pdf pdf]
 
* [[Tamal T. Biswas]], [[Kenneth W. Regan]] ('''2015'''). ''Quantifying Depth and Complexity of Thinking and Knowledge''. [http://www.icaart.org/EuropeanProjectSpace.aspx?y=2015 ICAART 2015], [http://www.cse.buffalo.edu/~regan/papers/pdf/BiReICAART15CR.pdf pdf]
Line 387: Line 408:
 
* [[Jean-Marc Alliot]] ('''2017'''). ''Who is the Master''? [[ICGA Journal#39_1|ICGA Journal, Vol. 39, No. 1]], [http://www.alliot.fr/CHESS/draft-icga-39-1.pdf draft as pdf] » [[Stockfish]], [[Jean-Marc Alliot#WhoistheMaster|Who is the Master?]]
 
* [[Jean-Marc Alliot]] ('''2017'''). ''Who is the Master''? [[ICGA Journal#39_1|ICGA Journal, Vol. 39, No. 1]], [http://www.alliot.fr/CHESS/draft-icga-39-1.pdf draft as pdf] » [[Stockfish]], [[Jean-Marc Alliot#WhoistheMaster|Who is the Master?]]
 
* [[Mathematician#IAdler|Ilan Adler]], [[Mathematician#YangCao|Yang Cao]], [[Richard Karp]], [[Mathematician#EAPekoz|Erol A. Peköz]], [[Mathematician#SMRoss|Sheldon M. Ross]] ('''2017'''). ''[https://pubsonline.informs.org/doi/10.1287/opre.2017.1657 Random Knockout Tournaments]''. [https://en.wikipedia.org/wiki/Operations_Research_(journal) Operations Research], Vol. 65, No. 6, [https://arxiv.org/abs/1612.04448 arXiv:1612.04448]
 
* [[Mathematician#IAdler|Ilan Adler]], [[Mathematician#YangCao|Yang Cao]], [[Richard Karp]], [[Mathematician#EAPekoz|Erol A. Peköz]], [[Mathematician#SMRoss|Sheldon M. Ross]] ('''2017'''). ''[https://pubsonline.informs.org/doi/10.1287/opre.2017.1657 Random Knockout Tournaments]''. [https://en.wikipedia.org/wiki/Operations_Research_(journal) Operations Research], Vol. 65, No. 6, [https://arxiv.org/abs/1612.04448 arXiv:1612.04448]
 +
* [[Michel Van den Bergh]] ('''2017'''). ''A Practical Introduction to the GSPRT''. [http://hardy.uhasselt.be/Toga/GSPRT_approximation.pdf pdf]
 
* [[Leszek Szczecinski]], [[Aymen Djebbi]] ('''2019'''). ''Understanding and Pushing the Limits of the Elo Rating Algorithm''. [https://arxiv.org/abs/1910.06081 arXiv:1910.06081] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72087 Understanding and Pushing the Limits of the Elo Rating Algorithm] by [[Michel Van den Bergh]], [[CCC]], October 15, 2019</ref>
 
* [[Leszek Szczecinski]], [[Aymen Djebbi]] ('''2019'''). ''Understanding and Pushing the Limits of the Elo Rating Algorithm''. [https://arxiv.org/abs/1910.06081 arXiv:1910.06081] <ref>[http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72087 Understanding and Pushing the Limits of the Elo Rating Algorithm] by [[Michel Van den Bergh]], [[CCC]], October 15, 2019</ref>
 +
* [[Michel Van den Bergh]] ('''2019'''). ''The Generalized Maximum Likelihood Ratio for the Expectation Value of a Distribution''. [http://hardy.uhasselt.be/Fishtest/support_MLE_multinomial.pdf pdf]
  
 
=Forum & Blog Postings=  
 
=Forum & Blog Postings=  
Line 414: Line 437:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=13800 a beat b,b beat c,c beat a question] by [[Uri Blass]], [[CCC]], May 16, 2007
 
* [http://www.talkchess.com/forum/viewtopic.php?t=13800 a beat b,b beat c,c beat a question] by [[Uri Blass]], [[CCC]], May 16, 2007
 
* [http://www.talkchess.com/forum/viewtopic.php?t=16545 how to do a proper statistical test] by [[Rein Halbersma]], [[CCC]], September 19, 2007
 
* [http://www.talkchess.com/forum/viewtopic.php?t=16545 how to do a proper statistical test] by [[Rein Halbersma]], [[CCC]], September 19, 2007
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=24590 Comparing two version of the same engine] by [[Fermin Serrano]], [[CCC]], October 26, 2008
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=25461 Debate: testing at fast time controls] by [[Fermin Serrano]], [[CCC]], December 15, 2008
 
* [http://www.talkchess.com/forum/viewtopic.php?t=27516 Elo Calcuation] by [[Edmund Moshammer]], [[CCC]], April 19, 2009
 
* [http://www.talkchess.com/forum/viewtopic.php?t=27516 Elo Calcuation] by [[Edmund Moshammer]], [[CCC]], April 19, 2009
 
* [http://www.talkchess.com/forum/viewtopic.php?t=30624 Likelihood of superiority] by [[Marco Costalba]], [[CCC]], November 15, 2009
 
* [http://www.talkchess.com/forum/viewtopic.php?t=30624 Likelihood of superiority] by [[Marco Costalba]], [[CCC]], November 15, 2009
Line 422: Line 447:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=36592 Do You really need 1000s of games for testing?] by [[Jouni Uski]], [[CCC]], November 04, 2010
 
* [http://www.talkchess.com/forum/viewtopic.php?t=36592 Do You really need 1000s of games for testing?] by [[Jouni Uski]], [[CCC]], November 04, 2010
 
* [http://www.talkchess.com/forum/viewtopic.php?t=36979 GUI idea: Testing until certainty] by [[Albert Silver]], [[CCC]], December 07, 2010
 
* [http://www.talkchess.com/forum/viewtopic.php?t=36979 GUI idea: Testing until certainty] by [[Albert Silver]], [[CCC]], December 07, 2010
* [http://www.talkchess.com/forum/viewtopic.php?t=37056 SPRT and Engine testing] by [[Adam Hair]], [[CCC]], December 13, 2010 » [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=37056 SPRT and Engine testing] by [[Adam Hair]], [[CCC]], December 13, 2010 » [[#SPRT|SPRT]]
 
'''2011'''
 
'''2011'''
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=38698 Testing very small changes ( <= 5 ELO points of gain)] by [[Fermin Serrano]], [[CCC]], April 08, 2011
 
* [http://www.talkchess.com/forum/viewtopic.php?t=38772 Pairwise Analysis of Chess Engine Move Selections] by [[Adam Hair]], [[CCC]], April 17, 2011
 
* [http://www.talkchess.com/forum/viewtopic.php?t=38772 Pairwise Analysis of Chess Engine Move Selections] by [[Adam Hair]], [[CCC]], April 17, 2011
 
* [http://www.talkchess.com/forum/viewtopic.php?t=39511 Ply vs ELO] by Andriy Dzyben, [[CCC]], June 28, 2011
 
* [http://www.talkchess.com/forum/viewtopic.php?t=39511 Ply vs ELO] by Andriy Dzyben, [[CCC]], June 28, 2011
Line 429: Line 455:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=41341 Increase in Elo ..Question For The Experts] by [[Steve Blincoe|Steve B]], [[CCC]], December 05, 2011
 
* [http://www.talkchess.com/forum/viewtopic.php?t=41341 Increase in Elo ..Question For The Experts] by [[Steve Blincoe|Steve B]], [[CCC]], December 05, 2011
 
'''2012'''
 
'''2012'''
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=42032 Are all test the same?] by [[Fermin Serrano]], [[CCC]], January 17, 2012
 
* [http://www.talkchess.com/forum/viewtopic.php?t=42729 Advantage for White; Bayeselo (to Rémi Coulom)] by [[Edmund Moshammer]], [[CCC]], March 03, 2012
 
* [http://www.talkchess.com/forum/viewtopic.php?t=42729 Advantage for White; Bayeselo (to Rémi Coulom)] by [[Edmund Moshammer]], [[CCC]], March 03, 2012
 
* [http://www.talkchess.com/forum/viewtopic.php?t=42737 Pairwise Analysis of Chess Engine Move Selections Revisited] by [[Adam Hair]], [[CCC]], March 04, 2012
 
* [http://www.talkchess.com/forum/viewtopic.php?t=42737 Pairwise Analysis of Chess Engine Move Selections Revisited] by [[Adam Hair]], [[CCC]], March 04, 2012
Line 458: Line 485:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=47885 Fishtest Distributed Testing Framework] by [[Marco Costalba]], [[CCC]], May 01, 2013
 
* [http://www.talkchess.com/forum/viewtopic.php?t=47885 Fishtest Distributed Testing Framework] by [[Marco Costalba]], [[CCC]], May 01, 2013
 
* [http://www.talkchess.com/forum/viewtopic.php?t=48649 The influence of the length of openings] by [[Kai Laskos]], [[CCC]], July 14, 2013
 
* [http://www.talkchess.com/forum/viewtopic.php?t=48649 The influence of the length of openings] by [[Kai Laskos]], [[CCC]], July 14, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=48733 Scaling at 2x nodes (or doubling time control)] by [[Kai Laskos]], [[CCC]], July 23, 2013 » [[Match Statistics#DoublingTC|Doubling TC]], [[Depth#DiminishingReturns|Diminishing Returns]], [[Playing Strength]], [[Houdini]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=48733 Scaling at 2x nodes (or doubling time control)] by [[Kai Laskos]], [[CCC]], July 23, 2013 » [[#DoublingTC|Doubling TC]], [[Depth#DiminishingReturns|Diminishing Returns]], [[Playing Strength]], [[Houdini]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=48863 Type I error in LOS based early stopping rule] by [[Kai Laskos]], [[CCC]], August 06, 2013 <ref>[https://en.wikipedia.org/wiki/Type_I_and_type_II_errors Type I and type II errors from Wikipedia]</ref>
 
* [http://www.talkchess.com/forum/viewtopic.php?t=48863 Type I error in LOS based early stopping rule] by [[Kai Laskos]], [[CCC]], August 06, 2013 <ref>[https://en.wikipedia.org/wiki/Type_I_and_type_II_errors Type I and type II errors from Wikipedia]</ref>
 
* [http://www.talkchess.com/forum/viewtopic.php?t=48864 How much elo is pondering worth] by [[Michel Van den Bergh]], [[CCC]], August 07, 2013 » [[Pondering]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=48864 How much elo is pondering worth] by [[Michel Van den Bergh]], [[CCC]], August 07, 2013 » [[Pondering]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=49248 Contempt and the ELO model] by [[Michel Van den Bergh]], [[CCC]], September 05, 2013 » [[Contempt Factor]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=49248 Contempt and the ELO model] by [[Michel Van den Bergh]], [[CCC]], September 05, 2013 » [[Contempt Factor]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=49393 1 draw=1 win + 1 loss (always!)] by [[Michel Van den Bergh]], [[CCC]], September 19, 2013
 
* [http://www.talkchess.com/forum/viewtopic.php?t=49393 1 draw=1 win + 1 loss (always!)] by [[Michel Van den Bergh]], [[CCC]], September 19, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=49584 SPRT and narrowing of (elo1 - elo0) difference] by [[Jesús Muñoz]], [[CCC]], October 05, 2013 » [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=49584 SPRT and narrowing of (elo1 - elo0) difference] by [[Jesús Muñoz]], [[CCC]], October 05, 2013 » [[#SPRT|SPRT]]
* [http://www.talkchess.com/forum/viewtopic.php?t=49727 sprt and margin of error] by [[Larry Kaufman]], [[CCC]], October 15, 2013 » [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=49727 sprt and margin of error] by [[Larry Kaufman]], [[CCC]], October 15, 2013 » [[#SPRT|SPRT]]
 
* [http://www.open-chess.org/viewtopic.php?f=5&t=2477 How (not) to use SPRT ?] by [[Mark Watkins|BB+]], [[Computer Chess Forums|OpenChess Forum]], October 19, 2013
 
* [http://www.open-chess.org/viewtopic.php?f=5&t=2477 How (not) to use SPRT ?] by [[Mark Watkins|BB+]], [[Computer Chess Forums|OpenChess Forum]], October 19, 2013
 
* [http://www.talkchess.com/forum/viewtopic.php?t=50266 Houdini, much weaker engines, and Arpad Elo] by [[Kai Laskos]], [[CCC]], November 29, 2013 » [[Houdini]], [[Pawn Advantage, Win Percentage, and Elo]] <ref>[https://en.wikipedia.org/wiki/Arpad_Elo Arpad Elo - Wikipedia]</ref>
 
* [http://www.talkchess.com/forum/viewtopic.php?t=50266 Houdini, much weaker engines, and Arpad Elo] by [[Kai Laskos]], [[CCC]], November 29, 2013 » [[Houdini]], [[Pawn Advantage, Win Percentage, and Elo]] <ref>[https://en.wikipedia.org/wiki/Arpad_Elo Arpad Elo - Wikipedia]</ref>
Line 475: Line 502:
 
* [https://chesscomputer.tumblr.com/post/98632536555/using-the-stockfish-position-evaluation-score-to/embed Using the Stockfish position evaluation score to predict victory probability] by unavoidablegrain, [https://en.wikipedia.org/wiki/Tumblr Tumblr], September 28, 2014 » [[Pawn Advantage, Win Percentage, and Elo]], [[Stockfish]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=53891 Elo estimation using quasi-Monte Carlo integration] by Branko Radovanovic, [[CCC]], September 30, 2014
 
* [http://www.talkchess.com/forum/viewtopic.php?t=54331 SPRT question] by [[Robert Hyatt]], [[CCC]], November 13, 2014 » [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=54331 SPRT question] by [[Robert Hyatt]], [[CCC]], November 13, 2014 » [[#SPRT|SPRT]]
* [http://www.talkchess.com/forum/viewtopic.php?t=54359 Usage sprt / cutechess-cli] by [[Michael Hoffmann]], [[CCC]], November 16, 2014 » [[Cutechess-cli]], [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=54359 Usage sprt / cutechess-cli] by [[Michael Hoffmann]], [[CCC]], November 16, 2014 » [[Cutechess-cli]], [[#SPRT|SPRT]]
 
==2015 ...==  
 
* [http://www.talkchess.com/forum/viewtopic.php?t=55130 2-SPRT] by [[Michel Van den Bergh]], [[CCC]], January 28, 2015 » [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=55130 2-SPRT] by [[Michel Van den Bergh]], [[CCC]], January 28, 2015 » [[#SPRT|SPRT]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=55893 Script for computing SPRT probabilities] by [[Michel Van den Bergh]], [[CCC]], April 05, 2015
 
* [http://www.talkchess.com/forum/viewtopic.php?t=56067 Maximum ELO gain per test game played?] by Forrest Hoch, [[CCC]], April 20, 2015
 
* [http://www.talkchess.com/forum/viewtopic.php?t=56095 Getting SPRT right] by [[Alexandru Mosoi]], [[CCC]], April 22, 2015 » [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=56095 Getting SPRT right] by [[Alexandru Mosoi]], [[CCC]], April 22, 2015 » [[#SPRT|SPRT]]
* [http://www.talkchess.com/forum/viewtopic.php?t=56358 SPRT questions] by [[Uri Blass]], [[CCC]], May 15, 2015 » [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=56358 SPRT questions] by [[Uri Blass]], [[CCC]], May 15, 2015 » [[#SPRT|SPRT]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=56426 Adam Hair's article on Pairwise comparison of engines] by [[Charles Roberson]], [[CCC]], May 19, 2015  
 
* [http://www.talkchess.com/forum/viewtopic.php?t=57223 computing elo of multiple chess engines] by [[Alexandru Mosoi]], [[CCC]], August 09, 2015
 
* [http://www.talkchess.com/forum/viewtopic.php?t=57270 Some musings about search] by [[Ed Schroder|Ed Schröder]], [[CCC]], August 14, 2015 » [[Automated Tuning]], [[Search]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=57437 Bullet vs regular time control, say 40/4m CCRL/CEGT] by [[Ed Schroder|Ed Schröder]], [[CCC]], August 29, 2015
 
* [http://www.talkchess.com/forum/viewtopic.php?t=57465 The SPRT without draw model, elo model or whatever...] by [[Michel Van den Bergh]], [[CCC]], September 01, 2015 » [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=57465 The SPRT without draw model, elo model or whatever...] by [[Michel Van den Bergh]], [[CCC]], September 01, 2015 » [[#SPRT|SPRT]]
 
: [http://talkchess.com/forum/viewtopic.php?t=57465&start=19 Re: The SPRT without draw model, elo model or whatever..] by [[Michel Van den Bergh]], [[CCC]], August 18, 2016
 
* [http://www.talkchess.com/forum/viewtopic.php?t=57482 Name for elo without draws?] by [[Marcel van Kervinck]], [[CCC]], September 02, 2015
 
Line 515: Line 542:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=61636 Differences between top engines related to "style"] by [[Kai Laskos]], October 07, 2016
 
* [http://www.talkchess.com/forum/viewtopic.php?t=61781 SPRT when not used for self testing] by [[Andrew Grant]], [[CCC]], October 21, 2016
 
* [http://www.talkchess.com/forum/viewtopic.php?t=61784 Doubling of time control] by [[Andreas Strangmüller]], [[CCC]], October 21, 2016 » [[Match Statistics#DoublingTC|Doubling TC]], [[Depth#DiminishingReturns|Diminishing Returns]], [[Playing Strength]], [[Komodo]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=61784 Doubling of time control] by [[Andreas Strangmüller]], [[CCC]], October 21, 2016 » [[#DoublingTC|Doubling TC]], [[Depth#DiminishingReturns|Diminishing Returns]], [[Playing Strength]], [[Komodo]]
* [http://www.talkchess.com/forum/viewtopic.php?t=62146 Stockfish 8 - Double time control vs. 2 threads] by [[Andreas Strangmüller]], [[CCC]],  November 15, 2016 » [[Match Statistics#DoublingTC|Doubling TC]], [[Depth#DiminishingReturns|Diminishing Returns]], [[Playing Strength]], [[Stockfish]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=62146 Stockfish 8 - Double time control vs. 2 threads] by [[Andreas Strangmüller]], [[CCC]],  November 15, 2016 » [[#DoublingTC|Doubling TC]], [[Depth#DiminishingReturns|Diminishing Returns]], [[Playing Strength]], [[Stockfish]]
 
* [https://rjlipton.wordpress.com/2016/11/30/when-data-serves-turkey/ When Data Serves Turkey] by [[Kenneth W. Regan|Ken Regan]], [https://rjlipton.wordpress.com/ Gödel's Lost Letter and P=NP], November 30, 2016
 
* [https://rjlipton.wordpress.com/2016/12/08/magnus-and-the-turkey-grinder/ Magnus and the Turkey Grinder] by [[Kenneth W. Regan|Ken Regan]], [https://rjlipton.wordpress.com/ Gödel's Lost Letter and P=NP], December 08, 2016 » [[Pawn Advantage, Win Percentage, and Elo]] <ref>[https://en.wikipedia.org/wiki/World_Chess_Championship_2016 World Chess Championship 2016 from Wikipedia]</ref>  
 
Line 522: Line 549:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=62438 Statistical Interpretation] by [[Dennis Sceviour]], [[CCC]], December 10, 2016
 
* [http://www.talkchess.com/forum/viewtopic.php?t=62510 Absolute ELO scale] by [[Nicu Ionita]], [[CCC]], December 17, 2016
 
* [http://www.talkchess.com/forum/viewtopic.php?t=62598 A question about SPRT] by [[Andrew Grant]], [[CCC]], December 25, 2016 » [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=62598 A question about SPRT] by [[Andrew Grant]], [[CCC]], December 25, 2016 » [[#SPRT|SPRT]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=62622 Diminishing returns and hyperthreading] by [[Kai Laskos]], [[CCC]], December 27, 2016 » [[Depth#DiminishingReturns|Diminishing Returns]], [[Playing Strength]], [[Thread]]
 
'''2017'''
 
* [http://www.talkchess.com/forum/viewtopic.php?t=62868 Progress in 30 years by four intervals of 7-8 years] by [[Kai Laskos]], [[CCC]], January 19, 2017 » [[Playing Strength]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=62922 sprt tourney manager] by [[Richard Delorme]], [[CCC]], January 24, 2017 » [[Amoeba#TournamentManager|Amoeba Tournament Manager]], [[Match Statistics#SPRT|SPRT]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=62922 sprt tourney manager] by [[Richard Delorme]], [[CCC]], January 24, 2017 » [[Amoeba#TournamentManager|Amoeba Tournament Manager]], [[#SPRT|SPRT]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=63327 Binomial distribution for chess statistics] by [[Lyudmil Antonov]], [[CCC]], March 03, 2017
 
* [http://www.talkchess.com/forum/viewtopic.php?t=63355 Higher than expected by me efficiency of Ponder ON] by [[Kai Laskos]], [[CCC]], March 06, 2017 » [[Pondering]]
 
Line 546: Line 573:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=64683 Invariance with time control of rating schemes] by [[Kai Laskos]], [[CCC]], July 22, 2017 <ref>[http://hardy.uhasselt.be/Toga/normalized_elo.pdf Normalized Elo] (pdf) by [[Michel Van den Bergh]]</ref>
 
* [http://www.talkchess.com/forum/viewtopic.php?t=64719 Ways to avoid "Draw Death" in Computer Chess] by [[Kai Laskos]], [[CCC]], July 25, 2017
 
* [http://www.talkchess.com/forum/viewtopic.php?t=64824&start=2 SMP NPS measurements] by [[Peter Österlund]], [[CCC]], August 06, 2017 » [[Lazy SMP]], [[Parallel Search]], [[Nodes per second]]
+
* [http://www.talkchess.com/forum/viewtopic.php?t=64824&start=2 SMP NPS measurements] by [[Peter Österlund]], [[CCC]], August 06, 2017 » [[Lazy SMP]], [[Parallel Search]], [[Nodes per Second]]
 
: [http://www.talkchess.com/forum/viewtopic.php?t=64824&start=3 ELO measurements] by [[Peter Österlund]], [[CCC]], August 06, 2017 » [[Playing Strength]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=65061 What is a Match?] by [[Henk van den Belt]], [[CCC]], September 01, 2017
 
Line 555: Line 582:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=66000 ELO progression measured by year] by [[Ed Schroder]], [[CCC]], December 13, 2017
 
'''2018'''
 
* [https://groups.google.com/d/msg/fishcooking/nqgLNUfjkok/gfMr7amXCAAJ Wrong use of SPRT] by [[Uri Blass]], [[Computer Chess Forums|FishCooking]], February 09, 2018 » [[Contempt Factor]], [[Match Statistics#SPRT|SPRT]]
+
* [https://groups.google.com/d/msg/fishcooking/nqgLNUfjkok/gfMr7amXCAAJ Wrong use of SPRT] by [[Uri Blass]], [[Computer Chess Forums|FishCooking]], February 09, 2018 » [[Contempt Factor]], [[#SPRT|SPRT]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=66775 Feed bayeselo with pure game results without PGN] by [[Sergei Markoff|Sergei S. Markoff]], [[CCC]], March 08, 2018
 
* [http://www.talkchess.com/forum/viewtopic.php?t=66793 Elo measurement of contempt in SF in self-play] by [[Michel Van den Bergh]], [[CCC]], March 10, 2018 » [[Contempt Factor|Contempt]], [[Playing Strength]], [[Stockfish]]
 
Line 569: Line 596:
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=69672 Schizophrenic rating model for Leela] by [[Kai Laskos]], [[CCC]], January 21, 2019 » [[Leela Chess Zero]]
 
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=71419 best way to determine elos of a group] by [[Daniel Shawul]], [[CCC]], July 30, 2019
 
 +
==2020 ...==
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72962 Stockfish Reverts 5 Recent Patches] by Deberger, [[CCC]], February 01, 2020 » [[Stockfish]]
 +
: [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=72962&start=6 Re: Stockfish Reverts 5 Recent Patches] by [[Michel Van den Bergh]], [[CCC]], February 02, 2020
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73419 repackaging of Fishtest's SPRT calculation code] by [[Jon Dart]], [[CCC]], March 20, 2020 » [[#SPRT|SPRT]]
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=74037 Stockfish_dev is probably stronger than Sargon 1978 v1.00] by [[Kai Laskos]], [[CCC]], May 29, 2020 » [[Stockfish]], [[Sargon]]
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=74319 Throwing out draws to calculate Elo] by [[Dann Corbit]], [[CCC]], June 29, 2020
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=75006 Jackknife estimate of the variance of engine results for Arasan test suite] by [[Kai Laskos]], [[CCC]], September 05, 2020 » [[#Statistical Analysis|Statistical Analysis]]
 +
'''2021'''
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=76261 What K factor should be used if two players are in different K factor brackets?] by [[Tamás Kuzmics]], [[CCC]], January 09, 2021 <ref>[https://en.wikipedia.org/wiki/Elo_rating_system#Most_accurate_K-factor Most accurate K-factor - Elo rating system from Wikipedia]</ref> <ref>[https://ratings.fide.com/calculator_rtd.phtml FIDE Chess Rating calculators: Chess Rating change calculator]</ref>
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=76370 Cutechess-cli and SPRT] by [[Tom King]], [[CCC]], January 19, 2021 » [[Cutechess-cli]], [[#SPRT|SPRT]]
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=76382 correspondence chess in the age of NNUE] by [[Larry Kaufman]], [[CCC]], January 21, 2021 » [[NNUE]]
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=77544 what is the rating of engines when you do not count draws?] by [[Uri Blass]], [[CCC]], June 23, 2021
 +
'''2022'''
 +
* [https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79280 How do you know you improved ?] by Philippe Chevalier, [[CCC]], February 03, 2022
  
 
=External Links=  
 
Line 611: Line 652:
 
* [https://en.wikipedia.org/wiki/Margin_of_error Margin of error from Wikipedia]
 
* [https://en.wikipedia.org/wiki/Confidence_interval Confidence interval from Wikipedia]
 
* [https://en.wikipedia.org/wiki/Statistical_hypothesis_testing Statistical hypothesis testing from Wikipedia]
 
* [https://en.wikipedia.org/wiki/Sequential_probability_ratio_test Sequential probability ratio test from Wikipedia]
 
 
* [https://en.wikipedia.org/wiki/Null_hypothesis Null hypothesis from Wikipedia]
 
* [https://en.wikipedia.org/wiki/Alternative_hypothesis Alternative hypothesis from Wikipedia]
 
Line 618: Line 657:
 
* [https://en.wikipedia.org/wiki/Type_I_and_type_II_errors Type I and type II errors from Wikipedia]
 
* [https://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing/v/type-1-errors Type 1 Errors | Hypothesis testing with one sample] | [https://en.wikipedia.org/wiki/Khan_Academy Khan Academy]
 
 +
==Hypothesis Testing==
 +
* [https://en.wikipedia.org/wiki/Statistical_hypothesis_testing Statistical hypothesis testing from Wikipedia]
 +
* [https://en.wikipedia.org/wiki/Likelihood-ratio_test Likelihood-ratio test from Wikipedia]
 +
* [https://en.wikipedia.org/wiki/Score_test Score test from Wikipedia]
 +
* [https://en.wikipedia.org/wiki/Wald_test Wald test from Wikipedia]
 +
==Sequential Analysis==
 +
* [https://en.wikipedia.org/wiki/Sequential_analysis Sequential analysis from Wikipedia]
 +
* [https://en.wikipedia.org/wiki/Sequential_probability_ratio_test Sequential probability ratio test from Wikipedia]
 +
* [https://github.com/vdbergh/pentanomial GitHub - vdbergh/pentanomial: SPRT for pentanomial frequencies and simulation tools] by [[Michel Van den Bergh]]
 +
* [https://github.com/vdbergh/simul GitHub - vdbergh/simul: A multi-threaded pentanomial simulator] by [[Michel Van den Bergh]]
 
==Data Visualization==
 
* [https://blog.ebemunk.com/a-visual-look-at-2-million-chess-games/ A Visual Look at 2 Million Chess Games - Thinking Through the Party] by [[Buğra Fırat]], March 02, 2016 <ref>[http://www.talkchess.com/forum/viewtopic.php?t=65610 A Visual Look at 2 Million Chess Games] by Brahim Hamadicharef, [[CCC]], November 02, 2017</ref>
 
Line 626: Line 675:
 
: [https://en.wikipedia.org/wiki/ARMS_Charity_Concerts ARMS Charity Concert], [https://en.wikipedia.org/wiki/Madison_Square_Garden Madison Square Garden], [http://forums.ledzeppelin.com/index.php?/topic/394-arms-benefit-concerts-in-nyc-dec-1983/ December 08, 1983]
 
: {{#evu:https://www.youtube.com/watch?v=IdVJw-b3HHE|alignment=left|valignment=top}}
 
 +
* [[:Category:Björk|Björk]] - [https://en.wikipedia.org/wiki/Possibly_Maybe Possibly Maybe] (1996), [https://en.wikipedia.org/wiki/YouTube YouTube] Video
 +
: {{#evu:https://www.youtube.com/watch?v=iyqKy5P1Y0Q|alignment=left|valignment=top}}
  
 
=References=  
 
Line 634: Line 685:
 
[[Category:Jan Hammer]]
 
[[Category:Simon Phillips]]
 
 +
[[Category:Björk]]

Latest revision as of 10:58, 4 February 2022

Match Statistics [1]

Match Statistics,
the statistics of chess tournaments and matches, that is, a collection of chess games and the presentation, analysis, and interpretation of game-related data, most commonly game results, to determine the relative playing strength of chess-playing entities, here with a focus on chess engines. To apply match statistics, besides considering the statistical population, it is conventional to hypothesize a statistical model describing a set of probability distributions.

Ratios / Operating Figures

Common tools, ratios and figures to illustrate a tournament outcome and provide a base for its interpretation.

Number of games

The total number of games played by an engine in a tournament.

N = wins + draws + losses

Score

The score is a representation of the tournament-outcome from the viewpoint of a certain engine.

score_difference = wins - losses
score = wins + draws/2

Win & Draw Ratio

win_ratio  = score/N
draw_ratio = draws/N
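
The operating figures above can be sketched in a few lines of Python. The function name `match_figures` and the returned dictionary keys are illustrative assumptions, not part of any existing tool:

```python
def match_figures(wins, draws, losses):
    """Return the basic operating figures of a match from one engine's viewpoint."""
    n = wins + draws + losses              # N = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = wins + draws / 2               # a draw counts as half a point
    return {
        "N": n,
        "score": score,
        "score_difference": wins - losses,
        "win_ratio": score / n,            # score fraction, as defined above
        "draw_ratio": draws / n,
    }


if __name__ == "__main__":
    # Hypothetical result: 40 wins, 50 draws, 10 losses
    print(match_figures(40, 50, 10))
```

Note that win_ratio, as defined in this article, is the score fraction (draws count half), not the fraction of games won.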

These two ratios depend on the strength difference between the competitors, the average strength level, the color, and the drawishness of the opening book line. Because of the second factor, these ratios are strongly influenced by the time control, which is also confirmed by the published statistics of the testing organisations CCRL and CEGT, showing an increase of the draw rate at longer time controls. This correlation was also shown by Kirill Kryukov, who analyzed statistics of his test games [2]. The program playing White seems to benefit more from the additional level of strength. So, although one would expect the win ratio to approach 50% with increasing draw rates, it in fact remains about equal.

Time control     Draw ratio   Win ratio (White)   Source
40/4             30.9%        55.0%               CEGT
40/20            35.6%        54.6%               CEGT
40/120           41.3%        55.4%               CEGT
40/120 (4cpu)    45.2%        55.9%               CEGT
40/4             31.0%        54.1%               CCRL
40/40            37.2%        54.6%               CCRL

Doubling Time Control

As posted in October 2016 [3], Andreas Strangmüller conducted an experiment with Komodo 9.3, time control doubling matches under Cutechess-cli, playing 3000 games with 1500 opening positions each, without pondering, learning, and tablebases, Intel i5-750 @ 3.5 GHz, 1 Core, 128 MB Hash [4]; see also Kai Laskos' 2013 results with Houdini 3 [5] and Diminishing Returns:

Time Control 2   Time Control 1   Elo   Win      Draw     Loss
20+0.2           10+0.1           144   44.97%   49.20%   5.83%
40+0.4           20+0.2           133   41.27%   54.00%   4.73%
80+0.8           40+0.4           112   36.67%   57.93%   5.40%
160+1.6          80+0.8           101   32.67%   63.03%   4.30%
320+3.2          160+1.6           93   30.47%   65.33%   4.20%
640+6.4          320+3.2           73   25.17%   70.47%   4.37%
1280+12.8        640+6.4           59   21.77%   73.17%   5.07%
2560+25.6        1280+12.8         51   18.97%   76.63%   4.40%

Elo-Rating & Win-Probability

see Pawn Advantage, Win Percentage, and Elo

Expected win_ratio, win_probability (E)
Elo Rating Difference (Δ) = Elo_Player1 - Elo_Player2
E = 1 / (1 + 10^{-Δ/400})
Δ = 400 · log_{10}(E / (1 - E))

Generalization of the Elo-Formula: win_probability of player i in a tournament with n players

E_i = 10^{Elo_i/400} / (10^{Elo_1/400} + 10^{Elo_2/400} + ... + 10^{Elo_{n-1}/400} + 10^{Elo_n/400})
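
The Elo relations above can be sketched in Python as follows. The function names are illustrative assumptions. The n-player function also assumes that each rating is divided by 400 in the exponent, so that the case n = 2 reduces to the two-player formula:

```python
import math


def expected_score(delta):
    """Expected score E for an Elo rating difference delta (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def elo_difference(e):
    """Elo rating difference corresponding to an expected score 0 < e < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("expected score must lie strictly between 0 and 1")
    return 400.0 * math.log10(e / (1.0 - e))


def expected_scores(ratings):
    """Generalization to n players (assumption: exponent is rating / 400)."""
    top = max(ratings)  # shift by the maximum to avoid overflow of 10**x
    weights = [10.0 ** ((r - top) / 400.0) for r in ratings]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Score fraction for 44.97% wins and 49.20% draws
    score = 0.4497 + 0.4920 / 2
    print(round(elo_difference(score)))
```

With 44.97% wins and 49.20% draws, the first row of the doubling time control table, the score fraction is about 0.696, which corresponds to the 144 Elo listed there.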

Likelihood of Superiority

See LOS Table

The likelihood of superiority (LOS) denotes how likely it would be for two players of the same strength to reach a certain result - in other fields called a p-value, a measure of statistical significance of a departure from the null hypothesis [6]. When doing this analysis after the tournament, one has to differentiate between the case where one knows that a certain engine is either stronger or equally strong (directional or one-tailed test) and the case where one has no information on whether the other engine is stronger or weaker (non-directional or two-tailed test). The latter, due to the reduced information, results in larger confidence intervals.

Two-tailed Test
Null- and alternative hypothesis:

H0 : Elo_Player1 = Elo_Player2 

H1 : Elo_Player1 ≠ Elo_Player2 
LOS = P(Score > score of 2 programs with equal strength)

The probability of the null hypothesis being true can be calculated given the tournament outcome. In other words, how likely would it be for two players of the same strength to reach a certain result? The LOS would then be the complement, 1 minus the resulting probability.

For this type of analysis the trinomial distribution, a generalization of the binomial distribution, is needed. Whilst the binomial distribution can only calculate the probability of reaching a certain outcome with two possible events, the trinomial distribution can account for all three possible events (win, draw, loss).

The following function gives the probability of a certain game outcome assuming both players were of equal strength:

win_probability = (1 - draw_ratio) / 2
P(wins, draws, losses) = N! / (wins! · draws! · losses!) · win_probability^{wins} · draw_ratio^{draws} · win_probability^{losses}
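
A minimal Python sketch of the trinomial probability above, for two players of equal strength. The function name is an illustrative assumption. Working with logarithms via `math.lgamma` is a design choice to keep the factorials from overflowing:

```python
import math


def trinomial_probability(wins, draws, losses, draw_ratio):
    """P(wins, draws, losses) for equal-strength players with a given draw ratio."""
    if not 0.0 < draw_ratio < 1.0:
        raise ValueError("draw_ratio must lie strictly between 0 and 1")
    n = wins + draws + losses
    win_probability = (1.0 - draw_ratio) / 2.0   # equal strength: P(win) = P(loss)
    # log of N! / (wins! draws! losses!), using lgamma(k + 1) = log(k!)
    log_coefficient = (math.lgamma(n + 1) - math.lgamma(wins + 1)
                       - math.lgamma(draws + 1) - math.lgamma(losses + 1))
    log_p = (log_coefficient
             + (wins + losses) * math.log(win_probability)
             + draws * math.log(draw_ratio))
    return math.exp(log_p)


if __name__ == "__main__":
    # Hypothetical example: 1 win, 1 draw, 0 losses at a 50% draw ratio
    print(trinomial_probability(1, 1, 0, 0.5))
```

Summing this probability over all outcomes at least as favourable as the observed one gives the tail probability from which the LOS is derived.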

This calculation becomes very inefficient for a large number of games. In this case the normal distribution can give a good approximation:

N(N/2, N(1-draw_ratio))

where N(1 - draw_ratio) is the sum of wins and losses:

N(N/2, wins + losses)

To calculate the LOS one needs the cumulative distribution function of the given normal distribution. However, as pointed out by Rémi Coulom, the calculation can be done cleverly, and the normal approximation is not really required [7]. As further emphasized by Kai Laskos [8] and Rémi Coulom [9] [10], draws do not count in the LOS calculation, and it makes no difference whether the game results were obtained playing Black or White. The following is a good approximation when the two players played the same number of games with each color:

LOS = Φ((wins - losses)/√(wins + losses))

LOS = ½[1 + erf((wins - losses)/√(2wins + 2losses))]

[11] [12] [13]

One-tailed Test
Null- and alternative hypothesis:

H0 : Elo_Player1 ≤ Elo_Player2 

H1 : Elo_Player1 > Elo_Player2 

Sample Program
A tiny C++11 program to compute Elo difference and LOS from W/L/D counts was given by Álvaro Begué [14] :

#include <cstdio>
#include <cstdlib>
#include <cmath>

int main(int argc, char **argv) {
  if (argc != 4) {
    std::printf("Wrong number of arguments.\n\nUsage:%s <wins> <losses> <draws>\n", argv[0]);
    return 1;
  }
  int wins = std::atoi(argv[1]);
  int losses = std::atoi(argv[2]);
  int draws = std::atoi(argv[3]);

  double games = wins + losses + draws;
  std::printf("Number of games: %g\n", games);
  // Score fraction: a draw counts as half a win
  double winning_fraction = (wins + 0.5*draws) / games;
  std::printf("Winning fraction: %g\n", winning_fraction);
  // Invert the logistic Elo formula E = 1 / (1 + 10^(-elo/400))
  double elo_difference = -std::log(1.0/winning_fraction-1.0)*400.0/std::log(10.0);
  std::printf("Elo difference: %+g\n", elo_difference);
  // LOS via the error function; draws do not enter the calculation
  double los = .5 + .5 * std::erf((wins-losses)/std::sqrt(2.0*(wins+losses)));
  std::printf("LOS: %g\n", los);
}

Statistical Analysis

The trinomial versus the 5-nomial model

As indicated above a match between two engines is usually modeled as a sequence of independent trials taken from a trinomial distribution with probabilities (win_ratio,draw_ratio,loss_ratio). This model is appropriate for a match with randomly selected opening positions and randomly assigned colors (to maintain fairness). However one may show that under reasonable elo models the trinomial model is not correct in case games are played in pairs with reversed colors (as is commonly the case) and unbalanced opening positions are used.

This was also empirically observed by Kai Laskos [15] . He noted that the statistical predictions of the trinomial model do not match reality very well in the case of paired games. In particular he observed that for some data sets the variance of the match score as predicted by the trinomial model greatly exceeds the variance as calculated by the jackknife estimator. The jackknife estimator is a non-parametric estimator, so it does not depend on any particular statistical model. It appears the mismatch may even occur for balanced opening positions, an effect which can only be explained by the existence of correlations between paired games - something not considered by any elo model.

Overestimating the variance of the match score implies that derived quantities, such as the number of games required to establish the superiority of one engine over another with a given level of significance, are also overestimated. To obtain agreement between statistical predictions and actual measurements one may adopt the more general 5-nomial model. In the 5-nomial model the outcome of paired games is assumed to follow a 5-nomial distribution with probabilities

(p_0, p_{1/2}, p_1, p_{3/2}, p_2)

These unknown probabilities may be estimated from the outcome frequencies of the paired games and then subsequently be used to compute an estimate for the variance of the match score. Summarizing: in the case of paired games the 5-nomial model handles the following effects correctly which the trinomial model does not:

  • Unbalanced openings
  • Correlations between paired games
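
The estimation step described above can be sketched in Python as follows: the five probabilities are estimated from the observed frequencies of game pairs, and the variance of the match score is derived from them. The function name and the order of the counts (pairs scoring 0, 1/2, 1, 3/2 and 2 points) are assumptions made for this illustration:

```python
def pentanomial_stats(pair_counts):
    """Estimate the per-game score and its variance from game-pair outcome counts.

    pair_counts: five counts of game pairs scoring 0, 1/2, 1, 3/2 and 2 points.
    Returns (score, variance), where score is the mean score per game and
    variance is the estimated variance of that mean.
    """
    if len(pair_counts) != 5:
        raise ValueError("expected five pair counts")
    pairs = sum(pair_counts)
    if pairs == 0:
        raise ValueError("no game pairs played")
    pair_scores = (0.0, 0.5, 1.0, 1.5, 2.0)
    probs = [c / pairs for c in pair_counts]          # estimated p_0 ... p_2
    mean_pair = sum(p * s for p, s in zip(probs, pair_scores))
    var_pair = sum(p * (s - mean_pair) ** 2 for p, s in zip(probs, pair_scores))
    score = mean_pair / 2.0                           # two games per pair
    variance = var_pair / (4.0 * pairs)               # Var(mean pair score) / 4
    return score, variance


if __name__ == "__main__":
    # Hypothetical pair frequencies, not measured data
    print(pentanomial_stats([20, 180, 500, 250, 50]))
```

Because the variance is computed per pair, correlations between the two games of a pair are captured automatically, which the trinomial model cannot do.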

For further discussion on the potential use of unbalanced opening positions in engine testing see the posting by Kai Laskos [16] .

SPRT

The sequential probability ratio test (SPRT) is a specific sequential hypothesis test - a statistical analysis where the sample size is not fixed in advance - developed by Abraham Wald [17]. While originally developed for use in quality control studies in the realm of manufacturing, SPRT has been formulated for use in the computerized testing of human examinees as a termination criterion [18]. As mentioned by Arthur Guez in his 2015 Ph.D. thesis Sample-based Search Methods for Bayes-Adaptive Planning [19], Alan Turing, assisted by Jack Good, used a similar sequential testing technique to help decipher Enigma codes at Bletchley Park [20]. SPRT is applied in Stockfish testing to terminate self-testing series early if the result is likely outside a given Elo window [21]. In August 2016, Michel Van den Bergh posted the following Python code in CCC to implement the SPRT à la Cutechess-cli or Fishtest: [22] [23]

from __future__ import division

import math

def LL(x):
    return 1/(1+10**(-x/400))

def LLR(W,D,L,elo0,elo1):
    """
This function computes the log likelihood ratio of H0:elo_diff=elo0 versus
H1:elo_diff=elo1 under the logistic elo model

expected_score=1/(1+10**(-elo_diff/400)).

W/D/L are respectively the Win/Draw/Loss count. It is assumed that the outcomes of
the games follow a trinomial distribution with probabilities (w,d,l). Technically
this is not quite an SPRT but a so-called GSPRT as the full set of parameters (w,d,l)
cannot be derived from elo_diff, only w+(1/2)d. For a description and properties of
the GSPRT (which are very similar to those of the SPRT) see

http://stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf

This function uses the convenient approximation for log likelihood
ratios derived here:

http://hardy.uhasselt.be/Toga/GSPRT_approximation.pdf

The previous link also discusses how to adapt the code to the 5-nomial model
discussed above.
"""
    # avoid division by zero
    if W==0 or D==0 or L==0:
        return 0.0
    N=W+D+L
    w,d,l=W/N,D/N,L/N
    s=w+d/2
    m2=w+d/4
    var=m2-s**2
    var_s=var/N
    s0=LL(elo0)
    s1=LL(elo1)
    return (s1-s0)*(2*s-s0-s1)/var_s/2.0

def SPRT(W,D,L,elo0,elo1,alpha,beta):
    """
This function sequentially tests the hypothesis H0:elo_diff=elo0 versus
the hypothesis H1:elo_diff=elo1 for elo0<elo1. It should be called after
each game until it returns either 'H0' or 'H1' in which case the test stops
and the returned hypothesis is accepted.

alpha is the probability that H1 is accepted while H0 is true
(a false positive) and beta is the probability that H0 is accepted
while H1 is true (a false negative). W/D/L are the current win/draw/loss
counts, as before.
"""
    LLR_=LLR(W,D,L,elo0,elo1)
    LA=math.log(beta/(1-alpha))
    LB=math.log((1-beta)/alpha)
    if LLR_>LB:
        return 'H1'
    elif LLR_<LA:
        return 'H0'
    else:
        return ''
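
The sequential use described in the SPRT docstring ("it should be called after each game") can be illustrated with a small simulation. The following is only a sketch under stated assumptions: it restates the LLR approximation from the code above in a self-contained form, and the names expected_score, llr and run_sprt, as well as the fixed draw_ratio and the max_games cap, are hypothetical helpers introduced here for illustration; they are not part of the original posting.

```python
import math
import random


def expected_score(elo):
    """Logistic Elo model: expected score for a given Elo difference."""
    return 1 / (1 + 10 ** (-elo / 400))


def llr(wins, draws, losses, elo0, elo1):
    """Approximate log likelihood ratio, same formula as LLR above."""
    # As in the original code, return 0 until every outcome has occurred.
    if wins == 0 or draws == 0 or losses == 0:
        return 0.0
    n = wins + draws + losses
    w, d = wins / n, draws / n
    s = w + d / 2                       # observed mean score
    var_s = (w + d / 4 - s ** 2) / n    # variance of the mean score
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return (s1 - s0) * (2 * s - s0 - s1) / var_s / 2.0


def run_sprt(true_elo, draw_ratio, elo0, elo1, alpha, beta, rng,
             max_games=200000):
    """Simulate games one at a time until the SPRT stops.

    Assumption: games are independent with a fixed draw ratio and a win
    probability chosen so that the expected score matches true_elo.
    Returns (verdict, games_played); verdict is '' if max_games is hit.
    """
    p_win = expected_score(true_elo) - draw_ratio / 2
    if p_win < 0 or p_win + draw_ratio > 1:
        raise ValueError("draw_ratio is incompatible with true_elo")
    lower = math.log(beta / (1 - alpha))    # accept H0 below this bound
    upper = math.log((1 - beta) / alpha)    # accept H1 above this bound
    wins = draws = losses = 0
    for game in range(1, max_games + 1):
        r = rng.random()
        if r < p_win:
            wins += 1
        elif r < p_win + draw_ratio:
            draws += 1
        else:
            losses += 1
        value = llr(wins, draws, losses, elo0, elo1)
        if value > upper:
            return 'H1', game
        if value < lower:
            return 'H0', game
    return '', max_games


if __name__ == "__main__":
    rng = random.Random(2016)
    # H0: elo_diff = 0 versus H1: elo_diff = 5, with alpha = beta = 0.05
    print(run_sprt(3.0, 0.6, 0.0, 5.0, 0.05, 0.05, rng))
```

With alpha = beta = 0.05 the two bounds are log(0.05/0.95) and log(0.95/0.05), i.e. symmetric around zero. The number of games needed varies strongly from run to run, which is why the sketch returns the game count together with the verdict.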

Besides the SPRT implementation above, Michel Van den Bergh provided a Python implementation using pentanomial frequencies along with a simulation tool [24] [25], and wrote a much faster multi-threaded C version [26] [27].
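
To give an idea of what using pentanomial frequencies means, the following minimal sketch applies the same LLR approximation as the trinomial code above, but takes the game pair (both colors of one opening) as the unit of observation. It is an illustration under stated assumptions and not the code from the repositories referenced above; the function name llr_pentanomial and the layout of the pairs list are hypothetical.

```python
def expected_score(elo):
    """Logistic Elo model: expected score for a given Elo difference."""
    return 1 / (1 + 10 ** (-elo / 400))


def llr_pentanomial(pairs, elo0, elo1):
    """Approximate LLR of H0:elo_diff=elo0 versus H1:elo_diff=elo1.

    pairs[i] is the number of game pairs in which the tested engine
    scored i half-points (i = 0..4), so the per-game score of a pair is
    i/4. Assumption: pairs are independent of each other, while the two
    games inside a pair may be correlated.
    """
    if len(pairs) != 5:
        raise ValueError("expected five pair counts")
    n = sum(pairs)
    if n == 0:
        return 0.0
    scores = [i / 4 for i in range(5)]
    # Mean and variance of the per-game score, measured per pair.
    s = sum(c * x for c, x in zip(pairs, scores)) / n
    var = sum(c * (x - s) ** 2 for c, x in zip(pairs, scores)) / n
    if var == 0:
        return 0.0  # avoid division by zero
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2 * s - s0 - s1) / (2 * var)


if __name__ == "__main__":
    # Same mean score, but the second sample has fewer lopsided pairs.
    print(llr_pentanomial([2, 10, 40, 30, 18], 0.0, 5.0))
    print(llr_pentanomial([0, 4, 52, 34, 10], 0.0, 5.0))
```

Because the variance is estimated from the pairs themselves, a correlation between the two games of a pair, for example from unbalanced openings, is reflected in the result, whereas the trinomial formula treats all games as independent.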

Tournaments

See also

Publications

1920 ...

1930 ...

1940 ...

1950 ...

1960 ...

1970 ...

1980 ...

1990 ...

2000 ...

2005 ...

2010 ...

2015 ...

Forum & Blog Postings

1996 ...

2000 ...

2005 ...

2010 ...

Re: Engine Testing - Statistics by John Major, CCC, January 14, 2010

2011

2012

2013

2014

2015 ...

Re: The SPRT without draw model, elo model or whatever.. by Michel Van den Bergh, CCC, August 18, 2016

2016

About expected scores and draw ratios by Jesús Muñoz, CCC, September 17, 2016

2017

Re: MATCH sanity by Salvatore Giannotti, CCC, May 03, 2017
ELO measurements by Peter Österlund, CCC, August 06, 2017 » Playing Strength
Re: "Intrinsic Chess Ratings" by Regan, Haworth -- by Kenneth Regan, CCC, November 20, 2017 » Who is the Master?

2018

2019

2020 ...

Re: Stockfish Reverts 5 Recent Patches by Michel Van den Bergh, CCC, February 02, 2020

2021

2022

External Links

Rating Systems

Tools

Statistics

Hypothesis Testing

Sequential Analysis

Data Visualization

Misc

ARMS Charity Concert, Madison Square Garden, December 08, 1983

References

  1. Image based on Standard deviation diagram by Mwtoews, April 7, 2007 with R code given, CC BY 2.5, Wikimedia Commons, Normal distribution from Wikipedia
  2. Kirr's Chess Engine Comparison KCEC - Draw rate » KCEC
  3. Doubling of time control by Andreas Strangmüller, CCC, October 21, 2016
  4. K93-Doubling-TC.pdf
  5. Scaling at 2x nodes (or doubling time control) by Kai Laskos, CCC, July 23, 2013
  6. Re: Likelihood Of Success (LOS) in the real world by Álvaro Begué, CCC, May 26, 2017
  7. Re: Calculating the LOS (likelihood of superiority) from results by Rémi Coulom, CCC, January 23, 2014
  8. Re: Calculating the LOS (likelihood of superiority) from results by Kai Laskos, CCC, January 22, 2014
  9. Re: Likelihood of superiority by Rémi Coulom, CCC, November 15, 2009
  10. Re: Likelihood of superiority by Rémi Coulom, CCC, November 15, 2009
  11. Error function from Wikipedia
  12. The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition: erf
  13. erf(x) and math.h by user76293, Stack Overflow, March 10, 2009
  14. Re: Calculating the LOS (likelihood of superiority) from results by Álvaro Begué, CCC, January 22, 2014
  15. Error margins via resampling (jackknifing) by Kai Laskos, CCC, August 12, 2016
  16. Properties of unbalanced openings using Bayeselo model by Kai Laskos, CCC, August 27, 2016
  17. Abraham Wald (1945). Sequential Tests of Statistical Hypotheses. Annals of Mathematical Statistics, Vol. 16, No. 2, doi: 10.1214/aoms/1177731118
  18. Sequential probability ratio test from Wikipedia
  19. Arthur Guez (2015). Sample-based Search Methods for Bayes-Adaptive Planning. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, pdf
  20. Jack Good (1979). Studies in the history of probability and statistics. XXXVII. A. M. Turing’s statistical work in World War II. Biometrika, Vol. 66, No. 2
  21. How (not) to use SPRT ? by BB+, OpenChess Forum, October 19, 2013
  22. Re: The SPRT without draw model, elo model or whatever.. by Michel Van den Bergh, CCC, August 18, 2016
  23. GSPRT approximation (pdf) by Michel Van den Bergh
  24. Re: Stockfish Reverts 5 Recent Patches by Michel Van den Bergh, CCC, February 02, 2020
  25. GitHub - vdbergh/pentanomial: SPRT for pentanomial frequencies and simulation tools by Michel Van den Bergh
  26. Re: Stockfish Reverts 5 Recent Patches by Michel Van den Bergh, CCC, February 02, 2020
  27. GitHub - vdbergh/simul: A multi-threaded pentanomial simulator by Michel Van den Bergh
  28. Law of comparative judgment - Wikipedia
  29. Elo's Book: The Rating of Chess Players by Sam Sloan
  30. The Master Game from Wikipedia
  31. Handwritten Notes on the 2004 David R. Hunter Paper 'MM Algorithms for Generalized Bradley-Terry Models' by Rémi Coulom
  32. Derivation of bayeselo formula by Rémi Coulom, CCC, August 07, 2012
  33. Mm algorithm from Wikipedia
  34. Pairwise comparison from Wikipedia
  35. Bayesian inference from Wikipedia
  36. How I did it: Diogo Ferreira on 4th place in Elo chess ratings competition | no free hunch
  37. "Intrinsic Chess Ratings" by Regan, Haworth -- seq by Kai Middleton, CCC, November 19, 2017
  38. Re: EloStat, Bayeselo and Ordo by Rémi Coulom, CCC, June 25, 2012
  39. Re: Understanding and Pushing the Limits of the Elo Rating Algorithm by Daniel Shawul, CCC, October 15, 2019
  40. Ply versus ELO by Greg, HIARCS Forum, May 30, 2020 » Diogo R. Ferreira - Impact of the Search Depth ...
  41. Ordo by Miguel A. Ballicora
  42. Pairwise Analysis of Chess Engine Move Selections by Adam Hair, CCC, April 17, 2011
  43. Questions regarding rating systems of humans and engines by Erik Varend, CCC, December 06, 2014
  44. chess statistics scientific article by Nuno Sousa, CCC, July 06, 2016
  45. Understanding and Pushing the Limits of the Elo Rating Algorithm by Michel Van den Bergh, CCC, October 15, 2019
  46. LOS Table by Joseph Ciarrochi from CEGT
  47. Arpad Elo and the Elo Rating System by Dan Ross, ChessBase News, December 16, 2007
  48. David R. Hunter (2004). MM Algorithms for Generalized Bradley-Terry Models. The Annals of Statistics, Vol. 32, No. 1, 384–406, pdf
  49. Type I and type II errors from Wikipedia
  50. Arpad Elo - Wikipedia
  51. Regan's latest: Depth of Satisficing by Carl Lumma, CCC, October 09, 2015
  52. Resampling (statistics) from Wikipedia
  53. Jackknife resampling from Wikipedia
  54. Delphil 3.3b2 (2334) - Stockfish 030916 (3228), TCEC Season 9 - Rapid, Round 11, September 16, 2016
  55. World Chess Championship 2016 from Wikipedia
  56. Normalized Elo (pdf) by Michel Van den Bergh
  57. Most accurate K-factor - Elo rating system from Wikipedia
  58. FIDE Chess Rating calculators: Chess Rating change calculator
  59. table for detecting significant difference between two engines by Joseph Ciarrochi, CCC, February 03, 2006
  60. an interesting study from Erik Varend by scandien, Hiarcs Forum, August 13, 2017
  61. A Visual Look at 2 Million Chess Games by Brahim Hamadicharef, CCC, November 02, 2017
