Engine Testing

19,067 bytes added, 18:20, 10 May 2018
'''[[Main Page|Home]] * Engine Testing'''

[[FILE:WileECoyote.jpg|border|right|thumb|link=http://sanseverything.wordpress.com/2008/01/16/hope-springs-eternal/|The ever-optimistic [https://en.wikipedia.org/wiki/Wile_E._Coyote_and_Road_Runner Wile E. Coyote] <ref>[http://sanseverything.wordpress.com/2008/01/16/hope-springs-eternal/ Hope Springs Eternal]</ref> ]]

'''Engine Testing''',<br/>
the process of both eliminating [https://en.wikipedia.org/wiki/Software_bug bugs] and measuring the [[Playing Strength|performance]] of a chess engine. New implementations of [[Move Generation|move generation]] are tested with [[Perft]], while new features and [[Automated Tuning|tuning]] of [[Search|search]] and [[Evaluation|evaluation]] are verified by [[Test-Positions|test-positions]] and by playing [[Match Statistics|matches]] against other engines.

=Bug Hunting=
* [[Perft]] ([[Perft Results]])
* [[Debugging]]

=Analyzing=
* [[Logging]]
* [[Profiling]]
* [[Search Statistics]]

=Tuning=
* [[Automated Tuning]]

=Test-Positions=
Running sets of test-positions and counting the number of solutions within a fixed time frame is useful to check whether things are broken after program changes, or to get hints about missing knowledge. But one should be careful about tuning engines based on test-position results, since solving (possibly tactical) test-positions does not necessarily correlate with practical [[Playing Strength|playing strength]] in matches against other opponents.
* [[Test-Positions]]
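
Counting solutions in a suite can be sketched as follows. This is only a minimal illustration in Python: it assumes the suite is stored as EPD records with a <code>bm</code> (best move) operation, and <code>choose_move</code> is a hypothetical callback standing in for "ask the engine for its move within the fixed time frame". The parser is deliberately simple and does not handle semicolons inside quoted operands.

```python
def parse_epd(line):
    """Split one EPD record into (position, operations).

    The first four fields describe the position; the rest is a list of
    'opcode operand;' operations such as 'bm Ra8#;' or 'id "demo.1";'.
    """
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    operations = {}
    if len(fields) > 4:
        for operation in fields[4].split(";"):
            operation = operation.strip()
            if operation:
                opcode, _, operand = operation.partition(" ")
                operations[opcode] = operand.strip().strip('"')
    return position, operations


def count_solved(epd_lines, choose_move):
    """Count records where the chosen move is among the 'bm' moves.

    choose_move is a hypothetical callback: position string -> move string.
    """
    solved = 0
    for line in epd_lines:
        position, operations = parse_epd(line)
        best_moves = operations.get("bm", "").split()
        if choose_move(position) in best_moves:
            solved += 1
    return solved


# A made-up demo record (back-rank mate), not taken from a published suite.
DEMO_SUITE = ['6k1/5ppp/8/8/8/8/8/R6K w - - bm Ra8#; id "demo.1";']
```

Comparing the count before and after a program change gives the quick sanity check described above. It says nothing about playing strength on its own.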
<span id="Matches"></span>
=Matches=
Most testing involves running different versions of a program in matches, and comparing results.

==Time Controls==
Generally speaking, changes that do not alter the search tree itself but only affect performance (e.g. [[Move Generation|move generation]]) can be tested with fixed nodes, fixed time or fixed depth. In all other cases, [[Time Management|time management]] should be left to the engine to simulate real tournament conditions. On the other hand, [[Debugging|debugging]] is much easier under fixed conditions, as the games become deterministic.
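
For [[UCI]] engines, the three fixed conditions correspond to the <code>go nodes</code>, <code>go depth</code> and <code>go movetime</code> commands of the protocol. A test harness might build them as in the following Python sketch. The function name and the mode keys are hypothetical and chosen only for illustration.

```python
# Mapping from a (hypothetical) harness setting to the UCI 'go' keyword.
UCI_LIMIT_KEYWORDS = {
    "nodes": "nodes",       # search exactly this many nodes
    "depth": "depth",       # search to this depth in plies
    "time_ms": "movetime",  # search for this many milliseconds
}


def uci_go_command(mode, value):
    """Return the UCI 'go' command for one fixed search limit."""
    if mode not in UCI_LIMIT_KEYWORDS:
        raise ValueError(f"unknown limit type: {mode}")
    if value <= 0:
        raise ValueError("limit must be positive")
    return f"go {UCI_LIMIT_KEYWORDS[mode]} {value}"
```

Fixed nodes or fixed depth are usually the better choice when reproducibility matters, since fixed time still depends on machine load.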

Aside from the type of [[Time Management#Time%20Controls|time control]], one also has to decide how much time should be spent per game, i.e. what the average quality of the games should be. While more changes can be tested in a given amount of time at short time controls, it also matters how a change scales to different strengths. For example, if one increases the [[Depth Reduction R|R]] in [[Null Move Pruning|null move pruning]] to 3 at depths > 7, the change can only be tested effectively at time controls where the new condition is triggered frequently enough, i.e. where the average search depth is far greater than seven. It is hard to generalize, but on average, changes to the search functions ([[Late Move Reductions|LMR]], [[Null Move|nullmove]], [[Futility Pruning|futility]] or similar [[Pruning|pruning]], [[Reductions|reductions]] and [[Extensions|extensions]]) tend to be more sensitive to the time control than the tuning of [[Evaluation|evaluation]] parameters.
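
The null move example can be made concrete with a small Python sketch. The baseline reduction of 2 is an assumption (the text only gives the new value of 3 at depths > 7), and the per-depth counts of null move attempts are assumed to come from the engine's own [[Search Statistics|search statistics]].

```python
BASE_R = 2          # assumed baseline reduction, not stated in the text
DEEP_R = 3          # new reduction from the example above
DEEP_THRESHOLD = 7  # the new value applies at depths greater than this


def null_move_reduction(depth):
    """Depth-dependent R: 3 at depths > 7, otherwise the baseline."""
    return DEEP_R if depth > DEEP_THRESHOLD else BASE_R


def share_of_deep_attempts(attempts_by_depth):
    """Fraction of null move attempts where the new condition applies.

    attempts_by_depth maps remaining depth -> number of null move attempts,
    e.g. collected by a counter inside the search.
    """
    total = sum(attempts_by_depth.values())
    if total == 0:
        return 0.0
    deep = sum(count for depth, count in attempts_by_depth.items()
               if depth > DEEP_THRESHOLD)
    return deep / total
```

If the share is close to zero at the chosen time control, the match result says little about the change.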

==Opening==
During testing, the engines should ideally play the same style of openings they would play in a normal tournament, so as not to optimize them for different types of positions. One option is to use the engine's own [[Opening Book|opening book]]; another is to use [[Test-Positions#OpeningSuites|opening suites]], sets of quiet test positions. In the latter case, the same opening suite is used for every tournament, and each position is played a second time with colors reversed. These measures help minimize the disparity between tests caused by different openings.
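
Playing each suite position twice with colors reversed amounts to a schedule like the one in this Python sketch. The function and the tuple layout are hypothetical; existing match tools provide their own options for this.

```python
def paired_schedule(openings, engine_a, engine_b):
    """Return a list of (opening, white, black) games.

    Every opening is played twice, with the engines swapping colors,
    so that neither side benefits from an unbalanced start position.
    """
    games = []
    for opening in openings:
        games.append((opening, engine_a, engine_b))
        games.append((opening, engine_b, engine_a))
    return games
```

Using the same list of openings in the same order for every test run keeps results comparable between runs.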

==Interfaces==
Free [[GUI|graphical user interfaces]] or [[CLI|command line tools]] that run engine-engine matches between [[UCI]] and [[Chess Engine Communication Protocol]] compatible engines include:
* [[Amoeba#TournamentManager|Amoeba Tournament Manager]]
* [[Arena]] by [[Martin Blume]] <ref>[http://www.playwitharena.com/ Free chess graphical user interface (GUI) Arena for chess engines]</ref>
* [[Cute Chess]] by [[Arto Jonsson]] and [[Ilari Pihlajisto]]
: [[Cutechess-cli]]
* [[LittleBlitzer]] by [[Nathan Thom]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=35249 Fast Games] by [[Manuel Diaz]], [[CCC]], July 02, 2010</ref>

==Frameworks==
* [[Stockfish#TestingFramework|Fishtest]]
<span id="ChessServer"></span>
==Chess Server==
One can also test an engine's performance by comparing it to other programs on the various internet platforms <ref>[https://en.wikipedia.org/wiki/Category:Internet_chess_servers Internet chess servers from Wikipedia]</ref>. In this case, differences in hardware and in features such as [[Endgame Tablebases]] or [[Opening Book|Opening Books]] have to be considered.
* [[Chess Server]]
* [[Tournaments and Matches|Tournaments]]

==Statistics==
The question of whether certain results actually indicate a [[Playing Strength|strength]] increase can be answered with
* [[Match Statistics]]
* [[Pawn Advantage, Win Percentage, and Elo]]
* [[LOS Table]]
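
As a rough illustration, two commonly used quantities can be computed from a match result as in this Python sketch. It assumes the usual logistic Elo model for converting a score fraction to a rating difference, and a normal approximation for the likelihood of superiority (LOS) that ignores draws. The function names are hypothetical.

```python
import math


def score_fraction(wins, draws, losses):
    """Match score as a fraction, counting a draw as half a point."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Elo difference implied by a score fraction (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def likelihood_of_superiority(wins, losses):
    """Approximate probability that the first engine is the stronger one."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))
```

For example, a score of 75% corresponds to roughly 191 Elo under this model. A small Elo difference from few games usually comes with a LOS not far from 50%, which is why the number of games matters as much as the score.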
<span id="Ratings"></span>
==Ratings==
* [[Rating System]]
* [[Engine Rating Lists]]
<span id="TestResults"></span>
=Test Results=
* [[Null Move Pruning Test Results]]
* [[Late Move Reduction Test Results]]
<span id="bugs"></span>
=Notable Bugs=
* [[Brute Force (Program)]], [[En passant#bugs|En passant bug]], [[ACM 1977]] and [[ACM 1978]]
* [[Coko#MateBug|Coko - Mate in One?]], [[ACM 1971]]
* [[4th Computer Olympiad#PromotionBug|Chess 2175X vs. Genesis]], [[Promotions|Promotion]] bug, [[4th Computer Olympiad#Chess|4th Computer Olympiad 1992]]
* [[WMCCC 1993#NimzoBug|Nimzo's winning white-black bug]], [[WMCCC 1993]]
* [[Novag Micro Chess#CastlingBug|Novag Micro Chess - Castling bug]], [[CPWTIPC 1981]]
* [[Proscha#bugs|Proscha]] capturing its own king versus [[Daja]], [[First GI Computer Chess Tournament|First GI Computer Chess Tournament 1975]]
* [[WMCCC 1995#TalXXXX|System Tal vs. XXXX]], [[Promotions|Promotion]] bug, [[WMCCC 1995]]
* [[XiniX|Xinix - Mate in One]], [[DOCCC 2000]] <ref>[http://www.xs4all.nl/~timkr/chess2/honor.htm Defending Humanity's Honor] by [https://en.wikipedia.org/wiki/Tim_Krabb%C3%A9 Tim Krabbé]</ref>

=Publications=
* [[Tony Marsland]], [[Paul Rushton]] ('''1973'''). ''[http://dl.acm.org/citation.cfm?id=805703 Mechanisms for Comparing Chess Programs].'' [[ACM 1973|ACM Annual Conference]], [http://webdocs.cs.ualberta.ca/~tony/OldPapers/Marsland-Rushton-ACM73 pdf]
* [[Tim Breitkreutz]], [[Jonathan Schaeffer]] ('''1984'''). ''Computer vs Computer via Computer''. [[ICGA Journal#7_4|ICCA Journal, Vol. 7, No. 4]]
* [[John Stanback]] ('''1990'''). ''Supercomputing '90: Computer-Chess Testing and Programming Session''. [[ICGA Journal#13_4|ICCA Journal, Vol. 13, No. 4]] » [[ACM 1990]]
* [[Larry Kaufman]] ('''1993'''). ''How Our PC Chess Programs Are Developed''. [[Computer Chess Reports]] 1992-93, Vol. 3, No. 2, pp. 12
* [[Thomas Mally]] ('''1993'''). ''Matt in Wieviel?'' [[PC Schach]] 3/93 (German)
* [[Jeff Rollason]] ('''2007'''). ''[http://www.aifactory.co.uk/newsletter/2007_04_stat_minefields.htm Statistical Minefields with Version Testing]''. [[AI Factory]], Winter 2007 » [[Match Statistics]]
* [[Jónheiður Ísleifsdóttir]] ('''2007'''). ''GTQL: A Query Language for Game Trees''. M.Sc. thesis, [https://en.wikipedia.org/wiki/Reykjav%C3%ADk_University Reykjavík University], [http://www.ru.is/lisalib/getfile.aspx?itemid=9655 pdf]
* [[Jónheiður Ísleifsdóttir]], [[Yngvi Björnsson]]. ('''2008'''). ''[http://link.springer.com/chapter/10.1007/978-3-540-87608-3_20 GTQ: A Language and Tool for Game-Tree Analysis]''. [[CG 2008]], [http://www.ru.is/faculty/yngvi/pdf/IsleifsdottirB08.pdf pdf]

=Forum Posts=
==1995 ...==
* [http://groups.google.com/group/rec.games.chess.computer/browse_frm/thread/2aaa054105f1445e Testing Chess Programs] by [[Jan Eric Larsson]], [[Computer Chess Forums|rgcc]], February 09, 1996
* [https://www.stmintz.com/ccc/index.php?id=13569 Self-test and others rating stuffs...] by [[Christophe Théron]], [[CCC]], January 01, 1998
* [https://www.stmintz.com/ccc/index.php?id=16851 Proposal: New testing methods for SSDF (1)] by [[Jeroen Noomen]], [[CCC]], April 13, 1998
==2000 ...==
* [https://www.stmintz.com/ccc/index.php?id=176716 Using 2 machines for matches (Linux)] by [[Jon Dart]], [[CCC]], June 24, 2001 » [[XBoard]], [[Linux]]
* [https://www.stmintz.com/ccc/index.php?id=189308 A proposed WAC replacement for testing] by [[Gian-Carlo Pascutto]], [[CCC]], September 18, 2001 » [[Win at Chess]]
* [https://www.stmintz.com/ccc/index.php?id=275347 Value of playing different versions of a program against each other] by [[Tom King]], [[CCC]], January 06, 2003
* [https://www.stmintz.com/ccc/index.php?id=293815 testing of evaluation function] by Steven Chu, [[CCC]], April 17, 2003 » [[Evaluation]]
* [https://www.stmintz.com/ccc/index.php?id=296689 Testing the reliability of forward pruning] by [[Russell Reagan]], [[CCC]], May 15, 2003 » [[Pruning]]
* [https://www.stmintz.com/ccc/index.php?id=334370 To programmers: Hints for testing after a partial rewrite] by [[Federico Andrés Corigliano|Federico Corigliano]], [[CCC]], December 08, 2003
* [https://www.stmintz.com/ccc/index.php?id=400589 Is there a way?] by [[Ed Schroder|Ed Schröder]], [[CCC]], December 13, 2004
==2005 ...==
* [https://www.stmintz.com/ccc/index.php?id=484357 table for detecting significant difference between two engines] by Joseph Ciarrochi, [[CCC]], February 03, 2006
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=5866 test methodology] by [[Giuseppe Cannella]], [[Computer Chess Forums|Winboard Forum]], November 13, 2006
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=5955 Testing and debugging chess engines] by [[Patrice Duhamel]], [[Computer Chess Forums|Winboard Forum]], December 03, 2006
* [http://www.talkchess.com/forum/viewtopic.php?t=13557 Programmer bug hunt challenge] by [[Ed Schroder|Ed Schröder]], [[CCC]], May 04, 2007 » [[Portable Game Notation]], [[En passant]]
* [http://www.talkchess.com/forum/viewtopic.php?t=13800 a beat b,b beat c,c beat a question] by [[Uri Blass]], [[CCC]], May 16, 2007 » [[Playing Strength]]
* [http://www.talkchess.com/forum/viewtopic.php?t=16412 An objective test process for the rest of us?] by [[Nicolai Czempin]], [[CCC]], September 12, 2007
* [http://www.talkchess.com/forum/viewtopic.php?t=17947 My new testing scheme] by [[Zach Wegner]], [[CCC]], November 20, 2007
* [http://www.talkchess.com/forum/viewtopic.php?t=22832 New testing thread] by [[Robert Hyatt]], [[CCC]], August 07, 2008
* [http://www.talkchess.com/forum/viewtopic.php?t=27024 Cutechess-cli: A command line tool for engine-engine matches] by [[Ilari Pihlajisto]], [[CCC]], March 16, 2009
* [http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=293506&t=27024 Cutechess-cli version 0.1.8 released] by [[Ilari Pihlajisto]], [[CCC]], September 29, 2009
* [http://www.talkchess.com/forum/viewtopic.php?t=30513 A reason for testing at fixed number of nodes] by [[J. Wesley Cleveland]], [[CCC]], November 06, 2009
* [http://www.talkchess.com/forum/viewtopic.php?t=30550 different kinds of testing] by [[Don Dailey]], [[CCC]], November 09, 2009
* [http://www.talkchess.com/forum/viewtopic.php?t=30565 more on fixed nodes] by [[Robert Hyatt]], [[CCC]], November 10, 2009
==2010 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=32254 XBoard and epd tournament] by [[Vlad Stamate]], [[CCC]], January 31, 2010 » [[Chess Engine Communication Protocol]]
* [http://www.talkchess.com/forum/viewtopic.php?t=33685 Long game vs short game testing] by [[Vlad Stamate]], [[CCC]], April 08, 2010
* [http://www.talkchess.com/forum/viewtopic.php?t=35537 Pairings generation based on a big PGN file] by [[Harun Taner]], [[CCC]], July 22, 2010
* [http://www.talkchess.com/forum/viewtopic.php?p=358600 hiatus good for bug-finding] by [[Stuart Cracraft]], [[CCC]], June 27, 2010
'''2011'''
* [http://www.talkchess.com/forum/viewtopic.php?t=39255 testing question] by [[Larry Kaufman]], [[CCC]], June 01, 2011
* [http://www.talkchess.com/forum/viewtopic.php?t=39390 Debugging regression tests] by [[Onno Garms]], [[CCC]], June 16, 2011 <ref>[https://en.wikipedia.org/wiki/Regression_testing Regression testing from Wikipedia]</ref>
'''2012'''
* [http://www.talkchess.com/forum/viewtopic.php?t=41876 fast game testing] by [[Jon Dart]], [[CCC]], January 08, 2012
* [http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=477389&t=44706&sid=8b10031146b44f86ac4c4a129debf451 Your best bug ?] by [[Ed Schroder|Ed Schröder]], [[CCC]], August 06, 2012
* [http://www.talkchess.com/forum/viewtopic.php?t=45158 Yet Another Testing Question] by [[Brian Richardson]], [[CCC]], September 15, 2012
* [http://www.talkchess.com/forum/viewtopic.php?t=45287 Another testing question] by [[Larry Kaufman]], [[CCC]], September 23, 2012
* [http://www.talkchess.com/forum/viewtopic.php?t=46572 A word for casual testers] by [[Don Dailey]], [[CCC]], December 25, 2012
'''2013'''
* [http://www.talkchess.com/forum/viewtopic.php?t=46759 A poor man's testing environment] by [[Ed Schroder|Ed Schröder]], [[CCC]], January 04, 2013 <ref>[http://www.top-5000.nl/tuning.htm Testing a chess engine from the ground up] from [http://www.top-5000.nl/ Home of the Dutch Rebel] by [[Ed Schroder|Ed Schröder]]</ref> » [[Match Statistics]]
* [http://www.talkchess.com/forum/viewtopic.php?t=46948 engine-engine testing isues] by [[Jens Bæk Nielsen]], [[CCC]], January 20, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=47407 Beta for Stockfish distributed testing] by [[Gary Linscott|Gary]], [[CCC]], March 05, 2013 » [[Stockfish#TestingFramework|Fishtest]]
* [http://www.talkchess.com/forum/viewtopic.php?t=47885 Fishtest Distributed Testing Framework] by [[Marco Costalba]], [[CCC]], May 01, 2013 » [[Stockfish#TestingFramework|Fishtest]]
* [http://www.talkchess.com/forum/viewtopic.php?t=48626 cutechess-cli 0.6.0 released] by [[Ilari Pihlajisto]], [[CCC]], July 12, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=49054 fast testing NIT algorithm] by [[Don Dailey]], [[CCC]], August 22, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=49103 OICS: Computers Only ICS based Chess server for anyone] by [[Joshua Shriver]], [[CCC]], August 26, 2013 » [[OICS]]
'''2014'''
* [http://www.talkchess.com/forum/viewtopic.php?t=51383 testing procedure] by [[Daniel José Queraltó]], [[CCC]], February 23, 2014
==2015 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=57437 Bullet vs regular time control, say 40/4m CCRL/CEGT] by [[Ed Schroder|Ed Schröder]], [[CCC]], August 29, 2015
* [http://www.talkchess.com/forum/viewtopic.php?t=58359 Static evaluation test posistions] by [[Shawn Chidester]], [[CCC]], November 25, 2015
: [http://www.talkchess.com/forum/viewtopic.php?t=58359&start=2 Re: Static evaluation test posistions] by [[Ferdinand Mosca]], [[CCC]], November 26, 2015 » [[Python]]
'''2016'''
* [http://www.talkchess.com/forum/viewtopic.php?t=59038 Ordo 1.0.9 (new features for testers)] by [[Miguel A. Ballicora]], [[CCC]], January 25, 2016
* [http://www.talkchess.com/forum/viewtopic.php?t=59984 cluster versus single server] by [[Folkert van Heusden]], [[CCC]], April 28, 2016
* [http://www.talkchess.com/forum/viewtopic.php?t=61422 Testing using many computers and architectures] by [[Andrew Grant]], [[CCC]], September 14, 2016
* [http://www.talkchess.com/forum/viewtopic.php?t=61988 command line engine match?] by [[Erin Dame]], [[CCC]], November 06, 2016 » [[CLI]]
* [http://www.talkchess.com/forum/viewtopic.php?t=62576 Testing with different EPD suits for search vs eval changes] by [[Michael Sherwin]], [[CCC]], December 23, 2016
'''2017'''
* [http://www.talkchess.com/forum/viewtopic.php?t=62922 sprt tourney manager] by [[Richard Delorme]], [[CCC]], January 24, 2017 » [[Amoeba#TournamentManager|Amoeba Tournament Manager]], [[Match Statistics#SPRT|SPRT]]
* [http://www.talkchess.com/forum/viewtopic.php?t=63001 how to properly test the changes to the engine ?] by [[Mahmoud Uthman]], [[CCC]], February 01, 2017
* [http://www.talkchess.com/forum/viewtopic.php?t=63119 How to go about chasing a bug like this?] by [[Colin Jenkins]], [[CCC]], February 09, 2017 » [[Debugging]]
* [http://www.talkchess.com/forum/viewtopic.php?t=63454 How to find SMP bugs ?] by Lucas Braesch, [[CCC]], March 15, 2017 » [[Debugging]], [[Lazy SMP]]
* [http://www.talkchess.com/forum/viewtopic.php?t=63555 Testing for Move Ordering Improvements] by [[Cheney Nattress]], [[CCC]], March 25, 2017 » [[Move Ordering]], [[Search Statistics]]
* [http://www.talkchess.com/forum/viewtopic.php?t=64356 Testing endgame strength] by [[Álvaro Begué]], [[CCC]], June 21, 2017 » [[Endgame]], [[RuyDos]]
* [http://www.talkchess.com/forum/viewtopic.php?t=64358 Opening testing suites efficiency] by [[Kai Laskos]], [[CCC]], June 21, 2017 » [[Opening]], [[Match Statistics]]
* [http://www.talkchess.com/forum/viewtopic.php?t=64394 Testing A against B by playing a pool of others] by [[Andrew Grant]], [[CCC]], June 24, 2017 » [[Match Statistics]]
* [http://www.talkchess.com/forum/viewtopic.php?t=64441 Core behaviour] by [[Ed Schroder]], [[CCC]], June 28, 2017 » [[Process]], [[Thread]]
* [http://www.talkchess.com/forum/viewtopic.php?t=64519 Engine testing & error margin ?] by [[Mahmoud Uthman]], [[CCC]], July 05, 2017
* [http://www.talkchess.com/forum/viewtopic.php?t=65766 Engines for testing (Linux, fast time control)] by [[Jon Dart]], [[CCC]], November 18, 2017 » [[Linux]]

=External Links=
* [http://ajonsson.kapsi.fi/cutechess.html Cute Chess]
* [https://github.com/cutechess/cutechess cutechess · GitHub]
* [http://www.top-5000.nl/tuning.htm Testing a chess engine from the ground up] from [http://www.top-5000.nl/ Home of the Dutch Rebel] by [[Ed Schroder|Ed Schröder]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=46759 A poor man's testing environment] by [[Ed Schroder|Ed Schröder]], [[CCC]], January 04, 2013</ref> » [[Match Statistics]]
* [https://en.wikipedia.org/wiki/Regression_testing Regression testing from Wikipedia]
* [http://www.sp-cc.de/index.htm SPCC] by [[Stefan Pohl]]
* [http://research.microsoft.com/en-us/projects/chess/ CHESS - Microsoft Research] a tool for finding and reproducing [https://en.wikipedia.org/wiki/Unusual_software_bug Heisenbugs] in concurrent programs.
* [https://en.wikipedia.org/wiki/Engine_test_stand Engine test stand from Wikipedia]
* [[Videos#TerjeRypdal|Terje Rypdal Group]] feat. [[Videos#PalleMikkelborg|Palle Mikkelborg]], [https://de.wikipedia.org/wiki/Haakon_Graf Håkon Graf], [http://no.wikipedia.org/wiki/Sveinung_Hovensj%C3%B8 Sveinung Hovensjø] and [[Videos#JonChristensen|Jon Christensen]] - [http://no.wikipedia.org/wiki/Per_Ulv Per Ulv], 1978, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: {{#evu:https://www.youtube.com/watch?v=B5HwFDVWzFs|alignment=left|valignment=top}}

=References=
<references />

'''[[Main Page|Up one Level]]'''
