Difference between revisions of "Engine Testing"

From Chessprogramming wiki
(Created page with "'''Home * Engine Testing''' FILE:WileECoyote.jpg|border|right|thumb|link=http://sanseverything.wordpress.com/2008/01/16/hope-springs-eternal/|The ever-optimis...")
 
 
(23 intermediate revisions by the same user not shown)
Line 17: Line 17:
 
=Tuning=  
 
=Tuning=  
 
* [[Automated Tuning]]
 
* [[Automated Tuning]]
 +
* [[Engine Similarity]]
  
 
=Test-Positions=  
 
=Test-Positions=  
Line 33: Line 34:
 
During testing the engines should ideally play the same style of openings they would play in a normal tournament, so not to optimize them for different types of positions. One option is to use the engines own [[Opening Book|opening book]] or one can use [[Test-Positions#OpeningSuites|opening suites]], a set of quiet test positions. In the latter case the same opening suit would be used for each tournament conducted and furthermore each position is played a second time with colors reversed. With these measures one can try to minimize the disparity between tests caused by different openings.
 
During testing the engines should ideally play the same style of openings they would play in a normal tournament, so not to optimize them for different types of positions. One option is to use the engines own [[Opening Book|opening book]] or one can use [[Test-Positions#OpeningSuites|opening suites]], a set of quiet test positions. In the latter case the same opening suit would be used for each tournament conducted and furthermore each position is played a second time with colors reversed. With these measures one can try to minimize the disparity between tests caused by different openings.
  
==Interfaces==  
+
==Tournament Manager==  
Free [[GUI|graphical user interfaces]] or [[CLI|command line tools]] for [[UCI]] and [[Chess Engine Communication Protocol]] compatible engines in engine-engine matches are:
+
[[User Interface|User interfaces]] or [[CLI|command line tools]] for [[UCI]] and [[Chess Engine Communication Protocol]] compatible engines in engine-engine matches are mentioned under [[Tournament Manager]].
* [[Amoeba#TournamentManager|Amoeba Tournament Manager]]
 
* [[Arena]] by [[Martin Blume]] <ref>[http://www.playwitharena.com/ Free chess graphical user interface (GUI) Arena for chess engines]</ref>
 
* [[Cute Chess]] by [[Arto Jonsson]] and [[Ilari Pihlajisto]]
 
: [[Cutechess-cli]]
 
* [[LittleBlitzer]] by [[Nathan Thom]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=35249 Fast Games] by [[Manuel Diaz]], [[CCC]], July 02, 2010</ref>
 
  
 
==Frameworks==
 
==Frameworks==
 
* [[Stockfish#TestingFramework|Fishtest]]
 
* [[Stockfish#TestingFramework|Fishtest]]
 +
* [[OpenBench]]
 
<span id="ChessServer"></span>
 
<span id="ChessServer"></span>
 
==Chess Server==  
 
==Chess Server==  
Line 75: Line 72:
 
=Publications=  
 
=Publications=  
 
* [[Tony Marsland]], [[Paul Rushton]] ('''1973'''). ''[http://dl.acm.org/citation.cfm?id=805703 Mechanisms for Comparing Chess Programs].'' [[ACM 1973|ACM Annual Conference]], [http://webdocs.cs.ualberta.ca/~tony/OldPapers/Marsland-Rushton-ACM73 pdf]
 
* [[Tony Marsland]], [[Paul Rushton]] ('''1973'''). ''[http://dl.acm.org/citation.cfm?id=805703 Mechanisms for Comparing Chess Programs].'' [[ACM 1973|ACM Annual Conference]], [http://webdocs.cs.ualberta.ca/~tony/OldPapers/Marsland-Rushton-ACM73 pdf]
* [[Tim Breitkreutz]], [[Jonathan Schaeffer]] ('''1984'''). ''Computer vs Computer via Computer''. [[ICGA Journal#7_4|ICCA Journal, Vol. 7, No. 4]]
+
* [[Tim Breitkreutz]], [[Jonathan Schaeffer]] ('''1984'''). ''Computer vs Computer via Computer''. [[ICGA Journal#7_4|ICCA Journal, Vol. 7, No. 4]], reprinted in [[Computer Chess Reports|Computer Chess Reports 1985]], Vol. 3, No. 2 » [[Phoenix]], [[Super Constellation]]
 
* [[John Stanback]] ('''1990'''). ''Supercomputing '90: Computer-Chess Testing and Programming Session''. [[ICGA Journal#13_4|ICCA Journal, Vol. 13, No. 4]] » [[ACM 1990]]
 
* [[John Stanback]] ('''1990'''). ''Supercomputing '90: Computer-Chess Testing and Programming Session''. [[ICGA Journal#13_4|ICCA Journal, Vol. 13, No. 4]] » [[ACM 1990]]
 
* [[Larry Kaufman]] ('''1993'''). ''How Our PC Chess Programs Are Developed''. [[Computer Chess Reports]] 1992-93, Vol. 3, No. 2, pp. 12
 
* [[Larry Kaufman]] ('''1993'''). ''How Our PC Chess Programs Are Developed''. [[Computer Chess Reports]] 1992-93, Vol. 3, No. 2, pp. 12
Line 100: Line 97:
 
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=5866 test methodology] by [[Giuseppe Cannella]], [[Computer Chess Forums|Winboard Forum]], November 13, 2006
 
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=5866 test methodology] by [[Giuseppe Cannella]], [[Computer Chess Forums|Winboard Forum]], November 13, 2006
 
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=5955 Testing and debugging chess engines] by [[Patrice Duhamel]], [[Computer Chess Forums|Winboard Forum]], December 03, 2006
 
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=5955 Testing and debugging chess engines] by [[Patrice Duhamel]], [[Computer Chess Forums|Winboard Forum]], December 03, 2006
 +
'''2007'''
 
* [http://www.talkchess.com/forum/viewtopic.php?t=13557 Programmer bug hunt challenge] by [[Ed Schroder|Ed Schröder]], [[CCC]], May 04, 2007 » [[Portable Game Notation]], [[En passant]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=13557 Programmer bug hunt challenge] by [[Ed Schroder|Ed Schröder]], [[CCC]], May 04, 2007 » [[Portable Game Notation]], [[En passant]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=13800 a beat b,b beat c,c beat a question] by [[Uri Blass]], [[CCC]], May 16, 2007 » [[Playing Strength]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=13800 a beat b,b beat c,c beat a question] by [[Uri Blass]], [[CCC]], May 16, 2007 » [[Playing Strength]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=16412 An objective test process for the rest of us?] by [[Nicolai Czempin]], [[CCC]], September 12, 2007
 
* [http://www.talkchess.com/forum/viewtopic.php?t=16412 An objective test process for the rest of us?] by [[Nicolai Czempin]], [[CCC]], September 12, 2007
 
* [http://www.talkchess.com/forum/viewtopic.php?t=17947 My new testing scheme] by [[Zach Wegner]], [[CCC]], November 20, 2007
 
* [http://www.talkchess.com/forum/viewtopic.php?t=17947 My new testing scheme] by [[Zach Wegner]], [[CCC]], November 20, 2007
 +
'''2008'''
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=20082 Test you engine] by [[Fermin Serrano]], [[CCC]], March 10, 2008
 
* [http://www.talkchess.com/forum/viewtopic.php?t=22832 New testing thread] by [[Robert Hyatt]], [[CCC]], August 07, 2008
 
* [http://www.talkchess.com/forum/viewtopic.php?t=22832 New testing thread] by [[Robert Hyatt]], [[CCC]], August 07, 2008
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=24590 Comparing two version of the same engine] by [[Fermin Serrano]], [[CCC]], October 26, 2008
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=25461 Debate: testing at fast time controls] by [[Fermin Serrano]], [[CCC]], December 15, 2008
 +
'''2009'''
 
* [http://www.talkchess.com/forum/viewtopic.php?t=27024 Cutechess-cli: A command line tool for engine-engine matches], by [[Ilari Pihlajisto]], [[CCC]], March 16, 2009
 
* [http://www.talkchess.com/forum/viewtopic.php?t=27024 Cutechess-cli: A command line tool for engine-engine matches], by [[Ilari Pihlajisto]], [[CCC]], March 16, 2009
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=28130 Testing procedure] by [[Matt Gingell]], [[CCC]], May 27, 2009
 
* [http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=293506&t=27024 Cutechess-cli version 0.1.8 released] by [[Ilari Pihlajisto]], [[CCC]], September 29, 2009
 
* [http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=293506&t=27024 Cutechess-cli version 0.1.8 released] by [[Ilari Pihlajisto]], [[CCC]], September 29, 2009
 
* [http://www.talkchess.com/forum/viewtopic.php?t=30513  A reason for testing at fixed number of nodes] by [[J. Wesley Cleveland]], [[CCC]], November 06, 2009
 
* [http://www.talkchess.com/forum/viewtopic.php?t=30513  A reason for testing at fixed number of nodes] by [[J. Wesley Cleveland]], [[CCC]], November 06, 2009
Line 143: Line 147:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=61422 Testing using many computers and architectures] by [[Andrew Grant]], [[CCC]], September 14, 2016
 
* [http://www.talkchess.com/forum/viewtopic.php?t=61422 Testing using many computers and architectures] by [[Andrew Grant]], [[CCC]], September 14, 2016
 
* [http://www.talkchess.com/forum/viewtopic.php?t=61988 command line engine match?] by [[Erin Dame]], [[CCC]], November 06, 2016 » [[CLI]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=61988 command line engine match?] by [[Erin Dame]], [[CCC]], November 06, 2016 » [[CLI]]
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=61991 Looking for an epd file for sanity checks...] by [[Fermin Serrano]], [[CCC]], November 06, 2016
 
* [http://www.talkchess.com/forum/viewtopic.php?t=62576 Testing with different EPD suits for search vs eval changes] by [[Michael Sherwin]], [[CCC]], December 23, 2016  
 
* [http://www.talkchess.com/forum/viewtopic.php?t=62576 Testing with different EPD suits for search vs eval changes] by [[Michael Sherwin]], [[CCC]], December 23, 2016  
 
'''2017'''
 
'''2017'''
Line 156: Line 161:
 
* [http://www.talkchess.com/forum/viewtopic.php?t=64519 Engine testing & error margin ?] by [[Mahmoud Uthman]], [[CCC]], July 05, 2017
 
* [http://www.talkchess.com/forum/viewtopic.php?t=64519 Engine testing & error margin ?] by [[Mahmoud Uthman]], [[CCC]], July 05, 2017
 
* [http://www.talkchess.com/forum/viewtopic.php?t=65766 Engines for testing (Linux, fast time control)] by [[Jon Dart]], [[CCC]], November 18, 2017 » [[Linux]]
 
* [http://www.talkchess.com/forum/viewtopic.php?t=65766 Engines for testing (Linux, fast time control)] by [[Jon Dart]], [[CCC]], November 18, 2017 » [[Linux]]
 +
'''2018'''
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=67485 Issue with self play testing] by [[Charles Roberson]], [[CCC]], May 18, 2018
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=68531 Basic automated testing] by Josh Odom, [[CCC]], September 28, 2018
 +
: [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=68531&start=6 Re:Basic automated testing] by [[Andrew Grant]], [[CCC]], September 30, 2018 » [[OpenBench]]
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=69284 testing consistency] by [[Jon Dart]], [[CCC]], December 16, 2018
 +
'''2019'''
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=70383 Any testing framwork similair to Fishtest that can be run locally ?] by  [[Mahmoud Uthman]], [[CCC]], April 02, 2019
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=71848 self test] by [[Vivien Clauzon]], [[CCC]], September 18, 2019
 +
==2020 ...==
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73129 EPD destruction tests] by [[Chris Whittington]], [[CCC]], February 19, 2020
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73208 EPD destruction tests, part 2] by [[Chris Whittington]], [[CCC]], February 27, 2020
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=74503 Looking for automatic Engine Testing Software] by [[Oliver Brausch]], [[CCC]], July 19, 2020
 +
'''2021'''
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=76225 Testing strategies for my engines playing strength] by Thomas Jahn, [[CCC]], January 04, 2021
 +
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=76536 Effect of adjudication and TC on testing process] by [[Vivien Clauzon]], [[CCC]], February 09, 2021 » [[Minic]]
 +
'''2022'''
 +
* [https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79276 Strategies to unit testing the search] by Olexiy Svitashev, [[CCC]], February 03, 2022 » [[Search]]
 +
* [https://www.talkchess.com/forum3/viewtopic.php?f=7&t=79280 How do you know you improved ?] by Philippe Chevalier, [[CCC]], February 03, 2022
  
 
=External Links=
 
=External Links=
 
* [http://ajonsson.kapsi.fi/cutechess.html Cute Chess]
 
* [http://ajonsson.kapsi.fi/cutechess.html Cute Chess]
 
* [https://github.com/cutechess/cutechess cutechess · GitHub]
 
* [https://github.com/cutechess/cutechess cutechess · GitHub]
* [http://www.top-5000.nl/tuning.htm Testing a chess engine from the ground up] from [http://www.top-5000.nl/ Home of the Dutch Rebel] by [[Ed Schroder|Ed Schröder]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=46759 A poor man's testing environment] by [[Ed Schroder|Ed Schröder]], [[CCC]], January 04, 2013</ref>  » [[Match Statistics]]
+
* [https://github.com/AndyGrant/OpenBench GitHub - OpenBench a Distributed SPRT Testing Framework for Chess Engines] by [[Andrew Grant]] » [[OpenBench]], [[Match Statistics#SPRT|SPRT]]  
 +
* [https://github.com/ChrisWhittington/Chess-EPDs GitHub - ChrisWhittington/Chess-EPDs: Various EPD test suites] by [[Chris Whittington]]
 
* [https://en.wikipedia.org/wiki/Regression_testing Regression testing from Wikipedia]
 
* [https://en.wikipedia.org/wiki/Regression_testing Regression testing from Wikipedia]
 
* [http://www.sp-cc.de/index.htm SPCC] by [[Stefan Pohl]]
 
* [http://www.sp-cc.de/index.htm SPCC] by [[Stefan Pohl]]
 
* [http://research.microsoft.com/en-us/projects/chess/ CHESS - Microsoft Research] a tool for finding and reproducing [https://en.wikipedia.org/wiki/Unusual_software_bug Heisenbugs] in concurrent programs.
 
* [http://research.microsoft.com/en-us/projects/chess/ CHESS - Microsoft Research] a tool for finding and reproducing [https://en.wikipedia.org/wiki/Unusual_software_bug Heisenbugs] in concurrent programs.
 
* [https://en.wikipedia.org/wiki/Engine_test_stand Engine test stand from Wikipedia]
 
* [https://en.wikipedia.org/wiki/Engine_test_stand Engine test stand from Wikipedia]
* [[Videos#TerjeRypdal|Terje Rypdal Group]] feat. [[Videos#PalleMikkelborg|Palle Mikkelborg]], [https://de.wikipedia.org/wiki/Haakon_Graf Håkon Graf], [http://no.wikipedia.org/wiki/Sveinung_Hovensj%C3%B8 Sveinung Hovensjø] and [[Videos#JonChristensen|Jon Christensen]] -  [http://no.wikipedia.org/wiki/Per_Ulv Per Ulv], 1978, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
+
* [[:Category:Terje Rypdal|Terje Rypdal Group]] feat. [[:Category:Palle Mikkelborg|Palle Mikkelborg]], [https://de.wikipedia.org/wiki/Haakon_Graf Håkon Graf], [http://no.wikipedia.org/wiki/Sveinung_Hovensj%C3%B8 Sveinung Hovensjø] and [[:Category:Jon Christensen|Jon Christensen]] -  [http://no.wikipedia.org/wiki/Per_Ulv Per Ulv], 1978, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
 
: {{#evu:https://www.youtube.com/watch?v=B5HwFDVWzFs|alignment=left|valignment=top}}
 
: {{#evu:https://www.youtube.com/watch?v=B5HwFDVWzFs|alignment=left|valignment=top}}
  
Line 172: Line 196:
  
 
'''[[Main Page|Up one Level]]'''
 
'''[[Main Page|Up one Level]]'''
 +
[[Category:Jon Christensen]]
 +
[[Category:Palle Mikkelborg]]
 +
[[Category:Terje Rypdal]]

Latest revision as of 11:16, 4 February 2022

Home * Engine Testing

The ever-optimistic Wile E. Coyote [1]

Engine Testing,
the process of eliminating bugs and measuring the performance of a chess engine. New implementations of move generation are tested with Perft, while new features and the tuning of search and evaluation are verified with test-positions and by playing matches against other engines.
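A Perft test counts the leaf nodes of the legal-move tree to a fixed depth and compares the totals with known reference values; for the standard chess starting position these are 20, 400 and 8902 at depths 1 to 3. The following is a minimal sketch of that idea, not any engine's implementation. The `Position` interface is hypothetical, and a tiny tic-tac-toe class stands in for a real chess move generator only so that the example runs on its own.

```python
from typing import Iterable, Protocol


class Position(Protocol):
    """Hypothetical minimal interface a move generator would expose."""

    def legal_moves(self) -> Iterable[object]: ...
    def make(self, move: object) -> "Position": ...


def perft(pos: Position, depth: int) -> int:
    """Count the leaf nodes of the legal-move tree down to `depth` plies."""
    if depth == 0:
        return 1
    return sum(perft(pos.make(move), depth - 1) for move in pos.legal_moves())


class TicTacToe:
    """Toy stand-in for a chess position (assumption, for runnability only)."""

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def __init__(self, cells: str = "." * 9, side: str = "X") -> None:
        self.cells, self.side = cells, side

    def _won(self) -> bool:
        # True if either player already owns a complete line.
        return any(self.cells[a] != "." and
                   self.cells[a] == self.cells[b] == self.cells[c]
                   for a, b, c in self.LINES)

    def legal_moves(self):
        # A finished game has no legal moves, like checkmate in chess.
        if self._won():
            return []
        return [i for i, c in enumerate(self.cells) if c == "."]

    def make(self, move):
        cells = self.cells[:move] + self.side + self.cells[move + 1:]
        return TicTacToe(cells, "O" if self.side == "X" else "X")


if __name__ == "__main__":
    for depth in range(1, 5):
        print(depth, perft(TicTacToe(), depth))
```

A mismatch against the reference count at some depth points to a bug in move generation, which can then be narrowed down by comparing the per-move subtotals at the root.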

Bug Hunting

Analyzing

Tuning

Test-Positions

Running sets of test-positions and counting the number of solutions found within a fixed time frame is useful for checking whether something broke after a program change, or for getting hints about missing knowledge. But one should be careful about tuning engines based on test-position results, since solving (possibly tactical) test-positions does not necessarily correlate with practical playing strength in matches against other opponents.
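Counting solutions over a suite can be sketched as follows. This assumes the suite is stored as EPD records with a `bm` (best move) operation. The `choose_move` callable is hypothetical: it stands for whatever code asks the engine to search a position for the fixed time and returns its move in the same notation as the suite. The two records and the stub answers are illustrative input only.

```python
from typing import Callable, Dict, Iterable, Tuple


def parse_epd(line: str) -> Tuple[str, Dict[str, str]]:
    """Split an EPD record into its four position fields and its operations.

    Simplification: operations are split on ';', so a quoted operand that
    itself contains ';' is not handled.
    """
    fields = line.split(None, 4)
    position = " ".join(fields[:4])
    ops: Dict[str, str] = {}
    if len(fields) > 4:
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                name, _, value = op.partition(" ")
                ops[name] = value.strip()
    return position, ops


def count_solved(epd_lines: Iterable[str],
                 choose_move: Callable[[str], str]) -> Tuple[int, int]:
    """Return (solved, total) over all records that carry a 'bm' operation."""
    solved = total = 0
    for line in epd_lines:
        position, ops = parse_epd(line)
        if "bm" not in ops:
            continue
        total += 1
        # 'bm' may list several acceptable moves separated by spaces.
        if choose_move(position) in ops["bm"].split():
            solved += 1
    return solved, total


# Illustrative records and a stub in place of a real engine.
SUITE = [
    '1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - - bm Qd1+; id "demo.1";',
    '3r1k2/4npp1/1ppr3p/p6P/P2PPPP1/1NR5/5K2/2R5 w - - bm d5; id "demo.2";',
]
STUB_ANSWERS = {
    "1k1r4/pp1b1R2/3q2pp/4p3/2B5/4Q3/PPP2B2/2K5 b - -": "Qd1+",
    "3r1k2/4npp1/1ppr3p/p6P/P2PPPP1/1NR5/5K2/2R5 w - -": "Rc2",
}

if __name__ == "__main__":
    solved, total = count_solved(SUITE, STUB_ANSWERS.__getitem__)
    print(f"solved {solved} of {total}")
```

Comparing the solved count before and after a change gives a quick regression signal, but as noted above it is not a substitute for match results.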

Matches

Most testing involves running different versions of a program in matches, and comparing results.
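One common way to compare match results is to convert the score fraction into an Elo difference using the logistic rating model, together with a rough error margin. The sketch below assumes that model and a normal approximation for the 95% interval; the function names are made up for illustration.

```python
import math
from typing import Tuple


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Points scored per game, counting a draw as half a point."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, draws: int, losses: int,
                 z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval in Elo (z = 1.96 for about 95%)."""
    games = wins + draws + losses
    s = score_fraction(wins, draws, losses)
    # Per-game variance of the result around the mean score.
    variance = (wins * (1.0 - s) ** 2
                + draws * (0.5 - s) ** 2
                + losses * s ** 2) / games
    margin = z * math.sqrt(variance / games)
    return elo_from_score(s - margin), elo_from_score(s + margin)


if __name__ == "__main__":
    w, d, l = 60, 100, 40  # made-up match result
    print(elo_from_score(score_fraction(w, d, l)), elo_interval(w, d, l))
```

If the interval still includes zero, the match has not shown that one version is stronger, and more games are needed.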

Time Controls

Generally speaking, changes that do not alter the search tree itself but only affect performance (e.g. move generation) can be tested with fixed nodes, fixed time or fixed depth. In all other cases, time management should be left to the engine in order to simulate real tournament conditions. On the other hand, debugging is much easier under fixed conditions, as the games become deterministic.
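For UCI engines, these limits map onto the parameters of the `go` command: `nodes`, `depth` and `movetime` (in milliseconds) for fixed conditions, and `wtime`/`btime`/`winc`/`binc` when the engine manages its own clock. A small sketch of building those command strings follows; the helper names are hypothetical.

```python
def fixed_go(limit: str, value: int) -> str:
    """Build a UCI 'go' command for a fixed-node, fixed-depth or fixed-time search."""
    keywords = {"nodes": "nodes", "depth": "depth", "time": "movetime"}
    if limit not in keywords:
        raise ValueError(f"unknown limit type: {limit!r}")
    if value <= 0:
        raise ValueError("limit value must be positive")
    return f"go {keywords[limit]} {value}"


def clock_go(wtime_ms: int, btime_ms: int,
             winc_ms: int = 0, binc_ms: int = 0) -> str:
    """Build a UCI 'go' command that leaves time management to the engine."""
    return f"go wtime {wtime_ms} btime {btime_ms} winc {winc_ms} binc {binc_ms}"


if __name__ == "__main__":
    print(fixed_go("nodes", 100000))      # reproducible, good for debugging
    print(fixed_go("depth", 10))
    print(fixed_go("time", 500))          # 500 ms per move
    print(clock_go(60000, 60000, 1000, 1000))  # one minute plus one second
```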

Aside from the type of time control, one also has to decide how much time should be spent per game, i.e. what the average quality of the games should be. While one can test more changes in a given amount of time at short time controls, it is also relevant how a change scales to different strengths. For example, if one increases the R in Null move pruning to 3 at depths > 7, this change can only be tested effectively at time controls where the new condition is triggered frequently enough, i.e. where the average search depth is far greater than seven. It is hard to generalize, but on average, changes to the search functions (LMR, null move, futility or similar pruning, reductions and extensions) tend to be more sensitive to the time control than the tuning of evaluation parameters.
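The null move example can be made concrete with a small sketch. It assumes a baseline reduction of R = 2 (an assumption, not stated above) and takes a hypothetical list of remaining depths at which null move pruning was tried; it then reports how often the new R = 3 branch would actually have been used.

```python
from typing import Iterable


def null_move_reduction(depth: int) -> int:
    """Depth-dependent R: 3 when depth > 7, else an assumed baseline of 2."""
    return 3 if depth > 7 else 2


def triggered_fraction(node_depths: Iterable[int]) -> float:
    """Fraction of null-move attempts that used the new R = 3 branch."""
    depths = list(node_depths)
    if not depths:
        raise ValueError("no depths recorded")
    hits = sum(1 for d in depths if null_move_reduction(d) == 3)
    return hits / len(depths)


if __name__ == "__main__":
    # Hypothetical samples: a fast game rarely searches beyond depth 6,
    # a slower game reaches depth 12.
    print(triggered_fraction(range(1, 7)))
    print(triggered_fraction(range(1, 13)))
```

When the fraction is zero, both versions of the engine play identically and the match measures nothing about the change.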

Opening

During testing, the engines should ideally play the same style of openings they would play in a normal tournament, so as not to optimize them for different types of positions. One option is to use the engine's own opening book; alternatively, one can use opening suites, a set of quiet test positions. In the latter case, the same opening suite is used for each tournament conducted, and each position is played a second time with colors reversed. With these measures, one can try to minimize the disparity between tests caused by different openings.
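Playing each suite position twice with colors reversed amounts to a simple pairing schedule, sketched below. The `Game` record, the engine names and the opening labels are all hypothetical placeholders.

```python
from typing import Iterable, List, NamedTuple


class Game(NamedTuple):
    opening: str
    white: str
    black: str


def schedule(openings: Iterable[str], engine_a: str, engine_b: str) -> List[Game]:
    """Pair every opening with both color assignments, in suite order."""
    games: List[Game] = []
    for opening in openings:
        games.append(Game(opening, white=engine_a, black=engine_b))
        games.append(Game(opening, white=engine_b, black=engine_a))
    return games


if __name__ == "__main__":
    suite = ["opening-1", "opening-2"]  # placeholder labels for suite positions
    for game in schedule(suite, "engine-new", "engine-old"):
        print(game)
```

Because the suite and its order are fixed, two test runs differ only in the engines under test, not in the positions they were given.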

Tournament Manager

User interfaces or command line tools for UCI and Chess Engine Communication Protocol compatible engines in engine-engine matches are mentioned under Tournament Manager.

Frameworks

Chess Server

One can also test an engine's performance by comparing it to other programs on the various internet platforms [2]. In this case, differences in hardware and in features such as Endgame Tablebases or Opening Books have to be taken into account.

Statistics

The question of whether certain results actually indicate a strength increase can be answered with

Ratings

Test Results

Notable Bugs

Publications

Forum Posts

1995 ...

2000 ...

2005 ...

2007

2008

2009

2010 ...

2011

2012

2013

2014

2015 ...

Re: Static evaluation test posistions by Ferdinand Mosca, CCC, November 26, 2015 » Python

2016

2017

2018

Re:Basic automated testing by Andrew Grant, CCC, September 30, 2018 » OpenBench

2019

2020 ...

2021

2022

External Links

References

Up one Level