Engine Testing

From Chessprogramming wiki



The ever-optimistic Wile E. Coyote ^[1]

Engine Testing, the process of eliminating bugs and measuring the performance of a chess engine. New implementations of move generation are tested with Perft, while new features and the tuning of search and evaluation are verified with test-positions and by playing matches against other engines.

Bug Hunting

Perft (Perft Results)
Debugging
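Perft counts the leaf nodes of the legal move tree to a fixed depth, so a move generator can be checked against known node counts. The sketch below assumes a hypothetical position interface with `legal_moves()`, `make()` and `unmake()`; a real engine will have its own board API. `ToyPosition` is a made-up stand-in with a constant branching factor, used only so the recursion can be run without a full chess implementation. `divide` splits the total by root move, which is the usual way to narrow down which move a buggy generator miscounts.

```python
from typing import Hashable, Iterable, Protocol


class Position(Protocol):
    """Hypothetical minimal interface; a real engine exposes its own board API."""

    def legal_moves(self) -> Iterable[Hashable]: ...
    def make(self, move: Hashable) -> None: ...
    def unmake(self, move: Hashable) -> None: ...


def perft(pos: Position, depth: int) -> int:
    """Count the leaf nodes of the legal-move tree down to `depth` plies."""
    if depth <= 0:
        return 1
    nodes = 0
    # Materialize the move list first so make/unmake cannot disturb iteration.
    for move in list(pos.legal_moves()):
        pos.make(move)
        nodes += perft(pos, depth - 1)
        pos.unmake(move)
    return nodes


def divide(pos: Position, depth: int) -> dict:
    """Perft split by root move, for comparison against a trusted engine."""
    counts = {}
    for move in list(pos.legal_moves()):
        pos.make(move)
        counts[move] = perft(pos, depth - 1)
        pos.unmake(move)
    return counts


class ToyPosition:
    """Illustration only: every node has `branching` moves, no chess rules."""

    def __init__(self, branching: int) -> None:
        self.branching = branching
        self.history = []

    def legal_moves(self) -> Iterable[int]:
        return range(self.branching)

    def make(self, move: int) -> None:
        self.history.append(move)

    def unmake(self, move: int) -> None:
        self.history.pop()


print(perft(ToyPosition(3), 4))  # 3**4 = 81
```

With a real chess move generator, the initial position should give 20, 400, 8902 and 197281 nodes for depths 1 through 4; any deviation points to a move generation or make/unmake bug.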

Analyzing

Tuning

Test-Positions

Running sets of test-positions and counting the number of solutions found within a fixed time per position is useful to detect whether something broke after program changes, or to get hints about missing knowledge. However, one should be careful about tuning engines based on test-position results, since solving (possibly tactical) test-positions does not necessarily correlate with practical playing strength in matches against other opponents.
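Test suites are commonly distributed as EPD files, where the `bm` (best move) or `am` (avoid move) operation defines a solution and `id` names the position. The sketch below only does the bookkeeping: it assumes the engine has already been run for the fixed time on each position and that its chosen moves, written in SAN exactly as in the EPD, are collected in a dictionary keyed by `id`. The simple `;` splitting is an assumption and would break on string operands that themselves contain semicolons.

```python
from dataclasses import dataclass, field


@dataclass
class EpdRecord:
    fen4: str                                  # placement, side to move, castling, en passant
    ops: dict = field(default_factory=dict)    # opcode -> operand, e.g. "bm" -> "Qg6"


def parse_epd(line: str) -> EpdRecord:
    """Split an EPD line into its four position fields and its operations."""
    fields = line.strip().split(maxsplit=4)
    fen4 = " ".join(fields[:4])
    ops = {}
    rest = fields[4] if len(fields) > 4 else ""
    for op in rest.split(";"):
        op = op.strip()
        if not op:
            continue
        opcode, _, operand = op.partition(" ")
        ops[opcode] = operand.strip().strip('"')
    return EpdRecord(fen4, ops)


def is_solved(rec: EpdRecord, engine_move: str) -> bool:
    """A position counts as solved if the move is a listed bm, or avoids every am."""
    if not engine_move:
        return False
    if "bm" in rec.ops:
        return engine_move in rec.ops["bm"].split()
    if "am" in rec.ops:
        return engine_move not in rec.ops["am"].split()
    return False


def score_suite(lines, answers: dict) -> tuple:
    """Return (solved, total); `answers` maps EPD id -> move the engine chose."""
    solved = total = 0
    for line in lines:
        if not line.strip():
            continue
        rec = parse_epd(line)
        total += 1
        if is_solved(rec, answers.get(rec.ops.get("id"), "")):
            solved += 1
    return solved, total
```

Comparing `(solved, total)` between two program versions at the same time per position is what reveals a regression.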

Test-Positions

Matches

Most testing involves running different versions of a program in matches, and comparing results.

Time Controls

Generally speaking, changes that do not alter the search tree itself but only affect performance (e.g. move generation) can be tested with a fixed number of nodes, fixed time or fixed depth. In all other cases, time management should be left to the engine to simulate real tournament conditions. On the other hand, debugging is much easier under fixed conditions, as the games become deterministic.
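With UCI engines, the difference between fixed conditions and real time management shows up in the `go` command the tournament manager sends. A minimal sketch, assuming a UCI engine: `go nodes`, `go depth` and `go movetime` (milliseconds) impose a fixed limit, while `go wtime/btime/winc/binc` passes the clocks and lets the engine allocate its own time. The helper names are hypothetical; only the command strings follow the UCI protocol.

```python
FIXED_LIMITS = ("nodes", "depth", "movetime")


def go_fixed(kind: str, value: int) -> str:
    """UCI 'go' with a fixed limit; movetime is given in milliseconds."""
    if kind not in FIXED_LIMITS:
        raise ValueError(f"unsupported fixed limit: {kind!r}")
    if value <= 0:
        raise ValueError("limit must be positive")
    return f"go {kind} {value}"


def go_clock(wtime_ms: int, btime_ms: int, winc_ms: int = 0, binc_ms: int = 0) -> str:
    """UCI 'go' with the real clocks, leaving time allocation to the engine."""
    return f"go wtime {wtime_ms} btime {btime_ms} winc {winc_ms} binc {binc_ms}"


def update_clock(remaining_ms: int, used_ms: int, inc_ms: int) -> int:
    """Tournament-manager bookkeeping after a move: charge time used, add increment."""
    return remaining_ms - used_ms + inc_ms


print(go_fixed("nodes", 100000))          # go nodes 100000
print(go_clock(60000, 60000, 1000, 1000)) # go wtime 60000 btime 60000 winc 1000 binc 1000
```

A fixed node count also keeps single-threaded games reproducible, which is why it suits debugging.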

Aside from the type of time control, one also has to decide how much time should be spent per game, i.e. what the average quality of the games should be. While one can test more changes in a given amount of time at short time controls, it also matters how a change scales to different strengths. For example, if one increases the reduction R in Null move pruning to 3 at depths greater than 7, this change can only be tested effectively at time controls where the new condition is triggered frequently enough, i.e. where the average search depth is far greater than seven. It is hard to generalize, but on average, changes to the search (LMR, null move, futility or similar pruning, reductions and extensions) tend to be more sensitive to the time control than the tuning of evaluation parameters.
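To judge whether a time control exercises a depth-dependent change such as the null move example, one can log the remaining depth at every null-move attempt and measure how often the new rule fires. A minimal sketch, assuming a base reduction of 2 (the text only specifies the new value 3 above depth 7); `null_move_reduction` and `deep_r_share` are hypothetical helpers, not taken from any particular engine.

```python
def null_move_reduction(depth: int, base_r: int = 2, deep_r: int = 3, threshold: int = 7) -> int:
    """Reduction R for the null-move search at a node with `depth` plies remaining."""
    return deep_r if depth > threshold else base_r


def deep_r_share(null_move_depths, threshold: int = 7) -> float:
    """Share of logged null-move attempts whose remaining depth triggers the new rule."""
    depths = list(null_move_depths)
    if not depths:
        return 0.0
    return sum(1 for d in depths if d > threshold) / len(depths)
```

If this share is tiny at the chosen time control, a match there says little about the change, and a longer time control is needed.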

Opening

During testing, the engines should ideally play the same kinds of openings they would play in a normal tournament, so as not to optimize them for different types of positions. One option is to use the engine's own opening book; another is to use an opening suite, a set of quiet test positions. In the latter case, the same opening suite is used for each tournament conducted, and each position is played a second time with colors reversed. These measures help minimize the disparity between tests caused by different openings.
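The color-reversal scheme for an opening suite amounts to a simple pairing rule: every opening position is played twice, once with each engine as White. A minimal sketch; the function name and the tuple layout `(opening, white, black)` are illustrative choices.

```python
from typing import List, Tuple


def schedule(openings: List[str], engine_a: str, engine_b: str,
             rounds: int = 1) -> List[Tuple[str, str, str]]:
    """Return (opening, white, black) triples, each opening played with both colors."""
    games = []
    for _ in range(rounds):
        for fen in openings:
            games.append((fen, engine_a, engine_b))  # engine_a has White
            games.append((fen, engine_b, engine_a))  # same opening, colors reversed
    return games
```

Because both engines get the same positions with both colors, an unbalanced opening cancels out instead of biasing the result toward whoever happened to get the better side.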

Tournament Manager

User interfaces and command line tools for running engine-engine matches between UCI and Chess Engine Communication Protocol compatible engines are listed under Tournament Manager.

Frameworks

Chess Server

One can also test an engine's performance by comparing it with other programs on the various internet platforms ^[2]. In this case, differences in hardware and in features such as Endgame Tablebases or Opening Books have to be taken into account.

Statistics

The question of whether certain results actually indicate a strength increase can be answered with statistical methods.
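Two common quantities for a match between an old and a new version are the Elo difference implied by the score and the likelihood of superiority (LOS). The sketch below uses the logistic Elo model, where a score fraction s corresponds to -400 * log10(1/s - 1), and the usual normal approximation for LOS, which ignores draws. It is a rough illustration under those assumptions, not a replacement for a proper error-margin or SPRT analysis.

```python
import math


def score(wins: int, draws: int, losses: int) -> float:
    """Score fraction: a win counts 1, a draw 0.5."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / n


def elo_difference(s: float) -> float:
    """Elo difference implied by score fraction s under the logistic model."""
    if not 0.0 < s < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / s - 1.0)


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Probability the first engine is stronger (normal approximation, draws ignored)."""
    if wins + losses == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
```

An LOS close to 0.5 means the match has not yet separated the two versions, no matter how large the Elo estimate looks.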

Ratings

Test Results

Notable Bugs

Brute Force (Program), En passant bug, ACM 1977 and ACM 1978
Coko - Mate in One?, ACM 1971
Chess 2175X vs. Genesis, Promotion bug, 4th Computer Olympiad 1992
Nimzo's winning white-black bug, WMCCC 1993
Novag Micro Chess - Castling bug, CPWTIPC 1981
Proscha capturing its own king versus Daja, First GI Computer Chess Tournament 1975
System Tal vs. XXXX, Promotion bug, WMCCC 1995
Xinix - Mate in One, DOCCC 2000 ^[3]

Publications

Tony Marsland, Paul Rushton (1973). Mechanisms for Comparing Chess Programs. ACM Annual Conference, pdf
Tim Breitkreutz, Jonathan Schaeffer (1984). Computer vs Computer via Computer. ICCA Journal, Vol. 7, No. 4, reprinted in Computer Chess Reports 1985, Vol. 3, No. 2 » Phoenix, Super Constellation
John Stanback (1990). Supercomputing '90: Computer-Chess Testing and Programming Session. ICCA Journal, Vol. 13, No. 4 » ACM 1990
Larry Kaufman (1993). How Our PC Chess Programs Are Developed. Computer Chess Reports 1992-93, Vol. 3, No. 2, pp. 12
Thomas Mally (1993). Matt in Wieviel? PC Schach 3/93 (German)
Jeff Rollason (2007). Statistical Minefields with Version Testing. AI Factory, Winter 2007 » Match Statistics
Jónheiður Ísleifsdóttir (2007). GTQL: A Query Language for Game Trees. M.Sc. thesis, Reykjavík University, pdf
Jónheiður Ísleifsdóttir, Yngvi Björnsson (2008). GTQ: A Language and Tool for Game-Tree Analysis. CG 2008, pdf

Forum Posts

1995 ...

Testing Chess Programs by Jan Eric Larsson, rgcc, February 09, 1996
Self-test and others rating stuffs... by Christophe Théron, CCC, January 01, 1998
Proposal: New testing methods for SSDF (1) by Jeroen Noomen, CCC, April 13, 1998

2000 ...

Using 2 machines for matches (Linux) by Jon Dart, CCC, June 24, 2001 » XBoard, Linux
A proposed WAC replacement for testing by Gian-Carlo Pascutto, CCC, September 18, 2001 » Win at Chess
Value of playing different versions of a program against each other by Tom King, CCC, January 06, 2003
testing of evaluation function by Steven Chu, CCC, April 17, 2003 » Evaluation
Testing the reliability of forward pruning by Russell Reagan, CCC, May 15, 2003 » Pruning
To programmers: Hints for testing after a partial rewrite by Federico Corigliano, CCC, December 08, 2003
Is there a way? by Ed Schröder, CCC, December 13, 2004

2005 ...

table for detecting significant difference between two engines by Joseph Ciarrochi, CCC, February 03, 2006
test methodology by Giuseppe Cannella, Winboard Forum, November 13, 2006
Testing and debugging chess engines by Patrice Duhamel, Winboard Forum, December 03, 2006

2007

Programmer bug hunt challenge by Ed Schröder, CCC, May 04, 2007 » Portable Game Notation, En passant
a beat b,b beat c,c beat a question by Uri Blass, CCC, May 16, 2007 » Playing Strength
An objective test process for the rest of us? by Nicolai Czempin, CCC, September 12, 2007
My new testing scheme by Zach Wegner, CCC, November 20, 2007

2008

Test you engine by Fermin Serrano, CCC, March 10, 2008
New testing thread by Robert Hyatt, CCC, August 07, 2008
Comparing two version of the same engine by Fermin Serrano, CCC, October 26, 2008
Debate: testing at fast time controls by Fermin Serrano, CCC, December 15, 2008

2009

Cutechess-cli: A command line tool for engine-engine matches by Ilari Pihlajisto, CCC, March 16, 2009
Testing procedure by Matt Gingell, CCC, May 27, 2009
Cutechess-cli version 0.1.8 released by Ilari Pihlajisto, CCC, September 29, 2009
A reason for testing at fixed number of nodes by J. Wesley Cleveland, CCC, November 06, 2009
different kinds of testing by Don Dailey, CCC, November 09, 2009
more on fixed nodes by Robert Hyatt, CCC, November 10, 2009

2010 ...

XBoard and epd tournament by Vlad Stamate, CCC, January 31, 2010 » Chess Engine Communication Protocol
Long game vs short game testing by Vlad Stamate, CCC, April 08, 2010
Pairings generation based on a big PGN file by Harun Taner, CCC, July 22, 2010
hiatus good for bug-finding by Stuart Cracraft, CCC, June 27, 2010

2011

testing question by Larry Kaufman, CCC, June 01, 2011
Debugging regression tests by Onno Garms, CCC, June 16, 2011 ^[4]

2012

fast game testing by Jon Dart, CCC, January 08, 2012
Your best bug ? by Ed Schröder, CCC, August 06, 2012
Yet Another Testing Question by Brian Richardson, CCC, September 15, 2012
Another testing question by Larry Kaufman, CCC, September 23, 2012
A word for casual testers by Don Dailey, CCC, December 25, 2012

2013

A poor man's testing environment by Ed Schröder, CCC, January 04, 2013 ^[5] » Match Statistics
engine-engine testing isues by Jens Bæk Nielsen, CCC, January 20, 2013
Beta for Stockfish distributed testing by Gary, CCC, March 05, 2013 » Fishtest
Fishtest Distributed Testing Framework by Marco Costalba, CCC, May 01, 2013 » Fishtest
cutechess-cli 0.6.0 released by Ilari Pihlajisto, CCC, July 12, 2013
fast testing NIT algorithm by Don Dailey, CCC, August 22, 2013
OICS: Computers Only ICS based Chess server for anyone by Joshua Shriver, CCC, August 26, 2013 » OICS

2014

testing procedure by Daniel José Queraltó, CCC, February 23, 2014

2015 ...

Bullet vs regular time control, say 40/4m CCRL/CEGT by Ed Schröder, CCC, August 29, 2015
Static evaluation test posistions by Shawn Chidester, CCC, November 25, 2015

Re: Static evaluation test posistions by Ferdinand Mosca, CCC, November 26, 2015 » Python

2016

Ordo 1.0.9 (new features for testers) by Miguel A. Ballicora, CCC, January 25, 2016
cluster versus single server by Folkert van Heusden, CCC, April 28, 2016
Testing using many computers and architectures by Andrew Grant, CCC, September 14, 2016
command line engine match? by Erin Dame, CCC, November 06, 2016 » CLI
Looking for an epd file for sanity checks... by Fermin Serrano, CCC, November 06, 2016
Testing with different EPD suits for search vs eval changes by Michael Sherwin, CCC, December 23, 2016

2017

sprt tourney manager by Richard Delorme, CCC, January 24, 2017 » Amoeba Tournament Manager, SPRT
how to properly test the changes to the engine ? by Mahmoud Uthman, CCC, February 01, 2017
How to go about chasing a bug like this? by Colin Jenkins, CCC, February 09, 2017 » Debugging
How to find SMP bugs ? by Lucas Braesch, CCC, March 15, 2017 » Debugging, Lazy SMP
Testing for Move Ordering Improvements by Cheney Nattress, CCC, March 25, 2017 » Move Ordering, Search Statistics
Testing endgame strength by Álvaro Begué, CCC, June 21, 2017 » Endgame, RuyDos
Opening testing suites efficiency by Kai Laskos, CCC, June 21, 2017 » Opening, Match Statistics
Testing A against B by playing a pool of others by Andrew Grant, CCC, June 24, 2017 » Match Statistics
Core behaviour by Ed Schröder, CCC, June 28, 2017 » Process, Thread
Engine testing & error margin ? by Mahmoud Uthman, CCC, July 05, 2017
Engines for testing (Linux, fast time control) by Jon Dart, CCC, November 18, 2017 » Linux

2018

Issue with self play testing by Charles Roberson, CCC, May 18, 2018
Basic automated testing by Josh Odom, CCC, September 28, 2018

Re:Basic automated testing by Andrew Grant, CCC, September 30, 2018 » OpenBench

testing consistency by Jon Dart, CCC, December 16, 2018

2019

Any testing framwork similair to Fishtest that can be run locally ? by Mahmoud Uthman, CCC, April 02, 2019
self test by Vivien Clauzon, CCC, September 18, 2019

2020 ...

EPD destruction tests by Chris Whittington, CCC, February 19, 2020
EPD destruction tests, part 2 by Chris Whittington, CCC, February 27, 2020
Looking for automatic Engine Testing Software by Oliver Brausch, CCC, July 19, 2020

2021

Testing strategies for my engines playing strength by Thomas Jahn, CCC, January 04, 2021
Effect of adjudication and TC on testing process by Vivien Clauzon, CCC, February 09, 2021 » Minic

2022

Strategies to unit testing the search by Olexiy Svitashev, CCC, February 03, 2022 » Search
How do you know you improved ? by Philippe Chevalier, CCC, February 03, 2022

External Links

Cute Chess
cutechess · GitHub
GitHub - OpenBench a Distributed SPRT Testing Framework for Chess Engines by Andrew Grant » OpenBench, SPRT
GitHub - ChrisWhittington/Chess-EPDs: Various EPD test suites by Chris Whittington
Regression testing from Wikipedia
SPCC by Stefan Pohl
CHESS - Microsoft Research a tool for finding and reproducing Heisenbugs in concurrent programs.
Engine test stand from Wikipedia
Terje Rypdal Group feat. Palle Mikkelborg, Håkon Graf, Sveinung Hovensjø and Jon Christensen - Per Ulv, 1978, YouTube Video

References


Retrieved from "https://www.chessprogramming.org/index.php?title=Engine_Testing&oldid=26256"