Engine Testing
Engine Testing,
the process of eliminating bugs and measuring the performance of a chess engine. New implementations of move generation are tested with Perft, while new features and tuning of search and evaluation are verified via SPRT testing, (historically) test-positions, and by playing matches against other engines.
Bug Hunting
Analyzing
Tuning
SPRT
The modern, preferred method to test strength modifications.
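As a rough illustration of how such a test decides when to stop, the following is a minimal sketch of an SPRT using a normal approximation of the log-likelihood ratio. It is not taken from any particular testing framework; the function names and the default values for elo0, elo1, alpha and beta are assumptions chosen for the example.

```python
import math

def expected_score(elo):
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Return (llr, decision) for H0: elo == elo0 versus H1: elo == elo1.

    Uses a normal approximation of the log-likelihood ratio.
    All default parameters are illustrative assumptions.
    """
    lower = math.log(beta / (1.0 - alpha))   # crossing it accepts H0
    upper = math.log((1.0 - beta) / alpha)   # crossing it accepts H1
    n = wins + draws + losses
    if n == 0:
        return 0.0, "continue"
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean * mean  # per-game score variance
    if var <= 0.0:
        return 0.0, "continue"  # not enough variety in the results yet
    s0, s1 = expected_score(elo0), expected_score(elo1)
    llr = n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
    if llr >= upper:
        return llr, "accept H1"
    if llr <= lower:
        return llr, "accept H0"
    return llr, "continue"
```

The test is re-evaluated after every game (or batch of games) and the match stops as soon as one of the two bounds is crossed, which is why it typically needs fewer games than a match of fixed length.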
Test-Positions
Running sets of test-positions and counting the number of solutions found within a fixed time frame is useful to check whether things are broken after program changes, or to get hints about missing knowledge. But one should be careful about tuning engines based on test-position results, since solving (possibly tactical) test-positions does not necessarily correlate with practical playing strength in matches against other opponents.
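Counting solutions over a suite can be sketched as follows, assuming the suite is stored as EPD records with a bm (best move) operation. The pick_move callback is a hypothetical stand-in for running the engine on a position for a fixed time and returning its move in the same notation as the suite. The parser is deliberately simplified and does not handle semicolons inside quoted operands.

```python
def parse_epd(line):
    """Split an EPD record into its position part and a dict of operations.

    Simplified: assumes no ';' appears inside quoted operands.
    """
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])  # placement, side, castling, en passant
    ops = {}
    if len(fields) == 5:
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                opcode, _, operand = op.partition(" ")
                ops[opcode] = operand.strip()
    return position, ops

def count_solved(epd_lines, pick_move):
    """Count records where pick_move(position) is among the 'bm' moves.

    pick_move is a hypothetical callback wrapping a fixed-time engine search.
    """
    solved = 0
    for line in epd_lines:
        position, ops = parse_epd(line)
        best_moves = ops.get("bm", "").split()
        if pick_move(position) in best_moves:
            solved += 1
    return solved
```

Comparing the solved count before and after a change gives a quick sanity check, but as noted above it should not replace match play as a measure of strength.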
Matches
Most testing involves running different versions of a program in matches, and comparing results.
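To compare two versions from a match result, the score is commonly converted into an Elo difference together with an error margin. The sketch below assumes the logistic Elo model and a normal approximation for the confidence interval. The function names and the default z value (roughly 95% confidence) are assumptions made for the example.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_summary(wins, draws, losses, z=1.96):
    """Return (elo, elo_low, elo_high) for a match result.

    The interval uses a normal approximation of the mean score;
    z=1.96 (about 95% confidence) is an illustrative default.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - score * score  # per-game variance
    margin = z * math.sqrt(var / n)
    eps = 1e-9  # keep the bounds inside (0, 1) so the logarithm is defined
    low = min(max(score - margin, eps), 1.0 - eps)
    high = min(max(score + margin, eps), 1.0 - eps)
    score = min(max(score, eps), 1.0 - eps)
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)
```

If the interval still contains zero, the match has not yet shown that one version is stronger than the other, and more games are needed.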
Time Controls
Generally speaking, changes that don't alter the search tree itself, but only affect performance (e.g. move generation), can be tested with fixed nodes, fixed time or fixed depth. In all other cases the time management should be left to the engine to simulate real tournament conditions. On the other hand, debugging is much easier under fixed conditions as the games become deterministic.
Aside from the type of time control, one also has to decide how much time should be spent per game, i.e. what the average quality of the games should be. While one can test more changes in a given amount of time at short time controls, it is also relevant how a certain change scales to different strengths. For example, if one increases the R in Null move pruning to 3 at depths > 7, this change can only be tested effectively at time controls where the new condition is triggered frequently enough, i.e. where the average search depth is far greater than seven. It is hard to generalize, but on average changes to the search functions (LMR, nullmove, futility or similar pruning, reductions and extensions) tend to be more sensitive to the time control than the tuning of evaluation parameters.
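For engines speaking UCI, the two kinds of conditions map onto different forms of the go command. The helper names below are hypothetical; the command keywords (nodes, depth, movetime, wtime, btime, winc, binc) are those defined by the UCI protocol, with all times given in milliseconds.

```python
FIXED_LIMITS = ("nodes", "depth", "movetime")

def fixed_go(limit, value):
    """Build a UCI 'go' command for a fixed search limit."""
    if limit not in FIXED_LIMITS:
        raise ValueError("unknown fixed limit: " + str(limit))
    return "go {} {}".format(limit, int(value))

def clock_go(wtime_ms, btime_ms, winc_ms=0, binc_ms=0):
    """Build a UCI 'go' command that leaves time management to the engine."""
    return "go wtime {} btime {} winc {} binc {}".format(
        int(wtime_ms), int(btime_ms), int(winc_ms), int(binc_ms))
```

For instance, fixed_go("nodes", 100000) yields "go nodes 100000", which gives a repeatable search for a single-threaded engine, while clock_go(60000, 60000, 600, 600) hands the engine a clock and lets its own time management decide.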
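The null move example can be made concrete with a small sketch. The base reduction of 2 below the threshold and the helper names are assumptions for illustration; only the rule "R = 3 at depths > 7" comes from the text above.

```python
def null_move_reduction(depth, base_r=2, deep_r=3, threshold=7):
    """Depth-dependent null move reduction R (base_r is an assumed default)."""
    return deep_r if depth > threshold else base_r

def trigger_fraction(depths, threshold=7):
    """Share of null move attempts made at a depth above the threshold.

    'depths' is a list of remaining depths logged at each null move attempt.
    """
    if not depths:
        return 0.0
    return sum(1 for d in depths if d > threshold) / len(depths)
```

Logging the remaining depth at each null move attempt and checking trigger_fraction at the intended time control shows whether the new branch is exercised often enough for a match to measure its effect.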
Opening
During testing the engines should ideally play the same style of openings they would play in a normal tournament, so as not to optimize them for different types of positions. One option is to use the engine's own opening book; another is to use opening suites, sets of quiet test positions. In the latter case the same opening suite is used for each tournament conducted, and each position is played a second time with colors reversed. With these measures one can try to minimize the disparity between tests caused by different openings.
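The color-reversed schedule can be sketched as follows. The function and the dictionary keys are hypothetical; real tournament managers provide their own options for this.

```python
def paired_games(openings, engine_a, engine_b):
    """Schedule every opening twice, swapping colors for the second game."""
    games = []
    for opening in openings:
        games.append({"opening": opening, "white": engine_a, "black": engine_b})
        games.append({"opening": opening, "white": engine_b, "black": engine_a})
    return games
```

Because both engines get each position once as White and once as Black, an opening that favors one side affects both engines equally.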
Tournament Manager
User interfaces and command line tools that run engine-engine matches for UCI and Chess Engine Communication Protocol compatible engines are listed under Tournament Manager.
Frameworks
Chess Server
One can also test an engine's performance by comparing it to other programs on the various internet platforms [2]. In this case the different hardware and features like different Endgame Tablebases or Opening Books have to be considered.
Statistics
The question of whether certain results actually indicate a strength increase can be answered with
Ratings
Test Results
Notable Bugs
- Brute Force (Program), En passant bug, ACM 1977 and ACM 1978
- Coko - Mate in One?, ACM 1971
- Chess 2175X vs. Genesis, Promotion bug, 4th Computer Olympiad 1992
- Nimzo's winning white-black bug, WMCCC 1993
- Novag Micro Chess - Castling bug, CPWTIPC 1981
- Proscha capturing its own king versus Daja, First GI Computer Chess Tournament 1975
- System Tal vs. XXXX, Promotion bug, WMCCC 1995
- Xinix - Mate in One, DOCCC 2000 [3]
Publications
- Tony Marsland, Paul Rushton (1973). Mechanisms for Comparing Chess Programs. ACM Annual Conference, pdf
- Tim Breitkreutz, Jonathan Schaeffer (1984). Computer vs Computer via Computer. ICCA Journal, Vol. 7, No. 4, reprinted in Computer Chess Reports 1985, Vol. 3, No. 2 » Phoenix, Super Constellation
- John Stanback (1990). Supercomputing '90: Computer-Chess Testing and Programming Session. ICCA Journal, Vol. 13, No. 4 » ACM 1990
- Larry Kaufman (1993). How Our PC Chess Programs Are Developed. Computer Chess Reports 1992-93, Vol. 3, No. 2, pp. 12
- Thomas Mally (1993). Matt in Wieviel? PC Schach 3/93 (German)
- Jeff Rollason (2007). Statistical Minefields with Version Testing. AI Factory, Winter 2007 » Match Statistics
- Jónheiður Ísleifsdóttir (2007). GTQL: A Query Language for Game Trees. M.Sc. thesis, Reykjavík University, pdf
- Jónheiður Ísleifsdóttir, Yngvi Björnsson (2008). GTQ: A Language and Tool for Game-Tree Analysis. CG 2008, pdf
Forum Posts
1995 ...
- Testing Chess Programs by Jan Eric Larsson, rgcc, February 09, 1996
- Self-test and others rating stuffs... by Christophe Théron, CCC, January 01, 1998
- Proposal: New testing methods for SSDF (1) by Jeroen Noomen, CCC, April 13, 1998
2000 ...
- Using 2 machines for matches (Linux) by Jon Dart, CCC, June 24, 2001 » XBoard, Linux
- A proposed WAC replacement for testing by Gian-Carlo Pascutto, CCC, September 18, 2001 » Win at Chess
- Value of playing different versions of a program against each other by Tom King, CCC, January 06, 2003
- testing of evaluation function by Steven Chu, CCC, April 17, 2003 » Evaluation
- Testing the reliability of forward pruning by Russell Reagan, CCC, May 15, 2003 » Pruning
- To programmers: Hints for testing after a partial rewrite by Federico Corigliano, CCC, December 08, 2003
- Is there a way? by Ed Schröder, CCC, December 13, 2004
2005 ...
- table for detecting significant difference between two engines by Joseph Ciarrochi, CCC, February 03, 2006
- test methodology by Giuseppe Cannella, Winboard Forum, November 13, 2006
- Testing and debugging chess engines by Patrice Duhamel, Winboard Forum, December 03, 2006
2007
- Programmer bug hunt challenge by Ed Schröder, CCC, May 04, 2007 » Portable Game Notation, En passant
- a beat b,b beat c,c beat a question by Uri Blass, CCC, May 16, 2007 » Playing Strength
- An objective test process for the rest of us? by Nicolai Czempin, CCC, September 12, 2007
- My new testing scheme by Zach Wegner, CCC, November 20, 2007
2008
- Test you engine by Fermin Serrano, CCC, March 10, 2008
- New testing thread by Robert Hyatt, CCC, August 07, 2008
- Comparing two version of the same engine by Fermin Serrano, CCC, October 26, 2008
- Debate: testing at fast time controls by Fermin Serrano, CCC, December 15, 2008
2009
- Cutechess-cli: A command line tool for engine-engine matches, by Ilari Pihlajisto, CCC, March 16, 2009
- Testing procedure by Matt Gingell, CCC, May 27, 2009
- Cutechess-cli version 0.1.8 released by Ilari Pihlajisto, CCC, September 29, 2009
- A reason for testing at fixed number of nodes by J. Wesley Cleveland, CCC, November 06, 2009
- different kinds of testing by Don Dailey, CCC, November 09, 2009
- more on fixed nodes by Robert Hyatt, CCC, November 10, 2009
2010 ...
- XBoard and epd tournament by Vlad Stamate, CCC, January 31, 2010 » Chess Engine Communication Protocol
- Long game vs short game testing by Vlad Stamate, CCC, April 08, 2010
- Pairings generation based on a big PGN file by Harun Taner, CCC, July 22, 2010
- hiatus good for bug-finding by Stuart Cracraft, CCC, June 27, 2010
2011
- testing question by Larry Kaufman, CCC, June 01, 2011
- Debugging regression tests by Onno Garms, CCC, June 16, 2011 [4]
2012
- fast game testing by Jon Dart, CCC, January 08, 2012
- Your best bug ? by Ed Schröder, CCC, August 06, 2012
- Yet Another Testing Question by Brian Richardson, CCC, September 15, 2012
- Another testing question by Larry Kaufman, CCC, September 23, 2012
- A word for casual testers by Don Dailey, CCC, December 25, 2012
2013
- A poor man's testing environment by Ed Schröder, CCC, January 04, 2013 [5] » Match Statistics
- engine-engine testing isues by Jens Bæk Nielsen, CCC, January 20, 2013
- Beta for Stockfish distributed testing by Gary, CCC, March 05, 2013 » Fishtest
- Fishtest Distributed Testing Framework by Marco Costalba, CCC, May 01, 2013 » Fishtest
- cutechess-cli 0.6.0 released by Ilari Pihlajisto, CCC, July 12, 2013
- fast testing NIT algorithm by Don Dailey, CCC, August 22, 2013
- OICS: Computers Only ICS based Chess server for anyone by Joshua Shriver, CCC, August 26, 2013 » OICS
2014
- testing procedure by Daniel José Queraltó, CCC, February 23, 2014
2015 ...
- Bullet vs regular time control, say 40/4m CCRL/CEGT by Ed Schröder, CCC, August 29, 2015
- Static evaluation test posistions by Shawn Chidester, CCC, November 25, 2015
- Re: Static evaluation test posistions by Ferdinand Mosca, CCC, November 26, 2015 » Python
2016
- Ordo 1.0.9 (new features for testers) by Miguel A. Ballicora, CCC, January 25, 2016
- cluster versus single server by Folkert van Heusden, CCC, April 28, 2016
- Testing using many computers and architectures by Andrew Grant, CCC, September 14, 2016
- command line engine match? by Erin Dame, CCC, November 06, 2016 » CLI
- Looking for an epd file for sanity checks... by Fermin Serrano, CCC, November 06, 2016
- Testing with different EPD suits for search vs eval changes by Michael Sherwin, CCC, December 23, 2016
2017
- sprt tourney manager by Richard Delorme, CCC, January 24, 2017 » Amoeba Tournament Manager, SPRT
- how to properly test the changes to the engine ? by Mahmoud Uthman, CCC, February 01, 2017
- How to go about chasing a bug like this? by Colin Jenkins, CCC, February 09, 2017 » Debugging
- How to find SMP bugs ? by Lucas Braesch, CCC, March 15, 2017 » Debugging, Lazy SMP
- Testing for Move Ordering Improvements by Cheney Nattress, CCC, March 25, 2017 » Move Ordering, Search Statistics
- Testing endgame strength by Álvaro Begué, CCC, June 21, 2017 » Endgame, RuyDos
- Opening testing suites efficiency by Kai Laskos, CCC, June 21, 2017 » Opening, Match Statistics
- Testing A against B by playing a pool of others by Andrew Grant, CCC, June 24, 2017 » Match Statistics
- Core behaviour by Ed Schroder, CCC, June 28, 2017 » Process, Thread
- Engine testing & error margin ? by Mahmoud Uthman, CCC, July 05, 2017
- Engines for testing (Linux, fast time control) by Jon Dart, CCC, November 18, 2017 » Linux
2018
- Issue with self play testing by Charles Roberson, CCC, May 18, 2018
- Basic automated testing by Josh Odom, CCC, September 28, 2018
- Re: Basic automated testing by Andrew Grant, CCC, September 30, 2018 » OpenBench
- testing consistency by Jon Dart, CCC, December 16, 2018
2019
- Any testing framwork similair to Fishtest that can be run locally ? by Mahmoud Uthman, CCC, April 02, 2019
- self test by Vivien Clauzon, CCC, September 18, 2019
2020 ...
- EPD destruction tests by Chris Whittington, CCC, February 19, 2020
- EPD destruction tests, part 2 by Chris Whittington, CCC, February 27, 2020
- Looking for automatic Engine Testing Software by Oliver Brausch, CCC, July 19, 2020
2021
- Testing strategies for my engines playing strength by Thomas Jahn, CCC, January 04, 2021
- Effect of adjudication and TC on testing process by Vivien Clauzon, CCC, February 09, 2021 » Minic
2022
- Strategies to unit testing the search by Olexiy Svitashev, CCC, February 03, 2022 » Search
- How do you know you improved ? by Philippe Chevalier, CCC, February 03, 2022
External Links
- Cute Chess
- cutechess · GitHub
- GitHub - OpenBench a Distributed SPRT Testing Framework for Chess Engines by Andrew Grant » OpenBench, SPRT
- GitHub - ChrisWhittington/Chess-EPDs: Various EPD test suites by Chris Whittington
- Regression testing from Wikipedia
- SPCC by Stefan Pohl
- CHESS - Microsoft Research, a tool for finding and reproducing Heisenbugs in concurrent programs.
- Engine test stand from Wikipedia
- Terje Rypdal Group feat. Palle Mikkelborg, Håkon Graf, Sveinung Hovensjø and Jon Christensen - Per Ulv, 1978, YouTube Video

