Automatic detection of cheating: is it mature?
One of the advantages of working at a (large) consulting firm is having access to scientific papers and other material stored on e.g. sciencedirect.com. It is a database containing a great deal of scientific knowledge, which of course invites an ex-player to look for chess-related material. That is how I recently discovered the paper 'On the limits of engine analysis for cheating detection in chess' by David J. Barnes and Julio Hernandez-Castro, both of the School of Computing, University of Kent.
It is a relatively new paper (October 2014), published in the journal Computers & Security 48 (2015), 58-73. Both authors are amateur chess players, but their background is really number crunching, which they applied to chess.
Let us start with the conclusions. Unfortunately the paper tells us that today there exists no watertight method to find out whether somebody has used a computer during a game. As strong handhelds with engines have only existed for about five years, we have an interesting data-set for this problem: more than 90% of the games are guaranteed engine-free (from Morphy up to Kramnik). That is a solid base to validate the algorithm - it is impossible that games before 1990 were played with the help of engines (except perhaps some correspondence games, but the level was still very low at that time).
The paper takes as its starting point that the problem mainly pops up on internet sites - where games can be checked 'live', it is easier to detect cheaters.
In over-the-board (OTB) chess, if no strange behaviour of a player is noticed, it is very hard to prove anything, especially if his level does not deviate much from his actual (historical) level. Besides, an automatic method, if not fully conclusive, has the drawback of creating 'false positives'. The paper gives some interesting examples of perfect play from long before the appearance of computers.
This paper is not the first publication on this subject. Kenneth Regan is a pioneer on whose work this paper builds further. Regan used a method taking into account the Elo of the player before and after a tournament, his tournament result, and the player's move selection, compared with the move selection of equally strong players in similar positions (an experienced player will immediately see objections to such an approach: the player's form, an under-rated youth player, style of play, mood, environmental circumstances, or even the occasional blunder). These weaknesses are also identified by Barnes and Hernandez-Castro.
Stunningly, the authors consider the current method applied by chess servers inadequate in the long term. The servers keep their cheating-detection methods secret, but this 'security by obscurity' eventually fails - once it is cracked, the statistical method becomes useless, and every code is cracked after some time. Apparently the protocol of ICC contains more holes than a sieve: it is not only possible to steal credit-card data but even to take over an account and use it to cheat with engines.
Concretely, the analysis was performed on the ChessBase database, extended with TWIC databases up to 2013. Seven million games in total, of which only 120,000 were used for the analysis; the full database, however, was used to build an opening tree of 87 million positions. The analyses were carried out with Stockfish 3.0. The games were split into single positions, which were analysed in multi-PV mode; when a played move did not belong to the engine's top five choices, this was flagged. The authors, however, used a slightly different method than Regan. Regan compared each played move with the computer's best move; they instead used a parameter (the 'coincide value') which compares the moves played by humans with the set of engine moves of approximately the same evaluation. A subtle but important difference: it takes into account that some moves can have the same effect, which should not influence cheating detection (e.g. in an endgame a king can walk from a2 to c1 in two ways, a2-b2-c1 or a2-b1-c1). At 'A UCI-based Chess Game Analyser' you can find the C++ code.
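The idea behind the coincide value can be sketched in a few lines. This is a minimal illustration, not the paper's actual C++ code: the function name, the dictionary-of-evaluations input, and the 20-centipawn tolerance are all my own assumptions.

```python
def coincide_value(played_moves, multipv_evals, tolerance=20):
    """Fraction of played moves whose evaluation lies within `tolerance`
    centipawns of the engine's best line in that position.
    `multipv_evals` is one dict {move: eval_in_centipawns} per position,
    holding the engine's top choices from multi-PV analysis.
    Illustrative sketch only; names and tolerance are assumptions."""
    hits = 0
    for move, evals in zip(played_moves, multipv_evals):
        if move not in evals:
            continue  # played move fell outside the engine's top-N lines
        best = max(evals.values())
        if best - evals[move] <= tolerance:
            hits += 1  # move 'coincides' with an engine choice of equal worth
    return hits / len(played_moves)
```

This is why the a2-b2-c1 versus a2-b1-c1 king walk does not matter: both routes evaluate identically, so either one counts as a coincidence.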
Nevertheless we should not underestimate cheaters - at least not the smart ones. In quiet positions they may choose the engine's second or third choice, and once the position is won it no longer matters whether it is mate in 15 or 19. Therefore it is also necessary to consider the average deviation of the played move from the engine's preferred move. A very smart cheater, playing sub-optimal moves every time, will still be able to beat somebody below 2500, even if he regularly throws away a slight advantage. In the end many games are decided by blunders, and one good move is enough to keep a permanent advantage.
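The 'average deviation' mentioned above amounts to what is nowadays often called average centipawn loss. A minimal sketch, assuming we already have the engine's evaluation of the played move and of its preferred move for every position (the function and clamping choice are mine, not the paper's):

```python
def average_centipawn_loss(played_evals, best_evals):
    """Mean drop (in centipawns) of the played move versus the engine's
    preferred move. Losses are clamped at zero so that evaluation noise
    (a played move scoring 'better' than the best line) does not cancel
    out genuine mistakes. Hypothetical helper, not the paper's code."""
    losses = [max(0, best - played)
              for played, best in zip(played_evals, best_evals)]
    return sum(losses) / len(losses)
```

A smart cheater deliberately choosing second-best moves keeps this number small but non-zero, which is exactly why it complements the coincide value rather than replacing it.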
The opening tree stores, for each move, the date on which it was first played (which is certainly interesting in itself). This makes it possible to check whether a strong move had already been discovered earlier (and was thus more likely known to the player) or whether it is a novelty (more likely an engine move). Above all, however, the database was used to determine from which point a game became original, so that a really good cheating-detection system could start there: it makes no sense to check how well a player has studied theory.
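Such a tree is essentially a map from position to moves, each tagged with its earliest date. A minimal sketch under my own assumptions (keying positions by FEN string, ISO dates, and the class and method names are invented for illustration):

```python
from collections import defaultdict

class OpeningTree:
    """Toy opening tree recording, per position, the earliest date
    each move appears in the database. Illustrative sketch only."""

    def __init__(self):
        # position key (e.g. a FEN string) -> {move: earliest ISO date}
        self.first_seen = defaultdict(dict)

    def add_game(self, positions_moves, date):
        """positions_moves: iterable of (position, move) pairs from one game."""
        for position, move in positions_moves:
            prev = self.first_seen[position].get(move)
            if prev is None or date < prev:  # ISO dates compare as strings
                self.first_seen[position][move] = date

    def is_novelty(self, position, move, date):
        """True if the move was never played in this position before `date`."""
        prev = self.first_seen[position].get(move)
        return prev is None or prev >= date
```

The same lookup tells the detector where 'theory' ends for a given game: the first position in which the played move is a novelty is where original play, and thus meaningful analysis, begins.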
The analyses started with 70,000 games up to 1950 and 50,000 games from 1950 to 2005. The games were analysed at a depth of 8 plies; the most suspicious games were then analysed more extensively (25,000 at ply 10). This process continued until only 250 games remained, which were scrutinized up to ply 22. This looks like a very shallow approach given the computing power available today, but if you check how much time is needed to analyse 120,000 games at ply 8 or 10, you will surely see it differently (e.g. the ChessBase GUI permits automatic analysis).
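The staged approach is a simple funnel: analyse everything shallowly, keep only the most suspicious games, and re-analyse those at greater depth. A sketch under my own assumptions; `suspicion_score(game, depth)` is a hypothetical callback standing in for an engine run, and the default stage sizes follow the figures quoted above:

```python
def staged_filter(games, suspicion_score,
                  stages=((8, 120_000), (10, 25_000), (22, 250))):
    """Repeatedly score the remaining pool at increasing depth and keep
    only the `keep` most suspicious games per stage. Illustrative sketch;
    the real pipeline drives an engine instead of a scoring callback."""
    pool = list(games)
    for depth, keep in stages:
        ranked = sorted(pool, key=lambda g: suspicion_score(g, depth),
                        reverse=True)
        pool = ranked[:keep]
    return pool
```

The point of the funnel is economy: deep analysis is only spent on the tiny fraction of games that survive the cheap shallow passes.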
Alongside the expected findings, we also see confirmations of human behaviour: the longer the game, the more mistakes are made. Long games played consistently at engine level are rare and easily detectable. Nevertheless such games (from the pre-engine era) do exist, and they demonstrate that two humans can, in exceptional cases, play a perfect game. On the other hand, there were also many perfect short games, which can rather be attributed to homework.
The article concludes with an illustration of 'false positives': games played close to perfection, but from before the computer era. Those who think that players from the romantic era could only play spectacular attacks and exploit the opponent's weaknesses will be surprised, as the first example is Kennicott-Morphy, 1857. The game follows 14.5 moves of 'theory' (= moves played earlier and likely known to both players). After that, Morphy plays 10 perfect new moves (perfectly matching Stockfish at ply 18). Morphy pops up a few more times with perfect performances (a blindfold game with Black against Schulten, New York 1857; the seventh match game against Mongredien in Paris 1859; and Morphy-Maurian in New Orleans 1866). Here we 'finally' see confirmed the respect which even world champions have for Morphy.
If engines had existed in 1889, Max Weiss would surely have had to explain his game against Burill in New York: after move 6.5, Weiss played no fewer than 26 consecutive perfect moves and won the game.
The authors also discuss Mamedyarov-Kurnosov, a game which went around the world because of the sore loser Mamedyarov, but which under their method would not even count as a 'false positive', being too short and with Kurnosov playing only 6 perfect moves. Reverse-engineered: if Kurnosov cheated, he committed the perfect murder. In the meantime we know that he was honest - the respected arbiter Geurt Gijssen investigated on-site.
The authors continue their research by investigating whether single-threaded or multi-threaded operation influences the analysis. The conclusion is clear: only single-threaded analysis is reproducible. Multi-threading generates variations (explained by the way the computer distributes tasks across the different cores).
Also interesting is the web service the authors plan to launch to detect cheating more quickly. This should considerably increase the chances of being caught, even with perfect camouflage (e.g. using wireless communication, or a player using sub-optimal but still good engine moves). To conclude, I give some games which would have been flagged as 'suspicious' had they not been played before the computer era - it is like watching a car take part in the chariot race from Ben-Hur. Carames-Fedorovsky, Buenos Aires 1965 and Browne-Timman, Wijk aan Zee 1980: worth looking up and replaying!