Automatic detection of cheating: mature?
One of the advantages to work at a (large) consulting firm is to have access to scientific papers and other materials which are e.g. stored in sciencedirect.com. It is a database containing a lot of scientific knowledge which of course invites an ex-player to look for chess related stuff. This way I recently discovered the paper: ‘On the limits of engine analysis for cheating detection in chess’ by David J. Barnes and Julio Hernandez-Castro, both from the School of Computing, University of Kent.
It is a relative new paper (October 2014), published in the journal Computers & Security 48 (2015), 58-73. Both writers are amateur-chessplayers but their background is rather number crunching, which they applied on chess.
Let us start with the conclusions. Unfortunately the paper tells us that today there exists no water-proof method to find out somebody has used a computer during a game. As strong handhelds with engines only exist for about 5 years, we can use an interesting data-set for this problem. More than 90% of the games are guaranteed engine-free (from Morphy till Kramnik). That is a solid base to check the algorithm - it is impossible that games before 1990 were played with the help of engines (except maybe some correspondence games but the level was still very low at that time).
The paper uses as starting-point that the problem mainly pops up on internet-sties - where games 'live' can be checked, it will be easier to detect cheaters.
In over the board (OTB) chess, if there is no strange behavior of a player to notice, then it is very hard to proof something especially if his level does not deviate much from his actual (historical) level. Besides an automatic method if not fully conclusive, has the drawback of creating 'false positives'. The paper indicates some interesting examples of perfect play, long before the appearance of computers.
This paper is not the first publication about this subject. Kenneth Regan is a pioneer on which this paper built further. Regan used a method taking into account the elo of the player before and after a tournament, his tournament-result and the move-selection of the player, compared with the move-selection of equal players in similar positions (an experienced player will immediately see objections against such approach: shape of the player, under-rated youth-player, style of play, mood, environmental circumstances or even the occasional blunder). These defects are also identified by Barnes and Hernandez.
Stunningly the authors consider the current method applied by chess-servers inadequate on the long term. The servers keep their methods secret to detect cheating but this ‘security by obscurity’ eventually fails - once it is cracked the statistical method becomes useless. And every code is after some time cracked. Apparently the protocol of ICC contains more holes than a sieve. It is not only possible to steal credit-data but even to take over an account and use the account to cheat using engines.
Concretely the analysis happened on the database of Chessbase, extended with TWIC-databases till 2013. In total 7 million games, but for the analysis only 120.000 games were used; however the big database was used to build an opening-tree of 87 million positions. The analysis were executed by Stockfish 3.0. The games were cut into single positions which were analysed in multi-PV mode. When a played move didn't belong to the top 5 choices of an engine then this was detected. However the authors used a slightly different method compared with Regan. Regan compared each played move with the best move of the computer. On the other hand they used a parameter (coincide value) which compared the correspondence of the number of moves played by humans with the number of moves of an engine with approximately the same evaluation. A subtle but important difference. They took into account that some moves can have the same effect which shouldn't impact the cheating-detection (e.g. in an endgame a king can walk from a2 to c1 via 2 ways a2-b2-c1 or a2-b1-c1). On A UCI-based Chess Game Analyser you can find the C++code.
Nevertheless we should not underestimate cheaters - at least not smart ones. In quiet positions they will maybe choose for the second or third choice of an engine, and when the position is won, then it does not matter anymore if it is mate in 15 or 19. Therefore it is necessary to also consider the average deviation of the played move with the preferred move by the engine. A very smart cheater, playing every time sub-optimal moves, will still be able to win against somebody of <2500 points so even if regularly a slight advantage is thrown away. In the end many games are decided by blunders, and only 1 good move is sufficient to keep a permanent advantage.
The opening-tree saves for each of the moves the date it was played for the first time (something which surely is interesting). This permits to check if a strong move was already discovered earlier (and more likely known by the player) or if it is a novelty (likely more often an engine-move). However most of all the database was used to define from which point the games were original so a real good cheating-detection system could start. It makes no sense to check how good a player has studied the theory.
The analyses started with 70.000 games till 1950 and 50.000 games from 1950 till 2005. The games are analysed at a depth of 8 plies. The most suspicious games were analysed more extensive (25.000 on ply 10). This process was continued till only 250 games remained which were scrutinized till ply 22. This looks a very shallow approach, taking into account the current calculation power of our computers but if you check how much time is needed to analyse 250.000 games at ply 8 or 10 then you will surely look at it differently (e.g. Chessbase gui permits automatic analysis).
Along the logical things identified, we also notice confirmations of the human behavior: the longer the game, the more mistakes are made. Long games with invariable parameters are seldom and easily detectable. Nevertheless such games (of the era before the engines) exist and do demonstrate that 2 humans can play in exceptional cases a perfect game. On the other hand there were also many perfect short games which rather can be attributed to homework.
The article concludes with an illustration of ‘false positives’; games played close to perfection, but from before the computer-era. People thinking that players from the romantic era were only able to play spectacular attacks and exploiting the weakness of the opponent, will be surprised as the first example is Kennicott-Morphy, 1857. The game is 14,5 moves 'theory' (= moves played earlier and likely known by both players). After that Morphy plays 10 perfect new moves (perfectly matching Stockfish at depth ply 18. Morphy still pops up a few times with perfect performances (blind-game with black against Schulten, New York 1857; the seventh matchgame against Mongredien in Parijs 1859 and Morphy-Muarian in New Orleans 1866). Now we 'finally' see the respect confirmed which even world-champions have for Morphy.
If there were engines in 1889, then surely Max Weiss would need to explain his game against Burill in New York. Weiss played after 6,5 moves not less than 26 consecutive perfect moves and won the game.
[Event "USA-06.Congress New York"] [Site "USA-06.Congress New York"] [Date "1889.04.18"] [EventDate "?"] [Round "20"] [Result "0-1"] [White "Constant Ferdinand Burille"] [Black "Max Weiss"] [ECO "C67"] [WhiteElo "?"] [BlackElo "?"] [PlyCount "74"] 1.e4 e5 2.Nf3 Nc6 3.Bb5 Nf6 4.O-O Nxe4 5.d4 Be7 6.d5 Nd6 7.Nc3 Nxb5 8.Nxb5 Nb8 9.d6 Bxd6 10.Nxd6 cxd6 11.Qxd6 f6 12.Be3 Qe7 13.Bc5 Qxd6 14.Bxd6 Nc6 15.Nh4 Kf7 16.Rad1 b6 17.Nf5 g6 18.Ne3 Ba6 19.Rfe1 Rac8 20.c3 Rhd8 21.Rd2 Na5 22.Bb4 Nc4 23.Nxc4 Bxc4 24.Red1 Be6 25.Bd6 g5 26.a3 Rc6 27.a4 Bb3 28.Ra1 Ke6 29.Ba3 a5 30.Rd3 d5 31.Rc1 Rdc8 32.Rh3 R6c7 33.Kf1 Bxa4 34.Ke1 Bb3 35.Kd2 b5 36.Re1 g4 37.Rg3 b4 0-1
The writers also discuss Mamedyarov-Kurnusov, a game which went around the world because of the sore loser Mamedyarov, but which using their method would not even be a ‘false positive’ as too short and the fact Kurnosov only played 6 perfect moves. Reversed engineered, if Kurnosov cheated then he would have committed the perfect murder. In the meanwhile we know that he was honest - the respected arbiter Geurt Gijssen did the investigations on-site.
The writers continue with their research and investigate if single thread or multi thread influences the analysis. The conclusion is clear. Only single thread analysis can be reproducible. Multi-thread generates variations (to be explained by the way the computer distributes the different tasks at the different cores).
Interesting is the start-up of a web-service which the writers want to launch to detect cheating quicker. This should increase the chances of being caught considerably, also when using a perfect camouflage (e.g. by using a wireless communication or if a player uses sub-optimal moves from an engine but still good moves) .To conclude I still give some games which were detected as 'suspicious' if only they were not played before the computer-era. It is just like you watch a car participating at the horse-race of Ben-Hur J. J. Carames-Fedorovsky, Buenos Aires 1965 and Browne-Timman, Wijk aan Zee 1980, worth looking up and replaying!
[Event "Wijk aan Zee 29/596"] [Site "Wijk aan Zee 29/596"] [Date "1980"] [EventDate "?"] [Round "?"] [Result "1-0"] [White "Walter Shawn Browne"] [Black "Jan Timman"] [ECO "E63"] [WhiteElo "?"] [BlackElo "?"] [PlyCount "79"] 1.d4 Nf6 2.Nf3 g6 3.c4 Bg7 4.g3 O-O 5.Bg2 d6 6.Nc3 Nc6 7.O-O a6 8.d5 Na5 9.Nd2 c5 10.Qc2 Rb8 11.b3 b5 12.Bb2 Bh6 13.f4 bxc4 14.bxc4 e5 15.dxe6 Bxe6 16.Nd5 Bxd5 17.cxd5 Ng4 18.Nb3 Nxb3 19.axb3 Qb6 20.Qc3 c4 21.Kh1 f6 22.Bh3 Nf2 23.Rxf2 Qxf2 24.Be6 Kh8 25.Qxc4 Qb6 26.Bd4 Qxb3 27.Qxb3 Rxb3 28.Rxa6 g5 29.fxg5 Bxg5 30.Rxd6 h5 31.Rc6 h4 32.d6 Rb1 33.Kg2 Rd1 34.e3 hxg3 35.hxg3 Rxd4 36.exd4 f5 37.Bxf5 Rxf5 38.Rc5 Rxc5 39.dxc5 Kg7 40.c6 1-0HK5000