Friday, December 2, 2016

Rise of the machines part 3

New engines are a regular topic on this blog. They not only improve continuously but also influence the way we play (see revolution in the new millennium) and analyze (see the fake truth). Besides, we haven't hit the ceiling yet, as developments are still happening very rapidly. Personally I am really astonished by this after 2 decades of intensive programming by several great talents. It is not easy to assess this gigantic progress correctly, but I will give it a try in this article.

In 1997 Deep Blue defeated the then reigning world champion Garry Kasparov, which is generally considered a milestone, but it still took several years until every player could use an engine of the same strength. It is difficult to pin down an exact date for that, but I estimate 2003 comes close. 2003 was the year of Kasparov's matches against Deep Junior and X3D Fritz, which were both drawn.

Ever since, the top engines have surpassed everybody, and not by a little. If we look at CCRL, Fritz has gained 470 Elo points in the last 13 years. On top of that, Komodo 10 is today another 210 rating points stronger than the strongest version of Fritz. That makes a total of 680 points, or on average 52 points per year. If we look only at the last 3 years, we see the same trend. At the end of 2013 I worked with Stockfish 4. Last week I downloaded Stockfish 8, which is 165 points stronger than version 4 according to the CCRL figures. That is again 55 points per year on average.
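The averages above are plain arithmetic on the CCRL differences quoted in this article. A minimal sketch in Python, where the function name is only my own choice for illustration:

```python
def average_gain_per_year(total_points: float, years: float) -> float:
    """Average rating gain per year over a period."""
    return total_points / years


# Figures as quoted in the article (CCRL rating differences).
fritz_gain = 470         # Fritz over the last 13 years
komodo_over_fritz = 210  # Komodo 10 versus the strongest Fritz
total = fritz_gain + komodo_over_fritz

print(total)                                     # 680
print(round(average_gain_per_year(total, 13)))   # 52
print(round(average_gain_per_year(165, 3)))      # 55
```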

TCEC (Top Chess Engine Championship) has without any doubt played an important role in the progress of the last couple of years. Improvements to the engines are allowed between the stages of a championship, and this, combined with the ever-growing interest in the championship, clearly motivates most programmers.

Currently the superfinal of season 9 is ongoing and we are very close to the final decision. I see 2 big surprises this season. The first one is that Komodo did not qualify for the superfinal while leading at CCRL. I guess this is related to new, improved versions of the competitors which are not yet used by CCRL. The second big surprise is the comeback of our super-talented Belgian programmer Robert Houdart with his engine Houdini. I didn't expect that, as Houdini 4 already dates back to 2013! The Houdini site claims a gain of no less than 200 rating points, which doesn't seem exaggerated to me.

In the separate rapid championship Houdini won ahead of Komodo and Stockfish, but in the superfinal of the classical championship Houdini will most likely lose narrowly against Stockfish. Anyway, 1 game will surely be remembered for a long time, if only because it created quite some controversy. Of course I am talking about game 17.
[Event "TCEC Season 9 - Superfinal"] [Site ""] [Date "2016.11.15"] [Round "17"] [White "Stockfish 8"] [Black "Houdini 5"] [Result "1-0"] [ECO "B78"] [WhiteElo "3228"] [BlackElo "3182"] [PlyCount "143"] 1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 g6 6. Be3 Bg7 7. f3 O-O 8. Qd2 Nc6 9. Bc4 Bd7 10. O-O-O Rc8 11. Bb3 Ne5 12. h4 Nc4 13. Bxc4 Rxc4 14. h5 Nxh5 15. g4 Nf6 16. Kb1 Re8 17. b3 Rc8 18. Nd5 Nxd5 19. exd5 e5 20. dxe6 fxe6 21. Qh2 Qf6 22. a4 b6 23. Qxh7 Kf7 24. g5 Qe5 25. Rh6 Qxe3 26. Rxg6 Rg8 27. f4 d5 28. f5 exf5 29. Nxf5 Qxb3 30. cxb3 Bxf5 31. Ka2 Rc2 32. Ka3 Bxg6 33. Rf1 Ke7 34. Qxg8 Bb2 35. Kb4 Bc3 36. Kb5 Bd3 37. Kc6 Be5 38. Kb7 Rc7 39. Ka8 Bxf1 40. Qxd5 Bg7 41. Kb8 Rd7 42. Qe4 Kf8 43. Qf5 Rf7 44. Qc8 Ke7 45. Qc7 Ke6 46. Qc6 Kf5 47. Qd5 Be5 48. Ka8 Rf8 49. Kxa7 Be2 50. b4 Bh5 51. Kxb6 Bf7 52. Qf3 Bf4 53. Qc6 Rb8 54. Ka7 Kxg5 55. Qd7 Bh5 56. b5 Re8 57. b6 Be3 58. Qd5 Kh4 59. a5 Re7 60. Ka8 Re8 61. Kb7 Re7 62. Kc8 Bg4 63. Kb8 Bf4 64. Ka8 Kg3 65. Qg8 Re5 66. a6 Re6 67. Kb7 Be3 68. a7 Rxb6 69. Kc7 Ra6 70. a8=Q Bf4 71. Kb7 Rxa8 72. Kxa8 {(This position was automatically adjudicated as a win for White, which created quite some controversy. The Nalimov tablebases show a win in 72 moves but the Syzygy tablebases tell us that the 50-move rule comes into force. Besides, the evaluations of both engines do not show a win at all.)} 1-0
In the final position a win was awarded automatically to Stockfish based on the Nalimov tablebases. However, many viewers didn't agree with the verdict. First, both engines showed an evaluation of 0.00 in the final position (see TCEC), and on top of that the 50-move rule was not taken into account. If TCEC had used Syzygy tablebases instead, the rule could have been applied.
Syzygy tablebase evaluation of the final position of game 17, TCEC season 9
DTZ tells us how many moves pass, with optimal play, before a pawn is moved or a piece is captured, i.e. before the 50-move counter is reset. DTM, on the other hand, shows the number of moves to mate with optimal play. A DTZ of 123 plies, or 62 moves, indeed means that the 50-move rule comes into force.
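The check behind that conclusion can be sketched in a few lines of Python. This is a simplified illustration and not how TCEC or the tablebase probing code works: the function names and the `halfmove_clock` parameter are my own assumptions, and boundary cases are ignored.

```python
import math

# The 50-move rule allows 50 moves by each side, i.e. 100 plies,
# without a pawn move or capture.
FIFTY_MOVE_LIMIT_PLIES = 100


def plies_to_moves(plies: int) -> int:
    """Convert plies (half-moves) to full moves, rounding up."""
    return math.ceil(plies / 2)


def exceeds_fifty_move_rule(dtz_plies: int, halfmove_clock: int = 0) -> bool:
    """True if the next pawn move or capture comes too late to avoid a draw claim.

    halfmove_clock is the number of plies already played since the last
    pawn move or capture (0 right after a capture such as 72. Kxa8).
    """
    return halfmove_clock + dtz_plies > FIFTY_MOVE_LIMIT_PLIES


print(plies_to_moves(123))           # 62
print(exceeds_fifty_move_rule(123))  # True
```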

However, we should not forget that the 50-move rule was introduced for humans, to avoid an endless and futile search for a win. As I already wrote in my article ICCF, it makes sense to ignore this rule here too.

Besides that, it still looks strange to me to award a win when neither engine sees such a win at all. I do understand that adjudications save a lot of time and energy. Until then this had always gone smoothly, but not this time. Afterwards some people rightly claimed that Houdini would have avoided the final position if it had been allowed to consult the tablebases in advance.

Decisions by (much) weaker arbiters often create problems when they concern playing for a win, but the opposite also exists: a much stronger arbiter makes a judgment based on its own capabilities and ignores the much weaker skills of the players involved.

By coincidence something similar happened to my son Hugo, playing in the -8 category of the Flemish youth-criterium at Gent. His third game was adjudicated as a draw when an endgame of king + rook each was on the board and the opponent risked losing on time. After the game Hugo could not hold back his tears. The arbiter made the call in good conscience, but it is of course very painful when just a few weeks earlier you lost the exact same endgame in the step-tournament of Turnhout against a brother of the opponent.
[Event "Step-tournament Turnhout"] [Date "2016"] [Round "9"] [White "Hugo"] [Black "Brother of opponent Gent"] [Result "0-1"] [SetUp "1"] [FEN "6r1/8/4k3/5R2/5K2/8/8/8 b - - 0 1"] [PlyCount "7"] 1... Rg1 {(Both players still had several minutes, but neither thinks this is a draw.)} 2. Ke4 $4 {(Only considering Rg4. As a parent I was not surprised to see this move, as Hugo played the complete game below his normal level.)} Re1 3. Kf4 Rf1 4. Ke4 Rxf5 {(Black needed more than 50 moves to mate, but as there was no notation a draw could never be claimed. This game decided the second place of the 2nd category of the step-tournament.)} 0-1
Maybe Hugo's opponent in Gent would not have made that kind of mistake, but we can't be sure of that. You never know what will or will not happen in the -8 category, so any decision is debatable. Eventually I advised Hugo to accept the decision of the arbiter. A draw was a fair result, and from my experience I know that in the long term it is often better not to fight such things.

I assume TCEC thought the same. The adjudication wasn't optimal, but the decision was made and you can't change the rules in the middle of the superfinal. In the end 100 games will be played, and it doesn't look like this one game will decide who wins the final.

I expect that after this superfinal CCRL will start testing the new versions of both finalists. Normally this means we will see Stockfish as the new number 1, a bunch of rating points ahead. Difficult times are coming for the commercial engines, as few people will want to pay for a weaker engine when they can get the strongest one for free.

Calculating the exact Elo strength of the engines as Carlsen's rating plus the progress since 2003 looks too simplistic to me. Such math would mean that Carlsen could theoretically not score a single point in a standard game without a handicap. I do see him losing a match by a big margin, but with the right openings it should be possible to score a couple of half points, which means the rating difference can't be 700 points.
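To put a 700-point gap in perspective, here is a minimal Python sketch of the standard Elo expected-score formula. Whether that formula still holds at such extreme differences is itself an assumption, and the function name is mine.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score per game under the standard Elo formula.

    rating_diff is the player's rating minus the opponent's rating.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# A player rated 700 points below the opponent:
print(round(expected_score(-700), 4))  # 0.0175
```

Under that formula the weaker side would be expected to score less than 2 points per 100 games.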

On the other hand, in this article I only talk about the strength of the engines themselves. We don't take into account hardware developments, improved interfaces or new and bigger tablebases. Together they may push the rating up another 200 points.

It is not without reason that I stated at the beginning of this article that the progress of the engines is difficult to assess correctly. If you add up all the numbers, you get a dazzling rating of around 3800 Elo, which makes no sense. The only way to evaluate the engines is to let them compete against other engines. Unfortunately we also see a lot of players using the engines to denigrate our top players, which just shows a complete lack of respect.

