Friday, December 2, 2016

Rise of the machines part 2

New engines are a regular topic on this blog. They not only improve continuously but also influence the way we play (see revolution in the new millennium) and analyze (see the fake truth). Besides, we haven't hit the ceiling yet, as developments are still happening very rapidly today. Personally I am really astonished by this after 2 decades of intensive programming by several great talents. It is not easy to evaluate this gigantic progression correctly, but I will give it a try in this article.

In 1997 Deep Blue defeated the then reigning world champion Garry Kasparov, which is generally considered a milestone, but it still took several years before every player could use an engine of the same strength. It is difficult to pin down an exact date for when that happened, but I estimate that 2003 is close. That was the year of Kasparov's matches against Deep Junior and X3D Fritz, both of which were drawn.

Ever since, the top engines have surpassed everybody, and not by a little. According to CCRL, Fritz gained 470 Elo points over the last 13 years. On top of that, Komodo 10 is today another 210 rating points stronger than the strongest version of Fritz. That makes a total of 680 points, or on average 52 points per year. If we only look at the 3 most recent years, we see the same trend. At the end of 2013 I worked with Stockfish 4. Last week I downloaded Stockfish 8, which is 165 points stronger than version 4 according to the CCRL figures. That is again 55 points per year on average.
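The arithmetic is easy to check. Below is a minimal sketch in Python that uses only the CCRL figures quoted above; the function and variable names are my own and purely illustrative.

```python
def average_gain_per_year(total_points: float, years: float) -> float:
    """Average Elo gain per year over a period of the given length."""
    return total_points / years


# Figures quoted in the text (CCRL, end of 2016).
fritz_gain = 470      # progress of Fritz over the last 13 years
komodo_extra = 210    # Komodo 10 on top of the strongest Fritz
total_gain = fritz_gain + komodo_extra

long_term = average_gain_per_year(total_gain, 13)
recent = average_gain_per_year(165, 3)  # Stockfish 4 to Stockfish 8

print(f"Total gain since 2003: {total_gain} points")
print(f"Long-term average: {long_term:.1f} points per year")
print(f"Last 3 years: {recent:.1f} points per year")
```

Both averages land in the low-to-mid fifties, which is why I speak of the same trend.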

TCEC (Top Chess Engine Championship) has without any doubt played an important role in the progress of the last couple of years. Improvements to the engines are allowed between the stages within one championship, and this, combined with the ever-growing interest in the championship, clearly motivates most programmers.

Currently the superfinal of season 9 is ongoing and we are very close to the final decision. I see 2 big surprises this season. The first is that Komodo failed to qualify for the superfinal while leading at CCRL. I guess this is related to new, improved versions of the competitors which are not yet used by CCRL. The second big surprise is the comeback of our super-talented Belgian programmer Robert Houdart with his engine Houdini. I didn't expect that, as Houdini 4 dates back to 2013! The Houdini site claims a progression of no less than 200 rating points, which doesn't seem exaggerated to me.

In the separate rapid championship Houdini won ahead of Komodo and Stockfish, but in the superfinal of the classical championship Houdini will most likely lose narrowly against Stockfish. Anyway, one game will surely be remembered for a long time, if only because it created quite some controversy. I am of course talking about game 17.

In the final position a win was awarded automatically to Stockfish based on the Nalimov tablebases. However, many viewers didn't agree with the verdict. First, both engines showed an evaluation of 0.00 in the final position (see TCEC), and on top of that the 50-move rule was not taken into account. If TCEC had used Syzygy tablebases instead, the rule could have been applied.

Evaluation by Syzygy tablebases of the final position of game 17, TCEC season 9

DTZ tells us how many moves it takes, with optimal play, before a pawn is moved or a piece is captured. DTM on the other hand shows us the number of moves to mate with optimal play. A DTZ of 123 plies, or 62 moves, indeed means that the 50-move rule comes into force.
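The conversion from plies to moves and the comparison with the 50-move rule can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and I assume that the 50-move counter stands at 0 in the final position, which I did not verify.

```python
FIFTY_MOVE_LIMIT_PLIES = 100  # 50 moves by each side


def plies_to_moves(plies: int) -> int:
    """Convert a number of plies (half-moves) to full moves, rounding up."""
    return (plies + 1) // 2


def fifty_move_rule_applies(dtz_plies: int, halfmove_clock: int = 0) -> bool:
    """True if the next pawn move or capture comes too late to reset the counter.

    halfmove_clock is the number of plies already played without a pawn move
    or capture. It is assumed to be 0 here (hypothetical, not checked).
    """
    return halfmove_clock + dtz_plies > FIFTY_MOVE_LIMIT_PLIES


dtz = 123  # DTZ in plies, as shown by the Syzygy tablebases for game 17
print(plies_to_moves(dtz), "moves")
print("50-move rule applies:", fifty_move_rule_applies(dtz))
```

With any DTZ above 100 plies the defender can claim the draw before the counter is reset, which is exactly what the viewers pointed out.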

However, we should not forget that the 50-move rule was introduced for humans, to avoid searching endlessly and in vain for a win. As I already wrote in my article ICCF, it does make sense to ignore this rule here too.

Besides that, it still looks strange to me to award a win when neither engine sees such a win at all. I do understand that adjudications save a lot of time and energy. Until then this had always gone smoothly, but not this time. Afterwards some people rightly claimed that Houdini would have avoided the final position if it had been allowed to consult the tablebases in advance.

Decisions by (much) weaker arbiters often create problems when they are related to playing for a win, but the opposite also exists: a much stronger arbiter makes a judgment based on its own capabilities but ignores the much weaker skills of the players involved.

By coincidence something similar happened to my son Hugo, playing in the -8 category of the Flemish youth criterium in Gent. His third game was adjudicated as a draw when an endgame of rook and king each was on the board and the opponent risked losing on time. After the game Hugo could no longer hold back his tears. The arbiter made the call in good conscience, but it is of course very painful when just a few weeks earlier you lost the exact same endgame in the step tournament of Turnhout against a brother of the opponent.

Maybe Hugo's opponent in Gent would not have made that kind of mistake, but we can't be sure of that. You never know what will or will not happen in the -8 category, so any decision is debatable. Eventually I advised Hugo to accept the decision of the arbiter. A draw was a fair result, and from experience I know that in the long run it is often better not to fight such things.

I assume TCEC thought the same. The adjudication wasn't optimal, but the decision was made and you can't change the rules during the superfinal anymore. In the end 100 games will be played, and it doesn't look like this one game will influence who wins the final.

I expect that after this superfinal CCRL will start to test the new versions of both finalists. Normally this means we will see Stockfish as the new number 1, a good number of rating points ahead. Difficult times are coming for the commercial engines, as few people will want to pay for a weaker engine when they can get the strongest one for free.

Calculating the exact Elo strength of the engines as Carlsen's rating plus the progression since 2003 looks too simplistic to me. Such math would mean that Carlsen could theoretically not score a single point in a standard game without a handicap. I do see him losing a match by a big margin, but with the right openings it should be possible to score a couple of half points, which means the rating difference can't be 700 points.
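To put a number on what a 700-point gap would mean, here is a minimal sketch using the standard logistic Elo expectation formula, E = 1 / (1 + 10^(-D/400)), where D is the rating difference seen from the player's side. The 700 points are the figure from this article; the 100-game match length is borrowed from the TCEC superfinal purely as an illustration, and the names are my own.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score per game (between 0 and 1) under the logistic Elo model.

    rating_diff is the player's rating minus the opponent's rating.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# Hypothetical scenario: the human is rated 700 points below the engine.
per_game = expected_score(-700)
match_games = 100  # illustrative match length
expected_points = per_game * match_games

print(f"Expected score per game: {per_game:.3f}")
print(f"Expected points over {match_games} games: {expected_points:.1f}")
```

Under this model the expected score comes out below 2% per game, so even a couple of draws in a short match would already contradict a 700-point difference.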

On the other hand, in this article I only talk about the strength of the engines themselves. We don't take into account hardware developments, improved interfaces, or new and bigger tablebases. Together they may push the rating up another 200 points.

It is not without reason that I stated at the beginning of this article that the progression of the engines is difficult to evaluate correctly. If you add up all the numbers, you get a dazzling rating of around 3800 Elo, which makes no sense. The only way to evaluate the engines is to let them compete against other engines. Unfortunately we also see a lot of players using the engines to denigrate our top players, which just shows a complete lack of respect.

