Wednesday, February 28, 2018

To analyze using a computer part 3

December was a dark month for Stockfish. First Stockfish was beaten by Alpha Zero, see chess.com. Next Stockfish didn't manage to qualify for the TCEC-super-final of season 10, the first time in years. If you only looked at those facts, then many would decide to stop working with Stockfish.

However if we investigate the data more closely then we see a very different picture. The beating by Alpha Zero was in the end only a difference of 64-36, which corresponds to a gap of about 100 elo, see the fide-handbook. Besides there was a lot of criticism of how Stockfish was operated. In my previous article I already explained how easily you can get a difference of a couple of hundred rating-points by just manipulating some parameters. This is not even taking into account the missing opening-book and the very limited hash-table, which definitely also impacted the playing-strength of the engine. In short a match in which Stockfish got better conditions could well reverse the final result, but of course this will never be allowed by Google's DeepMind.
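The step from a 64-36 score to roughly 100 elo can be checked with a few lines of Python. This is only a sketch: it assumes the common logistic Elo curve, while the fide-handbook works with a lookup table that gives a slightly different number. The function name `elo_gap_from_score` is my own invention for this illustration.

```python
import math


def elo_gap_from_score(score: float) -> float:
    """Rating gap implied by a score fraction (0 < score < 1).

    Assumption: the logistic Elo curve, E = 1 / (1 + 10 ** (-D / 400)),
    solved for D. This is not the exact fide conversion table.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# 64-36 over 100 games is a 64% score for the winner.
print(round(elo_gap_from_score(0.64)))  # prints 100
```

A 50% score gives a gap of 0, and the gap grows quickly once the score moves away from 50%, which is why a seemingly heavy 64-36 is still only about 100 points.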

Also the early elimination of Stockfish after stage 2 in season 10 was no disgrace at all. Contrary to previous years, the super-final was played immediately after stage 2. Consequently a lot depended on how the best engines performed against the weaker ones. Stockfish was the only engine not to lose a game but still ended half a point behind the leaders. So the 2 finalists (Komodo and Houdini) never proved their superiority. Personally I believe Stockfish is the strongest engine available today. After the final it won the TCEC-rapid and the newest release 9 (available since the beginning of this month) leads at ccrl by 39 elo.

Naturally Chessbase doesn't report anything about the success of Stockfish on their website. The engine destroys their market. Even when a new top-engine like Houdini 6 is introduced, mainly negative comments are received, along the lines of: you really need to be stupid to pay 100 euro while a stronger engine is available for free. An open forum creates extra visitors but can also cause damage to your business. Personally I think such comments are inappropriate. First you get more than just an engine for the 100 euro. Also you can't expect that everybody wants to work for free. There is nothing wrong with trying to make a living from creating new interesting things.

Still if we only concentrate on the engine then we can wonder how necessary it is for ourselves to get an extra engine beside Stockfish. Do other engines have an added value for us? Well honestly I doubt it for 99.9% of the players. The quality of Stockfish's analysis suffices for any player up to at least 2600 fide and maybe even higher. Only for some theoreticians, like correspondence-players and the world-top in otb, does it become doubtful to rely only on Stockfish. I call it doubtful as today it is really not clear if an extra engine will still bring something extra. To support this statement, I carried out a special research-project during the 2 weeks of the last Christmas holidays.

In the 6th round of the last Open Leuven, just like last summer at Gent (see evolution), I didn't stand a chance against the Belgian IM Stefan Docx. Again I was surprised in the opening and was trailing the whole game. However this time the problem of the opening was more serious than last time. Despite many hours of analysis I was not able to repair the system at home. In the end I had to admit the opening was not fully correct so I should look for something else instead. However today I am rather reluctant to learn something completely new from scratch. I play few games so the work should be proportional. Stefan recommended the classical Dutch as that is the most closely related to my repertoire. Just recently our current world-champion Magnus demonstrated the viability of this opening in a secret online blitz-game.
Nevertheless Magnus can probably play anything, especially at blitz, and still win. Even in that game of Magnus, black's position was pretty dubious for several moves. So I wasn't convinced yet to take up the classical Dutch in my repertoire. I needed to know more about the quality of the opening. Of course the first thing you need to check is recent articles/ books about this opening. The e-book The Killer Dutch, published in 2015, can probably be considered the best up-to-date theoretical work about this opening today. The author is the English grandmaster Simon Williams. He is the biggest expert in the opening. Unfortunately from a theoretical point of view this book falls short. Simon doesn't write for theoreticians but explains the opening purely from its practical use in tournament-play. That is understandable as the practical player is his main reading-audience. So some critical lines are a little too easily categorized as harmless.

If no good theoretical references exist then the only thing which remains is to start your own research. However that is easier said than done. Even an opening like the classical Dutch contains a myriad of variations today. Below you see a screenshot of my current personal opening-book, built only with games from the Megadatabase in which at least one player has +2300 fide.
That is more than 1700 games and we still need to add the correspondence-games and the engine-games, which could also influence the evaluation of the different lines. In my article studying openings part 2 I already explained that 100 games often take about a week to digest. So I realized in advance that I had to change my working-methods to process more than 1000 games within an acceptable time-frame. The first adaptation was to prioritize the lines played most often in practice instead of the lines recommended by the engine. Especially at a very early stage of the opening, which is the case here, we often see that the engine plays inferior moves compared to a strong opening-book (see also the earlier reference to the match between Alpha Zero and Stockfish). It is necessary to analyze side-lines to support the main-lines but analyzing side-lines to detect the main-lines is mainly a waste of time.

A second important time-saving without loss of quality was expected from no longer checking everything with a 2nd engine (see my old article to analyze using a computer part 1). It is very time-consuming to switch between engines even if 2 computers are used. That is the link to my introduction in which I announced a special research-project. Instead of checking everything twice, I only did so for some positions which I considered critical for the evaluation of the opening. This means only the positions where, according to the first engine, best play of both sides could/ would still give an edge to white.

In the end only 18 positions remained to be checked by a second engine. The result was stunning. Only for 3 positions was there a conflict, and only due to the smallest possible difference of 1 hundredth of a pawn. I discussed this absurd phenomenon in my article annotations. I like to use strict boundaries to achieve a very objective method of analyzing but in some exceptional cases this can create some weird evaluations. In other words the only conclusion of the project is that the extra analysis with the 2nd engine was not adding anything substantial.
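How a difference of 1 hundredth of a pawn can produce a "conflict" is easy to show with a small Python sketch. The boundary of 30 centipawns and the names `verdict` and `engines_conflict` are purely hypothetical and chosen for illustration; they are not the boundaries I use in my own analysis.

```python
def verdict(cp: int, edge_cp: int = 30) -> str:
    """Map an evaluation in centipawns (white's view) to a verbal verdict.

    edge_cp is a hypothetical strict boundary for "an edge".
    """
    if cp >= edge_cp:
        return "white is better"
    if cp <= -edge_cp:
        return "black is better"
    return "equal"


def engines_conflict(cp_a: int, cp_b: int, edge_cp: int = 30) -> bool:
    """True if the two engines land on different sides of a boundary."""
    return verdict(cp_a, edge_cp) != verdict(cp_b, edge_cp)


print(engines_conflict(30, 29))  # True: 0.01 pawn apart, different verdicts
print(engines_conflict(45, 31))  # False: 0.14 pawn apart, same verdict
```

With strict boundaries two engines that practically agree can get different labels, while two engines that differ more can get the same one.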

It again demonstrates how drastically the engines have evolved in the last years. 10 years ago it was rare for 2 top-engines to agree so often. You could easily find positions in which one engine would tell you that white is winning while the other one would state black is winning. Extra analysis was necessary to find out which one was right. In some cases the truth was somewhere in the middle. Today we not only see top-engines prefer the same move but sometimes also show the same main-line. In fact it is not so surprising as engines are getting closer to perfection. Besides Stockfish applies an open-source strategy. Everybody is allowed to see its code and learn from it. Of course other top-engines copy stuff, which again diminishes the differences between them. In the past the top-engines were more closed.

So does this mean we can skip the 2nd engine forever? No, we can't draw that conclusion yet. It is not because we don't see differences for the classical Dutch that none exist between the engines. The qualities of an engine do not depend on just one opening. That is also the most important reason why engines are tested with a wide range of openings. Recently I detected a serious difference of evaluation in a position arising from a Spanish Breyer-opening. See the screenshot below in which we see both engines calculating in parallel.


While Stockfish claims a big advantage, Komodo states it is approximately equal. So for some special positions some extra analysis remains necessary to know which engine to trust. Anyway the number of such special positions is quickly diminishing.

Theoreticians will still have to use a second engine. For players not interested in maximum quality, it no longer makes sense to buy a 2nd engine. I will also use a 2nd engine less in the future. Only for detecting small differences (0.3 pawn) do I still see an added value. Today a 2nd engine has become redundant in positions with an evaluation higher than +2 or lower than -2.

Finally maybe some reader wonders what I concluded about the classical Dutch. A very concise summary of the weeks of analysis can be found below.

Brabo

Addendum 21 March 2018
At http://www.chesspub.com/cgi-bin/chess/YaBB.pl?num=1369191586/75 an important improvement was shown for black by the German FM Stefan Buecker in the classical Dutch. The novelty 13...a5 instead of the played 13...Bb7 revitalizes this line.

1 comment:

  1. OK. Some problem earlier.

    You have very interesting blog. I found it only recently. I myself have not been active for some years now but if I decided to start play again and catch up all the theory I was left behind this time, surely this blog of yours is the place to start. In many ways.

    Thanks from Finland by one national CM
