Chess-Brabo: Regression-tests

It took me about 20 years before I finally started to build a proper opening repertoire. I have often wondered since why I waited so long. Not only did I play bad openings for far too long (which I probably still partly do, as I am still playing the Dutch Defense), but in most games I really had no clue what I was doing in the opening. I still hope to play a lot of chess in the future, but those 20 years are gone and won't return.

Twenty or more years ago nobody told me that I should study openings. Just before the lockdown this year, the Belgian FM Gunter Deleyn told me that in his younger years it was considered outrageous to consult books about openings. This was seen as doping and an insult to chess, which should be a pure intellectual fight between two humans. Times have clearly changed. Nowadays I often read about coaches spending a lot of time on openings, even at a very early stage of a player's development.

So only around 2013 did I finally start to study openings seriously. I guess it was a mix of factors which convinced me to do so. First, I experienced a couple of debacles in the opening (see e.g. my 2012 article "an expanded repertoire for black"), which opened my eyes. I realized that my repertoire was nowhere. Yes, I already knew a few things about openings at that time, but it was totally insufficient to meet the repertoire of a master. Besides, the rapid technological developments also forced me to take action if I didn't want to lose rating points. More and more games were being added to the databases. Meanwhile, ever stronger engines were often showing how bad my openings were.

In 2016 I wrote on this blog for the first time about the changes I had made to my approach to selecting openings (see "to study openings part 2"). In that article I mainly described the methods I used, but also that the results came very (too) slowly. By 2016 I had studied about 100 openings deeply. Today this number has increased to 300. (My definition of an opening is a position for which you can find about 100 games in the megadatabase in which at least one of the two players has a FIDE rating of at least 2300.)
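The definition of an opening above is essentially a threshold filter on the games that reach a position. Assuming each such game is available with both players' FIDE ratings, it could be sketched as follows. The class and function names are hypothetical, and treating "about 100 games" as a hard minimum of 100 is a simplification made here, not part of the original definition.

```python
from dataclasses import dataclass


@dataclass
class GameRecord:
    """Hypothetical record of one database game that reaches the position."""
    white_elo: int
    black_elo: int


MIN_ELO = 2300    # at least one of the two players must be rated 2300+
MIN_GAMES = 100   # "about 100 games", simplified here to a hard minimum


def qualifying_games(games: list[GameRecord], min_elo: int = MIN_ELO) -> int:
    """Count games in which at least one player meets the rating threshold."""
    return sum(1 for g in games if max(g.white_elo, g.black_elo) >= min_elo)


def is_opening(games: list[GameRecord],
               min_games: int = MIN_GAMES, min_elo: int = MIN_ELO) -> bool:
    """True if the position has enough qualifying games to count as an 'opening'."""
    return qualifying_games(games, min_elo) >= min_games


# Toy sample (made-up ratings): 60 games with a 2300+ player, 80 without.
sample = [GameRecord(2350, 2100)] * 60 + [GameRecord(2200, 2250)] * 80
print(qualifying_games(sample), is_opening(sample))  # 60 False
```

Only the qualifying games count, so a position with many games between lower-rated players still falls short of the definition.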

I don't know how many openings I still have to analyze before I have covered my complete repertoire. I guess it is possible to calculate the exact number by writing a script which connects an opening tree of your repertoire to the megadatabase, but I already know in advance that the answer won't make me happy. I am sure that I am not even at the midpoint. Nonetheless, this doesn't mean that I can't already profit from my work in progress. My database of 300+ very deeply analyzed openings is an oasis of killer novelties and ideas. I very often use them in my games, and I am not only talking about refutations of specific lines but also about surprising my opponents.
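Purely as an illustration, such a script could look roughly like this under several assumptions: the repertoire is stored as a tree of moves, the number of qualifying games per position (at least one player rated 2300+, as in the definition above) has already been extracted from the database, and a set records which positions are already analyzed. The data layout, the function names and the numbers are all hypothetical. A real tool would identify positions by FEN or a position hash so that transpositions are merged, which this sketch does not do.

```python
# Hypothetical sketch: positions are keyed by their move sequence from the start,
# so transpositions are NOT merged here.
Line = tuple[str, ...]


def walk(tree: dict, prefix: Line = ()):
    """Yield every move sequence (node) in the repertoire tree, depth-first."""
    for move, subtree in tree.items():
        line = prefix + (move,)
        yield line
        yield from walk(subtree, line)


def remaining_openings(tree: dict, game_counts: dict, analyzed: set,
                       min_games: int = 100) -> list:
    """Nodes that meet the qualifying-game threshold but are not analyzed yet."""
    return [line for line in walk(tree)
            if game_counts.get(line, 0) >= min_games and line not in analyzed]


# Toy data: the game counts are made up and only illustrate the mechanics.
tree = {"d4": {"f5": {"Bf4": {}, "c4": {}}}}
game_counts = {("d4",): 5000, ("d4", "f5"): 900,
               ("d4", "f5", "Bf4"): 150, ("d4", "f5", "c4"): 600}
analyzed = {("d4",), ("d4", "f5"), ("d4", "f5", "c4")}

todo = remaining_openings(tree, game_counts, analyzed)
print(len(todo), todo)  # 1 [('d4', 'f5', 'Bf4')]
```

The length of the returned list would be the (probably unpleasant) number of openings still to analyze.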

Naturally, researching the openings also led to many changes in my repertoire. Many lines were dropped and replaced by (hopefully) better ones. Readers who have followed this blog for years will probably still remember my articles part 1 and part 2 of the Dutch steps in the English opening. However, it isn't only old openings which are erased from my repertoire. Of the 300+ openings I have analyzed deeply in recent years (between 2013 and 2020), 60 are already obsolete again. This also means that the analysis I made of those openings is no longer relevant for my repertoire either.

I regretted this, as I had spent a lot of time on it. Initially I thought it was something I had to accept, until I realized that it might be possible to revive some of the old analysis by checking it again after a number of years with more recent and stronger software and hardware. That is how the idea of regression-tests was born. I call them regression-tests because the term is already well known in the IT world. When new software is added to older software (e.g. for a new release), a good developer will not only test the new software but will also make sure that the new software didn't impact the functionality of the old software. Very often we see that some changes did corrupt the old code. In developers' jargon, this kind of checking is called regression testing.
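In software, a regression test reruns existing checks after a change and flags anything whose result differs. Applied to old opening analysis, the same idea could be sketched like this: keep the old evaluation of each dropped line, re-evaluate it with current software and hardware, and flag the lines whose evaluation moved noticeably. This is only a minimal sketch under assumptions: the dictionaries, the 50-centipawn threshold and the function name `regression_test` are hypothetical, and the engine step that would produce the new evaluations is left out.

```python
# Hypothetical sketch: evaluations in centipawns from the point of view of the
# repertoire side, keyed by line. New values would come from re-running a current
# engine on the same positions (not shown here).

def regression_test(old_evals: dict, new_evals: dict, threshold_cp: int = 50) -> dict:
    """Return {line: (old, new)} for lines whose evaluation moved by >= threshold_cp."""
    changed = {}
    for line, old in old_evals.items():
        new = new_evals.get(line)
        if new is None:
            continue  # not re-analyzed yet, nothing to compare
        if abs(new - old) >= threshold_cp:
            changed[line] = (old, new)
    return changed


# Toy values (made up): only "dropped line A" moved enough to deserve a closer look.
old = {"dropped line A": -80, "dropped line B": 20}
new = {"dropped line A": -10, "dropped line B": 30}
print(regression_test(old, new))  # {'dropped line A': (-80, -10)}
```

A flagged line is not automatically revived; it only marks where the old verdict deserves a fresh look by hand.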

For 4 years now I have been doing regression-tests on some openings which were earlier removed from my repertoire. It is always nice to bring an old opening back into the repertoire. The experience is never lost, and more flexibility (a choice between old and new) is always useful in practice. However, so far the results of those tests have been very disappointing, as I haven't yet been able to resurrect a single old opening. Sometimes I do manage to discover a small improvement in a sub-line, but it never leads to a re-evaluation of the complete system. A nice example is one of my most recent regression-tests, on the opening 1.d4 f5 2.Bf4.

[Event "MT-Preinfalk (SLO)"]
[Site "ICCF"]
[Date "2017.09.30"]
[Round "?"]
[White "Zugrav, Wolfgang"]
[Black "Staroske, Uwe"]
[Result "1-0"]
[WhiteElo "2574"]
[BlackElo "2530"]
[PlyCount "61"]
[EventDate "2017.??.??"]

1. d4 f5 2. Bf4 Nf6 3. e3 e6 {(Black is an expert in the Dutch Defense and plays the opening against the strongest correspondence chess players. I notice that after this game Uwe changed his repertoire and next time played g6. The French grandmaster Adrien Demuth, however, convinced me with his book 'The Modernized Dutch Defense' to try d6 instead, as the critical mainline with g6 is very difficult to play for black.)} 4. Be2 {(In 2019, based mainly on this game, I thought that Be2 was the strongest continuation in this position. However, after regression-tests made in 2020 on the occasion of my 4th-round game in Prague this summer, I am no longer sure that Be2 is best. I have found quite a few annoying lines with the bishop on d3.)} 4... Be7 5. c4 O-O 6. Nf3 b6 7. Nfd2 Bb7 8. O-O c5 9. Nc3 d6 {(A game played between the engines Topple and Pirarucu in 2020 showed me an interesting alternative which may restore this line.)} (9... cxd4 10. exd4 Bb4 11. Qb3 Nc6 {(The engine game continued with Qe7, but Stockfish and Leela show an interesting concept here.)} 12. d5 Bxc3 13. dxc6 Bxd2 14. cxb7 Bxf4 15. bxa8=Q Qxa8 $13) 10. h3 Ne8 {(A top correspondence game played in 2018 seems to indicate that cxd4 is stronger, but even then it is not a walk in the park for black.)} 11. d5 e5 12. Bh2 Nd7 13. a4 g6 14. a5 Nb8 15. axb6 Qxb6 16. f4 e4 17. Bg3 h5 18. Be1 h4 19. Ndxe4 fxe4 20. Nxe4 Bc8 21. Bh5 {(This is correspondence chess of the highest level. It can never be achieved by humans, unless of course they have analyzed it in advance at home with an engine.)} 21... gxh5 22. Qxh5 Bf5 23. Bc3 Qc7 {(Accepting the third piece sacrifice leads to a forced mate.)} (23... Bxe4 24. Qg4+ Kf7 (24... Kh7 25. f5 Bf6 26. Qg6+ Kh8 27. Rf4 Ng7 28. Rxh4+ Kg8 29. Qh7+ Kf7 30. Bxf6 Kxf6 31. Rxe4 Kf7 32. f6 Rg8 33. Re7+ Kxf6 34. Re6+ Nxe6 35. Rf1+ Nf4 36. Rxf4+ Ke5 37. Qf5#) 25. Qe6#) 24. Qh8+ Kf7 25. Ng5+ Bxg5 26. Qh5+ Kg8 27. Qxg5+ Ng7 28. e4 Bh7 29. f5 Rf7 30. Rf4 Nd7 31. Rxh4 {(A fantastic game.)} 1-0

Online I notice that this opening has gained enormously in popularity. Looking at my personal database of online games, I have already encountered it in more than 500 blitz games. Naturally I was curious what the new book by the Serbian grandmaster Nikola Sedlak, Playing the Stonewall Dutch, would tell us about this line.

I was able to find only 1 page about this opening in the book, which is a disappointment. This is a huge difference compared to the 38 (!!) pages in the book The Modernized Dutch Defense, which I already mentioned in my article chess position trainer part 3. I believe 38 pages were absolutely necessary for this line, so I feel that Nikola has underestimated it in his book.

For a more elaborate review of Nikola's book, I refer to a comment which I wrote on chesspub. Anyway, this article is about regression-tests, so I don't want to digress too much about this one specific opening.

So let us return to the regression-tests and try to understand why there are so few interesting results. First, I believe it is important to know that the old analysis on which the regression-tests are executed was made on average about 5 years ago. In 2016 I wrote in my article rise of the machines part 2 that the strongest engine improves at a rate of about 55 elo per year, not taking into account the hardware. With the introduction of neural networks we see that this trend hasn't flattened; on the contrary. In other words, we can estimate that the level of my analysis has improved by at least 200 elo over the last 5 years, and probably by much more.
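The estimate follows from a simple linear extrapolation of the 2016 figure. The sketch below only makes that arithmetic explicit. It deliberately ignores hardware gains and any acceleration from neural networks, so the result should be read as a rough lower bound rather than a measurement; the constant and function names are illustrative.

```python
ENGINE_GAIN_PER_YEAR = 55  # elo per year, software only (figure from the 2016 article)
YEARS = 5                  # average age of the old analysis


def estimated_gain(per_year: float, years: float) -> float:
    """Linear extrapolation: no compounding, no hardware gains, no NN speed-up."""
    return per_year * years


gain = estimated_gain(ENGINE_GAIN_PER_YEAR, YEARS)
print(gain)  # 275
```

Five years at 55 elo per year already gives 275 elo, so "at least 200 elo" is on the conservative side.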

Nonetheless, we see that this increase in playing strength has rarely (never say never) overturned any of my old refutations. I don't have a conclusive explanation for it, but I do have a theory based on my daily work with the best engines. 5 years ago the engines were already extremely strong (much stronger than our current world champion Carlsen), so the quality of the analysis I made at that time was already very good (this is very different from analysis made by humans in the era before engines existed). If 5 years ago an engine discovered that a line was bad, I notice that the best engines of today can't fix it anymore.

The increase in playing strength of the new engines is mostly concentrated in the discovery of new, much more complex refutations of positions which were previously still considered fine. That is also the reason why last year I wrote in my article computers achieve autonomy part 2 that many dubious openings are disappearing from grandmaster practice. More and more openings/lines have reached a theoretical dead end, and those doors don't open anymore.

Finally, I want to add that we should not despair about chess. The engines closed many doors, but at the same time they also opened many new ones. Every day I find new and beautiful lines. So it is mainly a matter of letting go of some old lines and being ready to adapt the repertoire when you discover that something doesn't work anymore.

Brabo

1 comment:

datajunkie, 14 September 2023 13:51
Agreed that the engines closed many doors but also opened many new ones. Now that so many people just follow lines published in courses or books, or simply follow top players and correspondence players, some old lines can often be a good surprise, though finding one viable enough to play in multiple games is another story.


Monday, September 28, 2020

