Thursday, March 14, 2019

Lc0 vs Stockfish superfinal TCEC season 14

1. Pre-match expectations of Leela.

Beforehand SF was considered the clear favorite. There were several reasons for that. First, the rating of Leela (Lc0 v0.19) at ccrl40 was not great: the engine did not appear in the top 100 (version v0.20 was later, during the match, ranked 45th). Second, Lc0 performs weakly on classical hardware compared with the traditional engines. That is no surprise, as Lc0 is built for the fast chips of graphics cards (GPUs) rather than CPUs. On a normal CPU Leela can't reach its maximum strength, just as Fritz3 initially couldn't perform well because it lacked RAM for the hash tables. For more information, see the article by Frederic Friedel at ChessBase (The adventure of chess programming, part 3). Finally, I also thought the engine was not yet tactically mature, based on what we saw in the previous TCEC competitions and on my own usage of the engine. So a match against the undisputed leader of ccrl40...

2. Summary of the match

At the beginning SF was clearly the better engine; twice it took the lead. First 2-0 after only 10 games, but after game 13 the score was tied again. After that SF won 3 games in a row, but again this didn't last very long: after game 29 Leela equalized the score. Then Leela took the lead: games 49 and 53 were won by Leela. It became an exciting match with changing leaders, something we hadn't seen between engines for a long time. Maybe it hadn't happened since the Braingames "candidates match" between Fritz and Junior in Cadaques in 2001, when the engines were playing for a match against Kramnik. In that match Junior took an almost free 5-0 lead after only 5 games, but Fritz equalized in the second half of the match (24 games in total) and won the play-off 2-0.

However, the lost games were very painful for Leela. It reminded me of the match Botvinnik-Bronstein: Bronstein played ingenious chess, used new concepts and played very differently from Botvinnik. Botvinnik tried to reach draws by adjourning games and had a lot of trouble scoring wins. Leela wasted several half points through a lack of tactical awareness. A good example is the 20th game, in which Leela plays the losing move (39...Rb6-d6) with an evaluation of 0.26, but SF answers with 40.Rg3+ and immediately shows +8.56 - probably the "boom" of the match. Maybe the position wasn't a draw anyway (SF was already giving itself +2.5), but more blunders would occur. In the next game it happened again: in an equal position Leela blunders once more and SF hits back immediately by taking on f2 (-4.46). It happened again in game 66. A very weird loss was the 85th game, the last decisive game of the match: Leela still believed it was a draw (overvaluing a far-advanced passed a-pawn) while SF had long considered the position totally hopeless for white. When Leela finally realized it, the evaluation plummeted to -14.28 (SF was already announcing mate in 41…). In one very rare case (the 65th game) Leela missed a certain win, in which SF (using 6-men tbs) was already 100% sure of the loss. This was the consequence of the lower search depth and less extensive use of tbs. At game 80 the score was tied again.

3. Lessons learnt about openings, playing style and other aspects

Besides the impressive performances in the middlegame (Leela) and endgame (Stockfish), there were also a number of important lessons learnt about the openings.

[Result "1-0"]
[White "LCZero v20.2-32930"]
[WhiteElo ""]
[Black "Stockfish 190203"]
[BlackElo ""]
[ECO "C04"]
[Date "2019.02.06"]
[Event "TCEC Season 14 - Superfinal"]
[Round "?"]
[Site ""]
[CurrentPosition "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"]

1.e4 e6 2.d4 d5 3.Nd2 Nc6 4.Ngf3 Nf6 5.e5 Nd7 6.Nb3 b6 7.c3 Ne7
{ (Stockfish evaluates this position as about equal +0,25. Lc0 shows +1,78 which corresponds to a winning-percentage of 67% for white.) }
8.h4 c5 9.h5 h6 10.Nh4 c4 11.Nd2 b5
{ (Stockfish shows now also a slight edge for white.) }
12.Qg4 a5 13.Be2 a4 14.a3
{ (Stockfish gives +0,78 so a clear advantage. Lc0 already gives +3,06 which corresponds to a winning-percentage of 77% for white.) }
14...Nb8 15.f4 Nbc6 16.Nf1 Na5 17.Be3 Nb3 18.Rd1 Kd7 19.Bf2 Kc7 20.Ne3 Kb7 21.Qh3 g5 22.hxg6 fxg6
{ (Meanwhile Lc0 has raised the winning-percentage to 82% or a score of +4,33) }
23.g4 Ka6
{ (Stockfish evaluates this position as lost and shows a score of +1,57 for white) }
24.Nhg2 Rg8 25.Bf3 Ka5 26.Bh4 g5 27.Bg3 gxf4 28.Nxf4 Ra7 29.Bh4 Bd7 30.O-O Qe8 31.Rf2 h5
{ (Stockfish's score has meanwhile slightly gone up to +1,78. Lc0 is now showing +7,05 or a winning-percentage of about 87%.) }
32.g5 Ng6 33.Nxg6 Qxg6 34.Bg2 Qe8 35.Rf6 Bc8 36.Qg3 Rc7 37.Bh3 Bg7 38.Kh2 Rf7 39.Ng2 Ka6 40.Qe3 Ka7 41.Nf4 Re7 42.Rg1 Kb8 43.g6 Rh8 44.Rg2 Ka7 45.Qf2 Rb7 46.Bxe6 Bxf6 47.Bxf6 Bxe6 48.Bxh8 Bg8 49.Bf6
{ (In the last moves the evaluations increased rapidly. Stockfish now shows a score of +8,62. Lc0 gives +23,85 which corresponds to a winning-percentage of about 96%.) }
49...Nc1 50.Qe3 Nd3 51.Nxd3 cxd3 52.Qxd3 Rb8 53.Kg3 Be6 54.Kh4 Kb7 55.g7 Qd7 56.Rg5 Rc8 57.Rg1 Bf7 58.Qf3 Qe8 59.Rf1 Rc7 60.Qd3 Bg8 61.Kg5 b4 62.axb4 Rc4 63.Kh6 Qe6 64.Qf5 Qe8 65.Rf3 Rc6 66.Qxh5 Bf7 67.Qf5 Be6 68.Qg6 Bf7 69.Qd3 Re6 70.Kg5 Qc6 71.Rg3 Re8 72.Rh3 Rg8 73.Rh4 Qd7 74.Kf4 Be6 75.Rh7 Kb6 76.Rh6 Kb7
{ (The win is attributed based on TCEC win rule 39. Stockfish final evaluation = +11,13. Lc0 final evaluation = +32,99) }
1-0

A first highlight was game 11, in which Leela gradually increases white's advantage out of the opening (French). This was very impressive, especially as Leela's evaluation was 10-20 moves or more ahead of SF's. It looked like grandmaster against amateur, only the amateur has the strength of a super-grandmaster. Still, Leela's evaluation is something you need to take with a grain of salt: in the first game Leela's evaluation jumps to 2.65 at move 104, while SF sees no problems. The same happens in game 9: suddenly Leela shows an evaluation of +2.24 when it can exchange queens and obtain a bishop endgame with a passed a-pawn. SF again evaluates the position as fully equal and the game ends in a draw. The same goes for the 85th game: it seems Leela puts too much trust in passed a-pawns? There are many other examples of this overly optimistic evaluation.
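
The game comments above pair each Lc0 pawn score with a winning percentage (for example +1,78 with 67%). A minimal sketch of how such a conversion can work is given here. The tangent mapping and its two constants are an assumption: they are what I recall Lc0 using around v0.20, they are not taken from this post, and later versions changed the formula. The function names are mine.

```python
import math

# Assumed constants (recalled from Lc0 around v0.20, not from the post).
_SCALE = 290.680623072   # centipawns
_SLOPE = 1.548090806     # radians per unit of q


def q_to_pawns(q: float) -> float:
    """Map an expected outcome q in (-1, 1) to a score in pawns."""
    return _SCALE * math.tan(_SLOPE * q) / 100.0


def pawns_to_win_pct(pawns: float) -> float:
    """Invert the mapping: score in pawns -> expected score in percent."""
    q = math.atan(pawns * 100.0 / _SCALE) / _SLOPE
    return 50.0 * (q + 1.0)


if __name__ == "__main__":
    # Scores quoted in the game comments (decimal commas written as points).
    for pawns in (1.78, 3.06, 4.33, 7.05, 23.85):
        print(f"+{pawns:.2f} -> about {pawns_to_win_pct(pawns):.0f}%")
```

The curve is steep near the ends: under this assumed mapping, going from +7 to +24 adds less than ten percentage points, which is one reason a large Leela score is less alarming than the same number from SF.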

As stated before, the variety in the openings is good, but the engines seem capable of turning even the sharpest openings into forced sequences leading to boring drawn endgames. Fortunately some nice middlegames were played, but when SF, after e.g. 1.e4 c5 2.Nf3 d6 3.d4 cxd4 4.Nxd4 Nf6 5.Nc3 a6 6.Be3 e6 7.a3 b5 8.g4 Bb7 9.Bg2 h5 10.g5 Ng4 11.Bc1 Qb6, very quickly shows 0.00 with both white and black, such perfect play no longer looks exciting. The same scenario occurred in the 5th and 6th games: SF makes very quick draws in the King's Indian.

Leela couldn't do much with the King's Gambit and lost, while SF could just hold the draw with white. Although a couple of games can't settle the correctness of an opening, it is rather symptomatic that it is the King's Gambit that leads to trouble for white. It seems the romantic opening is nothing more today than a surprise weapon? As written earlier, the French game (game 11.1) was a highlight for Leela in the match; this time SF couldn't show the same quality with white. The French seems to be the opening Leela knows best, as in game 35 we again saw SF in big trouble after the opening.

Leela's win in the Nimzo-Indian was due to some small errors by SF in the middlegame rather than to the opening. In game 16 SF practically destroys the Pirc in the opening. The Pirc had a rough time in this match: in game 55 SF barely made the draw. It became the longest game in TCEC history: 264 moves before Leela agreed to the draw. In game 71 we see the Pirc creating another dramatic turnaround. Leela has a stranglehold but can't break the defense despite dominating the whole board. White has everything - black even has to evacuate the king to the a-file - but it is not enough.

Generally we see that the minor openings don't stand up to the test very well, nor do the sidelines of the main openings: it is striking that white's advantage only disappears after move 25. An example of this is game 23.

Leela scores a great win with a white Stonewall in game 25. A Philidor/Lion opening in game 27 is annihilated by Leela - one of the rare moments in which Leela manages to get SF away from the 0.00 evaluation and wins deservedly.

Games 59 & 60 show what lines that are "sharp" for humans mean for engines: the Sicilian Dragon produces 2 short draws. Even the Spanish is used, but it survives a Leela evaluation of 5.76 in game 75. The end of the match looked like football, in which team A has a lot of ball possession but team B scores twice on the counter in the final quarter.

4. What could've been better in the match set-up?

Contrary to earlier TCEC superfinals, this match of 100 games seemed "too short". The engines were very close in strength, and after 70 games the gap was still only 1 point. After 100 games that was also the final difference: 50.5 - 49.5. The openings could also have been chosen a bit better. In most cases more or less neutral positions after 5 moves were chosen, mixed with some deeper lines (randomly chosen from the opening book created by Jeroen Noomen). Those deeper lines did not always create interesting middlegames. The problem (for humans) with fun openings (like the Marshall Gambit, Sicilian Dragon, Botvinnik Gambit, King's Gambit, Albin Countergambit, Sveshnikov, Sämisch KID, Seville Variation of the Grünfeld, …) is that they either equalize quickly (due to an overly forced mainline) or almost always produce a decisive result (as one side has too big an advantage). Some openings are complex for humans, but that does not necessarily hold for engines. Nonetheless I agree with the critics who want more starting positions from grandmaster games. This would improve the relevance and usefulness for the practical player. Maybe this is something for the superfinal of season 15?
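
To put a number on "too short", here is a rough sketch that turns a match score into an Elo difference and a 95% margin. It uses the standard logistic Elo formula and a normal approximation. The split into wins, draws and losses is an input: the 10/81/9 split used in the example is my assumption (it adds up to 50.5 out of 100), as the text above only gives the total.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


def score_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score fraction, z * standard error) for one side."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of results scored as 1, 0.5 and 0.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    return mean, z * math.sqrt(var / n)


if __name__ == "__main__":
    # Assumed split for SF; only the 50.5 total is taken from the text.
    mean, margin = score_with_margin(wins=10, draws=81, losses=9)
    low, high = elo_diff(mean - margin), elo_diff(mean + margin)
    print(f"score {mean:.3f} +/- {margin:.3f}")
    print(f"Elo {elo_diff(mean):+.1f}, interval {low:+.1f} to {high:+.1f}")
```

Under these assumptions the interval comfortably includes zero, so a one-point margin after 100 games says very little about which engine is stronger.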

5. Other things which we can improve?

SF had the advantage of using 6-men tablebases (tbs); Leela worked only with 5-men. That difference certainly meant the difference between a win and a draw in 1 game, and with equal weapons the match would have ended in a tie, 50-50. Engines can nowadays use those endgame tbs already in the opening, so this matters in a match. Probably in 10 years we will have 8-men tbs (giving a solution for all rook endgames with 2 pawns each - great!), so this aspect will become even more important in the future. Or maybe we should do the opposite and forbid the engines from using tbs at all?

6. Conclusion: are we close to perfect chess?

Positionally Leela is close, as beating SF requires a very high level of play. Tactically SF is still (a lot) stronger. Its great search depth avoids falling into any tactical traps. This also allows SF to defend some very difficult positions. One aspect of the development of AlphaZero and Leela reminds me of what professor Jonathan Schaeffer experienced when developing his perfectly playing checkers engine Chinook (by the way, if you want to read a beautiful and emotional story about the first engine beating a reigning world champion, then I highly recommend "One Jump Ahead"). It is something that Schaeffer's team and, more recently, the team of Demis Hassabis (DeepMind and AlphaZero) noticed: further development leads to an increase in draws (an indication that chess is a draw when played perfectly, or that there is a limit to further improvement). That effect can be partly explained by the fact that "a draw is a draw" for an engine. In other words: the simple evaluation of 0.00 should be supplemented with other parameters, as otherwise the first move in the list leading to 0.00 will be played (for that behavior, see an article by Tim Krabbé about peeling an orange in Alaska ("morons")). An intelligent add-on would be for the engine to select the move that gives the opponent the most chances to go wrong (let us not consider contempt). This can be a line with many forced moves, or one that avoids exchanges. I guess some modern engines already use parameters that do something like that, but it does not yet work perfectly.
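
The add-on suggested above can be sketched in a few lines. This is only an illustration under my own assumptions, not how any existing engine does it: `Candidate`, `trap_ratio` and `pick_move` are hypothetical names, and the evaluations of the opponent's replies are assumed to be available from the search. Among the moves that share the best evaluation, it prefers the one after which the largest share of the opponent's replies would be mistakes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    move: str
    score: float                    # evaluation from the mover's point of view
    reply_scores: Dict[str, float]  # opponent reply -> evaluation, same point of view


def trap_ratio(c: Candidate, tol: float = 0.05) -> float:
    """Share of opponent replies that raise the evaluation above c.score."""
    if not c.reply_scores:
        return 0.0
    bad = sum(1 for s in c.reply_scores.values() if s > c.score + tol)
    return bad / len(c.reply_scores)


def pick_move(candidates: List[Candidate], tol: float = 0.05) -> str:
    """Pick the best-scoring move; break ties by the opponent's error chances."""
    best = max(c.score for c in candidates)
    tied = [c for c in candidates if best - c.score <= tol]
    # max() keeps the first of equally good entries, so list order is the
    # final fallback (the "first move in the list" behavior).
    return max(tied, key=lambda c: trap_ratio(c, tol)).move
```

With two moves both at 0.00, one where every reply holds the draw and one where two of three replies lose ground, `pick_move` returns the second. A real engine would also have to weigh how hard the correct reply is to find, which this sketch ignores.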

The advantage of Leela is that "the engine" can now do the development itself. Sooner or later the development of Stockfish (despite all the test games the engine plays against itself, possibly with a self-learning function) will stop when it reaches the limits of human programming. Leela has the absolute minimum of code needed to seek maximum results. One of the Leela developers wrote on his blog that if a developer of a classical engine (e.g. SF) takes a holiday for a week, the engine remains the same, while Leela after a week has become a bit stronger again all by itself. It appears Komodo has already hit the ceiling: at ccrl40 release 11.3 has 3 points more than release 12 and 12 points more than release 12.3. And the MCTS version of Komodo is also getting close to the classical one. It looks like the Americans have reached their tipping point.

But as I have said, Leela's evaluation - unlike SF's - is not watertight: never have I seen so many positions with a "winning" evaluation (+2, +3, +4, or even +5 and more…) that were not converted into a win. The winning path is often so narrow for engines that one small deviation is enough to lose the advantage. Leela is not yet able to avoid those mistakes - so this looks like us humans playing chess :-) However, for the practical player this disadvantage can become an advantage, and Leela is definitely a good source of new ideas (plans), or a help in finding practical chances in non-tbs endgames (something which SF will rather evaluate as 0.00 without giving any view of the practical chances).


Some critics pointed out that the openings were not well chosen, so an additional rapid match was played without any pre-selected openings. Leela won this one 56-44 (see e.g. s14 bonus match leela stockfish). That is a very large margin, especially as tactics are normally much more dominant in rapid play. The next season will have to answer the open questions.

