Testing chess-engines

Until a couple of months ago I never bothered with testing chess engines. I didn't see any value in it. I would never be able to match the quality of the results CCRL publishes weekly. Besides, such work is not cheap, as you need to invest in hardware, electricity, floor space and so on. On top of that, most games played by engines are pretty boring. You are better off watching games between humans to see drama and creativity.

However, as I mentioned in my last article, I had an open question about Leela. Neither CCRL nor other sites tell me how strong Leela is in comparison with the classical engines when both use exactly the same type of hardware. That is a problem for me. I can install Leela on my PC for free, but I only want to use it for analysis if I know the engine is one of the 2 strongest I possess. I have been applying that rule for a very long time; see my article of 2012 about how I analyze. Some may consider this a bit silly, but it assures me that my opponents will likely not have any better analysis.

So in the end I decided to do the testing myself. The next question is of course how to do this job quickly, accurately and as cheaply as possible. I could use a set of puzzles, but that covers only one aspect of an engine. I prefer to test the engine by playing games, but I cannot and do not want to do without my hardware for several months. A good compromise was a rapid match of 100 games at a rate of 15 minutes + 10 seconds increment. That should give a good indication of the playing strength. At stake was a place among my top 2 engines, so logically I chose Komodo 11 as the opponent for the match.

The next question is what to decide about the openings. Do we give the engines a completely free choice, or do we select a number of positions which need to be played out once from each side, as TCEC does? Free choice is how we humans play our games, but it has some disadvantages. The engines will likely play openings which are not part of my repertoire. There is a risk that they play very safely and we get an abundance of draws. Finally, without an opening book Leela will play almost exclusively the same moves in the opening, so you risk seeing the same opening, or even the same game, several times.

Therefore I preferred to let the engines start from a predefined set of openings. Which openings to choose is then the next logical question. It didn't take me long to find a good answer. I created a new database and imported a selection of 50 of my own recently played games. Next I removed all moves beyond the 10th in every game. The few duplicates I got were replaced by selecting a few other games of mine. The final result was a nice mix of 50 positions, in some of which the balance was already broken. This way I avoided too high a number of draws. Besides, the engines only play openings which have occurred in my own practice, which of course makes the match more fun to watch.
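For readers who prefer a script over manual database work, the truncation and duplicate check can be sketched as follows. This is only an illustration, not how I did it (I used my chess database program). The function names are made up, the movetext is assumed to have spaces after the move numbers and no variations, and two games only count as duplicates when their first 10 moves are literally identical (transpositions are not detected).

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def opening_stub(movetext, max_moves=10):
    """Return the first `max_moves` full moves of a PGN movetext string."""
    # Drop {...} comments; variations in parentheses are NOT handled here.
    text = re.sub(r"\{[^}]*\}", " ", movetext)
    tokens = [t for t in text.split() if t not in RESULTS]
    # Remove move numbers such as "1." or "15..." so only the moves remain.
    moves = [t for t in tokens if not re.fullmatch(r"\d+\.(\.\.)?", t)]
    kept = moves[: 2 * max_moves]
    # Rebuild numbered movetext: "1. e4 e5 2. Nf3 Nc6 ..."
    parts = []
    for i in range(0, len(kept), 2):
        parts.append(f"{i // 2 + 1}. " + " ".join(kept[i:i + 2]))
    return " ".join(parts)

def unique_openings(games, max_moves=10):
    """Truncate every game and keep each distinct stub once, in order."""
    stubs = (opening_stub(g, max_moves) for g in games)
    return list(dict.fromkeys(stubs))
```

If `unique_openings` returns fewer than 50 stubs, that is the signal to add a few other games until the set is complete again.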

Finally everything was ready. In Fritz I opened the window to set up the match, as obviously I wanted to automate the whole process. First I selected Leela, next Komodo 11. I selected the right time control and the last step was linking to my special database of 50 positions. After verifying all parameters I clicked OK and the match started.
The match lasted about 3 full days. I let my PC run day and night, but I did interrupt the process a few times to allow my PC to cool down, as around that time we were having temperatures of around 40 degrees in Belgium. Anyway, it was very easy to continue the match from the point where I had paused.

The match was a big success which exceeded my expectations. First, it quickly became clear that both engines were very close in strength but had very different styles. Games often got extremely interesting, and besides they were played from openings that are all part of my repertoire. A number of times I watched 1 or more games live, sometimes even together with my children. My children also regularly asked about the preliminary score, as we all got attached to little Leela which, despite the tactical handicap (more about it later), often managed to defeat the giant Komodo.

It left me wanting more, so I decided to organize 2 more such matches in the following months with newer releases of Leela. For the 3rd match I decided to replace some of the openings. If the same color had won all 4 games from an opening in the 2 previous matches (so irrespective of the engine), then it seemed more appropriate to select another opening for the test.
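The replacement rule is simple enough to write down as a small filter. The sketch below is only an illustration: the function name and the way the results are stored (a dictionary from an opening label to the results of its 4 games) are assumptions, and the sample data is invented.

```python
def openings_to_replace(results, games_needed=4):
    """Return the openings in which the same color won every game.

    `results` maps an opening label to a list of PGN results
    ("1-0", "0-1" or "1/2-1/2") collected over the previous matches.
    """
    flagged = []
    for opening, games in results.items():
        outcomes = set(games)
        # Flag only if all games were played and one color won them all.
        if len(games) >= games_needed and outcomes in ({"1-0"}, {"0-1"}):
            flagged.append(opening)
    return flagged

# Invented sample data: 2 matches x 2 games (one per side) for each opening.
sample = {
    "opening 1": ["1-0", "1-0", "1-0", "1-0"],
    "opening 2": ["1-0", "0-1", "1/2-1/2", "0-1"],
    "opening 3": ["0-1", "0-1", "0-1", "0-1"],
}
print(openings_to_replace(sample))
```

Because each opening is played once with each color per match, a clean sweep for one color means both engines won with that color, which says more about the starting position than about the engines.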

Leela narrowly lost 2 of the matches and tied the second one with Komodo. I considered this a very unexpected and exceptionally good result on my modest computer, which is definitely not optimal for Lc0. On the other hand, the matches didn't answer my original question. The scores were too close to know for sure which of the 2 engines was the stronger. Anyway, this is not a disaster, as I got to know Leela very well over the 300 games. I now have a pretty good idea of when to use Leela for analysis.
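Why a close score over 100 games cannot settle the question can be made concrete with a rough calculation. The sketch below converts a match score into an Elo difference using the standard logistic Elo formula and puts an approximate 95% interval around it with a normal approximation. The function name and the win/draw/loss numbers in the example are hypothetical; they are not the actual match results.

```python
import math

def elo_estimate(wins, draws, losses, z=1.96):
    """Return (elo, low, high): Elo difference and a rough 95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def to_elo(p):
        # Clamp to avoid division by zero for 0% or 100% scores.
        p = min(max(p, 1e-6), 1 - 1e-6)
        return -400 * math.log10(1 / p - 1)

    return to_elo(score), to_elo(score - z * se), to_elo(score + z * se)

# Hypothetical tied match: 30 wins, 40 draws, 30 losses.
elo, low, high = elo_estimate(30, 40, 30)
print(f"{elo:+.0f} Elo, interval {low:+.0f} to {high:+.0f}")
```

With numbers like these the interval spans roughly 50 Elo points in either direction, so a narrow loss or a tie over 100 games is compatible with either engine being the stronger one.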

In my previous article we already got acquainted with Leela by looking at how the engine reacts in different types of positions, but it is only by replaying her games that we fully realize how different the engine is from the traditional ones. So to conclude this article I made a selection of 3 games which demonstrate very well the strengths and weaknesses of Leela. This was not so easy, as there was a very large number of beautiful games. I start with a fantastic game played from the Chigorin variation of the Spanish opening (I covered the opening recently in my article statistics). Leela sacrifices an exchange very early and succeeds, like a real boa constrictor, in slowly suffocating black.
[Event "DESKTOP-VE6O9HB, Rapid 15m+10s"] [Site "DESKTOP-VE6O9HB"] [Date "2019.05.27"] [Round "13"] [White "Lc0 v0.21.0"] [Black "Komodo 11 64-bit"] [Result "1-0"] [ECO "C96"] [PlyCount "163"] [EventDate "2019.??.??"] [Eventtype "rapid"] [CurrentPosition "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"] 1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Na5 10. Bc2 c5 {(I play this position with both colors. Leela immediately shows an advantage of +0,8. Komodo is more conservative with +0,2)} 11. d4 exd4 12. cxd4 Re8 13. d5 Nd7 14. Nbd2 Bf6 15. Rb1 {(Meanwhile Leela became more optimistic with +1,7. Komodo thinks there are no issues and evaluates the position as +0,1)} 15... Nf8 16. b3 Ng6 17. Nf1 Bd7 18. Ng3 Ne5 19. Nd2 Ng6 20. Nf3 Ne5 21. Nh2 {(It is typical for Leela to repeat moves before proceeding with her plan.)} 21... Ng6 22. Nh5 Bd4 23. Nf3 Bc3 24. Re3 b4 25. Rxc3 {(After this exchange-sacrifice Leela already gives herself +4. Komodo sees white has full compensation but not much more with +0,5 for white.)} 25... bxc3 26. Qe1 f6 27. Qxc3 Rc8 28. Kh2 Re7 29. Be3 Nb7 30. Nd2 Qa5 31. Qa1 Qc7 32. f4 Rce8 33. Qc3 Qa5 34. Qb2 Qc7 35. g4 Kh8 36. Rg1 Rf8 37. Qc3 Qa5 38. Qa1 Qb5 39. Rg2 Be8 40. Ng3 {(The evaluation of Leela has risen to +8. Komodo now also realizes that his position is not good but +1 is a huge difference of evaluation.)} 40... Qb6 41. Nf5 Ref7 42. h4 Qd8 43. h5 Ne7 44. Nh4 Ng8 45. Qe1 Na5 46. Ndf3 Nb7 47. Bd2 Rc7 {(From here onward Komodo considers the position lost with a score of +1,7 for white. The score from Leela keeps going up and is now already at +15.)} 48. Bc3 c4 49. b4 a5 50. a3 axb4 51. axb4 Qb8 52. Nd4 Nd8 53. Ndf5 Qb6 54. Bd4 Qa6 55. Qc3 Ra7 56. Bxa7 Qxa7 57. Nxd6 Nf7 58. Nxf7+ Bxf7 59. Nf5 Qc7 60. d6 Qd8 61. Qd4 Be6 62. Ne3 Nh6 63. b5 Nf7 64. b6 Qxd6 65. Qxd6 Nxd6 66. Rd2 Nb7 67. f5 Bg8 68. Rd7 Rb8 69. g5 Nc5 70. Rd6 Nb7 71. Rd1 fxg5 72. e5 {(Leela keeps sacrificing material as it considers activity more important.)} 72... Nc5 73. Rd6 Nb7 74. Rc6 Nd8 75. Rc7 g6 76. hxg6 hxg6 77. e6 Bxe6 78. fxe6 Nxe6 79. Rc6 Nd4 80. Rd6 Nxc2 81. Nxc2 Kg7 82. Nb4 1-0
What is extraordinary about this game is that there is no fixed center. The battle rages over the full board, but black never gets a chance to exploit the extra exchange.

The second game starts from a Dutch Stonewall which I encountered in one of my own games, played at the end of 2017 against the Dutch IM Xander Wemmers (see my article secret). In the game we see the advance of both rook pawns, which is very typical of Leela's style. Next we see a magnificent demonstration of activity. Komodo doesn't understand at all what Leela is trying to do.
[Event "DESKTOP-VE6O9HB, Rapid 15m+10s"] [Site "DESKTOP-VE6O9HB"] [Date "2019.07.27"] [Round "20"] [White "Komodo 11 64-bit"] [Black "Lc0 v0.21.2"] [Result "0-1"] [ECO "A90"] [PlyCount "124"] [EventDate "2019.??.??"] [Eventtype "rapid"] [CurrentPosition "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"] 1. d4 f5 2. g3 Nf6 3. Bg2 e6 4. Nf3 d5 5. O-O Bd6 6. c4 c6 7. Nc3 O-O 8. Qc2 Ne4 9. Rb1 a5 10. a3 Nd7 {(I encountered this opening on the board end of 2017 in my game against the Dutch IM Xander Wemmers. Komodo shows +0,3. Leela evaluates it as 0.0.)} 11. Be3 Qe7 12. Rfd1 h6 13. Rbc1 g5 14. Ne1 Ndf6 15. c5 Bc7 16. f3 Nxc3 17. bxc3 b5 18. Nd3 a4 {(Leela starts to like black from here onward with -0,4. Komodo thinks white is ok with +0,2.)} 19. Rf1 Bd7 20. Rb1 Be8 21. f4 g4 22. Ne5 h5 23. Bf2 h4 24. c4 h3 {(Here we see again typically Leela playing chess. On both wings the rook-pawns are advanced as far as possible. On top the white bishop is buried alive. Leela already shows -1,4. Komodo sees no danger with 0.0)} 25. Bh1 bxc4 26. Rb7 Qd8 27. Rfb1 Bxe5 28. fxe5 Nh5 {(The only open file is controlled firmly by white. Black has also a very bad bishop. Still Leela is very optimistic.)} 29. e3 Qg5 30. R1b6 Bg6 31. Qc1 Rac8 32. Rb2 f4 {(With this pawn sacrifice Leela frees the bad bishop. Komodo still considers the position equal with 0.0)} 33. exf4 Qf5 34. R7b6 Qd3 35. Rd2 Qf5 36. Ra2 {(No move-repetition as Komodo starts to realize things are not looking so good anymore.)} 36... Rb8 37. Be1 Qd3 38. Qc3 Rxb6 39. cxb6 {(Only now Komodo evaluates the position as likely lost.)} 39... Rb8 40. Rb2 Ng7 41. Kf2 Rb7 42. Bd2 Kf7 {(The execution of the win by Leela is impressive.)} 43. Ke1 Bh7 44. Rb4 Qxc3 45. Bxc3 Bc2 46. Kd2 Bb3 47. Kc1 Nf5 48. Kd2 Ke8 49. Ke2 Kd7 50. Kd2 Kc8 51. Ke2 Rb8 52. Kd2 Kb7 53. Ba1 Ka6 54. Bc3 Rxb6 55. Rxb6+ Kxb6 56. Kc1 c5 57. dxc5+ Kxc5 58. Bb4+ Kd4 59. Kd2 Ba2 60. Bc3+ Kc5 61. Kc1 Bb3 62. Bb4+ Kc6 {(Komodo has already been showing -7 for some moves and resigns. It probably detested the cat-and-mouse game Leela was playing with it.)} 0-1
Leela plays this game, like many others, with an understanding of open lines and bad bishops that is much more advanced than Komodo's.

If you have replayed the 2 previous games, then you probably start to wonder why Leela didn't destroy Komodo in the matches. Well, tactically things often went completely wrong. A nice example is the next game, in which Leela sees the combination 5 moves too late.
[Event "DESKTOP-VE6O9HB, Rapid 15m+10s"] [Site "DESKTOP-VE6O9HB"] [Date "2019.06.27"] [Round "2"] [White "Komodo 11 64-bit"] [Black "Lc0 v0.21.2"] [Result "1-0"] [ECO "C42"] [PlyCount "53"] [EventDate "2019.??.??"] [Eventtype "rapid"] [CurrentPosition "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"] 1. e4 e5 2. Nf3 Nf6 3. Nxe5 d6 4. Nf3 Nxe4 5. d4 d5 6. Bd3 Nc6 7. O-O Be7 8. Nc3 Nxc3 9. bxc3 Be6 10. Re1 O-O {(I got this opening on the board in my game against Sven Stange of 2017.)} 11. Rb1 Rb8 12. Bf4 Re8 13. Rxe6 fxe6 14. Ne5 Nxe5 15. Bxe5 Bd6 16. Qh5 h6 17. f4 Bxe5 18. dxe5 Rf8 {(The weakness of Leela is tactics. Komodo shows after this move immediately +2,5 while Leela is blind with only +0,3.)} 19. Qg6 Rxf4 20. Qh7+ Kf8 21. Bg6 Qg5 22. Qh8+ {(Meanwhile Komodo shows +18. Leela still is hanging at +0,3.)} 22... Ke7 23. Qxg7+ Kd8 {(Only now Leela awakens. Her evaluation raises to +3. Leela only took Qxb8 into account.)} 24. Rxb7 Rc8 25. Rxc7 Rf1+ 26. Kxf1 Qf4+ 27. Ke2 {(It is remarkable but all these moves were published in an analysis on my blog before.)} 1-0
Fans of my blog will likely already have recognized the link to my article the butterfly-effect. All the moves were already covered in that article, so it was definitely a surprise to see them all executed on the board.

I have come to enjoy testing chess engines via this kind of match. A new match won't happen immediately, as other work needs to be done first. Besides, Leela is building a new network from scratch and today it is still much weaker than the networks of a couple of months ago. It would also be nice to have newer and stronger hardware by the time of the next match.


