Friday, November 15, 2013

Stockfish 4

Last month hypekiller5000 alerted me that a new release of Stockfish had become available and scored remarkably well on the CCRL (Computer Chess Rating Lists), taking 3rd place. An important detail is that the program can be downloaded for free. I am always a bit reluctant to use free software, as I immediately think of illegal copies, but eventually I let myself be persuaded to test the program and use it in my analysis. The main reason is that my method of analyzing is based on 2 engines (see the blog article "Analyseren met de computer", i.e. analyzing with the computer), and with such a method it is recommended to use 2 engines of approximately equal strength (preferably also complementary ones). Last year I wrote on this blog that I had bought Houdini 2.0, which replaced Fritz 11. As a consequence Rybka 3 remained as the second engine, but I quickly noticed that the gap in strength between the 2 engines had become too big for my method of analyzing to pay off. We should not forget that Rybka 3 was released in August 2008, so we may say that its expiry date has passed.

The first thing that stands out about Stockfish is the way the engine evaluates positions. If you are used to the classical evaluations of Rybka, Fritz and Houdini, then you are in for a surprise. With Stockfish you can easily get evaluations that differ from those of the other engines by 1 pawn or even more (i.e. 100 hundredths of a pawn or more). The absolute record I found was in an analyzed variation of my recent game against Steven Geirnaert; see the screenshot below.
[Screenshot: Stockfish shows an evaluation of 94 points for Black!]

Stockfish shows an advantage of 94 points for Black. Even if you promoted all the remaining pawns, you still could not reach this sum. Houdini, by the way, shows only an 11-point advantage for Black after 10 minutes of calculating. On ChessPub this was mentioned as a negative quality of Stockfish, but I believe this needs some nuance. The program is made first and foremost to play as strongly as possible, and it therefore uses an evaluation mechanism that serves that goal best. These evaluations are shared with end users purely for information; they are never intended as a final judgement of who has the advantage and exactly how big it is.
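The "promote all the remaining pawns" argument can be checked with a little arithmetic. The sketch below is only an illustration under assumptions: it uses the conventional rule-of-thumb piece values (pawn 1, knight and bishop 3, rook 5, queen 9), which are not Stockfish's internal values, and since the position from the screenshot is not given here, it borrows the starting position of the Groffen - Brabo variation analyzed further on as a stand-in. The helper name `material_bound` is made up for this example.

```python
# Conventional piece values in pawns (a rule of thumb, NOT Stockfish's internals).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_bound(fen: str, side: str) -> int:
    """Material of `side` ('w' or 'b') if each of its pawns were already a queen."""
    placement = fen.split()[0]  # first FEN field: piece placement
    total = 0
    for ch in placement:
        if not ch.isalpha():
            continue  # digits are empty squares, '/' separates ranks
        if ch.isupper() != (side == "w"):
            continue  # piece belongs to the other side
        piece = ch.lower()
        # Count every pawn as a queen to get an upper bound.
        total += PIECE_VALUES["q"] if piece == "p" else PIECE_VALUES[piece]
    return total


# Stand-in position (assumption: not the position from the screenshot).
fen = "r4r1k/1p5p/p1p1b3/3pPp2/1P1Q1P2/4P1R1/q6P/2R2BK1 w - - 0 23"
black_bound = material_bound(fen, "b")
print(black_bound)       # 76
print(94 > black_bound)  # True
```

In this stand-in position Black could never hold more than 76 pawns' worth of material even against a bare king, so a score of 94 cannot be read as a literal material count.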

With such high evaluations one would expect the engine to be very strong in tactics. However, compared with Houdini I notice that it is considerably weaker. Stockfish seems to have trouble especially with quiet, unexpected sacrifices. Houdini finds the solution of the analyzed variation below within a second, but Stockfish still had not found it after 10 minutes!
[Event "Analyzed variation Groffen - Brabo"] [Date "2013"] [SetUp "1"] [FEN "r4r1k/1p5p/p1p1b3/3pPp2/1P1Q1P2/4P1R1/q6P/2R2BK1 w - - 0 23"] [PlyCount "9"] 23. e4 $3 {(Stockfish misses this move completely despite the fact that Houdini finds it very quickly.)} Rg8 (23... dxe4 24. Bc4 Bxc4 25. e6 Rf6 26. Qxf6#) (23... Rf7 24. exf5 Rxf5 25. Bh3 Rh5 26. f5 $18) (23... fxe4 24. Bh3 Rae8 25. f5 Rg8 26. Bg4 $18) 24. exf5 Rxg3 25. hxg3 Bxf5 26. g4 Be4 (26... Bxg4 27. e6 Kg8 28. Qe5 $18) 27. Re1 $1 $18 *
It is incredible that Houdini finds the breakthrough move e4 so quickly and calculates the consequences correctly. Incidentally, the key move reminds me of the only time I was completely surprised by an opponent in my correspondence career (20 games played in the period 1998-2003). With some trouble I still escaped with a draw.
[Event "EU/M/1234"] [Date "1998"] [White "Verhoef, H."] [Black "Brabo"] [Result "1/2-1/2"] [SetUp "1"] [FEN "q5k1/1r4p1/r1pbp2p/P2p1p2/3P4/P5P1/3QPPBP/R1R3K1 w - - 0 27"] [PlyCount "47"] 27. e4 $3 {(This move was a complete surprise for me. Houdini shows this move immediately but Stockfish still needs more than 4 minutes on my desktop.)} fxe4 28. Bf1 Rxa5 29. Rxc6 Bxa3 30. Bh3 Kh8 31. Bxe6 Rb2 32. Qc3 Rb8 33. Rc5 Rxc5 34. dxc5 Qc6 35. Bxd5 Qxc5 36. Qxc5 Bxc5 37. Bxe4 {(The worst is over and I have little trouble making a draw.)} Rf8 38. Ra2 g5 39. Kg2 Kg7 40. f3 Rd8 41. h4 gxh4 42. gxh4 Rf8 43. Kh3 Rf4 44. h5 Be7 45. Ra7 Kf8 46. Kg2 Bg5 47. Bg6 Kg8 48. Rd7 Kf8 49. Kf2 Kg8 50. Ke2 1/2-1/2
Again Houdini finds the move instantly (in 1999 this move never popped up on the screen), while Stockfish still needs more than 4 minutes. I could show other tactical examples (e.g. 8.g4 in my article on Houdini 2.0), but I assume the point is sufficiently clear by now. Stockfish prunes the tree of variations heavily before making an evaluation, which causes it to miss a tactic regularly. So how is it possible that, looking at the Elo ratings of the engines, the gap with Houdini is only 25 points? Well, clearly there is more to chess than tactics. It is very difficult to quantify, but looking at how Stockfish plays in Stonewall positions, I notice that the engine understands better than Houdini which plans are possible. On the other hand, in positions with fixed pawn chains, as e.g. in the Portisch-Hook variation, I notice no real difference in strength with Houdini. I deduce that pawn moves could be a very important part of how Stockfish's evaluation mechanism works.
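To put a 25-point gap in perspective, the small sketch below applies the standard Elo expected-score formula, E = 1 / (1 + 10^(-d/400)), where d is the rating difference. This is only an approximation: rating lists may compute their numbers with their own statistical model, and the function name `expected_score` is chosen just for this example.

```python
def expected_score(rating_gap: float) -> float:
    """Expected score (between 0 and 1) of the higher-rated side under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


for gap in (0, 25, 100):
    print(f"{gap:>3} Elo -> {expected_score(gap):.1%}")
```

Under this model a 25-point lead corresponds to an expected score of roughly 53.6%, so over a long match the two engines would still look close to equal.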

As expected, this effect is magnified in the endgame, which my first analyses confirm. In this phase Stockfish completely overpowers Houdini. First I show an analyzed variation from my game against Raetsky, which I already mentioned briefly in my previous article.
[Event "Analyzed variation Raetsky - Brabo"] [Date "2005"] [SetUp "1"] [FEN "R1bB2k1/6pp/4p3/2N2p2/1p1P4/4P1P1/5PKP/2r5 b - - 0 37"] [PlyCount "49"] 37... Rxc5 38. Ba5 Rc4 39. Bxb4 Kf7 $1 {(Stockfish chooses to keep the bishops on the board, which looks to me like the right choice.)} (39... Rxb4 $6 {(Houdini opts for the rook endgame but with 10 seconds per move cannot hold the position.)} 40. Rxc8 Kf7 41. Rc7 Kf6 42. Rc6 Kf7 43. h4 Rb2 44. h5 h6 45. Rc7 Kf6 46. Kf3 Rd2 47. Ra7 Rb2 48. Ra6 Rd2 49. Ra8 Kg5 50. Rg8 Kf6 51. Rf8 Ke7 52. Rc8 Kf6 53. g4 fxg4 54. Kxg4 Ra2 55. Kf3 Ra3 56. Rc5 Rb3 57. Rc6 Kf5 58. Ke2 Rb2 59. Kd3 Rb3 60. Kc4 Ra3 61. e4 Kxe4 62. Rxe6 Kf5 63. d5 Ra2 64. f3 Rc2 65. Kd4 Rh2 66. Re4 Kf6 67. d6 Kf7 68. Kd5 Rxh5 69. Kc6 Rh1 70. d7 Rc1 71. Kd6 Rd1 72. Kc7 Rc1 73. Kd8 g5 74. Re7 Kf8 $18) 40. Ra7 Ke8 41. Bd6 Bd7 42. h4 Bc6 $1 {(Restricting the activity of the white king indeed seems the best choice to me.)} (42... Rc2 $6 {(Houdini chooses to go after the f-pawn but later gets into problems once White decides to sacrifice that pawn.)} 43. Kf3 h6 44. h5 Bc6 45. Kf4 Rxf2 46. Ke5 Bd5 47. Rxg7 Re2 48. Kf6 Rc2 49. Re7 Kd8 50. Rh7 f4 51. Bxf4 Rc6 52. Ke5 Ke8 53. e4 Bb3 54. Rxh6 Ra6 55. d5 Kf7 $18) 43. Kf1 Bf3 44. Ke1 Rc2 45. Re7 Kd8 46. Rxg7 Re2 47. Kf1 Rb2 48. Bc7 Ke8 49. Ke1 Re2 50. Kd1 h5 51. Bd6 Kd8 $1 {(Only after 4 minutes of calculating does Houdini understand that winning material with Rxe3 is not optimal.)} (51... Rxe3 $6 52. Kd2 Re2 53. Kd3 Kd8 54. Be5 Rxf2 55. Ke3 Rf1 56. Kf4 Bg4 57. Kg5 Rc1 58. Kf6 Rc6 59. Rh7 Bd1 60. Kg5 Rc1 61. Bf6 Ke8 62. Re7 Kd8 63. Rxe6 Kd7 64. Re1 Kd6 65. Be5 Kd5 66. Rf1 Ra1 67. Rxf5 Ra2 68. Bg7 Ke6 69. Rc5 Bf3 70. Kf4 Rf2 71. Re5 Kd7 72. Ke3 Rf1 73. Rf5 Bg2 74. Rxh5 Rf3 75. Kd2 Rxg3 76. Be5 Rb3 77. Rg5 Be4 78. h5 Rh3 79. Bg7 Ke7 80. h6 Kf7 81. Re5 Bh7 82. Ke2 Bd3 83. Kf2 Bh7 84. Rb5 Be4 85. Rb6 Kg8 86. Re6 Bd5 87. Rd6 Rh5 88. Be5 Bf7 89. Rb6 Bd5 90. Kg3 Bc4 91. Bf4 Kf7 92. Rc6 Bd3 93. Rc5 Bf5 94. d5 Bd7 95. Kf3 Kg6 96. Ke4 Rf5 97. Rc7 Rf7 98. Ke5 Rf5 99. Kd6 Rxf4 100. Kxd7 $18) 52. Kc1 Rxf2 53. Rh7 Re2 54. Bf4 Ra2 55. Bc7 Ke8 56. Bd6 Kd8 57. Be5 Re2 58. Bf6 Kc8 59. Bg5 Rg2 60. Bf4 Kd8 61. Rc7 Be4 {(The king is an important piece in the endgame but here he is cut off. As a consequence, the extra pawn is insufficient to force the win.)} *
Three times Houdini loses the endgame while Stockfish defends marvelously (which does not mean I claim that the endgame is definitely a draw against perfect play). Stockfish is also clearly superior in the 2 endgames discussed in my blog article on Houdini 2.0. Stockfish finds 42...Rh4! within seconds, while Houdini 2.0 needs more than 3 minutes. Houdini 2.0 does not find the brilliant 48...Kd5!, while Stockfish does in about 7 minutes. However, Shirov's brilliant Bh3 again seems a bit too hard for Stockfish, as it is still not found after 10 minutes, but of course here we are talking about tactics again.

Meanwhile it is clear to me that the program complements Houdini 2.0 very well. I am surprised that such a strong program is offered for free. On the other hand, I also realize that a collective of volunteers often achieves better results than 1 or 2 professionals. Moreover, the next release of Stockfish could very well become the new number 1 in computer chess. There is no need to panic: we are still extremely far from solving chess, so many years of pleasure in searching for the unknown remain.

Brabo

4 comments:

  1. Very interesting review.
    You have confirmed in words what I have been experiencing while using Stockfish and Houdini (albeit Houdini 1.5a in my case).

  2. I didn't check whether there is a big difference between Houdini 1.5a and 2.0. I assume not, but I bought 2.0 anyway to support the developer, so that he continues to improve his already excellent engine.

  3. My Stockfish 5 took 6 seconds to see e4 in both examples

  4. It is pure coincidence, but yesterday I downloaded Stockfish 5, and indeed it has improved drastically in tactics compared with version 4. There is a gap of 100 points at faster time controls, as shown on http://computerchess.org.uk/ccrl/404/rating_list_all.html.
