Friday, November 15, 2013

Stockfish 4

Last month hypekiller5000 alerted me that a new release of Stockfish had become available and scored remarkably well on the CCRL (computer chess rating list), taking 3rd place. An important detail is that the program can be downloaded for free. I am always a bit reluctant to use free software, as I immediately think of illegal copies, but eventually I let myself be tempted to test the program and use it in my analysis. The main reason is that my method of analysis is based on 2 engines (see the blog article analyseren met de computer), and with such a method it is recommended to use 2 engines of approximately equal strength (preferably ones that also complement each other). Last year I wrote on this blog that I bought Houdini 2.0, which replaced Fritz 11. As a consequence Rybka 3 remained as the second engine, but I quickly found that the gap in strength between the 2 engines had become too big to get a good return from my method. We should not forget that Rybka 3 was released in August 2008, so we may say that its expiry date has passed.

The first thing that stands out about Stockfish is how the engine evaluates positions. If you are used to the classical evaluations of Rybka, Fritz and Houdini, then you are in for a surprise. With Stockfish you can easily get evaluations that differ from theirs by 1 pawn or even more (so 100 hundredths of a pawn or more). The absolute record I found was in an analyzed variation from my recent game against Steven Geirnaert; see the screenshot below.
Stockfish shows an evaluation of 94 points for Black!

Stockfish shows an advantage of 94 points for Black. Even if you promote all the remaining pawns, you still can't reach this sum. Houdini, by the way, shows only an 11-point advantage for Black after 10 minutes of calculation. On ChessPub this was mentioned as a negative quality of Stockfish, but I believe this needs some nuance. The program is made first and foremost to play as strongly as possible, and it therefore uses whatever evaluation mechanism serves that goal best. These evaluations are shared with end users purely as information; they are never intended as a final judgement on who has the advantage and exactly how big it is.
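
To make the units concrete: engines usually report scores in centipawns (hundredths of a pawn). Below is a minimal sketch that converts the two scores from the screenshot to pawns and measures how far the engines are apart. The function names are hypothetical, and the sign convention (negative means Black is better) is an assumption made only for this illustration.

```python
def centipawns_to_pawns(cp: int) -> float:
    """Convert a score in centipawns to pawns."""
    return cp / 100.0


def evaluation_gap(cp_a: int, cp_b: int) -> float:
    """Absolute difference, in pawns, between two engine scores."""
    return abs(centipawns_to_pawns(cp_a) - centipawns_to_pawns(cp_b))


# Scores quoted in the text, written with an assumed sign convention:
# negative values favour Black.
stockfish_cp = -9400  # 94 pawns for Black
houdini_cp = -1100    # 11 pawns for Black

print(evaluation_gap(stockfish_cp, houdini_cp))
```

Both engines agree that Black is winning; only the size of the number differs, which is why the raw score says little about the quality of the engine.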

One would expect an engine with such high evaluations to be very strong in tactics. However, compared with Houdini I notice it is considerably weaker. Stockfish seems to have trouble especially with quiet, unexpected sacrifices. Houdini finds the solution to the analyzed variation below within a second, but after 10 minutes Stockfish still hadn't found it!

It is incredible that Houdini finds the breakthrough move e4 so quickly and calculates the consequences correctly. The key move also reminds me of the only time I was completely surprised by an opponent in my correspondence career (20 games played in the period 1998-2003). With some trouble I still escaped with a draw.

Again Houdini finds the move instantly (in 1999 this move never popped up on my screen), while Stockfish needs more than 4 minutes. I could show other tactical examples (e.g. 8.g4 in my article on Houdini 2.0), but I assume the point is sufficiently clear by now. Stockfish prunes the tree of variations heavily before making an evaluation, which causes it to regularly miss tactics. So how is it possible that the gap with Houdini is only 25 points on the engines' Elo rankings? Clearly there is more to strength than tactics. It is very difficult to quantify, but looking at how Stockfish plays Stonewall positions, I notice that the engine understands better than Houdini which plans are possible. On the other hand, in positions with fixed pawn chains, e.g. in the Portisch Hook variation, I notice no real difference in strength with Houdini. I deduce that pawn moves could be a very important part of how Stockfish's evaluation mechanism works.
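
To put the 25-point gap mentioned above in perspective, here is a minimal sketch of the standard Elo expected-score formula. The function name is hypothetical, and rating lists may use their own statistical models, so take the result only as a rough indication.

```python
def expected_score(rating_gap: float) -> float:
    """Expected score per game (win = 1, draw = 0.5) for the side
    that is rating_gap Elo points stronger, using the standard
    logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# A 25-point gap, as quoted in the text.
print(round(expected_score(25), 3))  # 0.536
```

In other words, a 25-point lead corresponds to scoring only about 53.6% in a long match, so two engines that close together can still have very different strengths and weaknesses.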

As expected, this effect is magnified in the endgame, which my first analyses confirm. In this phase Stockfish completely overpowers Houdini. First I show an analyzed variation from my game against Raetsky, which I already mentioned briefly in my previous article.

Three times Houdini loses the endgame, while Stockfish defends marvelously (which doesn't mean I claim that the endgame is certainly a draw with perfect play). Stockfish is also clearly superior in the 2 endgames discussed in my blog article on Houdini 2.0. Stockfish finds 42...Rh4! within seconds, while Houdini 2.0 needs more than 3 minutes. Houdini 2.0 doesn't find the brilliant 48...Kd5!, while Stockfish again does, in about 7 minutes. However, Shirov's brilliant Bh3 again seems a bit too hard for Stockfish, as it still hasn't found the move after 10 minutes, but of course here we are talking about tactics again.

Meanwhile it is clear to me that the program complements Houdini 2.0 very well. I am surprised that such a strong program is offered for free. On the other hand, I also realize that a collective of volunteers often achieves better results than 1 or 2 professionals. Moreover, the next release of Stockfish could very well become the new number 1 in computer chess. No need to panic: we are still extremely far from solving chess, so many years of pleasure remain in searching for the unknown.

Brabo

4 comments:

  1. Very interesting review.
    You have confirmed in words what I have been experiencing while using Stockfish and Houdini (albeit Houdini 1.5a in my case).

  2. I didn't check whether there is a big difference between Houdini 1.5a and 2.0. I assume not, but I bought 2.0 anyway to support the developer, so that he continues to improve his already excellent engine.

  3. My Stockfish 5 took 6 seconds to see e4 in both examples

  4. It is pure coincidence, but yesterday I downloaded Stockfish 5 and it has indeed improved drastically in tactics compared with version 4. There is a gap of 100 points at faster time controls, as shown on http://computerchess.org.uk/ccrl/404/rating_list_all.html.
