Monday, February 18, 2019

Big Database

Fewer and fewer players are still willing to invest in a big database. Today you can find a lot for free online (see the introduction of my article ultracorr-x), and without the (too) expensive updates most players don't have the energy to keep a database up to date. Besides, the current top engines offer in most cases much stronger moves than the ones played by any grandmaster.

In Chessbase part 1 I wrote that many CB users never use 95% of the features. I am convinced that this is largely because people don't have easy access to a good database. CB15 and, to a lesser extent, Fritz lose (almost) all flexibility without a database:
  • Automatic game-analysis with references to games played with the same opening
  • Opening-reference-function
  • Creating an opening-book from a selected database
  • Searching themes
  • Plan explorer
  • Endgame function 
  • Automatic preparation against an opponent
  • Calculating elo-ratings in a database
  • Speed of research and maintenance of databases
  • Exploring databases
Anyway, it is weird to buy CB when you are not going to use any of the features mentioned above.

Therefore I advise CB and Fritz users to maintain a database which can be used as a reference. The next question is of course which database. For this we need to check a number of criteria:
  • What is the price not only of purchasing the database but also keeping it up to date?
  • How often are updates done to the database?
  • Can we find games of amateurs so it is possible to prepare for them?
  • Games from which countries are stored in the database?
  • Are pure engine games, anonymous online games (mostly blitz), etc. added to the database just to pump up the size?
  • Can you find old, historical games in the database? Are efforts made to expand this archive?
  • Are names, ratings, places ... correctly inserted for each game in the database?
  • Can you get automatic updates of the database?
  • Are the games annotated?

Well, it is impossible to compare all available databases in the world. You can find a list of more than 100 links to different databases, and I am sure this summary is not complete. On the other hand, if we ignore the price, then I am sure that the Mega Database upgrade (from the previous Big Database or the previous Mega Database) is the best choice. Its quality, quantity and service are not equaled by any competitor. However, 120 euro each year is not cheap. You could also wonder how much value the annotations have. Within a couple of years the analyses are outdated: see my article of 2016, in which I indicated that the top engine gains on average 55 rating points each year.

So I recommend not spending lots of money on annotated games. CB users have a much cheaper alternative in the online update of the reference database for only 60 euro per year. Users of the Fritz interface should choose between Big Database 2019 for 70 euro per year and otb-openingmaster for 59 euro per year. The otb-openingmaster is somewhat cheaper and on top of that you get 3 updates per year. However, you can only get the database via a download link and it is not a CB product.

Finally, the cheapest alternative without losing much quality is probably still good old TWIC. By investing 10 minutes per week, you can download a nice collection of recently played games and add it to your reference database completely for free. If you don't find it critical to get the newest games each week, you can bundle the downloads twice per year like I do. That way you only need 1 hour twice per year. So in 2 hours I saved 60 euro, or is this too optimistic? To find out, I bought Big Database 2019 and compared it with Mega Database 2016 complemented with the TWIC issues of the last 3 years. I think there were about 1-2 weeks difference in favor of TWIC, which can explain some small deviations. The table below shows a detailed comparison of the numbers in different categories: total, +2500, +2300, world top 10, Belgian top 10, Belgian players and history. Just for information I also added the numbers of the free online database chess.db, although you can't use it as a reference database.
More games in a database don't mean a more interesting database. We see chess.db claims to have 2 million games more than the Big Database although it lacks many relevant games of Belgian players.
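
For readers who want to repeat such a count on their own collection, here is a minimal Python sketch. It is not how I produced the table (I used the filters inside Chessbase); it assumes the database has been exported to a PGN file with the standard WhiteElo and BlackElo tags, and it assumes that "+2300" and "+2500" mean that at least one of the two players has that rating. The function names are my own invention.

```python
import re
from collections import Counter

# A PGN tag pair looks like: [WhiteElo "2600"]
TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

def iter_headers(pgn_text):
    """Yield one dict of tag pairs per game found in a PGN string."""
    headers, in_tags = {}, False
    for line in pgn_text.splitlines():
        match = TAG.match(line.strip())
        if match:
            # A tag line after movetext (or a blank line) starts a new game.
            if not in_tags and headers:
                yield headers
                headers = {}
            headers[match.group(1)] = match.group(2)
        in_tags = bool(match)
    if headers:
        yield headers

def rating(headers, tag):
    """Return the rating stored under `tag`, or 0 when missing or invalid."""
    value = headers.get(tag, "")
    return int(value) if value.isdigit() else 0

def count_categories(pgn_text):
    """Count total games and games with at least one 2300+ / 2500+ player."""
    counts = Counter()
    for headers in iter_headers(pgn_text):
        best = max(rating(headers, "WhiteElo"), rating(headers, "BlackElo"))
        counts["total"] += 1
        if best >= 2300:  # assumption: one rated player is enough
            counts["+2300"] += 1
        if best >= 2500:
            counts["+2500"] += 1
    return counts

# Usage (the file name is hypothetical):
# with open("reference.pgn", encoding="latin-1") as f:
#     print(count_categories(f.read()))
```

Categories like "Belgian players" or "history" would need extra rules (a list of names, a cut-off year) which I leave out here because they depend on your own definitions.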

Concerning TWIC, it is remarkable that you don't miss any games of grandmasters. Still, we can't ignore the 650,000 missing games over 3 years. Chessbase clearly makes an extra effort to also include the games of amateurs in their database. They know that their customers are in most cases not top players but club players interested in what is played by their direct rivals.

Although few will consider the 1800 "new" historic games valuable, and it is likely not financially interesting for Chessbase, I do like the nice bonus. Fortunately Chessbase not only focuses on the commercial interest of the database but also takes on the role of archivist of chess history. It takes a lot of time to digitize old newspapers and magazines, in contrast to the few clicks needed to download a new collection of games from a website.

Personally I find it rather expensive to buy a new big database each year for the extra you get compared to TWIC, so I only do it once every 3 years. I also detected another advantage of the big database. TWIC only shows the initial of the first name and often gets the spelling of surnames wrong (especially for Chinese players). The game data in the Big Database is much more complete, which makes searching quicker and easier.

This year I bought a new big database using the prize vouchers of my son. This way I also got a new up-to-date powerbook. Few players are aware of it, but the Powerbook 2019 offered by Chessbase for 70 euro is something you can create yourself from the Big Database. I even created 2 different ones: 1 opening book with games in which at least 1 player has +2300 elo and 1 opening book with games in which both players have +2500 elo. You need patience: on my 4-year-old laptop it took 12 hours to create the first and a bit less than 2 hours for the second.
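
The two rating filters can also be sketched outside Chessbase. The Python below is only an illustration under assumptions: it works on a PGN export with the standard WhiteElo and BlackElo tags, the function names are hypothetical, and it only selects the games. Building the opening book itself from the selection is still done in Chessbase or Fritz.

```python
import re

# A PGN tag pair looks like: [WhiteElo "2600"]
TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

def split_games(pgn_text):
    """Yield (headers, raw_text) for each game in a PGN string."""
    lines, headers, in_tags = [], {}, False
    for line in pgn_text.splitlines():
        match = TAG.match(line.strip())
        # A tag line that follows movetext or a blank line opens a new game.
        if match and not in_tags and headers:
            yield headers, "\n".join(lines).strip() + "\n"
            lines, headers = [], {}
        if match:
            headers[match.group(1)] = match.group(2)
        in_tags = bool(match)
        lines.append(line)
    if headers:
        yield headers, "\n".join(lines).strip() + "\n"

def rating(headers, tag):
    """Return the rating stored under `tag`, or 0 when missing or invalid."""
    value = headers.get(tag, "")
    return int(value) if value.isdigit() else 0

def filter_games(pgn_text, min_elo, both_players):
    """Keep games where one player (or both players) reach min_elo."""
    # min() demands the rating from both players, max() from at least one.
    combine = min if both_players else max
    kept = []
    for headers, raw in split_games(pgn_text):
        white = rating(headers, "WhiteElo")
        black = rating(headers, "BlackElo")
        if combine(white, black) >= min_elo:
            kept.append(raw)
    return "\n".join(kept)

# Usage (file names are hypothetical):
# text = open("bigbase.pgn", encoding="latin-1").read()
# open("book_2300.pgn", "w").write(filter_games(text, 2300, both_players=False))
# open("book_2500.pgn", "w").write(filter_games(text, 2500, both_players=True))
```

Unrated games count as rating 0 in this sketch, so they never pass either filter.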
Openingbook of games with at least 1 player +2300 elo, filtered from the big database 2019
Openingbook of games with both players +2500, filtered from the big database 2019
It is funny that the rating barely influences the popularity of the first move. I guess most amateurs like to copy what the professionals are playing. However we do notice that the advantage of the first move increases slightly for the higher ratings. We see the advantage goes up after 1.e4 from 62 elo to 78 elo, after 1.d4 from 60 elo to 74 elo, after 1.Nf3 from 42 elo to 64 elo and after 1.c4 from 40 elo to 60 elo.
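
To get a feeling for what such a difference in elo means in points scored, you can use the standard logistic Elo formula. This is a minimal sketch; whether the numbers shown in the opening book window are computed exactly this way is an assumption on my part (they could also be performance minus average rating), and the function names are my own.

```python
import math

def expected_score(elo_diff):
    """Expected score (0..1) for the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_advantage(score):
    """Inverse of expected_score: rating difference implied by a score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Example: compare the white advantage after 1.e4 in both opening books.
# print(expected_score(62), expected_score(78))
```

Under this formula an advantage of 62 elo corresponds to a score of about 59% for White, and 78 elo to about 61%, so the increase for the higher ratings is real but modest.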

Naturally statistics are one of the main assets of a database. Still, most of the treasures are hidden in the games. I started my article by saying that some people think engines are sufficient to get a good evaluation of an opening. Well, I think this is a bit too simplistic, as we have a very rich history of chess. Many ideas which are still valuable today, and which engines won't be able to show you, can be rediscovered in a database. No, don't think engines always know more. Just look at the ongoing TCEC superfinal of season 14, in which the so-called invincible Stockfish is at this moment a point behind while we are less than 1/3rd from the finish. LeelaChessZero (derived from Alphazero) is busy writing history.

