Friday, December 25, 2020

Lichess

Usually I take 2 weeks off from work at the end of the year, but this year I decided otherwise. Normally we travel to Russia around Christmas to visit my parents-in-law, but corona made this impossible. Visiting people is also currently not allowed in Belgium, and all the enjoyable outdoor activities are cancelled. I hope next year will be better, as I carried over the maximum allowed number of holidays. One of the positive sides of this situation is that I don't spend much money. Even online I haven't looked for any sales.

Sure, Chessbase has, as every year, prepared a number of flashy new products, but I don't see the sense in buying the new Megadatabase 2021 for the exorbitant price of 190 euro. This year you get 400,000 new games on top of the previous release, Megadatabase 2020. That sounds like a lot, especially in times when hardly any standard games were played due to corona. It is even more surprising if you compare it with last year, when only 350,000 games were added. Obviously Chessbase deliberately doesn't tell us that most of the newly added games were played online. I don't think it would be smart to put in an advertisement that people will mainly pay 190 euro for a collection of bullet, blitz or rapid games which can be found for free online.

One of the places where you can find such games for free is Lichess. Their database is accessible to anybody and can be downloaded in a few clicks. It is also 1000 times bigger, as you already get 78 million standard games for last month, November, alone (all rated games, as unrated games are not stored!). I tried it out, but this went less smoothly than I had hoped. Although Lichess compresses its databases, the November file alone was still a 19.4 GB download. Via my wifi network this took 4-5 hours, which is very long for just 1 month of games. Besides, after decompressing, the file exploded to 160 GB! A PGN of that size can't be opened by any program, so next I needed to split it into smaller pieces using e.g. pgnsplit. That gave me 154 PGN files of approximately 1 GB each.
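
For readers without pgnsplit, the splitting step can also be sketched in a few lines of Python. This is only a minimal sketch, not what pgnsplit does internally: it assumes that every game starts with an [Event "..."] tag, and the function names and the "part" file prefix are made up for the illustration. It splits by number of games rather than by file size.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield one game at a time (tags plus moves) from a stream of PGN lines.

    Assumption: every game starts with an [Event "..."] tag.
    """
    current: List[str] = []
    for line in lines:
        if not current and not line.strip():
            continue  # skip blank lines before the first game
        if line.startswith("[Event ") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


def iter_chunks(lines: Iterable[str], games_per_chunk: int) -> Iterator[List[str]]:
    """Group the games into lists of at most games_per_chunk games."""
    games = iter_games(lines)
    while True:
        chunk = list(islice(games, games_per_chunk))
        if not chunk:
            return
        yield chunk


def split_pgn(source_path: str, games_per_chunk: int, prefix: str = "part") -> int:
    """Write part_001.pgn, part_002.pgn, ... and return the number of files."""
    count = 0
    with open(source_path, encoding="utf-8") as source:
        for count, chunk in enumerate(iter_chunks(source, games_per_chunk), start=1):
            with open(f"{prefix}_{count:03d}.pgn", "w", encoding="utf-8") as out:
                out.writelines(chunk)
    return count
```

Because the file is read line by line, only one chunk of games is held in memory at a time, which matters for a file of 160 GB.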

I could open those files, but working with them was not practical at all, as checking something takes ages if you need to go through all 154 files one by one. So my next step was to transfer the PGNs into 1 big cbh file, which has no size limit and in which the Chessbase filter works 100 times faster. By the way, I am curious whether any readers have already bought Chessbase 16, as they claim that the filter works much faster in this new release, which is of course very useful for such big databases.

Anyway, I am still stuck with an older version of Chessbase, so each transfer of a single PGN to the new cbh file took about 10 minutes. 10 minutes * 154 files = almost 26 hours, so not surprisingly I broke off this painstaking job after 20 files. Still, I now had a cbh database of 12 GB (another advantage of cbh over PGN is that the file is much smaller and doubtless much better structured). This corresponds to 10 million games played on the Lichess platform. Such a sample is definitely sufficient for researching who exactly the typical online player is.

Unfortunately, filters don't always work fast in such a large database. Searching for a name or rating is no problem, but keywords like blitz/rapid/tournament easily take an hour on my computer. As I needed to do many such time-consuming searches, I decided to limit myself to only the elite games (at least one of the two players has a rating above 2400 Elo) by creating a new database for only those. Besides, given the quality of most games played online, only the best are interesting for study. Below I show the results of my research.
This table allows us to draw some conclusions:

1) Strong players don't play ultrabullet. I assume this is because such games don't resemble standard chess at all.
2) Nobody likes to sit and wait in front of the computer, so rapid, classical and correspondence chess are played by only a few players.
3) I find it slightly remarkable that, for the higher-rated players, bullet becomes more popular at the expense of blitz. I think it is because there is little for them to learn in blitz anyway, and bullet is probably a better compromise between time and quality to achieve maximum pleasure.
4) About 75% of the games played online are single games; 25% are played in a tournament. This is approximately the same for all ratings. A single game is much easier to plan than a complete tournament.
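
The elite selection and the breakdown by speed and by single game versus tournament were done with the Chessbase filter. As an alternative, the same counts could be taken straight from the PGN tags with a small script. The sketch below is hedged: it assumes the Event tag looks like "Rated Blitz game" or "Rated Bullet tournament https://...", which is my reading of the Lichess files and not something the Chessbase workflow depends on, and all function names are invented for the example.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

EVENT_TAG = re.compile(r'^\[Event "([^"]*)"\]', re.MULTILINE)
ELO_TAG = re.compile(r'^\[(?:White|Black)Elo "(\d+)"\]', re.MULTILINE)
SPEEDS = {"ultrabullet", "bullet", "blitz", "rapid", "classical", "correspondence"}


def is_elite(game: str, threshold: int = 2400) -> bool:
    """True if at least one of the two players is rated above the threshold."""
    ratings = [int(r) for r in ELO_TAG.findall(game)]  # "?" ratings are ignored
    return any(r > threshold for r in ratings)


def classify(game: str) -> Tuple[str, str]:
    """Return (speed, kind) based on the words in the Event tag (assumed format)."""
    match = EVENT_TAG.search(game)
    words = match.group(1).lower().split() if match else []
    speed = next((w for w in words if w in SPEEDS), "other")
    kind = "tournament" if "tournament" in words else "single game"
    return speed, kind


def tally_elite(games: Iterable[str], threshold: int = 2400) -> Counter:
    """Count elite games per (speed, kind) combination."""
    return Counter(classify(g) for g in games if is_elite(g, threshold))
```

Each element of games is the full text of one game, so the function can be fed from any reader that yields games one by one.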

If we also take time into consideration then the picture looks a bit different.
We see that above 2700 Elo, bullet rules. However, this is a very small niche of players, so it is maybe too early to conclude anything from this small sample.

Anyway, for me this is sufficient proof that blitz is the most popular choice for online chess, as almost 2/3 of our time online is spent on blitz. Let us therefore take a closer look at this time control. Blitz comes in many varieties. To learn more about it I set up a second, smaller study focused only on blitz games. Unfortunately this again didn't go smoothly, as for some reason I couldn't manage to filter on the exact details of the time controls in Chessbase. So I had to look for an alternative. After some fooling around with different tools (Notepad, Word, ...), Excel became my preferred choice in the end, as that tool allowed me to find out very quickly how often each time control was used in a PGN file. Still, there was one last hurdle to take, as the 66,000 blitz games played by the 2400 elite couldn't be loaded into Excel at once. Only after I downscaled the database to the 2500 Elo elite did Excel finally accept the full PGN of 22,000 games. Below you see an overview of the time controls used in those games.
180 seconds = 3 minutes per player without increment is the most popular choice for online chess. It is also the one I choose by default for my own games. Second, and far less popular, is 3 minutes + 2 seconds increment, with 5 minutes sudden death (no increment) a close third. All in all, playing without increment clearly dominates blitz.
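
I did this count in Excel, but the size limit I ran into could be avoided with a short script. The following is only a sketch of that alternative, not the method I used: it assumes each game carries a TimeControl tag of the form "base seconds+increment" (for example "180+0"), and the function names are invented for the example.

```python
import re
from collections import Counter

TIME_CONTROL_TAG = re.compile(r'^\[TimeControl "([^"]*)"\]', re.MULTILINE)


def count_time_controls(pgn_text: str) -> Counter:
    """Count how often each TimeControl value occurs in the PGN text."""
    return Counter(TIME_CONTROL_TAG.findall(pgn_text))


def describe(time_control: str) -> str:
    """Turn '180+2' into '3 min + 2 s'; return other values unchanged."""
    base, plus, increment = time_control.partition("+")
    if not (plus and base.isdigit() and increment.isdigit()):
        return time_control  # e.g. "-" for games without a clock
    minutes, seconds = divmod(int(base), 60)
    clock = f"{minutes} min" if seconds == 0 else f"{minutes} min {seconds} s"
    return f"{clock} + {int(increment)} s"
```

Calling count_time_controls(text).most_common() on the contents of a PGN file then lists the time controls from most to least used.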

I tried to google whether somebody else had already done similar research but didn't find anything useful. Anyway, it confirms what I had suspected for many years, and now I finally have the proof I wanted. So I won't download such databases anymore in the future, although they allow you to track down any rated game played online at Lichess.

The advice which I got from a friend last week is much more practical. I didn't know this, but you can easily download all Lichess games of 1 person via the URL "https://lichess.org/games/export/username", which is very useful for preparing against that person. Also, don't think that you are safe after deleting your account. In July I wrote that WBoe3 deleted his account after I discovered his real identity, but today I can still download all his games with a single click.

I guess some players will now move to other online sites to play chess, but that won't help much either. I was able to find a similar trick very quickly for chess.com. It is slightly more work, as you first need to check in which months a player has played via the URL "https://api.chess.com/pub/player/username/games/archives" and then download the months you want one by one via the URL "https://api.chess.com/pub/player/username/games/2020/10/pgn" (this would be October 2020).
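
Fetching the chess.com months one by one can be automated. The sketch below builds the three URLs quoted above and walks through the monthly archives. Some parts are assumptions on my side and not confirmed in this post: that the archives URL answers with JSON containing an "archives" list of month URLs, and that sending a User-Agent header is wise. The function names and the User-Agent string are made up.

```python
import json
import urllib.request
from typing import List

USER_AGENT = "opponent-prep-sketch/0.1"  # hypothetical identifier, not from the post


def lichess_export_url(username: str) -> str:
    """URL for all Lichess games of one player, as quoted in the post."""
    return f"https://lichess.org/games/export/{username}"


def chesscom_archives_url(username: str) -> str:
    """URL listing the months in which a chess.com player has games."""
    return f"https://api.chess.com/pub/player/{username}/games/archives"


def chesscom_month_pgn_url(username: str, year: int, month: int) -> str:
    """URL for the PGN of one month, with the month zero-padded to 2 digits."""
    return f"https://api.chess.com/pub/player/{username}/games/{year}/{month:02d}/pgn"


def fetch(url: str) -> bytes:
    """Download one URL and return the raw response body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return response.read()


def chesscom_all_pgn(username: str) -> str:
    """Download every monthly archive of a player and join them into one PGN.

    Assumption: the archives response is JSON of the form {"archives": [...]},
    where each entry is a month URL to which "/pgn" can be appended.
    """
    archives: List[str] = json.loads(fetch(chesscom_archives_url(username)))["archives"]
    return "\n".join(fetch(f"{month_url}/pgn").decode("utf-8") for month_url in archives)
```

Nothing is downloaded until fetch or chesscom_all_pgn is called, so the URL builders can be tried out safely first.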

With those URLs I even managed to download games from chess.com accounts which were deleted more than 5 years ago. Of course those games have little value today, but I just want to say that anything you do online is somehow stored forever and can be viewed by others. At schaaksite I read that we should adapt ourselves to this loss of privacy, but some people make it very easy for their opponents by adding their real name to their account. It is exactly the reason why I still don't use Facebook, Instagram... and why I write this blog under the nickname Brabo. However, I am also not in favor of completely banning the internet, as then you would miss too many interesting things.

One last thing I want to share is a new free site which I recently learned about and which can take over the role of the closed chess.db-platform. For a couple of months now, Chessbase has had a new serious competitor: chessabc. It looks very professional and already offers a lot of features (advanced game preparation, news, 7-piece tablebases...). The big question with such beautiful initiatives is always whether free will last. I am amazed that Lichess still survives purely on donations after 10 years. Don't hesitate to support them if you enjoy their service.

Brabo

2 comments:

  1. Hello Brabo!

    I've just been referred to your excellent blog and I'm thoroughly enjoying working my way through the "back issues"!

    I wanted to add something to this particular post: rather than using the separate website APIs to research specific opponents, you can make use of www.openingtree.com (it allows you to enter the Lichess or chess.com username of your opponent and visually work through each of their most common lines; phenomenal for research!).
    [I explain the basics in a video here: https://www.youtube.com/watch?v=GxO_NCYyXBk ].

    Thanks again for your amazing content,

    Rob

    Replies
    1. Hi Rob,

      I stopped maintaining the English version of this blog due to lack of responses so I am happy to see yours.

      However, I still maintain the Dutch version of this blog, in which I recently covered openingtree in more detail; see http://schaken-brabo.blogspot.com/2021/09/de-afrekening.html

      There I explained some shortcomings of that tool which I can bypass using the APIs.

      With Google Translate it should be readable nowadays, which I think makes a manual translation rather obsolete.

      Brabo
