Columns | July 29, 2010 17:42

The Total Chess Library

Being a database programmer, perhaps I shouldn't have been surprised when I recently dreamt I had to develop a chess database. But it wasn't an ordinary chess database.

Carceri XIV - Giovanni Battista Piranesi

I was told by a faceless person to make a chess database of all chess games ever played. If that doesn't sound like much, it's because that was not all. The man told me it must also contain all chess analyses ever made, as well as every comment, opinion or text ever written about any move. It would be a database of all existing chess knowledge - an endless chess library. It was like making the chess version of Jorge Luis Borges' Total Library. The ultimate Mega Database - an entire chess universe.

I started by collecting all existing chess books ever written - both ancient manuscripts and newly printed books. I visited all chess libraries in the world and went through all privately owned chess book collections. But this clearly wasn't enough. I had to visit every chess player in person to ask for any scoresheets of games that they had in their possession. Then, I went through all local club magazines and internet blogs to find games I had missed. This reminded me that I had to get all chess magazines as well. And, of course, I downloaded all digital books, DVDs, game analyses and instruction guides on chess.

When I had rubricized all material and put it in a more or less logical order, I started thinking about how to put everything in a database. It didn't take me long to realize I wouldn't be able to use existing chess database software. It would just be too impractical. For 1.e4 alone, hundreds if not thousands of comments somehow had to be entered in the database, and this can't be done with a regular database program. While it is possible to add comments in different languages in some software, you can't add comments by different sources - at least not dynamically.

So I started thinking about how to develop this chess database myself. Basically it had to contain many more dimensions than the current ones - in fact, it had to allow an infinite number of possible entries for comments and analyses. All published praise of 47...Bh3!! and 23...Qg3!! had to be entered into the database somehow. Actually, it should also be possible to add multiple annotation symbols, because perhaps some commentators had awarded these moves not two exclamation marks, but only one (a grave sin, I must say). The database design had to take this into account as well.

With the help of data warehouse design techniques, I was able to establish which dimensions my database should have. Obviously there should be dimensions with information about the sources (the books themselves), and information related to the games, or game fragments. This could be players' names, the year in which a game was played, where it was played, and so on. The moves and sub-variations (including move number, to keep track of things) should be stored separately, in what is technically called a 'fact table'. Any game, including its sub-lines, could develop like a garden of forking paths, leading to an infinite number of moves.

'Datavault' model of a data warehouse
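As a minimal sketch of the kind of design described above, the following uses Python's built-in `sqlite3` module to set up a source dimension, a game dimension, a move fact table and an annotation table that allows any number of comments and symbols per move. All table and column names are hypothetical, as are the placeholder sources "Book A" and "Book B"; this is an illustration under those assumptions, not the model from the dream or from the figure.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative assumption.
SCHEMA = """
CREATE TABLE source (source_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE game   (game_id INTEGER PRIMARY KEY, white TEXT, black TEXT,
                     year INTEGER, place TEXT);
-- Fact table: one row per move; parent_move_id lets sub-variations fork from any move.
CREATE TABLE move (
    move_id        INTEGER PRIMARY KEY,
    game_id        INTEGER NOT NULL REFERENCES game(game_id),
    parent_move_id INTEGER REFERENCES move(move_id),
    move_number    INTEGER NOT NULL,
    side           TEXT NOT NULL,
    san            TEXT NOT NULL
);
-- Any number of annotations per move, each tied to the source it came from.
CREATE TABLE annotation (
    annotation_id INTEGER PRIMARY KEY,
    move_id       INTEGER NOT NULL REFERENCES move(move_id),
    source_id     INTEGER NOT NULL REFERENCES source(source_id),
    symbol        TEXT,
    comment       TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One game and one move from it.
game_id = conn.execute(
    "INSERT INTO game (white, black, year) VALUES (?, ?, ?)",
    ("Topalov", "Shirov", 1998),
).lastrowid
move_id = conn.execute(
    "INSERT INTO move (game_id, move_number, side, san) VALUES (?, ?, ?, ?)",
    (game_id, 47, "black", "Bh3"),
).lastrowid

# Two placeholder sources that award the same move different symbols.
for title, symbol in [("Book A", "!!"), ("Book B", "!")]:
    source_id = conn.execute(
        "INSERT INTO source (title) VALUES (?)", (title,)
    ).lastrowid
    conn.execute(
        "INSERT INTO annotation (move_id, source_id, symbol) VALUES (?, ?, ?)",
        (move_id, source_id, symbol),
    )

symbols = [row[0] for row in conn.execute(
    "SELECT symbol FROM annotation WHERE move_id = ? ORDER BY annotation_id",
    (move_id,),
)]
print(symbols)  # ['!!', '!']
```

Because annotations live in their own table rather than in a fixed column on the move, adding a further opinion about the same move is just another row.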

The same was obviously true of comments. But there was an additional problem: comments could not only be related to moves, but also to the people who had written them. In his books, Kasparov often refers to older authors, for example. At this point in my dream, my faceless principal interrupted my musings. He ordered me to also store all information about the people who had written the annotations: what use would the project otherwise be? This implied I had to include all biographies of chess commentators in my database. And of course, the commentators could also be chess players themselves, so they should also be linked back to the players and games dimensions.

When I had finished my design - or at least thought I had - a long-feared question arose in my head: where to start? Which data should be put into the database first? Would it be wise to work 'backwards' in time, starting with the most recent chess books and adding entries in the database for every name, move or comment that returned a blank? Wouldn't it be wiser to start with the first chess manuscripts - the recent reconstruction of Francesch Vicent's mysterious treatise, the surviving games of Ruy Lopez, or perhaps even the first ancient Arab chess problems?

In the end, I decided it wouldn't really matter - it was a Sisyphean task in any case - and so I started with a game collection from 2010. It happened to be a new book on Capablanca. Slowly but steadily I worked my way back. Then I realized I had forgotten something crucial. Within comments, there could also be references to other works - references to database entries that didn't exist in my digital library yet! I was suddenly faced with what are sometimes called 'orphans' - database references that can't be traced back (anymore) to their primary dimension. In order to proceed, I had to put all titles in the system first. And so I started again.
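The orphan problem can be sketched in a few lines of Python. The function name `find_orphans` and the placeholder titles are hypothetical, and the sketch assumes that references are stored as simple (citing title, cited title) pairs.

```python
def find_orphans(known_titles, references):
    """Return the (citing, cited) pairs whose cited title is not in the library yet."""
    known = set(known_titles)
    return [(citing, cited) for citing, cited in references if cited not in known]


# Placeholder data: only one title has been entered so far.
library = ["New book on Capablanca"]
references = [
    ("New book on Capablanca", "Older game collection"),   # not entered yet
    ("New book on Capablanca", "New book on Capablanca"),  # self-reference, resolvable
]

orphans = find_orphans(library, references)
print(orphans)  # [('New book on Capablanca', 'Older game collection')]
```

Loading every title first, as described above, makes this list empty for all references between works that actually exist in the library.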

My success didn't last long. I soon found out that many chess authors use references to non-chess related literature all the time. Kasparov quotes Ilf & Petrov, Donner quotes Nietzsche. Once you start paying attention to it, chess and literature are completely intertwined. To be complete, the entire world literature should be included in the list as well. And that's only the beginning of a myriad of problems. For instance, how to deal with references to literature that has been lost over the centuries?

I now realized the entire Total Chess Library idea would be quite pointless without having access to each and every chess book ever written; every game or analysis - including those that have been destroyed, mutilated, lost for good. I was trapped in a labyrinth I had created myself.

Then I woke up, of course. While I cycled to work, I thought about what use such a megalomaniacal project could be. Nobody would ever be able to use this monstrous database. The information would be sitting there in some kind of supercomputer without anyone ever touching it. At first I felt anger, then sadness. Then I felt like nothing had really changed. It was just like work.

As I switched on my laptop at work and opened the data warehouse environment I was currently working on, I remembered the words from another Borges story, The Library of Babel:

At that time it was also hoped that a clarification of humanity's basic mysteries -- the origin of the Library and of time -- might be found. It is verisimilar that these grave mysteries could be explained in words: if the language of philosophers is not sufficient, the multiform Library will have produced the unprecedented language required, with its vocabularies and grammars.

For four centuries now men have exhausted the hexagons ... There are official searchers, inquisitors. I have seen them in the performance of their function: they always arrive extremely tired from their journeys; they speak of a broken stairway which almost killed them; they talk with the librarian of galleries and stairs; sometimes they pick up the nearest volume and leaf through it, looking for infamous words.

Obviously, no one expects to discover anything.

Author: Arne Moll

Chess.com

Comments

Remco Gerlich's picture

I have been thinking about a database of all chess positions (well, only those that people bothered to enter into it), editable by the public a la Wikipedia. I think most of what you describe would go into a large article on, say, the position after 1.e4.

My gimmick would be a distinction between a user's personal annotations (private notes, visible only to that user) and public information. I haven't decided yet whether it would be possible to keep moves (and thus positions) private as well until someone makes them public, or whether they should always be public.

iLane's picture

Did you dream about copyrights too?? :)

iLane's picture

But if you think about it for real, I think you should follow the model of Wikipedia (which is also intended to be a total library, in a way). What I mean is that once the basic structure is defined, you just let the community fill in the content step by step.

jussu's picture

Thank you very much, this was one enjoyable story! The fact that I instantly recognised Shirov and Marshall tells me that I have wasted way too much time on this game :)

Peter Doggers's picture

Felt exactly the same, jussu! :-)

Alexander's picture

Excellent story. It reminded me of a certain Jean-Pierre Deleule, French Catholic writer, mathematician and diplomat, who is now known principally for his theological correspondence with G. K. Chesterton (whom he also translated into French). In 1889 - at the peak of the century - he wrote a treatise named "Le jeu d’échecs pris dans l’écriture, suivi par le court traité sur l’impossibilité de la libraire complète, démontré par la théorie cantorienne des ensembles."

The main topic of the paper was obviously the art of writing chess moves. Deleule's system is far from the current rules of annotation, being somewhat complicated and baroque in character. For instance, he proposed that the opening moves (up to the eighth move, in his view) should not be written separately and numerically, but in typical pairs and descriptively. Instead of our "1.e4 c5" he thus uses the term "Un coup de côté initial"; instead of "3.Bc4 Nf6" the phrase "Une prémunition de la bataille", and so on.

But the problem of annotation is not the only topic of the book, for its main part is followed by a short treatise on the possibility of a complete chess library (as is already indicated in the title). In it Deleule uses a fairly simple argument against such a possibility. His demonstration is as follows: were a complete chess library possible, then an index of all its books would have to be made for the library to be of any use at all. Since an alphabetical index is of no use to a chess player, the books would have to be catalogued by a chess-related criterion (by openings, for instance). But that would obviously make this index itself a chess book; we would then have to add the index itself to the library collection. In order to catalogue the addition of the index we would have to write another index, and that index would in its turn have to be added to the library, forcing us to invent yet another index, etc., etc. Deleule's conclusion goes as follows: "Were a complete chess library possible, it would have to be infinite in its nature, constantly expanding and thus never complete [jamais entière]."

It seems that Jean-Pierre Deleule - a somewhat forgotten spirit, remembered only in a certain vein of English Catholicism - had well in advance demonstrated the logical nullity of your dream. Perhaps his treatise was waiting for your dream, like a hunter waiting for a thing he himself deems impossible: a tiger with a shade of blue.

L.Medemblik's picture

Yep, that's what you dream when you have a self-analyzing computer chess brain. Maybe some nanoids will help you sleep better!?

bayde's picture

Talk to your countryman Jurgen Stigter. He is trying to actually do something like this :)

Arne Moll's picture

I know of Jurgen's project, bayde (he's a fellow club member of mine and I even offered him help with his database) but as far as I know his idea is limited to collecting all book titles (a very ambitious project in itself!), rather than linking all actual content of these books to each other - let alone on the level of individual moves!

noyb's picture

Outstanding article Arne, one of the most enjoyable you've written, and you've written quite a few good ones.

I had a short thought at the end of reading it and that was that the ultimate Chess db would simply be the compendium of all possible legal games sans any comments or evaluations except 1-0, 0-1, or 1/2-1/2. I can smell the sizzling silicon...

Adolfo's picture

Cool story! All the more so for an Argentinean like me: while I was reading your article I looked up at my computer shelf, picked up Borges' Complete Works and (much as with the Marshall and Shirov moves) only had to jump a few pages ahead (or back), out of 630 pages in Vol. II, to find exactly the text you referred to in your last quotation.
A single question came to my mind instantly:
How many chess books (only books, not articles) are estimated to have ever been written?
Difficult to say, ha. Well, I suggest that through the ISBN register (or something similar) we could count, for instance, the number published in 2009, then the number in 2004, and keep stepping back five years at a time all the way to, say, Morphy's times, and from there continue backwards with single-year estimates (as the number of books in those times is far smaller). Once we have that, we could multiply the total by an average of between 0.5 and 4 MB, which is about what most digitized books (in CBH format) weigh these days.
I have no concrete information (maybe you do, or could ask Edward Winter, Jurgen Stigter, or someone), but Wiki (http://en.wikipedia.org/wiki/Chess_libraries) states that:
- In 1913, preeminent chess historian H.J.R. Murray estimated the total number of books, magazines, and newspaper columns pertaining to chess to be about 5,000 at that time
- B.H. Wood estimated that number, as of 1949, to be about 20,000.

After those references I have no specific information, but as an arbitrary (or rather imaginary) growth rate I could speculate that:

Period 1650-1850 : 7,5 books per year =1500
Period 1850-1900 : 70 books per year = 3500 (Total 5.000)
Period 1900-1950 : 300 books per year =15000 (Total 20.000)
Period 1950-2000 : Who knows?
Period 2000-2010 : Who knows?

Even if the yearly rate had merely doubled relative to the previous 50 years, we would have 600 per year in the period 1950-2000 = 30.000 (Total 50.000), and in the period 2000-2010 (sure, “only ten years”, but let’s give them the benefit of the doubt) 1600 per year, so a total of about 100.000 books.
Considering only this latest period with my imaginary numbers, it doesn’t sound like such a crazy figure, supposing that there are about 30 chess publishing houses in the world and each of them releases somewhere between 50 and 60 books a year.
Now if we were to digitize each of that fantasy number of books at 2 MB each, we would have them in about 200.000 MB, that is, some 190 to 200 GB of information, less than a quarter of my outdated 1 TB HDD.
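The arithmetic above can be checked with a short Python sketch. The per-period rates are the imaginary ones given in this comment; the variable names are mine. Note that, taken literally, the stated rates add up to about 66.000 books by 2010, so the round figure of 100.000 used for the storage estimate is considerably more generous.

```python
# (period, number of years, imaginary books per year), as speculated in the comment
periods = [
    ("1650-1850", 200, 7.5),
    ("1850-1900", 50, 70),
    ("1900-1950", 50, 300),
    ("1950-2000", 50, 600),
    ("2000-2010", 10, 1600),
]

running_total = 0
cumulative = {}
for label, years, rate in periods:
    running_total += years * rate
    cumulative[label] = running_total
    print(f"{label}: {years * rate:>8.0f} books, cumulative {running_total:>8.0f}")

MB_PER_BOOK = 2  # assumed average size of one digitized book


def storage_gb(books, mb_per_book=MB_PER_BOOK):
    """Storage in GB, taking 1 GB as 1024 MB."""
    return books * mb_per_book / 1024


print(f"Rounded estimate of 100,000 books: {storage_gb(100_000):.1f} GB")
print(f"Sum of the stated rates ({running_total:.0f} books): {storage_gb(running_total):.1f} GB")
```

Either way the result stays well under 1 TB, which is the point being made.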

The point of Borges' first piece (The Total Library) and of the second (The Library of Babel) was that the library, at least in the second, contained every book that could possibly be written, mathematically speaking, starting from 25 orthographic symbols (22 letters of the alphabet - don’t ask me why not 26 or 27, as in our Western alphabet - plus full stop, comma and space) and 410 pages per book (I suppose to keep each volume manageable). Someone calculated that number (although starting from 32 current symbols: the 27 letters, full stop, comma, space, plus 1 and 0) at approximately 10^1.974.751 TB, that is, a one followed by 1.974.751 zeros. The link, in case you can read Spanish, is http://otarioblog.wordpress.com/2009/10/24/borges-en-terabytes/.
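As a rough check on the order of magnitude, here is a small Python sketch. It takes the page layout from Borges' story (410 pages of 40 lines with about 80 characters each) and assumes, for illustration only, one byte per character and 1 TB = 10^12 bytes; the function name is hypothetical. It works with base-10 logarithms, since the numbers themselves are far too large to write out.

```python
import math

# Book format as described in "The Library of Babel".
PAGES, LINES_PER_PAGE, CHARS_PER_LINE = 410, 40, 80
CHARS_PER_BOOK = PAGES * LINES_PER_PAGE * CHARS_PER_LINE  # 1,312,000 characters


def log10_library_terabytes(symbols, bytes_per_char=1):
    """Base-10 logarithm of the library's size in TB (assumes 1 TB = 10**12 bytes)."""
    # Number of distinct books is symbols ** CHARS_PER_BOOK.
    log10_books = CHARS_PER_BOOK * math.log10(symbols)
    log10_bytes_per_book = math.log10(CHARS_PER_BOOK * bytes_per_char)
    return log10_books + log10_bytes_per_book - 12


for symbols in (25, 32):
    exponent = round(log10_library_terabytes(symbols))
    print(f"{symbols} symbols: about 10^{exponent} TB")
```

Under these assumptions the exponent comes out close to 1.834.091 for 25 symbols and close to 1.974.751 for 32 symbols, so both figures quoted in this thread seem to follow from the same calculation with different alphabets.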

Borges played around a lot in these pieces with the implications of such a monster (incidentally, TERA in Greek means monster), such as the answer to all the mysteries of humanity, as in Arne Moll's quotation, but I prefer to keep the following one (excuse my amateur translation from the original Spanish version):

“…We also know about another superstition of that time: that of the Man of the Book. On some shelf of some hexagon there must be a book that is the cipher and perfect compendium of all the rest: some librarian has gone through it and he is analogous to a god. (…) I pray to the unknown gods that one man - only one, even if thousands of years ago! - has examined and read it. If honour and wisdom are not for me, let them be for others. Let me be outraged and annihilated, but in one instant, in one being, let your enormous Library be justified…” (Jorge Luis Borges, Complete Works, “Fictions”, “The library of Babel”, page 469).

Regards to all,

Adolfo.

Frans's picture

I had to design a datamodel once for the registration of all the objects in the historical collections of a big Amsterdam historical institute. That really was a challenge like Borges' Total Library.

It took me about a year to discover/invent the basic idea. Once I had it, the idea seemed very simple. Of course it wasn't. But it turned out to be perfectly logical and possible. Not only in dreams, but in reality too. So keep on trying!

Arne Moll's picture

Thanks, Adolfo, fascinating numbers! Somehow I'm not convinced of the size of our hypothetical database. Consider: the ChessBase Mega Database 2009 folder is already over 2,5 GB and most of the games are not annotated at all. If we were to add all information from all chess books ever written and link it all together, I dare say it would be more than the number (190-200 GB) you suggest!

By the way, another comparison that came to my mind after publishing the article was with the supercomputer from The Hitchhiker's Guide to the Galaxy (aptly called Deep Thought!) which could calculate the answer to the Ultimate Question of Life, the Universe, and Everything, but after millions of years of calculating came up with the (humanly) useless answer '42'. I think such supercomputers or libraries are always bound to be useless and Borges seems to have realized this even before computers were invented.

Paul Janse's picture

Great article, Arne, indeed one of your best and leading to reactions that are even more interesting than usual.

Do you know the article by Hugo Brandt Corstius on Borges' library? It is in 'Rekenen op Taal', I think.

And nice that you slipped in the reference to Kasparov's 'children of Lothar Schmidt'.

Paul

Dan's picture

Cool dream. I wish mine were as interesting.

Zomerschaker's picture

Nice drawing by Piranesi, I had never heard of this artist before. But what is all this talk about Shirov and Marshall? I don't get it.

iLane's picture

Zomerschaker: a hint: Topalov - Shirov 1998 47...Bh3!!

Juan's picture

I'm the author of the post that Adolfo mentions: http://otarioblog.wordpress.com/2009/10/24/borges-en-terabytes/ (Spanish).
The conclusion of that article is that Borges' Total Library would need 10^1834091 TB to be stored (a "1" followed by 1 834 091 zeros). It's impossible: even if every atom in the Universe could store 1 TB, there wouldn't be enough of them to store the whole Library.
Now, let's imagine a similar idea with chess games. I mean, storing every possible chess game, whether or not it has ever been played. Obviously, it would be as huge as the Library Borges imagined. But... which one would be bigger? I mean, which language is more complex, which one is greater? Mankind's words or chess' words? ¿Qué Dios detrás de Dios la trama empieza? (What god behind God begins the plot?)

René Olthof's picture

Mind you Lothar Schmid, not Schmidt!
