Bartek Macieja | September 07, 2011 0:10


Draw problem in chess

I have just read Jeff Sonas' article published on ChessBase.

I found it very interesting; however, I regret that two things have not been clarified:


I think it is worth adding below the third picture that:

a) The level of play of the Top 10 players increases (for instance, in my opinion, Steinitz played more or less at a 2400 level, Capablanca and Alekhine at about 2600), so we expect the number of draws to increase, as shown in the second picture.

b) Chess is considered by the vast majority of elite grandmasters to be a drawn game (of course no one has proven it so far, and no one will be able to soon). As opening theory develops, there is less and less room for players to commit a mistake.

For the same reasons, we should observe the same for the Top 100.


Trying to compare the percentage of draws before move 25 (Picture 6) is very tricky for at least two reasons:

a) Nowadays in many tournaments it is not allowed to offer a draw before move 30.

b) The theory is very much advanced in comparison to previous decades.

The influence of "a" is obvious.

The influence of "b" is easy to understand. Many elite games are played according to the scheme: theory, a few of one's own moves, draw agreed. Not much has changed for decades. Games have become longer only because "the own part" starts later.

Author: Bartek Macieja

Bartlomiej Macieja is a grandmaster from Poland.



Help's picture

Good points. It always amazes me that certain subjects that seem simple enough (draws, rating inflation) become complicated quickly as soon as one starts to study them in detail.

Maybe they should ask Jeff Sonas to blog as well; that would make for some interesting articles back and forth, I'm sure! :)

stevefraser's picture

"Each game will proceed to a minimum of sixty moves unless a decisive result is reached. The first player who, while on the move, produces a second exact repetition of the position will be declared to have lost the game." Draws would be reduced to a minimum and many more interesting games would be created.
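The proposed rule is easy to state mechanically. Purely as an illustration (the function name and the FEN-like string position keys are my own assumptions, not part of any existing rule set, and the sixty-move minimum is omitted), enforcing the repetition clause could look like this:

```python
# Sketch of the proposed rule: count each position as it arises; the player
# whose move produces the SECOND exact repetition (i.e. the position's third
# occurrence) loses. Position keys are assumed to be FEN-like strings supplied
# by the caller after every half-move.

from collections import Counter

def repetition_loser(position_keys, side_to_move_first="white"):
    """Return the losing side under the proposed rule, or None if no loss."""
    seen = Counter()
    # position_keys[0] is the position produced by the first side to move
    sides = ["white", "black"] if side_to_move_first == "white" else ["black", "white"]
    for ply, key in enumerate(position_keys):
        seen[key] += 1
        if seen[key] >= 3:  # second exact repetition reached
            return sides[ply % 2]  # the side that just produced it loses
    return None
```

A game with no repeated position returns None, so normal play is unaffected.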

brabo's picture

I want to point out that very recently the following magnificent work on rating inflation was published:

This completely contradicts Jeff Sonas' statements. Here too, on draws, he creates the wrong test environments (as indicated by GM Macieja) and again makes some random conclusions. I don't understand why ChessBase keeps using his work.

Frits Fritschy's picture

I have just started reading Regan/Haworth, and I have some understanding of why ChessBase prefers to publish the Sonas article... But from the abstract I understand that the authors have proven that there is a direct connection between the quality of moves played and Elo rating, and that therefore there is no proof of Elo inflation. A very interesting conclusion, and I will not question it: I would need some advanced statistical training to do so (like 99.9% of the readers here, is my unscientific guess).
But could you please descend from your ivory tower to explain why this "completely contradicts Jeff Sonas' statements"? His main conclusions are, as far as I can see, that the number of draws is only slowly increasing and that the number of short draws at top level is decreasing dramatically. Rating and rating inflation play only a small part in his story.
If there is something wrong with his test environment, in what way and to what extent does that influence Sonas' conclusions? Really enough to use statements like 'completely contradicts' and 'random conclusions'? Let alone your last sentence.
A more positive attitude would be to show how to refine or conduct this research, or to do the research yourself. Now you make yourself look like a kid sitting on his father's neck shouting: "I'm bigger than the rest of you!"

brabo's picture

Jeff Sonas continuously propagates the idea that ratings have inflated enormously. He also claims that the rating system should be adjusted in such a way that ratings drop again. This is not what the work of Regan and Haworth states.

Sonas' test environment doesn't take into account that he is comparing different groups of players. It is as if you said: in 1950 we had apples and apples cost 1 euro, while today we have pears and they cost 1.5 euros, so the price of fruit has inflated by 0.5 euro. Translated to Sonas' example: the top 100 in 1950 made x% of draws while the top 100 now makes y% of draws, so draws have inflated by (y-x)%.

By random conclusions I mean that he looks at the figures only from a mathematical point of view, while besides the figures you also need to know how the game is played in order to interpret them correctly (as already pointed out by GM Macieja).

Yes, I agree my attitude toward Jeff Sonas is negative, but it is precisely because of his repeated articles on ChessBase that I continuously hear from other chess players that ratings have inflated enormously, and a lot of people actually think that the top players of earlier times played better chess, which is not true.

Carl 's picture

I didn't get your point; however, it is clear that you dislike Sonas whatever he does. Unfortunately, you have added nothing to the discussion, for the quality of your statements is well below your desire to smash and crush any Sonasian idea.

Peter Doggers's picture

FYI, Arne is planning to discuss Regan/Haworth here soon.

mdamien's picture

That study has some flaws. In particular, it doesn't account for opening theory, which is refined over time, so modern players naturally make more accurate moves in the opening than older masters did. More crucially, Elo ratings measure the results of games between players in an active pool: the quality of the moves is entirely incidental, even if it could be accurately measured. The study suggests that a player of Elo x will make accurate moves at a percentage y, but even if that were statistically true (I don't think it is), it is not relevant to what Elo measures or to the question of rating inflation.

brabo's picture

The study shows very well that the accuracy of the moves has been more or less stable relative to Elo over the years. You simply ignore the results of the study if you deny that.

Elo was developed to identify somebody's strength (the difference in quality of chess moves compared with other players) in an active pool. A simple method was developed based on past game results. It isn't a perfect method, but it is easy to maintain and gives decent results, as the study proves. Preferably, Elo should show no inflation/deflation (relative to the quality of the chess moves produced), as that would make it difficult to compare people who play regularly with people who play seldom or return after a long absence.

I don't fully understand your point about opening theory. Maybe it is my poor English, but can you rephrase it?
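The "simple method based on past game results" that brabo describes is the classical Elo update rule, which can be sketched in a few lines. The K-factor of 20 below is one illustrative choice (FIDE uses several K values depending on the player):

```python
# Classical Elo update: a rating is adjusted from game results only,
# never from the quality of the individual moves.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0) -> float:
    """Return A's new rating after scoring `score_a` (1 win, 0.5 draw, 0 loss)."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))

# Two equally rated players drawing: nothing changes.
print(update(1500, 1500, 0.5))  # -> 1500.0
```

This makes brabo's point concrete: move quality appears nowhere in the formula, only results against the current pool.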

GMBartek's picture

Thanks for all the comments.

In my opinion Jeff Sonas does terrific work; I am always very glad to cooperate and discuss with him. I don't find strong criticism of his work justified. It is often very instructive and helpful.

In this particular case I also don't see a contradiction. He showed some trends/effects; I tried to explain possible reasons behind them. The only thing I regret is that he didn't include them in his article himself, because now some readers may draw not fully correct conclusions.

I have also started cooperating with Regan/Haworth, and I find it very positive.

I would like to draw your attention to the fact that Sonas and Regan use very different definitions of the term "inflation". I have studied their work and talked to both of them. The conclusion is that... they are both right when Sonas says "there is inflation" and Regan says "there is no inflation". They simply mean different things by "inflation".

brabo's picture

I would like to know what definition Sonas gives to inflation, as he doesn't state it clearly anywhere in his articles, while everybody assumes it is about a rating becoming worth less and less over time compared to the quality of moves played on the board. A lot of people I know are convinced by Jeff Sonas' articles that your rating will go up if you keep playing as well as before (meaning the number and magnitude of your mistakes stay the same). This is not true. Your rating will only increase if you play better moves.

Zooty's picture

I'm so bored by these endless discussions about draws.
We have so many problems in the chess world; I think the problem of too many or too short draws is one of the least. Still, there are some people around who want to make us believe that without draws chess would make the headline news.
I remember times when 20-move draws between Karpov and Korchnoi were printed on the front pages of the newspapers. But who cares about some obscure matches conducted at the end of the world, when the only news we make is when our big boss invites some dictator for a coffee-table talk?
A draw is a basic part of our game. Don't give it up just because of the shouting of some guys who seem never to have played seriously themselves.


Helmut's picture

'Brabo': 'Elo has been developed to identify somebody's strength (difference of quality in chess moves compared with other players) in an active pool'

This is untrue, as was pointed out above. Repeating buzzwords from that post makes no difference in this regard.
For further discussion, I refer to the article 'Chess Ratings and Titles'.

brabo's picture

First, thanks for the links. However, I don't believe a few letters prove that my statement is untrue.

The discussion between Arpad Elo and Kenneth Whyld is of no importance, as neither of them represents FIDE. FIDE has its own idea of what the rating should represent and has therefore already made some adaptations to the original formulas (remember the K-factor discussions).

One comment on Kenneth Whyld's letter: I fully disagree that ratings should somehow be linked to the world-ranking place somebody occupies. For me, a world champion can have 1000 rating points as well as 3000 rating points, as long as the rating represents his ability to produce a certain quality of chess moves. The wish for a system in which Elo shows your world-ranking place is quite difficult to fulfil, as such a system would somehow have to take into account people joining or leaving the pool.

Barone's picture

A classical chess question: who would win a Kasparov vs Fischer match, given both players at their best and with equally up-to-date theory?
This kind of question really has no possible answer, simply because the hypothetical conditions cannot be met.
By this I mean that a judgment on the REAL value of players cannot be based on Elo, as this point-assignment system is RELATIVE: every chess player's Elo, no matter what his level is, is a value describing the player's performance against his actual opponents, limiting the assessment of his actual strength to environmental factors such as the time (era) and places (tournaments) in which he has the possibility to play.
With the Elo rating you can guess how much better the number 1 player in the world is than his contemporaries, but you cannot evaluate Elo inflation or deflation from changes of Elo through time.

IMHO, the change in significance of the Elo rating could really be understood in two possible ways:
1) intuitively (and questionably), by comparing the quality of the chess moves and the efficiency in converting rated games of players from different epochs with the same rating (this should be done with players of all possible Elos);
2) by building a complete mathematical model based on the Elo point-attribution formula and fitting it to the actual statistical data from the whole history of Elo-rated chess.

So I think that, as long as the second way I described isn't implemented by some serious mathematician (IF it is possible at all!), nobody can say the last word on how "enormously" Elo has inflated, deflated or stagnated since this point system was adopted.

brabo's picture

You really should read the work of Regan and Haworth once. The idea of that work is based on a teacher giving marks to games played by anybody at any time. The teacher is a supercomputer much stronger than any human being. The scoring mechanism is built on the evaluation differences between the move played and the best moves. The whole study shows that over the years, Elo has been pretty stable relative to the quality of the moves played on the board.

I believe that a rating of 1500 (arbitrary, and an estimate on my part) could be given to the average beginner (some starters will have a lower rating, some a higher one), which means that this rating isn't influenced yet by theory or other elements. The average starter at 1500 Elo should stay stable over the years in a properly working Elo system. The ratings of advanced players will of course be higher than the average beginner's. I think everybody will agree that, thanks to the increase of knowledge over time, the gap between the beginner and the top players increases. This perfectly explains why the top 100 players have increased their ratings over time. Fully acceptable, as long as the average beginner stays at 1500 Elo.
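As a rough illustration of the "teacher" idea described above (and emphatically not the actual Regan/Haworth model, which fits a full probabilistic model to move-choice data), one can imagine scoring each move by its centipawn loss against the engine's best move and mapping the average loss to a rating. The linear map and all its constants below are invented for illustration only:

```python
# Toy version of an "intrinsic rating": average the evaluation drop (in
# centipawns) of the moves actually played versus the engine's best moves,
# then map that average loss to a rating with a made-up linear formula.

def average_centipawn_loss(played_evals, best_evals):
    """Mean evaluation drop of the moves played (negative drops count as 0)."""
    losses = [max(0, best - played) for played, best in zip(played_evals, best_evals)]
    return sum(losses) / len(losses)

def toy_intrinsic_rating(acpl: float) -> float:
    """Hypothetical linear map: lower average loss -> higher rating."""
    return 2800.0 - 15.0 * acpl

# Four moves, two of them losing 20 and 30 centipawns respectively:
acpl = average_centipawn_loss([30, 0, -10, 5], [50, 0, 20, 5])
print(toy_intrinsic_rating(acpl))  # -> 2612.5
```

Unlike the Elo update, this score depends only on the moves on the board, not on the opponent's rating, which is exactly why such a "teacher" can compare players across eras.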

Barone's picture

This method is, as I said, questionable, because it still makes general assumptions that cannot be considered true a priori:
1) The supercomputer/teacher is a relative measure, as long as it cannot use some 32-men tablebase (a "solved game" reference). This may not seem relevant, but you can understand its influence if you compare a computer's evaluation of the moves of a mainly tactical versus a mainly positional player (given that the two players have the same Elo rating), and if you consider that a player's efficiency in gaining results is a product of how many times he is able to play a game based on his own strengths versus the times when his opponent drags the game into his own field of expertise (the opening expert tricking the endgame guru, or vice versa, for example). Some "average move quality" cannot be the ONLY influential factor in a player's results, and in his consequent Elo.
2) Especially when you compare players of different eras, the "places/tournaments" conditions I mentioned before are VERY influential on the Elo rating: the number of games Carlsen can play nowadays against the highest-rated (or, even more significantly, "high in general") opponents is much larger than the number Capablanca (or Fischer, or even Kasparov) could play in his time.

I believe this second point is the main reason many are convinced of a general inflation of the Elo rating.
And still they may well be wrong, because as the number of, say, 2700+ Elo players grows (or changes in general), so does the general understanding of the game itself among professional players: the respective rates of change of these two factors should be compared in order to understand which one dominates the complex phenomenon of "change of significance of Elo in determining a chess player's strength through time" that we are discussing here (and when, as there could be times when one factor prevails and times when the other does).
That's why I was suggesting that, without serious and correct modelling of this complex matter, there will never be an answer that is not questionable.

brabo's picture

You really need to read the study once, as your comments clearly indicate that you still haven't. I admit it is heavy stuff.

1) The evaluation takes into account positional and tactical themes in a game, so it is not just one simple evaluation method for everything.
2) The evaluation takes into account some human behaviour, such as consolidating a winning position or seeking tactical complications when you are lost.
3) The evaluation of somebody's moves takes into account the strength of the opposition.

Again, if you read the work of Regan and Haworth, you will understand that very serious and extensive modelling of this complex matter has been done. A whole team has been working on it, and although I am sure there are still some gaps (as they admit themselves), it is by far the best analytical work I've seen on this topic.

Barone's picture

Does the evaluation take into account the fashions in opening theory and in playing style of each chess era? This factor, and the adjustments it brings over time (going from Tal to Botvinnik, from Karpov to Kasparov, from the King's Gambit to the French Defence...), will probably affect the two rates of change I was referring to in different ways: the study you mention is not modelled on time and does not take these factors into consideration.

Any relative evaluation is a simplification, as you would NEED to know how many of all the possible continuations would lead to a win, draw or loss for every played move in order to assess that particular move's strength. Just as an example, a player who is very good and confident in his endgame and late-middlegame technique may well play second-rate moves for half of his "average game", just because he is exchanging pieces, and still win most of his games. Elo is calculated on only the three results of complete games between players of predetermined Elos, not on fractions of evaluation points for each move of those games.

This doesn't mean the study is not serious, or not done by competent people: it only means it is not the last word on the matter, and its conclusions are still questionable.

Barone's picture

Just to clarify: the strongest engines often evaluate an endgame position as giving one player an advantage equivalent to one and a half "mean pawns" of material, and still there is no way for that player to win, and the "drawn continuations" are infinite in number (because of repetition of moves: if one of the players doesn't repeat, he loses). All the moves played before such a situation was reached by the player with the evaluation advantage may well be rated as the best possible, and still the game itself could have been won by a move rated as "not the best" by the computer, such as a well-timed simplification: the player would have shown better understanding and greater strength by choosing the "not best" move.
This is an example of the problem with relative evaluation, an example which we can understand and maybe partially avoid with restricting conditions in the study. But we do not have enough information about the game of chess to avoid such a thing completely, and anyway the restrictions end up reducing the value of the whole study: the well-timed simplification I took as an example is a clear signal of a player's great strength, a signal that would show in his Elo but that we are forced to exclude from the study simply because our "teacher/computer" cannot understand it.

Barone's picture

Sorry for the errors, but English is not my native language. I hope you can still understand what I mean.

brabo's picture

Sure, English isn't my native language either.

Please be aware that the evaluation used is not the same as what you see on your computer screen from Rybka or any other top program. It is a cleverly modified, complex evaluation which takes into account, besides the quality of the moves played, the strength of the opponent, the type of opening (tactical or not) and a well-estimated guess of the intentions of the players.
The paper mentions that the endgame is a rather weak point for computers, as they are less dominant there compared with humans, especially if no tablebases are available. Still, I don't believe this has a major impact on the final evaluation, as such wrongly evaluated situations are really rare compared with the other evaluated positions.

I fully agree that the whole test setup nevertheless remains a simulation, but it is a simulation based on 150,000 games taking all these different elements into account. The researchers themselves admit that a better simulation is certainly possible, but with the testing done so far, it is one of the best (if not the best) simulations achievable today. Besides, I doubt that an improved simulation would give an important difference in results.

Barone's picture

But that's exactly the problem: at a certain point you have to say "... I don't believe this has a major impact ...".
No final word, and only questionable (if sensible) reasons to say that Elo has or has not the same meaning today as it had when it was adopted.
My point is not against one hypothesis or the other, mind you. I'm just saying we still have no decisive answer, as at a certain point one has to "believe" in something which is not proven.

This is not trivial, as something very similar happens when you start arguing about the final result of chess as a game: many think it is a drawn game, and actual play seems to confirm their opinion, especially because of the many typical endgame exceptions we all know about (an extra pawn on the wrong file, or the inappropriate extra piece, and so on). It seems a very sensible assumption, but still, a SINGLE "perfect game" in which White or Black wins by force (among the almost infinite possible games of chess) would be sufficient to refute the "drawn game" theory: only by solving chess can we know whether it is a draw or a win/loss.

brabo's picture

A decisive answer on the drifting of Elo over time you will never have. Not even with a solved 32-man tablebase, as strength is not only a matter of the quality of moves compared with perfect play, but especially compared with imperfect play.

I don't believe I claimed that the researchers found the final answer (using a perfect simulation). I believe that today we can pretty much say for sure (with maybe 95% certainty) that Elo hasn't seriously drifted over time, contrary to what some people claim.
If somebody comes up with different results after doing more research on this topic with a more optimized evaluation mechanism, then that will overrule the results of the Regan/Haworth team. However, looking at the magnitude of the work done, I don't believe this will happen soon.

You can compare this with rules or laws in other domains which are used although people know they are not a decisive answer. The rules/laws are used until somebody comes up with something better. The Regan/Haworth team shows via a simulation (rules/laws/hypotheses, ...) that the Elo system isn't seriously drifting, so unless somebody can come up with an improved simulation which gives different results, I insist we stick to the results of the study. I don't find it correct to say that serious drifting (inflation/deflation) of Elo has happened just because nobody can prove the opposite with 100% certainty.

Helmut's picture


I don't really understand what you're getting at with your last post. It feels as if some kind of answer is in order though.
As explained in the link I gave, Mr. Elo developed the rating system that bears his name. In 1981, FIDE took charge of the calculations. Granted, they introduced a number of changes/adjustments. These have, however, not changed the way it basically works.
Therefore, Prof. Elo's opinion on what his rating system is supposed to measure is still indicative in my view.

brabo's picture

Just want to make two points:
- I respect Prof. Elo's opinion, which was certainly correct in his time. Today we use the rating system for completely different reasons than in the '70s, so it is normal that today we expect different results from the rating system than we did then.

- I really like the rating system used today. The earlier-mentioned study also indicates (too early to say it is fully proven) that we have a rather stable rating system, which is very good news. I strongly object to people wanting to change the rating system because it doesn't provide the data they would like to see.
