Gratis uses results from games played without handicaps at those AGA tournaments for which records exist. In practice most such games come from open tournaments, but Gratis also includes games from handicap tournaments (simple or progressive) that happened to be played even.
Most rating schemes assign a strength to each player and adjust the strengths from time to time, either after each game or at fixed intervals. Gratis does not. Rather, the entire rating scheme is recalculated as often as needed.
Gratis attempts to rate players a set at a time. The requirement on a set is that for any division of the set into two subsets, each subset must contain at least one player who has defeated a player from the other subset.
In practice this leads to a division of the players into maximal ratable sets: typically one large set containing most active players, together with smaller groups whose strengths cannot be related to it.
Mathematically, a player with victories against rated players and no defeats would have a rating of positive infinity, a player with defeats and no victories would have a rating of zero, and a player with neither victories nor defeats has an undefined rating.
Another way of looking at the set requirement is that it must be possible to establish a chain of victories from any player in the scheme to any other, e.g. A once beat B, who once beat C, who once beat D. Without such a chain the ratio of the strengths of D and A would be driven to positive infinity. (As a joke Gratis uses this to define the Guo number, by analogy with the Erdős number of mathematics: Guo Yi Ming has a Guo number of zero, if you've beaten Guo Yi Ming you have a Guo number of 1, if you've beaten someone who's beaten Guo you have a Guo number of 2, and so on.)
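The set requirement above is exactly strong connectivity of the "beats" digraph, and the Guo number is the length of a shortest victory chain. A minimal sketch of both, assuming games are given as (winner, loser) pairs; Kosaraju's two-pass algorithm is my choice of illustration, since the page does not say how Gratis finds its sets:

```python
from collections import defaultdict, deque

def ratable_sets(games):
    """Split players into maximal ratable sets.

    games: list of (winner, loser) pairs.  A set of players is ratable
    when a chain of victories links any player to any other, i.e. the
    players form a strongly connected component of the 'beats' digraph.
    """
    wins = defaultdict(set)      # winner -> set of losers
    rev = defaultdict(set)       # loser  -> set of winners
    players = set()
    for w, l in games:
        wins[w].add(l)
        rev[l].add(w)
        players.update((w, l))

    # First pass: record players in order of completed DFS finish time.
    order, seen = [], set()
    for p in players:
        if p in seen:
            continue
        seen.add(p)
        stack = [(p, iter(wins[p]))]
        while stack:
            node, it = stack[-1]
            nxt = next((q for q in it if q not in seen), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(wins[nxt])))

    # Second pass: sweep the reversed graph in reverse finish order.
    comps, seen = [], set()
    for p in reversed(order):
        if p in seen:
            continue
        comp, queue = set(), deque([p])
        seen.add(p)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for q in rev[node]:
                if q not in seen:
                    seen.add(q)
                    queue.append(q)
        comps.append(comp)
    return comps

def guo_numbers(games, root="Guo Yi Ming"):
    """Length of the shortest victory chain from each player to the root."""
    rev = defaultdict(set)             # loser -> winners who beat them
    for w, l in games:
        rev[l].add(w)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for w in rev[node]:
            if w not in dist:
                dist[w] = dist[node] + 1
                queue.append(w)
    return dist
```

With games A beats B, B beats C, C beats A and C beats D, the ratable sets are {A, B, C} and the isolated {D}, since no chain of victories leads from D to anyone.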
Let v_ijn be the discounted number of victories of player i over player j in tournament n, and p_ij be the probability that player i will defeat player j in any particular game.
We wish to choose the vector of ratings that is most consistent with the observed results. One way we could do this would be to maximise the product over all games of the probability of the actual result, i.e. Π_ijn p_ij^v_ijn, where i and j range over all players and n over all tournaments.
We modify this to take account of the reduced information in old games by applying a factor f_n, defining P = Π_ijn p_ij^(f_n v_ijn). Then ln(P) = Σ_ij w_ij ln(p_ij), where w_ij = Σ_n f_n v_ijn. This isn't rigorous.
The results of each tournament are discounted for time by multiplying them by a factor f_n = e^(-0.2328y), where y is the number of years since the game was played (measured to the nearest month).
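Combining the definitions above, the discounted win counts w_ij = Σ_n f_n v_ijn can be accumulated directly from raw results. A sketch, assuming each result is a (winner, loser, month) tuple with the date in months; the function name is mine:

```python
import math
from collections import defaultdict

DISCOUNT_RATE = 0.2328  # per year, from the temporal auto-correlation fit

def discounted_weights(results, now_months):
    """Compute w_ij = sum_n f_n v_ijn from raw tournament results.

    results: list of (winner, loser, month) tuples, where month is the
    tournament date in months (the page measures age to the nearest month).
    Returns a dict mapping (winner, loser) to the discounted win count.
    """
    w = defaultdict(float)
    for winner, loser, month in results:
        years = (now_months - month) / 12.0
        f = math.exp(-DISCOUNT_RATE * years)   # f_n = e^(-0.2328y)
        w[(winner, loser)] += f
    return dict(w)
```

At this rate a three-year-old win carries a weight of e^(-0.6984), roughly half that of a win today.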
This rate of discounting is partially justified by an analysis of the temporal auto-correlation of tournament performance, but not in a rigorous way.
We now need to relate a player's rating to their probability of victory in even matches.
We associate with each player i in the set a strength s_i, which is a positive real number. We assume that the probability of player a defeating player b in any particular game is p_ab = s_a/(s_a + s_b) = 1 - p_ba.
There is an assumption of transitivity here. That is, if player A has a win-loss ratio of x against player B, and player B has a ratio of y against player C, then player A will have a ratio of xy against player C. This assumption is consistent with the European approach to ratings but not the US one. A brief and unrigorous study of my dataset suggests this assumption is probably not too bad.
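The transitivity follows because under this model a player's win-loss odds are the ratio of strengths, and ratios multiply along a chain. A tiny sketch with illustrative strengths:

```python
def p_win(sa, sb):
    """Probability that a player of strength sa beats one of strength sb."""
    return sa / (sa + sb)

# Illustrative strengths.  Odds are strength ratios, so A's odds against B
# times B's odds against C equal A's odds against C.
sa, sb, sc = 8.0, 4.0, 1.0
assert abs(p_win(sa, sb) - 2.0 / 3.0) < 1e-12          # p_ab = 8/(8+4)
assert abs((sa / sb) * (sb / sc) - sa / sc) < 1e-12    # x * y = 2 * 4 = 8
```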
To maximise P we want to find a point where its derivatives with respect to the s_i are all zero. In practice it is numerically convenient to require 0 = E_i = ∂ln(P)/∂ln(s_i) = Σ_j (w_ij s_j - w_ji s_i)/(s_i + s_j).
This may look like n equations in n unknowns, where n is the number of players. In fact it is n-1 equations in n-1 unknowns. One degree of freedom comes out of the equations because they always sum to zero. One degree comes out of the s_i because only ratios between players' strengths are observable.
Gratis solves this system of equations using Newton's method. It adopts the solution to these equations as its best guess of the players' relative ratings.
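The stationarity conditions E_i = 0 rearrange to s_i = W_i / Σ_j (w_ij + w_ji)/(s_i + s_j), where W_i is player i's total discounted wins. Gratis uses Newton's method; as a simpler sketch, the classical fixed-point iteration below solves the same equations. The function name and the choice to pin the first player's strength at 1 are mine:

```python
def fit_strengths(w, players, iters=500):
    """Maximum-likelihood strengths for the model p_ij = s_i/(s_i + s_j).

    w: dict mapping (i, j) to the discounted number of wins of i over j.
    Solves the stationarity conditions E_i = 0 by the fixed-point update
    s_i <- W_i / sum_j (w_ij + w_ji)/(s_i + s_j), normalising after each
    sweep since only strength ratios are observable.  Requires the players
    to form a single ratable set, as described above.
    """
    s = {p: 1.0 for p in players}
    for _ in range(iters):
        new = {}
        for i in players:
            total_wins = sum(w.get((i, j), 0.0) for j in players)
            denom = sum(
                (w.get((i, j), 0.0) + w.get((j, i), 0.0)) / (s[i] + s[j])
                for j in players if j != i
            )
            new[i] = total_wins / denom if denom else s[i]
        norm = new[players[0]]        # pin the first player's strength at 1
        s = {p: v / norm for p, v in new.items()}
    return s
```

For example, if A has beaten B twice and lost once, the fitted ratio s_A/s_B converges to 2, matching the observed two-thirds win rate.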
The solution obtained by Newton's method is the one most consistent with the results. We also wish to identify a range of strengths around this point that is reasonably consistent with the results. Assuming that the probability of ...
We wish to relate the abstract strength si to a traditional rank.
Consider the point value of all errors a player commits in a game, relative to perfect play. Let us assume that the variation in this quantity is proportional to its average value. This in turn is proportional to G-r, the number of stones the player would need to take from a perfect player, where G is the rank of a perfect player and r is the rank of the player.
The ratio of two players' strengths will depend on the ratio between, on the one hand, the variability of their performance and, on the other hand, the difference in their average performance. Motivated by this we assume strength will follow s = k(G - r)^(-b), where r is the rank in dan, or, equivalently, ln(s) = a - b×ln(G - r), where a = ln(k).
The constants a and b are determined using linear regression, and G is chosen numerically to minimise the sum of squared errors of that regression.
We can therefore calculate an effective dan rank from each player's calculated strength. The calculated effective rank is normally quoted in hundredths of a dan, so, for instance, 400 is a strength typical of an average 4 dan, and 275 is a bit weaker than the average 3 dan. This effective dan rank is the most useful representation of a player's strength provided by the program.
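Inverting ln(s) = a - b×ln(G - r) gives r = G - e^((a - ln(s))/b). A sketch of the conversion; the constants below are placeholder values for illustration, not the ones Gratis actually fits:

```python
import math

# Fitted constants: a and b from the regression, G the rank of a perfect
# player.  These values are placeholders, not the ones Gratis uses.
A, B, G = 5.0, 3.0, 12.0

def strength_to_rank(s):
    """Invert ln(s) = a - b*ln(G - r) to an effective dan rank r."""
    return G - math.exp((A - math.log(s)) / B)

def rank_hundredths(s):
    """Effective rank in hundredths of a dan, e.g. 400 ~ average 4 dan."""
    return round(100 * strength_to_rank(s))
```

As a round-trip check, a strength of s = e^A (G - 4)^(-B) converts back to an effective rank of exactly 4.00 dan, quoted as 400.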
We would like to be able to estimate the uncertainty in a player's rating. We can do this by identifying the 10th and 90th percentiles of the distribution of the likelihood of each player's strength about the best guess. To make this calculation we assume that the strengths are log-rectangularly distributed.
Consider the case of two nearly disjoint groups of players, with just one victory in each direction linking them. A player's strength relative to frequent opponents may be well-determined, but the relative standards of the two groups will be ill-determined. This common effect in inter-club rankings is not captured by the uncertainty calculations above, illustrating the fact that players are only rated relative to each other, and not in an absolute sense. In effect this makes the calculation pessimistic, since the difference between two particular players may be much better determined than the strength of either player relative to the rest of the world.
I welcome feedback at David.Bofinger@dsto.defenceSpamProofing.gov.au (delete the spamproofing).