Rating System
Update
We decided upon using glicko. The implementation is pretty much a straightforward copy of the glicko documentation, except some differences in constants (see source code), and the following details:
- The function that computes the expected value is implemented as to normalize any rating difference bigger than MAX_DIFF to MAX_DIFF; without this the results were way off, most probably because of precission issues (is this true?)
- Like the documentation states, there is a need for a MIN_DEVIATION constants, that must also be included when computing the weight for a game
- Since glicko was invented for chess games, and our contests include multiple users, we consider each possible pair of users as a game; this tends to flip out when we have contents with a huge number of users and will be fixed soon
- The scores in glicko were 0 for loss , 1/2 for draw, 1 for winning. I made a function that gives a real number in the interval [0..1] based on the standard deviation of the score set; the result is 0.5 for equal score, a number from (0.5 , 1] if the first score is bigger and a number from [0, 0.5) if the second score is bigger. The current formula gives scores based on the proportion of the score difference and double the value of the deviation. (Feel free to suggest something else, this formula is based an educated guess)
Old info
I (Mircea) have been reading about several rating systems like ELO, glicko, TrueSkill and the home-made rating system by Radu Grigore.
The basic outline of the rating algorithm is:
- receive old ratings and current contest scores as input data
- compute new ratings as output data
There are a number of desired characteristics our rating system should have:
- Ratings are scaled properly regardless of section (X points of rating for debug problems should be comparable with X points of rating for classic problems, etc.) This seems trivial to achieve, given the similar structure of all types of contests we are planning.
- Deviation: Deviation is a measure of how precise the current rating is. In TrueSkill? this is done by also adding an "uncertainty" value, which decreases as the number of matches played increases. The idea is that the smaller the deviation, the more accurate the rating is. Do we really need this? This involves more complex calculations. A nice spin-off done by glicko is that the deviation increases by time. The more a person competes, the more accurate his rating is. Glicko assumes that a person competing after a long period of time this system, or are they encouraged to not compete for a while to achieve a has a different skill level now, so he's deviation is large. Can we use this? Are people that compete often rewarded byhigh deviation? Should we care how often a person competes?
- Points not ranks: all rating systems so far seem to use ranks to do their computations; I think this is a bad thinking. Coming in 1st place with only 1 point more is obviously less valuable than when you have something like 200 points advantage. No rating system seems to take this into account.
- User-friendly: the more complex the rating system the harder it is for the user to comprehend; I think it is desirable to have a rating system that is easy to understand.. saying 3000 points on TopCoder? means nothing for a normal person... saying 900 points out of 1000 on infoarena makes much more sense to a normal person.
Although we need to decide what exactly we want from our rating system, it seems that there is no standard thing out there, we shall have to make one.
Note: all studied rating systems have quadratic complexity regarding the number of participants
![[infoarena] development](/chrome/site/logo.png)