
Multiplayer Elo
The Elo rating system is designed for twoplayer games,
like chess.
It would be nice to have a similar rating system for games with more players.
Unfortunately, generalization of the Elo system to nplayer (multiplayer) games has proven difficult.
Simple Multiplayer Elo (SME)
Generalization may not be necessary, though.
I've come up with a simple and effective way to apply the twoplayer Elo system to multiplayer scenarios:

At the end of a game, make a list of all the players and sort it by performance.

Think of each player as having played two matches: a loss vs. the player right above him on the list,
and a win vs. the player right below him.

Update each player's rating accordingly using the twoplayer Elo equations.
I call this method "Simple Multiplayer Elo" (SME) and am making it public domain.
If you decide to use SME, I would appreciate an email!
Performance
To measure the performance of SME, I did the following experiment:

Imagine 10 players with "true" Elo ratings of 1100, 1200, ..., 2000.

Each player is assigned a random initial SME rating of 1500 ± 1.

Multiple rounds of an imaginary game are simulated:

The players are assigned random game scores according to their true ratings.
The scores have normal distributions with
μ = true rating,
σ² = 200.

The predictive ability of the SME ratings is calculated by enumerating the 45 possible pairings between the 10 players,
counting the number of times the higherrated player beat the lowerrated player, and dividing by 45.

The predictive ability of the true ratings is calculated using the same method. This is the ideal predictive ability.

The SME ratings are updated according to the SME method, described above.
I ran this experiment millions of times and averaged the results:
After round...

SME predictive ability

Ideal predictive ability

 50.0% (random)  84.5% 
1  62.3%  84.5% 
2  72.5%  84.5% 
3  77.3%  84.5% 
4  79.2%  84.5% 
5  80.3%  84.5% 
10  82.2%  84.5% 
20  83.3%  84.5% 
1000  83.9%  84.5% 
You can see a graph of this curve here.
I think this experiment shows that SME only needs a small number of rounds to achieve nearideal predictive ability,
and is thus a strong candidate for any application that requires a multiplayer rating system.
Compared to Microsoft's TrueSkill
A wellknown multiplayer rating system is Microsoft's TrueSkill
(web site,
paper).
It's a much more complicated system that's based on Bayesian inference.
I ran the experiment described above using Jesse Buesking's implementation of TrueSkill
(GitHub):
Round

SME

TrueSkill

1  62.3%  63.4% 
2  72.5%  68.7% 
3  77.3%  71.7% 
4  79.2%  73.7% 
5  80.3%  75.1% 
10  82.2%  78.7% 
20  83.3%  81.1% 
100  83.9%  83.7% 
1000  83.9%  84.4% 
SME approaches ideal accuracy much faster than TrueSkill.
Eventually TrueSkill produces ratings that are almost perfect (after hundreds of rounds)
but at that point the difference is marginal.
Files
