Tom Kerrigan's Home Page
|
|
Multiplayer Elo
The Elo rating system is designed for two-player games,
like chess.
It would be nice to have a similar rating system for games with more players.
Unfortunately, generalization of the Elo system to n-player (multiplayer) games has proven difficult.
Simple Multiplayer Elo (SME)
Generalization may not be necessary, though.
I've come up with a simple and effective way to apply the two-player Elo system to multiplayer scenarios:
-
At the end of a game, make a list of all the players and sort it by performance.
-
Think of each player as having played two matches: a loss vs. the player right above him on the list,
and a win vs. the player right below him.
-
Update each player's rating accordingly using the two-player Elo equations.
I call this method "Simple Multiplayer Elo" (SME) and am making it public domain.
If you decide to use SME, I would appreciate an e-mail!
Performance
To measure the performance of SME, I did the following experiment:
-
Imagine 10 players with "true" Elo ratings of 1100, 1200, ..., 2000.
-
Each player is assigned a random initial SME rating of 1500 ± 1.
-
Multiple rounds of an imaginary game are simulated:
-
The players are assigned random game scores according to their true ratings.
The scores have normal distributions with
μ = true rating,
σ² = 200.
-
The predictive ability of the SME ratings is calculated by enumerating the 45 possible pairings between the 10 players,
counting the number of times the higher-rated player beat the lower-rated player, and dividing by 45.
-
The predictive ability of the true ratings is calculated using the same method. This is the ideal predictive ability.
-
The SME ratings are updated according to the SME method, described above.
I ran this experiment millions of times and averaged the results:
After round...
|
SME predictive ability
|
Ideal predictive ability
|
| 50.0% (random) | 84.5% |
1 | 62.3% | 84.5% |
2 | 72.5% | 84.5% |
3 | 77.3% | 84.5% |
4 | 79.2% | 84.5% |
5 | 80.3% | 84.5% |
10 | 82.2% | 84.5% |
20 | 83.3% | 84.5% |
1000 | 83.9% | 84.5% |
You can see a graph of this curve here.
This experiment shows that SME only needs a small number of rounds to achieve near-ideal predictive ability,
and is thus a strong candidate for any application that requires a multiplayer rating system.
Compared to Microsoft's TrueSkill
A well-known multiplayer rating system is Microsoft's TrueSkill
(web site,
paper).
It's a much more complicated system that's based on Bayesian inference.
I ran the experiment described above using Jesse Buesking's implementation of TrueSkill
(GitHub):
Round
|
SME
|
TrueSkill
|
1 | 62.3% | 63.4% |
2 | 72.5% | 68.7% |
3 | 77.3% | 71.7% |
4 | 79.2% | 73.7% |
5 | 80.3% | 75.1% |
10 | 82.2% | 78.7% |
20 | 83.3% | 81.1% |
100 | 83.9% | 83.7% |
1000 | 83.9% | 84.4% |
SME approaches ideal accuracy much faster than TrueSkill.
Eventually TrueSkill produces ratings that are almost perfect (after hundreds of rounds)
but at that point the difference is marginal.
Files
|