Multiplayer Elo - Tom Kerrigan's Home Page

Tom Kerrigan's Home Page

Multiplayer Elo

Multiplayer Elo

The Elo rating system is designed for two-player games, like chess.

It would be nice to have a similar rating system for games with more players. Unfortunately, generalization of the Elo system to n-player (multiplayer) games has proven difficult.

Simple Multiplayer Elo (SME)

Generalization may not be necessary, though. I've come up with a simple and effective way to apply the two-player Elo system to multiplayer scenarios:

At the end of a game, make a list of all the players and sort it by performance.
Think of each player as having played two matches: a loss vs. the player right above him on the list, and a win vs. the player right below him.
Update each player's rating accordingly using the two-player Elo equations.

I call this method "Simple Multiplayer Elo" (SME) and am making it public domain.

If you decide to use SME, I would appreciate an e-mail!

Performance

To measure the performance of SME, I did the following experiment:

Imagine 10 players with "true" Elo ratings of 1100, 1200, ..., 2000.
Each player is assigned a random initial SME rating of 1500 ± 1.
Multiple rounds of an imaginary game are simulated:
- The players are assigned random game scores according to their true ratings. The scores have normal distributions with μ = true rating, σ² = 200.
- The predictive ability of the SME ratings is calculated by enumerating the 45 possible pairings between the 10 players, counting the number of times the higher-rated player beat the lower-rated player, and dividing by 45.
- The predictive ability of the true ratings is calculated using the same method. This is the ideal predictive ability.
- The SME ratings are updated according to the SME method, described above.

I ran this experiment millions of times and averaged the results:

After round...	SME predictive ability	Ideal predictive ability
	50.0% (random)	84.5%
1	62.3%	84.5%
2	72.5%	84.5%
3	77.3%	84.5%
4	79.2%	84.5%
5	80.3%	84.5%
10	82.2%	84.5%
20	83.3%	84.5%
1000	83.9%	84.5%

You can see a graph of this curve here.

This experiment shows that SME only needs a small number of rounds to achieve near-ideal predictive ability, and is thus a strong candidate for any application that requires a multiplayer rating system.

Compared to Microsoft's TrueSkill

A well-known multiplayer rating system is Microsoft's TrueSkill (web site, paper). It's a much more complicated system that's based on Bayesian inference.

I ran the experiment described above using Jesse Buesking's implementation of TrueSkill (GitHub):

Round	SME	TrueSkill
1	62.3%	63.4%
2	72.5%	68.7%
3	77.3%	71.7%
4	79.2%	73.7%
5	80.3%	75.1%
10	82.2%	78.7%
20	83.3%	81.1%
100	83.9%	83.7%
1000	83.9%	84.4%

SME approaches ideal accuracy much faster than TrueSkill. Eventually TrueSkill produces ratings that are almost perfect (after hundreds of rounds) but at that point the difference is marginal.

Files

sme_graph.png

Graph of SME predictive ability