Movatterモバイル変換


[0]ホーム

URL:


Jump to content
WikipediaThe Free Encyclopedia
Search

TrueSkill

From Wikipedia, the free encyclopedia
Rating system supporting games with more than 2 players

TrueSkill is a skill-based ranking system developed byMicrosoft for use with video game matchmaking on theXbox network. Unlike the popularElo rating system, which was initially designed forchess, TrueSkill is designed to support games with more than two players.[1][2][3] In 2018, Microsoft published details about an extended version of TrueSkill, named TrueSkill2.[4]

It is based on aThurstonian model with a Gaussian score distribution. It does not satisfyLuce's Choice Axiom.[5]

History and use

[edit]

TrueSkill was developed by researchers atMicrosoft Research as a Bayesian skill rating system designed for online games in which players often compete in teams and matches may involve more than two players or teams.[2] In their first published description of the system, Ralf Herbrich, Tom Minka and Thore Graepel evaluated the method using match data collected by Bungie Studios during the beta test ofHalo 2 and described its operation in theXbox Live service.[2]

Christopher Bishop later wrote that TrueSkill was deployed on Xbox Live in 2005 and has operated continuously since then, processing millions of game outcomes per day.[6] Herbrich and colleagues reported that, as of September 2005, Xbox Live had over 2 million subscribed users and that the Xbox 360 Live service used TrueSkill for automatic player rating and matchmaking, processing hundreds of thousands of games per day.[2]

In a survey of ranking systems for games and assessments, Educational Testing Service researchers described TrueSkill as a generalization of Elo-style ratings for multiplayer and team competitions and noted its use in Microsoft's Xbox online games.[7] Later research from Microsoft proposed revisions and extensions to the original model, including TrueSkill2.[4]

Statistical model

[edit]

In the model described by Herbrich, Minka and Graepel, each player's skill is treated as an unobserved (latent) quantity, and the system maintains a probability distribution over that skill.[2] The prior over player skills is assumed to factorise across players, with each individual skill modelled as aGaussian distribution with meanμi{\displaystyle \mu _{i}} and varianceσi2{\displaystyle \sigma _{i}^{2}} representing the system's uncertainty about that player's skill.[2]

Match outcomes are modelled as arising from noisy performances. For a given game, each playeri{\displaystyle i} generates a latent performance valuepi{\displaystyle p_{i}} that is normally distributed around their skillsi{\displaystyle s_{i}}, with a fixed performance variance parameterβ2{\displaystyle \beta ^{2}}:[2]siN(μi,σi2),piN(si,β2).{\displaystyle s_{i}\sim {\mathcal {N}}(\mu _{i},\sigma _{i}^{2}),\qquad p_{i}\sim {\mathcal {N}}(s_{i},\beta ^{2}).}The parameterβ{\displaystyle \beta } controls how strongly a single game's result is expected to vary around underlying skill (smallerβ{\displaystyle \beta } implies less randomness in outcomes).[2]

For games with teams, the model assumes an additive team-performance structure: the latent performancetj{\displaystyle t_{j}} of teamj{\displaystyle j} is the sum of the performances of its members,tj:=iAjpi{\displaystyle \textstyle t_{j}:=\sum _{i\in A_{j}}p_{i}}, whereAj{\displaystyle A_{j}} is the set of players assigned to that team.[2] A match result is represented as a ranking of the teams (with ties possible), and the likelihood of the observed ranking is defined in terms of inequalities between the corresponding team-performance variables.[2]

Draws are incorporated by introducing a draw marginϵ>0{\displaystyle \epsilon >0}. For two adjacent teams in the ranking, a win is modelled by requiring the higher-ranked team to exceed the lower-ranked team by more thanϵ{\displaystyle \epsilon }, while a draw is modelled by requiring their performance difference to lie within±ϵ{\displaystyle \pm \epsilon }.[2] Herbrich and colleagues relatedϵ{\displaystyle \epsilon } to an assumed or empirically estimated draw probability, allowing the draw rate to be calibrated for a given game mode.[2]

Inference and rating updates

[edit]

TrueSkill represents the joint model of skills, performances and observed outcomes as afactor graph.[2] Inference is then framed as the computation of approximate single-variable marginals by message passing using thesum-product algorithm.[2] For a single match, most of the messages in the graph can be represented as one-dimensional Gaussians, which makes the update computationally efficient.[2]

A key difficulty is that the outcome constraints (wins and draws) introduce non-Gaussian terms. In the factor graph, these appear as comparison factors that impose inequalities on performance differences (and, for draws, on an interval around zero).[2] The corresponding exact messages are non-Gaussian; the TrueSkill paper appliesexpectation propagation (EP) to approximate these messages by moment matching, replacing the intractable truncated-Gaussian forms with Gaussian approximations that have the same mean and variance.[2][8] Because the comparison messages are approximate, the algorithm iterates message updates along paths connecting the affected variables until the approximate marginals stabilise.[2]

After inference for a match, each player's updated marginal remains Gaussian and can be summarised by an updated meanμ{\displaystyle \mu } and standard deviationσ{\displaystyle \sigma } for use in later matches.[2] The original system also includes adynamics variance parameter (often writtenτ2{\displaystyle \tau ^{2}}) to model skill changes over time between games.[2]

Parameterisation and rating display

[edit]

TrueSkill maintains a Gaussian belief distributionN(μi,σi2){\displaystyle {\mathcal {N}}(\mu _{i},\sigma _{i}^{2})} for each player's skill, but practical deployments must also choose a numerical scale and values for model parameters such as the performance noise and the rate at which skills are allowed to drift over time.[2] Herbrich and colleagues reported that Xbox Live used an initial prior scale withμ0=25{\displaystyle \mu _{0}=25} andσ02=(25/3)2{\displaystyle \sigma _{0}^{2}=(25/3)^{2}}, corresponding to a prior in which negative skills are unlikely.[2]

In the same deployment description, the variance of a single-game performance was set relative to the prior uncertainty asβ2=(σ0/2)2{\displaystyle \beta ^{2}=(\sigma _{0}/2)^{2}}, and the between-gamedynamics variance (modelling skill changes over time) was set toτ2=(σ0/100)2{\displaystyle \tau ^{2}=(\sigma _{0}/100)^{2}}.[2]

For player-facing displays such as leaderboards, Xbox Live displayed a conservative estimate rather than the posterior mean alone. The TrueSkill paper described displaying a player's skill as the 1% lower quantile of the belief distribution,μi3σi{\displaystyle \mu _{i}-3\sigma _{i}}.[2] Herbrich and colleagues wrote that this choice was intended to ensure that top positions in leaderboards were held by players who were both highly rated and estimated with high certainty, and noted that the prior implied an initial displayed value of0=μ03σ0{\displaystyle 0=\mu _{0}-3\sigma _{0}}.[2]

Worked example

[edit]

The following worked example illustrates how TrueSkill updates ratings after a three-player free-for-all match, using the model and win-update equations described by Herbrich, Minka and Graepel.[2] In TrueSkill, each playeri{\displaystyle i} has a skill belief distributionN(μi,σi2){\displaystyle {\mathcal {N}}(\mu _{i},\sigma _{i}^{2})}, whereμ{\displaystyle \mu } is the mean skill estimate andσ{\displaystyle \sigma } is the uncertainty.

After a match, the system combines the prior belief with the match result to obtain a new belief distribution (theposterior). Theposterior mean is the mean of that updated distribution, i.e., the system's best single-number estimate of the player's skill given the evidence so far.[2]

In the original Xbox Live scale described in the TrueSkill paper, a common choice for the performance noise isβ=25/6{\displaystyle \beta =25/6}.[2] This example uses that value and assumes no draws.

Displayed rating

[edit]

In some deployments, a conservative skill estimate is displayed rather than the posterior mean. The TrueSkill paper describes displaying the 1% lower quantile of the belief distribution, which for aGaussian distribution is approximatelyμ3σ{\displaystyle \mu -3\sigma }.[2] In plain terms, this is a value that the system expects the player's true skill to exceed about 99% of the time, given the current uncertainty. As a result, a player with a high mean but large uncertainty (largeσ{\displaystyle \sigma }) will have a lower displayed rating than a similarly rated player whose skill is estimated more confidently.[2]

Setup

[edit]

Three players enter a match with the following prior ratings:

Playerμ{\displaystyle \mu } (before)σ{\displaystyle \sigma } (before)Displayed (before)μ3σ{\displaystyle \mu -3\sigma }
Alice30612
Ben2574
Chloe208−4

The game finishes with Chloe in first place, Alice in second place, and Ben in third place.

How a three-player result is represented

[edit]

For a strict ranking with no ties, the factor-graph formulation represents the result using adjacent comparison constraints (Chloe beats Alice; Alice beats Ben).[2] There is no separate "Chloe beats Ben" factor in this minimal representation because it is implied by the ranking: if Chloe is above Alice and Alice is above Ben, then Chloe is above Ben.

In the full TrueSkill algorithm, message passing in the factor graph uses both adjacent constraints together to update all three players in a single inference problem (and may iterate).[2] The step-by-step calculations below apply the standard two-player win update to each adjacent comparison in sequence to make the mechanics transparent; Chloe's advantage over Ben is still reflected through the chain, because Chloe's win pushes Alice down, and Alice's win pushes Ben down.

Win update equations (two players)

[edit]

For a winnerw{\displaystyle w} and loserl{\displaystyle l}, definec=σw2+σl2+2β2,t=μwμlc.{\displaystyle c={\sqrt {\sigma _{w}^{2}+\sigma _{l}^{2}+2\beta ^{2}}},\qquad t={\frac {\mu _{w}-\mu _{l}}{c}}.}

Letϕ{\displaystyle \phi } andΦ{\displaystyle \Phi } be thestandard normal probability density and cumulative distribution functions. For a win (no draw margin), definev(t)=ϕ(t)Φ(t),w(t)=v(t)(v(t)+t).{\displaystyle v(t)={\frac {\phi (t)}{\Phi (t)}},\qquad w(t)=v(t){\bigl (}v(t)+t{\bigr )}.}

The mean updates areμw=μw+σw2cv(t),μl=μlσl2cv(t),{\displaystyle \mu _{w}'=\mu _{w}+{\frac {\sigma _{w}^{2}}{c}}v(t),\qquad \mu _{l}'=\mu _{l}-{\frac {\sigma _{l}^{2}}{c}}v(t),}and the uncertainty decreases according toσw2=σw2(1σw2c2w(t)),σl2=σl2(1σl2c2w(t)).{\displaystyle \sigma _{w}'^{2}=\sigma _{w}^{2}\left(1-{\frac {\sigma _{w}^{2}}{c^{2}}}w(t)\right),\qquad \sigma _{l}'^{2}=\sigma _{l}^{2}\left(1-{\frac {\sigma _{l}^{2}}{c^{2}}}w(t)\right).}

Step 1: Chloe beats Alice

[edit]

Using Chloe(μ=20,σ=8){\displaystyle (\mu =20,\sigma =8)} as winner and Alice(μ=30,σ=6){\displaystyle (\mu =30,\sigma =6)} as loser, withβ=25/6{\displaystyle \beta =25/6}:

Applying the update gives:

Step 2: Alice beats Ben

[edit]

The second comparison uses Alice's updated rating from step 1, with Ben unchanged so far. Alice(μ=25.61,σ=5.33){\displaystyle (\mu =25.61,\sigma =5.33)} beats Ben(μ=25,σ=7){\displaystyle (\mu =25,\sigma =7)}:

Applying the update gives:

Ratings before and after the match

[edit]
Playerμ{\displaystyle \mu } (before)μ{\displaystyle \mu } (after)σ{\displaystyle \sigma } (before)σ{\displaystyle \sigma } (after)Displayed (before)μ3σ{\displaystyle \mu -3\sigma }Displayed (after)μ3σ{\displaystyle \mu -3\sigma }
Alice3027.6664.891212.97
Ben2521.4875.9743.56
Chloe2027.8086.34−48.79

In this example, Chloe's first-place finish requires her performance to exceed Alice's, and Alice's to exceed Ben's, so the inferred skill distributions shift accordingly. Chloe's large increase inμ{\displaystyle \mu } reflects that her win over a substantially higher-rated opponent was unexpected under the prior ratings, while the decreases inσ{\displaystyle \sigma } reflect that the match provides additional information about the players' relative skills.

TrueSkill2

[edit]

In 2018, Microsoft researchers publishedTrueSkill2 as an extension of the original TrueSkill model that incorporates additional information available in some online games, beyond the final win–loss ordering.[4] The paper describes using signals such as player experience, squad membership, individual statistics (for example kill and death counts), quitting behaviour, and performance in other game modes, with the aim of improving the accuracy of inferred skills for matchmaking.[4]

The authors described TrueSkill2 as retaining the same interface as classic TrueSkill, a single-number skill rating intended to remain compatible with existing matchmaking systems, while changing how skills are inferred from historical data.[4] They described automatic parameter estimation from batches of historical matches, and two operating modes: an online mode that propagates skills forward in time and a batch mode that infers parameters and skills over all times (referred to asTrueSkill Through Time).[4]

Compared with classic TrueSkill, the paper lists model changes including: incorporating individual player statistics alongside team win/loss; treating mid-game quitting as a surrender for rating purposes; borrowing information across game modes by modelling skills as correlated between modes; and modelling skill evolution as biased towards improvement, especially in a player's early matches in a mode.[4] It adds an explicit squad effect, modelling players who queue together as performing better than they would individually.[4]

On match data fromHalo 5, the authors reported that TrueSkill2 improved prediction of historical match outcomes, giving 68% accuracy on their evaluation compared with 52% for the original TrueSkill system.[4]

Related models and extensions

[edit]

TrueSkill is part of a broader family of statistical rating systems used to infer relative strength from match outcomes. TheElo rating system, originally developed for chess, models win probabilities from rating differences and updates ratings after games based on the difference between observed and expected results.[9]

Glickman later proposed dynamic paired-comparison models that incorporate uncertainty in a player's estimated strength and allow that uncertainty to change over time, motivated in part by limitations of Elo-style updates for large competitive populations.[10] In a survey of ranking systems used in games and assessments, ETS researchers described TrueSkill as an Elo-style generalisation for multiplayer and team competitions that maintains uncertainty estimates alongside point ratings.[7]

Some related probabilistic models instead treat rankings as samples from a distribution that satisfiesLuce's choice axiom. Guiver and Snelson note that a Thurstonian model with independent score variables induces a Plackett–Luce distribution if and only if the scores follow a Gumbel distribution, and they contrast this with TrueSkill, which uses Gaussian score variables.[5] They state that TrueSkill's Gaussian-score model does not satisfy Luce's choice axiom, while noting its use in large-scale online rating systems.[5]

Several research extensions build on TrueSkill's probabilistic formulation. Dangauthier, Herbrich, Minka and Graepel proposedTrueSkill Through Time, which replaces the original filtering-style updates with smoothing over a time series of player skills, allowing retrospective estimation of skill trajectories while retaining uncertainty estimates and an explicit draw model.[11]

Patents and trademarks

[edit]

TrueSkill is patented,[12] and the name is trademarked, so it is limited to Microsoft projects and commercial projects that obtain a license to use the algorithm.[13] The patent is scheduled to expire on April 9th, 2029.[14]

See also

[edit]

References

[edit]
  1. ^Murphy, Kevin (2012).Machine Learning: A Probabilistic Perspective. MIT Press.ISBN 978-0262018029.
  2. ^abcdefghijklmnopqrstuvwxyzaaabacadaeafHerbrich, Ralf; Minka, Tom; Graepel, Thore (2006)."TrueSkill™: A Bayesian Skill Rating System"(PDF).Advances in Neural Information Processing Systems 19 (NIPS 2006). MIT Press. pp. 569–576. Retrieved11 January 2026.{{cite conference}}: CS1 maint: url-status (link)
  3. ^"TrueSkill™ Ranking System".Microsoft Research. Retrieved11 January 2026.
  4. ^abcdefghiMinka, Tom; Cleven, Ryan; Zaykov, Yordan (22 March 2018).TrueSkill 2: An improved Bayesian skill rating system(PDF) (Technical report). Microsoft Research. MSR-TR-2018-8. Retrieved11 January 2026.{{cite tech report}}: CS1 maint: url-status (link)
  5. ^abcGuiver, John; Snelson, Edward (2009)."Bayesian inference for Plackett–Luce ranking models"(PDF).Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09). Association for Computing Machinery. pp. 377–384.doi:10.1145/1553374.1553423. Retrieved11 January 2026.{{cite conference}}: CS1 maint: url-status (link)
  6. ^Bishop, Christopher M. (13 February 2013)."Model-based machine learning".Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.371 (1984).doi:10.1098/rsta.2012.0222.PMC 3538442.PMID 23277612. Retrieved11 January 2026.
  7. ^abRotou, Ourania; Qian, Xiaoyu; von Davier, Matthias (July 2015).Ranking Systems Used in Gaming Assessments and/or Competitive Games(PDF) (Report). ETS Research Memorandum. Educational Testing Service. Retrieved11 January 2026.
  8. ^Minka, Thomas P. (2001)."Expectation propagation for approximate Bayesian inference"(PDF).Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI 2001). Morgan Kaufmann. pp. 362–369. Retrieved11 January 2026.{{cite conference}}: CS1 maint: url-status (link)
  9. ^Elo, Arpad E. (1978).The Rating of Chessplayers, Past and Present (2nd ed.). Arco Pub.ISBN 9780668047210. Retrieved11 January 2026.
  10. ^Glickman, Mark E. (1999)."Parameter estimation in large dynamic paired comparison experiments"(PDF).Journal of the Royal Statistical Society: Series C (Applied Statistics).48 (3):377–394.doi:10.1111/1467-9876.00159. Retrieved11 January 2026.
  11. ^Dangauthier, Pierre; Herbrich, Ralf; Minka, Tom; Graepel, Thore (2007)."TrueSkill Through Time: Revisiting the History of Chess"(PDF).Advances in Neural Information Processing Systems 20 (NIPS 2007). Retrieved11 January 2026.{{cite conference}}: CS1 maint: url-status (link)
  12. ^"US8538910B2 – Determining relative skills of players".Google Patents. Retrieved11 January 2026.
  13. ^"Copyright notice".Microsoft Learn. 28 April 2025. Retrieved11 January 2026.
  14. ^"US20090227313A1 – Determining relative skills of players".Google Patents. Retrieved11 January 2026.

External links

[edit]
Concepts
Methods and computer models
Elo family
Polls and opinion
By sport
People
Main
projects
Languages, compilers
Distributedgrid computing
Internet,networking
Other projects
Operating systems
APIs
Launched as products
MSR Labs
applied
research
Live Labs
Current
Discontinued
FUSE Labs
Other labs
Retrieved from "https://en.wikipedia.org/w/index.php?title=TrueSkill&oldid=1337743665"
Categories:
Hidden categories:

[8]ページ先頭

©2009-2026 Movatter.jp