key: cord-0203110-1y7f67xx
authors: Whelan, John T.; Klein, Jacob E.
title: Bradley-Terry Modeling with Multiple Game Outcomes with Applications to College Hockey
date: 2021-12-02
journal: nan
DOI: nan
sha: 667a03e28ef6ce5ef6255379472865c37fa2866c
doc_id: 203110
cord_uid: 1y7f67xx

The Bradley-Terry model has previously been used in both Bayesian and frequentist interpretations to evaluate the strengths of sports teams based on win-loss game results. It has also been extended to handle additional possible results such as ties. We implement a generalization which includes multiple possible outcomes such as wins or losses in regulation, overtime, or shootouts. A natural application is to ice hockey competitions such as international matches, European professional leagues, and NCAA hockey, all of which use a zero-sum point system which values overtime and shootout wins as 1/3 of a win, and overtime and shootout losses as 1/3 of a win. We incorporate this into the probability model, and evaluate the posterior distributions for the associated strength parameters using techniques such as Gaussian expansion about maximum a posteriori estimates, and Hamiltonian Monte Carlo.

The Bradley-Terry model [3, 18] has long been used for evaluating paired comparisons, such as games between pairs of teams in which one team or the other wins each game. The model assigns a strength parameter to each team, and the odds ratio associated with the probability of a team winning a game is equal to the ratio of the strengths. These strength parameters can be estimated based on the full set of game results and used to rank teams or make future predictions. The model has been extended by Davidson [7] to contests in which a tie or drawn contest is a possible outcome. For many years, such ties were a common occurrence in the sport of ice hockey, but recently tie-breaking methods such as an overtime period played under different rules and/or a shootout in which the teams alternate penalty shots are used to determine a winner. Results in overtime or shootouts can be evaluated differently from wins in regulation play. For instance, since 2006 [8] competitions organized by International Ice Hockey Federation (IIHF) have awarded three points in the standings to a team winning in regulation, two points for a win in overtime or a shootout, one point for a loss in overtime or a shootout, and no points for a loss in regulation, and many leagues have followed suit. Compared to the prior system which awarded two points for a win, one for a tie, and none for a loss, which effectively treated a tie as half a win and half a loss, the four-outcome system treats an overtime/shootout win as 2/3 of a win and 1/3 of a loss. 1 One possible approach to either of these situations (games with three or four outcomes) is to use standard Bradley-Terry and assign fractional wins as appropriate to the point system (see, e.g., [17] ). However, this is unsatisfying, as it provides no way to assign a probability for a future game to end in a tie or overtime or shootout result. In this paper, we instead consider a generalization of the tie model of [7] which associates one strength parameter to each team, along with a single parameter describing the tendency for games to go into overtime.

The rest of this paper is organized as follows: In Section 2, we describe the three models (standard Bradley-Terry, Bradley-Terry-Davidson including ties, and a new model with four possible game outcomes including overtime/shootout wins and losses), and exhibit a generalization of the relevant formulas which describes all three cases. In Section 3 we describe methods for inferring the relevant parameters of these models given a set of game results: maximum likelihood estimation, and Bayesian inference using either a Gaussian approximation or Hamiltonian Monte Carlo. In Section 4 we demonstrate these methods using a recent set of game results: the 2020-2021 Eastern College Athletic Conference (ECAC) season. This season used the standard IIHF system with 3-2-1-0 points assigned for regulation wins, overtime/shootout wins, overtime/shootout losses, and regulation losses, respectively. For the purposes of illustration, we evaluate the ECAC results with the four-outcome model as well as with the other two models, treating in one case all wins the same, and in the other all overtime/shootout results as ties.

In the standard Bradley-Terry model [3, 18] each team has a strength π i ∈ (0, ∞), and the modelled probability that team i will win a game with team j is θ W ij = π i π i + π j (2.1) 1 We do not consider non-zero-sum point systems such as that used in association football (soccer) which awards three points for a win and one for a draw, so that drawn matches are only worth two points total rather than three. Likewise, the National Hockey League awards all wins two points and overtime/shootout losses one point; this 2-2-1-0 system awards three total points for games which go into overtime, but only two for games decided in regulation.

so that the probability of a set of game outcomes D in which team i plays team j n ij times and wins n W ij of those games is 2

where t is the number of teams, [7] proposed an extension for competitions which include the probabilities of ties, in which the probabilities of the three possible outcomes of a game are

where ν ∈ [0, ∞) is an additional parameter which describes how likely ties are to occur. (The probability of a tie in a game between evenly matched teams is ν 2+ν .)

The probability of a given set of game outcomes in which the n ij = n W ij + n T ij + n L ij games between teams i and j result in n W ij wins, n T ij ties and n L ij losses for team i (where n T ij = n T ji and n L ij = n W ji ) is

We propose an extension appropriate for a system in which a win in overtime or a shootout is treated as 2/3 of a win and 1/3 of a loss, Writing the four possible game outcomes as RW for regulation win, OW for overtime/shootout win, OL for overtime/shootout loss, and RL for regulation loss, the modelled probability of 2 The first form explicitly includes each pair of teams only once, while the second corrects for the double-counting, taking advantage of the fact that n W ii = 0 = n L ii . If the order of the games between pairs of teams is ignored, the sampling distribution for the {n W ij } is instead

(2.5d)

The probability for a set of game outcomes will then be

If we write λ i = ln π i ∈ (−∞, ∞) and τ = ln ν ∈ (−∞, ∞), we can describe all three models as special cases of a general model in which the probability of a game between teams i and j ending in outcome I is

is a vector equivalent of the logistic function known as the softmax function. [4] The probability for a set of game outcomes is

Specifically,

• For the standard Bradley-Terry model, p W = 1, p L = 0, and o W = o L = 0.

• For the Bradley-Terry-Davidson model with ties, p W = 1, p T = 1 2 , p L = 0, o W = o L = 0, and o T = 1. 3 The exponents are chosen to correspond to the share of the points (2/3 and 1/3, respectively) awarded for an overtime/shootout win or loss. This has the desirable feature that the maximum likelihood equation (3.7) becomes (after multiplying by 3)

i.e., that the expected number of points for each team equals the actual number. See also the discussion in Section 5 about possible alternative models, including extended models in which the exponents are not fixed, but inferred from the data.

• For the model introduced in this paper, p RW = 1, p OW = 2 3 , p OL = 1 3 , p RL = 0, o RW = o RL = 0, and o OW = o OL = 1. All of these models satisfy I θ I ij = 1, and have "opposite" outcomes I and −I such that θ −I ij = θ I ji , p −I = 1 − p I , and o −I = o I . They also satisfy 0 ≤ p I ≤ 1 and o I ∈ {0, 1}. We confine ourselves below to cases where these properties hold.

Maximum likelihood estimates (MLEs) of Bradley-Terry strength parameters [18, 9, 7] provide a straightforward way of associating a "rating" to each team based on their game results, and have been proposed as a replacement for less reliable ways of evaluating a team's game results in light of the difficulty of their schedule. [5] We can consider the probability P (D|{π i }, ν) = P (D|{λ i }, τ ) as a likelihood function of the parameters {λ i } and τ , with log-likelihood

We can use the identity

and

Using these, we can write the maximum likelihood equations as

and

can be interpreted in the models considered as the number of games which are tied or go to overtime, respectively, and

can be seen as the total number of "points" for team i. The maximum likelihood equation set each of these quantities equal to their expectation values. We can solve the maximum likelihood equations by a generalization of the iterative method in [9] . writing them

(where we have used the fact that the only non-zero term in the numerator has o I = 1) and

As in the standard Bradley-Terry model, the overall multiplicative scale of π k is undefined (because θ I ij can be written so that the team strengths appear only in the combination π j /π i ), so it is necessary to rescale the team strengths at each iteration to preserve a property such as t i=1 π i = 1. Beyond that, there are conditions for the maximum likelihood estimates to be finite and well-defined, which are explored in e.g., [2, 14, 6] .

It is useful to move beyond maximum likelihood estimates, both to quantify uncertainty in the model parameters, and to make predictions about the outcome of future games. (For instance, [17] proposed simulating future games with probabilites drawn from a posterior distribution capturing the uncertainty in the strength parameters, rather than fixed probabilties generated from the MLEs of those parameters.)

A convenient framework for parameter estimates including uncertainties is Bayesian inference, which defines the posterior probability density for the parameters {π i } and ν, or equivalently {λ i } and τ , given a set of game results D and prior assumptions I, as

A variety of choices can be made for the multivariate prior distribution on {λ i } [16] in the Bradley-Terry model, and likewise for the tie/overtime parameter τ .

For simplicity, we work in this paper with the improper Haldane prior 4

which means that the posterior distribution is proportional to the likelihood:

With this choice of prior, the posterior probability density will be independent of the combination t i=1 λ i , but otherwise will be normalizable under the same circumstances that lead to well-defined maximum likelihood estimates for the parameters.

One convenient approach is to Taylor expand the log-posterior ln f ({λ i }, τ |D, I) about the maximum a posteriori solution (which in this case is the maximum likelihood solution { λ i }, τ ). 5 Truncating the expansion at second order gives a Gaussian approximation

where H is the (t + 1) × (t + 1) Hessian matrix

(3.17c) 4 So named because the marginal prior distribution for probabilities such as θ ij will follow the Haldane prior [10, 11] , which is the limit of a Beta(α, β) distribution as α, β → 0. 5 Note that this method does not assign special significance to the MAP estimates, but uses them as the starting point for a convenient approximation to the posterior probability distribution. 6 Note the similarity to the Fisher information matrix

, which differs from the Hessian in that that H ij depends on the observed data, while I ij is a function defined on parameter space.

To compute the elements of the Hessian matrix, we return to the first derivative (3.4) and differentiate them to get

is the probability of a tie or overtime game, depending on the model, and we have used the fact that o 2

Similarly, using the properties

and, finally, differentiating (3.5) gives us

so that the Hessian matrix has components

Note that in the case of the Bradley-Terry model, where the only outcomes are win and loss, the condition o I = 0 simplifies the Hessian to H τ τ = H τ k = 0 (since the τ parameter is not actually part of the likelihood), and

which is the form seen in, e.g., [17] . The Hessian matrix in (3.22) is singular, since t =1 H τ = 0 and t =1 H k = 0, which ultimately arise from the fact that the probabilities {θ I ij }, and thus the likelihood, are unchanged by adding the same constant to all the {λ i }. This can be handled computationally by constructing a variance-covariance matrix Σ = H + which is the Moore-Penrose pseudo-inverse [12] 7 of the Hessian matrix, and approximating the posterior as a multivariate Gaussian with a mean of { λ i }, τ and a variance-covariance matrix Σ. This has the effect of enforcing the constraint

on the combination of the parameters which has no influence on the model. This Gaussian approximation can be used to produce analytic estimates of quantities of interest, or used for Monte Carlo sampling, as illustrated in Section 4. It can also be used as a starting point for importance sampling of the sort discussed in [17] . For the present work, we consider a different Monte Carlo method for sampling from the exact posterior.

Markov-chain Monte Carlo methods provide a convenient way to draw samples from a posterior distribution. We demonstrate in this paper how to draw posterior samples for the Bradley-Terry extensions considered, using Hamiltonian Monte Carlo as implemented in the Stan library. [15] There are a few technical considerations. Because the posterior on {λ i } and τ is improper, trying to draw from it directly will lead to chains which never converge. Any probabilities constructed from the samples will be well-behaved, since only the meaningless degree of freedom t i=1 λ i is unconstrained, but these apparent errors make it more difficult to detect other potential problems. It is thus useful instead to consider only variables γ ij = λ i − λ j (and τ ) which contribute to the probability model via (see (2.7))

Of course, the full set of t(t−1) 2 values γ ij are not independent. Instead, they are determined by the t − 1 parameters

In Appendix A we show the code of the Stan model used to perform Hamiltonian Monte Carlo simulations of all three models.

We now illustrate the application of the models described in this paper using game results from a competition which used the 3-2-1-0 point system: the 2020-2021 Eastern College Athletic Conference (ECAC) season. While the league ordinarily plays a balanced round-robin schedule in which each team plays each other team the same number of times, the season in question ended up being unbalanced due to cancellations of games arising from the COVID-19 pandemic. In Table 1 we show the results for the ECAC season, in the form of n RW ij and n OW ij for each team against each opponent, along with the total number of results of each type for each team, n I i = t j=1 n I ij .

7 For a real symmetric matrix with a complete eigenvalue decomposition, this operation replaces each non-zero eigenvalue with its reciprocal while leaving zero eigenvalues unchanged. for each team against each opponent. From these we can derive the total number of results of each type (RW, OW, OL and RL) for each team, which are used, for example, to generate the standings in the 3-2-1-0 point system. 

Note that the information included in Σ ij is also influenced by the constraint t i=1 λ i = 0, so for example the anti-correlation of the different log-strengths is somewhat artificial. We also show the maximum-likelihood estimates { θ W ij } for the head-to-head win probabilities between pairs of teams. 

As a first demonstration, we consider the standard Bradley-Terry model applied to the ECAC results with regulation and overtime/shootout wins being counted as simply "wins" and regulation and overtime/shootout losses being counted as "losses". I.e., we define n W ij = n RW ij + n OW ij and n L ij = n RL ij + n OL ij . The resulting maximum-likelihood solutions { λ i } and associated probabilities { θ W ij } are shown in Table 2 , along with the uncertainties and correlations encoded in the variance-covariance matrix {Σ ij } of the Gaussian approximation to the posterior distribution.

Since the log-strengths {λ i } have an arbitrary additive scale, a more meaningful understanding of the posterior distributions is obtained by considering the marginal distribution of the difference of a pair of team strengths γ ij = λ i − λ j . In Figure 1 , we illustrate the maximum likelihood estimate and posterior distribution of this quantity for two of the six pairs of teams: Quinnipiac-Colgate Figure 1 . Note that, due to the transformation of the probability density, the maximum of the probability density in this parameter is not the maximum likelihood value as it was in Figure 1 .

and Quinnipiac-Clarkson. We show the posterior in Gaussian approximation (for which the marginal posterior on γ ij is also a Gaussian), in a Monte Carlo drawn from the approximate multivariate Gaussian distribution, and in posterior samples from the exact posterior generated using Hamiltonian Monte Carlo with the Stan library. [15] We can transform the posterior on a difference γ ij in log-strength into a posterior on the corresponding probability θ W ij = logistic(γ ij ); this is shown in Figure 2 for the two sets of posterior samples. In all cases, the exact marginal posterior, as estimated by the Hamiltonian Monte Carlo is only slightly different from Table 3 . The maximum likelihood estimates for the Bradley-Terry-Davidson model applied to the 2020-2021 ECAC results, with all overtime games counted as ties. The maximum likelihood estimates { λ i } and τ of the log-strengths and log tie parameter are used to compute the estimated probability θ W ij for a win and θ T ij for a tie between each pair of teams. Note that the estimated probability of a game between evenly-matched teams to end in a tie is e τ 2+e τ = 0.39, and it is lower the more different the two teams' strengths are. Table 4 . The parameters of the the Gaussian approximation to the posterior distribution for the Bradley-Terry-Davidson model applied to the 2020-2021 ECAC results, with all overtime games counted as ties. In addition to the log-strength parameters considered for the Bradley-Terry model in Table 2 , there are uncertainties and correlations associated with the log-tie parameter τ . the Gaussian approximation. This is similar to results found using importance sampling in [17] .

Moving on to the Bradley-Terry-Davidson model with ties, we now consider inference of the log-strength parameters {λ i } along with the log-tie parameter τ . We illustrate the methods by reanalyzing the 2020-2021 ECAC results, with all overtime games treated as ties, so that now n W ij = n RW ij , n T ij = n OW ij + n OL ij , and n L ij = n RL ij . The maximum likelihood solutions { λ i } and τ are shown in Table 3 , along with the associated probabilities { θ W ij } for a win and { θ T ij } for a tie in contests between pairs of teams. In Table 4 , we show the maxumum-likelihood estimates along with the uncertainties in and correlations among the log-strengths {λ i } and the log-tie parameter τ , which are encoded in the variance-covariance matrix {Σ ij , Σ iτ , Σ τ τ } of the Gaussian approximation to the posterior distribution. As with the standard Bradley-Terry model, we can show the marginal posterior distributions on the differences {γ ij = λ i − λ j } between pairs of log-strength parameters, and we do this in Figure 3 for the same pairs of teams as before. Once again, samples drawn from the multivariate Gaussian approximation capture the shape of that distribution well, and samples drawn from the exact posterior using Hamiltonian Monte Carlo are slightly different but similar.

We cannot convert γ ij directly into a probability, however, since probabilities depend on the log-tie parameter τ as well. In Figure 4 we plot the marginal posterior on τ . The parameter τ can be transformed into a probability ν 2+ν where (ν = e τ ) for a game between evenly-matched teams to be tied, and we plot the posterior for this as well. Finally, in Figure 5 we illustrate the joint marginal posterior in γ ij and τ for our selected pairs of teams.

To illustrate the posterior on the probabilities {θ I ij |I = W, T, L} for a pair of teams, we note that the constraint I θ I ij = 1 means that the space is actually two dimensional. The natural visualization for the behavior of three quantities which sum to one is a ternary plot, and we contour plot density estimates of the posterior and its Gaussian approximation in Figure 6 , along with the maximum likelihood estimates { θ I ij }.

Having developed the mechanisms to characterize the posterior distribution for the Bradley-Terry-Davidson model with three outcomes (win, tie, and loss), we apply similar analogues for the model with four outcomes: regulation win (RW), overtime/shootout win (OW), overtime/shootout loss (OL), and regulation loss (RL), now applied to the full 2020-2021 ECAC results shown in Table 1 . As before, there is a log-strength parameter λ i for each team, and τ is now the log of a parameter associated with overtime results. We show the maximum likelihood estimates in Table 5 along with the probabilities { θ RW ij } for a regulation win { θ OW ij } for an overtime/shootout win in contests between pairs of teams. In Table 6 we show the parameters of the Gaussian approximation to the posterior.

As in Figure 6 . Ternary plots illustrating the joint posterior on θ W ij , θ T ij , and θ L ij , based on the Bradley-Terry-Davidson model applied to the 2020-2021 ECAC results, with all overtime games counted as ties. The horizontal gridlines correspond to lines of constant θ T ij , with θ T ij = 1 labelled as "Tie"; the diagonal gridlines correspond to lines of constant θ W ij or θ L ij , with θ W ij = 1 labelled with the abbreviation for team i ("Qn" for Quinnipiac in both cases) and θ L ij = 1 labelled with the abbreviation for team j ("Cg" for Colgate and "Ck" for Clarkson). The red triangle is the maximum likelihood point θ I ij . Note that for a given set of game results, the maximum likelihood point for all pairs of teams will lie along a one-dimensional curve in the Ternary plot. since, for a fixed τ , the maximum-likelihood probabilities are functions of the single value γ ij . The three sets of contours are as defined in Figure 5 . Note that the MLE is no longer the maximum of the posterior probability density after tranforming parameters from γ ij , τ to θ W ij , θ T ij , and Table 5 . The maximum likelihood estimates for a Bradley-Terry-like model with four game outcomes applied to the 2020-2021 ECAC results. The maximum likelihood estimates { λ i } and τ of the log-strengths and log overtime parameter are used to compute the estimated probability θ RW ij for a regulation win and θ OW ij for an overtime/shootout win between each pair of teams. Note that the estimated probability of a game between evenly-matched teams to got to overtime is e τ 1+e τ = 0.38, and it is lower the more different the two teams' strengths are.. the log-overtime parameter τ or equivalently the probability ν 1+ν where (ν = e τ ) for a game between evenly-matched teams to go to overtime (Figure 8) , and the joint marginal posterior in γ ij and τ for our selected pairs of teams (Figure 9 ). Table 6 . The parameters of the the Gaussian approximation to the posterior distribution for the Bradley-Terry-like model with four game outcomes applied to the 2020-2021 ECAC results. In addition to the log-strength parameters {λ i }, there are uncertainties and correlations associated with the log-overtime parameter τ .. Curves are as defined in Figure 1 ..

The posterior distribution on the probabilities {θ I ij |I = RW, OW, OL, RW} is more difficult to visualize, because we have four probabilities which sum to 1, so the posterior can be thought of as defined on the interior of a tetrahedron, which is an example of an Aitchison simplex [1] . However, since all four probabilities are determined by two parameters γ ij and τ , they must lie on a (curved) twodimensional subsurface of the simplex, defined by the constraint Figure 10 we illustrate one possibility for a two-dimensional plot of the marginal posterior on θ I ij , by plotting posterior density contours in θ W ij = θ RW ij + θ OW ij (the probability of any sort of a win) and θ O ij = θ OW ij + θ OL ij (the probability of an overtime result). This has the conceptual advantage that each side of the square corresponds to an edge of the tetrahedrom, and each vertex of the square corresponds to a vertex of the tetrahedron, at which θ I ij = 1 for some result I. However, the conversion of a point θ W ij , θ O ij into θ I ij is nontrivial and cannot be written in closed form, so further investigation of methods of presenting the posterior is called for. Figure 9 . Samples from the joint posterior probability density of the log-strength differences γ ij = λ i − λ j shown in Figure 7 and the log-overtime parameter shown in the left panel of Figure 8 . Contours are as defined in Figure 5 ..

We have defined a generalization of Davidson's extension to the Bradley-Terry outcome that handles the set of game outcomes currently distinguished in ice hockey: regulation wins, overtime/shootout wins, overtime/shootout losses, and regulation losses. We've explicitly computed maximum likelihood estimates, constructed a Gaussian approximation to the likelihood, and drawn posterior samples directly 

transformed from the joint probability on γ ij and τ shown in Figure 9 . Each point on this plot can be converted into a set of probabilitues {θ I ij |I} using the relations , when applied to the same set of results (albeit with overtime/shootout results interpreted differently). However, these computations are not meant to determine a "best" model, but to illustrate the capabilities of the algorithm. (By definition, we consider the appropriate model to be the one that corresponds to how the league actually assigns values to the results of games in the standings.) We now wish to discuss some limitations of the work to date, and possible approaches to address them: the use of an improper non-informative Haldane prior, the choice of the parameters {p I } in the probability model, and the application of the model to predict the outcomes of playoff games, which may not be played under the same conditions with overtime and shootouts.

First, for simplicity, we worked with a non-informative Haldane prior which was uniform in the log-parameters {λ i } and τ , so that the posterior probability distribution in those variables was proportional to the likelihood. There are a number of options for normalizable priors on the distribution of log-strengths {λ i } in the Bradley-Terry model (see [16] for a discussion), of which two promising options are a Gaussian prior or a generalized logistic prior [13, 17] , each of which has a hyperprior which can be fixed to previous seasons' data or estimated in a hierarchical model as in [13] . Similar options suggest themselves for the prior on the log-overtime parameter τ , although the situation is somewhat different in that τ has a meaningful origin, so one has to consider a possible location parameter. In particular, it's not clear whether the most natural "origin" for ν = e τ is 1, 2, or something else.

Second, we made something of an arbitrary choice by setting p OW = 2 3 and p OL = 1 3 . In the Bradley-Terry-Davidson model with ties, the requirement that p T = p −T = 1 − p T means p T = 1 2 is the only option, as there is only one zero-point system in the three-outcome model. With four outcomes, however, p OW = 2 3 is a choice. This choice was of course informed by the point system used for the standings, so that the maximum likelihood equations would enforce that the expected number of points for each team equals its actual number. Other point systems are possible, however. In an earlier experiment with shootouts the Central Collegiate Hockey Association awarded 5 points for a win in regulation or overtime, 3 for a shootout win, 2 for a shootout loss, and 0 points for a loss in regulation or overtime, so analysis of that season might have used p SW = 3 5 and p SL = 2 5 . Similarly, the NCAA, for tournament selection purposes, considers a win in 3-on-3 overtime worth 0.55 of a win, and treats games decided in a shootout as a tie. Capturing this in a model would require two parameters in addition to the team-strengths: one for overtime games and one for ties, and would have parameters like p RW = 1, p OTW = 0.55, p SO = 0.50, p OTL = 0.45, and p RL = 0. One avenue for future investigation would be to define an extended model in which the unconstrained values of {p I } are treated as additional parameters to be estimated from the data. For instance, in the four-outcome model, p OW could be treated as a parameter with prior support on the interval 1 2 < p OW < 1. Finally, the model has assumed all games are played under the same conditions, with 3-on-3 overtimes and shootouts. However, in a number of hockey leagues, playoffs and other postseason games are played to conclusion with overtimes played under the same set of rules with a full squad on the ice, and shootouts are not possible, To produce probabilities for such a game, one would have to decide what probability to assign to a win or a loss. The natural model is probably to use θ P OW ij = πi πi+πj , i.e., the conditional probability of winning a game given that it's not decided in (3-on-3) overtime or a shootout. Likewise if any playoff games are included in the results used for inference, their contribution to the likelihood would need to be adjusted.

Here we show the Stan model implementing the family of Bradley-Terry-like models described in this paper. The generalization allows a single Stan dynamic shared object (DSO) [15] to be used for all three models. This is computationally convenient, because compiling the the DSO is often the most time-consuming part of a Stan simulation. 

The statistical analysis of compositional data

On the Existence of Maximum Likelihood Estimates in Logistic Regression Models

Rank Analysis of Incomplete Block Designs: The Method Of Paired Comparisons

Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition

Ken's Ratings for American College Hockey. HOCKEY-L mailing list post

The existence of maximum-likelihood estimates in the Bradley-Terry Model and its extensions

On Extending the Bradley-Terry Model to Accommodate Ties in Paired Comparison Experiments

Tie games are history; a win earns three points for teams. IIHF 100 Top Stories of the Century

Solution of a Ranking Problem from Binary Comparisons

A note on inverse probability

Theory of probability

A generalized inverse for matrices

Hierarchical Bayesian Bradley-Terry for Applications in Major League Baseball

Anderson's Conditions for the Existence of Maximum Likelihood Estimates in Logistic Regression Models

RStan: the R interface to Stan

Prior Distributions for the Bradley-Terry Model of Paired Comparisons

Prediction and Evaluation in College Hockey using the Bradley-Terry-Zermelo Model

Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung

Gabriel Phelan, the attendees of the UP-STAT conferences, and the members of the Schwerpunkt Stochastik at Goethe University, Frankfurt am Main, for useful discussions

School of Mathematical Sciences and Center for Computational Relativity and Gravitation

School of Mathematical Sciences, Rochester Institute of Technology, 85 Lomb Memorial Drive