key: cord-0588465-vwesr5qo
authors: Delbianco, Fernando; Fioravanti, Federico; Tohmé, Fernando
title: The Relative Importance of Ability, Luck and Motivation in Team Sports: a Bayesian Model of Performance in Rugby
date: 2021-09-27
journal: nan
DOI: nan
sha: aaba4736482c773cc5c6175c8f3bb3c291642729
doc_id: 588465
cord_uid: vwesr5qo

Results in contact sports like Rugby are mainly interpreted in terms of the ability and/or luck of teams. But this neglects the important role of the motivation of players, reflected in the effort exerted in the game. Here we present a Bayesian hierarchical model to infer the main features that explain score differences in rugby matches of the English Premiership Rugby 2020/2021 season. The main result is that, indeed, effort (measured as the ratio of the number of tries to the sum of tries and scoring kick attempts) is highly relevant to explain outcomes in those matches.

The development of mathematical models of sports faces many obstacles. Assessing the potential impact of unobservable variables and establishing the right relations among the observable ones are the main sources of difficulty in this task. Even so, the study of team sports data has become increasingly popular in recent years. Several models have been proposed for the estimation of the parameters (characteristics) that may lead to successful results for a team, ranging from machine learning methods to predict outcomes (Strumbelj and Vracar, 2012; Asif and McHale, 2016; Baboota and Kaur, 2019), fuzzy set representations (Hassanniakalager et al., 2020) and statistical models (Dyte and Clark, 2000; Goddard, 2005; Boshnakov, 2017) to Bayesian models (Baio and Blangiardo, 2010; Constantinou et al., 2012; Wetzels et al., 2016; Santos-Fernández and Mengersen, 2019).

One of the main issues in the study of sports is to disentangle the relative relevance of the possible determinants of outcomes.
While ability and luck constitute, at least for the press and the fan base, the main explanatory factors of the degree of success in competitions, the motivation of players is usually invoked only to explain epic outcomes or catastrophic failures. One possible reason for the neglect of motivation is that, unlike ability and luck, it is hard to assess. Nevertheless, it is known that interest in a sport induces a higher level of effort in the play (Yukhymenko-Lescroart, 2021). Accordingly, in this paper we define a particular notion of effort in Rugby games as a proxy for motivation and develop a Bayesian model of the final scores of teams in a tournament. These outcomes will be explained by several variables, among which we distinguish the ability of the teams and the effort exerted by them. We also include as explanatory variables other possible sources of psychological stimuli, so as to capture a pure motivation to win, separated from those other factors.

Bayesian techniques offer several advantages for modeling sports: beliefs or expert information can be incorporated as priors, the posterior distributions of the parameters of interest are easily updated when new data become available, and small datasets are handled more effectively. In our case, we propose a Bayesian hierarchical model to explain score differences in a rugby match, i.e., the difference between the points of the home team and those of the away team. The main parameters of the model are the ability of teams, the effort exerted by them and the advantage (or disadvantage) of home teams.

There are many papers that use Bayesian methods to model the score of a rugby game. Stefani (2009) finds that past performance is a better predictor of the score difference than of the score total, and suggests that teams should focus their strategy on score differences (to win or draw) rather than on the score total.
Pledger and Morton (2011) use Bayesian methods to model the 2004 Super Rugby competition and explore how home advantage impacts the outcomes. Finally, Fry et al. (2020) propose a Variance Gamma model where analytical results are obtained for match outcomes, total scores and the awarding of bonus points. The main difference between these works and ours is that their primary goal is to predict outcomes, while ours is to explain them.

The ability of a team can be conceived as its "raw material". The skills of its players, the expertise of its coaches and its human resources in general (medical staff, managers, etc.) constitute the team's basic assets. Their value can vary during a season due to injuries, the temporary loss of skilled players called to play for the national team, players leaving the team, etc. In this model we assume that the capabilities of teams do not change much from one season to the next. Accordingly, the ability of a team at the start of the season is assumed to be at a bounded distance from its performance in the previous season.

Luck in games and sports has been widely studied, from philosophical perspectives (Simon, 2007; Morris, 2015) to statistical ones (Denrell and Liu, 2012; Pluchino et al., 2018). Mauboussin (2012) characterizes games that are high in luck as those that are highly unpredictable, in which great advantages cannot be achieved through repetition, and in which the 'reversion to the mean' effect in performance is strong. Elias et al. (2012) and Gilbert and Wells (2019) define many types of luck, which we will introduce later. In this model, following the line of the latter authors, we consider that luck arises when the unexplained part differs markedly from the mean value in the distribution of noises. That is, a series of unobserved variables have a huge impact on the outcome.

Effort, in turn, can be conceived as the cost of performing at the same level over time and staying steadily engaged in a given task (Herlambang et al., 2021).
Different measures of effort can be defined. In the context of decision making, effort can be the total number of elementary information processing operations involved (Kushnirok, 1995) or the use of cognitive resources required to complete a task (Russo and Dosher, 1983; Johnson and Payne, 1985). Another measure of effort (or lack of it) can be defined in terms of the extent of anchoring in self-reported rating scales, that is, the tendency to select categories in close proximity to the rating category used for the immediately preceding item (Lyu and Bolt, 2021). In order to define a measure of the effort exerted by a rugby team, we follow the lead of Lenten and Winchester (2015), Butler et al. (2020) and Fioravanti et al. (2021). These works analyze the effort exerted by rugby teams under the idiosyncratic incentives induced in this game. Besides gaining points for winning or drawing a game, teams may earn "bonus" points depending on the number of tries they score in a game. Accordingly, any appropriate effort measure should also take into account the number of tries. In our model the effort is measured as the ratio between the number of tries scored and the sum of tries and scoring kick attempts.¹

¹ By scoring kicks we refer to conversions, penalty and drop kicks; and by kick attempt we mean every time the team decided to take a scoring kick, no matter the outcome.

Several studies have detected the relevance of home advantage, i.e. the benefit over the away team of being the home team. Schwartz and Barsky (1977) suggested that crowds exert an invigorating motivational influence, encouraging the home side to perform well. Still, a full explanation of this phenomenon requires taking into account factors such as the familiarity of the home team with the field and the travel fatigue of the away team.

The plan of the paper is as follows. In section 2 we present the data of the English Premiership championship in Rugby, played in 2020/2021.
Section 3 presents a Bayesian hierarchical model of the variables that explain the difference of scores in that championship. Section 4 runs a descriptive statistical analysis of the matches of the Premiership in the light of the variables defined in the Bayesian model. Section 5 presents the results of estimating our model with the data of the Rugby Union competition. Section 6 considers the outliers found in the previous section, treating them as the result of luck in those games. We assess the aspects that justify considering them as instances of luck. Finally, section 7 discusses the results obtained.

The Premiership Rugby championship is the top English professional Rugby Union competition. The 2020/2021 edition was played by 12 teams. The league season comprises 22 rounds of matches, with each club playing every other club home and away. The top 4 teams qualify for the playoffs. Four points are awarded to the winning team, two to each team in case of a draw, and zero points to the losing team. In addition, a bonus point is given to the losing team if the score difference is less than eight points, and teams also receive a bonus point if they score four or more tries. During this season, if a game was canceled due to Covid-19, two points were awarded to the team responsible and four to the other, while the match result was deemed to be 0-0. The 2020/2021 season was won by the Harlequins, who claimed their second title after finishing fourth in the league stage.

The score, number of tries, converted tries, converted penalties, attempted penalties, converted drops, attempted drops and attendance at each of the 122 games of the 2020/21 Premiership season have been taken from the corresponding Wikipedia entry. We generate the priors of our Bayesian model based on the final ranking from the 2019/20 season as follows.
An attack ranking and a defense ranking are built using the number of points scored and received by each team: the team with the most tries scored and the fewest points received is ranked first in both rankings. These rankings are then normalized, and their corresponding means are computed.

We base our model on a previous work of Kharratzadeh (2017) that models the difference in scores for the English Premier League. The score difference in game $g$ is denoted as $y_g$ and is assumed to follow a Student's t distribution,

\[ y_g \sim t_{\nu}\big(a_{\text{diff}}(g) + eff_{\text{diff}}(g) + ha(g),\ \sigma_y\big), \]

where $a_{\text{diff}}(g)$ is the difference in the ability of the teams, $eff_{\text{diff}}(g)$ is the difference in the effort exerted and $ha(g)$ the home advantage at game $g$. The scale of the distribution of score differences is $\sigma_y$. We give it a $N(0.5, 1)$ prior. In turn, we assign a $Gamma(9, 0.5)$ prior to the degrees of freedom $\nu$.

We model the difference in abilities as follows:

\[ a_{\text{diff}}(g) = a_{hw(g),ht(g)} - a_{aw(g),at(g)}, \]

where $a_{hw(g),ht(g)}$ is the ability of the home team in the week where game $g$ is played (analogously for the away team). We assume that the ability may vary during the season,

\[ a_{hw(g),ht(g)} = a_{hw(g)-1,ht(g)} + \sigma \cdot \eta_{hw(g),ht(g)} \quad \text{for } hw(g) \geq 2, \]

where $\sigma$ and $\eta$ have weakly informative priors $N(0, 0.1)$ and $N(0, 0.5)$, respectively. The model is analogous for the away team.

The abilities for the first week depend on the previous performance of the teams through a coefficient $\beta_{\text{prev}}$, which is given the weakly informative prior $N(0.5, 1)$; $prevperf(j)$ denotes the previous performance of team $j$. This value is obtained in the following way: first, two rankings of tries scored and received during the last season are built, where being on the top in both rankings means that the team scored the most and received the fewest tries in the last season. Then, the two rankings are normalized and averaged.
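The construction of $prevperf(j)$ described above (two rankings, normalized and then averaged) can be illustrated with a minimal Python sketch. The text does not state how the rankings are normalized, so the linear scaling used here (rank 1 maps to 1, rank n maps to 0) is an assumption, as are the function names and the three made-up teams with their numbers.

```python
def rank_teams(stat, higher_is_better):
    """Rank teams on one statistic; rank 1 is best, tied teams share a rank."""
    ranks = {}
    for team, value in stat.items():
        if higher_is_better:
            better = sum(1 for v in stat.values() if v > value)
        else:
            better = sum(1 for v in stat.values() if v < value)
        ranks[team] = 1 + better
    return ranks


def normalize_ranks(ranks):
    """Assumed scaling: rank 1 maps to 1.0 and rank n maps to 0.0."""
    n = len(ranks)
    if n < 2:
        return {team: 1.0 for team in ranks}
    return {team: (n - r) / (n - 1) for team, r in ranks.items()}


def previous_performance(tries_scored, tries_received):
    """Average of the normalized attack and defense rankings for each team."""
    attack = normalize_ranks(rank_teams(tries_scored, higher_is_better=True))
    defense = normalize_ranks(rank_teams(tries_received, higher_is_better=False))
    return {team: (attack[team] + defense[team]) / 2 for team in tries_scored}


# Hypothetical numbers for three made-up teams, only to exercise the functions.
tries_scored = {"A": 70, "B": 55, "C": 40}
tries_received = {"A": 45, "B": 38, "C": 60}
prevperf = previous_performance(tries_scored, tries_received)
print(prevperf)
```

Under this assumed scaling a team that tops both rankings gets a previous performance of 1 and a team at the bottom of both gets 0. Replacing tries by total points scored and received gives the variant used in Model IV.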
The variable that captures the relative motivations of the teams in the game is the difference in efforts:

\[ eff_{\text{diff}}(g) = \beta_{\text{effort}} \cdot \big(effort_{\text{home}}(g) - effort_{\text{away}}(g)\big), \]

where $\beta_{\text{effort}}$ has a $N(0.5, 1)$ prior and the effort of the home team is defined as (analogously for the away team):

\[ effort_{\text{home}}(g) = \frac{\text{number of home tries in game } g}{\text{number of home tries} + \text{attempted scoring kicks in game } g}. \]

Our intention with this is to capture the idea that scoring tries demands more effort than other means of scoring points, and that motivated teams try to maximize this value.

Finally, to capture home advantage, we consider both attendance and non-attendance effects (such as the weather, long trips to play the game, etc.):

\[ ha(g) = \beta_{\text{home}} + \beta_{\text{atten}} \cdot atten(g), \]

where $\beta_{\text{home}}$ and $\beta_{\text{atten}}$ have $N(0.5, 1)$ priors and $atten(g)$ is 0 if no fans were allowed and 1 otherwise. A graphical representation of the model is depicted in Figure 1.

To ensure robustness in our results we work with four different models. Model I does not include attendance as a variable of home advantage. Model II includes the attendance variable, while Model III incorporates a day variable $day(g)$, whose coefficient $\beta_{\text{day}}$ has a $N(0.5, 1)$ prior; $day(g)$ has value 1 if the game was played on Saturday or Sunday and 0 otherwise. This day variable allows us to find out whether playing on a day on which almost all the fans can attend the game benefits either the home or the away team. Finally, Model IV includes a variation of $prevperf(j)$ where, instead of the tries, we use the total points scored and received by each team.

Tables 1 and 2 show the descriptive statistics for home and away teams, respectively, where Score, T, C, P, D, AC, AP, AD and eff indicate total points, tries scored, conversions scored, penalty kicks scored, drop kicks scored, attempted conversions, attempted penalty kicks, attempted drop kicks and effort exerted.

[Tables 1 and 2: descriptive statistics for home and away teams. Only one complete row is legible, Max.: Score 62.00, T 9.00, C 6.00, P 5.00, D 1.00, AC 8.00, AP 5.00, AD 2.00, eff 0.66.]

Figure 2 shows that most of the score differences are around zero (even though there were very few draws in the championship). This indicates that games usually end with a small difference. Figures 3a and 3b depict the effort histograms. We can see a great number of cases with $effort = 1/2$. This is because teams almost always have the chance to go for a conversion after scoring a try.

We estimate the model with the R package rstan (Stan Development Team, 2020). We use 4 cores, each one running 2500 iterations and 1500 warm-up ones. The Stan code of the estimated model can be found in the Appendix. The results obtained are sound:
• The Rhat (or Gelman-Rubin) statistic measures the discrepancies between the chains generated in simulations of Bayesian models. The further its value is from 1, the worse the convergence. In all our results Rhat is very close to 1.
• $n_{\text{eff}}$ is an estimate of the effective size of the sample of parameter draws. A large value indicates a low degree of error in the expected value of the parameter. This is indeed the case for all the parameters of interest.

Tables 3 and 4 show the results of Models I and II, respectively. The difference between them is the presence of $\beta_{\text{atten}}$ in the latter. The credible intervals of Model II, and the corresponding histograms, shown in Figures 4 and 5, indicate that $\beta_{\text{prev}}$ and $\beta_{\text{effort}}$ are the parameters distributed above zero. On the other hand, $\beta_{\text{home}}$ has a wider interval, which includes zero, once $\beta_{\text{atten}}$ is included, as the comparison between Model I and Model II shows. Table 5 shows how the estimates change when the $\beta_{\text{day}}$ parameter is added. The results seem to show the effect of changing the mean of the prior of $\nu$ to 1. Finally, Table 6 presents the results of Model IV. The ranking score is here based on the points (not the tries) of the previous season.
We find in this case that the $\beta_{\text{effort}}$ parameter is similar to that found in the other models, while $\beta_{\text{prev}}$ has a lower mean.

We now turn to further results of our model. Figure 6 indicates that there is no apparent relation between the values obtained for $\beta_{\text{prev}}$ and $\beta_{\text{home}}$ and those of $\beta_{\text{effort}}$. This strongly suggests that effort captures an effect that differs from both the ability of the team and the potential support (or antagonism) received in the field. This is what we should expect from a representation of the intrinsic motivation of a team. Finally, Figure 9 shows the relation between the observed data and the estimated data. The blue line indicates the fitted linear relation.

Luck in a game can be understood as the difference between the actual performance observed and the ability of a team. In our case, we could identify it with the difference between the performance and both the effort and ability of a team. Then we find that $\mathrm{Var}(\text{Ability}) = 0.01098826$. We can then see that the variabilities in effort, ability and luck have roughly the same weight in the composition of the variability of performance.

An alternative definition of luck is that it arises when the residual in the regression of $y_g$ on the explanatory variables defined in section 3 differs markedly from the mean value in the distribution of noises. [...] needed a win to gain the home advantage during the playoffs. It was also the first game in more than 5 months at which fans were again allowed to attend (type IV). Besides that, it was a record performance by Exeter, since it scored the largest difference in a top competition (type III). For the Falcons it was the longest trip of the tournament (type IV), and they were shown a yellow card at minute 26, allowing two tries during the sin-bin period (type II and III).
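The residual-based notion of luck used in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how far a residual must be from the mean to count as luck, so the threshold of two standard deviations is hypothetical, as are the function name and the made-up observed and fitted score differences.

```python
import statistics


def flag_lucky_games(observed, fitted, k=2.0):
    """Return indices of games whose residual lies more than k standard
    deviations away from the mean residual (k = 2 is an assumed threshold)."""
    residuals = [y - f for y, f in zip(observed, fitted)]
    mean = statistics.mean(residuals)
    sd = statistics.pstdev(residuals)
    if sd == 0:
        return []  # all residuals identical: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - mean) > k * sd]


# Made-up score differences (home minus away) and made-up model fits.
observed = [3, -5, 7, 71, -2, 10, -8, 4]
fitted = [2, -3, 6, 20, -1, 8, -6, 5]
lucky = flag_lucky_games(observed, fitted)
print(lucky)
```

With these made-up numbers only the fourth game (index 3) is flagged, since its residual of 51 points is far from the mean residual while all the others stay within a few points of it. A robust alternative, such as using the median and the median absolute deviation, would be less sensitive to the outliers themselves.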
In our analysis we found that while the results of rugby matches can be explained by the ability of the teams, another highly significant variable is the motivation of players, reflected by the effort exerted by them. By contrast, luck seems to have had an impact only in a few games in our sample.

We have followed here Lenten and Winchester (2015), Butler et al. (2020) and Fioravanti et al. (2021), representing effort as the ratio of tries over the sum of tries and scoring kick attempts. While the results obtained in our Bayesian analysis are sound, there exist many other ways of defining a proxy of motivation in Rugby. One can argue that an increased number of tackles indicates that a team has exerted a lot of effort, revealing that it is highly motivated. On the other hand, such a team has not exerted a large effort in keeping the ball. One could also, with the help of GPS, track the physical effort of the players and identify it with their motivation. In any case, the definition of a measure of effort as a proxy of motivation, like that of ability or luck, has a degree of arbitrariness.

Future lines of research include exploring and comparing the impact of our proxy for motivation in other tournaments, and even comparing the appropriate definition of "effort" in different sports. Another topic that is worth studying is the evolution of effort over time. The results of such an investigation could be useful to assess how the incentives of the players may have changed, affecting their motivation.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Crowd effects and the home advantage
In-play forecasting of win probability in One-Day International cricket: A dynamic logistic regression model
Predictive analysis and modelling football results using machine learning approach for English Premier League
Bayesian hierarchical model for the prediction of football results
A natural experiment to determine the crowd effect upon home court advantage
A bivariate Weibull count model for forecasting association football scores
Bonus incentives and team effort levels: Evidence from the "Field
Home-field effect and team performance: evidence from English premiership football
Stan: A probabilistic programming language
Top performers are not the most impressive when extreme performance indicates unreliability
A ratings based Poisson model for World Cup soccer simulation
Effort of rugby teams according to the bonus point system: a theoretical and empirical analysis
Home advantage in Home Nations, Five Nations and Six Nations rugby tournaments
Ludometrics: luck, and how to measure it
Regression models for forecasting goals and match results in association football
A conditional fuzzy inference approach in forecasting
Modeling motivation using goal competition in mental fatigue studies
Game location and aggression in rugby league
Model-based clustering of non-Gaussian panel data based on skew-t distributions
Effects of game venue and outcome on psychological mood states in rugby
Hierarchical Bayesian Modeling of the English Premier League
Secondary behavioural incentives: Bonus points and rugby professionals
A psychometric model for respondent-level anchoring on self-report rating scale instruments
Testosterone, territoriality, and the 'home advantage
Evidence of referees' national favoritism in rugby
Alone against the crowd: Individual differences in referees' ability to cope under pressure
Does the home advantage depend on crowd support? Evidence from same-stadium derbies
Bayesian statistics meets sports: a comprehensive review
The home advantage
Deserving to be lucky: reflections on the role of luck and desert in sports
Predicting score difference versus score total in rugby and soccer
The role of passion for sport in college student-athletes' motivation and effort in academics and athletics