key: cord-0207942-ou5wyokh authors: D'Agostini, Giulio; Esposito, Alfredo title: What is the probability that a vaccinated person is shielded from Covid-19? A Bayesian MCMC based reanalysis of published data with emphasis on what should be reported as 'efficacy' date: 2021-02-15 journal: nan DOI: nan sha: ff0335aca6acc598f473444d8ffaa399ed97b614 doc_id: 207942 cord_uid: ou5wyokh Based on the information communicated in press releases, and finally published towards the end of 2020 by Pfizer, Moderna and AstraZeneca, we have built up a simple Bayesian model, in which the main quantity of interest plays the role of {em vaccine efficacy} (`$epsilon$'). The resulting Bayesian Network is processed by a Markov Chain Monte Carlo (MCMC), implemented in JAGS interfaced to R via rjags. As outcome, we get several probability density functions (pdf's) of $epsilon$, each conditioned on the data provided by the three pharma companies. The result is rather stable against large variations of the number of people participating in the trials and it is `somehow' in good agreement with the results provided by the companies, in the sense that their values correspond to the most probable value (`mode') of the pdf's resulting from MCMC, thus reassuring us about the validity of our simple model. However we maintain that the number to be reported as `vaccine efficacy' should be the mean of the distribution, rather than the mode, as it was already very clear to Laplace about 250 years ago (its `rule of succession' follows from the simplest problem of the kind). This is particularly important in the case in which the number of successes equals the numbers of trials, as it happens with the efficacy against `severe forms' of infection, claimed by Moderna to be 100%. The implication of the various uncertainties on the predicted number of vaccinated infectees is also shown, using both MCMC and approximated formulae. Our perspectives about living with Covid-19 changed dramatically in the last months of 2020 with the results from the vaccine trials by Pfizer and Moderna and, a bit later, by AstraZeneca. The first one claimed a 90% efficacy [1] (then updated to 95% in a further press release [2] and in a much more detailed paper [3] ) and the second one 94.5% [4] (later updated to 94.1% [5, 6] ), while AstraZeneca press released and then published two values (90.0% and 62.1%) depending on two (unplanned [7, 8] ) different experimental settings. 1 In an unpublished first paper 2 based on the first press releases by Pfizer and Moderna, we remarked that, since the announcements did not mention any uncertainty, we understood that the initial Pfizer's number was the result of a rounding, with uncertainty of the order of the percent. But then, we continued, we were highly surprised by the Moderna's announcement, providing the tenths of the percent, as if it were much more precise. We had indeed the impression that the 'point five' was taken very seriously, not only by media speakers, who put the emphasis on the third digit, but also by experts from which we would have expected a phrasing implying some uncertainty in the result (see e.g. Ref. [10] ). The same remarks apply to the later AstraZeneca announcement. In fact, a fast exercise shows that, in order to have an uncertainty of the order of a few tenths of percent, the number of vaccine-treated individuals that got the Covid-19 had to be at least of the order of several hundreds. But this was not the case. In fact, the actual numbers were indeed much smaller, as we learned from the Moderna first press release [4]: "This first interim analysis was based on 95 cases, of which 90 cases of COVID-19 were observed in the placebo group versus 5 cases observed in the mRNA-1273 group, resulting in a point estimate of vaccine efficacy of 94.5% (p < 0.0001)". Now, it is a matter of fact that if a physicist reads for an experimental result a number like '5', she tends to associate to it, as a rule of thumb, an uncertainty of the order of its square root, that is ≈ 2.2. Applied to the Moderna claims, this implies an inefficacy of about (5.5 ± 2.3)%, or an efficacy of about (94.5 ± 2.3)%. Another reason that made us worry about the result was, besides the absence of an associated uncertainty [11] , the "p < 0.0001" accompanying it, being us extremely critical against p-values and other frequentist prescriptions (see Ref. [12] and references therein). In our first paper [9] we tried then to understand whether it was possible to get an idea of the possible values of efficacy consistent with the data, each one associated with its degree of belief on the basis of the few data available in those days. In other words, our purpose was and is to arrive to a probability density function (pdf), release date sample size vaccine -placebo n V I n P I n V Is n P Is Table 1 : Bare data concerning the number of infected in the vaccine group (n V I ) and in the placebo group (n P I ). 'Moderna-1' is just an interim result, based on the same sample of 'Moderna-2', that we had used in Ref. [9] . In the case of Moderna and Pfizer comprehensive results, also the numbers for the occurrence of 'severe forms of infection' were reported (n V Is and n P Is , respectively). In the case of AstraZeneca LD-SD stands for 'low dose followed by standard dose', SD-SD for 'two consecutive standard doses'. efficacy value 95% 'uncertainty interval' Moderna-1 [4] 0.945 ----Moderna-2 [6] 0.941 [0.893, 0.968] (confidence interval) Pfizer [3] 0.950 [0.903, 0.976] (credible interval) AstraZeneca LDSD [7] 0.900 [0.674, 0.970] (confidence interval) AstraZeneca SDSD [7] 0.621 [0.410, 0.757] (confidence interval) although not obtained in closed form, of the quantity of interest. In the present paper we not only extend our analysis to the published data [3, 6, 8 ] (see Tab. 1), but can also compare our results with the published ones, summarized in Tab. 2, which also include an indication of the uncertainty to be associated with them. What makes us confident about the validity of our simple model is that the press released and finally published results concerning 'efficacy' (see Tab. 2) are in excellent agreement with the mode of the distribution we get analyzing our model through a Markov Chain Monte Carlo (MCMC). This is not a surprise to us, indeed. In fact we are aware of statistical methods which tend to produce as 'estimate' the most probable value of the quantity of interest, that would be inferred starting from a flat prior [13] . The fact that different kind of 'uncertainty intervals' are provided will be discussed at the due point. We only anticipate here that they have in this case equivalent meaning. The paper is organized as follows. In Sec. 2 we describe and show how to implement in JAGS [14] the causal model connecting in a probabilistic way the quantities of interest, among which the primary role is played by the 'efficacy' ǫ. We also give, in footnote 4, some indications on how to proceed in order to get exact results for f (ǫ), although they can only be obtained numerically. The MCMC results are shown and discussed in Sec. 3. Then the question asked in the title is tackled, with a didactic touch and including some historical remarks, in Sec. 4. The observation that the resulting pdf's of ǫ can by approximated rather well by Beta distributions (Sec. 5) leads us to discuss in further detail the role of the priors, initially chosen simply uniform. Then Sec. 6 is devoted to the related question of predicting the number of vaccinated people that shall result infected, taking into account several sources of uncertainty. Finally, in Sec. 7 we extend our analysis to the level of protection given by the vaccines against the disease severity, in which the outcome of a simple application of probability theory is at odds with simplistic, extraordinary claims. 3 In Sec. 8 we sum up the analysis strategy and the outcome of the paper, also commenting on the optimal sharing of vaccine/placebo samples in the test trial. Then some conclusions and final remarks follow. Dealing with problems of this kind, we have learned (see e.g. Ref. [16] ) the importance of building up a graphical representation of the causal model relating the quantities of interest, some of them 'observed' and others 'unobserved', among the latter the quantities we wish to infer. Also in this case, despite some initial skepticism about the possibility of getting some meaningful results, once we have built up the model, very basic indeed, it was clear that the main outcome concerning the vaccine efficacy was not depending on the many aspects of the trials. Our initial doubts were in fact related to the several details concerning the people involved in the test campaign, but they finally resulted to be much less critical than we had at first thought. The causal model used in this analysis is implemented in the Bayesian network of Fig. 1 . The top nodes n V and n P stand for the number of individuals in the vaccine and placebo (i.e. control) groups, respectively, as the subscripts indicate, while the bottom ones (n V I and n P I ) are the number of individuals of the two groups resulting infected during the trial. These are the observed nodes of our model, whose values are summarized in Tab.1. Then, there is the question of how to relate the numbers of infectees to the numbers of the participants in the trial. This depends in fact on several variables, like the prevalence of the virus in the population(s) of the involved people, their social behavior, personal life-style, age, health state and so on. And, hopefully, it depends on the fact that a person has been vaccinated or not. Lacking detailed information, we simplify the model introducing an assault probability p A , that is a catch-all term embedding the many real life variables, apart being vaccinated or not. Nodes n V A and n P A in the network of Fig. 1 represent then the number of 'assaulted individuals' in each group, and they are modeled according to binomial distributions, that is represented in the graphical model by solid arrows. The 'assaulted individuals' of the control group are then assumed to be all infected, and hence the deterministic link with dashed arrow relating node n P A to node n P I follows (indeed the two numbers are exactly the same in our model, and we make this distinction only for graphical symmetry with respect to the vaccine group). Instead, the 'assaulted individuals' of the other group are 'shielded' by the vaccine with probability ǫ, that we therefore identify with efficacy, although we shall come back at the due point about what should be reported as 'efficacy'. The probability of becoming infected if assaulted is therefore equal to 1−ǫ, so that node n V I is related to node n V A by At this point all the rest is a matter of calculations, that we do by MCMC techniques 4 with the help of the program JAGS [14] interfaced with R [17] via rjags [18] . The nice thing using such a tool is that we have to take care only to describe the model, with instructions whose meaning is quite transparent: 5 model { nP.I~dbin(pA, nP) # 1. nV.A~dbin(pA, nV) # 2. pA~dbeta(1,1) # 3. nV.I~dbin(ffe, nV.A) # 4. [ ffe = 1 -eff ] ffe~dbeta(1,1) # 5. eff <-1 -ffe # 6. } We easily recognize in lines 1. and 2. of the R code the above Eqs. (1) and (2), while line 4. stands for Eq. (3). Line 6. is simply the transformation of '1−ǫ' ('ffe' in the code) to ǫ, the quantity we want to trace in the 'chain'. Finally lines 3. and 5. describe the priors of the 'unobserved nodes' that have no 'parents', in this case the joint pdf of all the variables in the network; condition on the certain variables; marginalize over all the uncertain variables besides ǫ. Referring to Ref. [16] for details, here is the structure of the unnormalized pdf obtained starting from uniform priors over ǫ and p A : where the three terms within square brackets are the three binomial distributions entering the model, stripped of all irrelevant constant factors. Simplifying and reorganizing the various terms we get We recognize that the integral 1 0 x r−1 · (1 − x) s−1 dx, in terms of a generic variable x, defines the special function beta B(r, s), thus obtaining Then the integral over ǫ follows, in order to get the normalization factor. Finally, all moments of interest can be evaluated. All this can be done numerically. However, we proceed to MCMC, being its use much simpler and also for the flexibility it offers (for example in the case we need to extend the model, as we shall do in Secs. 6 and 7). 5 Those who have no experience with JAGS can find in Ref. [16] several ready-to-run R scripts. MCMC results mean ± 'stand. unc.' centr. 95% 'cred. int.' P (ǫ ≥ 0. The MCMC based pdf's of ǫ are plotted in Fig. 2 with smooth curves showing the profile of the histograms of the ǫ values in the chains. For comparison, the vertical dashed lines show the results of the pharma companies ('efficacy value' in Tab. 2). As we can see, they correspond practically exactly to the modal values of the distributions. This makes us quite confident about the validity of our simple model for this quantitative analysis, although we maintain that the single number for the efficacy to be provided is not the mode, but rather the mean of the distribution, as we shall argue in Sec. 4. However, some remarks are in order already at this point. In fact, although there is no doubt about the fact that the most complete description of a probabilistic inference is given by the pdf of the quantity of interest, it is also well understood that it is often convenient to summarize the distribution with just a few numbers. Usually, when inferring physical quantities, the preference goes to the mean and the standard deviation (the latter being related to the concept of standard uncertainty [11] ) because of rather general probability theory theorems which make their use convenient for further evaluations ('propagations', as we shall also see in Sec. 6). Other ways to summarize with just a couple of numbers a probability distribution are intervals which contain the uncertain value of the variable of interest at a given probability level (credible interval). We report then in Tab. 3 the 95% central credible interval 7 evaluated from the MCMC chains as well as the 90% 'right side credible interval'. Other useful summaries, depending on the problem of interest, can be the most probable value of the distribution (mode) and the median, i.e. the value that divides the possible values into two equally probable intervals. As we have stated above, the modes of the MCMC based pdf's coincides with the values reported as 'efficacy value' in Tab. 2, which contains also what we have generically indicated as 95% 'uncertainty interval', in form of credible interval for Pfizer and confidence interval for the other two companies. 8 The MCMC also provides results for the other 'unobserved' nodes of the causal model, in our case p A and n V A . We refrain from quoting results on the 'assault probability', because they could easily be misunderstood, as they strongly depend, contrary to ǫ, on the values of n V and n P , being p A a catch-all quantity embedding several real life variables, including the virus prevalence. We have however checked that our main results on ǫ are stable against the (simultaneous) variations of n V and n P by orders of magnitude (thus implying similar large variations of p A ). 9 We give, instead, the results concerning n V A that we expect to be around n P I . We get, in fact, respectively for Moderna-1, Moderna-2, Pfizer, AstraZeneca (LDSD) and AstraZeneca (SDSD) the following values: 89 ± 13, 185 ± 19, 160 ± 18, 29 ± 8 and 70 ± 12 (note that the standard uncertainty is not simply the root square of n V I , as a rule of thumb would suggest). It is now time to come to the question asked in the title. We have already used the noun efficacy, associated to the uncertain variable ǫ of our model of Fig. 1 . Then, analyzing the published data, we have got by MCMC several pdf's of ǫ, that is , and so on (see Fig. 2 ). Hereafter, since what we are going to say is rather general, we shall indicate the generic pdf by f (ǫ | data, H), 7 The meaning of such an interval is that, conditioned on the data used and on the model assumptions, we consider P (ǫ < ǫ low ) = P (ǫ > ǫ high ) = 2.5%, where ǫ low and ǫ high are the boundaries of the interval. 8 It is important to understand that, strictly speaking, a 95% frequentistic confidence interval does not provide the interval in which the authors are 95% confident that the 'true value' of interest lies (see Refs. [12, 13] and references therein), although it is 'often the case' for 'routine measurements' [13] . 9 In other words, the gross result essentially depends on the ratios n VI :n PI and n V :n P . where H stands for the set of hypotheses 10 underlying our inference and not specified in detail. Let us now focus on the probability that an assaulted individual gets infected. Indicating by A the condition 'the individual is assaulted', by V the condition 'vaccinated' and by I the event 'the individual gets infected' (and therefore A, V and I their logical negations), we get, rather trivially, while, in the case of assault, the probability of infection depends on whether the individual has been vaccinated or not. In the case of placebo, following our model, we simply get Instead, in case the individual has been vaccinated, the probability of infection will depend on ǫ, that is, for the special cases of perfect shielding and no shielding (i.e. no better than the placebo), In general, if we were certain about the precise value of ǫ, the probability of getting infected or not is related in a simple way to ǫ: The above equations, and in particular Eq. (9), express in mathematical terms the meaning we associate to efficacy, in terms of the model parameter ǫ: the probability that a vaccinated person gets shielded from a virus (or from any other agent). But the value of ǫ cannot be known precisely. It is, instead, affected by an uncertainty, as it (practically) always happens for results of measurements [11] (and indeed also the pharma companies accompany their results with uncertainties -see Tab.2). In a probabilistic approach, this means that there are values of ǫ we believe more and values we believe less. All this, we repeat it, is summarized by the probability density function f (ǫ | data, H) . The way to take into account all possible values of ǫ, each weighted by f (ǫ | data, H), is to follow the rules of probability theory, i.e. 10 In general we are used to indicating by I the background state of information [16] , but we reserve here the symbol I for 'infected'. Using then Eq. (9) we get which represents the probability that a vaccinated person, not belonging to the trial sample, gets shielded from Covid-19, on the basis of the data obtained from the trial and all (possibly reasonable) hypotheses assumed in the data analysis. It is easy to understand that P (I | A, V, data, H) is what really matters and therefore what should be communicated as efficacy to the scientific community and to the general public. 11 Now, technically, Eq. (10) is nothing but the mean of the distribution of ǫ. This should then be the number to report, and not the mode of the distribution, which has no immediate probabilistic meaning for the questions of interest. 12 Now, if we compare the 'efficacy values' of Tab. 2 with the mean values of Tab. 3 we see that in most cases the differences are rather small (about 1/3 to 1/2 of a standard deviation), although the modal values (to which, as we have showed above, the published efficacies correspond) are always a bit higher than the mean values, due to the left skewness of the pdf's. Therefore our point is mostly methodological, 13 with some worries when the mean value and the most probable one differ significantly. Evaluating the probability of future events on the basis of the outcomes of previous trials on 'apparently the same conditions' is an old, classical problem in probability theory that goes back to about 250 years ago and it is associated to the names of Bayes [20] and Laplace [21] . The problem can be sketched as considering events whose probability of occurrence depends on a parameter which we generically indicate as p, i.e. √ ' indicates the 'observed' nodes of the network, that is the value of the quantity associated to it is (assumed to be) certain. The other node (only one in this simple case) is 'unobserved' and it is associated to a quantity whose value is uncertain. box after each extraction), the bias of a coin and the ratio of the chosen surface in which a ball thrown 'at random' can stop, with respect to the total surface of a horizontal table (this was the case of the Bayes' 'billiard', although the Reverend did not mention a billiard). A related problem concerns the number of times ('X') events of a given kind occur in n trials, assuming that p remains constant. The result is given by the well known binomial, that is whose graphical causal model is shown in the left diagram of Fig. 3 . The problem first tackled in quantitative terms by Bayes and Laplace was how to evaluate the probability of a 'future' event E f , based on the information that in the past n trials the event of that kind occurred X = x times ('number of successes') and on the assumption of a regular flow from past to future, 14 that is assuming p constant although uncertain. In symbols, we are interested in where H stands, as above, for all underlying hypotheses. Both Bayes and Laplace realized that the problem goes through two steps: first finding the probability distribution of p and then evaluating P (E f | n, x, H) taking into account all possible values of p. In modern terms 14 A reminder of Russell's inductivist turkey is a must at this point! The basic reasoning behind these two steps is expressly outlined in the Sixth and Seventh Principle of the Calculus of Probabilities, expounded by Laplace in Chapter III of his Philosophical Essay on Probabilities [22] : • the Sixth Principle, in terms of the possible causes C i responsible of the observed event E, is essentially what is presently known as Bayes' theorem, that is in which P 0 (C i ) is the so called prior probability of C i , i.e. not taking into account the piece of information provided by the observation of E. Note that the role of P 0 (C i ) was explicitly considered by Laplace, who 1) before gave the rule in the case of P 0 (C i ) numerically all equal, which then drop from Eq. (15); 2) then specified that "if these various causes, consideredà priori, are unequally probable, it is necessary, in the place of the probability of the event resulting from each cause, to employ the product of this probability by the possibility of the cause itself." (Here 'possibility' and 'probability' are clearly used as synonyms.) Then, the importance of the finding is stressed: "This is the fundamental principle of this branch of the analysis of chances which consists in passing from events to causes." Generalizing this 'principle' to an infinite number of causes, associated to all possible values of the parameter p, with the 'event' being the observation of X = x successes in n trials, we get the case sketched in the right diagram of Fig. 3 , in which the unobserved node is now p. Equation (15) becomes then, in terms of the probability function of X and of the pdf of p [for which we take the freedom of using the same symbol 'f ()'], • The Seventh Principle then states that "the probability of a future event is the sum of the products of the probability of each cause, drawn from the event observed, by the probability that, this cause existing, the future event will occur", that is Generalizing also this 'principle' to an infinite number of causes associated to all possible values of the parameter p we get Eq. (13) , and then Eq. (14): the probability of interest is the mean of the distribution of p. The solution of Eq. (16) , in the case X is described by Eq. (11) and we consider all values of pà priori equally likely, is a Beta pdf, that is 15 with r = x + 1 and s = n − x + 1. Mean value and variance of the possible values of p are then Finally, using Eq. (13) and Eq. (14) we get the Laplace's rule of succession Thus, in the special case of 'n successes in n trials', "we find that an event having occurred successively any number of times, the probability that it will happen again the next time is equal to this number increased by unity divided by the same number, increased by two units" [22] , i.e. P (E f | n, x = n, H) = n + 1 n + 2 . In the case of x = n = 11 we have then 12/13, or 92.3%. Reporting thus 100% (see footnote 3) can be at least misleading, especially because such a value can be (as it has indeed been) nowadays promptly broadcasted uncritically by the media (see e.g. Ref. [15] -we have heard so far no criticism in the media of such an incredible claim, but only sarcastic comments by colleagues). Moving to our results about the 'model parameter ǫ' (it is now time to be more careful with names), reported in Tab. 3 and Fig. 2 , it should now be clear why the number to report as efficacy should be the mean of the distribution. As far as the distribution of ǫ is concerned, given the similarity of the inferential problem that was first solved by Bayes and Laplace, we have good reasons to expect that it should not 'differ much' from a Beta. In order to test the correctness of our guess we have done the simple exercise of superimposing over the MCMC distributions of Fig. 2 The result is shown in Fig. 4 . As we can see, the agreement is rather good for all cases, especially for Moderna and Pfizer, for which the Beta and MCMC curves practically coincide. The fact that the MCMC results are described with high degree of accuracy by a Beta distribution is not only a sterile curiosity, but has indeed an interesting practical consequence. As we have seen, the pdf's of ǫ have been obtained starting from a uniform prior. The same must be for Pfizer, since they also did a Bayesian analysis, as explicitly stated in their paper [3] and as revealed by the expression 'credible interval' (see Tab. 2), and their values practically coincide with ours. Instead, in the case of the other results, the expression 'confidence interval' seems to refer to a frequentistic analysis, in which "there are no priors". But in reality it is not difficult to show that sound frequentist analyses (e.g. those based on likelihood) can be seen as approximations of Bayesian analyses in which a flat prior was used (see e.g. Ref. [13] ). The resulting 'estimate' corresponds to the mode of the posterior distribution under that assumption. The question is now what to do if an expert has a 'non flat' informative prior (indeed none wouldà priori believe that values of ǫ close to zero or to unity would be equally likely!). Should she ask to repeat the analysis inserting her prior distribution of ǫ? Fortunately this is not the case. Indeed, as we have discussed in Ref. [16] , due to the symmetric and peer roles of likelihood and prior in the so called Bayes' rule, each of the two has the role of 'reshaping' the other. Moreover, since a posterior distribution based on a uniform prior concerning the variable of interest can be interpreted as a likelihood (besides factors irrelevant for the inference), we can apply to it an expert's prior in a second time (see Ref. [16] for details). It becomes then clear the importance of the observation that the pdf's of ǫ derived by MCMC can be approximated by Beta distributions: from the MCMC mean and standard deviation we can evaluate the parameters of the Beta of interest, as we have seen above; this function can be then easily multiplied by the expert's prior; the normalization can be done by numerical integration and finally the posterior distribution of ǫ also conditioned on the expert's prior can be obtained. This implementation in a second step of the expert opinion becomes particularly simple if also her prior is modeled by a Beta, recognized to be a quite flexible distribution. For example, indicating by r F and s F the Beta parameters [calculated with Eq. (23) -(24)] obtained by a flat prior and by r 0 and s 0 the parameters of the Beta informative prior, the posterior distribution will still be a Beta with parameters as it can be easily shown. 17 17 In fact, assuming that the MCMC based pdf of ǫ starting from a flat prior ('F ') can be approximated by a Beta, we can write it, neglecting irrelevant factors, as Expressing also the informative prior by a Beta, that is Once we have got the efficacy of a vaccine, or, even better, the full information about ǫ inferred by the data, we can tackle another interesting problem: if we vaccinate n ′ V individuals in (possibly) another population, how many of them will become infected? Obviously, this depends not only on f (ǫ) but also on many other parameters that we can model with the assault probability p ′ A in the new population, and, obviously, on the fact that n ′ V must be only a small part of the entire population, so that we do not have to consider issues related to herd immunity. In order to do this exercise we need to enlarge our causal model, thus getting the one shown in Fig. 5 . But, indeed, there is no need to set up a new JAGS model and rerun the MCMC. We can just use the chain of ǫ obtained processing through the MCMC the original model of Fig. 1 , and do the remaining work with 'direct' Monte Carlo. However, having observed that the resulting pdf of ǫ is approximated a Beta, we can do it starting from the mean and standard deviation obtained from the MCMC. Here is and applying the Bayes' rule, we get for the posterior ('p') e.g. how the evaluation of n ′ V I can be performed by sampling (in the R code we have indicated n ′ V as nV, and so on): mu <-0.944; sigma <-0.019; e.rep <-0.950 # Pfizer # mu <-0.935; sigma <-0.019; e.rep <-0.941 # Moderna # mu <-0.861; sigma <-0.075; e.rep <-0.900 # AZ LDSD # mu <-0.599; sigma <-0.090; e.rep <-0.621 # AZ SDSD # uncomment the following line to simulate a negligible uncertainty # sigma <-0.0001 r = (1-mu)*mu^2/sigma^2 -mu s = r*(1-mu)/mu cat(sprintf("r = %.2f, s = %.2f ", r, s)) ns <-1000000 nV <-100000 pA <-0.01 nA <-rbinom(ns, nV, pA) cat(sprintf("nA: mean+-sigma: %.1f +-%.1f ", mean(nA), sd(nA))) eps <-rbeta(ns,r,s) nvI <-rbinom(ns, nA, 1-eps) hist(nvI, nc=100, col='cyan', freq=FALSE, main='') cat(sprintf("nvI: mean+-sigma: %.1f +-%.1f ", mean(nvI), sd(nvI))) lines(rep(pA*nV*(1-mu), 2), c(0,1), col='red', lty=1, lwd=2) lines(rep(pA*nV*(1-e.rep), 2), c(0,1), col='red', lty=2, lwd=2) A number of hundred thousand vaccinated individuals has been used, with an absolutely hypothetical value of assault probability of 1 %. The script can also be used to simulate the effect of a precise value of ǫ, thus exactly corresponding to the efficacy, just setting its standard deviation to a very small value. The result of this idealized situation, using e.g. ǫ = 0.944, i.e. the mean resulting from the MCMC analysis of Pfizer data, 18 is shown in the top histogram of Fig. 6 . In this idealized case the distribution of n ′ V I is simply a binomial, i.e. n ′ V I ∼ Binom(n ′ V , p ov ) , with the overall probability p ov being equal to p ′ A · (1−ǫ), that is p ov = 0.00056 for the numbers reported in the script. Expected value and standard deviation are then 56.0 and 7.5, in perfect agreement with the Monte Carlo result shown in the upper panel of Fig. 6 . The effect of the uncertainty about ǫ is shown in the second (top-down) histogram of the same figure. As we can see, the distribution becomes remarkably wider and more asymmetric, with a right-hand skewness, effect of the left-hand skewness of f (ǫ). We see then, in the third histogram, the effect of a hypothetical uncertainty about p ′ A , modeled here with a standard deviation of σ(p ′ A ) = 0.1 × p ′ A (but this has to be understood really as an exercise done only to have an idea of the effect, because a reasonable uncertainty could indeed be much larger). Finally, including both sources of uncertainty, we get the histogram and the numbers at the bottom of the figure. Vertical lines show the predicted values for n ′ V I by using the MCMC mean value (solid line) and using the modal value (dashed lines). As a further step, following Ref. [16] (see in particular Secs. 5.2.1 and 5.3.1 there), let us try to get approximated formulae for the expected value and the standard deviation of n ′ V I . The idea, we shortly remind, is to start with the expected value and variance evaluated for the expected values of ǫ and p ′ A , and then make a 'propagation of uncertainty' by linearization (as if ǫ and p ′ A were 'systematics'). Here are the resulting formulae which we have checked to be in excellent agreement with the results from direct Monte Carlo. 19 6.1 More on the probability that a vaccinated person gets shielded from the virus -a Monte Carlo approach Section 4 has been devoted to explain the reason why the proper number to be reported as 'efficacy', meant as the probability that a vaccinated person is shielded spA <-0.1 * pA est.nvI <-nV * pA * (1-mu) est.sigma.nvI <-sqrt( nV * pA * (1-mu) * (1 -pA * (1-mu)) + (nV*(1-mu))^2 * spA^2 + (nV*pA)^2 * sigma^2 ) cat(sprintf("Approximated nvI: mean+-sigma: %.1f +-%.1f ",est.nvI,est.sigma.nvI)) from the virus, is the mean of the pdf of the model parameter ǫ. Having talked in this section about predicting the number of infects, we can use the same extended model of Fig. 5 in order to check the outcome of that reasoning. It is in fact enough to set n ′ V = 1 and p ′ A = 1 and analyze the result of the MCMC. Indeed, n ′ V A will be identically 1, in the sense that the person will be 'assaulted' with certainty, but the output n ′ V I can have now only two possible values, 0 (person not infected) or 1 (person infected). If ǫ were exactly known, and let us indicate it by ǫ K , the probability of n ′ V I = 0 would be ǫ K and that of n ′ V I = 1 would be 1 − ǫ K . A simple direct Monte Carlo would then produce a fraction of occurrences of n ′ V I = 0 around ǫ. If, instead, ǫ is unknown, then the values used in the bottom-left side of the network will be those occurring in the MCMC chain. Therefore, we reobtain the same result seen in Sec.4, i.e. that the relative frequency of the occurrence of n ′ V I = 0 will be equal to the mean of ǫ in the chain. This MCMC strategy offers a further argument, to which some practitioners might be more reactive, in support of the thesis that the number to be reported as 'efficacy' should be the average of the probability distribution of ǫ, rather than other possible summaries of the distribution. A last point we wish to address in this paper is related to the efficacy of the vaccines against disease severity, based on the data reported by Moderna and Pfizer (see Tab. 1): in the first case 30 people got a 'severe form' out of 185 infectees in the control group; none of the severe cases occurred in the group of 11 vaccinated infectees; in the second the corresponding numbers are 9 in 162 and 1 in 8. In order to analyze this further pieces of information we can simply extend the Bayesian network of Fig. 1 adding four nodes (see Fig. 7 ): n V Is and n P Is represent the number of infected individuals that got the disease in a severe form, while p V Is and p P Is represent the corresponding probability of developing severe diseases in either group. Again we can use the binomial distributions: ffe~dbeta(1,1) # 5. eff <-1 -ffe # 6. pS_P~dbeta(1,1) # 7. pS_V~dbeta(1,1) # 8. nS.V~dbin(pS_V,nV.I) # 9. nS.P~dbin(pS_P,nP.I) # 10. } However, looking at the Bayesian network of Fig. 7 , it is clear that, being n V I and n P I observed nodes, i.e. n V I and n P I are just data, the bottom nodes involving (n V Is , n V I , p V Is ) and (n P Is , n P I , p P Is ) get 'separated' from the rest of the network. In other words there is no flow of evidence from (n V Is , p V Is ), or from (n P Is , p P Is ), to the rest of the network. Therefore the problem has a rather simple solution. In particular, using uniform priors for p V Is and p P Is , we get p V Is ∼ Beta(n V Is +1, n V I −n V Is + 1) p P Is ∼ Beta(n P Is +1, n P I −n P Is + 1) . Nevertheless, for didactic purposes, we report in Fig. 8 As far as the control groups are concerned (green, narrower histograms and curves in Fig. 8) , the results from Moderna and Pfizer data are quite different. In both cases we get rather narrow distributions, as expected from the rather large numbers involved (and therefore the central values are close to the proportion of severe cases with respect to the total number). But they differ substantially and, using mean and standard deviation to summarize them, we get Moderna: p P Is = 0.170 ± 0.028 Pfizer: p P Is = 0.055 ± 0.018 , differing by 0.115 ± 0.033. In particular, what results by analyzing Moderna data is in good agreement with the estimates by the WHO, which states that "approximately 10-15% of cases progress to severe disease, and about 5% become critically ill" [23] . Obviously, we are not in the position to make any statement about the reason of the disagreement, that could be due to the different populations on which the trials have been performed. But, if this were the case, one could have some doubts about the validity of comparing the different sets of data. We can only leave the question to epidemiology experts. Passing to the vaccine groups (red, broader histograms and curves in Fig. 8 ), the crude summaries in terms of mean and standard deviation give Moderna: p V Is = 0.077 ± 0.071 Pfizer: p V Is = 0.200 ± 0.121 . If we consider just the mean values, it seems that there is a very large difference between the performances of the two vaccines. However their difference is hardly significant, being their difference −0.12 ± 0.14. Also the fact that the two curves looks substantially different should not impress a statistics expert if she knew from which number they have been derived. The main difference is due to the fact that Moderna has zero severe disease against one of Pfizer. If they had had one too, which would not be surprising in a hypothetical 'second trial', the curve would be not dissimilar from the Pfizer one (dashed line in Fig. 8 ), vanishing at zero and yielding p V Is = 0.154 ± 0.096 -practically the same, an experienced physicist would say. It is quite evident that it is not possible to draw general conclusions on the efficacy of the generic vaccine on softening the impact of the disease. But the real point we wish to highlight, given the spread of distributions, is that we do not have enough data for drawing sound conclusion. For this reason we wish to point out that, even for this aspect, press releasing a 100% efficacy and not dealing with the unavoidable uncertainties and their impact when applied to decision making is quite misleading. Figure 8 indeed shows that the probability of becoming severely ill in the vaccine group is definitively low but, quite obviously, not zero and with a relevant overlap with the distribution evaluated for the control group. In this paper, spurred by the press releases first and then by the published results about the performance of the candidate vaccines, we have reanalyzed the published data with the help of a Bayesian network processed with MCMC methods. The aim was that of obtaining, for each data set, the pdf of the model variable ǫ, whose meaning is the following: if we assume for it an exact value, then the probability that a 'virus assaulted individual' gets infected would be exactly 1−ǫ. Our results are in excellent agreement with the published ones, if the latter are properly interpreted. In fact, it turns out that they coincide with the modes of the respective pdf f (ǫ), although we maintain that if a single number has to be provided, especially to the media, it should be the mean of the distribution. This has, in fact, the meaning of the probability that a vaccinated person will be shielded by the virus, taking into account the unavoidable uncertainty on ǫ, fully described by f (ǫ). And this is what really matters to define the efficacy of a vaccine. Willing however to reduce the result of our analysis to a single number to be compared with the released ones, we get respectively and with reasonable rounding, 93%, 94%, 94%, 86% and 60% for Moderna-1, Moderna-2, Pfizer, AstraZeneca (LDSD) and AstraZeneca (SDSD), versus 94.5%, 94.1%, 95.0%, 90.0% and 62.1% of Tab. 2. Therefore, as far as these numbers are concerned, there is then a substantial agreement of the outcome of our analysis with the published results, simply because when a probability distribution is unimodal and rather symmetrical then mode and mean tend to coincide. Therefore, with respect to the main results, our contribution to this point is mainly methodological. The probability theory based result is, instead, at odds with Moderna 100% claimed efficacy against severe disease, for which a more sound 92% should be quoted. In order to summarize more effectively the probability distribution of ǫ with just a couple of numbers, our preference goes to mean and standard deviation, although we also report the bounds of the central 95% credible interval. This interval is, once more, in excellent agreement not only with the Pfizer result, which has also published an interval having exactly the same meaning, but also with the uncertainty intervals of the other companies, although they provide confidence intervals, which, strictly speaking, do not have the same meaning of the credible intervals. This is not a surprise to us. We are in fact aware that in many practical cases not only frequentistic point estimates are equivalent to the mode of the posterior distribution of the model parameter, if a uniform prior was used in a Bayesian analysis based on the same data, but also '95% confidence intervals' tend to be, numerically, equal to the 95% probable intervals. 20 This takes us to the question of the priors. As just reminded, a uniform prior over ǫ has been used in our analysis. But, clearly, not because we believe that the efficacy of a vaccine that has reached the Phase-3 trial has the same chance to be close to zero or to one. Instead, a flat prior can be considered a convenient practical choice, if the inference is dominated by the data, as it is often the case. Moreover, the advantage of a uniform prior in parametric inference is that the effect of an informative prior reflecting the opinion of experts can be taken into account at a later time. This 'posterior use of priors' might sound paradoxical, but it is important to remind that in Bayesian inference 'prior' does not indicate time order but rather 'based on the status of knowledge without taking into account the new piece of information' provided by the data entering the specific analysis. Having, in fact, prior and likelihood symmetric and peer roles in the Bayes' rule, an expert can use her prior to 'reshape' the posterior pdf resulting from data analysis, if a flat prior was used, without having to ask to repeat the analysis (see Ref. [16] for details). This reshaping becomes particularly simple if the prior is modeled by a convenient, rather flexible probability distribution such as the Beta. In fact, as we have seen, the pdf of ǫ starting from a flat prior tends to resemble a Beta. The same is then true if also the prior is modeled by a Beta (this is related to the well known fact of the Beta being the conjugate prior of a binomial distribution, even though our model is not just a simple binomial). These observations are particularly interesting because they lead to Eqs. (25) - (26) , which, together with Eqs. (23) -(24), allow to take easily into account the expert priors. In fact, if the priors are rather vague, r 0 and s 0 appearing in Eqs. (25) - (26) are quite small (although larger than one, since ǫ = 0 and ǫ = 1 areà priori reasonably ruled out) and, in particular, smaller that r F and s F . 21 If, instead, the expert has a strong opinion about the possible values of ǫ, then r 0 and s 0 will play a role in her posterior, and in that of her community, if its members trust her. 22 Coming back to the way to summarize f (ǫ), our preference goes to its mean and standard deviation. The mean because, as reminded above, has the meaning of efficacy for vaccine treated people not having been involved in the trial, if all possible values of ǫ are taken into account. The standard deviation because it is mostly convenient, together with the mean, to make use of the result of the inference in further considerations and in 'propagation of uncertainties', thanks to general probability rules. We have just reminded the utility of mean and standard deviation in order to re-obtain f (ǫ), under the hypothesis that it is almost a Beta distribution, making use of Eqs. (23) - (24) . The application related to 'propagation of uncertainties' that we 22 Remember that Science and its popularization is based on a long chain of rational beliefs [13] . Think for example to the reasons you believe in gravitational waves, provided that you really believe that they could exist and that they have finally being detected on Earth starting from 2015 -our trusted source ensures us that 67 of them have been 'observed' so far [24] . have seen in the paper has to do with predicting the number of individual that will get infected in a group that it is going to be vaccinated. This is a problem in probabilistic forecasting and the number of interest is uncertain for several reasons. There is, unavoidably, the uncertainty deriving from the inherent binomial distribution, having assumed an assault probability p ′ A in the new population. But also the uncertainties on the values of p ′ A and ǫ play a role, that can even be dominant with respect to the 'statistical' effect of the binomial. Now, the probability distribution of the number of vaccinated infectees can be evaluated extending our basic Bayesian network, as we have done here. But we have also stressed the importance of having approximated expressions, based on linearization, for its expected value and standard deviation. And such expressions, thus obtained considering ǫ and p ′ A as 'systematic' [13] , depend then on their mean and standard deviation. For example the contribution to σ(n ′ V I ) due to the uncertain ǫ, and then to be added 'in quadrature' to the other sources of uncertainty, is given by n ′ V · E(p ′ A ) · σ(ǫ). This gives at a glance the contribution to the global uncertainty without having to run a Monte Carlo. 23 Finally, a comment on how to possible reduce σ(ǫ) is in order. In fact, the relative uncertainty on ǫ depends on the small number of vaccinated infectees. This suggests that the quality of its 'measurement' could be improved, keeping constant the total numbers of individuals entering the trial, if the size of the placebo group is reduced. We have checked by simulation that reducing it by 2/3, thus having about a factor of five between the two groups, σ(ǫ) is expected to be reduced by about 20%. Not much indeed, but this different sharing of individuals in the two groups would have the advantage of increasing the chance of detecting side effect of the vaccine, basically at the same cost. A re-analysis of the 2020 data on vaccine trials by Pfizer, Moderna and AstraZeneca has been performed with didactic intent and focusing on uncertainties and on what most effectively should be reported as outcome. With this respect, we appreciate once more the role provided by Bayesian networks in clarifying the meaning of each variable entering the model and its relation with the others. In particular, we make a distinction between the 'efficacy variable' ǫ and the efficacy to be reported to the scientific community and to the general public as probability that a newly vaccinated person will shielded from Covid-19. The uncertain values of ǫ are characterized by the pdf f (ǫ), obtained in this work by MCMC, and whose mean value has exactly the meaning of efficacy, in analogy to Laplace's rule of succession to quantify the probability of a future event. We have stressed not only the importance of providing the most complete information of f (ǫ), whose graphical representation provides better than many words its uncertainty, but also that of summarizing the results (if only two numbers have to be chosen) with mean and standard deviation of the distribution. In fact, although the probability distribution of ǫ is what is really needed to the development of predictive what-if scenarios, mean and standard deviation are the most useful quantities to be used for further approximated evaluations. With regards to the comparison with the published result, the efficacy values obtained in this analysis as mean of the probability distribution of ǫ are in good agreement with them for the reasons just reminded in the previous section, with the exception of the 100% claim that speaks for itself. This agreement plays an important role in validating the causal model on which the present analysis is based, indeed very simple (but not simplistic!), that can be used by students and researchers to repeat the analysis -for this reason pieces of programming code have also been provided. We have learned in the meanwhile that the strategy of choosing the vaccine sample much larger than the placebo one, which we suggest at the end of the previous section, has been used in real life trials by Sputnik [25] and AstraZeneca [26] , which used vaccine vs placebo sample ratios of 3:1 and 2:1, respectively. for the C4591001 Clinical Trial Group (Pfizer), Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine AstraZeneca) Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: an interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK Inferring vaccine efficacies and their uncertainties Guide to the expression of uncertainty in measurement The Waves and the Sigmas (To Say Nothing of the 750 GeV Mirage) JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling Absolutely remarkable': No one who got Moderna's vaccine in trial developed severe COVID-19 New York Times, Moderna Applies for Emergency F.D.A. Approval for Its Coronavirus Vaccine Moderna chiede l'autorizzazione all'uso di emergenza alla Fda e all'Ema. L'azienda americana annuncia i risultati della fase 3 Moderna applies for FDA authorization for its Covid-19 vaccine Checking individuals and sampling populations with imperfect tests R: A language and environment for statistical computing. R Foundation for Statistical Computing rjags: Bayesian Graphical Models using MCMC An Essay towards solving a Problem in the Doctrine of Chance. By the late Rev. Mr. Bayes Mémoire sur la probabilité des causes par lesévénements English quotes from the classical translation by coronaviruse/risk-comms-updates/update-36-long-term-symptoms Cumulative Event/Candidate Rate Plot O1-O3 Safety and efficacy of an rAd26 and rAd5 vector-based heterologous prime-boost COVID-19 vaccine: an interim analysis of a randomised controlled phase 3 trial in Russia AZD1222 US Phase III trial met primary efficacy endpoint in preventing COVID-19 at interim analysis