key: cord-1045999-v034mb7z
authors: Lin, Dan-Yu; Zeng, Donglin; Mehrotra, Devan V; Corey, Lawrence; Gilbert, Peter B
title: Evaluating the Efficacy of COVID-19 Vaccines
date: 2020-12-19
journal: Clin Infect Dis
DOI: 10.1093/cid/ciaa1863
sha: 9e5ab8bccc48a407ecdb6f885a1ea9719ccd5d00
doc_id: 1045999
cord_uid: v034mb7z

A large number of studies are being conducted to evaluate the efficacy and safety of candidate vaccines against novel coronavirus disease 2019 (COVID-19). Most Phase 3 trials have adopted virologically confirmed symptomatic COVID-19 disease as the primary efficacy endpoint, although laboratory-confirmed SARS-CoV-2 infection is also of interest. In addition, it is important to evaluate the effect of vaccination on disease severity. To provide a full picture of vaccine efficacy and make efficient use of available data, we propose using SARS-CoV-2 infection, symptomatic COVID-19, and severe COVID-19 as dual or triple primary endpoints. We demonstrate the advantages of this strategy through realistic simulation studies. Finally, we show how this approach can provide rigorous interim monitoring of the trials and efficient assessment of the durability of vaccine efficacy.

There is an urgent need to develop effective vaccines against SARS-CoV-2, the virus causing the global COVID-19 pandemic. Several candidate vaccines have shown strong immune responses and acceptable safety profiles and have moved rapidly into large-scale Phase 3 trials. 1−8 As of December 8, 2020, a total of 28 Phase 3 trials on 13 with Matrix-M1™ adjuvant. 6 The vaccine regimens have generally protected against COVID-19 disease endpoints in animal models 5 and have induced binding and neutralizing antibody responses to vaccine-insert Spike proteins in most vaccine recipients, exceeding response levels seen in convalescent sera. 2−4,6 These antibody marker endpoints are of types that have been accepted as surrogate endpoints for many approved vaccines. 9 Most Phase 3 trials have adopted virologically confirmed symptomatic COVID-19 illness as the primary efficacy endpoint, although laboratory-confirmed SARS-CoV-2 infection is also acceptable. 10 A vaccine may be much more effective in preventing severe than mild COVID-19, so the effect of vaccination on severe COVID-19 should also be evaluated. 10 However, a trial using a severe COVID-19 endpoint would likely require a large sample size.

We propose using SARS-CoV-2 infection, symptomatic COVID-19, and severe COVID-19 as triple primary endpoints, or using either infection and symptomatic COVID-19 or symptomatic COVID-19 and severe COVID-19 as dual primary endpoints, with the choice depending on the expected incidence of the three events and on the targeted vaccine efficacy for each endpoint. This approach incorporates more evidence on vaccine efficacy into decision making than using any one of the three events alone as the primary endpoint. It can improve statistical power and increase the likelihood of meeting vaccine success criteria, thus accelerating the discovery and licensure of effective vaccines.

We consider the endpoints of SARS-CoV-2 infection, symptomatic COVID-19, and severe COVID-19, referring to them as infection, disease, and severe disease, respectively. Suppose that a large number of individuals are randomly assigned to vaccine or placebo and that the trial records whether or not each participant has developed each of the three endpoints by the end of follow-up, as well as their length of follow-up.

We formulate the effect of the vaccine on each of the three endpoints through a Poisson model. Although investigators are mainly interested in the first occurrence of each event, the Poisson modeling approach provides a reasonable approximation to the data because the event rates for all three endpoints are relatively low. We define vaccine efficacy as the proportionate reduction in the event rate among vaccinated relative to unvaccinated individuals, that is, one minus the ratio of the event rate in the vaccine group to that in the placebo group.
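As a minimal sketch of this definition, assuming each arm's total event count and total person-time are available (the function and argument names are hypothetical), the Poisson maximum likelihood estimate of vaccine efficacy is one minus the ratio of the two observed event rates:

```python
def poisson_ve(cases_vaccine: int, persontime_vaccine: float,
               cases_placebo: int, persontime_placebo: float) -> float:
    """Estimate VE = 1 - (event rate, vaccine arm) / (event rate, placebo arm).

    Under a Poisson model the MLE of each arm's rate is
    (number of events) / (total person-time), so the MLE of VE is a
    plug-in of the two observed rates.
    """
    if cases_placebo == 0:
        raise ValueError("VE is undefined when the placebo arm has no events")
    rate_vaccine = cases_vaccine / persontime_vaccine
    rate_placebo = cases_placebo / persontime_placebo
    return 1.0 - rate_vaccine / rate_placebo


# Hypothetical counts: 40 versus 100 cases over equal person-time
print(round(poisson_ve(40, 5_000.0, 100, 5_000.0), 3))
```

Because only counts and person-time enter, the estimate does not require exact event times.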

The criteria for claiming that a vaccine is successful should be strict enough to ensure worthwhile efficacy. A vaccine whose efficacy is higher than 50% can markedly reduce the incidence of COVID-19 among vaccinated individuals and help to build herd immunity. An advisory panel convened by the World Health Organization (WHO) recommended 50% vaccine efficacy for at least 6 months post vaccination as a minimal criterion to define an efficacious vaccine. 11 The US Food and Drug Administration (FDA) guidance defines vaccine success criteria as a point estimate of vaccine efficacy of at least 50% and an interim-monitoring-adjusted lower bound of the 95% confidence interval exceeding 30%. 10 The FDA guidance criteria do not specify a minimum period of follow-up. However, given that current vaccine development aims to identify efficacious vaccines within several months of trial initiation, the expectation seems to be reliable evidence of vaccine efficacy over approximately 6 months, consistent with the WHO recommendation.
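To make the FDA-style success criteria concrete, here is a hedged sketch that checks a point estimate and a 95% confidence lower bound. It swaps in a simple Wald interval on the log rate ratio and applies no interim-monitoring adjustment, so it illustrates the shape of the criteria rather than the interval the guidance requires; the function names and counts are hypothetical.

```python
import math
from statistics import NormalDist


def ve_with_wald_ci(d_v, t_v, d_p, t_p, level=0.95):
    """VE point estimate and Wald CI based on the log rate ratio.

    Assumption (not from the paper): SE(log RR) = sqrt(1/d_v + 1/d_p),
    the usual large-sample approximation for two Poisson counts.
    """
    log_rr = math.log((d_v / t_v) / (d_p / t_p))
    se = math.sqrt(1.0 / d_v + 1.0 / d_p)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # VE = 1 - RR, so the upper RR limit gives the lower VE limit
    ve = 1.0 - math.exp(log_rr)
    ve_lower = 1.0 - math.exp(log_rr + z * se)
    ve_upper = 1.0 - math.exp(log_rr - z * se)
    return ve, ve_lower, ve_upper


def meets_fda_style_criteria(ve, ve_lower, point_min=0.5, lower_min=0.3):
    """Point estimate >= 50% and CI lower bound > 30% (unadjusted here)."""
    return ve >= point_min and ve_lower > lower_min


# Hypothetical counts: 40 vaccine-arm and 100 placebo-arm cases
ve, lo, hi = ve_with_wald_ci(40, 5_000.0, 100, 5_000.0)
print(f"VE {ve:.3f} (95% CI {lo:.3f} to {hi:.3f}):",
      meets_fda_style_criteria(ve, lo))
```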

Many Phase 3 trials specify assessment of vaccine efficacy over longer-term follow-up as an important study objective. The FDA guidance document states that "A lower bound ≤ 30% but > 0% may be acceptable as a statistical success criterion for a secondary efficacy endpoint, provided that secondary endpoint hypothesis testing is dependent on the success on the primary endpoint." This statement refers to earlier FDA guidance on a fixed-sequence testing method, 12 under which vaccine efficacy is tested against a sequence of secondary endpoints in a pre-defined order, with each test performed at the same significance level (one-sided type I error of 2.5%) and testing moving to the next endpoint only after success on the previous one. The WHO Solidarity Trial protocol 13 specifies symptomatic COVID-19 through longer-term follow-up (ideally 12 months or more) and severe COVID-19 over the same time frame as secondary endpoints. Following these guidelines and precedents, we consider hypothesis testing of vaccine efficacy over 12 months as a secondary analysis, using a null hypothesis value that is less stringent than the 30% value used for the primary analysis. This recognizes that it is more difficult for a vaccine to provide 12-month than 6-month protection and that even moderate vaccine efficacy through 12 months could be an important characteristic of a COVID-19 vaccine. In sum, we consider both the assessment of vaccine efficacy against the primary endpoints over 6 months, using a 30% null hypothesis, and the assessment of vaccine efficacy against the same endpoints over 12 months, using a 0% or 15% null hypothesis.
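The fixed-sequence rule described above can be sketched in a few lines. The function name and the p-values in the usage lines are hypothetical; hypotheses must be supplied in their pre-specified order.

```python
def fixed_sequence_test(p_values, alpha=0.025):
    """Fixed-sequence testing of pre-ordered hypotheses.

    Every hypothesis is tested at the full one-sided level `alpha`, in the
    pre-specified order; testing stops at the first non-rejection.
    Returns the indices of the rejected hypotheses.
    """
    rejected = []
    for index, p in enumerate(p_values):
        if p > alpha:
            break  # later hypotheses are not tested at all
        rejected.append(index)
    return rejected


# Hypothetical one-sided p-values, in the pre-specified testing order
p_values = [0.001, 0.012, 0.040, 0.0005]
print(fixed_sequence_test(p_values))
```

The last p-value is small but is never tested, because the third hypothesis was not rejected; this is what makes the full level available at every step without inflating the type I error.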

For each of the three endpoints, we obtain the maximum likelihood estimator of vaccine efficacy under the Poisson model. In addition, we calculate the score statistic for testing the null hypothesis that vaccine efficacy is at most a prespecified lower limit, say 30%, against the alternative hypothesis that it exceeds that limit; we divide the score statistic by its standard error to obtain a test statistic that is approximately standard normal under the null hypothesis.
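A minimal sketch of such a score statistic, assuming a two-arm Poisson model in which the vaccine multiplies the placebo event rate by θ = 1 - VE. Profiling out the common baseline rate gives an efficient score for log θ that matches the conditional binomial score for the number of vaccine-arm cases given the total; the standardization below uses the null variance. This is a reconstruction for illustration, not necessarily the exact statistic in the paper's appendix, and the function name is hypothetical.

```python
import math


def poisson_score_z(d_v, t_v, d_p, t_p, ve0=0.3):
    """Standardized score statistic for H0: VE <= ve0 versus H1: VE > ve0.

    d_v, d_p: event counts in the vaccine and placebo arms.
    t_v, t_p: total person-time in the vaccine and placebo arms.
    At the null boundary theta0 = 1 - ve0, given the total D = d_v + d_p,
    d_v is binomial with success probability p0 below.
    """
    theta0 = 1.0 - ve0
    p0 = theta0 * t_v / (theta0 * t_v + t_p)
    total = d_v + d_p
    if total == 0:
        return 0.0  # no events, no evidence either way
    # (expected minus observed vaccine cases) over its null standard error;
    # large positive values favour VE > ve0
    return (total * p0 - d_v) / math.sqrt(total * p0 * (1.0 - p0))


# Hypothetical data: 40 vaccine-arm and 100 placebo-arm cases, equal person-time
z = poisson_score_z(40, 5_000.0, 100, 5_000.0, ve0=0.3)
print(f"Z = {z:.2f}")
```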

We propose to test all three null hypotheses, adjusting the significance threshold for the three test statistics to control the overall type I error at the desired level. We consider a vaccine to be successful if any of the three null hypotheses is rejected. We describe this multiple testing method in greater detail in Supplemental Appendix 1, where we also describe a sequential testing procedure to determine which of the three null hypotheses should be rejected.
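One way to obtain such an adjusted threshold, sketched here under stated assumptions, is to find the common critical value c for which the largest of the correlated standard-normal statistics exceeds c with probability equal to the one-sided level. The sketch estimates c by Monte Carlo for an assumed correlation matrix; the helper names and the correlation values (chosen as if the infection, disease, and severe-disease counts were nested) are hypothetical illustrations, not the derivation in Supplemental Appendix 1.

```python
import math
import random
from statistics import NormalDist


def cholesky(corr):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(corr)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(corr[i][i] - s)
            else:
                lower[i][j] = (corr[i][j] - s) / lower[j][j]
    return lower


def max_z_critical_value(corr, alpha=0.025, n_sim=100_000, seed=2020):
    """Monte Carlo threshold c with P(max_k Z_k > c) = alpha, Z ~ N(0, corr)."""
    k = len(corr)
    if k == 1:
        return NormalDist().inv_cdf(1.0 - alpha)  # unadjusted threshold
    lower = cholesky(corr)
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # correlated draw: Z = L e
        z = [sum(lower[i][j] * e[j] for j in range(i + 1)) for i in range(k)]
        maxima.append(max(z))
    maxima.sort()
    return maxima[math.ceil((1.0 - alpha) * n_sim) - 1]


# Hypothetical correlations among the infection, disease, and severe-disease
# statistics (illustrative values, not taken from the paper)
corr = [[1.00, 0.77, 0.35],
        [0.77, 1.00, 0.45],
        [0.35, 0.45, 1.00]]
c_corr = max_z_critical_value(corr)
c_bonf = NormalDist().inv_cdf(1.0 - 0.025 / 3)  # Bonferroni threshold
print(f"correlation-aware threshold {c_corr:.3f} vs Bonferroni {c_bonf:.3f}")
```

The correlation-aware threshold is lower than the Bonferroni threshold, which is where the extra power comes from.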

In the sequential testing procedure, we order the three hypotheses according to the order of the three observed test statistics, from the most extreme observed value to the least extreme. We test the first null hypothesis using the significance threshold from the aforementioned multiple testing procedure. If the first null hypothesis is rejected, we test the second null hypothesis by applying the multiple testing procedure to the remaining two test statistics. If the second null hypothesis is rejected, we test the last null hypothesis by using the unadjusted significance threshold.
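A minimal sketch of the step-down procedure just described follows. The Monte Carlo max-Z threshold stands in for whatever numerical method the authors use, and the function names and the correlation matrix are hypothetical.

```python
import math
import random
from statistics import NormalDist


def max_z_critical_value(corr, alpha=0.025, n_sim=40_000, seed=7):
    """Monte Carlo threshold c with P(max_k Z_k > c) = alpha, Z ~ N(0, corr)."""
    k = len(corr)
    if k == 1:
        return NormalDist().inv_cdf(1.0 - alpha)  # unadjusted threshold
    lower = [[0.0] * k for _ in range(k)]  # Cholesky factor of corr
    for i in range(k):
        for j in range(i + 1):
            s = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                lower[i][j] = math.sqrt(corr[i][i] - s)
            else:
                lower[i][j] = (corr[i][j] - s) / lower[j][j]
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        maxima.append(max(sum(lower[i][j] * e[j] for j in range(i + 1))
                          for i in range(k)))
    maxima.sort()
    return maxima[math.ceil((1.0 - alpha) * n_sim) - 1]


def step_down_test(z, corr, alpha=0.025):
    """Visit hypotheses from largest to smallest observed Z.

    At each step the threshold is the max-Z critical value for the
    hypotheses not yet rejected; the last one uses the unadjusted
    threshold. Testing stops at the first non-rejection.
    """
    remaining = sorted(range(len(z)), key=lambda i: z[i], reverse=True)
    rejected = []
    while remaining:
        sub_corr = [[corr[i][j] for j in remaining] for i in remaining]
        best = remaining[0]
        if z[best] <= max_z_critical_value(sub_corr, alpha):
            break
        rejected.append(best)
        remaining = remaining[1:]
    return rejected


# Hypothetical correlations (infection, disease, severe disease) and Z values
corr = [[1.00, 0.77, 0.35],
        [0.77, 1.00, 0.45],
        [0.35, 0.45, 1.00]]
print(step_down_test([3.1, 2.6, 1.4], corr))
```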

This sequential testing procedure is at least as powerful as the multiple testing procedure in identifying the endpoints against which the vaccine is efficacious, because its first step uses the same threshold and its later steps use less stringent thresholds. Both the proposed multiple testing and sequential testing methods properly account for the correlations among the test statistics and are therefore more powerful than the conventional Bonferroni correction and related multiplicity adjustments that do not account for these correlations.

If the effects of a vaccine are expected to be similar across the three endpoints, then we can enhance statistical power by combining the evidence of the vaccine effects on the three endpoints and performing a single test of overall vaccine efficacy. Specifically, we propose taking the sum of the three score statistics and dividing the sum by its standard error to create a standard-normal test statistic. We refer to this method as the combined test (Supplemental Appendix 1); it is in the same vein as combining estimators of a common effect in meta-analysis. 14 Instead of the triple primary endpoints, we may consider the dual primary endpoints of infection and disease if severe disease is very rare, or the dual primary endpoints of disease and severe disease if the vaccine is expected to be only weakly effective against infection.
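The combined test reduces to a single ratio once the null covariance matrix of the score statistics is available. The sketch assumes that matrix is supplied (the paper derives it in Supplemental Appendix 1); the function name and all numbers in the usage lines are hypothetical.

```python
import math


def combined_score_test(scores, cov):
    """Combined test: (U_1 + ... + U_K) / SE(U_1 + ... + U_K).

    `scores` are the per-endpoint score statistics and `cov` their
    covariance matrix under H0; Var(sum) is the sum of all entries of cov.
    """
    total = sum(scores)
    var_total = sum(sum(row) for row in cov)
    return total / math.sqrt(var_total)


# Hypothetical null variances and correlations for infection, disease,
# and severe-disease score statistics
variances = [45.0, 27.0, 5.5]
corr = [[1.00, 0.77, 0.35],
        [0.77, 1.00, 0.45],
        [0.35, 0.45, 1.00]]
cov = [[corr[i][j] * math.sqrt(variances[i] * variances[j]) for j in range(3)]
       for i in range(3)]
scores = [18.0, 14.0, 3.5]  # hypothetical observed scores
print(round(combined_score_test(scores, cov), 2))
```

When two score statistics are perfectly correlated with equal variances, summing them adds no information and the combined statistic equals the single-endpoint statistic; the gain comes from endpoints that carry partly independent evidence.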

The above methods can be readily modified to test only two of the three endpoints.

It is desirable to examine the accumulating data from a Phase 3 trial periodically, so that the trial can be stopped early if sufficient evidence emerges that the vaccine is highly effective or that it is only weakly effective. To obtain rigorous stopping boundaries, we need the joint distribution of the test statistics over the interim looks. In Supplemental Appendix 2, we show that the proposed test statistics over interim looks are jointly normal with an independent-increments structure, so that standard methods for interim analyses 15−18 can be applied.
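Because of the independent-increments structure, efficacy boundaries can be computed as if the score process were Brownian motion in information time. The sketch swaps in Monte Carlo simulation for the numerical integration usually used to compute group sequential boundaries, and assumes a Lan-DeMets O'Brien-Fleming-type alpha-spending function with three equally spaced looks; these choices and the function names are illustrative assumptions, not the authors' monitoring plan.

```python
import math
import random
from statistics import NormalDist


def obf_spending(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type spending: alpha spent by information fraction t."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * NormalDist().cdf(z / math.sqrt(t))


def mc_efficacy_boundaries(fractions, alpha=0.025, n_paths=200_000, seed=11):
    """Monte Carlo efficacy boundaries for Z statistics at interim looks.

    Under H0 the score process behaves like Brownian motion B(t) in
    information time t, and the look-k statistic is Z_k = B(t_k) / sqrt(t_k).
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        b, previous, zs = 0.0, 0.0, []
        for t in fractions:
            b += rng.gauss(0.0, math.sqrt(t - previous))  # independent increment
            previous = t
            zs.append(b / math.sqrt(t))
        paths.append(zs)

    boundaries, alive, spent = [], paths, 0.0
    for k, t in enumerate(fractions):
        increment = obf_spending(t, alpha) - spent
        spent += increment
        n_cross = round(increment * n_paths)  # paths allowed to cross at look k
        values = sorted((p[k] for p in alive), reverse=True)
        boundary = values[n_cross]
        boundaries.append(boundary)
        # only paths that have not yet crossed continue to the next look
        alive = [p for p in alive if p[k] <= boundary]
    return boundaries


boundaries = mc_efficacy_boundaries([1 / 3, 2 / 3, 1.0])
print([round(b, 2) for b in boundaries])
```

For three equally spaced looks at one-sided α = 2.5%, the estimates should land near the commonly tabulated values of roughly 3.71, 2.51, and 1.99, up to Monte Carlo error in the first boundary, which rests on only about 20 crossing paths.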

We first conducted a series of simulation studies to compare the performance of the proposed methods with that of a single primary endpoint in evaluating short-term vaccine efficacy. We assigned 27,000 subjects to vaccine or placebo at a 1:1 ratio. We assumed that subjects were enrolled at a constant rate over a 2-month period and that vaccine efficacy was evaluated 6 months after the first subject was enrolled. We let 1% of the placebo subjects acquire infection, 0.6% experience disease, and 0.12% develop severe disease (Supplemental Appendix 3). These event proportions were based on an assumed annualized incidence of about 1.5% for symptomatic COVID-19 disease in the placebo group, together with the assumptions that about 40% of infections are asymptomatic and that about 20% of symptomatic COVID-19 cases will be severe. We set the vaccine efficacy for disease, denoted by VE_D, to 60%; the vaccine efficacy for infection, denoted by VE_I, to 40%, 50%, 55%, or 60%; and the vaccine efficacy for severe disease, denoted by VE_S, to 60%, 70%, 80%, or 90% (Supplemental Appendix 3). For each combination of VE_I, VE_D, and VE_S, we simulated 100,000 datasets. (The average number of cases for each endpoint is easily calculated; for example, there are approximately 189 cases of infection, 113 cases of disease, and 23 cases of severe disease under VE_I = VE_D = VE_S = 0.6.)

In each dataset, we tested the null hypothesis that the vaccine efficacy is at most 30% against the alternative hypothesis that it is greater than 30% at the one-sided nominal significance level of 2.5%. Table 1 summarizes the power of the various methods for testing this null hypothesis of no worthwhile efficacy. Using disease as the single endpoint has 80% power under VE_D = 60%; indeed, we chose the sample size and the disease rate in the placebo group to achieve this power, which serves as the benchmark for the other methods. When VE_I is equal to or slightly below VE_D, the single endpoint of infection is more powerful than the single endpoint of disease (e.g., 96% versus 80% power under VE_I = VE_D = 60%) because infection is more frequent than disease. Owing to its low incidence, the single endpoint of severe disease has poor power unless VE_S is very high (e.g., 69% and 91% power under VE_S = 80% and 90%, respectively). The combined test for the dual endpoints of infection and disease and the combined test for the triple endpoints are substantially more powerful than using disease as the single endpoint when VE_I is similar to VE_D (e.g., 94% and 93% power for the two combined tests versus 80% power for the single endpoint of disease under VE_I = VE_D = VE_S = 60%). The combined test for the dual endpoints of disease and severe disease is more powerful than the single endpoint of disease when VE_S is high (e.g., 93% versus 80% power under VE_S = 90%). The combined test is more powerful than multiple testing for the dual endpoints of disease and severe disease, but the opposite is true for the dual endpoints of infection and disease and for the triple primary endpoints when VE_I is low. The proposed multiple testing method is appreciably more powerful than the Bonferroni correction.
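The expected case counts quoted above and the benchmark single-endpoint powers can be roughly checked with a simplified simulation. This sketch is an assumption-laden approximation, not the authors' simulation code: it draws each arm's total cases as a Poisson count with the stated 6-month proportions, ignores staggered enrollment, assumes equal person-time in the two arms (reasonable when events are rare), and uses the conditional score statistic for H0: VE ≤ 30%. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def expected_total_cases(p_placebo, ve, n_per_arm=13_500):
    """Placebo cases n*p plus vaccine cases n*p*(1 - VE)."""
    return n_per_arm * p_placebo * (2.0 - ve)


def poisson_draw(lam, rng):
    """Knuth's multiplication sampler; adequate for the modest means used here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def score_z(d_v, d_p, ve0):
    """Score Z for H0: VE <= ve0, assuming equal person-time in both arms."""
    p0 = (1.0 - ve0) / ((1.0 - ve0) + 1.0)
    total = d_v + d_p
    if total == 0:
        return 0.0
    return (total * p0 - d_v) / math.sqrt(total * p0 * (1.0 - p0))


def simulated_power(p_placebo, ve, ve0=0.3, n_per_arm=13_500,
                    n_sims=10_000, alpha=0.025, seed=2020):
    """Monte Carlo power of a single-endpoint test at one-sided level alpha."""
    rng = random.Random(seed)
    mean_placebo = n_per_arm * p_placebo
    mean_vaccine = mean_placebo * (1.0 - ve)
    critical = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    for _ in range(n_sims):
        d_v = poisson_draw(mean_vaccine, rng)
        d_p = poisson_draw(mean_placebo, rng)
        if score_z(d_v, d_p, ve0) > critical:
            rejections += 1
    return rejections / n_sims


for label, p in [("infection", 0.01), ("disease", 0.006), ("severe disease", 0.0012)]:
    print(label, round(expected_total_cases(p, 0.6)))
power_disease = simulated_power(0.006, ve=0.6)
power_infection = simulated_power(0.01, ve=0.6)
print(f"power: disease {power_disease:.3f}, infection {power_infection:.3f}")
```

Under these simplifications the disease-only power should come out near 80% and the infection-only power near 96%, in line with the Table 1 values quoted in the text.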

To investigate the ability of the proposed methods to detect long-term vaccine efficacy, we extended the maximum follow-up time in the above simulation studies from 6 months to 12 months. We assumed that the event proportions for infection, disease, and severe disease in the placebo group over the 12-month period were double those of the 6-month period. We reduced all values of vaccine efficacy by 30% to reflect the waning of vaccine efficacy against each endpoint over time. We tested the null hypothesis that the vaccine efficacy is 0% versus the alternative hypothesis that it is greater than 0% at the nominal significance level of 2.5%. The results are summarized in Table 2.

Again, the proposed methods can substantially improve statistical power.

We have presented a simple and rigorous framework to consider the totality of evidence when evaluating the benefit of a COVID-19 vaccine in reducing SARS-CoV-2 infection, symptomatic COVID-19, and severe COVID-19. The proposed methods are more robust to different scenarios of vaccine efficacy than the use of a single primary endpoint. We recommend using the combined test to provide an overall assessment of worthwhile vaccine efficacy, then using the sequential test (Supplemental Appendix 1) to determine the endpoints against which the vaccine is efficacious.

If a vaccine is more effective in preventing severe than mild COVID-19, then using symptomatic COVID-19 and severe COVID-19 as dual primary endpoints will be more powerful than using either of the two events as a single primary endpoint. If the vaccine efficacy for infection is nearly as high as that for disease, then using infection, symptomatic COVID-19, and severe COVID-19 as triple primary endpoints will be the most powerful.

Most Phase 3 trials have targeted 90% power for detecting 60% (short-term) vaccine efficacy against COVID-19 disease. The actual power may be lower if the vaccine is less effective or the disease incidence is lower than anticipated, or at an interim analysis. In our simulation studies, using disease as the single primary endpoint had only 80% power, whereas the proposed methods boosted the power above 90% in several scenarios (e.g., 93% to 94% for the combined tests under VE_I = VE_D = VE_S = 60%).

19 as the sole primary endpoint, assessing severe COVID-19 as a secondary endpoint, and assessing a composite burden-of-disease endpoint as either a secondary or an exploratory endpoint. 19 Under such a plan with a fixed-sequence strategy, hypothesis testing on secondary endpoints would be permitted only if the result on the primary endpoint is statistically significant. 12 In the likely scenario that VE_S is higher than VE_D, using disease and severe disease as dual primary endpoints will be more powerful than using disease as the sole primary endpoint and thus may accelerate the discovery and deployment of effective vaccines.

We have focused on vaccine trials for populations enriched with high-risk individuals (e.g., front-line health-care personnel, factory workers, older adults, people with underlying health conditions), in which the risks for infection, disease, and severe disease are all appreciable.

In generally healthy populations, such as college students, the majority of infections are asymptomatic, and severe disease is rare. For such settings, power can be maximized by using the dual primary endpoints of infection and disease.

We have used Poisson models instead of Cox proportional hazards models for several reasons. First, there are considerable inaccuracies in determining the event times, especially the infection time; the Poisson modeling approach requires only knowledge of whether or not the event has occurred by the end of follow-up. Second, Poisson models are simpler than Cox models, both conceptually and computationally. Because the event rates are relatively low, the two modeling approaches should provide similar results. 20 We fitted both Poisson and Cox models in our simulation studies, and the power of the two approaches was nearly identical (Supplemental Appendix 3).

We have emphasized hypothesis testing based on score statistics. In Supplemental Appendix 4, we extend our work to general Poisson regression, which can be used to estimate vaccine efficacy, construct confidence intervals, compare multiple vaccines, and accommodate baseline risk factors (e.g., age, gender, race, occupation, co-morbidity). Baseline risk factors can have a major impact on the occurrence of SARS-CoV-2 infection, symptomatic COVID-19, and severe COVID-19. In addition, some participants in COVID-19 vaccine efficacy trials may become unblinded through the use of available diagnostic tests, and at some point the trials themselves may become unblinded. Covariate adjustment in the analysis of vaccine efficacy against endpoints during post-unblinding follow-up is important for minimizing bias due to potential differences in exposure to SARS-CoV-2 between the vaccine and placebo arms.
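A minimal sketch of covariate-adjusted Poisson regression with a log person-time offset, fitted by Newton-Raphson with NumPy. The aggregated strata, the single binary risk factor, and the function name are hypothetical; the sketch shows the model form only and omits the extensions in Supplemental Appendix 4. The adjusted vaccine efficacy is one minus the exponentiated vaccine coefficient.

```python
import numpy as np


def fit_poisson_regression(X, events, persontime, max_iter=50, tol=1e-10):
    """Poisson regression with a log(person-time) offset, via Newton-Raphson.

    Model: log E[events] = log(persontime) + X @ beta, with X[:, 0] = 1.
    Returns the coefficients and their model-based covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(events, dtype=float)
    offset = np.log(np.asarray(persontime, dtype=float))
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / np.exp(offset).sum())  # start at the pooled rate
    for _ in range(max_iter):
        mu = np.exp(offset + X @ beta)
        score = X.T @ (y - mu)
        information = X.T @ (X * mu[:, None])
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(offset + X @ beta)
    covariance = np.linalg.inv(X.T @ (X * mu[:, None]))
    return beta, covariance


# Hypothetical aggregated data; columns are intercept, vaccine, high_risk
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
events = [10, 30, 4, 12]
persontime = [1000.0] * 4
beta, cov = fit_poisson_regression(X, events, persontime)
ve = 1.0 - np.exp(beta[1])
se = np.sqrt(cov[1, 1])
ve_lo, ve_hi = 1.0 - np.exp(beta[1] + 1.96 * se), 1.0 - np.exp(beta[1] - 1.96 * se)
print(f"adjusted VE = {ve:.3f}, 95% CI {ve_lo:.3f} to {ve_hi:.3f}")
```

The hypothetical counts are exactly multiplicative (a rate ratio of 0.4 for vaccine and 3 for the risk factor), so the fit recovers those values; real data would add sampling noise.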

We have developed our methods in order to accelerate the discovery, characterization, and licensure of effective COVID-19 vaccines. An important function of the Phase 3 trials is to continue the unblinded follow-up of the vaccine and placebo groups after definite evidence of short-term efficacy has emerged, so as to assess duration of protection and improve precision for assessment of prevention of severe disease, as well as for assessment of safety. Duration of vaccine efficacy is an influential parameter in models of population impact of deployed vaccines, and understanding of how vaccine efficacy wanes over time is essential to deciding whether or not booster vaccinations may be required and to estimating the optimal timing of the boosts. The ability of our framework to use the joint distribution of the estimators to provide more precise confidence intervals around the three vaccine efficacy parameters than existing methods (e.g., Bonferroni correction) that do not account for the correlation of endpoints is advantageous regardless of whether one, two, or three endpoints are selected as primary.


Effect of an inactivated vaccine against SARS-CoV-2 on safety and immunogenicity outcomes: Interim analysis of 2 randomized clinical trials

An mRNA vaccine against SARS-CoV-2: preliminary report

Phase 1/2 study of COVID-19 RNA vaccine BNT162b1 in adults

Safety and immunogenicity of the ChAdOx1 nCoV-19 vaccine against SARS-CoV-2: a preliminary report of a phase 1/2, single-blind, randomised controlled trial

Single-shot Ad26 vaccine protects against SARS-CoV-2 in rhesus macaques

Phase 1-2 Trial of a SARS-CoV-2 Recombinant Spike Protein Nanoparticle Vaccine

Draft landscape of COVID-19 candidate vaccines

COVID-19 vaccine trials should seek worthwhile efficacy

Updates on immunologic correlates of vaccine-induced protection

Development and Licensure of Vaccines to Prevent COVID-19: Guidance for Industry

Multiple Endpoints in Clinical Trials: Guidance for Industry

An international randomised trial of candidate vaccines against COVID-19

On the relative efficiency of using summary statistics versus individual-level data in meta-analysis

A multiple testing procedure for clinical trials

Discrete sequential boundaries for clinical trials

The B-value: a tool for monitoring data

Group Sequential Methods With Applications to Clinical Trials

Clinical endpoints for evaluating efficacy in COVID-19 vaccine trials

The efficiency of the proportions test and the logrank test for censored survival data

The authors are grateful to Yu Gu and Bridget I. Lin for assistance and to two referees for constructive comments. This work was supported by the National Institutes of Health. 
