key: cord-0729579-fd3bupeu
authors: Barda, Noam; Dagan, Noa
title: The role of observational studies based on secondary data in studying SARS-CoV-2 vaccines
date: 2021-12-11
journal: Clin Microbiol Infect
DOI: 10.1016/j.cmi.2021.12.006
sha: 381738a8f1f2d901cd61c857ec5b5c456b9a40ed
doc_id: 729579
cord_uid: fd3bupeu

The Covid-19 pandemic is an ongoing public health crisis of enormous proportions. Of the many public health interventions taken to mitigate and contain the pandemic's effects, SARS-CoV-2 vaccines constitute a critical measure. As new vaccines are rapidly developed and the pandemic continues to evolve, with new variants appearing and receding, many important scientific questions naturally arise. These questions demand valid and timely answers to inform policy, and randomized controlled trials (RCTs) can provide only some of them. Observational studies based on secondary data (registry and clinical data originally collected for other purposes) are being used to fill these gaps.

In this issue of the journal, Vokó et al. report a study that uses Hungarian nationwide centralized vaccine and outcome registries to estimate and compare the effectiveness of five different SARS-CoV-2 vaccines against SARS-CoV-2 infection and Covid-19-related death, using regression to adjust for differences between the study populations. The study was performed during a period when the alpha (B.1.1.7) variant was dominant in Hungary.

This interesting study has several strengths. First, the reality in Hungary, in which several vaccines were used concurrently, allows the authors to study these different interventions in a single setting. This is particularly interesting because Hungary deployed, and this study includes, SARS-CoV-2 vaccines that have yet to be approved by the EMA and have not been as extensively studied in real-world settings.
Second, the use of nationwide linked registries, which include exposure and outcome information, leads to a large sample size with little-to-no selection of individuals; this allows for precise estimation (i.e., with narrow confidence intervals) that should also generalize well to other locales. Last, the authors perform multiple sensitivity analyses to explore different modelling options and time period definitions, finding their estimates robust to these choices.

This study also has certain limitations, which the authors candidly acknowledge. First, without access to data on patients' baseline health status and health behaviors, the adjustment performed is minimal, likely resulting in residual confounding. This is particularly concerning because, as the authors state, "some vaccines were specifically indicated for use in elderly and chronically ill patients". Second, the authors opt to model all the follow-up time available for each patient at once, implicitly assuming a constant effect throughout the study period. Given the growing evidence of waning immunity, we know this assumption does not hold. Last, as has now been discussed extensively in other studies (1), it is likely that not all infections are identified, and that this misclassification occurs differentially between treatment groups. While these limitations are important, the effects observed, which are congruent with previous studies, are informative and provide a valuable addition to existing evidence.

RCTs are the gold standard for medical scientific evidence. Owing to the benefits of randomization and adherence to strict protocols, the internal validity of the evidence generated by such trials is high. This validity underscores their crucial role in directing public health policy and regulatory approval of therapeutics.
However, due to logistical and ethical considerations, RCTs cannot answer all scientific questions of interest, necessitating observational studies, today mostly based on secondary data sources. This was never more evident than in research on Covid-19 vaccines, where RCTs invariably answered the initial questions regarding vaccine efficacy and safety, and observational studies proceeded to address a wide range of resulting issues, including real-world effectiveness, safety with regard to rare adverse events, waning immunity, effectiveness against different variants, effectiveness in pregnant women and more. The two types of study are complementary. For example, safety signals originally generated from RCTs (2) were further explored using observational studies with larger sample sizes (3). In a more methodologically interesting example, RCTs established the early period following vaccination as a negative control outcome (in which no effect of the vaccine is expected) (2), which was then used by observational studies to detect bias (4).

The main advantages of observational studies based on secondary data are the large sample size, which allows exploration of rare outcomes relating to vaccine effectiveness (e.g., severe disease and death) and vaccine safety, and exploration of outcomes within subgroups; the fact that they include less selected populations, such as individuals with unstable chronic conditions and pregnant women; their reflection of real-life conditions in which adherence to predetermined protocols may be less strict; the integration with different sources of data, which allows studying varying outcomes and adjusting for many confounders; and the immediate availability of the data with little additional cost, which allows rapid answers to emerging questions (e.g., waning immunity (5)).

Observational studies that are based on secondary data sources also have important disadvantages for vaccine studies, as they do for other questions.
The main disadvantage concerns the quality of the data, which are not collected for research purposes, and for which quality assurance measures vary between locales and times. To address this, the researcher must be intimately familiar with the data collection and curation mechanisms, to know which data are trustworthy and what corrections need to be made. A second major disadvantage is that secondary data sources amplify the usual threats to validity of observational studies. Specific variables that were not measured (e.g., behavioral factors) often make control of confounding impossible; measurement error is rife because, for example, individuals select whether to be tested (6); selection bias is a possibility, as when including only individuals infected, tested or admitted to the hospital (7); and missing data is a constant threat. There are no easy solutions to any of these problems. Negative controls can be particularly helpful in these circumstances, and often complex methodology and many bias analyses are required to ensure valid conclusions.

Despite these disadvantages, the crucial role played by observational studies based on secondary data during the Covid-19 pandemic cannot be ignored. As more high-quality data infrastructures are created, integrating data on background clinical and sociodemographic characteristics with real-time data on relevant exposures (e.g., vaccination) and outcomes (e.g., infections, hospitalizations, deaths), the role of such studies is projected to grow, both within the context of infectious disease epidemiology and beyond. This emphasizes a goal that healthcare organizations must strive for: improving clinical databases so they can be more reliable for research.

ND reports institutional grants to Clalit Research Institute from Pfizer outside the submitted work and unrelated to COVID-19, with no direct or indirect personal benefits.

(1) Protection of BNT162b2 Vaccine Booster against Covid-19 in Israel. The New England Journal of Medicine.
(2) Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine. The New England Journal of Medicine.
(3) Safety of the BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Setting. The New England Journal of Medicine.
(4) BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting. The New England Journal of Medicine.
(5) Waning Immunity after the BNT162b2 Vaccine in Israel. The New England Journal of Medicine.
(6) Effectiveness of a third dose of the BNT162b2 mRNA COVID-19 vaccine for preventing severe outcomes in Israel: an observational study. The Lancet.
(7) Collider bias undermines our understanding of COVID-19 disease risk and severity.