Cohort profile: Early pandemic evaluation and enhanced surveillance of COVID-19 (EAVE II) database
Authors: Mulholland, Rachel H; Vasileiou, Eleftheria; Simpson, Colin R; Robertson, Chris; Ritchie, Lewis D; Agrawal, Utkarsh; Woolhouse, Mark; Murray, Josephine L K; Stagg, Helen R; Docherty, Annemarie B; McCowan, Colin; Wood, Rachael; Stock, Sarah J; Sheikh, Aziz
Journal: Int J Epidemiol
Date: 2021-04-16
DOI: 10.1093/ije/dyab028

In December 2019, a novel coronavirus (SARS-CoV-2), the cause of COVID-19, emerged in Wuhan, China. The World Health Organization (WHO) declared COVID-19 a pandemic on 11 March 2020. 1 The UK implemented a national lockdown shortly afterwards, on 23 March 2020. As of 9 December 2020, according to WHO, this highly infectious virus had infected more than 67 million people and caused over 1.5 million deaths worldwide. 2 There is a growing body of evidence on the epidemiology of the condition, risk factors for poor outcomes and the effects of interventions. [3] [4] [5] [6] [7] [8] [9] The rapid generation of robust data is crucial to monitor, understand and mitigate the effects of COVID-19.

The Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 (EAVE II) database creates a national, real-time prospective cohort using Scotland's health data infrastructure. It is designed to describe the epidemiology of COVID-19 infection, patterns of healthcare use and outcomes, and to provide insights into the effectiveness and safety of vaccines and treatments for COVID-19. 10 This work builds on EAVE (Early Estimation of Vaccine and Anti-Viral Effectiveness), an established Scottish cohort for assessing seasonal and pandemic influenza vaccines and antivirals.
11, 12 EAVE is a dormant pandemic protocol that forms part of the National Institute for Health Research (NIHR) Pandemic Preparedness Research Portfolio. It has been the platform for previous studies of influenza vaccines and antivirals. [12] [13] [14] [15] [16]

Who is in the cohort?

We obtained ethical approval from the National Research Ethics Service Committee, Southeast Scotland 02. This prospective cohort study contains all 5.4 million individuals registered with a general practitioner (GP) in Scotland as of 23 February 2020. According to the National Records of Scotland (NRS) 2019 mid-year estimates, this covers around 98-99% of the Scottish population. 10, 17 A map of the baseline EAVE II cohort by National Health Service (NHS) Health Board shows that most of the cohort live in the central belt of Scotland (Figure 1). The baseline population was summarised by sex and age group (as of 23 February 2020). It was also summarised by deprivation, using the Scottish Index of Multiple Deprivation (SIMD), 18 and by urban/rural status, using the Scottish Government Urban Rural Classification. 19 SIMD is a Scotland-specific, area-based measure of deprivation built on seven domains. Lower quintiles represent more deprived areas. 18

These primary care records are linked to other data sources from out-of-hours, emergency and secondary care. They are also linked to further datasets, such as laboratory testing data, registration and mortality data, self-reported data and enhanced surveillance data, for example from the COVID-19 Clinical Information Network (CO-CIN). Linkage uses the Community Health Index (CHI) number, the unique identifier used by NHS Scotland. A CHI number is allocated to every Scottish resident registered with a GP and to every patient who receives care in Scotland, including non-residents. 10 Table 2 summarises these data sources, and Figure 2 shows how they are linked.
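To illustrate the CHI-based linkage described above, the following is a minimal Python sketch, not the EAVE II pipeline. It assumes hypothetical extracts (`gp_cohort`, `lab_tests`, `deaths`) stored as lists of dictionaries, each carrying a `chi` field. It left-joins the secondary sources onto the GP-registered cohort so that every cohort member is retained. The field names, the `LinkedRecord` class, the `link_by_chi` helper and the example CHI numbers are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class LinkedRecord:
    """One cohort member with linked data (hypothetical schema)."""
    chi: str                   # Community Health Index number, the linkage key
    sex: str
    age: int
    tests: List[Tuple[str, bool]] = field(default_factory=list)  # (test date, positive?)
    death_date: Optional[str] = None

def link_by_chi(gp_cohort, lab_tests, deaths):
    """Left-join lab and death extracts onto the GP cohort by CHI.

    Every GP-registered person is kept, even with no linked rows. Rows from
    other sources whose CHI is not in the cohort are returned separately
    instead of being silently dropped.
    """
    linked: Dict[str, LinkedRecord] = {
        row["chi"]: LinkedRecord(row["chi"], row["sex"], row["age"])
        for row in gp_cohort
    }
    unmatched: List[Tuple[str, str]] = []
    for row in lab_tests:
        record = linked.get(row["chi"])
        if record is None:
            unmatched.append(("lab", row["chi"]))
        else:
            record.tests.append((row["date"], row["positive"]))
    for row in deaths:
        record = linked.get(row["chi"])
        if record is None:
            unmatched.append(("death", row["chi"]))
        else:
            record.death_date = row["date"]
    return linked, unmatched

if __name__ == "__main__":
    # Fictional CHI numbers, for illustration only.
    gp = [{"chi": "0101200001", "sex": "F", "age": 34},
          {"chi": "0202500002", "sex": "M", "age": 71}]
    labs = [{"chi": "0202500002", "date": "2020-04-02", "positive": True},
            {"chi": "9999999999", "date": "2020-04-03", "positive": False}]
    linked, unmatched = link_by_chi(gp, labs, deaths=[])
    print(len(linked), unmatched)  # 2 [('lab', '9999999999')]
```

Returning unmatched rows, rather than discarding them, keeps linkage rates visible instead of assuming perfect coverage.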
The cohort therefore includes specific groups of interest that are studied in EAVE II sub-studies, such as COVID-19 in Pregnancy in Scotland (COPS). 37 These groups are also used in work investigating ethnic and social inequalities in COVID-19.

Table 2 (continued), selected data sources:
Serology data: all serology data will be provided by the 'Seroprevalence' work carried out and commissioned by the COVID-19 Enhanced Surveillance cell of Public Health Scotland (PHS). 10
Genome sequencing data: positive laboratory RT-PCR swab samples for COVID-19 will also be sent to national sequencing centres, where 500 COVID-19 genome sequences will be performed. 10

How often have they been followed up?

The baseline GP records will be updated every 3 to 6 months, where possible. The first update, in early 2021, will contain the COVID-19-specific GP codes created during the pandemic, which were therefore missed in the initial extract. This update will capture information on COVID-19-related appointments, vaccinations, therapies and adverse effects following vaccination. Information on influenza will also be included to support analyses of the effectiveness and safety of COVID-19-specific and pre-existing vaccines, therapies and treatments. To facilitate UK-wide research, QCOVID groups will also be added, allowing validation of the QCOVID 'living' risk prediction model in the Scottish population. 38 Information on shielded risk groups will also be included. This will allow assessment of the impact of COVID-19 on those at highest risk of severe illness, for whom the UK government recommended 12 weeks of self-isolation (shielding) on 23 March 2020. 39 Linked datasets and the underlying GP data will be updated daily, weekly or monthly, as available and necessary (see Table 3). Participants who transfer between GP practices within Scotland remain in the cohort. Participants who die or permanently leave Scotland (and deregister from general practice) drop out of the cohort.
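The follow-up rules above can be expressed as a small censoring function. GP transfers within Scotland keep a participant in the cohort, while death or permanently leaving Scotland ends follow-up. This is a hedged Python sketch only. The event labels (such as `"left_scotland"`) and the helpers `follow_up_end` and `person_days` are hypothetical. The example study window (1 March to 10 November 2020) is simply the outcome period reported for this cohort.

```python
from datetime import date
from typing import Iterable, Tuple

# Hypothetical event labels; the real EAVE II data may code these differently.
END_OF_FOLLOW_UP = {"death", "left_scotland"}

def follow_up_end(events: Iterable[Tuple[date, str]], study_end: date) -> Tuple[date, str]:
    """Return (end date, reason) for one participant.

    Events such as "transfer_within_scotland" are ignored: the participant
    stays in the cohort. The earliest death or permanent departure from
    Scotland on or before study_end ends follow-up; otherwise follow-up is
    administratively censored at study_end.
    """
    for event_date, kind in sorted(events):
        if kind in END_OF_FOLLOW_UP and event_date <= study_end:
            return event_date, kind
    return study_end, "administrative"

def person_days(start: date, events: Iterable[Tuple[date, str]], study_end: date) -> int:
    """Days at risk from start to the end of follow-up."""
    end, _ = follow_up_end(events, study_end)
    return (end - start).days

if __name__ == "__main__":
    start, study_end = date(2020, 3, 1), date(2020, 11, 10)
    history = [(date(2020, 6, 1), "left_scotland"),
               (date(2020, 4, 1), "transfer_within_scotland")]
    print(follow_up_end(history, study_end))  # (datetime.date(2020, 6, 1), 'left_scotland')
    print(person_days(start, history, study_end))  # 92
```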
The characteristics of individuals lost to follow-up will be compared with those of individuals remaining in the cohort. Missing data will be reported for each variable.

Combining these rich data sources provides a wealth of information on the natural history of the condition and on patients' journeys across Scotland's NHS. Table 4 gives a high-level summary of the key available data. Permission to link these datasets was received in May 2020, and linked data began to flow in June 2020. The initial GP data extract contained the baseline cohort and the EAVE II risk groups. These risk groups were based on those for seasonal influenza, because the specific risk groups for COVID-19 were not yet known when the data were extracted. They include comorbidities and household characteristics, for example an indicator of living in a care home. The EAVE II risk group dataset contained more individuals than the baseline cohort, with over-representation in certain populations. This probably resulted from some residents being registered at multiple GP practices, and from the inclusion of people who had left Scotland or were visitors.

Table 2 (continued), selected data sources:
Census: residents in Scotland are asked to fill in a census questionnaire every 10 years, providing information on their demographic (e.g. ethnicity), socioeconomic, health and other circumstances. NRS will provide data from the latest Scottish Census, held in 2011. 28
Derived data, COVID-19 shielding patient list: uses a combination of primary and secondary care data held by Public Health Scotland to derive groups considered to be at high risk if they contract COVID-19. 29
Births and pregnancy-related data, Scottish Birth Record (SBR): a web-based system developed on NHSnet to ensure that every baby born in Scotland has one record, which acts as the foundation for future information collection. The system has been implemented to varying degrees in all Scottish hospitals providing midwifery and/or neonatal care. 30
To correct for this, weights were calculated by comparing the age and sex profile of the EAVE II cohort with the 2019 NRS mid-year population estimates for Scotland. 17 A weighted summary of the number of EAVE II risk groups is shown in the Supplementary material, along with the individual risk groups (Supplementary Table S1, available as Supplementary data at IJE online). All of the following analyses used these weights.

Initial exploration showed that, with increasing age, the proportion of people living in less deprived SIMD quintiles increased slightly (Figure 3). The number of risk groups also increased with age (Figure 3). These patterns did not differ substantially between sexes (Figure 3). During follow-up of COVID-19 outcomes from 1 March to 10 November 2020, the following outcomes were recorded in the EAVE II cohort (percentages are of the total cohort):
- 835 803 (15.4%) individuals were tested.
- 57 416 (1.1%) had a positive test.
- 9847 (0.2%) were hospitalized with COVID-19.
- 5350 (0.1%) were admitted to an intensive care unit (ICU) or died with COVID-19 on the death certificate.
- 4726 (0.1%) died with COVID-19 on the death certificate.

Breaking these outcomes down by age and sex over the same period shows that higher proportions of older residents were tested and had a positive test (Figure 4). Older residents, particularly men, were also over-represented among the more severe outcomes (Figure 4). These age profiles were repeated by deprivation level (SIMD quintile), by number of risk groups and for the 20 most frequent individual risk groups in the EAVE II study (Supplementary material). Positive tests and more severe outcomes were proportionally higher in more deprived areas, among residents belonging to multiple risk groups and among those with comorbidities (Supplementary material, available as Supplementary data at IJE online).
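One common way to derive age-sex weights of this kind is post-stratification. Within each age-sex stratum s, the weight is w_s = N_s / n_s, where N_s is the NRS population estimate and n_s is the cohort count. Strata inflated by duplicate registrations therefore receive weights below 1. The paper does not state its exact formula, so the Python below is a sketch under that assumption. The stratum labels, the `age_sex_weights` and `weighted_proportion` helpers and the toy numbers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Stratum = Tuple[str, str]  # (sex, age band); the banding here is hypothetical

def age_sex_weights(cohort_strata: Iterable[Stratum],
                    population: Dict[Stratum, float]) -> Dict[Stratum, float]:
    """Post-stratification weight per stratum: population count / cohort count.

    Raises KeyError if a cohort stratum has no population estimate.
    """
    counts = Counter(cohort_strata)
    return {s: population[s] / n for s, n in counts.items()}

def weighted_proportion(strata: Sequence[Stratum], flags: Sequence[bool],
                        weights: Dict[Stratum, float]) -> float:
    """Weighted share of individuals whose flag is True (e.g. in >= 1 risk group)."""
    numerator = denominator = 0.0
    for s, flag in zip(strata, flags):
        w = weights[s]
        denominator += w
        if flag:
            numerator += w
    return numerator / denominator

if __name__ == "__main__":
    # Toy data: women are over-represented in the cohort relative to the population.
    strata = [("F", "70+")] * 4 + [("M", "70+")] * 2
    flags = [True, False, False, False, True, False]
    population = {("F", "70+"): 2, ("M", "70+"): 4}
    w = age_sex_weights(strata, population)
    print(w)  # {('F', '70+'): 0.5, ('M', '70+'): 2.0}
    print(round(weighted_proportion(strata, flags, w), 3))  # 0.417
```

By construction, the weights within each stratum sum to that stratum's population count. Weighted totals therefore reproduce the NRS age-sex profile, which is the property the weighting is meant to restore.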
Mapping the proportion of these outcomes by NHS Health Board showed that testing rates were high in the more rural northern and southern parts of Scotland, but the proportion of positive tests there was low (Figure 5). The central belt had a higher proportion of positive tests relative to its baseline population, and higher rates of more severe COVID-19 outcomes (Figure 5). All relevant R code scripts for the summary tables and figures will be made available on the EAVE II GitHub page [https://github.com/EAVE-II]. The page will also contain a data dictionary for the entire EAVE cohort, which will be updated as new data and linkages are added. We are currently developing a national risk prediction algorithm to identify risk factors for poor outcomes, i.e. hospitalisation and death from COVID-19. 10

What are the main strengths and weaknesses?

The EAVE II cohort should be widely generalizable to the Scottish population because it contains all individuals registered with GP practices in Scotland. The exceptions are homeless, itinerant or travelling groups, people in prison, and those institutionalized for mental health or other reasons. Regularly updating and monitoring the cohort over a long period will also be quick and cost-effective. This is because the underlying data sources are mainly routinely collected, quality assured and easily linkable using unique CHI numbers. In turn, insights can be kept up to date with the rapidly evolving pandemic. The completeness and coverage of the data, in terms of both population and breadth, are a further major strength. The key limitations are possible selection bias from excluded individuals, although these are estimated to make up under 2% of the Scottish population, and the risk of residual confounding in analytical epidemiological studies.
Considerable care will be needed when making inferences about the effectiveness of interventions, because comparisons will be non-randomized.

Can I get hold of the data? Where can I find out more?

Data can be accessed by contacting the corresponding author. For more information on the cohort, refer to the published EAVE II protocol. 10 The study findings will be presented at international conferences and published in peer-reviewed journals.

Supplementary data are available at IJE online.

Key Features
• The Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 (EAVE II) database creates a national, real-time prospective cohort using Scotland's health data infrastructure. It is designed to describe the epidemiology of COVID-19, patterns of healthcare use and outcomes, and to provide insights into the effectiveness and safety of vaccines and treatments for COVID-19. As far as we are aware, EAVE II is the first national end-to-end clinical surveillance platform for COVID-19 that predominantly uses routinely available data.
• This study contains all 5.4 million individuals registered with a GP in Scotland as of 23 February 2020, covering 98-99% of the Scottish population. These primary care records are linked to other data sources from out-of-hours, community, emergency and secondary care. They are also linked to data on registrations and mortality, laboratory testing, self-report and enhanced surveillance.
• These data will be updated throughout the course of the pandemic. Participants who die or permanently leave Scotland (and deregister from general practice) will drop out of the cohort.

References
Archived: WHO Timeline - COVID-19.
COVID-19 Resource Centre.
Access to OUP Resources on COVID-19.
Springer Nature. COVID-19 and COVID-19.
BMJ's Coronavirus (Covid-19) Hub.
PLOS. COVID-19 Pandemic.
Early pandemic evaluation and enhanced surveillance of COVID-19 (EAVE II): protocol for an observational study using linked Scottish national data.
The UK's pandemic influenza research portfolio: a model for future research on emerging infections.
Early estimation of pandemic influenza Antiviral and Vaccine Effectiveness (EAVE): use of a unique community and laboratory national data-linked cohort study.
Vaccine effectiveness in pandemic influenza - primary care reporting (VIPER): an observational study to assess the effectiveness of the pandemic influenza A (H1N1)v vaccine.
Effectiveness of H1N1 vaccine for the prevention of pandemic influenza in Scotland, UK: a retrospective observational cohort study.
Seasonal Influenza Vaccine Effectiveness (SIVE): an observational retrospective cohort study - exploitation of a unique community-based national-linked database to determine the effectiveness of the seasonal trivalent influenza vaccine.
Evaluating the effectiveness, impact and safety of live attenuated and seasonal inactivated influenza vaccination: protocol for the Seasonal Influenza Vaccination Effectiveness II (SIVE II) study.
Mid-2019 Population Estimates Scotland.
Scottish Index of Multiple Deprivation.
Scottish Government Urban Rural Classification.
Data and Intelligence. Primary Care Out of Hours Services.
Emergency Care Data Dictionary A-Z.
SMR00 - Outpatient Attendance.
Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study.
National Data Catalogue. Rapid Preliminary Inpatient Data (RAPID).
Search Criteria for Highest Risk Patients for Shielding (v4.0).
National Data Catalogue. Notification of Abortion Statistics (AAS).
National Data Catalogue. Child Health Systems Programme - School.
Information Services Division (Public Health Scotland). National Data Catalogue. Scottish Immunisation Recall System (SIRS).
COVID-19 in Pregnancy in Scotland (COPS): protocol for an observational study using linked Scottish national data.
Living risk prediction algorithm (QCOVID) for risk of hospital admission and mortality from coronavirus 19 in adults: national derivation and validation cohort study.
Scottish Government. Coronavirus (COVID-19): Shielding Advice and Support.
Air Quality in Scotland. Scotland's environment.