key: cord-0750580-vefbdwl1
authors: Anscombe, C.; Lissauer, S.; Thole, H.; Rylance, J.; Dula, D.; Menyere, M.; Kutambe, B.; van der Veer, C.; Phiri, T.; Banda, N. P.; Mndolo, K. S.; Mponda, K.; Phiri, C.; Mallewa, J.; Nyirenda, M.; Katha, G.; Cornick, J.; Mwandumba, H.; Gordon, S. B.; Jambo, K. C.; Feasey, N.; Blantyre COVID-Consortium; Barnes, K. G.; Morton, B.; Ashton, P. M.
title: Genomically informed clinical comparison of three epidemic waves of COVID-19 in Malawi
date: 2022-02-19
journal: nan
DOI: 10.1101/2022.02.17.22269742
sha: 59a6c01f5cf86cae7926ed253b2340538e2c6515
doc_id: 750580
cord_uid: vefbdwl1
Background Compared to the abundance of clinical, molecular, and genomic information available on patients hospitalised with COVID-19 disease from high-income countries, there is a paucity of data from low-income countries.
Methods We enrolled a cohort of patients with PCR-confirmed COVID-19 disease at Queen Elizabeth Central Hospital, the main hospital for southern Malawi, between July 2020 and September 2021. The recruitment period covered three waves of SARS-CoV-2 infections in Malawi. Clinical and diagnostic data were collected using the ISARIC clinical characterization protocol for COVID-19. The viral material from PCR-positive swabs was amplified with a tiling PCR scheme and sequenced using the MinION sequencer in Malawi. Consensus genomes were generated using the ARTIC pipeline and lineage assignment was performed using Pangolin.
Results Sequencing data showed that wave one was predominantly B.1 (8/11 samples), wave two consisted entirely of Beta variant of concern (VOC) (6/6), and wave three was predominantly Delta VOC (25/26). Patients presenting in the second and third waves had progressively fewer underlying chronic conditions, and patients in the third wave had a shorter time to presentation (2 days vs 5 in the original wave). Multivariable logistic regression demonstrated increased mortality in wave three, dominated by the Delta VOC, compared to previous waves (OR 6.6 [CI 1.1-38.8]).
Conclusions Patients hospitalised with COVID-19 in Blantyre during the Delta wave had more acute symptom onset, had fewer underlying conditions, and were more likely to die. Whilst we demonstrate the value of linking virus sequence data with clinical outcome data in a low-income setting, this study also highlights the considerable barriers to establishing sequencing capacity in a setting heavily affected by disruptions in supply chain and inequity of resource distribution.
Policy makers need robust data to inform the clinical and public health response to the COVID-19 pandemic. The International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) has developed a variety of tools and protocols to support the collection and analysis of data during the pandemic (1, 2, 3). These simplify the establishment of observational cohorts, and enable high-quality, harmonised, clinical research in response to emerging threats. At Queen Elizabeth Central Hospital (QECH), Blantyre, patients have been enrolled under the ISARIC Tier 1 protocol since April 2020 (4). We previously demonstrated that, in the first wave of infection, patients admitted to hospital with suspected COVID-19 who were PCR negative but IgG positive for SARS-CoV-2 had analogous immunological profiles to those who were PCR positive. These patients were less likely to receive COVID-19 specific treatments such as dexamethasone. Previously, however, there was limited sequencing capacity at our institution and no description of viral genomes was possible.
Genome sequencing has been essential to the global response to the COVID-19 pandemic. The early release of the Wuhan-1 genome sequence (5) enabled the development of specific diagnostic tests (6) and the design of mRNA vaccines, used to such great success in high-income countries (7, 8).
The evolution of the virus has led to the emergence of lineages designated variants of concern (VOCs), usually detected and defined by genome sequencing, and this has been one of the defining features of the pandemic to date (9, 10). These VOCs have caused further global waves of infection, with specific political and public health responses required for the Alpha, Beta, Delta and Omicron VOCs. Linking of genomic data to clinical and public health data is important in determining the impact of viral mutations on disease severity and outcomes, particularly in areas where resources are constrained and there are high rates of co-morbidity including HIV infection and TB (11). Here, we describe the sequencing of the SARS-CoV-2 genomes from swabs collected from adult patients admitted to the hospital with symptomatic COVID-19 during three sequential waves of the pandemic. We place clinical outcome data in pathogen genomic context, to improve our understanding of the genomic epidemiology of the SARS-CoV-2 pandemic in Blantyre, Malawi.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
We prospectively recruited adult patients (>18 years) using the tier one sampling strategy from the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) Clinical Characterisation Protocol (CCP) (3), as previously described (4). Patients were recruited at Queen Elizabeth Central Hospital (QECH), Blantyre, Malawi, which is a large referral hospital in Southern Malawi. During the recruitment period, patients with COVID-19 were cohorted in wards capable of providing oxygen therapy, but without capacity for invasive mechanical ventilation, intensive care facilities, continuous positive airways pressure (CPAP) or high flow oxygen.
Patients with suspected or confirmed SARS-CoV-2 infection were approached for informed consent with an aim to recruit within 72 hours of hospital admission. Respiratory samples (combined nasopharyngeal and oropharyngeal swab) and peripheral blood samples were collected at the point of patient recruitment. SARS-CoV-2 PCR diagnostic testing was carried out as previously described (4).
Clinical data were analysed using Stata V15.1 (StataCorp, Stata Statistical Software: Release 15, College Station, Texas, USA). Categorical variables were compared using Fisher's exact test. Continuous variables were tested for normality and appropriate statistical tests were applied; non-normally distributed measurements are expressed as the median [IQR] and were analysed by the Kruskal-Wallis test to compare clinical parameters across the three waves. The primary outcome variable was survival to hospital discharge. We selected the following covariates a priori to determine potential predictors of mortality: pandemic infection wave (W1: 04/2020 - 10/2020, W2: 11/2020 - 03/2021 and W3: 04/2021 - 08/2021); vaccine status; age; sex; HIV infection status; prior diagnosis of cardiac disease; prior diagnosis of diabetes mellitus; time from symptoms to hospital admission; respiratory rate; and SpO2. All the above variables are available at, or shortly after, hospital admission. Univariable and multivariable logistic regression analyses were fitted using the Stata "logistic" command to generate odds ratios and confidence intervals (data and code available in supplementary materials). The overall statistical significance of the difference in mortality between waves was assessed using a likelihood ratio test, comparing the univariable model against a null, intercept-only model and the full multivariable model against a null model with all covariates except for the categorical variable encoding the epidemic wave.
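As a rough illustration of the likelihood ratio test described above, the following Python sketch computes the test statistic from the log-likelihoods of two nested models and converts it to a p-value. It assumes that the three-level wave variable contributes two degrees of freedom, in which case the chi-squared upper-tail probability has the closed form exp(-x/2). The function names and the example log-likelihoods are hypothetical and are not taken from the study's Stata code.

```python
import math


def lr_statistic(loglik_null: float, loglik_full: float) -> float:
    """Likelihood ratio statistic for nested models: 2 * (logL_full - logL_null)."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom.

    For df = 2 (assumed here for a three-level categorical term) the
    survival function is exactly exp(-x / 2).
    """
    if x < 0:
        raise ValueError("statistic must be non-negative")
    return math.exp(-x / 2.0)


# Hypothetical log-likelihoods for models without and with the wave term
stat = lr_statistic(loglik_null=-100.0, loglik_full=-96.845)
p_value = chi2_sf_2df(stat)
print(f"LR statistic = {stat:.2f}, p = {p_value:.3f}")
```

Under this two-degrees-of-freedom assumption, a statistic of 6.31 corresponds to p of about 0.043, which is consistent with the values reported for the multivariable model in the Results.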
Exact binomial confidence intervals for the proportion of each genotype during each wave were calculated in R v4.1.0 (12) using the binom.test function.
SARS-CoV-2 molecular biology and genome sequencing
Samples were extracted using the Qiasymphony-DSP mini kit 200 (Qiagen, UK) with offboard lysis. Samples were then tested using the CDC N1 assay to confirm the Ct values before sequencing. The ARTIC V2 sequencing protocol was used until June 2021, after which we switched to the V3 protocol. ARTIC version 3 primers were used for the tiling PCR until we switched to the University of Zambia (UNZA) primer set that provided good results for Delta VOC in August 2021 (13). Initially two primer pools were used; however, a third pool was made for primer pairs that commonly had lower depth compared to the average (details in Supplementary Table 1). PCR cycling conditions were adapted to the new sequencing primers, with the annealing temperature changed to 60 °C. Sequencing was carried out with the Oxford Nanopore Technologies MinION sequencer. Samples that had poor coverage (<70%) with the ARTIC primer set were repeated with the UNZA primer set. Raw FAST5 data produced by the MinION were processed with Guppy v5.0.7. FAST5s were basecalled with guppy_basecaller, and basecalled FASTQs were assigned to barcodes using guppy_barcoder, including the `--require_barcodes_both_ends` flag. The per-sample FASTQ files were processed with the artic pipeline using the `medaka` option (14).
The lineage of each consensus genome was identified using pangolin with the following versions: pangolin v3.1.17, pangolearn 2021-12-06, constellations v0.1.1, scorpio v0.3.16, pango-designation used by pangoLEARN/Usher v1.2.105, pango-designation aliases v1.2.122 (15). Samples were reanalysed when the Pangolin database was updated. The run was repeated if there was contamination in the negative control.
To set reasonable Ct thresholds for selecting samples to sequence in future work, we plotted the true positive rate versus the false positive rate (i.e. ROC curves) for a range of Ct thresholds from 15 to 40, where the true positive rate was defined as the proportion of samples with a genome coverage >=70% that had a Ct below the threshold. The false positive rate was defined as the proportion of samples with a genome coverage <70% that had a Ct below the threshold. Code to calculate the values for the ROC curves is available here: https://gist.github.com/flashton2003/bb690261106dc98bb1ae5de8a0e61199.
Between July 2020 and September 2021, we recruited 245 adults with COVID-19, using the ISARIC Clinical Characterisation Protocol. Participant characteristics are given in Table 1. There were no significant differences in sex or median age between the waves (Table 1) 3.54-114.68) and admission during wave 3 (OR 6.59, CI: 1.11-38.85) were independently associated with increased mortality for our patient cohort. There was no contribution to outcome from vaccine status, sex, HIV infection, presence of co-morbidities, days from symptoms to admission or respiratory rate within the multivariable model (Table 2).
The multivariable likelihood ratio test for presence or absence of admission wave within the model demonstrated a significant effect (Chi-squared = 6.31, p = 0.043).
We sequenced 102 samples from 102 patients and obtained 43 genomes with more than 70% coverage at 20x depth (Supplementary Table 2). Low coverage of the genome (<70%) was related to low viral load. This was true for both ARTIC v3 and UNZA tiling PCR primer sets separately (Figure 1). Overall, the median Ct value of samples with <70% coverage was 30.7, compared with 24.5 for those above this threshold (Supplementary Table 2). ARTIC v3 produced significantly lower median genome coverage than UNZA for samples with Ct values less than 30 (68% vs 76%, Kolmogorov-Smirnov P-value = 0.0003). Characteristics of the sub-group of patients whose SARS-CoV-2 consensus genome had >=70% coverage are available in Supplementary Table 3. Successful sequencing was more likely in females, who formed only 34% of participants but gave rise to 63% of high coverage sequences.
We produced ROC curves showing the true positive rate and false positive rate at a range of Ct thresholds (Supplementary Figure 2). Based on visual inspection of these ROC curves, we chose Ct value thresholds of 28 for ARTIC v3 and 27 for UNZA, as they provided a balance between reducing wasted sequencing runs and generating as many sequences as possible for our purposes.
We observed three pangolin lineages among the 11 SARS-CoV-2 samples from wave 1.
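The Ct-threshold ROC calculation described in the Methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code (their own gist is linked in the Methods): the function name `roc_points` and the toy Ct/coverage pairs are hypothetical, and "below the threshold" is taken here to mean strictly less than.

```python
from typing import List, Tuple


def roc_points(samples: List[Tuple[float, float]],
               thresholds: range = range(15, 41),
               min_coverage: float = 70.0) -> List[Tuple[int, float, float]]:
    """Return (threshold, true_positive_rate, false_positive_rate) per Ct threshold.

    samples: (ct_value, genome_coverage_percent) pairs.
    TPR: share of samples with coverage >= min_coverage whose Ct is below the threshold.
    FPR: share of samples with coverage < min_coverage whose Ct is below the threshold.
    """
    good = [ct for ct, cov in samples if cov >= min_coverage]
    poor = [ct for ct, cov in samples if cov < min_coverage]
    if not good or not poor:
        raise ValueError("need at least one sample in each coverage group")
    points = []
    for t in thresholds:
        tpr = sum(ct < t for ct in good) / len(good)
        fpr = sum(ct < t for ct in poor) / len(poor)
        points.append((t, tpr, fpr))
    return points


# Toy data (hypothetical, for illustration only): (Ct, % genome coverage)
toy = [(18.2, 98.0), (22.5, 95.0), (24.5, 88.0), (27.9, 72.0),
       (26.0, 55.0), (30.7, 40.0), (33.1, 12.0), (36.4, 3.0)]
curve = roc_points(toy)
by_threshold = {t: (tpr, fpr) for t, tpr, fpr in curve}
print(by_threshold[28])
```

With the toy data, a threshold of 28 retains all four high-coverage samples (true positive rate 1.0) while admitting one of the four low-coverage samples (false positive rate 0.25). Plotting the false positive rate against the true positive rate across thresholds gives the ROC curve, from which a threshold can be chosen by inspection.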
We established a platform for genome sequencing and analysis in Blantyre, Malawi and used it to sequence SARS-CoV-2 from a cohort of patients hospitalised with COVID-19 to investigate whether and how variants of concern (VOCs) influenced clinical outcomes. The first wave was predominantly B.1 and B.1.1. All successfully sequenced cases during the second wave were caused by Beta VOC. Whilst the number of successfully sequenced cases from the second wave was low, our data are consistent with data reported to GISAID from other researchers in Malawi confirming the dominance of Beta VOC in the second wave, whilst the Delta VOC dominated the third wave. Age ≥ 70 and SpO2 ≤ 87% at admission were independently associated with increased risk of death within both univariable and multivariable analyses. Patients in our cohort presented with fewer chronic medical conditions (cardiac disease and diabetes) in the second and third waves but were more likely to be administered treatments such as steroids and antibiotics. This may reflect increased adherence to local treatment guidelines and improved clinical experience in managing COVID-19, and/or it may indicate that the Beta and Delta VOCs were associated with more severe illness in otherwise healthy individuals (16). Time to hospital presentation was significantly shorter in the third wave, potentially suggesting that disease progression was more rapid, that patients were more aware of the need to present to hospital earlier, or that people had higher trust in the ability of the healthcare system to manage COVID-19.
Multivariable analysis demonstrated that in-patient mortality amongst the recruited cohort was higher during the third/Delta VOC wave, compared to other waves (17, 18, 19). Throughout the study there was no invasive or non-invasive ventilatory support available for COVID-19 patients and no access to interleukin-6 antagonists, which are recommended for severe disease by the WHO (since July 2021). For clinical comparisons, our recruited cohort represented a sample of those presenting to hospital, mediated by clinical decisions and guidelines which changed over time. Given this, and given population-level changes in health-seeking behaviour, caution is warranted in attributing excess mortality to genetic variant alone. However, studies from other settings have demonstrated increased hospitalisation or death in patients infected with the Delta VOC compared to other genetic lineages (17, 19). There is a paucity of linked clinical data and sequencing data from LMIC settings, despite such data being a hugely valuable resource that provides contextually useful information. This finding supports ongoing research and upscaling of sequencing capacity, and highlights the importance of collaborative platforms such as ISARIC to draw firm conclusions about the impact of genetic variants across the sub-Saharan African region.
No patients in this cohort were fully vaccinated, with 18% of patients in the third wave having received one vaccine dose. Malawi introduced COVID vaccination in March 2021, between the second and third COVID waves. As of October 1st 2021, at the end of the third wave, 2.5% of the population of Malawi were fully vaccinated (available vaccines at that time were Oxford/AstraZeneca ChAdOx1-S and Johnson and Johnson), with a further 2.5% having received a single dose of Oxford/AstraZeneca recombinant vaccine (Public Health Institute of Malawi publicly available data).
Although numbers of vaccinated participants are low, there is a higher proportion of vaccinated individuals within the cohort than in the general population, and the reasons behind this are not clear. This may represent a more COVID-aware population attending the treatment centres or increased access/uptake of vaccines during the COVID wave by people in urban centres. Given the small numbers and recent introduction of vaccines with intermittent availability, it is difficult to draw conclusions from this dataset. With an overall rate of complete vaccination of 4%, Malawi is below the continental fully vaccinated rate of 11% (20); these low rates illustrate the unique challenges and inequities in tackling COVID-19 in LMIC.
Vital to our success in establishing surveillance of SARS-CoV-2 in Malawi were the portability of the MinION sequencer; the public lab protocols (18); bioinformatics software from the scientific community (13); and the infrastructure and funding available to us as an international research institution. The MinION has become a vital part of outbreak response, as demonstrated for SARS-CoV-2 in Africa (19, 20) and elsewhere, and also during previous emerging viral outbreaks such as Ebola (21) and Zika (22). However, even with a portable and low-maintenance sequencer (with no service contracts or engineer visits required), experienced molecular biologists and bioinformaticians, and considerable international support, it was still very challenging to establish sequencing capability. We found it difficult to procure reagents, and this barrier to establishing sequencing capacity was compounded by border closures and travel restrictions.
The pandemic has highlighted the inequity of health-related resource distribution and reinforced the need for prioritised distribution networks and more regional manufacturing of laboratory equipment and consumables. While the MinION sequencing platform is easily set up, the need for cold chain reagents and the short shelf life of flow cells make maintaining a real-time sequencing service difficult. The development of more stable reagents, such as lyophilised enzymes, would increase the affordability and accessibility of this technology.
Computationally, the inconsistent internet connection at the time of this study was a hurdle in setting up a server with the requisite software installed. The current bioinformatic trends of containerisation (i.e. where the software required is set up and packaged by a third party, alongside the operating system and dependencies required to run the software) and virtual environments are significant advantages for reproducibility, but they are "greedy" in terms of bandwidth. Installing a single tool often requires the download of an entire operating system in the form of a Docker container. As our computer hardware was based in Blantyre, Malawi, once the initial setup was achieved, we did not need to transfer large amounts of data internationally, which was a significant advantage given the intermittent internet connection. Using a bioinformatics "lab-on-an-SSD" is one potential approach to solving the challenges of computational setup in settings with inconsistent internet connection.
Our study has several limitations. Firstly, we produced a relatively small number of sequences. This was partly due to the limited number of patients recruited into the study during each wave but also because patients frequently presented with Ct values that were too high to produce good quality sequence data.
Secondly, our observations are limited to a single centre in the Southern region of Malawi; however, they appear to be broadly consistent with the national picture. Finally, we may not be capturing the full diversity of SARS-CoV-2 circulating in the community, as our sampling of hospitalised patients represents a considerable bias towards people with severe disease, and there is likely to be significant under-ascertainment nationally (21).
This inequity in the availability of clinical and preventative interventions was mirrored by the lack of timely sequencing data available to inform national public health measures and to contribute to international databases. The recent Omicron VOC was first described in South Africa in November 2021 because facilities were available to link clinical and laboratory observations. Despite the barriers we faced, at the start of the fourth wave, we were able to confirm the presence of Omicron VOC within 4 weeks of its first detection globally and within three days of the swab being taken.
The value of open-source clinical science in pandemic response: lessons from ISARIC. The Lancet Infectious Diseases
Characterisation of in-hospital complications associated with COVID-19 using the ISARIC WHO Clinical Characterisation Protocol UK: a prospective, multicentre cohort study. The Lancet
Distinct clinical and immunological profiles of patients with evidence of SARS-CoV-2 infection in sub-Saharan Africa
Novel 2019 coronavirus genome - SARS-CoV-2 coronavirus
CDC's Diagnostic Test for COVID-19 Only and Supplies
Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine
Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations - SARS-CoV-2 coronavirus Genomic Epidemiology
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
Why genomic sequencing is crucial in COVID-19 response. WHO | Regional Office for Africa
R: A language and environment for statistical computing. Austria: R Foundation for Statistical Computing
First COVID-19 case in Zambia - Comparative phylogenomic analyses of SARS-CoV-2 detected in African countries
ARTIC field bioinformatics pipeline
Assignment of Epidemiological Lineages in an Emerging Pandemic Using the Pangolin Tool. Virus Evolution
First case report of a successfully managed severe COVID-19 infection in Malawi
Hospital admission and emergency care attendance risk for SARS-CoV-2 delta (B.1.617.2) compared with alpha (B.1.1.7) variants of concern: a cohort study. The Lancet Infectious Diseases
Hospitalisation associated with SARS-CoV-2 delta variant in Denmark. The Lancet Infectious Diseases
Increased risk of hospitalisation and death with the delta variant in the USA. The Lancet Infectious Diseases
COVID-19 Vaccination
SARS-CoV-2 exposure in Malawian blood donors: an analysis of seroprevalence and variant dynamics between
The authors thank all study participants and the staff of the Queen Elizabeth Central Hospital (QECH) for their support and co-operation during the study. We would like to thank all the people mentioned in Supplementary File 1 for sharing their data to GISAID.
We have no conflicts of interest to declare. All genome sequences are available in GISAID and INSDC databases; accessions are available in Supplementary Table 2.