key: cord-0897619-xq8hhrw6
authors: Kumar, A.; Dwivedi, P.; Kumar, G.; Narayan, R. K.; Jha, R. K.; Parashar, R.; Sahni, C.; Pandey, S. N.
title: Second wave of COVID-19 in India could be predicted with genomic surveillance of SARS-CoV-2 variants coupled with epidemiological data: A tool for future
date: 2021-06-12
journal: nan
DOI: 10.1101/2021.06.09.21258612
sha: 11c11b52c55b8b824f8d6e6968fedfc7e7a5be98
doc_id: 897619
cord_uid: xq8hhrw6

India has witnessed a devastating second wave of COVID-19, which peaked during the last week of April and the second week of May, 2021. We aimed to understand whether the arrival of the second wave was predictable and whether it was driven by the existing SARS-CoV-2 strains or any of the emerging variants. We analyzed the monthly distribution of the genomic sequence data for SARS-CoV-2 from India and correlated that with the epidemiological data for new cases and deaths for the corresponding period of the second wave. Our analysis shows that the first indications of the arrival of the second wave were observable by January 2021, and by March 2021 it was clearly predictable. B.1.617 lineage variants drove the wave, particularly B.1.617.2 (a.k.a. delta variant). We propose that genomic surveillance of the SARS-CoV-2 variants augmented with epidemiological data can be a promising tool for predicting future COVID-19 waves.

The second wave caused massive suffering and loss of lives, which could have been significantly minimized if timely forecasts had been available. A lot of policy discourse has since revolved around whether this wave was predictable and preventable. However, no definitive, evidence-backed answers to this question are available. This article seeks to ascertain whether a specific variant of SARS-CoV-2 drove the second wave and whether it was possible to predict this wave. The second wave in India was arguably triggered by an emerging lineage of SARS-CoV-2 variants, B.1.617, particularly its sub-lineage B.1.617.2 (a.k.a.
delta variant) (1). The B.1.617 lineage was recently recognized as a global variant of concern (VOC) by the World Health Organization. The B.1.617 variant was first reported in India in October 2020 (9), and the strain evolved to three more sub-lineages B. Accumulating evidence suggests that B.1.617 lineage variants are more transmissible (5-10) and perhaps more lethal (8) than B.1.1.7 (a.k.a. alpha variant), which had been a dominant strain in the Indian population before the arrival of the second wave (1). Studies have shown a significant reduction in the neutralization of B.1.617 lineage variants by antibodies elicited by natural infection and by many currently used COVID-19 vaccines, and by multiple monoclonal antibodies (5-8). Higher transmissibility and immune escape are being reported for B.1.617.2, which is currently the fastest growing SARS-CoV-2 strain in the Indian population (1, 9, 10). Until December 2020, B.1.1.7 was a dominant strain in the Indian population, although the first cases of the B.1.617 lineage had started appearing by then. By April 2021, India had entered a full-blown second COVID-19 wave (1, 2).

To ascertain whether the second wave was predictable and whether it was driven by an existing SARS-CoV-2 lineage or by a single emerging variant or a group of them, we analyzed two types of data sets retrieved from secondary sources. First, we analyzed the monthly distribution of the genomic sequence data for SARS-CoV-2 from India available in the EpiCoV™ database of the Global Initiative on Sharing All Influenza Data (GISAID). We also analyzed the epidemiological data for new cases and deaths from COVID-19 (downloaded from Worldometer: https://www.worldometers.info/coronavirus/coronavirus/country/india/) for the period from 1st December 2020 to 30th April 2021. We analyzed a total of 10115 SARS-CoV-2 genomic sequences uploaded to the GISAID database (last reporting date 7.06.2021) whose sample collection dates fell within the period of study.
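The monthly distribution described above reduces to counting sequences per lineage in each collection month and converting the counts to percentages. A minimal Python sketch of that tabulation follows. The function name `monthly_lineage_shares`, the `(month, lineage)` record layout, and the toy records are assumptions made for illustration; they are not GISAID's export format, real sequence counts, or the authors' actual workflow.

```python
from collections import Counter, defaultdict


def monthly_lineage_shares(records):
    """Return {month: {lineage: percent}} from (collection_month, lineage) pairs.

    Hypothetical helper: the record layout is an assumption, not a GISAID format.
    """
    counts = defaultdict(Counter)
    for month, lineage in records:
        counts[month][lineage] += 1

    shares = {}
    for month, per_lineage in counts.items():
        total = sum(per_lineage.values())
        # Relative proportion of each lineage among that month's sequences, in percent.
        shares[month] = {
            lineage: 100.0 * n / total for lineage, n in per_lineage.items()
        }
    return shares


# Toy records for illustration only; these are NOT real GISAID counts.
toy_records = (
    [("2021-01", "B.1.1.7")] * 3
    + [("2021-01", "B.1.617.1")] * 1
    + [("2021-03", "B.1.1.7")] * 2
    + [("2021-03", "B.1.617.2")] * 6
)

shares = monthly_lineage_shares(toy_records)
for month in sorted(shares):
    print(month, shares[month])
```

Because the percentages are computed within the uploaded samples only, they describe relative shares among sequenced genomes, not the true prevalence of a variant in the population.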
We then plotted the genomic sequence data and epidemiological data together and drew graphical correlations. Data for the individual variants were categorized, and graphs were plotted to visualize the trends.

medRxiv preprint doi: https://doi.org/10.1101/2021.06.09.21258612; this version posted June 12, 2021. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

[Figure caption fragment] ... December, 2020 to 30th April, 2021. Data source: SARS-CoV-2 genomic sequence: GISAID database (https://www.gisaid.org/); epidemiological data: Worldometer (https://www.worldometers.info/coronavirus/coronavirus/country/india).

To determine whether the rise in B.1.617 lineage variants was localized to certain geographical regions, which may have influenced the collective data trends, we further analyzed the genomic sequence data from the states and union territories individually. A similar increase in the detection of B.1.617 lineage variants was observable in most of the states and union territories of India (for which genomic data were available), with few exceptions (Fig. S2). However, entirely different patterns were visible in Kerala and Punjab, where B.1.1.7 was still a dominant variant (Fig. S2), and in remotely located states/union territories such as Ladakh and the Andaman and Nicobar group of islands (Fig. S2), where a previously dominant variant (B.1 lineage) was still prevalent by December and from which very few genomic sequences were uploaded in the period of study. Further, we wanted to know which particular B.1.617 lineage variants were dominating in the studied time period. Interestingly, an intra-lineage competition was distinctly visible between the sub-lineages of B.1.617 (Fig. 2).
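The comparison between variant shares and case counts in this study is graphical. As a numeric complement (not a method used by the authors), a Pearson correlation between a lineage's monthly share and the monthly new cases could be computed as sketched below in Python. The function `pearson_r` and all values are hypothetical placeholders chosen for illustration; they are not data from GISAID or Worldometer.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder series, one value per month (five months).
# These are NOT the study's numbers.
lineage_share_percent = [1.0, 2.0, 3.0, 4.0, 5.0]
monthly_new_cases = [10.0, 20.0, 30.0, 40.0, 50.0]

r = pearson_r(lineage_share_percent, monthly_new_cases)
print(f"Pearson r = {r:.3f}")
```

With only five monthly points, such a coefficient is descriptive at best; it cannot establish that a variant drove the rise in cases, and it inherits the sampling limitations of the uploaded sequences.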
We observe that the first indications of the arrival of the second wave were clearly observable by January 2021, when the rise of the B.1.617 lineage surpassed all VOCs and variants of interest (VOIs) (Fig. 1). Moreover, by March 2021, the wave was clearly predictable, when B.1.617.2 showed a steep rise in parallel with a similar rise in monthly new cases and deaths. Taken together, the analyses clearly show that the formation of the second COVID-19 wave in India was closely associated with the rise of B.1.617 lineage variants, particularly its sub-lineage B.1.617.2 (Figs 1-2).

Our study has multiple limitations that could have affected the interpretation of the findings. Firstly, the samples used in our analyses were not representative of the populations, as sampling across many geographical regions was greatly disproportionate. Hence, the genomic sequence data presented in this study do not reflect the true epidemiological scale of spread of the variants in the reported geographical regions, but only show their relative proportions in the samples for which genomic sequences were uploaded to the GISAID database. We assumed that similar proportions between variants exist in the actual population. Secondly, there has been inconsistency in the reporting and uploading of the genomic sequences, which prevented us from examining daily trends in the spread of the variants. The paucity of genomic sequences and the inconsistency in their uploading to the databases used made the variant dominance difficult to determine for some states/union territories. Thirdly, we did not include SARS-CoV-2 genomic sequence data beyond 30th April 2021, when the second wave was supposedly reaching a peak.
There was a sudden interruption in genomic sequencing and uploading to the GISAID database by the involved laboratories in the initial weeks of May 2021, probably due to the emergency situation created by the second wave; hence, sufficient data could not be obtained for this period. However, this could not have significantly affected the analyses, as all epidemiological trends defining a second COVID-19 wave were clear by the end of April 2021.

Based on the findings of this study, we conclude that genomic surveillance of the variants augmented with epidemiological data can be a promising tool for predicting imminent COVID-19 waves in advance, as early as a wave starts rising. However, the accuracy of the prediction would largely depend on population-matched viral genomic sequencing and on consistency in uploading of the data from all geographical regions, which currently seems to be a major hindrance to timely predictions.

The genomic sequence data for SARS-CoV-2 and official epidemiological data for COVID-19 from India for the period from 1st December 2020 to 30th April 2021 were downloaded from the EpiCoV™ database of the Global Initiative on Sharing All Influenza Data (GISAID) and from Worldometer (https://www.worldometers.info/coronavirus/coronavirus/country/india/), respectively. A total of 9994 SARS-CoV-2 genomic sequences uploaded to the GISAID database (last reporting date 2.06.2021) were analyzed, with sample collection dates falling within the period of study.
The number of sequences for each SARS-CoV-2 variant was retrieved using the automatic search function, entering the lineage/sub-lineage and collection dates in the EpiCoV™ database of GISAID. The total number of sequences per month in the studied time period was noted for each variant, and their relative proportions were calculated (in percent). Data were tabulated, the monthly distribution of each variant was charted against the COVID-19 epidemiological data (total new cases and deaths per month), and graphs were plotted to visualize the trends. Further, genomic sequences of SARS-CoV-2 variants were analyzed for the individual states and union territories to check for any deviation from the collective data trends.

The study used SARS-CoV-2 genomic sequence and epidemiological data from the GISAID (https://www.gisaid.org/) and Worldometer (https://www.worldometers.info/coronavirus/coronavirus/country/india) databases, respectively.

Author contributions: A.K., P.D., and G.K. collected samples and performed data analysis. A.K. wrote the first draft. A.K., R.K.N., R.K.J., R.P., C.S., and S.N.P. edited the paper. All authors consented to submission of the final draft.

The study used publicly available open-access data; hence, ethical approval was not required.

Funding declaration: None.

This preprint was not certified by peer review.

References:
1. Genomic characterization and Epidemiology of an emerging SARS-CoV-2 variant in Delhi, India
2. Tracking SARS-CoV-2 variants
3. B.1.617 Lineage Report
4. Weekly epidemiological update on COVID-19 -11
5. The Spike Proteins of SARS-CoV-2 B.1.617 and B.1.618 Variants Identified in India Provide Partial Resistance to Vaccine-elicited and Therapeutic Monoclonal Antibodies
6. SARS-CoV-2 variant B.1.617 is resistant to Bamlanivimab and evades antibodies induced by infection and vaccination
7. SARS-CoV-2 B.1.617 emergence and sensitivity to vaccine-elicited antibodies
8. SARS CoV-2 variant B.1.617.1 is highly pathogenic in hamsters than B.1 variant
9. Reduced sensitivity of infectious SARS-CoV-2 variant B.1.617.2 to monoclonal antibodies and sera from convalescent and vaccinated individuals
10. Effectiveness of COVID-19 vaccines against the B.1.617.2 variant