key: cord-0790445-3w7dvc4j authors: Xi, Binbin; Chen, Zixi; Li, Shuhua; Liu, Wei; Jiang, Dawei; Bai, Yunmeng; Qu, Yimo; Rumdon Lon, Jerome; Huang, Lizhen; Du, Hongli title: AutoVEM2: a flexible automated tool to analyze candidate key mutations and epidemic trends for virus date: 2021-09-04 journal: Comput Struct Biotechnol J DOI: 10.1016/j.csbj.2021.09.002 sha: 7500ccba585295c540baf3d4318a843c351527e7 doc_id: 790445 cord_uid: 3w7dvc4j

In our previous work, we developed an automated tool, AutoVEM, for real-time monitoring of the candidate key mutations and epidemic trends of SARS-CoV-2. In this research, we further developed AutoVEM into AutoVEM2. AutoVEM2 is composed of three modules (the call module, the analysis module, and the plot module), which can be used separately or as a whole for any virus, as long as the corresponding reference genome is provided. It is therefore much more flexible than AutoVEM. Here, we analyzed three existing viruses with AutoVEM2 (SARS-CoV-2, HBV and HPV-16) to show its functions, effectiveness and flexibility. We found that the N501Y locus was almost completely linked to the other 16 loci in SARS-CoV-2 genomes from the UK and Europe. Among the 17 loci, 5 were on the S protein, and all five mutations cause amino acid changes, which may influence the epidemic traits of SARS-CoV-2. Some candidate key mutations of HBV and HPV-16, including T350G of HPV-16 and C659T of HBV, were also detected. In brief, we developed a flexible automated tool to analyze candidate key mutations and epidemic trends for any virus, which could become a standard process for virus analysis based on genome sequences in the future.

SARS-CoV-2 had infected over 151,812,556 people and caused 3,186,817 deaths by 2 May 2021 [1]. At present, a variety of vaccines against SARS-CoV-2 are being used around the world, including mRNA-1273 [2], BNT162b2 [3] and CoronaVac [4], in the hope of achieving herd immunity.
However, it has been reported that the N501Y mutation in the spike protein may reduce the neutralization sensitivity of antibodies and may influence the effectiveness of some vaccines [5]. Therefore, real-time monitoring of the epidemic trends of SARS-CoV-2 mutations is of great significance for updating detection reagents and vaccines. In our previous work, we found 9 candidate key mutations [6], including A23403G, which causes the D614G amino acid change on the S protein and has been shown to increase the infectivity of SARS-CoV-2 in several in vitro experiments [7] [8] [9] [10] [11]. With the further global spread of SARS-CoV-2, it is difficult to prevent its mutation. Therefore, we proposed an innovative and integrative method that combines high-frequency mutation site screening, linkage analysis, haplotype typing and haplotype epidemic trend analysis to monitor the evolution of SARS-CoV-2 in real time, and we developed the whole process into an automated tool, AutoVEM [12]. We further found that the 4 highly linked sites (C241T, C3037T, C14408T and A23403G) among the previous 9 candidate key mutations had become almost fixed in the virus population, while the other 5 mutations gradually disappeared [12]. In addition, we found another 6 candidate key mutations whose frequencies increased over time [12]. Our research on haplotype prevalence trends and other studies on single-site prevalence trends both show that new SARS-CoV-2 mutations are constantly emerging, and that the frequency of some mutations increases over time while that of others decreases or even disappears completely [6, 12, 13]. These consistent findings indicate that the integrative method we proposed is reliable. Moreover, tracking haplotype prevalence trends makes newly emerging epidemic mutants less complicated to follow. However, AutoVEM was developed only for SARS-CoV-2 analysis.
With changes in the global natural environment, new and sudden infectious diseases are continuously emerging, such as the outbreaks of SARS in February 2003 [14], MERS in 2012 [15], Ebola in 2014 [16], and ZIKV in 2015 [17]. Therefore, a more flexible automated tool is needed to identify and monitor the key mutation sites and evolution of various viruses. In this research, we further developed AutoVEM into AutoVEM2. AutoVEM2 is composed of three modules: the call module, the analysis module and the plot module. The call module carries out quality control of genomes and finds all single nucleotide variations (SNVs) for any set of virus genome sequences, with various optional parameters. The analysis module carries out candidate key mutation screening, linkage analysis and haplotype typing, with optional parameters for mutation frequency and mutation sites. The plot module visualizes the epidemic trends of haplotypes. The three modules can be used separately or as a whole for any virus, as long as the corresponding reference genome is provided. AutoVEM2 is therefore much more flexible than AutoVEM. Here, we analyzed 3 existing viruses with AutoVEM2 (SARS-CoV-2, HBV and HPV-16) to show its functions, effectiveness and flexibility. The SARS-CoV-2 genomes from the UK, Europe, and the USA were analyzed separately because of the large number of SARS-CoV-2 genomes from these regions in GISAID. In addition to existing viruses, AutoVEM2 can also be used to analyze any virus that may appear in the future. We think our integrated analysis method and tool could become a standard process for virus mutation and epidemic trend analysis based on genome sequences in the future.

AutoVEM2 is a highly specialized, flexible, and modular pipeline for quickly monitoring the candidate key mutations, haplotype subgroups, and epidemic trends of different viruses using virus whole genome sequences.
It is written in Python and uses Bowtie 2 [18], SAMtools [19], BCFtools [20], VCFtools [21] and Haploview [24]. AutoVEM2 consists of three modules (the call module, the analysis module, and the plot module), which can be used separately or as a whole, and each module performs specific function(s) (Fig 1).

The call module finds all SNVs for all genome sequences. Its input is a folder that stores genome sequences in the formatted fasta format. The call module processes are as follows:
1. Quality control of genome sequences according to four optional parameters: --length, --number_n, --number_db, and --region_date_filter.

The analysis module performs three functions: screening out candidate key mutations, linkage analysis of these candidate key mutations, and acquiring the haplotype of each genome sequence according to the result of the linkage analysis. Linkage disequilibrium (LD) is the non-random association of alleles at different loci, typically observed as a correlation between nearby variations. Analysis of LD can help in understanding the history of changes in population size and the patterns of gene exchange [22]. Haplotype identification is another method that helps in understanding the role of key mutation sites, and tracking the population size of different haplotypes may provide new insights for virus control and drug development [23]. The linkage analysis is performed by Haploview v4.2 (command: java -jar Haploview.jar -n -skipcheck -pedfile -info -blocks -png -out) [24], which calculates several metrics, including D' [25], a measure of the linkage disequilibrium between two genetic markers (in the present study, genetic markers refer to the key mutation sites). A higher D' value corresponds to a higher degree of linkage disequilibrium. The input is the snp_merged.tsv file produced by the call module. The analysis module processes are as follows:
1. Count the mutation frequency of all mutation sites.
2. Screen out candidate key mutation sites according to the optional --frequency parameter (default 0.05); candidate key mutation sites can also be specified with the optional --sites parameter.
3. Extract the nucleotides at these specific sites from each genome and organize them in order of genome position.
4. Perform linkage analysis of these specific sites with Haploview v4.2 [24].
5. Acquire haplotypes using Haploview v4.2 [24]. Define the haplotype of each genome sequence according to its haplotype sequence; a haplotype with a frequency <1% is defined as "other". This finally results in a tsv file named data_plot.tsv.

The plot module visualizes the epidemic trends of each haplotype in different countries or regions. Its input is the data_plot.tsv file produced by the analysis module. The plot module processes are as follows:
1. Divide the whole time span into time periods according to the --days parameter.
2. Count the number of each haplotype in each time period for each country or region.
3. Visualize the statistical results.

SARS-CoV-2 whole genome sequences of the United Kingdom, Europe (Table 1). All HBV and HPV-16 nucleotide sequences, including whole genome sequences and fragments of whole genomes, were downloaded from NCBI, resulting in 119,721 and 10,269 sequences, respectively (Table 1). Reference genome sequences of the three viruses were downloaded from NCBI (Table 1). The genome sequences were processed by an in-house Python script to meet the input format of AutoVEM2. Each formatted sequence consisted of two sections, the head section and the body section. The head section started with a greater-than sign, followed by the virus name, the unique sequence identifier, the sequence collection time, and the country or region where the sequence was collected, separated by vertical lines.
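As an illustration of the header layout just described, the following minimal Python sketch parses one formatted header line. It is not taken from the AutoVEM2 source: the `SequenceHeader` class, the `parse_header` function, the ISO-style date format and the example identifier are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SequenceHeader:
    """Fields of one formatted fasta header (hypothetical container)."""
    virus: str
    seq_id: str
    collected: date
    region: str


def parse_header(line: str, date_format: str = "%Y-%m-%d") -> SequenceHeader:
    """Parse '>virus|identifier|collection date|country or region'.

    The date format is an assumption; the paper does not state which
    format the formatted headers use.
    """
    if not line.startswith(">"):
        raise ValueError("a fasta header must start with '>'")
    fields = [field.strip() for field in line[1:].strip().split("|")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}")
    virus, seq_id, raw_date, region = fields
    collected = datetime.strptime(raw_date, date_format).date()
    return SequenceHeader(virus, seq_id, collected, region)


if __name__ == "__main__":
    # Hypothetical header; the identifier is made up for illustration.
    print(parse_header(">SARS-CoV-2|EPI_ISL_000001|2020-12-01|United Kingdom"))
```

Rejecting headers that do not have exactly four fields mirrors, in spirit, the filtering of sequences with unclear collection time or country information, although how AutoVEM2 itself handles such headers is not described here.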
The body section was the nucleotide sequence.

For SARS-CoV-2, sequences with length < 29000, number of unknown bases > 15, number of degenerate bases > 50, number of indels > 2, or unclear collection time or country information were filtered out [6, 12]. Finally, there were 79,269 sequences from the UK, 139,703 sequences from Europe, and 30,142 sequences from the USA (Table 1). All SNVs of these genomes were found by the call module. Mutation sites with a mutation frequency ≥ 0.15 for the UK and Europe (in order to include the five high linkage sites we found before [12]) and ≥ 0.25 for the USA were taken as candidate key mutation sites. Linkage analysis of these specific sites was performed and the haplotype of each genome sequence was obtained by the analysis module. Epidemic trends of each haplotype were visualized by the plot module (Table 1).

The naming of the SARS-CoV-2 haplotypes is based on our previous works [6, 12]. The first letter "H" stands for "haplotype". In our study of the early stage of the pandemic (2019.12 - 2020.05.05), we found 9 specific mutation sites (C241T, C3037T, C8782T, C14408T, C17747T, A17858G, C18060T, A23403G, and T28144C) of SARS-CoV-2. According to these 9 mutation sites, the SARS-CoV-2 population could be divided into four major haplotypes (H1, H2, H3, and H4; the number after the letter "H" is assigned according to the haplotype's proportion of the population, with a larger proportion receiving a smaller number) and some minor haplotypes [6]. Among these haplotypes, H1 contains 4 of the 9 specific sites (C241T, C3037T, C14408T, and A23403G) and has been the most prevalent haplotype all over the world since March 2020. In our subsequent study, we found that the 4 sites of H1 had become fixed in the SARS-CoV-2 population and the others had gradually disappeared over time. In addition, we found 6 other specific mutation sites: T445C, C6286T, C22227T, G25563T, C26801G, and G29645T.
Combined with the above 4 mutation sites of H1, there were 10 specific mutation sites. According to these 10 sites, we could get 3 haplotypes with large proportions: H1-1, H1-2 and H1-3 (the proportion of H1-2 is larger than that of H1-3). Among them, H1-1 has no specific mutation sites beyond those of H1; H1-2 has 5 additional specific mutation sites (T445C, C6286T, C22227T, C26801G, and G29645T) relative to H1; and H1-3 has 1 additional mutation site (G25563T) relative to H1 [12]. In the present study, we found another haplotype, H1-3-2, which has one more mutation (C1059T) than H1-3. H1-4-1 and H1-4-2 share the prefixes "H1" and "H1-4" because they were found later than H1-3 and have the same 17 additional mutation sites relative to the H1 haplotype. H1-4-2 has one more mutation (A17615G) than H1-4 (H1-4-1). Other haplotypes are named according to the same rule.

For HBV and HPV-16, sequences with a length <90% of the reference genome length or with a number of unknown bases >1% of the reference genome length were filtered out, resulting in 11,088 HBV genome sequences and 1,637 HPV-16 genome sequences. All SNVs of HBV and HPV-16 were found using the call module. Mutation sites with a mutation frequency ≥ 0.25 in HBV and HPV-16 were taken as candidate key mutations. Linkage analysis of these specific sites was performed and the haplotype of each genome sequence was obtained by the analysis module (Table 1).

The candidate key mutation sites of SARS-CoV-2 in the UK, Europe, and the USA were each annotated with an online tool of the China National Center for Bioinformation (https://bigd.big.ac.cn/ncov/online/tool/annotation?lang=en). The candidate key mutation sites of HBV and HPV-16 were annotated with in-house Python scripts.
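The screening and haplotype-assignment steps of the analysis module can be sketched in a few lines of Python. This is a simplified illustration rather than the AutoVEM2 implementation: the in-memory data layout (one dictionary of position to alternate base per genome), the function names and the toy data are assumptions, any SNV at a position is counted toward that site's mutation frequency, and the linkage analysis that AutoVEM2 delegates to Haploview is omitted.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical layout: 1-based genome position -> alternate base, per genome.
Snvs = Dict[int, str]


def screen_sites(genomes: List[Snvs], min_frequency: float = 0.05) -> List[int]:
    """Return positions mutated in at least `min_frequency` of the genomes."""
    if not genomes:
        return []
    counts = Counter(pos for snvs in genomes for pos in snvs)
    total = len(genomes)
    return sorted(pos for pos, n in counts.items() if n / total >= min_frequency)


def haplotype_strings(genomes: List[Snvs], reference: Dict[int, str],
                      sites: List[int]) -> List[str]:
    """Concatenate each genome's bases at the screened sites, in position order."""
    return ["".join(snvs.get(pos, reference[pos]) for pos in sites)
            for snvs in genomes]


def label_haplotypes(haplotypes: List[str], min_share: float = 0.01) -> List[str]:
    """Keep haplotypes whose share is >= `min_share`; relabel the rest 'other'."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return [h if counts[h] / total >= min_share else "other" for h in haplotypes]


if __name__ == "__main__":
    # Toy data: reference bases at three positions and five genomes.
    reference = {100: "G", 241: "C", 23403: "A"}
    genomes = [{241: "T", 23403: "G"}] * 3 + [{241: "T"}, {100: "A"}]
    sites = screen_sites(genomes, min_frequency=0.25)
    haplotypes = haplotype_strings(genomes, reference, sites)
    print(sites, label_haplotypes(haplotypes, min_share=0.3))
```

The default thresholds follow the text (0.05 for site screening and 1% for labeling a haplotype as "other"); the toy example uses larger values only because it contains five genomes.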
Among the random mutations in a virus genome, those that have a positive effect on the adaptability of the virus tend to accumulate gradually in the virus population. Thus, if a mutation or a haplotype gradually accumulates in the virus population, this may suggest that it has a "positive" effect on the survival or spread of the virus [12]. Since mutation sites with higher frequencies are worthy of further epidemiological study [12], only sites with a relatively high mutation frequency were kept for further analysis in the present study (Fig S1). Therefore, mutation sites with a frequency higher than 0.25 were selected in most of the datasets. The exception was the UK and Europe SARS-CoV-2 data, for which the cutoff was set to 0.15 to include the five high linkage sites we found before [12], because the mutation frequencies of these sites changed as the number of samples increased.

The same 27 candidate key mutation sites were screened from the 79,269 SARS-CoV-2 (UK) and 139,703 SARS-CoV-2 (Europe) genomes with a frequency cutoff of 0.15. Through linkage analysis of the 27 sites, the populations can be divided into 6 and 5 haplotypes with a proportion ≥1% for the UK and Europe, respectively. For the USA, 13 candidate key mutation sites were screened from the 30,142 SARS-CoV-2 (USA) genomes with a frequency cutoff of 0.25. Through linkage analysis of the 13 sites, SARS-CoV-2 in the USA can be divided into 21 haplotypes with a proportion ≥1% (Table 2). For HBV and HPV-16, 7 and 12 candidate key mutation sites were found from the 11,088 HBV genomes and 1,637 HPV-16 genomes, respectively, with a frequency cutoff of 0.25. HBV and HPV-16 can be divided into 24 and 18 haplotypes with a proportion ≥1% by the 7 sites and 12 sites, respectively (Table 2).

The detailed information for the 27 candidate key mutation sites screened from the UK and Europe is shown in Table 3.
According to the linkage analysis, only 6 and 5 haplotypes with a frequency ≥1% were found, accounting for 93.47% and 85.77% of the SARS-CoV-2 population in the UK and Europe, respectively (Table 4), which showed that the 27 candidate key mutation sites were highly linked (Fig 2A, Fig 2B). For the UK, 5 of the 6 haplotypes (H1-1-1, H1-2-1, H1-4-1, H1-4-2, and H1-4-3), which were derived from H1 with the previous 4 specific mutation sites (C241T, C3037T, C14408T, and A23403G) [6], accounted for 91.95% of the population (Table 4). H1-1-1, with only the previous 4 specific mutation sites, had almost disappeared in the UK by early 2021 (Fig 3). H1-2-1, with the previous 4 specific mutation sites and 5 other specific mutation sites (T445C, C6286T, C22227T, C26801G, and G29645T), appeared around July 21, 2020, became one of the major haplotypes circulating in the UK in early December 2020 [12], and then gradually decreased, so that only a very small population was still circulating by late February 2021 (Fig 3). While H1-4-1 with previous C5388A, C5986T, T6954C, C14676T, C15279T, T16176C, A23063T, C23604A, C23709T, T24506G, G24914C, C27972T, G28048T, A28111G, and C28977T) with mutation frequencies around 0.78, and H1-4-2 with one more mutation site (A17615G) compared with H1-4-1 showed a gradually increasing trend from early December 2020. H1-4-1 and H1-4-2 had become the dominant epidemic haplotypes in the UK by early February 2021 (Fig 3). Notably, the H1-4-1 and H1-4-2 haplotypes both had the A23063T mutation, which causes the N501Y mutation on the S protein, and the N501Y mutation was almost completely linked to the other 16 sites. Among the 17 sites, 11 caused amino acid changes, of which 5 were located on the S protein (N501Y, P681H, T716I, S982A, and D1118H) (Table 3). This may influence the epidemic traits of SARS-CoV-2 and the effectiveness of vaccines, especially mRNA vaccines.

For Europe, the 5 haplotypes were the same as 5 of the 6 haplotypes of the UK (Table 4).
Among the 5 haplotypes, 4 (H1-1-1, H1-2-1, H1-4-1, and H1-4-2) were derived from H1 with the previous 4 specific sites and accounted for 84.67% of the population. The epidemic trends of H1-1-1, H1-2-1, H1-4-1, and H1-4-2 were similar to those in the UK (Fig 4): H1-1-1 and H1-2-1 gradually decreased, while H1-4-1 and H1-4-2 gradually increased.

The detailed information for the 13 candidate key mutation sites screened from the USA is shown in Table 5. According to the linkage analysis, 21 haplotypes with a frequency ≥ 1% were found, accounting for 87.94% of the SARS-CoV-2 population in the USA (Table 6), which showed some degree of linkage among the 13 candidate key mutation sites (Fig 2C). Among the 21 haplotypes, H1-1-1, H1-3-2, and H1-3-3, each with a frequency >5%, were all derived from H1 with the previous 4 specific sites [6] (Table 6). H1-1-1, with the previous 4 specific sites, had a stable proportion (about 18%) between December 1, 2020 and February 28, 2021 in the USA (Fig 5). H1-3-2 and H1-3-3 were derived directly from H1-3, and H1-3 was derived directly from H1 with one more mutation site (G25563T) than H1 [6, 12]. H1-3-2 had the previous 5 specific sites (C241T, C3037T, C14408T, A23403G, and G25563T) [12] and C1059T (Table 5, Table 6), and showed a stable prevalence between December 01, 2020 and February 02, 2021 in the USA (Fig 5). H1-3-3 had the previous 5 specific sites and 8 new missense mutation sites (C1059T, C10319T, A18424G, C21304T, G25907T, C27964T, C28472T, and C28869T) (Table 5, Table 6), and increased gradually between December 01, 2020 and February 02, 2021 in the USA (Fig 5). In general, the haplotype subgroup diversity in the USA is much more complicated than that in the UK and Europe.

The detailed information for the 7 candidate key mutation sites screened from HBV genomes is shown in Table 7.
Five of the 7 sites were missense mutations, including 356S>A (T192G), 444S>P (T456C), 807D>V (A1546T) and 10R>K (G2337A) on the P gene, and 331A>V (C659T) on the S gene (Table 7). These 5 mutations were all on the P gene or the overlapping part of the P gene and other genes. Linkage analysis and haplotype analysis identified 24 haplotypes with a proportion ≥ 1%, none of which was a major haplotype, indicating that the 7 sites of HBV had a low degree of linkage (Fig S2A, Table S1).

The detailed information for the 12 candidate key mutation sites screened from HPV-16 genomes is shown in Table 7. Among them, 8 specific mutations were missense mutations, including 83L>V (T350G) on the E6 gene, 219P>S (C3409T) on the E2 gene, 39I>L (A3977C) and 60I>V (A4040G) on the E5 gene, 43E>D (A4363T) and 330L>F (A5224C) on the L2 gene, and 228H>D (C6240G) and 292T>A (A6432G) on the L1 gene. Linkage analysis and haplotype analysis of the 12 specific mutation sites identified 18 haplotypes with a proportion ≥1% (Table S2), and the 12 specific sites showed a low degree of linkage (Fig S2B). Among the 18 haplotypes, there were 5 major haplotypes with a frequency ≥4%: H1, H2, H3, H4, and H5. The haplotype H2 had 5 specific mutation sites (A2925G, T4226C, A4363T, G4936A, and A5224C). H4 had 9 specific mutation sites (A2925G, C3409T, A3977C, A4040G, A4363T, G4936A, A5224C, A6432G, and G7191T), H3 had one more mutation site (T350G) than H4, and H1 had two more mutation sites (T350G and T4226C) than H4 (Table 7, Table S2).

In this study, we developed a flexible tool to quickly monitor the candidate key mutations, haplotype subgroups, and epidemic trends of different viruses using virus whole genome sequences, and analyzed a large number of SARS-CoV-2, HBV and HPV-16 genomes to show its functions, effectiveness and flexibility. Module and Plot Module.
It is developed for researchers who intend to analyze the haplotypes of any virus genome. Users with basic knowledge of Linux should find it easy to use by following the installation and running documentation. By applying commonly used filtering thresholds, the QC step reduces the impact of ambiguous nucleotides. In addition, compared with lineage identification tools based on phylogenetic tree building, such as PANGO lineages and NextStrain clades [26, 27], AutoVEM2 is much more efficient because of its mutation filtering and haplotype-based variation tracking. A haplotype-based method does not need to resolve the evolutionary relationships among all SNVs; instead, the accumulation of different key mutations in haplotypes can be used to determine the evolutionary relationships among haplotype subtypes. Therefore, haplotype-based analysis of epidemic trends and evolution, which can also track different lineages, is much faster than phylogenetic tree building methods.

For the UK and Europe, we obtained the same 27 candidate key mutation sites, which could divide the SARS-CoV-2 population into 6 and 5 haplotypes, respectively. The epidemic trend analysis showed that H1-4-1 and H1-4-2, which carry the N501Y mutation on the S protein that is almost completely linked with the other 16 loci, had continued increasing from early December 2020 and became the dominant epidemic haplotypes in the United Kingdom and Europe by late February 2021. The B.1.1.7 lineage [28], corresponding to H1-4-1 and H1-4-2, has been reported to have a substantial transmission advantage based on several epidemiological studies [29, 30] and to have greater infectivity and adaptability [31]. Several studies have reported that the

The present study proposed a new integrative method and developed an efficient, flexible automated tool to screen out candidate key mutations and monitor haplotype epidemic trends over time for any virus.
This new integrated analysis tool will be valuable for effectively monitoring the variation, candidate key mutations and haplotype subgroup epidemic trends of any virus. In addition, by combining epidemic trends with clinical information, it could quickly and accurately identify key mutation sites that may be related to the infectivity, pathogenicity or host adaptability of a virus. Generally, this tool has the potential to become a standard method for virus mutation and epidemic trend analysis based on large numbers of genome sequences in the future. Through the analysis of 79,269 (UK) and 139,703 (Europe) SARS-CoV-2 genomes, the same 27 candidate key mutation sites were found, including the N501Y mutation on the S protein, which was found to be almost completely linked to the other 16 specific sites. Through the analysis of SARS-CoV-2 in the USA, 13 candidate key mutation sites were found. Compared with the UK and Europe, a more complicated haplotype subgroup diversity was observed in the USA. Through the analysis of 11,088 HBV genomes and 1,637 HPV-16 genomes, some valuable mutations, including T350G of HPV-16 and C659T of HBV, were detected.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

BX developed the tool, carried out the data analysis, and wrote the manuscript. ZC revised the manuscript. SL collected the data and wrote the manuscript. WL collected the data. DW, YB, YQ, RL, and LH revised the manuscript. HD conceived and supervised the study and revised the manuscript.

The developed AutoVEM2 software has been shared on the website (https://github.com/Dulab2020/AutoVEM2) and is freely available. All data relevant to the study are included in the article or uploaded as supplementary information. Not required.
Table 1: Information of SARS-CoV-2, HBV, and HPV-16 genomes and the analysis process of the three viruses
Table 2: Candidate key mutation sites and haplotypes results of SARS-CoV-2, HBV, and HPV-16
Table 3: The annotation of the 27 sites of SARS-CoV-2 (UK and Europe) with a mutation frequency ≥15%
Table 4: Haplotypes and their frequencies of the 27 sites of SARS-CoV-2 (UK and Europe)
Table 5: The annotation of the 13 sites of SARS-CoV-2 (USA) with a mutation frequency ≥25%
Table 6: Haplotypes and their frequencies of the 13 sites of SARS-CoV-2 (USA)
Table 7: The annotation of the 7 sites of HBV and 12 sites of HPV-16 with mutation frequency ≥25%
Table S1: Haplotypes and their frequencies of the 7 sites of HBV genomes
was referred to AutoVEM [12].

An mRNA Vaccine against SARS-CoV-2 - Preliminary Report
Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine
Safety, tolerability, and immunogenicity of an inactivated SARS-CoV-2 vaccine in healthy adults aged 18-59 years: a randomised, double-blind, placebo-controlled, phase 1/2 clinical trial. The Lancet infectious diseases
KSCR: mRNA-1273 vaccine induces neutralizing antibodies against spike mutants from global SARS-CoV-2 variants
Comprehensive evolution and molecular characteristics of a large number of SARS-CoV-2 genomes reveal its epidemic trends
The Spike D614G mutation increases SARS-CoV-2 infection of multiple human cell types
Structural Impact of Mutation D614G in SARS-CoV-2 Spike Protein: Enhanced Infectivity and Therapeutic Opportunity
Bimodular effects of D614G mutation on the spike glycoprotein of SARS-CoV-2 enhance protein processing, membrane fusion, and viral infectivity
The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity
The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity
AutoVEM: An automated tool to real-time monitor epidemic trends and key mutations in SARS-CoV-2 evolution
GESS: a database of global evaluation of SARS-CoV-2/hCoV-19 sequences
Epidemiology and cause of severe acute respiratory syndrome (SARS) in Guangdong, People's Republic of China
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
The Ebola outbreak, 2013-2016: old lessons for new epidemics
Góes Cavalcanti L: Zika virus outbreak in Brazil
Fast gapped-read alignment with Bowtie 2
The Sequence Alignment/Map format and SAMtools
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
The variant call format and VCFtools
Linkage disequilibrium--understanding the evolutionary past and mapping the medical future
HLA-DRB1* and DQB1* allele and haplotype diversity in eight tribal populations: Global affinities and genetic basis of diseases in South India
Haploview: analysis and visualization of LD and haplotype maps
On measures of gametic disequilibrium
Assignment of Epidemiological Lineages in an Emerging Pandemic Using the Pangolin Tool
COG-UK: COG-UK update on SARS-CoV-2 Spike mutations of special interest
Quantifying the transmission advantage associated with N501Y substitution of SARS-CoV-2 in the UK: an early data-driven analysis
Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom
Emerging SARS-CoV-2 variants reduce neutralization sensitivity to convalescent sera and monoclonal antibodies