key: cord-1047032-8k6ioauv authors: Nguyen, Phuoc Truong; Plyusnin, Ilya; Sironen, Tarja; Vapalahti, Olli; Kant, Ravi; Smura, Teemu title: HaVoC, a bioinformatic pipeline for reference-based consensus assembly and lineage assignment for SARS-CoV-2 sequences date: 2021-02-13 journal: bioRxiv DOI: 10.1101/2021.02.12.431018 sha: d254308bfe32e8d8000ac2365c60e13c6544578e doc_id: 1047032 cord_uid: 8k6ioauv

Background: SARS-CoV-2 related research has increased in importance worldwide since December 2019. Several new variants of SARS-CoV-2 have emerged globally, of which the most notable and concerning currently are the UK variant B.1.1.7, the South African variant B.1.351 and the Brazilian variant P.1. Detecting and monitoring novel variants is essential in SARS-CoV-2 surveillance. While there are several tools for assembling virus genomes and performing lineage analyses to investigate SARS-CoV-2, each is limited to performing one or a few functions separately.

Results: Due to the lack of publicly available pipelines that can perform fast reference-based assemblies on raw SARS-CoV-2 sequences and also identify lineages to detect variants of concern, we have developed an open-source bioinformatic pipeline called HaVoC (Helsinki university Analyzer for Variants Of Concern). HaVoC can reference assemble raw sequence reads and assign the corresponding lineages to SARS-CoV-2 sequences.

Conclusions: HaVoC is a pipeline utilizing several bioinformatic tools to perform multiple necessary analyses for investigating genetic variance among SARS-CoV-2 samples. The pipeline is particularly useful for those who need a more accessible and fast tool to detect and monitor the spread of SARS-CoV-2 variants of concern during local outbreaks. HaVoC is currently being used in Finland for monitoring the spread of SARS-CoV-2 variants.
HaVoC user manual and source code are available at https://www.helsinki.fi/en/projects/havoc and https://bitbucket.org/auto_cov_pipeline/havoc, respectively.

[20]. Early detection and understanding of the potential impact of emerging variants of SARS-CoV-2 is of primary importance and can assist in more efficient surveillance and control of the disease. The likelihood of emergence of novel SARS-CoV-2 variants of concern is increased and accelerated by the high mutation rates typical in RNA viruses and the growing number of transmissions and infections both locally and globally.

With the rising number of variants detected worldwide and with many of them associated with increased transmissibility and lower vaccine efficacy, there is an emerging need for fast, efficient and reliable pipelines to help detect, identify and trace SARS-CoV-2 lineages. These pipelines should in addition be accessible to researchers who may not be familiar with utilizing complex bioinformatic tools or scripting pipelines.

Due to these challenges, we have developed HaVoC, a simple, reliable and user-friendly pipeline, which can be simply downloaded from our repository and run without being installed. All its dependencies can be installed via existing package managers, of which we recommend Bioconda.
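Because the pipeline itself is run without installation, the practical prerequisite is that its dependencies are reachable on the system PATH. The following is a minimal sketch of such a pre-flight check, not part of HaVoC itself. The executable names in `ASSUMED_TOOLS` are an assumption taken from the tools cited in the reference list of this paper (fastp, Trimmomatic, BWA-MEM, Bowtie 2, Sambamba); the actual set required by HaVoC may differ, and the helper `find_missing_tools` is a hypothetical name introduced for illustration.

```python
import shutil

# Assumption: executable names for tools cited in this paper's reference
# list. The real dependency list of HaVoC may be different or longer.
ASSUMED_TOOLS = ["fastp", "trimmomatic", "bwa", "bowtie2", "sambamba"]


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be located on PATH.

    `which` is injectable so the lookup can be replaced in tests;
    by default it is shutil.which, which returns None when a command
    is not found.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(ASSUMED_TOOLS)
    if missing:
        # Hypothetical hint: install the missing tools with a package
        # manager such as Bioconda, as recommended in the text.
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed tools were found on PATH.")
```

Running such a check before starting an analysis gives a clear message about what is missing, rather than a failure partway through a run.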
HaVoC could help in the current pandemic situation by detecting variants of concern in the sequencing centers and public health or other organisations currently running and tracing variants of concern worldwide. HaVoC is currently utilized for detecting and tracing SARS-CoV-2 variants of concern, mainly B.

References

Centers for Disease Control and Prevention (CDC)
Zika: the origin and spread of a mosquito-borne virus
Risk factors for human disease emergence
Host range and emerging and reemerging pathogens
Emerging Pandemic Diseases: How We Got to COVID-19
Worldometer - COVID-19 Virus Pandemic. Accessed 3
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. Accessed 2
Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
Covid-19: Novavax vaccine efficacy is 86% against UK variant and 60% against South African variant
Vaccine 2.0: Moderna and other companies plan tweaks that would protect against new coronavirus mutations
J&J says vaccine effective against Covid, though weaker against South Africa variant
Genomic characterisation of an emergent SARS-CoV-2 lineage in Manaus: preliminary findings
Centers for Disease Control and Prevention (CDC). Emerging SARS-CoV-2 Variants
fastp: an ultra-fast all-in-one FASTQ preprocessor
Trimmomatic: a flexible trimmer for Illumina sequence data
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv
Fast gapped-read alignment with Bowtie 2
Sambamba: fast processing of NGS alignment formats