key: cord-0984117-8qgzo7ne authors: Harris, R. Alan; Raveendran, Muthuswamy; Lyfoung, Dustin T.; Sedlazeck, Fritz J; Mahmoud, Medhat; Prall, Trent M.; Karl, Julie A.; Doddapaneni, Harshavardhan; Meng, Qingchang; Han, Yi; Muzny, Donna; Wiseman, Roger W.; O’Connor, David H.; Rogers, Jeffrey title: Construction of a new chromosome-scale, long-read reference genome assembly for the Syrian hamster, Mesocricetus auratus date: 2021-11-03 journal: bioRxiv DOI: 10.1101/2021.07.05.451071 sha: 2cb565996f8d1cab157fee6be3ac29a49a9944a9 doc_id: 984117 cord_uid: 8qgzo7ne
Background The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was generated in 2013 using whole-genome shotgun sequencing with short-read sequence data. More advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and greater continuity.
Findings Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) using Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gbp, similar to the 2.50 Gbp length of a previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity, with a scaffold N50 that is 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein coding genes and 10,459 noncoding genes are annotated in BCM_Maur_2.0, compared to 20,495 protein coding genes and 4,168 noncoding genes in MesAur1.0. The new assembly also reduces unresolved regions, as measured by nucleotide ambiguities: approximately 17.11% of bases in MesAur1.0 were unresolved, compared with 3.00% in BCM_Maur_2.0.
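The two summary statistics quoted in the Findings, scaffold N50 and the percentage of unresolved bases, follow conventional definitions that can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and toy inputs are hypothetical and are not taken from the study, and the unresolved-base count assumes that ambiguities are represented solely by the character N.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Not reachable for non-negative lengths: the running sum ends at the total.
    raise AssertionError("unreachable")


def unresolved_fraction(sequence: str) -> float:
    """Return the fraction of bases that are unresolved.

    Assumption: only 'N' (either case) is counted as ambiguous.
    """
    if not sequence:
        raise ValueError("sequence is empty")
    return sequence.upper().count("N") / len(sequence)


# Toy inputs (hypothetical values, not data from the paper).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # scaffold lengths in bp
toy_n50 = n50(toy_lengths)
toy_fraction = unresolved_fraction("ACGTNNACGT")
```

With the toy lengths above, the total is 300 bp, and the two longest scaffolds (80 + 70 = 150 bp) already reach half of it, so the N50 is 70. A 6.7-fold higher scaffold N50 therefore means that half of the assembly sits in scaffolds that are much longer than before, even though the total length is nearly unchanged.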
Conclusions Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research results for a wide variety of future studies using Syrian hamsters as models.
The Syrian hamster (Mesocricetus auratus, NCBI:txid10036) has been used in biomedical research for decades because it is a good model for studies of cancer [1], reproductive biology [2] and infectious diseases [3, 4], including SARS-CoV-2, influenza virus, and Ebola virus [5-9]. The use of Syrian hamsters in research has declined [10], likely due to advances in the genetic and molecular tools available for other rodents, especially laboratory mice, and not to a reduction in the utility of hamsters in biomedical research [3].
Syrian hamsters are particularly important for COVID-19 research. They spontaneously develop more severe lung disease than other animal models, such as wild-type mice, macaques, marmosets, and ferrets [5, 11-14].
After intranasal infection, Syrian hamsters consistently show signs of respiratory distress, including labored breathing, but typically recover after 2 weeks [15]. This is in stark contrast to wild-type laboratory mice, which are minimally susceptible to most SARS-CoV-2 strains that were circulating in 2020, though laboratory mice may be more susceptible to certain variants of concern that began circulating in 2021 [8, 16].
Oxford Nanopore long-read sequencing
We prepared three separate genomic DNA isolates from the same Syrian hamster (BioSample SAMN18096087). These aliquots were sheared to distinct target fragment lengths (10 kb, 20 kb and 30 kb) in order to assess the effect of fragment size on flowcell yield and improve efficiency. The two shorter-fragment libraries were sheared using the Covaris g-TUBE and the 30 kb library was fragmented with the Diagenode Megaruptor 3, all following the manufacturers' recommendations. The Oxford Nanopore sequencing libraries were prepared using the ONT 1D sequencing by ligation kit (SQK-LSK109
The optical mapping analysis identified 84 conflicts with the prior Flye/Pilon scaffolds, and these initial scaffolds were broken at those 84 sites. The completed assembly was submitted to NCBI and is available under accession GCA_017639785.1.
Gene annotation
NCBI performed gene annotation using RNA-Seq data from multiple tissues, including lung, trachea, brain, olfactory bulb and small intestine, that are
Despite having a similar total length, BCM_Maur_2.0 shows improved continuity, with a scaffold N50 that is 6.7 times greater than that of MesAur1.0 (Table 1).
The improved Syrian hamster assembly and annotation described here will facilitate research into this important animal model for COVID-19. Specifically, reagents for studying immune responses in hamsters have lagged behind those available for laboratory mice.
BCM_Maur_2.0 will facilitate the identification of cross-reactive reagents originally developed to study immunity in other species. Additionally, a more accurate genome assembly will improve the analyses of host responses to infection by enabling more accurate interpretation of RNA-seq experiments.
Relative to other recent assemblies that use a combination of long-read sequencing and short-read polishing, this genome assembly and annotation compare very favorably. The scaffold N50 of >85 Mbp is quite consistent with other long-read assemblies. The contig N50 and total number of scaffolds or contigs are likewise reasonable and consistent with other similar mammalian reference genomes. The number of protein coding genes identified is within the expected range, although additional attention will likely be needed to resolve duplicated, repetitive gene loci, potentially leveraging recent advances in ultralong-read sequencing.
What additional genomic resources would be needed to make hamsters a better model for COVID-19? Deep long-read transcriptome analysis of multiple tissues and ages would be the best next step, in order to define not just the genes expressed but the alternative splicing of genes across tissues and developmental stages. Also, long-read RNA-seq of tissues following experimental challenge with SARS-CoV-2 and other viruses would facilitate improvements in the quality of antiviral gene models.
The availability of higher accuracy sequences should lead to the development of specific reagents for monitoring immune responses. For example, epitopes that are shared between hamsters and other rodents can be used to identify monoclonal antibody reagents for flow cytometry that are predicted to be cross-reactive.
Additional reagent development will be enabled by creating synthetic versions of hamster proteins that can be used as immunogens to make hamster-specific antibodies.
One surprising motivation for this study is that Syrian hamsters, which were quickly identified as a high value model for COVID-19, did not have a higher quality reference genome at the start of the pandemic. While we worked quickly to generate this data and make it available to the scientific community, better preparedness will be critical for future unexpected epidemics. To this end, we would encourage investment in continued refinement and improvement of reference genomes for all of the rodent, bat and nonhuman primate models that are commonly used to study viruses in order to prevent this situation from recurring in the future. Such an investment would also yield improved genomic resources that would provide broad benefit to the entire scientific community.
content of this publication is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
tissues that were used for the sequence analyses described here. We also thank Dr. Benjamin tenOever for sharing Syrian hamster RNA-Seq datasets generated by his group prior to publication. And we also wish to thank two reviewers for their helpful
Oncolytic adenovirus expressing interferon alpha in a syngeneic Syrian hamster model for the treatment of pancreatic cancer
Photoperiodic modulation of ovarian metabolic, survival, proliferation and gap junction markers in adult golden hamster, Mesocricetus auratus
De novo assembly, annotation, and characterization of the whole brain transcriptome of male and female Syrian hamsters
Hamster, a close model for visceral leishmaniasis: Opportunities and challenges
Clinical and Pathological Manifestations of Coronavirus Disease 2019 (COVID-19) in a Golden Syrian Hamster Model: Implications for Disease Pathogenesis and Transmissibility
Natural Immunity to Ebola Virus in the Syrian Hamster Requires Antibody Responses
A potent SARS-CoV-2 neutralising nanobody shows therapeutic efficacy in the Syrian golden hamster model of COVID-19
SARS-CoV-2 Delta Variant Pathogenesis and Host Response in Syrian Hamsters
Animal models used to assess influenza antivirals
Generation of transgenic golden Syrian hamsters
Syrian hamsters as a small animal model for SARS-CoV-2 infection and countermeasure development
Comparative pathogenesis of COVID-19, MERS, and SARS in a nonhuman primate model
Isolation of CoV-2 neutralizing antibodies and protection from disease in a small animal model
Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARS-coronavirus 2
Animal models for COVID-19
The B1.351 and P.1 variants extend SARS-CoV-2 host range to mice
Western diet increases COVID-19 disease severity in the Syrian hamster
STAT2 signaling restricts viral dissemination but drives severe pneumonia in SARS-CoV-2 infected hamsters
Leveraging the antiviral type I interferon system as a first line of defense against SARS-CoV-2 pathogenicity
Structural and functional modelling of SARS-CoV-2 entry in animal models
CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven
Immunological and cardio-vascular pathologies associated with SARS-CoV-2 infection in golden syrian hamster
Assembly of long, error-prone reads using repeat graphs
Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement
QUAST: quality assessment tool for genome assemblies
Versatile and open software for comparing large genomes
Assemblytics: a web analytics tool for the detection of variants from an assembly
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
BUSCO: Assessing Genome Assembly and Annotation Completeness
Reevaluating assembly evaluations with feature response curves: GAGE and assemblathons
Exploring genome characteristics and sequence quality without a reference
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
A comparative evaluation of genome assembly reconciliation tools
Mesocricetus auratus Annotation Report
Mesocricetus auratus Annotation Report