key: cord-0996747-qbb8q9k0 authors: Chueca, Luis J.; Kochmann, Judith; Schell, Tilman; Greve, Carola; Janke, Axel; Pfenninger, Markus; Klimpel, Sven title: De novo Genome Assembly of the Raccoon Dog (Nyctereutes procyonoides) date: 2021-04-29 journal: Front Genet DOI: 10.3389/fgene.2021.658256 sha: 620acebd1ffa11eddfbf20f555a72cd98ac09835 doc_id: 996747 cord_uid: qbb8q9k0

The raccoon dog, Nyctereutes procyonoides (NCBI Taxonomy ID: 34880; Figure 1a), belongs to the family Canidae, with foxes (genus Vulpes) being its closest relatives (Lindblad-Toh et al., 2005; Sun et al., 2019). Its original distribution in East Asia ranges from south-eastern Siberia to northern Vietnam and the Japanese islands. In the early 20th century, the raccoon dog was introduced into western Russia for fur farming and hunting, which led to its widespread establishment in many European countries (Figure 1b). Together with the raccoon (Procyon lotor), it is now listed in Europe as an invasive alien species of Union concern (Regulation (EU) No. 1143/2014), and member states are required to control the pathways of introduction and to manage established populations. The raccoon dog is a host and vector for a variety of pathogens, including rabies virus and canine distemper virus. Whether it is involved in the transmission of coronaviruses to humans remains inconclusive (Guan, 2003; Chan and Chan, 2013), but experimental studies have demonstrated that raccoon dogs are susceptible to SARS-CoV-2 infection and can transmit the virus to contact animals (Freuling et al., 2020). However, a recent study based on sequence-alignment predictions suggests that the ACE2 receptor of N. procyonoides binds the S-protein of SARS-CoV and SARS-CoV-2 less effectively than the ACE2 receptors of other species such as cows and rodents (Luan et al., 2020a,b).
Several subpopulations have been recognized across the current distribution range in Europe and East Asia based on mtDNA (Kim et al., 2013; Paulauskas et al., 2016), microsatellite (Drygala et al., 2016; Hong et al., 2018), and SNP markers (Nørgaard et al., 2017). Interestingly, continental populations from Asia and Europe seem to have a higher chromosome number (2n = 54) than those from the Japanese islands (2n = 38) (Wada and Imai, 1991; Nie et al., 2003). Moreover, the raccoon dog is one of the few Carnivora species with B chromosomes (Bs) in its karyotype (Duke Becker et al., 2011; Makunin et al., 2018). Several mitochondrial genome sequences of wild and captive-bred raccoon dogs are available (Sun et al., 2019); however, a complete nuclear genome is still lacking. Apart from its potential role as a disease vector, N. procyonoides is of interest because it is the only extant species of the genus Nyctereutes and the only canid known to hibernate. Here, we present a first draft genome of a raccoon dog sampled in Germany. It will provide a basis for a deeper understanding of the species' phylogenetic relationships and of the evolution and function of B chromosomes in mammals, give insights into the evolution of hibernation, provide markers for future studies of invasive population structure in Europe, and serve as a resource for studying gene–disease associations.
One adult female raccoon dog (Figure 1a) was bagged in February 2020 in Germany (52°06′51.2″N 12°03′03.6″E) in accordance with hunting regulations. Blood samples and various tissues were immediately stored on dry ice or in RNAlater and kept at −80 °C until further processing (Figure 1c). A SMRTbell library was constructed following the instructions of the SMRTbell Express Prep Kit v2.0 using the Low DNA Input Protocol (Pacific Biosciences, Menlo Park, CA).
High-molecular-weight DNA was extracted from blood (5 ml) using a Genomic-tip 100/G (QIAGEN) according to the manufacturer's instructions. One SMRT cell was sequenced in CLR mode on the Sequel II System with the Sequel II Sequencing Kit 2.0. For chromosome-level scaffolding, genomic DNA was isolated from ear tissue (62 mg) following the Omni-C Proximity Ligation Assay (Version 1.1) with some modifications. The library was sequenced on the NovaSeq 6000 platform with a 150 bp paired-end strategy at Novogene (UK). The fragment size distribution and concentration of each final library were assessed with the TapeStation (Agilent Technologies) and with the Qubit Fluorometer and Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA), respectively. For more information on the different protocols, see Supplementary Information.
To obtain Oxford Nanopore Technologies (ONT) long reads, we ran three flow cells (FLO-MIN106) on a MinION portable sequencer. Total genomic DNA was used for library preparation with the Ligation Sequencing Kit (SQK-LSK109) from ONT following the manufacturer's protocols. Base calling of the reads from the three MinION flow cells was performed with guppy v4.0.11 (https://nanoporetech.com/nanopore-sequencing-dataanalysis) under default settings. Afterwards, ONT read quality was checked with NanoPlot v1.28.1 (https://github.com/wdecoster/NanoPlot), and reads shorter than 1,000 bases or with a mean quality below 8 were discarded with NanoFilt v2.6.0 (https://github.com/wdecoster/nanofilt).
A mix of different tissues (liver, heart, gonads, brain, kidney, muscle) was ground into small pieces using steel balls and a Retsch mill. A total of 120 mg of the tissue mixture was shipped on dry ice to Novogene (UK) for 150 bp paired-end Illumina RNA-seq of a cDNA library with 250–300 bp inserts.
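The ONT read filter described above (keep reads of at least 1,000 bases with a mean quality of at least 8) can be sketched in a few lines of Python. This is a minimal illustration, not NanoFilt's code: it assumes plain four-line FASTQ records with Phred+33 qualities, and it assumes, to our understanding of NanoFilt/nanomath, that the mean read quality is computed by averaging per-base error probabilities and converting back to the Phred scale rather than by averaging Phred scores directly. The helper names (`parse_fastq`, `mean_read_quality`, `passes_filter`, `filter_reads`, `make_record`) are hypothetical.

```python
import math
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from plain four-line FASTQ records."""
    lines = [line for line in text.strip().splitlines() if line]
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:].split()[0], seq, qual


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality averaged in error-probability space (assumed NanoFilt-like)."""
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def passes_filter(seq: str, qual: str, min_length: int = 1000,
                  min_quality: float = 8.0) -> bool:
    """Keep reads that are long enough AND have a high enough mean quality."""
    return len(seq) >= min_length and mean_read_quality(qual) >= min_quality


def filter_reads(fastq_text: str, **thresholds) -> List[str]:
    """Return the IDs of reads that pass both thresholds."""
    return [rid for rid, seq, qual in parse_fastq(fastq_text)
            if passes_filter(seq, qual, **thresholds)]


def make_record(read_id: str, length: int, qual_char: str) -> str:
    """Build a toy FASTQ record with a constant quality character (hypothetical data)."""
    return f"@{read_id}\n{'A' * length}\n+\n{qual_char * length}\n"


# '+' encodes Q10 and '#' encodes Q2 in Phred+33.
example = (make_record("read_ok", 1500, "+")
           + make_record("read_lowq", 1500, "#")
           + make_record("read_short", 500, "+"))
print(filter_reads(example))  # ['read_ok']
```

Averaging in error-probability space means that a few very poor bases pull the read's mean quality down more strongly than a plain arithmetic mean of Phred scores would.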
Genome size was estimated by flow cytometry of propidium iodide-stained nuclei following the protocol described in Hare and Johnston (2012). Ear tissue from one frozen (−80 °C) adult N. procyonoides sample and neural tissue of the internal reference standard Acheta domesticus (female, 1C = 2 Gb) were mixed and chopped with a razor blade in a Petri dish containing 2 ml of ice-cold Galbraith buffer. The suspension was filtered through a 42-µm nylon mesh, stained with the intercalating fluorochrome propidium iodide (PI, Thermo Fisher Scientific), and treated with RNase II A (Sigma-Aldrich), each at a final concentration of 25 µg/ml. The mean red PI fluorescence of stained nuclei was quantified with a Beckman-Coulter CytoFLEX flow cytometer with a solid-state laser emitting at 488 nm. Fluorescence intensities of 5,000 nuclei per sample were recorded, and histograms were analyzed with CytExpert 2.3. The 1C DNA content of the sample was calculated as the ratio of the mean red fluorescence of the 2C peak of the raccoon dog nuclei to that of the 2C peak of the reference standard, multiplied by the 1C DNA content of the standard. Six replicates were measured on six different days to minimize random instrumental error. Furthermore, we estimated genome size from read coverage by mapping the reads used for the genome assembly back to the assembly itself with backmap v0.3 (https://github.com/schellt/backmap; Schell et al., 2017). In brief, the method divides the number of mapped nucleotides by the mode of the coverage distribution. In this way, collapsed regions, whose coverage is increased manyfold, still contribute their full length to the estimate.
SMRT reads longer than 7 kb were assembled with two different assemblers, wtdbg2 v2.5 (Ruan and Li, 2020) and Flye v2.7.1 (Kolmogorov et al., 2019).
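The two genome size estimates described above reduce to simple arithmetic, sketched below in Python. The flow cytometry function implements the stated ratio: sample 2C peak divided by standard 2C peak, times the 1C content of the standard. The coverage function divides total mapped bases by the modal per-base coverage, as backmap is described to do; it is an illustration only, not backmap's code, and ignoring zero-coverage positions when taking the mode is our assumption. All input values are hypothetical.

```python
from collections import Counter
from typing import Sequence


def genome_size_flow_cytometry(sample_2c_mean: float, standard_2c_mean: float,
                               standard_1c_gb: float) -> float:
    """1C genome size (Gb) = (sample 2C peak / standard 2C peak) x standard 1C content."""
    return sample_2c_mean / standard_2c_mean * standard_1c_gb


def genome_size_from_coverage(per_base_coverage: Sequence[int]) -> float:
    """Total mapped bases divided by the modal per-base coverage.

    Zero-coverage positions are ignored when finding the mode (an assumption here).
    """
    total_mapped = sum(per_base_coverage)
    histogram = Counter(c for c in per_base_coverage if c > 0)
    if not histogram:
        raise ValueError("no covered positions")
    modal_coverage = histogram.most_common(1)[0][0]
    return total_mapped / modal_coverage


# Hypothetical peak means (arbitrary fluorescence units); standard 1C = 2 Gb (A. domesticus).
print(genome_size_flow_cytometry(310.0, 200.0, 2.0))  # ≈ 3.1

# Toy 100-bp "assembly": 80 positions at the modal 10x, a 10-bp repeat collapsed
# from three copies (30x) and 10 slightly under-covered positions (9x).
print(genome_size_from_coverage([10] * 80 + [30] * 10 + [9] * 10))  # 119.0
```

In the toy example, the collapsed three-copy repeat contributes its full length (30 bp instead of 10 bp), which is why the coverage-based estimate exceeds the 100 bp of assembled sequence. For the reported flow cytometry value, 3.10 Gb against a 2 Gb standard corresponds to a 2C peak ratio of 1.55.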
The resulting assemblies were compared in terms of contiguity with QUAST v5.0.2 (Gurevich et al., 2013) and evaluated for completeness with BUSCO v3.0.2 (Simão et al., 2015) in short mode against the laurasiatheria_odb9 data set (Supplementary Table 1). The Flye assembly showed higher contiguity and completeness than the wtdbg2 assembly and was therefore selected for downstream analyses. To further improve it, we applied two rounds of scaffolding and gap closing: the assembly was first scaffolded with the SMRT reads using SSPACE-LongRead v1.1 (Boetzer and Pirovano, 2014) and then with the ONT reads using SLR (Luo et al., 2019), and TGS-GapCloser v1.0.1 (Xu et al., 2019) was run after each scaffolding step. Subsequently, the Omni-C reads were used to further scaffold the draft genome with the HiRise pipeline (Putnam et al., 2016), operated by the Dovetail Genomics team. The assembly was screened for contamination with BlobTools v1.1.1 (Kumar et al., 2013; Laetsch and Blaxter, 2017) by evaluating the coverage, GC content, and sequence similarity (against the NCBI nt database) of each sequence (Figure 1d).
The quality of the raw Illumina RNA-seq reads was checked with FastQC (Andrews, 2010). Low-quality bases and adapter sequences were then trimmed with Trimmomatic v0.39 (Bolger et al., 2014), and the transcriptome was assembled with Trinity v2.9.1 (Haas et al., 2013). The transcriptome assembly was evaluated for completeness with BUSCO v3.0.2 against the laurasiatheria_odb9 data set (C: 81.8% [S: 36.0%, D: 45.8%], F: 8.0%, M: 10.2%). Moreover, the clean RNA-seq reads from the different tissues were aligned against the reference genome with HISAT2 (Kim et al., 2015). RepeatModeler v2.0 (Smit and Hubley, 2008) was run to construct a de novo repeat library from the assembly.
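Contiguity comparisons such as the QUAST comparison above usually centre on N50: the length L such that sequences of length L or longer together cover at least half of the total assembly length. The Python sketch below computes N50 for two hypothetical sets of contig lengths; it illustrates the metric only and does not reproduce QUAST's full report. The contig lengths and assembler labels are invented for illustration.

```python
from typing import Dict, Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


def compare_assemblies(assemblies: Dict[str, List[int]]) -> None:
    """Print total length, number of sequences and N50 for each named assembly."""
    for name, lengths in assemblies.items():
        print(f"{name}: total={sum(lengths)} n={len(lengths)} N50={n50(lengths)}")


# Toy contig lengths (hypothetical, in kb) for two assemblers.
compare_assemblies({
    "assembler_A": [2, 3, 4, 5, 6, 7, 8, 9, 10],  # N50 = 8
    "assembler_B": [20, 15, 5, 5, 5, 4],          # N50 = 15
})
```

Both toy assemblies have the same total length (54 kb), but the one with fewer, longer contigs has the higher N50, which is the sense in which one assembly is said to be more contiguous than another.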
This species-specific repeat library was merged with the canid RepBase library (Jurka et al., 2005; http://www.girinst.org/repbase/, accessed 18/10/2020), and the combined library was used to annotate and mask repeats with RepeatMasker v4.1.0 (http://www.repeatmasker.org/). After repeat masking, genes were predicted with the homology-based gene prediction tool GeMoMa v1.7.1 (Keilwagen et al., 2016, 2018; Venter et al., 2001). First, introns were extracted from the mapped RNA-seq reads and filtered with the GeMoMa modules ERE and DenoiseIntrons. Then, we ran the GeMoMa pipeline module independently for each reference species, using MMseqs2 (Steinegger and Söding, 2017) as the alignment tool and including the mapped RNA-seq data. Finally, the resulting 11 gene annotations, one per reference species, were combined into a final annotation with the GeMoMa modules GAF and AnnotationFinalizer. Predicted genes were functionally annotated by BLAST searches against the Swiss-Prot database with an e-value cutoff of 10⁻⁶. InterProScan v5.39.77 (Quevillon et al., 2005) was used to predict motifs, domains, and Gene Ontology (GO) terms. This work involved many software tools, whose settings and parameters are listed below. Tools given in brackets are dependencies used by the main tools. All tools employed in this work are listed in Supplementary Table 3.
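The functional annotation step above keeps Swiss-Prot BLAST hits with an e-value of at most 10⁻⁶ in tabular output (-outfmt 6). The Python sketch below parses such output and keeps, for each predicted protein, its highest-scoring hit below the cutoff. It assumes the default twelve-column -outfmt 6 layout; because the reported search already used -max_target_seqs 1 and -max_hsps 1, the best-hit selection is a defensive illustration, not the authors' procedure. The query and subject IDs in the example are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def best_hits(tsv_text: str, max_evalue: float = 1e-6) -> Dict[str, Tuple[str, float, float]]:
    """Map each query to (subject, bitscore, evalue) of its best hit at or below the cutoff."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # hit not significant enough to transfer an annotation
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bitscore, evalue)
    return best


# Hypothetical blastp output: gene2's only hit fails the 1e-6 cutoff.
sample = (
    "gene1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t400\n"
    "gene1\tsubjB\t60.0\t180\t72\t2\t5\t185\t3\t182\t1e-30\t120\n"
    "gene2\tsubjC\t35.0\t90\t58\t4\t10\t99\t20\t109\t0.01\t35\n"
)
print(best_hits(sample))  # {'gene1': ('subjA', 400.0, 1e-120)}
```

Proteins without any hit passing the cutoff simply do not appear in the result, which is how the count of functionally annotated genes can be derived from the BLAST output.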
GeMoMa.m=100000 GeMoMa.Score=ReAlign AnnotationFinalizer.r=NO o=true;
4.4 java -Xmx30G -jar GeMoMa-1.7.1.jar CLI GAF;
4.5 java -Xmx30G -jar GeMoMa-1.7.1.jar CLI AnnotationFinalizer u=YES i c=UNSTRANDED coverage_unstranded;
4.6 java -Xmx30G -jar GeMoMa-1.7.1.jar CLI Extractor p=true c=true;
(5) BUSCO v3.0.2 [python v3.7.4, augustus v3.3.2]: parameters: -l /laurasiatheria_odb9/ -m prot;
(6) InterProScan v5.39.77: parameters: -f tsv -iprlookup -pa -goterms -exclappl SignalP_GRAM_NEGATIVE,SignalP_GRAM_POSITIVE dp;
(7) ncbi-blast v2.10.0: parameters: 7.1 makeblastdb -in uniprot_sprot_2020_04.fasta -parse_seqids -dbtype prot; 7.2 blastp -evalue 1e-6 -max_hsps 1 -max_target_seqs 1 -outfmt 6.
(1) backmap.pl v0.3 [minimap2 v2.17, samtools v1.10, qualimap v2.2.1, bedtools 2.28.0, Rscript v3.6.3, multiqc 1.9]: parameters: -pb -v.

The DNA content estimated by flow cytometry was 3.10 Gb, similar to a previous flow cytometry estimate (3.19 Gb; Wurster-Hill et al., 1988). The genome size estimated from read coverage was 3.23 Gb. Although our draft assembly (2.39 Gb) was smaller than both the flow cytometry and the coverage-based estimates, its length was within the range of other Carnivora genome assemblies (Table 1, Supplementary Table 2), and it showed good completeness, with 92.9% complete BUSCOs. The difference between assembly length and estimated genome size could be explained by the complex chromosome structure of the raccoon dog, which presents large proximal chromatin regions and a fluctuating number of B chromosomes (Duke Becker et al., 2011; Makunin et al., 2018). Both structures, uncommon in carnivores, are mostly composed of repetitive elements, which were most likely not properly resolved and were collapsed in the assembly.

Table 1 | A. Genome assembly and annotation statistics for the raccoon dog (Nyctereutes procyonoides) and comparison with related species. B. Repeat statistics: de novo and homology-based repeat annotations as reported by RepeatMasker and RepeatModeler. The repeat families included are long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), long terminal repeats (LTRs), DNA repeats (DNA), unclassified (unknown) repeat families, small RNA repeats (SmRNA), and others (small but classified repeat groups). Total is the percentage of base pairs made up of repeats in each genome assembly. C. Number and percentage of functionally annotated predicted protein-coding genes.

… Table 2). We also compared synteny between the raccoon dog and dog genome assemblies with JupiterPlot v1.0 (https://github.com/JustinChu/JupiterPlot). The JupiterPlot displays the 58 largest raccoon dog scaffolds, which covered more than 99% of the dog genome (Figure 1e); the colored bands represent synteny between the two assemblies. The plot shows high synteny between both genomes, with several genomic rearrangements and breakpoints, some of which have been identified previously (Duke Becker et al., 2011). Together, these results make the N. procyonoides genome the best genome assembly obtained so far for the tribe Vulpini.

Infection of animal cells by SARS-CoV-2 is determined by the specific interaction between the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein (S-protein) and the membrane proteins ACE2 (peptidase domain of angiotensin I converting enzyme 2) and TMPRSS2 (transmembrane serine protease 2) (Lam et al., 2020). We identified both proteins in the raccoon dog genome annotation; they show high similarity to their dog and fox orthologues. The ACE2 protein alignment between dog and raccoon dog showed 99.3% similarity, with only 6 of 894 amino acids differing (Supplementary Figure 1).
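The ACE2 comparison above (6 differing residues out of 894) corresponds to (894 − 6) / 894 ≈ 99.3%. The Python sketch below computes percent identity from two pre-aligned sequences. It is an illustration under stated assumptions: columns that are gaps in both sequences are skipped, a gap opposite a residue counts as a difference, and the sequences are placeholders rather than the actual dog and raccoon dog ACE2 proteins. Other tools may define the denominator differently (for example, excluding all gapped columns).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical columns in a pairwise alignment.

    Gap-only columns are skipped; a gap opposite a residue counts as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        identical += int(a == b)
    if compared == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * identical / compared


# Placeholder 894-residue alignment with 6 substitutions (not real ACE2 sequences).
seq_a = "M" * 894
seq_b = "M" * 888 + "K" * 6
print(round(percent_identity(seq_a, seq_b), 1))  # 99.3
print(percent_identity("MKT-A", "MKTLA"))        # 80.0
```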
Moreover, the binding affinity between the SARS-CoV-2 S-protein and ACE2 has been predicted to be lower for groups such as canids (Canis, Vulpes), chiropterans (Rhinolophus, Pteropus), and pangolins (Manis), among others, because their ACE2 matches only 14 of the 20 key amino acids of human ACE2 (Luan et al., 2020a). However, the reported SARS-CoV-2 infections in domestic dogs and ferrets (Elbe and Buckland-Merrett, 2017; Shu and McCauley, 2017; Shi et al., 2020) indicate that the raccoon dog should be considered a potential host and vector of this virus, both in its natural distribution range in East Asia and in its introduced populations in Europe.

Ethical review and approval were not required for this animal study because the animal was culled in full accordance with German hunting law (waidgerecht), which means that unnecessary suffering was avoided. Moreover, the individual was not killed for the purpose of this study; it was culled in accordance with Article 8(h) of the Convention on Biological Diversity (CBD), which stipulates the precaution, control, and eradication of invasive species as a goal and task of nature conservation under international law. In 2000, the states committed themselves in Decision V/8(6) to developing national strategies.

SK, JK, and MP conceived this study. JK and CG prepared the samples. CG conducted the lab work. LC performed the bioinformatic analyses and statistics with support from TS. LC, JK, AJ, TS, and MP discussed and interpreted the data. LC wrote the manuscript, and all authors commented on and revised it.
The present study is a result of the Centre for Translational Biodiversity Genomics (LOEWE-TBG) and was supported through the program LOEWE-Landes-Offensive zur Entwicklung Wissenschaftlich-ökonomischer Exzellenz of …

We thank the Genome Technology Center (RGTC) at Radboudumc for the use of the Sequencing Core Facility (Nijmegen, The Netherlands), which provided the PacBio SMRT sequencing service on the Sequel II platform. We also thank Damian Baranski for help with the DNA isolation and library preparations, and Norbert Peter and Dorian D. Dörge for providing samples.

References
FastQC: a quality control tool for high throughput sequence data
SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information
Trimmomatic: a flexible trimmer for Illumina sequence data
Tracing the SARS-coronavirus
Lineage-specific biology revealed by a finished genome assembly of the mouse
The sequence of the human genome
Homogenous population genetic structure of the non-native raccoon dog (Nyctereutes procyonoides) in Europe as a result of rapid population expansion
Anchoring the dog to its relatives reveals new evolutionary breakpoints across 11 species of the Canidae and provides new clues for the role of B chromosomes
Data, disease and diplomacy: GISAID's innovative contribution to global health
The sequence and analysis of a Chinese pig genome
Susceptibility of raccoon dogs for experimental SARS-CoV-2 infection
Isolation and characterization of viruses related to the SARS Coronavirus from animals in Southern China
QUAST: quality assessment tool for genome assemblies
De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis
Chapter 1 of propidium iodide-stained nuclei
Genetic diversity and population structure of East Asian Raccoon Dog (Nyctereutes procyonoides): genetic features in central and marginal populations
Repbase update, a database of eukaryotic repetitive elements
Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi
Using intron position conservation for homology-based gene prediction
HISAT: a fast spliced aligner with low memory requirements
Phylogeography of Korean raccoon dogs: implications of peripheral isolation of a forest mammal in East Asia
Assembly of long, error-prone reads using repeat graphs
Red fox genome assembly identifies genomic regions associated with tame and aggressive behaviours
Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots
BlobTools: interrogation of genome assemblies
SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals
The sequence and de novo assembly of the giant panda genome
Genome sequence, comparative analysis and haplotype structure of the domestic dog
Population genomics reveal recent speciation and rapid evolutionary adaptation in polar bears
SARS-CoV-2 spike protein favors ACE2 from Bovidae and Cricetidae
Spike protein recognition of mammalian ACE2 predicts the host range and an optimized ACE2 for SARS-CoV-2 infection
SLR: a scaffolding algorithm based on long reads and contig classification
Sequencing of supernumerary chromosomes of red fox and raccoon dog confirms a non-random gene acquisition by B chromosomes
Comparative chromosome painting defines the karyotypic relationships among the domestic dog, Chinese raccoon dog and Japanese raccoon dog
Population genomics of the raccoon dog (Nyctereutes procyonoides) in Denmark: insights into invasion history and population development
Genetic characterization of the raccoon dog (Nyctereutes procyonoides), an alien species in the Baltic region
Initial sequence and comparative analysis of the cat genome
Chromosome-scale shotgun assembly using an in vitro method for long-range linkage
InterProScan: protein domains identifier
Fast and accurate long-read assembly with wtdbg2
An annotated draft genome for Radix auricularia (Gastropoda, Mollusca)
Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARS-coronavirus 2
GISAID: global initiative on sharing all influenza data – from vision to reality
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs
RepeatModeler Open-1.0. Available online at
MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets
The complete mitochondrial genome of the raccoon dogs (Canidae: Nyctereutes ussurienusis) and intraspecific comparison of three Asian raccoon dogs
On the Robertsonian polymorphism found in the Japanese raccoon dog (Nyctereutes procyonoides viverrinus)
Banded karyotype of a wild-caught male Korean raccoon dog, Nyctereutes procyonoides koreensis
Fragile sites, telomeric DNA sequences, B chromosomes, and DNA content in raccoon dogs, Nyctereutes procyonoides, with comparative notes on foxes, coyote, wolf, and raccoon
TGS-GapCloser: fast and accurately passing through the Bermuda in large genome using error-prone third-generation long reads
A whole-genome assembly of the domestic cow