key: cord-016108-jlono0x7 authors: Marthaler, Douglas; Bohac, Ann; Becker, Aaron; Peterson, Nichole title: Next-Generation Sequencing for Porcine Coronaviruses date: 2015-09-10 journal: Animal Coronaviruses DOI: 10.1007/978-1-4939-3414-0_19 sha: doc_id: 16108 cord_uid: jlono0x7

The outbreak of porcine epidemic diarrhea virus and the discovery of porcine deltacoronavirus in the USA have raised multiple questions about the evolution of coronaviruses in swine. Coronaviruses are enveloped viruses with a positive-sense, single-stranded RNA genome (26–30 kb) and can cause respiratory or enteric illness in swine. With current technologies, complete viral genomes can be determined to understand viral diversity and evolution. In this chapter, we describe a method for deep genome sequencing of porcine coronaviruses on the Illumina MiSeq that limits the number of contaminating reads from the host and other microorganisms.

Coronaviruses (CoVs) have negatively affected the health of pigs for decades [1]. Currently, five swine CoVs have been identified: transmissible gastroenteritis virus (TGEV), porcine respiratory coronavirus (PRCV), porcine epidemic diarrhea virus (PEDV), hemagglutinating encephalomyelitis virus (HEV), and porcine deltacoronavirus (PDCoV) [2, 3]. Multiple studies have described PCR methods to detect CoVs [4–7]. In addition, multiple manuscripts have described Sanger sequencing methodologies to investigate the genetic diversity and evolution of individual or partial CoV genes [7–9]. However, investigating only partial genomes of these CoVs underestimates the evolutionary history of these viruses [10]. To investigate recombinant regions within the CoV genome and to further our understanding of CoV evolution, complete CoV genome sequencing has become very valuable.
In addition, next-generation sequencing (NGS) technology has made complete genome sequencing possible with extremely high coverage and at a lower cost than Sanger sequencing. Many laboratories have purchased desktop NGS sequencers to expand their sequencing capabilities because of the relatively low cost of the equipment. However, obtaining viral genomes directly from samples can be difficult, since the total RNA, including mRNA from host cells and bacteria, is also sequenced [11–13]. Nevertheless, NGS technology is a very powerful tool for generating CoV genomes, which could lead to a better understanding of CoV evolution.

2. Reference-based assembly for coronavirus genomes.
(a) Open the SeqManNGen program, select reference-based assembly, load the reference CoV genome and the corresponding paired FASTQ files for the sample, and run the assembly.
(b) Evaluate the assembly of reads (see Note 9). Regions with minimal coverage indicate that the reads do not match the reference sequence there. Remove the reference strain and split the contig at the low-coverage regions. Trim the 5′ and 3′ ends of the new contigs at the split regions.
(c) Perform reference-based assembly using the newly generated contigs as the reference. The contig ends will be extended by the viral reads.
(d) Merge the contigs to generate a single contig.
(e) Remap the reads to the contig to verify accurate generation of the viral genome and sufficient coverage. The contig can be saved as a FASTA file for future phylogenetic analysis.
3. De novo assembly for viral genomes.
(a) Open the SeqManNGen program, select reference-based assembly, and load the Sus scrofa genome and the corresponding paired FASTQ files for the sample. In the advanced options, select the option to save unmapped reads, which saves the reads that did not map to the Sus scrofa genome once the assembly has finished. Run the assembly.
(b) After the program has finished running, open the SeqManNGen program again, select de novo assembly, and load the unassembled FASTQ file from the previous reference-based assembly.
(c) Once the de novo assembly has finished, BLAST the contigs to identify the coronavirus contigs.
(d) Remap the reads to the coronavirus sequence and verify accurate assembly and sufficient coverage. The contig can be saved as a FASTA file for future phylogenetic analysis.

1. Some samples may contain excess organic material that clogs the filter. If this occurs, centrifuge the sample for another 20 min at 3000 × g.
Quantification should be performed with a fluorometry system that measures only single-stranded nucleic acids and is insensitive to organic compounds commonly carried over from extraction kits. This allows more accurate quantification of input mass and a higher probability of successful library preparation.
3. The RNA sample needs to be assessed for the concentration of host ribosomal RNA (rRNA). This assessment does not remove host rRNA, but it confirms whether the RNA is free of rRNA, which would severely dilute the number of viral reads (Fig. 1).
The mRNA purification step should be skipped, since it uses oligo-dT-coated magnetic beads to specifically bind poly-A-tailed mRNA, and the viral RNA will not be poly-A tailed. Start the library preparation at the step "Incubate RFP," which is the start of random-primed cDNA synthesis.
5. Typical concentrations of viral RNA libraries prepared with the TruSeq RNA library preparation kit are 1–15 ng/μL.
6. The concentration of coronavirus must be estimated by RT-PCR. However, the RT-PCR result may not predict successful generation of the complete viral genome, since total RNA was used in the library preparation. Generally, a lower viral concentration by RT-PCR (a higher Ct value) indicates that more reads are needed to generate a complete genome.
If libraries have limited host contamination and an acceptable concentration of viral RNA (Ct value <25), an output of 1 million reads per sample should be sufficient for assembly. If more reads are needed per sample, the MiSeq v2 250 PE kit can be used.
7. Two major assembly methodologies are available: reference-based and de novo assembly. However, because of the genetic diversity of viruses, gaps in coverage can occur during reference-based assembly. We briefly discuss both options in this chapter. Reference-based assembly maps the MiSeq reads to a known sequence (template) to build a contig, whereas de novo assembly builds contigs without a template and takes longer to run. Since the MiSeq generates reads from total RNA, host reads need to be removed to facilitate de novo assembly, which can be done by first mapping the reads to the swine genome and saving the unmapped reads. Hence, the de novo assembly process described here first uses a reference-based assembly (to remove host reads) and then a de novo assembly to construct contigs.
8. Many different programs are available to remove adapter sequences and low-quality reads and to assemble genomes, and each program uses slightly different algorithms and operations to accomplish these tasks. Removing low-quality reads is necessary before attempting viral assembly.
9. Coverage is expected to vary across the genome. The valleys, which indicate lower coverage, should have approximately the same depth of coverage (Fig. 2a). If some valleys have markedly less coverage than the other valleys, the reads did not map to the reference in those regions because of the genetic diversity of the CoV strain (Fig. 2b).

Fig. 2 Examples of reads mapping to a reference genome. The x-axis represents the length of the genome, and the y-axis represents the depth of coverage. (a) Unequal coverage across the genome.
(b) Gaps in coverage due to genetic differences between the reference genome and the sequenced sample, indicated by the black arrows.

References
1. Diseases of Swine
2. Fields Virology
3. Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus
4. Rapid detection, complete genome sequencing, and phylogenetic analysis of porcine deltacoronavirus
5. A multiplex RT-PCR assay for rapid and differential diagnosis of four porcine diarrhea associated viruses in field samples from pig farms in East China from 2010 to 2012
6. Development of a nested polymerase chain reaction test for the diagnosis of transmissible gastroenteritis of pigs
7. Respiratory and fecal shedding of porcine respiratory coronavirus (PRCV) in sentinel weaned pigs and sequence of the partial S-gene of the PRCV isolates
8. Phylogenetic analysis of porcine epidemic diarrhea virus (PEDV) field strains in central China based on the ORF3 gene and the main neutralization epitopes
9. Genetic variability and phylogeny of current Chinese porcine epidemic diarrhea virus strains based on spike, ORF3, and membrane genes
10. Distinct characteristics and complex evolution of PEDV strains
11. New viruses in veterinary medicine, detected by metagenomic approaches
12. Evaluation of rapid and simple techniques for the enrichment of viruses prior to metagenomic virus discovery
13. Comparison of tissue sample processing methods for harvesting the viral metagenome and a snapshot of the RNA viral community in a turkey gut
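The coverage check in step 2(b) and Note 9 (find the regions where mapped reads fall far below the typical depth, then split the contig there and trim the ends next to each split) can be illustrated with a short Python sketch. It assumes the per-position depth has already been exported from the reference-based assembly as a list of integers. The functions low_coverage_regions and split_contig, the threshold of 10% of the median depth, the min_len filter and the trim length are all hypothetical choices for illustration; they are not SeqManNGen features or the authors' settings.

```python
from statistics import median
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # half-open [start, end) in 0-based contig coordinates


def low_coverage_regions(depth: List[int], min_depth: float, min_len: int = 1) -> List[Region]:
    """Return runs of consecutive positions whose depth is below min_depth.

    Runs shorter than min_len positions are ignored, so a single-base dip
    is not reported as a gap.
    """
    regions: List[Region] = []
    start: Optional[int] = None
    for i, d in enumerate(depth):
        if d < min_depth:
            if start is None:
                start = i  # a low-coverage run begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the contig.
    if start is not None and len(depth) - start >= min_len:
        regions.append((start, len(depth)))
    return regions


def split_contig(seq: str, regions: List[Region], trim: int = 0) -> List[str]:
    """Drop each low-coverage region and trim `trim` bases from the
    sub-contig ends that border a split (the outer contig ends are kept)."""
    pieces: List[str] = []
    cursor = 0           # start of the current sub-contig
    after_split = False  # True once the current sub-contig begins at a split
    for start, end in regions:
        piece_start = cursor + (trim if after_split else 0)
        piece_end = start - trim  # this end borders a split, so trim it
        if piece_end > piece_start:
            pieces.append(seq[piece_start:piece_end])
        cursor, after_split = end, True
    piece_start = cursor + (trim if after_split else 0)
    if piece_start < len(seq):
        pieces.append(seq[piece_start:])
    return pieces


if __name__ == "__main__":
    # Toy data: an 8-nt contig with a two-position coverage gap.
    depth = [10, 10, 10, 0, 0, 10, 10, 10]
    # Hypothetical threshold: 10% of the median depth.
    threshold = 0.1 * median(depth)
    gaps = low_coverage_regions(depth, threshold)
    print(gaps)                                     # [(3, 5)]
    print(split_contig("ACGTACGT", gaps, trim=1))   # ['AC', 'GT']
```

Setting the threshold relative to the median depth, rather than as a fixed read count, follows the reasoning in Note 9: ordinary valleys share roughly similar depth, so only regions far below the typical depth are treated as mismatches to the reference.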
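Step 3(a) and Note 7 remove host reads by mapping to the Sus scrofa genome and keeping only the reads that did not map. The chapter does this with the SeqManNGen option to save unmapped reads. As a tool-agnostic illustration only, the sketch below assumes the host mapping has instead been exported as a SAM-format alignment (an assumption, not part of the described protocol) and collects the reads whose FLAG has the 0x4 "segment unmapped" bit set. The function names unmapped_fastq and reverse_complement are hypothetical.

```python
from typing import Iterable, List, Tuple

# SAM FLAG bits used here (from the SAM format specification).
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECOND_IN_PAIR = 0x80

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def unmapped_fastq(sam_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Collect reads flagged as unmapped (0x4) as FASTQ records.

    Returns (mate1_records, mate2_records); unpaired reads go into mate1.
    """
    mate1: List[str] = []
    mate2: List[str] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]  # SEQ and QUAL columns
        if not flag & FLAG_UNMAPPED:
            continue  # read mapped to the host reference; discard it
        if flag & FLAG_REVERSE:
            # SEQ/QUAL are stored reverse-complemented; restore read orientation.
            seq, qual = reverse_complement(seq), qual[::-1]
        record = f"@{name}\n{seq}\n+\n{qual}\n"
        (mate2 if flag & FLAG_SECOND_IN_PAIR else mate1).append(record)
    return mate1, mate2


if __name__ == "__main__":
    # Toy SAM records: one unmapped pair (r1) and one read mapped to the host (r2).
    sam = [
        "@HD\tVN:1.6",
        "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r1\t141\t*\t0\t0\t*\t*\t0\t0\tGGCA\tHHHH",
        "r2\t99\tchr1\t100\t60\t4M\t=\t200\t104\tTTTT\tIIII",
    ]
    m1, m2 = unmapped_fastq(sam)
    print("".join(m1), end="")
    print("".join(m2), end="")
```

This sketch keeps any read whose own 0x4 bit is set, so if only one mate of a pair maps to the host, the other mate is kept without its partner; a stricter variant could also require the 0x8 "mate unmapped" bit before keeping a paired read.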