key: cord-0972296-nkpwnm62 authors: Chim, Stephen S. C.; Chiu, Rossa W. K.; Lo, Y. M. Dennis title: Genomic Sequencing of the Severe Acute Respiratory Syndrome-Coronavirus date: 2006 journal: Clinical Applications of PCR DOI: 10.1385/1-59745-074-x:177 sha: 9b4467b93e8f62f49ecdb1c79aa0af228102141d doc_id: 972296 cord_uid: nkpwnm62

The polymerase chain reaction (PCR), which can exponentially replicate a target DNA sequence, has formed the basis for the sensitive and direct examination of clinical samples for evidence of infection. During the epidemic of severe acute respiratory syndrome (SARS) in 2003, PCR not only offered a rapid way to diagnose SARS-coronavirus (SARS-CoV) infection, but also made the molecular analysis of its genomic sequence possible. Sequence variations were observed in the SARS-CoV obtained from different patients in this epidemic. These unique viral genetic signatures can be applied as a powerful molecular tool in tracing the route of transmission and in studying the genome evolution of SARS-CoV. To extract this wealth of information from the limited primary clinical specimens of SARS patients, we were presented with the challenge of efficiently amplifying fragments of the SARS-CoV genome for analysis. In this chapter, we discuss how we accomplished this task with our optimized protocols for reverse transcription, nested PCR amplification, and DNA cycle sequencing. We also discuss the sequence variations that typified some strains of SARS-CoV in the different phases of this epidemic. PCR amplification of the viral sequence and genomic sequencing of these critical sequence variations in re-emerging SARS-CoV strains would give us quick insights into the virus.

Severe acute respiratory syndrome-coronavirus (SARS-CoV), the etiologic agent of SARS (1-3), is a virus that was unknown to us before the SARS epidemic. The concerted efforts of researchers promptly elucidated its genetic code.
The genome of SARS-CoV is a 29,727-nucleotide, polyadenylated RNA. The genomic organization is typical of coronaviruses, having the characteristic gene order (5'-replicase [rep], spike [S], envelope [E], membrane [M], and nucleocapsid [N]-3') and short untranslated regions at both termini (4,5). With this sequence information, rapid PCR-based molecular diagnostic tests for SARS-CoV infection were designed (1,6-10).

Besides offering molecular diagnosis and quantitative measurement of viral load, PCR-based technologies have also been exploited to amplify the genomic fragments of SARS-CoV for sequence analysis. The high sensitivity and specificity of PCR have made this genomic sequence analysis possible even for uncultured clinical specimens. Unlike conventional microbiological methods, PCR-based technologies may not require viral culture, which could introduce culture-derived artifacts into the genomic sequence. The specific PCR primers selectively amplify SARS-CoV sequences from the background of other nucleic acid sequences contributed by the patient or other microbes. Moreover, the PCR-based method is versatile in terms of the type of clinical specimen. We have successfully analyzed the SARS-CoV genome directly from uncultured samples of serum, nasopharyngeal aspirate, and stools (11). This obviates any concern about poor or even unsuccessful viral culture from precious clinical specimens. The risk of handling large volumes of hazardous viral culture is also avoided.

Genomic sequence variations were observed in the SARS-CoV obtained from different patients in this epidemic. Based on these sequence variations, most of the isolates fall into two groups: isolates obtained from patients who were epidemiologically linked to the Metropole Hotel in Hong Kong, and isolates from patients who were not (3,12,13).
For example, there are seven sequence variations that can distinguish isolate CUHK-Su10, which is linked to the Metropole Hotel, from isolate CUHK-W1, which is not linked to this hotel case cluster (Table 1). Among them, four variations at nucleotide positions 17564, 21721, 22222, and 27827 (according to the Tor2 sequence in GenBank, accession no. AY274119 [5]) were suggested by the Chinese SARS Molecular Epidemiology Consortium (14) as part of a haplotype configuration that marks the different phases of a tri-phasic SARS epidemic in Guangdong Province of China. CUHK-W1 carried a haplotype G:A:C:C that typified the middle phase. Notably, the same haplotype was observed in CUHK-L2, which was one of the earliest confirmed cases of SARS in Hong Kong, having been documented even before any report of the hotel case cluster (15). CUHK-Su10 carried a haplotype T:T:T:T that typified the late phase, marked by the hotel case cluster that spread the virus to many other parts of the world.

Table 1 footnotes: (a) Sequence variations at seven positions between the two viral strains (CUHK-Su10 and CUHK-W1) are indicated. The nucleotide positions are numbered according to the sequence of GenBank accession number AY274119. (b) Part of the haplotype suggested by the Chinese SARS Molecular Epidemiology Consortium for distinguishing the early, middle, and late phases of the SARS epidemic in 2003.

Genomic sequence variations in SARS-CoV have also revealed the route of infection within communities and across cities. For instance, compared with isolate CUHK-Su10, two mutations, T3852C and C11493T, first appeared in isolates CUHK-AG01, CUHK-AG02, and CUHK-AG03 (GenBank accession numbers AY345986, AY345987, AY345988) obtained from patients involved in the Amoy Gardens outbreak in Hong Kong (11). Later, these two genetic fingerprints appeared in 10 completely sequenced Taiwanese isolates (16). Interestingly, toward the end of the epidemic, another type of fingerprint was found by a PCR-based method.
A variant of SARS-CoV with a 386-nucleotide deletion was reported in a cluster of patients who appear to be epidemiologically related (17). Most of the cases were part of a documented outbreak in the North District Hospital in Hong Kong.

We have illustrated that sequence variations among different isolates have a remarkable epidemiological correlation. Thus, PCR amplification followed by sequencing is a powerful tool for tracing the route of transmission. The sequence information may provide objective support to epidemiological investigations. Moreover, in the event that SARS re-emerges, one could quickly gain important insight into the origin and evolutionary status of the SARS-CoV simply by sequencing the critical sequence variations of the genome, as exemplified here. However, to extract this wealth of information from the limited primary clinical SARS specimens, we need very sensitive protocols that efficiently amplify fragments of the SARS-CoV genome for analysis.

SeqScape software (Applied Biosystems).

Genomic sequencing involves PCR amplification, which produces numerous copies of the target DNA, and cycle sequencing, which requires the pipetting and manipulation of PCR products. These steps could easily contaminate the laboratory environment with amplified products. Such contamination would affect the interpretation of sequencing results and adversely affect the performance of diagnostic tests designed to detect the same viral sequences. Hence, extreme care should be taken to avoid contamination. We suggest the following precautions:
1. Perform RNA extraction, PCR amplification, and genome sequencing in different laboratories, or at least in separate and dedicated compartments of the same laboratory.
2. Transfer reagents and samples only with aerosol-resistant pipet tips.
3. Prepare the PCR reagent master mix in a hood dedicated to this purpose. A set of clean gloves and a dedicated lab gown should be worn in this area.
Illuminate the hood with ultraviolet light before and after use.
4. Any steps that involve the handling of cDNA or primary and secondary PCR products (including the addition of DNA templates when assembling the PCR), electrophoresis, and cycle sequencing should be performed in a dedicated area far away from any PCR reagents. A separate lab gown and set of gloves should be worn in this area.
5. Discard all pipet tips that have contacted DNA with extreme care. Use a double bag for disposal.
6. Include multiple negative PCR controls in each amplification to monitor for environmental contamination.

1. Prepare AVL lysis buffer and AW1 and AW2 wash buffers according to the manufacturer's (Qiagen) instructions (see Note 1).
2. In a biosafety level 2 (or above) containment laboratory, lyse 0.28 mL (1 vol) of viral culture by adding 1.12 mL (4 vol) of AVL buffer, mixing, and incubating at room temperature for 10 min. Direct clinical samples, e.g., serum, nasopharyngeal aspirates, and stools, can also be used (see Note 2).
3. Add 1.12 mL of absolute ethanol to the mixture. Pulse-vortex for 15 s.
4. Load the mixture onto a QIAamp spin column and wash the column according to the manufacturer's instructions.
5. Add 60 μL of RNase-free water onto the membrane and incubate for 1 min at room temperature. Centrifuge the spin column for 1 min at 6000g.
6. Quantify the viral RNA yield in a small aliquot of the extract by real-time quantitative reverse-transcription (RT)-PCR (9) (see Note 3).
7. Store the extracted RNA at -80°C.

1. Prewarm two thermocycler blocks with heated lids at 72 and 25°C, respectively.
2. Mix 1 μL (50 pmol) of random hexamer with 10 μL of RNA in a 0.5-mL tube. Denature at 72°C for 10 min (see Note 4).
3. During this period, assemble the reaction mix in another tube on ice according to Table 2 using SuperScript III RNase H⁻ Reverse Transcriptase (see Note 5).
4. After denaturation, snap-cool the RNA-primer mixture on ice for 1 min. Briefly spin the tubes.
Add the reaction mix prepared in step 3 to the RNA-primer mixture to make up a total reaction volume of 20 μL. Mix by pipetting gently up and down.
5. Immediately transfer the tube from ice to the prewarmed 25°C thermocycler block for a 5-min incubation. Prewarm the other thermocycler block to 55°C.
6. Transfer the tube to the prewarmed 55°C thermocycler block for a 1-h incubation.
7. Heat inactivate at 72°C for 15 min.
8. Add 1 μL (2 U) of RNase H and incubate at 37°C for 20 min to remove RNA complementary to the cDNA.
9. Dilute the product two- to fivefold with distilled water. Store at -20°C until use.

1. Inside a hood dedicated to setting up PCR, assemble the PCR master mix for the 50 reactions according to Table 3 with cDNA polymerase mix (see Note 6) in a final reaction volume of 25 μL. Add 50 aliquots of 23 μL into a 96-well PCR microplate.
2. Add 5 pmol each of the forward (PCR-F) and reverse (PCR-R) series of primers for each of the 50 reactions amplifying the overlapping amplicons that cover the whole SARS-CoV genome (see Note 7). The primer sequences are shown in Table 4.
3. In an area separate from the hood dedicated to PCR, add 1 μL of diluted reverse-transcribed product.
4. Commence PCR in a thermocycler with initial denaturation at 95°C for 1 min and 35 cycles of 95°C for 0.5 min, 55°C for 0.5 min, 68°C for 1.5 min, and a final extension at 68°C for 10 min.

1. Inside a hood dedicated to setting up PCR, assemble the PCR master mix for the 50 reactions according to Table 3 in a final reaction volume of 25 μL. Add 50 aliquots of 23 μL into a new 96-well PCR microplate.
2. Add 5 pmol each of the forward (PCR-F) and reverse (BSEQ-R) series of primers for each of the 50 semi-nested PCR reactions. The primer sequences are shown in Table 4.
3. In an area separate from the hood dedicated to PCR, add 1 μL of the corresponding primary PCR product.
4. Commence PCR in a thermocycler with initial denaturation at 95°C for 1 min and 35 cycles of 95°C for 0.5 min, 55°C for 0.5 min, 68°C for 1.5 min, and a final extension at 68°C for 10 min.
5. Electrophorese 5 μL of the secondary PCR product in a 2% agarose gel to verify the success of the PCR amplification. Estimate the amount of PCR product by comparison with a DNA marker. Only products with a single band should be used for sequencing.

Perform the sequencing reaction based on the dideoxy dye terminator method, according to the manufacturers' instructions:
1. In an area separate from the hood dedicated to PCR, assemble the cycle sequencing reaction with the ASEQ-F, BSEQ-F, ASEQ-R, and BSEQ-R series of oligonucleotides as sequencing primers for each amplicon, and with 2-5 ng of secondary PCR product as sequencing template (see Note 8).
2. Commence the cycle sequencing reaction in a thermocycler.
3. Purify the extension products by either spin column purification or ethanol precipitation. Mix or resuspend the DNA in formamide solution according to the manufacturer's instructions.
4. Denature the purified extension products at 95°C for 5 min, snap-cool on ice, and load onto the automated capillary DNA sequencer for injection.

Table 4 caption: Sequences for primary (PCR-F and PCR-R) and secondary (PCR-F and BSEQ-R) PCR primers. ASEQ-F, BSEQ-F, ASEQ-R, and BSEQ-R are the sequencing primers. The sequences are from 5' to 3'.

1. Edit, align, and compare sequences using the Tor2 strain (GenBank accession number AY274119) as a reference with software designed for this purpose, for example SeqScape (see Note 9).
2. Re-sequence regions that reveal nucleotide substitutions using a combination of different primer sets to ensure the quality of the sequencing data (see Note 10).

1. For RNA extraction, carrier poly(A) RNA is added to the lysis buffer to increase the yield. Because the PCR primers are specific to the SARS-CoV genome, the subsequent amplification would not be affected.
However, if one wants to perform 3' rapid amplification of cDNA ends (3' RACE) or a similar cloning operation that depends on oligo(dT) priming of the poly(A) tail of the viral RNA, then the carrier poly(A) RNA should be avoided.
3. Yields of viral RNA should be determined by quantitative RT-PCR, because spectrophotometric determination is prone to error as a result of the low RNA quantity and interference by the carrier poly(A) RNA, which accounts for most of the RNA present.
4. A prolonged denaturation step is used to remove secondary RNA structures in the SARS-CoV genome that impede reverse transcription. The use of random hexamers ensures an even representation of the whole RNA genome and allows more sequence information to be obtained from a limited amount of viral RNA.
5. We recommend the use of a reverse transcriptase with increased thermal stability, which permits reverse transcription at a higher temperature (55°C) than normal (42°C). This unfolds some of the secondary RNA structures, and thus produces longer cDNA at higher yields.
6. We recommend the simultaneous use of two different DNA polymerases in the PCR amplification. For example, the cDNA polymerase mix that we use contains KlenTaq-1 DNA polymerase and a second DNA polymerase with 3' to 5' proofreading activity. The inclusion of a minor amount of a proofreading polymerase results in an error rate that is significantly lower than that for Taq alone (18). This advantage matters when one is concerned with genomic sequence variations between different viral strains. The use of a two-polymerase system also increases the efficiency and yield, and hence the sensitivity, which is important when the viral titer is suboptimal.
7. The carryover of unused PCR primers into the sequencing reaction would lead to poor sequencing results.
Like the sequencing primers, these unused PCR primers would also bind to the sequencing template in the cycle sequencing reaction and, hence, generate noisy sequencing traces that overshadow the intended traces. Purification of the PCR products is thus usually recommended prior to their use as sequencing templates. However, purification methods are labor-intensive and pose an extra contamination risk, as they involve additional steps of opening and handling PCR products. Notably, we have suggested an optimized PCR protocol that allows direct sequencing of PCR products without prior purification. With the low PCR primer concentrations and the optimal number of cycles, most of the PCR primers are consumed by the end of the PCR. Furthermore, a nested sequencing primer selectively extends the specific PCR product in the cycle sequencing reaction. This suppresses extension of any nonspecific PCR product. The combined effect is a clean sequencing trace.
8. The amount of PCR product used for the sequencing reaction must be optimized carefully for each sequencing system. Although more PCR product input usually gives higher signal intensities, it may also give shorter read lengths and oversaturated signals.
9. The PCR primers target 50 amplicons of 700 bp that overlap with each other along the SARS-CoV genome. The sequencing primers are designed in such a way that any sequence masked by the PCR primer binding sites or by the sequencing primer peak on one amplicon is reliably backed up by the homologous sequence in the overlapping amplicon.
10. We advocate careful validation of any genomic sequence variation by resequencing the region with different combinations of primers and sequencing chemistries. Because a variation seen in a single viral isolate could potentially be a result of sequencing artifacts, we consider only the genomic sequence variations that are shared by at least two SARS-CoV isolates.
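The rule in Note 10 (keep only variations shared by at least two isolates) and the four-position haplotype described earlier can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the function names, the data layout, and the singleton variant at position 12345 are hypothetical, and only the two haplotypes named in the text (G:A:C:C and T:T:T:T) are mapped to a phase.

```python
from collections import Counter

# Marker positions numbered against the Tor2 reference (GenBank AY274119), as in the text.
HAPLOTYPE_POSITIONS = (17564, 21721, 22222, 27827)

# Only the two haplotypes named in the text are listed; anything else stays unassigned.
PHASE_BY_HAPLOTYPE = {
    "G:A:C:C": "middle phase",
    "T:T:T:T": "late phase",
}


def haplotype(bases_by_position):
    """Join the bases at the four marker positions, e.g. 'G:A:C:C'."""
    return ":".join(bases_by_position[pos] for pos in HAPLOTYPE_POSITIONS)


def epidemic_phase(bases_by_position):
    """Look up the epidemic phase for an isolate, or 'unassigned' if unknown."""
    return PHASE_BY_HAPLOTYPE.get(haplotype(bases_by_position), "unassigned")


def shared_variants(calls_by_isolate, min_isolates=2):
    """Return variants seen in at least `min_isolates` isolates (Note 10)."""
    counts = Counter()
    for variants in calls_by_isolate.values():
        counts.update(set(variants))  # count each variant once per isolate
    return sorted(v for v, n in counts.items() if n >= min_isolates)


# Haplotypes reported in the text for two isolates.
cuhk_w1 = dict(zip(HAPLOTYPE_POSITIONS, "GACC"))
cuhk_su10 = dict(zip(HAPLOTYPE_POSITIONS, "TTTT"))

# Variants as (position, base in CUHK-Su10, base in isolate).
# T3852C and C11493T come from the text; position 12345 is a made-up
# singleton included only to show the filter at work.
amoy_calls = {
    "CUHK-AG01": [(3852, "T", "C"), (11493, "C", "T")],
    "CUHK-AG02": [(3852, "T", "C"), (11493, "C", "T")],
    "CUHK-AG03": [(3852, "T", "C"), (11493, "C", "T"), (12345, "A", "G")],
}

if __name__ == "__main__":
    print("CUHK-W1:", haplotype(cuhk_w1), epidemic_phase(cuhk_w1))
    print("CUHK-Su10:", haplotype(cuhk_su10), epidemic_phase(cuhk_su10))
    print("Shared variants:", shared_variants(amoy_calls))
```

In this sketch the hypothetical singleton is dropped because it occurs in only one isolate, while the two Amoy Gardens fingerprints are retained. In practice a variant would still need confirmation by resequencing with a different primer combination before being reported.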
Random hexamers (Applied Biosystems)
5X First-strand synthesis buffer: 250 mM Tris-HCl
PCR Amplification
1. Advantage cDNA Polymerase mix and buffer
Genomic Sequencing
1. BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems)
ABI Prism 3100 Genetic Analyzer (Applied Biosystems)
Terminator Kit (GE Healthcare-Biosciences)
MegaBACE 1000 Sequencing System (GE Healthcare-Biosciences)

1. Identification of severe acute respiratory syndrome in Canada
2. A major outbreak of severe acute respiratory syndrome in Hong Kong
3. A cluster of cases of severe acute respiratory syndrome in Hong Kong
4. Characterization of a novel coronavirus associated with severe acute respiratory syndrome
5. The genome sequence of the SARS-associated coronavirus
6. A novel coronavirus associated with severe acute respiratory syndrome
7. Identification of a novel coronavirus in patients with severe acute respiratory syndrome
8. Clinical progression and viral load in a community outbreak of coronavirus-associated SARS pneumonia: a prospective study
9. Quantitative analysis and prognostic implication of SARS coronavirus RNA in the plasma and serum of patients with severe acute respiratory syndrome
10. Serial analysis of the plasma concentration of SARS coronavirus RNA in pediatric patients with severe acute respiratory syndrome
11. Genomic characterisation of the severe acute respiratory syndrome coronavirus of Amoy Gardens outbreak in Hong Kong
12. Coronavirus genomic-sequence variations and the epidemiology of the severe acute respiratory syndrome
13. Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection
14. The Chinese SARS molecular epidemiology consortium (2004) Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China
15. Genomic sequencing of a SARS coronavirus isolate that predated the Metropole Hotel case cluster in Hong Kong
16. Molecular epidemiology of SARS - from Amoy Gardens to Taiwan
17. Tracing SARS-coronavirus variant with large genomic deletion
18. PCR amplification of up to 35-kb DNA with high fidelity and high yield from lambda bacteriophage templates

This work is supported by a Special Grant for SARS Research (CUHK 4508/03M) from the Research Grants Council of the Hong Kong Special Administrative Region (China).