key: cord-0894007-bkm486jd
authors: Tao, Ying; Tang, Kevin; Shi, Mang; Conrardy, Christina; Li, Kenneth S.M.; Lau, Susanna K.P.; Anderson, Larry J.; Tong, Suxiang
title: Genomic characterization of seven distinct bat coronaviruses in Kenya
date: 2012-04-26
journal: Virus Res
DOI: 10.1016/j.virusres.2012.04.007
sha: 0c9baad54199eda3886db5183934bbc0094d91f6
doc_id: 894007
cord_uid: bkm486jd

To better understand the genetic diversity and genomic features of 41 coronaviruses (CoVs) identified from Kenya bats in 2006, seven CoVs, chosen as representatives of seven different phylogenetic groups identified from partial polymerase gene sequences, were subjected to extensive genomic sequencing. As a result, 15–16 kb nucleotide sequences encoding complete RNA-dependent RNA polymerase, spike, envelope, membrane, and nucleocapsid proteins plus other open reading frames (ORFs) were generated. Sequence analysis confirmed that the CoVs from Kenya bats are divergent members of the Alphacoronavirus and Betacoronavirus genera. Furthermore, the CoVs BtKY22, BtKY41, and BtKY43 in the Alphacoronavirus genus and BtKY24 in the Betacoronavirus genus are likely representatives of 4 novel CoV species. BtKY27 and BtKY33 are members of the established bat CoV species in the Alphacoronavirus genus and BtKY06 is a member of the established bat CoV species in the Betacoronavirus genus. The genome organization of these seven CoVs is similar to that of other known CoVs from the same groups except for differences in the number of putative ORFs following the N gene. The present results confirm a significant diversity of CoVs circulating in Kenya bats. These Kenya bat CoVs are phylogenetically distant from any previously described human and animal CoVs.
However, because of the examples of host switching among CoVs after relatively minor sequence changes in the S1 domain of the spike protein, further surveillance in animal reservoirs and a better understanding of host susceptibility are critical for predicting and preventing the potential threat of bat CoVs to public health.

Coronaviruses (CoVs) are large, enveloped viruses containing linear, positive-sense, single-stranded RNA genomes. Their genomes range from approximately 27 to 32 kb in length and contain 7-14 open reading frames (ORFs) (Woo et al., 2009a). Six major ORFs encoding the polymerase complex (ORF1a and ORF1b), spike glycoprotein (S), envelope protein (E), membrane glycoprotein (M), and nucleocapsid protein (N) are present in all CoVs. In addition, up to seven putative accessory ORFs and one ORF encoding hemagglutinin-esterase glycoprotein (HE) are interspersed between the six major ORFs. The numbers and sizes of these accessory ORFs differ markedly among CoVs (Woo et al., 2009a). CoVs have been identified from a broad range of birds and mammals, including humans, in which they can cause respiratory, enteric, hepatic and neurologic diseases of varying severity (Weiss and Navas-Martin, 2005). CoVs in the subfamily Coronavirinae are classified into three genera, Alphacoronavirus, Betacoronavirus, and Gammacoronavirus (former serogroups 1-3) (de Groot et al., 2011). Alpha- and betacoronaviruses have been isolated exclusively from mammals, and the majority of gammacoronaviruses from birds. CoVs of a distinctive lineage were recently detected in birds and pigs (Chu et al., 2011; Woo et al., 2009b, 2012) and have been proposed to belong to a new genus, provisionally named Deltacoronavirus (de Groot et al., 2011). The finding that the outbreak of severe acute respiratory syndrome (SARS) in early 2003 was caused by a novel CoV (SARS-CoV) has boosted interest in the search for novel CoVs in humans and animals.
At least 30 previously unrecognized distinctive CoVs from humans and various animal reservoirs have been reported in recent years, including SARS-related CoVs and CoVs from all genera in the subfamily Coronavirinae, which have significantly expanded our understanding of CoV diversity and complexity (Woo et al., 2009a). Based on available data, bats appear to harbor a great diversity of CoVs. The frequency and diversity of CoV detection in bats, now on multiple continents, suggest that bats are likely a source for CoV introduction into other species globally and possibly play an important role in the ecology and evolution of CoVs. Recently we reported the identification of 41 divergent CoVs in bats from Kenya, based on limited ORF1b sequences (Tong et al., 2009). These newly discovered bat CoVs were grouped into 8 different phylogenetic clusters. Of these, five clusters belonged to the previously identified Alphacoronavirus genus, and three clusters belonged to the previously identified Betacoronavirus genus, including a SARS-related CoV lineage. In the present study, we expand our sequence data for seven CoVs, representing 7 of the 8 distinctive clusters we identified in Kenya bats during the summer of 2006 (Tong et al., 2009). The sample representing the eighth cluster, a SARS-related CoV, was weakly positive and limited in specimen amount; therefore, it was not sequenced further in this analysis. The purpose of our study was to further characterize the genomes and refine the phylogenetic relationships of these seven CoVs with other CoVs, based on ORFs 1b, S, E, M, and N.

Kenya was chosen as a major comparative Old World study location in Africa as part of the CDC Global Disease Detection program. Detailed information on bat capture and sampling is available in the previous publication (Tong et al., 2009).
The protocols for animal capture and use were approved by the CDC Animal Institutional Care and Use Committee and the Ethics and Animal Care and Use Committee of the Kenya Wildlife Service (Nairobi, Kenya). In brief, representative samples at each site were collected from bats of available species, including adults and juveniles of both sexes. After euthanasia, a complete necropsy was performed in compliance with the approved field protocols. Samples included blood, various organs (liver, lung, and kidney), and rectal and oral swabs. In this study, seven CoV-positive rectal swabs were selected as representatives of the seven different phylogenetic groups (Tong et al., 2009). Total nucleic acids (TNA) were extracted using the QIAamp MinElute Virus Spin Kit (Qiagen, Santa Clarita, CA) according to the manufacturer's instructions from 200 µl of phosphate buffered saline suspension of the rectal swab and of homogenized organ tissues (liver, lung, and/or kidney) of each bat, except for bats BtKY33 and BtKY43, whose organ tissues were not available. The TNA was eluted in 80 µl of DEPC-treated water and then stored at −80 °C. Each CoV-positive result on the rectal swabs included in this study was repeated from different TNA aliquots. The presence of CoV RNA in organ tissues of these bats was determined using the pan-CoV RT-PCR assays described previously (Tong et al., 2009) and the sequence-specific and/or group-specific CoV RT-PCR assays (Table S1). The RT-PCR assays were performed as described previously (Tong et al., 2009). Standard precautions were taken to avoid cross-contamination of samples before and after RNA extraction and amplification. Purified DNA amplicons were sequenced with the RT-PCR primers on an ABI Prism 3130 automated capillary sequencer using a BigDye Terminator v3.1 Cycle Sequencing kit (Applied Biosystems, Carlsbad, CA).
High-throughput 454 pyrosequencing of CoV RNA-positive bat samples was initially attempted, but failed to yield any CoV-associated reads due to its lower sensitivity. Therefore, RT-PCR amplicon sequencing by the Sanger chain-termination method was chosen for this study. Each of the seven contiguous sequences was obtained by using 4-6 pairs of semi-nested or nested consensus degenerate group-specific primers and 4-7 pairs of semi-nested or nested sequence-specific bridging primers, which generated a series of 8-13 overlapping fragments covering 15-16 kb of genomic sequence at the 3′ end (Table S1). The other half of the genome, containing ORF1a, was not recovered in this analysis due to the limited amount of rectal swab sample. Consensus degenerate primers for each group were designed from conserved sequences of known members of the corresponding sequence group or a closely related group, based on the CODEHOP strategy (Rose et al., 1998). The 3′ end of the genome sequence was determined using the 3′ RACE kit (Roche, Indianapolis, IN) according to the manufacturer's instructions. Semi-nested or nested primers were used to improve the PCR sensitivity. When nested primers were not available, the PCR product was re-amplified using the same RT-PCR primers. The RT-PCR reactions were performed with the SuperScript III one-step RT-PCR High Fidelity kit (Invitrogen, San Diego, CA) according to the manufacturer's instructions, and the second-round PCR reactions were performed with the AccuPrime Taq DNA polymerase High Fidelity kit (Invitrogen, San Diego, CA). The RT-PCR products were visualized on 1% agarose gels containing 0.5 µg/mL of ethidium bromide and purified with the QIAquick PCR purification kit (QIAGEN, Santa Clarita, CA). The RT-PCR amplicons for each sample were first sequenced with the consensus degenerate RT-PCR primers in both directions, and then the remaining internal gaps and the 3′ end of the genome were sequenced with sequence-specific bridging primers in both directions as described previously.
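As a rough illustration of what "degenerate" means for the consensus degenerate primers described above, the following Python sketch counts and enumerates the concrete oligonucleotides encoded by a primer written with IUPAC ambiguity codes. It is not the CODEHOP design procedure itself, and the example primer is hypothetical; the actual primers are those listed in Table S1.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    example = "ATGRYN"  # hypothetical primer, for illustration only
    print(degeneracy(example))
    print(expand("AR"))
```

Keeping the degenerate positions few, as in a short degenerate 3′ core joined to a fixed consensus clamp, keeps this count low, which is the general motivation for consensus degenerate designs.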
The genomic sequences (ORF1b, S, ORF3, E, M, and N) of BtKY22, BtKY33, BtKY27, BtKY41, BtKY43, BtKY06, and BtKY24 were deposited in NCBI GenBank (HQ728480-HQ728486). Sequences were assembled in Sequencher (Genecodes, Ann Arbor, MI). Each putative ORF was predicted using the NCBI ORF finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). N-glycosylation sites were predicted using the NetNGlyc 1.0 Server (http://www.cbs.dtu.dk/services/NetNGlyc/). BLAST analyses were performed against the NCBI non-redundant protein database (Altschul et al., 1990) and against the Conserved Domain Database for protein classification (CDD) (Marchler-Bauer et al., 2005) to characterize the putative ORFs. Alignments of the seven Kenya bat CoV gene sequences with a representative set of 43 other CoV sequences available in the public domain were performed using MUSCLE v3.6 (Edgar, 2004). We constructed maximum likelihood trees for each gene alignment (ORF1b, S, E, M, and N) in the MEGA software package v5.0 (Tamura et al., 2011) with 1000 bootstrap replications. We used the General Time Reversible nucleotide (nt) substitution model with 4 categories of gamma-distributed rate heterogeneity and a proportion of invariant sites (GTR + Γ4 + I). To identify potential recombination events among the seven Kenya bat CoVs, three methods implemented in the recombination detection program RDP version 2 were used: MaxChi (Smith, 1992), Chimaera (Posada et al., 2002), and Geneconv (Padidam et al., 1999). Events detected by all three methods with default parameters were considered potential recombination events.

The aliquots of bat rectal samples for BtKY27, BtKY33, BtKY22, BtKY41, BtKY43, BtKY24, and BtKY06 were confirmed positive by the pan-CoV RT-PCR assay, while among the tissues (liver, lung, and/or kidney) that were available from bats BtKY27, BtKY22, BtKY41, BtKY24, and BtKY06, only the liver from bat BtKY22 (Chaerephon sp.) and the kidney from bat BtKY24 (Eidolon helvum) tested positive by RT-PCR.
These data support an infection process, rather than transit of ingested infected material through the digestive tract, as the source of viral RNA in rectal swabs, particularly because these bat species do not feed on vertebrates. Negative results for other tissues may be explained by specific pathobiology and a limited tropism for the available tissues. Each acquired CoV genome sequence covers the complete ORF1b, S protein, ORF3, E protein, M protein, N protein, other putative ORFs after N, and the 3′ end untranslated region with a poly-A tail. The genome organization and the size of each ORF are shown in Fig. 1 and Table 1, respectively. The genome organization is similar to that of other known CoVs, in the order 5′-ORF1b, S, ORF3, E, M, and N-3′, but with a variable number of putative ORFs downstream of the N gene. The sizes of these seven genomic sequences from ORF1b to the 3′ end are between ∼15 kb and ∼16 kb and their G + C contents are between 37.6% and 42.6%. BtKY27 has no evidence of a putative ORF downstream of the N gene, but possesses a short untranslated region and poly-A tail similar to Bat-CoV 1A (Chu et al., 2008). BtKY22, BtKY33 and BtKY43 have one small putative ORF (76-161 amino acids (aa)) downstream of N with no significant homology to previously described CoV ORFs. BtKY06 and BtKY24 have two small putative ORFs downstream of N with sequence similarity to NS7a and NS7b in Bat-CoV HKU9, respectively. BtKY41 has two small putative ORFs downstream of N, which overlap and have no significant sequence homology to previously described ORFs. Like most alphacoronaviruses, the BtKY27, BtKY33, BtKY22, BtKY41, and BtKY43 viruses share a core sequence 5′-CUAAAC-3′ or a similar putative transcription regulatory sequence (TRS) upstream of ORFs S, M, N, and ORFx and ORFy (Table 1) (Chu et al., 2008; Woo et al., 2005). ORF3 and E have putative core TRSs that sometimes vary from those of the other ORFs.
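The two sequence-level quantities discussed above, G + C content and the position of a core TRS motif, can be computed with a few lines of Python. The sketch below is illustrative only: the function names and the toy sequence are assumptions, it performs exact string matching, and it does not reproduce how the putative TRSs in Table 1 were actually assigned.

```python
def gc_content(seq):
    """Return the G + C content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def find_motif(seq, motif):
    """Return 0-based start positions of every exact occurrence of motif.

    Both inputs are converted to RNA letters (T -> U), so a cDNA sequence
    can be searched with a motif written as RNA, such as CUAAAC.
    Overlapping occurrences are reported.
    """
    seq = seq.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    toy = "GGCTAAACAAATGCTAAACTT"  # made-up sequence, not from any bat CoV
    print(gc_content(toy))
    print(find_motif(toy, "CUAAAC"))
```

In practice a hit only becomes a candidate TRS if it lies a short distance upstream of a predicted ORF start, so the positions returned here would still need to be compared with the ORF coordinates.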
BtKY06 and BtKY24 have a core TRS 5′-ACGAAC-3′ upstream of each ORF except E, which has a core TRS 5′-UCGAAC-3′ (Table 1). Spike proteins are type I glycosylated membrane proteins with a putative signal peptide at the N terminus. There are 31, 27, 28, 25, 31, 20, and 19 potential N-glycosylation sites in BtKY22, BtKY27, BtKY33, BtKY41, BtKY43, BtKY24, and BtKY06, respectively. As shown in Fig. 2, the spike proteins of the seven bat CoVs lack a furin protease recognition site, such as RRADR-S in murine hepatitis virus (MHV), RRSRG-A in human CoV OC43 (HCoV OC43), and RRSRR-A in bovine CoV (BCoV) (Follis et al., 2006), and also lack a cathepsin L cleavage site (VAYT-M) as found in SARS-CoV (Bosch et al., 2008). Despite lacking conserved cleavage sites, they all consist of two domains, S1 and S2, showing the conserved GxCx motif in S1 around the cleavage site and the conserved nonamer motif IPTNFSISI or a similar motif in S2. These motifs have been observed in other known CoVs (Follis et al., 2006). S1 is responsible for virus binding to the receptor on target cells and may contain receptor binding domains (RBDs) that directly bind to host cellular receptors. For example, the RBDs of HCoV 229E, TGEV, and HCoV NL63 in Alphacoronavirus map to the C terminus of their S1 domains (Bonavia et al., 2003; Godet et al., 1994; Lin et al., 2008). The RBDs of MHV and SARS-CoV in Betacoronavirus map to the N terminus and the central region of the S1 domain, respectively (Lin et al., 2008). Alignment of the aa sequences of the S1 regions from BtKY22, BtKY27, BtKY33, BtKY41, and BtKY43 of Alphacoronavirus with the corresponding known RBD S1 regions of HCoV 229E, TGEV, and HCoV NL63 showed 33-41% identity in the S1 RBD domains to HCoV 229E and 24-29% identity to TGEV and HCoV NL63 (Fig. S1A-C). BtKY24 and BtKY06 from Betacoronavirus are quite different in the corresponding RBD S1 regions from SARS-CoV and MHV (17-19% identity) (Fig. S1D-E).
The dissimilarity of the S1 regions of these bat CoVs to those of other CoVs may suggest a different host specificity. We constructed phylogenetic trees using the maximum likelihood method based on nt sequences of the ORF1b, S, E, M and N genes with representative viruses for which the corresponding genome sequences were available (Fig. 3). The phylogeny of the E gene is not shown due to its short length and limited value for inferring species phylogenies. Similar topologies were observed in the phylogenetic trees based on each of the 5 ORFs (Fig. 3). The analysis revealed that among the seven bat CoVs, five belonged to Alphacoronavirus while the other two belonged to Betacoronavirus (Fig. 3). Phylogenetic clustering within Alphacoronavirus varied slightly when different genes were analyzed. For example, BtKY22 and BtKY43 grouped into one monophyletic clade in the ORF1b tree, while they grouped differently in the S and N gene trees, with generally insignificant bootstrap values (Fig. 3). Although recombination was suspected, we found no evidence of recombination in the seven analyzed viruses using MaxChi (Smith, 1992), Chimaera (Posada et al., 2002), and Geneconv (Padidam et al., 1999). Since the analyses were based on representatives from each CoV species, the results suggest a lack of inter-species recombination in these viruses. One explanation is that the recombination frequency decreases significantly when sequence divergence is high (Kleiboeker et al., 2005; van Vugt et al., 2001). Alternatively, the lack of inter-species recombination may be due to rare co-infections as the viruses adapted to different bat species. Therefore, the phylogenetic incongruence observed in the gene trees is probably due to low phylogenetic signal, which may be improved by sampling more CoVs that are related to BtKY22 and BtKY43. The pairwise nt comparisons among these seven bat CoV gene sequences revealed 67-76% overall nt identity.
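Percent identity figures such as those quoted above are derived from pairwise comparisons of aligned sequences. The Python sketch below shows one common way to compute such a value from two rows of an existing alignment. It assumes that columns containing a gap in either sequence are excluded from the denominator; the source does not state which convention was used, and other conventions give slightly different numbers.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are skipped, so the
    denominator is the number of ungapped aligned columns.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned rows, for illustration only.
    print(percent_identity("ACGT-A", "ACCTTA"))
```

The same function works for nucleotide and amino acid alignments, since it only compares characters column by column.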
Among the five alphacoronaviruses, three (BtKY22, BtKY41 and BtKY43) were distantly related to other known alphacoronaviruses, with only 69-71% overall nt identity and <90% aa identity in all five conserved domains (nsps 12-16) of ORF1b (Table 2). Since we were not able to obtain all the genome portions necessary for definite species classification (de Groot et al., 2011), we adopted the separation criteria based on RdRp group units (RGU) (Drexler et al., 2010). The aa sequences encoded by the 816 bp fragment of the RdRp gene from the Kenya bat CoVs described in this study were compared with those of their close reference viruses to obtain aa distances (Table S2). BtKY22, BtKY41, and BtKY43 had >4.8% aa distance in the RdRp fragment (Table S2). This suggests that they are most likely three distinctive alphacoronavirus species. BtKY27 and BtKY33, identified in Miniopterus bats, were closely related to Bat-CoV 1A, which was identified from a bent-winged Miniopterus bat in Hong Kong (Chu et al., 2006), with 85% and 75% overall nt identity and >90% aa identity in 5/5 and 4/5 conserved domains (nsps 12-16) in ORF1b, respectively (Table 2). BtKY27 and BtKY33 had <4.8% aa distance in the 816 bp RdRp fragment to their close reference viruses, indicating that they are members of the established bat CoV species in Alphacoronavirus. As for the two members of the Betacoronavirus genus identified, one (BtKY06, identified in a Rousettus aegyptiacus bat) was likely a member of the Bat-CoV HKU9 species identified from Rousettus leschenaulti bats in China, sharing 90% overall nt identity and 99% aa identity in 4/5 conserved domains (nsps 12-16) in ORF1b (Table 2). The other (BtKY24) was distantly related to other known betacoronaviruses, with ≤70% overall nt identity and <90% aa identity in all 5 conserved domains (nsps 12-16) of ORF1b (Table 2).
Additionally, based on the RGU criteria, BtKY24 had >6.3% aa distance in the 816 bp RdRp fragment compared to its closest reference virus, indicating that it is most likely a distinctive betacoronavirus.

In conclusion, sequence data for the structural and nonstructural ORFs in the 3′ end of the genome of seven Kenya bat CoVs confirmed their high diversity and their phylogenetic placement in the Alphacoronavirus and Betacoronavirus genera. The four clusters of Kenya bat CoVs represented by BtKY22, BtKY41, BtKY43, and BtKY24, respectively, most likely belong to novel CoV species; the two clusters represented by BtKY27 and BtKY33 are likely members of Bat-CoV 1A; and the cluster represented by BtKY06 is likely a member of the Bat-CoV HKU9 species. As noted with other novel CoVs, the genome organization is similar, but differences were found in the number of putative ORFs downstream of ORF N. The present results are in line with previous findings of extensive diversity of CoVs detected in bats and confirm that bat CoVs mainly belong to the Alphacoronavirus and Betacoronavirus genera (Tang et al., 2006; Woo et al., 2007, 2009b). Consistent with other reports, none of the bat CoVs characterized in the present study was sufficiently similar to human SARS-CoV or other human CoVs to be considered their direct progenitor. The examples of host switching among CoVs after relatively minor sequence changes in the S1 domain of the spike protein (Haijema et al., 2003; Kuo et al., 2000; Qu et al., 2005) suggest a potential risk of introduction into humans, as occurred with SARS-CoV. Therefore, characterization of novel CoVs and understanding of species diversity in animals should help in understanding and responding to emerging zoonotic infections.
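The RGU-based separation used in this study amounts to comparing an aa distance in the 816 bp RdRp fragment against a genus-specific threshold (4.8% for Alphacoronavirus and 6.3% for Betacoronavirus, as quoted in the text). The Python sketch below illustrates that decision rule. It assumes an uncorrected p-distance over ungapped aligned columns, which may differ from the distance measure actually used for Table S2, and all function and variable names are hypothetical.

```python
# Thresholds (percent aa distance) as quoted in the text for the RGU criteria.
RGU_THRESHOLD = {"Alphacoronavirus": 4.8, "Betacoronavirus": 6.3}


def aa_p_distance(a, b, gap="-"):
    """Uncorrected aa p-distance (percent) between two aligned sequences.

    Assumption: gapped columns are ignored. This is a simple stand-in for
    whatever distance measure was used to build Table S2.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(1 for x, y in pairs if x != y)
    return 100.0 * differences / len(pairs)


def exceeds_rgu_threshold(distance_pct, genus):
    """Return True if the distance is above the genus threshold,
    i.e. the virus would fall in a separate RdRp group unit."""
    return distance_pct > RGU_THRESHOLD[genus]


if __name__ == "__main__":
    d = aa_p_distance("MKVL", "MKIL")  # toy fragments, illustration only
    print(d, exceeds_rgu_threshold(d, "Alphacoronavirus"))
```

A full 816 bp fragment corresponds to 272 aa, so a single aa difference changes the distance by roughly 0.37 percentage points, which gives a sense of how few substitutions separate the two sides of each threshold.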
Basic local alignment search tool
Identification of a receptor-binding domain of the spike glycoprotein of human coronavirus HCoV-229E
Cathepsin L functionally cleaves the severe acute respiratory syndrome coronavirus class I fusion protein upstream of rather than adjacent to the fusion peptide
Avian coronavirus in wild aquatic birds
Genomic characterizations of bat coronaviruses (1A, 1B and HKU8) and evidence for co-infections in Miniopterus bats
Coronaviruses in bent-winged bats (Miniopterus spp
Family Coronaviridae
Genomic characterization of severe acute respiratory syndrome-related coronavirus in European bats and classification of coronaviruses based on partial RNA-dependent RNA polymerase gene sequences
MUSCLE: multiple sequence alignment with high accuracy and high throughput
Furin cleavage of the SARS coronavirus spike glycoprotein enhances cell-cell fusion but does not affect virion entry
Major receptor-binding and neutralization determinants are located within the same domain of the transmissible gastroenteritis virus (coronavirus) spike protein
Switching species tropism: an effective way to manipulate the feline coronavirus genome
Simultaneous detection of North American and European porcine reproductive and respiratory syndrome virus using real-time quantitative reverse transcriptase-PCR
Retargeting of coronavirus by substitution of the spike glycoprotein ectodomain: crossing the host cell species barrier
Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats
Complete genome sequence of bat coronavirus HKU2 from Chinese horseshoe bats revealed a much smaller spike gene with a different evolutionary lineage from the rest of the genome
Structure of SARS coronavirus spike receptor-binding domain complexed with receptor
Identification of residues in the receptor-binding domain (RBD) of the spike protein of human coronavirus NL63 that are critical for the RBD-ACE2 receptor interaction
CDD: a Conserved Domain Database for protein classification
RDP2: recombination detection and analysis from sequence alignments
Possible emergence of new geminiviruses by frequent recombination
Identification of a novel coronavirus in bats
Recombination in evolutionary genomics
Identification of two critical amino acid residues of the severe acute respiratory syndrome coronavirus spike protein for its variation in zoonotic tropism transition via a double substitution strategy
Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences
Analyzing the mosaic structure of genes
MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods
Prevalence and genetic diversity of coronaviruses in bats from China
Detection of novel SARS-like and other coronaviruses in bats from Kenya
High frequency RNA recombination in porcine reproductive and respiratory syndrome virus occurs preferentially between parental sequences with high similarity
Coronavirus pathogenesis and the emerging pathogen severe acute respiratory syndrome coronavirus
Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia
Coronavirus diversity, phylogeny and interspecies jumping
Comparative analysis of complete genome sequences of three avian coronaviruses reveals a novel group 3c coronavirus
Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus
Comparative analysis of twelve genomes of three novel group 2c and group 2d coronaviruses reveals unique group and subgroup features

We thank Ivan Kuzmin, Michael Niezgoda, and Charles E. Rupprecht from Division of High Consequence Pathogens and Pathology, CDC, Atlanta, GA; Robert F.
Breiman from Global Disease Detection Division, CDC-Kenya, Nairobi, Kenya; and Bernard Agwanda from National Museum, Kenya Wildlife Service, Nairobi, Kenya for excellent technical and logistical assistance and field study. The study was supported in part by the Global Disease Detection program of CDC (Atlanta, GA). Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.virusres.2012.04.007.