key: cord- -kllrqoxi authors: dos passos cunha, marielton; ortiz-baez, ayda susana; de melo freire, caio césar; de andrade zanotto, paolo marinho title: codon adaptation biases among sylvatic and urban genotypes of dengue virus type date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: kllrqoxi dengue virus (denv) emerged from the sylvatic environment and colonized urban settings, being sustained in a human-aedes-human transmission chain, mainly by the bites of females of the anthropophilic species aedes aegypti. herein, we sought evidence for fine-tuning in viral codon usage, possibly due to viral adaptation to human transmission. we compared the codon adaptation of denv serotype (denv- ) genotypes from urban and sylvatic habitats and tried to correlate the findings with key evolutionary determinants. we found that denv- codons of urban and sylvatic genotypes had a higher cai to humans than to ae. aegypti. remarkably, we found no significant differences in codon adaptation to human between urban american/asian and sylvatic denv- genotypes. moreover, cai values were significantly different, when comparing all genotypes to ae. aegypti codon preferences, with lower values for sylvatic than urban genotypes. in summary, our findings suggest the presence of a molecular signature among the genotypes that circulate in sylvatic and urban environments, and may help explain the trafficking of denv- strains to an urban cycle. dengue virus (denv) is the world's most important mosquito-borne viral human pathogen. it is widespread throughout tropical and subtropical regions (bhatt et al., ; gubler, ) . denv has a natural sylvatic maintenance cycle, involving species of blood-feeding arboreal mosquitoes of the genus aedes (ae. taylori, ae. furcifer, ae. vitattus, and ae. luteocephalus) and nonhuman primates, that develop viremic infections and serve to amplify transmission (amarasinghe et al., ; diallo et al., ) . 
the urban cycle involves amplification hosts and peridomestic aedes spp. vectors, mainly ae. aegypti (gubler, ; vasilakis et al., ). denv is an enveloped virus of the genus flavivirus, family flaviviridae, classified into four phylogenetically and antigenically distinct serotypes (denv- - ). its genome is a single-stranded, positive-sense rna molecule of approximately kb, which contains a single open reading frame encoding three structural [capsid (c), membrane (m) and envelope (e)] and seven non-structural (ns , ns a, ns b, ns , ns a, ns b, and ns ) proteins (chambers et al., ). based on genetic diversity and geographical distribution, distinct groups, defined as genotypes, have also been identified within each serotype (cologna et al., ; costa et al., ). six distinct denv- genotypes are recognized (costa et al., ; twiddy et al., ; wei and li, ). the sylvatic genotype circulates in a sylvatic cycle, while the american, american/asian, asian , asian and cosmopolitan genotypes circulate in the urban cycle (wang et al., ). inferred evolutionary relationships of denv suggest that the urban genotypes evolved from sylvatic lineages through several cross-species transmission events into humans, followed by the recent (i.e., early th century) evolution of urban forms (messina et al., ; wang et al., ; wei and li, ). notably, environmental and social factors such as the introduction of the anthropophilic vector, urbanization, deforestation and increased human travel and trade have facilitated the emergence, spread and evolution of denv in human populations (mayer et al., ; parvez and parveen, ). elements such as genetic composition and pathogen-host interaction are believed to be involved in different patterns of transmissibility (jenkins and holmes, ; lara-ramírez et al., ; lequime et al., ; lobo et al., ). the degeneracy of the genetic code refers to the fact that a single amino acid can be encoded by different codons. 
the redundancy in the genetic code has an important role in controlling metabolic processes, but not all species use its built-in codon redundancy in the same way. the unequal preference for specific codons over other synonymous codons during translation creates a bias in codon usage. codon usage biases are common throughout the tree of life, and for viruses, the balance between mutation and natural selection allows for changes in codon usage biases (morton, ). genetic plasticity and the capacity to adapt to new hosts facilitate the emergence of rna viruses in novel, unexplored environments. this process entails both (i) adaptation to different types of cellular machinery involved in viral replication and (ii) evasion of different types of cellular antiviral responses. however, the mechanisms used by viruses to traffic among hosts remain poorly understood (bahir et al., ; longdon et al., ; pal et al., ). denv must successfully infect and alternate between mosquito and human hosts, and nucleotide analyses support the notion that arboviruses use a restricted and balanced nucleotide composition as a compromise that allows infection of both hosts, thus sustaining processes such as translational efficiency and replication (shen et al., ). in this context, the codon adaptation index (cai) correlates with gene expression levels and adaptation of viral genes to their hosts (gustafsson et al., ; neame, ; pal et al., ). in the present study, we performed comprehensive analyses of codon adaptation of denv- from different habitat settings and host systems. the sequences used in this study were downloaded from the national center for biotechnology information (ncbi) (https://www.ncbi.nlm.nih.gov/genbank/) website in genbank format. the dataset included sequences from all six genotypes as follows: american ( ), american/asian ( ), asian ( ), asian ( ), cosmopolitan ( ) and sylvatic ( ). 
comparisons of codon usage preferences were performed against reference sets from homo sapiens ( cdss) (https://github.com/caiofreire/cub) and aedes aegypti ( cdss) (http://www.kazusa.or.jp/codon/), using the standard genetic code. all the complete genomic sequences available for denv- with information regarding the location and year of isolation were recovered, and later converted into fasta format. fasta sequences were aligned using clustal omega (larkin et al., ) and recombinant sequences were screened using all algorithms implemented in rdp program (rdp, geneconv, bootscan, maxchi, chimaera, siscan and seq) using the standard settings (martin et al., ) . the alignment of recombinant free sequences was manually inspected and edited using the program aliview v. . (larsson, ) , resulting in a final dataset with coding sequences (table s ). viral phylogenies based on full-length coding sequences were estimated using maximum likelihood (ml) implemented in fasttree (price et al., ) . we first evaluated the best transition model to be gtr + Γ using jmodeltest v. . . . the final tree was then visualized and plotted using figtree v. . . (http://tree.bio.ed.ac.uk). all sequences used in this work are presented in the format: genotype/accession number/country/year of isolation. cai is a measure of silent, synonymous codon usage bias based on the codon preference of a viral strain and a codon usage table for a given host (sharp et al., ; sharp and li, ) . we applied the cai using a frequency table for housekeeping of human genes (eisenberg and levanon, ) (available at https://github.com/caiofreire/cub) and for ae. aegypti using the table available in the codon usage database (nakamura, ) . the cai values were calculated to measure the synonymous codon usage bias using the caical program (puigbò et al., a) . 
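The CAI described above can be illustrated with a minimal sketch. It follows the general definition of Sharp and Li (relative adaptiveness of each codon within its synonymous family, then the geometric mean over the sequence); it is not the CAIcal implementation. The reference counts in the usage lines are hypothetical toy numbers, and the 0.5 pseudocount for unobserved codons is an assumption of this sketch.

```python
import math
from collections import defaultdict

# Standard genetic code in the canonical TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def relative_adaptiveness(ref_counts):
    """Return w[codon] = count / highest count among its synonymous codons."""
    by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        by_aa[aa].append(codon)
    w = {}
    for aa, codons in by_aa.items():
        if aa == "*" or len(codons) == 1:  # stops, Met and Trp carry no bias
            continue
        top = max(ref_counts.get(c, 0) for c in codons)
        if top == 0:  # amino acid absent from the reference set
            continue
        for c in codons:
            # Assumed pseudocount so an unobserved codon does not force CAI to 0.
            w[c] = max(ref_counts.get(c, 0), 0.5) / top
    return w


def cai(seq, w):
    """Geometric mean of w over the codons of seq that have a weight."""
    seq = seq.upper().replace("U", "T")
    logs = [math.log(w[seq[i:i + 3]])
            for i in range(0, len(seq) - 2, 3)
            if seq[i:i + 3] in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical host reference counts (toy values, not taken from any real table).
toy_ref = {"CTG": 40, "CTC": 20, "GCC": 30, "GCT": 15}
weights = relative_adaptiveness(toy_ref)
best = cai("ATGCTGGCC", weights)   # uses only the preferred codons
worse = cai("CTCGCT", weights)     # uses the second-choice codons
```

In this toy case the first sequence uses only the most frequent synonymous codons and the second uses codons seen half as often, so the second scores lower. A real analysis would substitute a full host codon usage table for `toy_ref`.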
to evaluate the statistical support of the cai values, we define a threshold value or expected cai (e-cai) (puigbò et al., b) by generating random sequences with gc content, amino acid composition and sequence length similar to the denv- query sequences. cai values above the e-cai are interpreted as statistically significant, meaning that codon similarity arises from codon preferences rather than from internal biases (puigbò et al., b) . statistical significance was determined by the mann-whitney u test to compare the difference between cai values for humans and the mosquito, assuming a significance level of . . multiple comparisons among genotypes were performed using a kruskal-wallis test implemented in the pgirmess v . . package for the r statistical environment. to evaluate levels of independence, we further carried out a dunn's post hoc test. further, to avoid type i error inflation, we applied a bonferroni adjustment of p-values, using the pmcmr package, at a significance level of . . perl and r scripts were used to analyze most of the data in this study and are available from the authors upon request. as has been well documented, we recovered six phylogenetically structured groups or genotypes for denv- , based on the open reading frame of non-recombinant sequences. the most basal group encompassed lineages from west africa and asia of sylvatic genotype, which reinforced the notion that the urban genotypes emerged from the sylvatic cycle, as previously described (vasilakis et al., ; vasilakis et al., a) . urban genotypes were also identified as: (i) american, (ii) american/asian, (iii) asian , (iv) asian and (v) cosmopolitan genotypes (fig. a, fig. s ). comparison of the cai values obtained from human and ae. aegypti codon usage tables, shown in fig. b , revealed significant differences between hosts (p-value < . ), demonstrating that denv- codon usage is more similar to that of humans than to ae. aegypti. 
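The host comparison above relies on the Mann-Whitney U test. The following is a minimal sketch of that test using only the standard library, with average ranks for ties and a normal approximation for the two-sided p-value (no tie or continuity correction, which is a simplification). The CAI values in the usage lines are hypothetical and are not the values obtained in this study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using a plain normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical CAI values for the same sequences scored against two hosts.
cai_human = [0.78, 0.79, 0.80, 0.81]
cai_mosquito = [0.70, 0.71, 0.72, 0.73]
u_stat, p_value = mann_whitney_u(cai_human, cai_mosquito)
```

With samples this small an exact test would be preferable; the normal approximation is used here only to keep the sketch short.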
in addition, we assessed the cai values for each genotype against the human and ae. aegypti hosts. because cai values for the polyprotein sequences were not normally distributed, as assessed using shapiro-wilk and kolmogorov-smirnov tests, we used the non-parametric mann-whitney u test to compare cai estimates. the distribution of cai values showed that all genotypes fell above the e-cai estimated for each host. furthermore, statistical analyses showed significant differences (p-value < . ) among urban genotypes for both hosts (table ). significant differences between urban and sylvatic genotypes were identified in ae. aegypti (table ). likewise, the genotypes with the highest cai were the asian , asian and american genotypes. we also observed that the cai values for the sylvatic genotype were the lowest when compared with urban genotypes in ae. aegypti (p-value < . ) (fig. b and table ). this suggested that genotypes from urban settings could have experienced some fine-tuning of codon usage for translation in ae. aegypti. for humans, the only pairwise comparison between urban and sylvatic genotypes that was not significantly different was that between the american/asian and the sylvatic genotype (fig. b and table ), suggesting that they share similar silent selective pressures. the american, asian and cosmopolitan genotypes had the highest cai values for humans (fig. b). finally, we observed that these values were similar and roughly constant over time (fig. ). crucially, values were higher than the e-cai for all the genotypes in human and mosquito cells, indicating that codon usage preferences were not random in denv- genotypes (fig. ). the potential emergence of sylvatic strains into urban cycles has become an increasing focus of research and public concern (moncayo et al., ; vasilakis et al., ; vasilakis et al., b). 
recently, this concern became more tangible after some studies suggested cases of human infection with presumably sylvatic denv- strains. phylogenetic analyses showed that the highly divergent denv- strains (qml / , d sab ) were basal and strikingly different from all other previously isolated strains (liu et al., ; pyke et al., ). we excluded both sequences from our analyses, because including highly divergent sequences of unknown genotype may affect the estimates for the whole dataset through sampling bias and differences in evolutionary rates. previous data suggest that the nature of selective pressures is broadly equivalent in urban and sylvatic strains (vasilakis et al., ; vasilakis et al., b). this could mean that the successful emergence of a novel sylvatic denv strain in an urban cycle is unlikely to require a major adaptive challenge (vasilakis et al., b). insects and mammals are separated by about billion years of evolutionary history, and they differ at the organismal level and in many biochemical processes (lobo et al., ; shen et al., ). since denv needs to negotiate with different hosts (i.e. humans and aedes mosquitoes), it is fair to assume that the efficiency of the viral translation process is a key factor sustaining the urban cycle. our results suggested that codon adaptation was higher for human cells than for ae. aegypti (fig. , fig. ). this was also reported for other flaviviruses (butt et al., ; freire et al., ; moratorio et al., ; pal et al., ). in addition, the similar cai values recovered for urban and sylvatic genotypes suggest that they may make similar use of the human cellular machinery. this notion is consistent with kinetic experiments comparing the replication of sylvatic and urban denv- strains, which suggest that the emergence of urban denv- from sylvatic progenitors may not have required adaptation to replicate more efficiently in humans (vasilakis et al., b). 
notwithstanding, when using the codon usage table from ae. aegypti, we observed a conspicuous difference between urban and sylvatic genotypes, with lower cai values for the sylvatic genotype (fig. ). this finding suggested that the emergence of urban genotypes into the urban cycle may have required viral adaptation towards increased competence in ae. aegypti, which should then entail higher efficiency in viral replication. this is in good agreement with susceptibility experiments reported by moncayo et al. ( ). furthermore, competence studies with ae. aegypti from regions where the sylvatic genotype circulates indicate its low vector capacity for dengue virus (amarasinghe et al., ; diallo et al., ). despite this, there had been no previous attempt to establish an association with codon usage biases. fluctuations in cai distribution across genotypes (fig. b) indicated possible differences in codon adaptation for both hosts (table ). these differences may reflect differential selection pressures exerted by the complex interplay of distinct factors, such as infection efficiency, population density and herd immunity (plowright et al., ; salazar et al., ). for example, a recent study by salje et al. ( ) demonstrated the association between population density and the establishment of transmission chains. the increase in mosquito populations has also been identified as a potential determinant of the emergence and increased transmission of arboviruses, particularly in immunologically naïve human populations (dudas et al., ; pettersson et al., ). our findings raise the question of whether sylvatic strains could reach levels of transmission similar to those of the urban genotypes in asian and american populations if the vector competence of ae. aegypti is increased. in fact, recent outbreaks caused by other emerging viruses [e.g. 
ebola virus, influenza a virus (h n ), middle east respiratory syndrome coronavirus (mers-cov)], have occurred due to zoonotic spillover and adaptation of the virus to replication in human cells (dudas et al., ; mänz et al., ; plowright et al., ; urbanowicz et al., ). in this context, as previous analyses have suggested (moncayo et al., ), our findings support the notion that for sylvatic strains to effectively colonize the urban environment, the virus needs a number of silent, adaptive nucleotide substitutions to optimize its codon usage for invertebrate host cells, while maintaining a base composition balance suitable for efficient alternating spread between human and insect hosts. apparently, even when humans are susceptible to infection by denv vectored by ae. aegypti, the reduced vectorial competence of the mosquito ultimately constitutes a hurdle rather than an enabler of virus transmission. in conclusion, our findings provide a comprehensive assessment of the codon adaptation of denv- in different habitats (i.e. urban and sylvatic settings) and host systems (i.e. homo sapiens and the mosquito vector aedes aegypti). in this context, although the virus replicates in both humans and mosquitoes, our analysis suggested that denv- codon usage is better adapted to humans than to the main cosmopolitan vector (ae. aegypti). this would imply that there is still room for adaptation and improved transmission among new denv- strains, which could cause future pandemics. supplementary data to this article can be found online at https://doi.org/ . /j.meegid. . . . 
dengue virus infection in africa viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences the global distribution and burden of dengue evolution of codon usage in zika virus genomes is host and vector specific flavivirus genome organization, expression, and replication selection for virulent dengue viruses occurs in humans and mosquitoes comparative evolutionary epidemiology of dengue virus serotypes amplification of the sylvatic cycle of dengue virus type , senegal, - : entomologic findings and epidemiologic considerations vector competence of aedes aegypti populations from senegal for sylvatic and epidemic dengue virus isolated in west africa mers-cov spillover at the camel-human interface human housekeeping genes, revisited spread of the pandemic zika virus lineage is associated with ns codon usage adaptation in humans dengue and dengue hemorrhagic fever engineering genes for predictable protein expression the extent of codon usage bias in human rna viruses and its evolutionary origin large-scale genomic analysis of codon usage in dengue virus and evaluation of its phylogenetic dependence clustal w and clustal x version . 
aliview : a fast and lightweight alignment viewer and editor for large datasets determinants of arbovirus vertical transmission in mosquitoes highly divergent dengue virus type in traveler returning from borneo to australia virus-host coevolution: common patterns of nucleotide motif usage in flaviviridae and their hosts the evolution and genetics of virus host shifts multiple natural substitutions in avian influenza a virus pb facilitate efficient replication in human cells rdp : detection and analysis of recombination patterns in virus genomes the emergence of arthropod-borne viral diseases: a global prospective on dengue, chikungunya and zika fevers global spread of dengue virus types: mapping the year history dengue emergence and adaptation to peridomestic mosquitoes a detailed comparative analysis on the overall codon usage patterns in west nile virus the role of context-dependent mutations in generating compositional and codon usage bias in grass chloroplast dna codon usage tabulated from international dna sequence databases: status for the year gene expression: structure versus codon bias dengue virus type evolution and genomics: a bioinformatic approach evolution and emergence of pathogenic viruses: past, present, and future how did zika virus emerge in the pacific islands and latin america pathways to zoonotic spillover fasttree -approximately maximum-likelihood trees for large alignments caical: a combined set of tools to assess codon usage adaptation e-cai: a novel server to estimate an expected value of codon adaptation index (ecai) complete genome sequence of a highly divergent dengue virus type strain, imported into australia from sabah american and american/asian genotypes of dengue virus differ in mosquito infection efficiency: candidate molecular determinants of productive vector infection dengue diversity across spatial and temporal scales: local structure and the effect of host population size the codon adaptation index-a measure of directional 
synonymous codon usage bias, and its potential applications codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes large-scale recoding of an arbovirus genome to rebalance its insect versus mammalian preference phylogenetic relationships and differential selection pressures among genotypes of dengue- virus human adaptation of ebola virus during the west african outbreak evolutionary processes among sylvatic dengue type viruses potential of ancestral sylvatic dengue- viruses to re-emerge fever from the forest: prospects for the continued emergence of sylvatic dengue virus and its impact on public health evolutionary relationships of endemic/epidemic and sylvatic dengue viruses global evolutionary history and spatio-temporal dynamics of dengue virus type this work was supported by the brazilian national council of scientific and technological development (cnpq) process no. / - . mpc and asob received fapesp grants: no. / - and no. / - , respectively. the funders had no role in the data collection or analysis, decision to publish, or preparation of the manuscript. we thank felipe g. naveca and valdinete a. nascimento for the support. key: cord- -v tjof authors: fahmi, muhamad; kubota, yukihiko; ito, masahiro title: nonstructural proteins ns b and ns are likely to be phylogenetically associated with evolution of -ncov date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: v tjof the seventh novel human infecting betacoronavirus that causes pneumonia ( novel coronavirus, -ncov) originated in wuhan, china. the evolutionary relationship between -ncov and the other human respiratory illness-causing coronavirus is not closely related. we sought to characterize the relationship of the translated proteins of -ncov with other species of orthocoronavirinae. a phylogenetic tree was constructed from the genome sequences. 
a cluster tree was developed from the profiles retrieved from the presence and absence of homologs of ten -ncov proteins. the combined data were used to characterize the relationship of the translated proteins of -ncov to other species of orthocoronavirinae. our analysis reliably suggests that -ncov is most closely related to batcov ratg and belongs to subgenus sarbecovirus of betacoronavirus, together with sars coronavirus and bat-sars-like coronavirus. the phylogenetic profiling cluster of homolog proteins of one annotated -ncov protein against other genome sequences revealed two clades of ten -ncov proteins. clade consisted of a group of conserved proteins in orthocoronavirinae comprising orf ab polyprotein, nucleocapsid protein, spike glycoprotein, and membrane protein. clade comprised six proteins exclusive to sarbecovirus and hibecovirus. two of six clade nonstructural proteins, ns b and ns , were exclusively conserved among -ncov, betacov_ratg, and batsars-like cov. ns b and ns have previously been shown to affect immune response signaling in the sars-cov experimental model. thus, we speculated that knowledge of the functional changes in the ns b and ns proteins during evolution may provide important information to explore the human infective property of -ncov. in december , the seventh human coronavirus, termed novel coronavirus ( -ncov) or severe acute respiratory syndrome coronavirus (sars-cov- ), was found in wuhan, china. on february , , the total number of infections and deaths due to -ncov globally was , and , respectively, according to the johns hopkins university center for systems science and engineering. coronaviruses are enveloped rna viruses that infect many species, including humans, other mammals, and birds. after infection, the host may develop respiratory, bowel, liver, and neurological diseases (weiss and leibowitz, ; cui et al., ) . coronaviruses are members of the order nidovirales and subfamily orthocoronavirinae. 
this subfamily is divided into four genera: alphacoronavirus, betacoronavirus, gammacoronavirus, and deltacoronavirus. generally, alphacoronavirus and betacoronavirus tend to infect mammals, while gammacoronavirus and deltacoronavirus typically infect birds. however, some gammacoronavirus and deltacoronavirus can infect mammals under specific conditions (woo et al., ). in immunocompromised individuals, infection with one of the four human coronaviruses, namely human coronavirus nl (hcov-nl ), human coronavirus e (hcov- e), human coronavirus oc (hcov-oc ), and human coronavirus hku (hcov-hku ), usually results in cold-like symptoms. these viruses can cause severe infections in some infants and the elderly. due to the frequent interaction between wild animals and humans, wild animals are a common source of human zoonotic infections. sars-cov and middle east respiratory syndrome coronavirus (mers-cov) are zoonotic coronaviruses that can cause severe respiratory diseases in humans; both belong to betacoronavirus (su et al., ; forni et al., ; cui et al., ; luk et al., ; ramadan and shaib, ). -ncov is the seventh coronavirus discovered that infects humans. it causes acute respiratory disease. immediately after its discovery, the complete genome sequence of -ncov was determined. the sequence (mn ) was released by genbank on january (lu et al., ). the sequence of -ncov is % identical, at the whole-genome level, to a bat coronavirus (zhou et al., ). the genomic characteristics and epidemiology of -ncov have been analyzed (lu et al., ). nine inpatient culture isolates were subjected to next-generation sequencing, and individual complete and partial -ncov genomic sequences were obtained. phylogenetic analysis of these -ncov genomes and other coronaviruses was performed to determine the evolutionary history of the virus and to explore the origin of -ncov. 
initially, homology modeling was used to investigate the potential receptor-binding properties of the virus. however, sars-cov and mers-cov showed approximate similarities of % and % with -ncov, respectively. these findings indicated that -ncov does not have a close evolutionary relationship with sars-cov or mers-cov. thus, -ncov is considered the seventh novel human betacoronavirus (lu et al., ). in this study, we comprehensively characterized the relationship of the translated proteins of -ncov to other species of orthocoronavirinae. this was done using a combination of the phylogenetic tree constructed from the genome sequences and the cluster tree developed from the profiles retrieved from the presence and absence of homologs of ten -ncov proteins. the genomes and the combination of genome and protein sequences were used to develop a phylogenetic tree and phylogenetic profiling, respectively. the dataset of the genomes of the orthocoronavirinae subfamily was collected from the refseq database using the orthocoronavirinae ncbi taxonomy id (txid ). this dataset contains representative complete genomes from each species of that subfamily (pruitt et al., ; federhen, ) (supplementary table ). additionally, we collected genome sequences from bat sars-like coronavirus (mg and mg ) from ncbi and betacov/bat/yunnan/ratg / (epi_isl_ ) from gisaid (http://www.gisaid.org). one species of the okanivirinae subfamily, the yellow head virus, was also collected as an outgroup (supplementary table ). the genome sequences were aligned using the mafft multiple sequence alignment program provided at the xsede portal in the cipres science gateway with an automatic selection strategy (miller et al., ; katoh and standley, ). a phylogenetic tree was constructed using the maximum likelihood method with raxml-hpc blackbox in the cipres science gateway (stamatakis, ). 
the analysis used an automatic bootstrapping option and a general time-reversible substitution model with a gamma-shape parameter (gtr+g). the model was selected as the best-fit model under the akaike information criterion using modeltest-ng (darriba et al., ). phylogenetic trees were viewed using figtree v . (http://tree.bio.ed.ac.uk/software/figtree/). the annotated protein sequences of -ncov were collected from the data of one representative genome from ncbi (mn ). we built a blast database from the retrieved genome sequences using blast+ version . . (camacho et al., ). we then determined the presence and absence of homologs of one representative set of annotated -ncov proteins in the other genome sequences in the database using tblastn, with bit score thresholds of > and > for protein sequences > amino acids (aa) and < aa in length, respectively. the presence and absence results were converted into a binary matrix and used to build a clustering tree using the ward hierarchical clustering method (ward jr, ) (supplementary table ). nonstructural protein (ns) b and ns local alignments were only positive in the sarbecovirus subgenus sample, excluding the sars coronavirus. additionally, we predicted the structural properties of the -ncov ns b protein, including the secondary structure and order-disorder propensity, using jpred and dichot, respectively (fukuchi et al., ; drozdetskiy et al., ). we also predicted the structure using the contact-assisted protein structure prediction (c-i-tasser) composite approach (zhang et al., ). additionally, we specifically collected the sequences that produced significant alignments of ns b using the mega x software (kumar et al., ). 
the phylogenetic analysis using complete genome sequences showed that -ncov was most closely related to batcov ratg and belonged to the sarbecovirus subgenus of betacoronavirus, together with sars coronavirus and bat-sars-like coronavirus (bat-sl-covzxc and bat-sl-covzc ), with full support (fig. ). additionally, hibecovirus, with bat hp-betacoronavirus/zhejiang as the representative species, was the subgenus of betacoronavirus most closely related to sarbecovirus, as compared to other subgenera, including merbecovirus (under which mers-cov has been classified), nobecovirus, and embecovirus. these findings agree with previous phylogenetic tree and similarity plot data (paraskevis et al., ). -ncov was found to be more closely related to the bat-infecting sarbecovirus species, bat sars-like coronavirus and betacov ratg , than to the sars coronavirus that infects humans. this indicated that -ncov more likely originated from bats. however, the wuhan outbreak was first detected in december, which is a time of year when most bat species hibernate. moreover, the huanan seafood market, which is considered ground zero of the outbreak, does not sell bats. instead, it has been suggested that there is an animal mediator for virus transmission from bats to humans, similar to the previous cases of sars-cov and mers-cov, wherein the masked palm civet (paguma larvata) and dromedary camel (camelus dromedarius) act as intermediate hosts, respectively (lu et al., ). although coronaviruses can exchange genetic material during coinfection, a recent report described the lack of a mosaic relationship of -ncov to the closely related sarbecovirus, indicating the lack of a recombination event in the emergence of -ncov (paraskevis et al., ). hence, -ncov likely emerged from the accumulation of mutations responding to altered selective pressures or from the infidelity of rna polymerase perpetuated as replication-neutral mutations. 
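The presence/absence profiling and Ward clustering described above can be sketched as follows. This is an illustrative standard-library version, not the authors' pipeline: the protein labels, bit scores, lengths and cutoffs are all hypothetical, and the clustering uses the Lance-Williams update for Ward's criterion on squared Euclidean distances (which equal Hamming distances for binary profiles).

```python
from itertools import combinations


def binary_profiles(bit_scores, lengths, long_cutoff, short_cutoff, length_split):
    """Turn per-genome bit scores into presence (1) / absence (0) calls."""
    profiles = {}
    for protein, hits in bit_scores.items():
        cutoff = long_cutoff if lengths[protein] > length_split else short_cutoff
        profiles[protein] = [1 if score > cutoff else 0 for score in hits]
    return profiles


def ward_clusters(profiles, k):
    """Merge profiles into k clusters using Ward's criterion."""
    clusters = {name: [name] for name in profiles}
    dist = {}
    for a, b in combinations(profiles, 2):
        dist[frozenset((a, b))] = sum(
            (x - y) ** 2 for x, y in zip(profiles[a], profiles[b]))
    while len(clusters) > k:
        # Closest pair; names break ties so the result is deterministic.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        na, nb = len(clusters[a]), len(clusters[b])
        d_ab = dist.pop(pair)
        for c in clusters:
            if c in (a, b):
                continue
            nc = len(clusters[c])
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            # Lance-Williams update for Ward's method.
            dist[frozenset((a, c))] = (
                (na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab
            ) / (na + nb + nc)
        clusters[a] = clusters[a] + clusters.pop(b)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical bit scores of four query proteins against four genomes.
toy_scores = {
    "conserved_1": [900, 850, 700, 600],
    "conserved_2": [500, 450, 300, 150],
    "lineage_1": [60, 55, 0, 0],
    "lineage_2": [150, 130, 20, 0],
}
toy_lengths = {"conserved_1": 1200, "conserved_2": 400,
               "lineage_1": 40, "lineage_2": 120}
# Cutoffs below are placeholders, not the thresholds used in the study.
toy_profiles = binary_profiles(toy_scores, toy_lengths,
                               long_cutoff=100, short_cutoff=40, length_split=100)
toy_clades = ward_clusters(toy_profiles, k=2)
```

In this toy input the two proteins found in every genome group together and the two proteins restricted to the first two genomes form the second group, which mirrors the two-clade pattern the profiling is meant to reveal.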
these speculations need to be studied further. a previously reported comprehensive similarity plot revealed notable mutational hotspots and conserved regions of the genome nucleotide positions of -ncov against closely related coronaviruses (lu et al., ; paraskevis et al., ) . the present findings provide a different perspective of the similarity among orthocoronavirinae species, using a cluster tree developed from the profiles retrieved from the presence and absence of homologs of ten -ncov proteins. this cluster was combined with the cladogram of a previously constructed phylogenetic tree (fig. ) . both the trees were consistent in their heatmap distributions. the tree of -ncov proteins comprised two clades. the first, indicated with a blue bar in fig. , contained a group of conserved proteins in most orthocoronavirinae species. these comprised orf ab polyprotein, nucleocapsid protein, spike glycoprotein, and membrane protein. spike and orf a regions of -ncov were previously shown to have the lowest sequence identity as compared to the closely related coronavirus species (lu et al., ; paraskevis et al., ) . however, since the translated spike glycoprotein and orf ab polyprotein from these regions are very long, the sequence similarity is still sufficient to classify them as homologs. in contrast, another clade, indicated by the green bar in fig. , comprised proteins specific to sarbecovirus for all proteins in this clade and hibecovirus for envelope protein only. this clade included proteins that were not completely conserved by all orthocoronavirus. two (ns b and ns ) of five nonstructural proteins were specific for -ncov and its closely related species, batcov ratg and bat-sars-like coronavirus (bat-sl-covzxc and bat-sl-covzc ). the other three nonstructural proteins (ns , ns , and ns a) were also detected in the sars coronavirus. 
based on these results, we propose that the comprehensive analysis of nonstructural proteins, especially ns b and ns , may provide new insight into the properties of -ncov. as shown in fig. , ns b and ns of -ncov, batcov ratg , and bat-sars-like coronavirus were distinct from other species of orthocoronavirus. in sars-cov, ns b is an integral protein localized in the golgi compartment. the protein is packaged into sars-cov particles (schaecher et al., ). interestingly, open reading frame (orf) b, but not orf b deletion, induces interferon (ifn)-dependent reporter gene expression as well as apoptosis and the type i ifn response (pfefferle et al., ). moreover, the deletion of orf b enhances virus growth (pfefferle et al., ). thus, we speculate that the property of the non-conserved ns b in -ncov may affect the human infective property of the virus. similarly, the existence of nucleotide deletions in orf b has been described in sars-cov (oostra et al., ). a study involving mers-cov described that orf b strongly antagonizes the ifn-beta (β) promoter and orf b and b significantly suppress ifn induction (lee et al., b). accessory proteins b and ab of sars-cov can suppress the ifn-β signaling pathway (and thus interferon production) by their participation in the ubiquitin-mediated rapid degradation of ifn regulatory factor (irf ) (wong et al., ). in contrast, when we focused on mers-cov from bats and camels, orf b antagonized melanoma differentiation-associated protein -mediated nuclear factor kappa b (nf-κb) activation. orf b strongly inhibited tank-binding kinase -mediated induction of nf-κb signaling, but not iκb kinase epsilon and irf -mediated activations (lee et al., a). thus, we speculate that the properties of the accessory proteins, ns b and ns , in -ncov may affect its ability to infect humans. further studies are required to confirm this speculation. ns b is a short peptide of residues.
a three-dimensional structure is often difficult to obtain from such a short peptide. we predicted the three-dimensional structure of the queried ns b amino acid sequence using dichot and c-i-tasser (supplementary fig. , fig. ) (fukuchi et al., ; zhang et al., ); a protein family (pf ) was found, but no known three-dimensional structure was found. fahmi, et al. infection, genetics and evolution ( ). the secondary structure of this query was also predicted using jpred (supplementary fig. ) (drozdetskiy et al., ). the secondary structure was predicted to be an α-helix, although, depending on the environment, this helix very likely does not form (fig. ). the alignment of this protein revealed three polymorphism sites between -ncov, batcov ratg , and bat-sars-like coronavirus sequences (bat-sl-covzxc and bat-sl-covzc ) (supplementary fig. ). in summary, some nonstructural proteins were conserved and others were not conserved between -ncov and sars-cov. by focusing on the -ncov-specific proteins, ns b and ns , we proposed a combination of phylogenetic profiling analysis and structural characterization of the genes that were specifically expressed in -ncov and the closely related bat coronavirus. the data provide insight for further characterization of the infective properties of this virus. supplementary data to this article can be found online at https:// doi.org/ . /j.meegid. . . this work was supported by the mext-supported program for the strategic research foundation at private universities (grant number s to t.i) and the takeda science foundation. the authors declare no competing interest. okavirus as the outgroup. the heatmap indicates the binary matrix of the homolog proteins of -ncov against other species in the dataset, with black and white colors as presence and absence, respectively. the bit pattern was arranged following the vertical and horizontal trees.
the vertical tree is a phylogenetic profiling tree constructed from a binary matrix of the presence and absence of homolog proteins. it has two clades, indicated by blue and green bars. the horizontal tree is the cladogram of the maximum likelihood tree, as shown in fig. , with a collapsed clade of -ncov. (for interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) fig. . the model of nonstructural-structural transition of -ncov nonstructural protein b. the predicted protein structure of -ncov nonstructural protein b by c-i tasser is shown as the helix structure protein. blast+: architecture and applications origin and evolution of pathogenic coronaviruses modeltest-ng: a new and scalable tool for the selection of dna and protein evolutionary models jpred : a protein secondary structure prediction server the ncbi taxonomy database molecular evolution of human coronavirus genomes ideal in illustrates interaction networks composed of intrinsically disordered proteins and their binding partners mafft multiple sequence alignment software version : improvements in performance and usability mega x: molecular evolutionary genetics analysis across computing platforms middle east respiratory syndrome coronavirus-encoded accessory proteins impair mda -and tbk -mediated activation of nf-κb middle east respiratory syndrome coronavirusencoded orf b strongly antagonizes ifn-β promoter activation: its implication for vaccine design genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding molecular epidemiology, evolution, and phylogeny of sars coronavirus the cipres science gateway: enabling highimpact science for phylogenetics researchers with limited resources the -nucleotide deletion present in human but not in animal severe acute respiratory syndrome coronaviruses disrupts the functional expression of open reading frame full-genome evolutionary 
analysis of the novel corona virus ( -ncov) rejects the hypothesis of emergence as a result of a recent recombination event reverse genetic characterization of the natural genomic deletion in sars-coronavirus strain frankfurt- open reading frame b reveals an attenuating function of the b protein in-vitro and in-vivo ncbi reference sequences (refseq): a curated non-redundant sequence database of genomes, transcripts and proteins middle east respiratory syndrome coronavirus (mers-cov): a review the orf b protein of severe acute respiratory syndrome coronavirus (sars-cov) is expressed in virus-infected cells and incorporated into sars-cov particles raxml-vi-hpc: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models epidemiology, genetic recombination, and pathogenesis of coronaviruses hierarchical grouping to optimize an objective function coronavirus pathogenesis accessory proteins b and ab of severe acute respiratory syndrome coronavirus suppress the interferon signaling pathway by mediating ubiquitin-dependent rapid degradation of interferon regulatory factor discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus template-based and free modeling of i-tasser and quark pipelines using predicted contact maps in casp a pneumonia outbreak associated with a new coronavirus of probable bat origin we want to thank dr. motonori ota, dr. satoshi fukuchi, dr. kota kasahara, and dr. takeshi kikuchi for their support and helpful comments.
key: cord- -g sx qu authors: saleemi, mansab ali; ahmad, bilal; benchoula, khaled; vohra, muhammad sufyan; mea, hing jian; chong, pei pei; palanisamy, navindra kumari; wong, eng hwa title: emergence and molecular mechanisms of sars-cov- and hiv to target host cells and potential therapeutics date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: g sx qu the emergence of a new coronavirus, first reported in wuhan, china, in around late december , has now developed into a massive threat to global public health. the world health organization (who) has named the disease caused by the virus as covid- and the causative virus was renamed from the initial novel respiratory coronavirus to sars-cov- . the person-to-person transmission of this virus is ongoing despite drastic public health mitigation measures such as social distancing and movement restrictions implemented in most countries. understanding the source of such an infectious pathogen is crucial to develop a means of avoiding transmission and further to develop therapeutic drugs and vaccines. identifying the etiological source of a novel human pathogen is a dynamic process that needs comprehensive and extensive scientific validations, such as observed in the middle east respiratory syndrome (mers), severe acute respiratory syndrome (sars), and human immunodeficiency virus (hiv) cases. in this context, this review is devoted to understanding the taxonomic characteristics of sars-cov- and hiv. herein, we discuss the emergence and molecular mechanisms of both viral infections. nevertheless, no vaccine or therapeutic drug has yet been approved for the treatment of sars-cov- , and it is highly likely that new effective medications that target the virus specifically will take years to establish. therefore, this review reflects the latest repurposing of existing antiviral therapeutic drug choices available to combat sars-cov- .
coronaviruses (covs) are a group of enveloped and positive single-stranded rna genome viruses that infect both animals and humans (smith and denison, ; xu et al., ). they contain the largest known rna genomes ranging from - kilobases in length (smith and denison, ). coronavirus-related infections are known to affect the liver, central nervous system (cns), respiratory and gastrointestinal tract of various birds and mammals, including humans (xu et al., ). they are classified into four genera: alphacoronavirus, betacoronavirus, gammacoronavirus, and deltacoronavirus (fung et al., ). the novel coronavirus ( -ncov) also belongs to the same group of covs. it was named 'severe acute respiratory syndrome coronavirus ' (sars-cov- ) by the international committee on taxonomy of viruses (ictv) (fung et al., ). the transmission pattern of sars-cov- primarily involves direct close contact with the infected person via nose and mouth secretions; however, it can also be transmitted indirectly through contaminated objects or surfaces (world health organization). it was discovered in late december in wuhan city of china in patients with acute respiratory distress syndrome (ards) symptoms, such as fever, cough, and dyspnea (li et al., ), after being isolated from the respiratory epithelium (xu et al., ). the genome sequence analysis of sars-cov- showed that it belongs to the betacoronavirus genus (β-cov). the sars-cov- -related disease was recently named as a novel respiratory coronavirus disease by the world health organization (who) (xu et al., ). to date, it is unclear how sars-cov- interacts with the host antiviral immunity; hence, lessons can be learned from previous studies of other members of the coronavirus family and also human pathogenic viruses, such as human immunodeficiency viruses and severe acute respiratory syndrome (sars) cov known as human covs (hcovs) due to their ability to cause human infections (andersen et al., ).
aside from nl and e, which belong to alphacoronavirus, the remaining five viruses come under the betacoronavirus genus. however, sars-cov, mers-cov, and sars-cov- are known to cause severe disease leading to death, whereas e, oc , nl , and hku have only been associated with mild symptoms in infected persons, although they are recognized as being of potentially zoonotic origin (andersen et al., ; corman et al., ). all members of the coronaviridae contain enveloped, positive-stranded spherical rna virions with a length of - nm and are usually decorated with large visible petal-shaped surface projections known as 'spikes' when viewed under an electron microscope (singh et al., ). in addition, members of the coronaviridae share common characteristics, which are outlined in (table ). within the family, there is also another subfamily, torovirinae, comprising two other genera, torovirus and bafinivirus. they are different from coronaviridae mainly due to their different tubular virion shape. the morphology of the coronavirinae virion is illustrated in (fig. a). moreover, the classification of various genera within the subfamily is a little more complicated owing to the high cov recombination frequency, thus necessitating the development of precise criteria proposed over the years (corman et al., ). currently, the primary means of establishing their phylogenetic relationship is through the genomic structure, as shown in (fig. ). it is based on these differences that a particular host may be infected by a specific genus of the virus; for instance, alpha- and beta-coronaviruses infect only mammals, whereas gamma- and delta-coronaviruses primarily infect birds, although some studies suggest the possibility of the latter genera infecting mammals as well (cui et al., ; king et al., ; woo et al., ).
the specific distinctions between four genera are closely associated with ( ) the unique type of nsp known as a non-structural rna-binding protein within their genome which may differ in size and sequence and ( ) the presence or absence of a commonly shared accessory gene (sheikh et al., ). several examples of viruses within the genera with their unique accessory proteins and genome arrangements can be found in (fig. ). the complete sars-cov- genome sequencing data began to emerge as the real gravity of the situation started to loom over the medical personnel who first encountered this novel coronavirus back in december . their findings have since been published and provided the information necessary to classify sars-cov- into its respective group under the sarbecovirus (β-b group) subgenus (chan et al., ). the sequence is publicly available on genbank with the accession number mn , possessing nucleotides, and is closely related to bat sars-like coronavirus (cov) isolate bat sl-covzc (genbank accession number mg ) with . % nucleotide identity. in contrast, the origins of hiv have been a subject of debate since its discovery back in s, with many theories floating about and evidence that is not sufficiently conclusive. what is known is that infected individuals were at risk of developing acquired immune deficiency syndrome (aids), which at that time resulted in opportunistic pathogens infecting the individual and, due to a pronounced weakened immune system, the individual would succumb eventually to the infection. current data suggest that over the past three decades, hiv has caused more than million deaths worldwide with million living with the virus (pharr et al., ). hiv is classified with other known immunodeficiency viruses, which belong to phylum artverviricota, order ortervirales, and family retroviridae (vemuri et al., ).
besides, three other retroviruses, such as human t-cell lymphotropic virus type , , (htlv , , ), which exist within the same subfamily but in the genus deltaretrovirus, also affect the immune system in humans (calattini et al., ). as mentioned earlier, hiv has been categorized into two types, hiv- and hiv- , the former being more pathogenic than the latter and making up the majority of aids cases around the globe, whereas hiv- mainly infects individuals in west africa (cloyd et al., ). in general, the most plausible route of evolution can be established by the genomic sequence of hiv, while its origin is yet to be understood. in , genomic analysis found that hiv- virus was morphologically similar but genetically different from type and caused more serious aids disease in western africa (jaffar et al., ). another genomic study showed that hiv- is firmly associated with the simian immunodeficiency virus (siv) isolated from macaque monkeys (nakayama and shioda, ). since then, the simian close relatives have been further expanded with more and more species of primates in sub-saharan africa found to house similar viruses collectively named siv with special suffixes denoting their species of origin. in general, sivs have been found to be nonpathogenic in their host species, but can become pathogenic once they cross the species barrier (meyerson et al., ), resulting in the discovery of current pandemic hiv strain (group m or main) that is similar to siv cpz in chimpanzees (pan troglodytes) after closer investigation. in addition, the natural host of viruses was identified in cameroon (hahn et al., ; keele et al., ). the morphological structure of hiv is usually spherical, enveloped with - nm in diameter, and nm surface glycoprotein projections. the internal structure of the virus nucleocapsid, which is rod-shaped or truncated cone-shaped, is shown in (fig. b).
hiv viral genome follows the common pattern within the subfamily orthoretrovirinae, which carries two copies with the arrangement order: '-gag-pro-pol-env- ', each at approximately . kb in size. a schematic illustration of the hiv genome is shown in ( into viral envelope precursors with other accessory proteins. in general, the unique sequences of viruses give each genus of a specific viral family its distinguishing characteristics and allow its classification into the respective genus, which is lentivirus in the case of hiv. in addition to the primary structural genes reported, members of the lentivirus genus also have additional genes, for instance, accessory protein genes vif, vpr, vpu, nef, and regulatory protein genes tat and rev. several additional activities (secondary activities) have been ascribed to the proteins encoded by these genes, which suggests that they play a multifunctional role, but generally their primary activities are highly conserved (faust et al., ). a summary of these accessory proteins and their role is shown in (table ). atypical pneumonia cases emerged from the zoonotic transmission and showed > % homology with bat coronaviruses and > % similarity with the previous sars-cov (mcintosh, ). thus, chinese health authorities identified the outbreak as a novel coronavirus (covid- ). although environmental samples taken from the market tested positive, researchers considered the possibility of human-to-human transmission in other geographical areas after people returned from wuhan celebrating the chinese new year (cny) with their families (wang et al., ; chen et al., ). in this context, one study showed a significant correlation between domestic train transportation and the number of cases imported to other provinces in china. besides, non-stop passenger flights were also linked to the number of cases reported abroad, subsequently the cause of this pandemic (rodríguez-morales et al., ).
evidence suggests that the virus is transmitted by respiratory droplets, direct contact, and airborne transmission, while analysis of confirmed and suspected cases of covid- suggested that the novel virus could go beyond the respiratory tract (peng et al., ). more interestingly, the study also found that the virus can enter the body through eye exposure (li et al., ). notably, several asymptomatic cases have emerged across the globe, arguably the cause of such widespread transmission (bai et al., ). earlier this year, italy declared a state of emergency after registering two cases from wuhan tourists with suspected symptoms of coronavirus after they arrived in rome (giovanetti et al., ). similarly, the usa confirmed the first positive case of sars-cov- identified in a woman in her 's, who had returned from china with good health in mid-january. later, her husband also tested positive and was admitted to the hospital; he had not traveled but was considered to have been in close contact with his wife (ghinai et al., ). similarly, a - year-old female entered the uk from hubei province without previous medical history and no regular medications. after arrival, she developed symptoms of malaise and fever, accompanied by a sore throat and dry cough within three days (lillie et al., ). as some cases show delayed symptoms, human-to-human transmission could occur during the asymptomatic incubation period (li et al., ; backer et al., ). until an appropriate diagnostic cure is found, researchers have alerted countries to maintain a distance of one meter from infected persons to reduce the risk of transmission (mehraeen et al., ). subsequently, many countries have now introduced movement control orders and lockdowns as preventative measures to curb the spread of the virus (hamzelou, ), urging their citizens to maintain social distance.
in recent history, another deadly virus called the human immunodeficiency virus (hiv) had emerged, which is the retrovirus responsible for the acquired immunodeficiency syndrome (aids). as a common disease in asia, the concurrence of these diseases could present an alarming problem globally. recently, zhu and colleagues noted that "hiv infected patients need to be regarded as a vulnerable group," after co-infection of covid- and hiv was reported in a patient in wuhan, china. previously, hiv- emerged in young homosexual men in (brodie et al., ). five years later, hiv- virus was discovered with similar morphology to hiv- , but antigenically distinct, which increased aids cases reported in patients from western africa in (clavel et al., ). nonetheless, hiv- viral strain was distantly related to hiv- and both strains were acquired by humans from non-human primates infected with simian immunodeficiency virus (siv) (heeney et al., ). scientists believed that siv was transmitted to humans after contact with the blood of infected chimpanzees during bushmeat handling. notably, the virus mutated into hiv from its origins and gradually spread from africa to other parts of the world (sharp and hahn, ; ayouba et al., ; sousa et al., ), thus confirming the occurrence of both viruses (hiv- and hiv- ) as a result of zoonotic transmission from infected primates. in general, human-to-human transmission is known to occur from mother to child and during unprotected sexual intercourse between couples (kassa, ). transmission occurs when hiv-infected body fluids, including blood, breast milk, semen, and vaginal and rectal secretions, come into contact with a mucous membrane in the mouth, rectum, penis, or vagina or with damaged tissue, or are injected directly into the bloodstream by a needle or syringe (hladik and mcelrath, ; shaw and hunter, ). in summary, it is still unknown whether any interrelationship exists between hivs and sars-cov- .
despite the remarkable number of patients infected with hiv and novel covid- separately, which extends to cover a wide zone in asia and worldwide, there are very few reports on the sars-cov- infection among patients infected with hiv (joob and wiwanitkit, ). it has been observed that hiv targets the specific cd + immune cells, including t cells, macrophages, and dendritic cells, resulting in a significant reduction in the number of these immune cells in hiv-infected patients (lane, ). these cd + t cells (known as helper cells) are the backbone of the immune system where they activate b cells, macrophages, and cytotoxic t cells to secrete antibodies, destroy ingested microbes and kill the infected cells, respectively (laidlaw, ). the hiv membrane contains a transmembrane glycoprotein called glycoprotein- (gp ) and surface glycoprotein, namely glycoprotein- (gp ), which binds to cd receptors on the surface of cd + cells (pancera et al., ). there are two other chemokine co-receptors known as c-c chemokine receptor (ccr ) and c-x-c chemokine receptor (cxcr ), which facilitate hiv binding to and fusion with cd + cells (didigu et al., ). cxcr co-receptor is expressed in many peripheral t-cell lymphomas found in the lymphatic sites, including lymph nodes, bone marrow, and spleen (machado et al., ). however, ccr is highly expressed on the surface of t cells, macrophages, dendritic cells, eosinophils, and microglia (moore et al., ). the binding of gp to cd receptor leads to a change in the envelope structure of the virus, which allows the co-receptor, either ccr or cxcr (cxcr for t-tropic hiv strains, and ccr for m-tropic hiv strains), to bind to a specific domain in the gp . after binding, the n-terminal fusion peptide (gp ) penetrates the host cell membrane, followed by entry of the viral capsid into the cytosol of the t cell.
after penetration, the virus uncoats and releases the contents of its capsid, which contains the rna genome and its associated enzymes (reverse transcriptase and integrase), into the host cell (naif, ). the virus produces complementary dna (cdna) by transcribing its single-stranded rna through reverse transcriptase activity, producing duplicate strands of viral dna. the complete viral dna then migrates to the nucleus and integrates with dna inside the host cell, affecting the cellular activity. the integrated dna is used to generate mrna, thus synthesizing the viral proteins and allowing the new virus copy to develop. it has been observed that the virus sabotages the cd + immune t cells for replication, which leads to an increased virus load in the blood and a significant decrease in cd + t cells (goodsell, ). interestingly, some researchers found another mechanism of hiv destruction of cd + t cells (bolton et al., ; yue et al., ). they observed that depletion of cd + t cells by the apoptosis pathway because of direct hiv infection accounted for only - % of the total cd + t cell pool. it has been found that an hiv enzyme (integrase) plays a crucial role in the activation of the dna-pk sensor of the host cell, which activates the apoptotic cascade inside the cell. moreover, another hiv enzyme (protease) activates the caspase- that further triggers the apoptosis of infected cells. besides, specific hiv proteins are displayed on the surface of infected cd + t cells, which alert cd + t cells to destroy the infected cd + t cells. the immune system reacts by producing antibodies, such as anti-hiv antibodies, which stick to the infected cd + t cells and mark them for destruction by other immune cells (selliah and finkel, ). however, it has been observed that the indirect effect of hiv infection results in the majority of cd + t cell death, which causes immune deficiency.
the direct effect of hiv on cd + t cells triggers the expression of interferon-gamma induced protein- (ifi- ), followed by the activation of an inflammatory cascade inside the infected cell, which ends in cellular self-destruction called pyroptosis (doitsh et al., ; monroe et al., ). the caspase- activated by high expression of ifi- triggers the production of cytokines, especially interleukin- beta, inside the infected cells (lupfer et al., ). the high production of il -β leads to the creation of an inflammatory environment inside the cell, resulting in the initiation of apoptosis (feria et al., ). the infected cells release interleukin- beta (il- β) to the outside cellular environment, which contributes to the activation of pyroptosis in the non-affected cells, causing a considerable loss of cd + t cells (doitsh et al., ). unlike hiv, the exact mechanism by which covid- induces immune defects is still unknown; however, it could be similar to that of the previous coronaviruses, including the severe acute respiratory syndrome coronavirus (sars-cov) and the middle east respiratory syndrome coronavirus (mers-cov). these viruses induce an abnormally low level of lymphocytes in the blood, known as lymphopenia, which is related to hiv infection. the binding of coronavirus to the host cell requires a specific class i viral transmembrane glycoprotein, the s protein, on the surface of the virus envelope. it needs a specific receptor on the host cells called angiotensin-converting enzyme (ace ) receptor (andersen et al., ; dhama et al., ). ace is an enzyme found in the cell membranes of many organs, including lungs, arteries, heart, kidney, and intestines; it plays a significant role in blood pressure regulation. the dramatic reduction in the peripheral cd + and cd + t cells is one of the symptoms in covid- cases related to hiv infection (diao et al., ).
besides, excessive circulation of proinflammatory cytokines and chemokines, including interleukins (il- β, il ra, il , il , il , il , il , il , il , il , il p , il , il , il a), eotaxin, basic fibroblast growth factor (fgf ), granulocyte-colony stimulating factor (gcsf), granulocyte-macrophage colony-stimulating factor (gmcsf), interferon-gamma (ifnγ), ifn-γ-inducible protein-c-x-c motif chemokine ligand (cxcl ), interleukins (il- , il- , il- ) with interferon-gamma (ifn-γ), and transforming growth factor-beta (tgf-β) occurred in the acute stage of sars-cov infection, triggering the cytokine storm (cameron et al., ; huang et al., ). chloroquine is a primary therapeutic drug used to prevent and treat patients infected with malaria. however, the more extended use of chloroquine could produce severe side effects, such as cardiomyopathy (bernstein, ; ratliff et al., ), and there are some reports of macular retinopathy as a minor risk caused by this drug (cubero et al., ). a survey of the patients infected with sars-cov- for adverse side effects of chloroquine treatment is yet to be conducted. chloroquine was considered the best therapeutic drug choice available for treating sars-cov- infected patients in hospitals. recently, the who has advocated a solidarity clinical trial for covid- treatments. the solidarity clinical trial will compare four treatment choices, including chloroquine/hydroxychloroquine, remdesivir, lopinavir/ritonavir, and lopinavir/ritonavir plus interferon beta- a against standard therapy and determine their relative efficacy against covid- (world health organization, meuri). (cockrell et al., ). they found that the enzyme carboxylesterase c (ces c) was deleted and the humanized receptor of mers-cov (dipeptidyl peptidase , hdpp ) was expressed in order to improve the pharmacokinetics of nucleotide prodrugs. another study conducted by agostini et al.
( ) reported that two mutations, v l and f l, were recognized in the rna-dependent rna polymerase gene of the virus after passages of remdesivir drug (agostini et al., ). they also observed that these changes in amino acid reduced the burden of virus and lessened the pathogenesis of sars-cov in mice. in recent months, the prophylactic efficacy of remdesivir was also tested in a rhesus macaque (non-human primate) mers-cov infection model (de wit et al., ). when the treatment with prophylactic remdesivir was started hours before inoculation, mers-cov was found to be inhibited from replicating in the respiratory tissues and prevented from causing clinical disease, which inhibited the development of lung lesions. the same findings were obtained when treatment with remdesivir drug was initiated hours after the virus inoculation (de wit et al., ). the remdesivir therapeutic drug safety data are readily available for humans (mulangu et al., ); hence, human studies can be conducted to assess the effectiveness of this compound towards novel coronaviruses. therapeutics approved by the food and drug administration (fda) have been assessed for antiviral activity against mers- and sars-cov. for instance, lopinavir (lpv), a protease inhibitor of hiv- , was tested in combination with ritonavir (rtv), which enhances the half-life of lpv. the combination of these two drugs (lpv/rtv) was found to be effective against sars-cov in patients and in an in-vitro study. the calculated ec was µg/ml for the fetal rhesus kidney- cells (chu et al., ). meanwhile, chan and coworkers reported that lpv/rtv also reduced clinical scores, weight loss, and progression of disease in marmosets infected with mers-cov (chan et al., ). however, the antiviral activity of lpv towards mers-cov in the in-vitro study remains controversial.
another study conducted by chan et al. ( ) found no optimal ec in vero cells, whereas an ec of µm was reported in huh cells in a different study (de wilde et al., ). clinical observations in both humans and animals indicated that infections with mers-cov are mediated by both replication of the virus and host inflammatory responses. those studies suggested the use of combination therapies including type i and ii interferons. among these, interferon-beta (ifnb) reduced the replication of mers-cov in tissue culture and showed the best efficacy, with ec s of . - iu/ml (chan et al., ; hart et al., ). in their study on marmosets infected with mers-cov, chan et al. ( ) reported clinical improvement with the use of lpv/rtv with ifnb. in addition, a randomized controlled clinical trial was started in saudi arabia to determine the efficacy of combination therapy with ifnb and lpv/rtv in improving clinical outcomes in mers-cov infection (arabi et al., ). notably, china has launched another controlled trial to test the effectiveness of ifnα- b and lpv/rtv in hospitalized sars-cov- patients (chictr ). the therapeutic and prophylactic activities of lpv/rtv-ifnb and remdesivir were also compared in a transgenic mouse model infected with mers-cov (sheahan et al., ). remdesivir reduced lung viral titers and virus replication and improved pulmonary function. in contrast, prophylactic lpv/rtv-ifnb resulted in only a moderate decline in viral titers and had no effect on other disease parameters. when given therapeutically, the combination improved pulmonary function but did not influence viral replication (sheahan et al., ).
thus, remdesivir appeared to be effective against mers-cov in this model.
another antiviral prodrug, ribavirin, is a guanosine analogue used to treat many viral infections, such as hepatitis c and respiratory syncytial virus, and to treat patients coinfected with hiv and hepatitis b. in , it was launched on the market for treating respiratory syncytial virus, especially in children. in most cases, it is used in combination with interferon (ifn). falzarano and co-authors found promising results when ribavirin was combined with ifnα- b in a rhesus macaque model of mers-cov infection (falzarano et al., ), but there have been contradictory data from mers-cov infected patients treated with ifnα a or ifnβ and ribavirin (arabi et al., ). besides, ribavirin produces adverse side effects, such as decreased haemoglobin concentrations, in patients with respiratory disorders, which may make it inappropriate for use against sars-cov- . in addition, various other antiviral therapeutic drugs/cocktails, including favipiravir, nitazoxanide, ganciclovir, acyclovir/penciclovir, and the fda-approved drug ivermectin, are listed in table . in summary, the pool of preexisting antiviral drugs available for repurposing is finite, so it is almost inevitable that both new and repurposed drugs will be required to treat covid- . therapeutic drug choices in response to covid- are urgently required and, fortunately, some of the preexisting antiviral therapeutics are already progressing into human clinical trials.
it is important for pwh to regularly attend their healthcare providers and adhere to therapy (national institute of allergy and infectious disease, ). the treatment of pwh can be affected by the stay-at-home orders enforced by countries worldwide. for instance, several medical practitioners in the usa canceled their appointments with people with suspected hiv and switched to telehealth appointments to follow the who guidelines (ives, ).
however, telehealth programs are restricted in the range of resources that can be offered to patients (siwicki, ); therefore, pwh may not be able to fully access the services needed for hiv therapy. in addition, many pwh may not be able to access telehealth services at all, for reasons such as limited access to technology and lack of knowledge about telehealth, which could hinder the progress of their therapy. pwh are more vulnerable to opportunistic infections, such as tuberculosis, toxoplasmosis, and pneumonia (department of veteran affairs, ), than people who are not immunocompromised. hiv-infected people who suffer from other disorders may experience delayed treatment because of covid- , since hospital overcrowding places additional strain on already taxed healthcare systems. moreover, pwh who seek out emergency treatment can face a high risk of contracting covid- , among other disorders, within healthcare facilities (collins, ). the occurrence of person-to-person transmission of covid- is increasing. it has been proposed that persons infected with hiv need to be considered a vulnerable group for covid- . nonetheless, no robust evidence of any interrelationship between these two viral infections has yet been identified. despite the high number of patients infected with hiv and the remarkable number of patients with covid- across asia and worldwide, there are very few reports on the co-infection of hiv and sars-cov. in addition, antiviral therapeutic drugs are broadly used for treating hiv-infected patients, and these drugs have the potential to be used against sars-cov- (felber et al., ; savarino et al., ). thus, patients infected with hiv who are receiving anti-hiv therapeutic drugs might be at a lower risk for covid- than the general population.
some antiviral therapeutic drugs, including chloroquine, remdesivir, ribavirin, lopinavir/ritonavir, the fda-approved drug ivermectin, and others (summarized in table ) have the potential to treat patients infected with sars-cov- . at the same time, more clinical trials are required in order to obtain robust data. further studies are also urgently required to explore the pathogenicity, mechanisms, and transmission of the novel coronavirus. to obtain better insight into the novel virus, countries should strive to provide more reliable and accurate data through transparency and data sharing, and more research needs to be conducted on the reported cases. countries should also continue to work towards developing preventive measures to minimize both transmission and the number of infected patients. in addition, unravelling the mechanisms of viral replication and host cell entry will provide the fundamental knowledge for future research into the development of targeted vaccines and antiviral therapeutic drugs. with continuing efforts to curb the widespread transmission of covid- globally, we hope that the novel coronavirus pandemic may abate after a few months. in summary, there is an urgent need to develop a new broad-spectrum antiviral therapeutic agent that will be effective not only against the novel respiratory coronavirus but also against a possibly similar virus outbreak in the future.
for -if there is any message coming out from the latest outbreaks, it is almost certain that it will happen again.‖ mansab (harrison et al., ) j o u r n a l p r e -p r o o f journal pre-proof coronavirus susceptibility to the antiviral remdesivir (gs- ) is mediated by the viral polymerase and the proofreading exoribonuclease treatment of middle east respiratory syndrome with a combination of lopinavir/ritonavir and interferon-β b (miracle trial): statistical analysis plan for a recursive two-stage group sequential randomized controlled trial ribavirin and interferon therapy for critically ill patients with middle east respiratory syndrome: a multicenter observational study evidence for continuing cross-species transmission of sivsmm to humans: characterization of a new hiv- lineage in rural côte d'ivoire incubation period of novel coronavirus ( -ncov) infections among travellers from wuhan, china presumed asymptomatic carrier transmission of covid- ocular safety of hydroxychloroquine death of cd + t-cell lines caused by human immunodeficiency virus type does not depend on caspases or apoptosis aids at : media coverage of the hiv epidemic - . the nation broad spectrum antiviral remdesivir inhibits human endemic and zoonotic deltacoronaviruses with a highly divergent rna dependent rna polymerase human immunopathogenesis of severe acute respiratory syndrome (sars) discovery of a new human t-cell lymphotropic virus (htlv- ) in central africa the fda-approved drug ivermectin inhibits the replication of sars-cov- in vitro will there be a cure for ebola? epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan implications of the hiv- rev dimer structure at . 
Å resolution for multimeric binding to the rev response element reduction and functional exhaustion of t cells in patients with coronavirus disease (covid- ) cell death by pyroptosis drives cd t-cell depletion in hiv- infection treatment with interferon-α b and ribavirin improves outcome in mers-covinfected rhesus macaques making sense of multifunctional proteins: human immunodeficiency virus type accessory and regulatory proteins and connections to transcription the role of tat in the human immunodeficiency virus life cycle indicates a primary effect on transcriptional elongation feedback regulation of human immunodeficiency virus type expression by the rev protein hiv replication is associated to inflammasomes activation, il- β, il- and caspase- expression in galt and peripheral blood targeting the nlrp inflammasome in severe covid- a tug-of-war between severe acute respiratory syndrome coronavirus and host antiviral defence: lessons from other pathogenic viruses innate sensing of hiv- assembly by tetherin induces nfκb-dependent proinflammatory responses breakthrough: chloroquine phosphate has shown apparent efficacy in treatment of covid- associated pneumonia in clinical studies structure-function relationships in hiv- nef first known person-to-person transmission of severe acute respiratory syndrome coronavirus (sars-cov- ) in the usa the first two cases of -ncov in italy: where they come from quinoline-based antimalarial drugs: a novel class of autophagy inhibitors illustrations of the hiv life cycle severe acute respiratory syndrome-related coronavirus-the species and its viruses, a statement of the coronavirus study group temporal proteomic analysis of hiv infection reveals remodelling of the host phosphoproteome by lentiviral vif variants the coronavirus nucleocapsid protein is adp-ribosylated aids as a zoonosis: scientific and public health implications kaposi's sarcoma and pneumocystis pneumonia among homosexual men--new york city and california 
interferon-β and mycophenolic acid are potent inhibitors of middle east respiratory syndrome coronavirus in cell-based assays coronavirus puts drug repurposing on the fast track origins of hiv and the evolution of resistance to aids setting the stage: host invasion by hiv strategies of development of antiviral agents directed against influenza virus replication clinical features of patients infected with novel coronavirus in wuhan an interferon-γ-related cytokine storm in sars patients telemedicine surges, fueled by coronavirus fears and shift in payment rules hiv- tat reprograms immature dendritic cells to express chemoattractants for activated t cells and macrophages the natural history of hiv- and hiv- infections in adults in africa: a literature review oseltamivir for influenza in adults and children: systematic review of clinical study reports and summary of regulatory comments complement receptor c ar inhibition reduces pyroptosis in hdpp -transgenic mice infected with mers-cov sars-cov- and hiv mother-to-child transmission of hiv infection and its associated factors in ethiopia: a systematic review and meta-analysis chimpanzee reservoirs of pandemic and nonpandemic hiv- cbfβ stabilizes hiv vif to counteract apobec at the expense of runx target gene expression virus taxonomy. 
ninth rep int comm taxon viruses pathogenesis of hiv infection: total cd + t-cell pool, immune activation, and inflammation the multifaceted role of cd + t cells in cd + t cell memory lentiviral vif degrades the apobec z /apobec h protein of its mammalian host and is capable of cross-species activity activation of the atr pathway by human immunodeficiency virus type vpr involves its direct binding to chromatin in vivo novel coronavirus disease (covid- ): the first two patients in the uk with person to person transmission early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia gs- ) protects african green monkeys from nipah virus challenge genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding outbreak of pneumonia of unknown etiology in wuhan china: the mystery and the miracle inflammasome control of viral infection are pangolins the intermediate host of the novel coronavirus (sars-cov- )? decreased t cell populations contribute to the increased severity of covid- expression and function of t cell homing molecules in hodgkin's lymphoma influenza treatment with oseltamivir outside of labeled recommendations coronavirus genomic rna packaging coronavirus disease self-care instructions for people not requiring hospitalization for coronavirus disease (covid- ) species-specific vulnerability of ranbp shaped the evolution of siv as it transmitted in african apes a randomized, controlled trial of ebola virus disease therapeutics pathogenesis of hiv infection impact of trim α in vivo seeing your healthcare provider. 
hiv gov a structural analysis of m protein in coronavirus assembly and morphology a synthetic serine protease inhibitor, nafamostat mesilate, is a drug potentially applicable to the treatment of ebola virus disease enfuvirtide: a review of its use in the management of hiv infection structure and immune recognition of trimeric pre-fusion hiv- env transmission routes of -ncov and controls in dental practice the latest evidence for possible hiv- curative strategies a cross-sectional study of the role of hiv/aids knowledge in risky sexual behaviors of adolescents in nigeria diagnosis of chloroquine cardiomyopathy by endomyocardial biopsy going global-travel and the novel coronavirus nitazoxanide: a first-in-class broad-spectrum antiviral agent nitazoxanide, a new drug candidate for the treatment of middle east respiratory syndrome coronavirus differential regulation of nf-κb-mediated proviral and antiviral host gene expression by primate lentiviral nef and vpu proteins effects of chloroquine on viral infections: an old drug against today's diseases coronavirus envelope protein: current knowledge the extended impact of human immunodeficiency virus/aids research biochemical mechanisms of hiv induced t cell apoptosis cold spring harb perspect med origins of hiv and the aids pandemic antiviral drugs against alphaherpesvirus broad-spectrum antiviral gs- inhibits both epidemic and zoonotic coronaviruses comparative therapeutic efficacy of remdesivir and combination lopinavir, ritonavir, and interferon beta against mers-cov a review of coronavirus disease- (covid- ) telemedicine during covid- : benefits, limitations, burdens, adaptation. health care news coronaviruses as dna wannabes: a new model for the regulation of rna virus replication fidelity synthesis and characterization of a native, oligomeric form of recombinant severe acute respiratory syndrome coronavirus spike glycoprotein the epidemic emergence of hiv: what novel enabling factors were involved? 
cytokine storm in covid- : the current evidence and treatment strategies mechanism of inhibition of ebola virus rna-dependent rna polymerase by remdesivir evolution, distribution, and diversity of immunodeficiency viruses. dyn immune act viral dis a novel coronavirus outbreak of global health concern monitored emergency use of unregistered and experimental interventions (meuri discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavi systematic comparison of two animal-to-human transmitted human coronaviruses: sars-cov- and sars-cov preferential apoptosis of hiv- -specific cd + t cells hiv- vpr induces interferon-stimulated genes in human monocyte-derived macrophages angiotensin-converting enzyme (ace ) as a sars-cov- receptor: molecular mechanisms and potential therapeutic target reply to comments on'co-infection of sars-cov- and hiv in a patient in wuhan city a novel coronavirus from patients with pneumonia in china the coronavirus replicase human immunodeficiency virus type vpr induces dna replication stress in vitro and in vivo emerging genetic diversity among clinical isolates of sars-cov- : lessons for today the proximal origin of sars-cov- mapping the genomic landscape & diversity of covid- based on > clinical isolates of sars-cov- : likely origin & transmission dynamics of isolates sequenced in india the authors declare no conflicts of interest coronaviruses cnscentral nervous system sars-cov- mers-cov severe acute respiratory syndrome coronavirus middle east respiratory syndrome coronavirus ards acute respiratory distress syndrome who world health organization hivshuman immunodeficiency viruses siv immunodeficiency virus gp glycoprotein- ccr c-c chemokine receptor- cxcr c-c chemokine receptor- (ifi)- interferon-gamma induced protein- il- βinterleukin- β ace 
angiotensin-converting enzyme- fgf basic fibroblast growth factor gcsf granulocyte-colony stimulating factor ifnγ interferon-gamma ip ifn-γ-inducible protein- mcp monocyte chemoattractant protein mip amacrophage inflammatory proteins tnf-α tumor necrosis factor-alpha j o u r n a l p r e -p r o o f generally, '-utr-replicase-s-m-n-utr- ' (fung et al., ; replicase gene overlapped with orfs a and b that codes two huge polyproteins, pp a and pp ab that are processed autoproteolytically into non-structural proteins involved in genome transcription and replication. (sheikh et al., ; singh et al., ; chan et al., ; clavel et al., ; zahoor et al., ) morphogenesis assembly of virion takes place at the smooth intracellular membranes of the endoplasmic reticulum/early golgi compartments. (masters, ) j o u r n a l p r e -p r o o f (geyer et al., ; sauter et al., ) tat  an hiv transcription factor that supports the generation of full-length viral mrnas. induces transcription and secretion of chemokines which attracts t cells and monocytes increasing infectivity.  (feinberg et al., ; izmailova et al., ) rev  directs exports of viral messages that are translated into viral proteins, providing genomic rna for packaging and budding virion j o u r n a l p r e -p r o o f key: cord- - tfbhmzo authors: góes, luiz gustavo bentim; campos, angélica cristine de almeida; carvalho, cristiano de; ambar, guilherme; queiroz, luzia helena; cruz-neto, ariovaldo pereira; munir, muhammad; durigon, edison luiz title: genetic diversity of bats coronaviruses in the atlantic forest hotspot biome, brazil date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: tfbhmzo bats are notorious reservoirs of genetically-diverse and high-profile pathogens, and are playing crucial roles in the emergence and re-emergence of viruses, both in human and in animals. 
in this report, we identified and characterized previously unknown and diverse genetic clusters of bat coronaviruses in the atlantic forest biome, brazil. these results highlight the virus richness of bats and their possible roles in public health. brazil harbours % of the world's bat diversity and carries distinct bat species (nogueira et al., ). of these, a total of species exist only in the atlantic forest biome (afb), which is the second largest rain forest in south america and one of the regions with the highest biodiversity in the world (paglia et al., ). bats are historically unique and widespread mammals playing essential roles in the emergence and re-emergence of viruses of both veterinary and public health importance. although viruses of diverse genetic backgrounds can co-exist asymptomatically in bats, the majority of these viruses are single-stranded rna viruses (calisher et al., ). coronaviruses (covs) are enveloped, positive-sense, single-stranded rna viruses in the family coronaviridae, and are usually associated with respiratory, enteric, hepatic and neurological pathologies of varying severity (woo et al., ). coronaviruses are classified into four genera: alphacoronavirus (α-cov) and betacoronavirus (β-cov) have been identified exclusively in mammals, whereas gammacoronavirus and deltacoronavirus are mainly detected in avian species (woo et al., ; ictv, ). based on genetic relatedness, the β-covs can be further subdivided into four clades, a to d (drexler et al., ). all covs that can potentially infect humans originated from animal reservoirs, and four such covs are believed to have been transmitted to humans from bats, including the α-covs ( e and nl ) and the highly pathogenic β-covs (severe acute respiratory syndrome and middle east respiratory syndrome) (bolles et al., ; chan et al., ; corman et al., ; huynh et al., ).
recently, a number of novel bat covs have been identified, primarily from african, asian and european bats (calisher et al., ; chu et al., ; drexler et al., ), as well as from latin american countries including costa rica, panama, ecuador, mexico and brazil (corman et al., ; goes et al., ). collectively, these studies indicate the co-existence of bats and viruses at the interface of viral evolution and bat ecology. a limited number of studies have been conducted in brazil to map the nature and breadth of the viral populations harboured by bats. in previous studies, a total of five distinct cov lineages were detected in just % of local bats ( species), and most of these belong to α-cov (brandao et al., ; corman et al., ; goes et al., ). these attributes, and the presence of a large human population ( million) in the atlantic forest biome, brazil, clearly highlight the potential of bats not only to carry zoonotic viruses but also to transmit them to human beings. to ascertain the diversity of bat covs circulating in brazilian bats, a total of intestine tissues from bat species were collected from to , encompassing bats with distinct diet habits (table , fig. a, b). bats from urban (n = ) and rural (n = ) areas were received from municipalities of the northwestern region of são paulo state by the rabies laboratory of universidade estadual paulista (unesp), araçatuba-sp. additionally, and samples were collected from iguaçu national park and from two distinct disturbed landscape sites, respectively, and provided by our collaborators of the zoology department at unesp, rio claro-sp. all activities were authorized and approved by the ethics committee of the institute of biomedical research of the university of são paulo ( - - / ), and bat species were identified based on morphological characteristics including head-body length, forearm size, and dental arch, as described by vizotto and taddei ( ), gregorin and taddei ( ) and miranda et al. ( ). nucleic acids extracted from mg of intestine tissue of each individual bat using nuclisens® easymag® were subjected to first-strand cdna synthesis with random hexamers and the rt-pcr high-capacity cdna archive kit (applied biosystems). the cdna preparations were screened by a pancoronavirus nested pcr assay targeting the rna-dependent rna polymerase (rdrp) gene (chu et al., ). positive samples were sequenced and the final sequences were submitted to genbank (kt -kt ) (supplementary table). a dataset consisting of the sequences generated in this study and all publicly available bat covs was phylogenetically analysed by the neighbour-joining method in mega software using kimura's two-parameter correction and , bootstrap replicates (tamura et al., ). coronaviruses were detected in bat intestine samples from eight bat species with distinct diet habits, demonstrating a marked potential for cov distribution among bat species in the afb, which harbours % of the world's bat diversity. all cov-positive bat species are geographically distributed in the neotropics and were found in anthropogenic areas apparently affected by forest fragmentation and in urbanized areas where bats are abundant. the detection rate of covs varied by type of site: undisturbed forest remnant ( %), fragmented forests ( . %) and urban areas ( . %) (table ). phylogenetic analyses indicated the circulation of both α-cov and β-cov in afb bats (fig. c). this is of primary interest, as the majority of bat cov surveillance studies conducted in new world bats detected only α-covs (corman et al., ; drexler et al., ; osborne et al., ).
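The Kimura two-parameter correction and the nucleotide similarity percentages used in this analysis can be illustrated with a minimal sketch. This is not the authors' pipeline (they used MEGA); the function names `percent_identity` and `k2p_distance` and the toy sequences are hypothetical, and the code assumes two pre-aligned sequences of equal length, skipping gaps and ambiguous bases.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def _site_counts(seq_a, seq_b):
    """Count identical sites, transitions and transversions in an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a_norm = seq_a.upper().replace("U", "T")  # accept RNA input
    b_norm = seq_b.upper().replace("U", "T")
    same = transitions = transversions = 0
    for a, b in zip(a_norm, b_norm):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        if a == b:
            same += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    compared = same + transitions + transversions
    if compared == 0:
        raise ValueError("no comparable sites")
    return same, transitions, transversions, compared


def percent_identity(seq_a, seq_b):
    """Percentage of compared sites that are identical."""
    same, _, _, compared = _site_counts(seq_a, seq_b)
    return 100.0 * same / compared


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    where P and Q are the proportions of transitions and transversions."""
    _, transitions, transversions, compared = _site_counts(seq_a, seq_b)
    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


if __name__ == "__main__":
    ref = "ACGTACGTAC"      # toy sequences, not real RdRp fragments
    query = "ACGTACGTAT"
    print(percent_identity(ref, query))
    print(round(k2p_distance(ref, query), 4))
```

A matrix of such pairwise distances is what a neighbour-joining algorithm takes as input; bootstrap support is then obtained by resampling alignment columns and repeating the tree construction.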
a higher resolution analysis indicated a distinct distribution and diversity of α-cov and β-cov lineages (fig. d) . α-cov sequences obtained from bats of same genus presented high nucleotide sequence similarity (e.g. artibeus, glossophaga, carollia, molossus, myotis and sturnira) (fig. d and supplementary table), even with sequences detected in other studies from bats of geographically distant regions. this relation can be exemplified by the high similarity of the cov rna partial sequences ( , %) detected in carollia perspicillata species from fenix-pr, brazil (kt ), and fyzabad, trinidad and tobago (eu ) (carrington et al., ) , located at a distance of at least km. similar results were previously reported for a variety of bat covs and are taken as evidence of co-evolution of cov genotypes and specific host genera (drexler et al., ; corman et al., ; anthony et al., ) . moreover, two previously uncharacterized α-cov from myotis nigricans and myotis riparius and one β-cov cluster were identified in studied bats ( fig. c and d) . cov sequences from myotis bats genera presented high nt sequence similarity ( . %) and were most closely related to bat covs (kc ) collected from brazilian molossus molossus and tadarida brasiliensis. a distinct cluster of α-covs were also detected in sturnira bat that grouped with α-covs lineage , a group with an evolutionary history of recombination and cross-species transmission between domestic and livestock animals, such as feline, canine and swine α-covs (lorusso et al., ) . this clade presents nt similarity between . % and . % with transmissible gastroenteritis virus (dq ) and feline coronavirus (af ) depending on the sturnira lineage (supplementary table) . this branching pattern may possibly explain the common-ancestral origin of α-covs lineage species from bats and other animal species. 
however, extensive evolutionary studies on complete genome sequences of these isolates are required to provide information on the virus origin and divergence. of the two β-covs, the virus from eumops glaucinus clustered within the mers-cov-containing lineage c. the second β-cov, detected in artibeus lituratus, showed the highest similarity ( . %) with a cov reported from costa rica (corman et al., ) and lower similarity ( . %) with mers-cov (supplementary table). these sequences showed similarity to the β-cov detected in pipistrellus bats, which possesses high homology with human mers-cov (jx ). notably, the eumops bat positive for cov was found in an urban area and had been preyed upon by a domestic cat. it was not possible to rule out virus transmission to humans; however, this finding highlights the potential role of domestic animals in virus transmission and disease dynamics at the virus-animal-human interface. this is of special importance given the established role of animals in the transmission of viruses to humans (johnson et al., ). taken together, these results represent the first detection of lineage c β-cov in south american bats. despite the close relationship between lineage c β-covs in asia, africa and europe, cumulative data indicate that this lineage shows a more diversified host family distribution in the americas, occurring in mormoopidae, phyllostomidae and molossidae bat species (anthony et al., ; corman et al., ; goes et al., ). we present a great diversity of cov genotypes and clusters in brazilian bats, highlighting a biogeographic distribution of bat covs in the region. future work should investigate the evolutionary events in genetically diverse bat covs using complete genome sequences, as well as their potential for transmission to humans.
although it is not possible to calculate the risk of "spill over" events of brazilian bats covs to humans, our results reinforce the need for expanded and continuing surveillance of covs in bat fauna, including those in the afb regions of brazil. supplementary data to this article can be found online at http://dx. doi.org/ . /j.meegid. . . . coronaviruses in bats from mexico sars-cov and emergent coronaviruses: viral determinants of interspecies transmission a coronavirus detected in the vampire bat desmodus rotundus bats: important reservoir hosts of emerging viruses detection and phylogenetic analysis of group coronaviruses in south american bats middle east respiratory syndrome coronavirus: another zoonotic betacoronavirus causing sars-like disease coronaviruses in bent-winged bats (miniopterus spp avian coronavirus in wild aquatic birds ecology, evolution and classification of bat coronaviruses in the aftermath of sars novel bat coronaviruses chave artificial para a identificação de molossídeos brasileiros (mammalia, chiroptera) evidence supporting a zoonotic origin of human coronavirus strain nl virus taxonomy spillover and pandemic properties of zoonotic viruses with high host plasticity chave ilustrada para determinação dos morcegos da região sul do brasil checklist of brazilian bats alphacoronaviruses in new world bats: prevalence, persistence, phylogeny, and potential for interaction with humans lista anotada dos mamíferos do brasil/annotated checklist of brazilian mammals evolution. an eocene big bang for bats mega : molecular evolutionary genetics analysis version . chave para determinação de quirópteros brasileiros discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus a: brazil's map and sites for bats capture/collection. 
bats were collected from sites (a-q) in the atlantic forest biome from two adjacent paraná and são paulo states, brazil. b: based on morphological characteristics, bats were classified into different families (positive families were bold and underlined) this study was supported by fapesp (são paulo research foundation) process number / - and british council grant number . cruz-neto a.p was supported by fapesp grant process number / - . we thank luiz aurélio de campos crispin and mariana cristine pereira de souza for their contributions to this study. key: cord- - jmgcq authors: liu, yong-sheng; zhou, jian-hua; chen, hao-tai; ma, li-na; pejsak, zygmunt; ding, yao-zhong; zhang, jie title: the characteristics of the synonymous codon usage in enterovirus virus and the effects of host on the virus in codon usage pattern date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: jmgcq to give a new perspective on the evolutionary characteristics shaping the genetic diversity of enterovirus (ev ) and the effects of natural selection from its host on the codon usage pattern of the virus, the relative synonymous codon usage (rscu) values, codon usage bias (cub) values, effective number of codons (encs) values and nucleotide contents were calculated to implement a comparative analysis to evaluate the dynamics of the virus evolution. the characteristics of the synonymous codon usage patterns and nucleotide contents of ev and the comparison between enc values for the whole coding sequence of ev and that of coding sequences for viral proteins of ev all indicate that the interaction between mutation pressure from virus and natural selection from host exists in the processes of evolution of ev . the synonymous codon usage pattern of ev is a mixture of coincidence and antagonism to that of host cell. 
in addition, the genetic diversity of ev strains and the preferential selection of some synonymous codons in ev strains from different epidemic areas were observed, suggesting that geographic and social factors may play roles in influencing the evolution of this virus. hand-foot-and-mouth disease (hfmd) is a common illness in children that is usually caused by human enteroviruses (cherry, ). many reports have indicated that some pandemics of hfmd were associated with ev infection in the asia-pacific area (abubakar et al., ; chumakov et al., ; fujimoto et al., ; ho et al., ; lin et al., ; liu et al., ; zheng et al., ). ev is a member of the enterovirus genus of the picornaviridae family and is a positive-strand rna virus with a genome size of about bp. two non-translated regions ( -ntr and -ntr) flank the single open reading frame (orf) of the ev genome. the coding sequence encodes one polyprotein that is cleaved by viral proteases to generate proteins, namely vp , vp , vp , vp , a, b, c, a, b, c and d. the structural proteins vp - are exposed on the ev surface. the vp gene contains the major antigenic sites and the genetic diversity associated with serotypes (oberste et al., a,b). non-structural proteins are involved in polyprotein processing, rna replication and the shut-down of host cell protein synthesis. in addition, recombination is well known to contribute to the genetic diversity and evolution of enteroviruses (chan and abubakar, ; chen et al., ; yoke-fun and abubakar, ). because of the genetic diversity of ev , the effectiveness of vaccines in protecting children from ev is limited. this situation has made researchers aware of the importance of analyzing ev genetic diversity (bible et al., ; cardosa et al., ; herrero et al., ; huang et al., ; lewis-rogers et al., ; mcminn, ; sanders et al., ).
the nucleotide composition of the ev coding sequence, with its various genetic diversities, appears to be selective rather than random, because natural selection from the host acts on the strains shaped by mutation. in previous reports, translation selection and compositional constraints under mutational pressure were thought to be the major factors accounting for codon usage variation among genomes in microorganisms (gu et al., ; karlin and mrázek, ; lesnik et al., ; liu et al., ; zhou et al., ). in some rna viruses, mutation pressure plays a more important role than natural selection in the synonymous codon usage pattern (jenkins and holmes, ; levin and whittome, ). although compositional constraints and translation selection are the more generally accepted mechanisms accounting for codon usage bias (coleman et al., ; karlin et al., ; zhi et al., ; zhou et al., ), other selection forces have also been proposed, such as fine-tuning translation kinetics selection and escape from cellular antiviral responses (aragones et al., ; karlin et al., ; sugiyama et al., ). thus, the codon usage pattern may help to disclose the molecular mechanism and evolutionary process by which ev avoids the host cell response. to our knowledge, this is the first study in which the synonymous codon usage pattern and evolutionary dynamics of ev are systematically analyzed, together with the relationship between the codon usage pattern of ev and that of its host.
the complete rna sequences of ev were downloaded from the national center for biotechnology information (ncbi) (http://www.ncbi.nlm.nih.gov/genbank/), and detailed information about the viruses is listed in table s . the overall nucleotide composition (u%, a%, c% and g%) and the nucleotide composition at the third codon position (u %, a %, c % and g %) of the ev coding sequence were calculated with the dnastar . software for windows. to investigate the characteristics of synonymous codon usage without the confounding influence of amino acid composition among different sequences, the relative synonymous codon usage (rscu) values of the codons in the ev orf were calculated according to the published equation (sharp et al., ). the 'effective number of codons' (enc), a useful estimator of absolute codon usage bias, was used to quantify the codon usage bias of the whole coding sequence of ev .
the enc value ranges from (when only one synonymous codon is used for each amino acid) to (when all synonymous codons are used equally) (wright, ). in this study, this measure was used to evaluate the degree of codon usage bias of the coding sequences for individual proteins of ev and of the whole coding sequence of ev and other picornaviruses. additionally, a simple method was applied in which statistically equal and random usage of all available synonymous codons is taken as the ''neutral point'' (rscu = . ) for the development of group-specific codon usage. this method was used to investigate the discrepancy in the synonymous codon usage pattern among ev strains isolated from different areas. principal component analysis (pca), a commonly used multivariate statistical method (jolliffe, ; mardia et al., ), was carried out to analyze the major trends in codon usage pattern among different strains of ev . pca involves a mathematical procedure that transforms a set of correlated variables (rscu values) into a smaller number of uncorrelated variables called principal components. each strain was represented as a dimensional vector, and each dimension corresponded to the rscu value of one sense codon; only codons with synonymous alternatives were included, excluding aug, ugg and the three stop codons. in addition, pca was also performed to analyze the discrepancy between the codon usage pattern of ev and that of the host cell. the relationships between the overall nucleotide composition (u%, a%, c% and g%) and the nucleotide composition at the third codon position (u %, a %, c % and g %) in the ev coding sequence, and between u %, a %, c %, g % and the codon usage pattern of ev , were evaluated by the pearson's rank correlation. all statistical analyses were carried out with the statistical software spss . for windows.
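the rscu measure described above (the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally) and the third-position nucleotide composition can be sketched in python as follows. this is only a minimal illustration under stated assumptions, not the authors' workflow (the study used dnastar and spss): the function names `rscu` and `third_position_composition` are hypothetical, the standard genetic code is assumed, and codons containing ambiguous bases are simply ignored.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for one coding sequence (DNA or RNA alphabet).

    Only the 59 codons with synonymous alternatives are reported, i.e.
    AUG (Met), UGG (Trp) and the three stop codons are excluded.
    """
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)

    values = {}
    for members in families.values():
        total = sum(counts[c] for c in members)
        for c in members:
            # Observed count over the count expected under equal usage.
            # RSCU is undefined for an absent amino acid; 0.0 is used here.
            values[c] = counts[c] * len(members) / total if total else 0.0
    return values


def third_position_composition(cds):
    """Return the fraction of U, C, A and G at the third codon position."""
    seq = cds.upper().replace("T", "U")
    thirds = seq[2::3]
    if not thirds:
        return {base: 0.0 for base in BASES}
    return {base: thirds.count(base) / len(thirds) for base in BASES}
```

a vector of the 59 rscu values per strain, built this way, is the kind of input that the pca described above operates on.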
the a% and u% were higher than c% and g%, but a % and u % were lower than c % and g % in ev (table s ). the overall nucleotide composition does not affect the nucleotide contents at the third codon position in the ev coding sequence, suggesting that composition constraints may be one of the factors affecting the codon usage pattern of ev . the optimal codons of ala, arg, asp, cys, glu, gly, ile, phe, pro and ser were a-ended or u-ended, while those of asn, glu, his, leu, lys, thr, tyr and val were c-ended or g-ended (table ). ev does not depend solely on optimal codons that are all a/u-ended or all c/g-ended, as influenza a virus subtype h n , severe acute respiratory syndrome coronavirus or foot-and-mouth disease virus (fmdv) do (gu et al., ; zhao et al., ; zhou et al., ), but has optimal codons ending in any of the four nucleotides. notably, although asn, leu, tyr, glu, lys and thr had c- or g-ended optimal codons, they also had some favored codons that were u- or a-ended. similarly, asp, phe and cys also had favored codons that were c- or g-ended (table ). the codon usage of these amino acids, whose optimal codons end in any nucleotide, is affected by both mutation pressure from the virus itself and natural selection from the host, since natural selection from the host ultimately allows strains with good fitness to possess a particular codon usage pattern. pca detected a first principal component (f ) that accounted for . % of the total synonymous codon usage variation and a second principal component (f ) that accounted for . % of the total variation. the resulting plot was somewhat complex, with overlapping points representing different epidemic areas (fig. s ).
the points for strains isolated from china-mainland clustered tightly compared with those for strains isolated from other areas, while the points for strains isolated from malaysia and china-taiwan were widely scattered; the points for strains from singapore, usa, japan and switzerland did not clearly indicate genetic diversity, owing to the limited number of samples. for strains circulating in china-mainland, social factors (public health, interpersonal communication, etc.) may play a role in influencing genetic diversity. for strains in china-taiwan and malaysia, however, geographic factors likely influence genetic diversity in addition to social factors. the nucleotide contents of the whole coding sequence of ev were analyzed. in table , the significant positive correlations between a% and a %, u% and u %, and c% and c %, together with the significant negative correlations among most of the heterogeneous nucleotide contents, indicated that composition constraints play a role in the codon usage pattern of ev ; however, the significant positive correlations between g% and a % and between c% and g %, and the lack of correlation between g% and g %, might suggest that natural selection from the host plays a role in the codon usage pattern of ev as well. in addition, there were significant correlations between each nucleotide content at the third codon position and the codon usage indices (f and f ) (table ). although c % correlated positively with f and negatively with f , the positive correlation plays the more important role in affecting the codon usage pattern because f is the first principal component. strong discrepancies in synonymous codon usage among strains isolated from different areas were observed.
in detail, in strains from china-mainland, cgc for arg, gac for asp, cuu for leu and uuc for phe were used preferentially, while cgg for arg, gac for asp and uuu for phe were used poorly; in singapore, auu for ile, cug for leu and uca for ser were used preferentially, while uug for phe and ucg for ser were used poorly; in usa, aga for arg was used preferentially, while ggu for gly was used poorly; in japan, gcu for ala, aau for asn, caa for gln and gaa for glu were used preferentially, while gcg for ala and cag for gln were used poorly; in switzerland, agg for arg, ugc for cys, cuc for leu, uug for leu and agu for ser were used preferentially, while cga for arg, ugu for cys, cug for leu, cuu for leu and agc for ser were used poorly (fig. s ). these results may suggest that, as ev strains evolve, discrepancies in the usage of some synonymous codons probably arise in different epidemic regions. to analyze whether the evolution of cub was controlled by mutation or by natural selection from the host, the cub values were calculated from the data listed in table . the transition from maximum-negative to maximum-positive values was smooth, and there was no obvious or unambiguous border between the so-called dominant and prohibited codons (fig. ); that is, all synonymous codons were used. this result implies that an interaction between mutation pressure from ev and natural selection from the host exists in the evolution of ev . by comparing the synonymous codon usage pattern of human cells with that of ev , we found that the synonymous codon usage pattern of ev strains is partially antagonistic to that of human cells. in detail, the optimal codons of nine amino acids in ev (ala, asp, cys, gln, gly, leu, phe, pro and ser) are the disfavored codons of the corresponding amino acids in its host.
among these non-coincident patterns of synonymous codon usage, the synonymous codon usage of asp, cys, gln, gly and phe has evolved to be complementary to that of host cells (table ). in addition, the optimal and rare synonymous codon usage patterns of arg, asn, glu, his, lys, thr, tyr and val of ev were in agreement with those of human cells (table ). additionally, pca was performed on the whole coding sequence of ev in this study. the method detected one major trend in the first axis (f ), which accounted for . % of the total synonymous codon usage variation, and another major trend in the second axis (f ), which accounted for . % of the total variation. the points for the codon usage pattern of human are far from those of ev (fig. s ). the enc values were calculated for fmdv, cardiovirus, hepatitis a virus (hav) and poliovirus (pv) and compared with that of ev (table s ). among the viruses examined, the enc value for ev is the highest, suggesting that ev has the weakest codon usage bias.

table . summary of correlation analysis between the a%, u%, c%, g% and a %, u %, c %, g % in the whole coding sequences of ev strains (a).
             a %           u %           c %           g %           ( c + g )%
a%           r = . **      r = - . **    r = - . **    r = - . **    r = - . ns
u%           r = - . **    r = . **      r = - . **    r = - . **    r = - . **
c%           r = - . **    r = - . ns    r = . **      r = . **      r = . **
g%           r = . **      r = - . **    r = - . ns    r = . ns      r = . **
(c + g)%     r = - . ns    r = - . **    r = . **      r = . **      r = . **
(a) the r value in this table is calculated in each correlation analysis. ns means non-significant (p > . ). ** means p < . .

table . summary of correlation analysis between the first two principal axes and nucleotide contents in ev .
base compositions    f             f
a %                  r = . **      r = . **
u %                  r = - . **    r = - . **
c %                  r = . *       r = - . **
g %                  r = . *       r = - . **
(c + g )%            r = . **      r = . *
* means . < p < . . ** means p < . .
in addition, we plotted the relationship between gc % and enc values for all viral proteins of ev (excluding the very small b protein) and found that the points for the coding sequences of vp , vp , vp , a, c, a, c and d clustered around the expected curve, whereas the points for the coding sequences of vp and b were widely scattered below the expected curve (fig. s a- c). this may be explained by the small size of the vp and b genes, which influences their codon usage bias. in addition, no obvious geographic factor influences the codon usage bias of the individual coding sequences of ev , implying that natural selection associated with geographic factors does not affect the codon usage patterns of specific coding sequences of ev but shapes the pattern of the whole coding sequence of this virus. furthermore, we found that some specific non-optimal codons are preferentially chosen in some coding sequences of ev . in detail, three non-optimal codons (uua, cua and guu) are chosen in the vp gene, uug in the vp gene, cuu in the vp and a genes, and uuu, cuu and guu in the c gene. all coding sequences of ev also contain some preferential codons. in detail, gug, ccc, aca, gcc and gac are preferentially chosen in the vp gene; gug, cca and agg in the vp gene; cug, uca, acc and aga in the vp gene; uca in the vp gene; cuc, cca and aga in the a gene; ccu, aga and agg in the b gene; gug, ucu, cca, aca and aga in the c gene; cca, acu, agu, agc, aga and agg in the a gene; auu, ccu, aca, gca, agu and agg in the c gene; and gug and aga in the d gene. taken together, there is no obvious relationship between the distribution of non-optimal codons and the deviation of the enc value from the theoretical value. the pattern of codon usage is a genetic characteristic of various organisms. previous reports have focused on viruses in the picornaviridae family, such as fmdv, hav and poliovirus (aragones et al., ; coleman et al., ; zhong et al., ).
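the "expected curve" in a plot of enc against third-position gc content is conventionally the relationship proposed by wright for genes whose codon usage is shaped only by base composition. the text does not print the formula, so the sketch below assumes that standard form, enc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the gc fraction at the third codon position; the function names `expected_enc` and `enc_deviation` are hypothetical and serve only to illustrate how a point falling under the curve can be quantified.

```python
def expected_enc(gc3):
    """Expected ENC under base composition alone (Wright's approximation).

    gc3 is the G+C fraction at the third codon position, between 0 and 1.
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def enc_deviation(observed_enc, gc3):
    """Observed minus expected ENC; negative values lie below the curve."""
    return observed_enc - expected_enc(gc3)
```

a gene whose deviation is clearly negative shows more codon usage bias than its base composition alone would predict, which is how points below the expected curve are usually read.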
because a%, u%, g % and c % play roles in the formation of optimal codons ending in any nucleotide, the codon usage pattern of ev is likely influenced by composition constraints. the codon usage pattern of pv is mostly coincident with that of its host, while the codon usage pattern of hav is antagonistic to that of its host (mueller et al., ; sánchez et al., ). the codon usage pattern of ev is a mixture of these two types. the coincident portion of the codon usage pattern of ev enables the corresponding amino acids to be translated efficiently, whereas the antagonistic portion may enable viral proteins to fold properly, although the translation efficiency of the corresponding amino acids is decreased. in epstein-barr virus, latent genes deoptimize codon usage in order to evade competition for host protein translation (karlin et al., ), and pv was attenuated by rare codon pairs that induce poor translation of viral protein sequences (coleman et al., ). these results suggest that disfavored codons may not be a deleterious factor for the adaptation of viruses to host cells. regarding the codon usage patterns of the coding sequences of ev , the vp , a, b, c and d genes possess only some preferential codons, and none of the non-optimal codons is preferentially used, implying that translation of the whole coding sequence of ev is possibly regulated by translation selection. furthermore, an alternative is the possibility of fine-tuning the kinetics of protein translation by a combination of rare and optimal codons (aragones et al., ; komar, ). the vp , vp , vp , a and c genes of ev possess a combination of non-optimal and optimal codons that are preferentially used, implying that translation of these coding sequences is possibly regulated by fine-tuning translation kinetics selection.
fig. . distribution of the cub of a codon for each amino acid. cub was taken from table and sorted in ascending order.
the ntr and vp sequences are often used to analyze the genetic diversity of ev (hagiwara et al., ; hsu et al., ; li et al., ). by analyzing the codon usage pattern of the whole coding sequence of strains from different areas, genetic diversity resulting from geographic and social factors can likely be observed. the genetic diversity of most strains from china-mainland could indicate a relatively independent area whose geography, public health and personal communication enable the genetic diversity of ev to be sustained with little outside influence. compared with the genetic diversity of strains from china-mainland, that of strains from china-taiwan and malaysia indicated that social factors may also play important roles in shaping the codon usage patterns of ev from these two areas. these genetic diversities of ev strains from different areas indicate that geographic and social factors should be taken into account when considering the genetic diversity of the virus in different areas. the enc values calculated for some picornaviruses indicate that a significantly lower bias of codon usage exists in ev than in the other viruses. for rna viruses, previous studies reported that the major factor shaping codon usage patterns appears to be mutation pressure rather than natural selection (zhao et al., ; zhong et al., ; zhou et al., ). however, the genetic characteristics of ev suggest an interaction between mutation pressure and natural selection, although the enc values for the whole coding sequence of ev suggest that mutation pressure is a factor influencing the codon usage pattern. furthermore, in fig.
s a- c , the relationship between enc data for ev proteins and cg % indicated that natural selection probably play roles in genetic diversity of ev strains except for mutation pressure in order to adapt to host. a general mutational pressure, which affects the whole genome would certainly account for the majority of the codon usage among some rna viruses (jenkins and holmes, ) . the genetic diversity and codon usage patterns results we proposed here are useful to understand the processes influencing the evolution of ev , especially the roles played by natural selection from host and mutation pressure from virus. additionally, such information might be helpful to understand the roles of geographic and social factors in influencing genetic diversity of ev . identification of enterovirus isolates from an outbreak of hand, foot, and mouth disease (hfmd) with fatal cases of encephalomyelitis in malaysia hepatitis a virus mutant spectra under the selective pressure of monoclonal antibodies: codon usage constrains limit capsid variability fine-tuning translation kinetics selection as the driving force of codon usage bias in the hepatitis a virus capsid molecular epidemiology of human enterovirus in the united kingdom from to molecular epidemiology of human enterovirus strains and recent outbreaks in the asia-pacific region: comparative analysis of the vp and vp genes human enterovirus in hand, foot and mouth disease patients analysis of recombination and natural selection in human enterovirus enteroviruses: polioviruses (poliomylitis), coxsackieviruses, echoviruses and enteroviruses enterovirus isolated from cases of epidemic poliomyelitis-like disease in bulgaria virus attenuation by genome-scale changes in codon pair bias outbreak of central nervous system disease associated with hand, foot, and mouth disease in japan during the summer of : detection and molecular epidemiology of enterovirus analysis of synonymous codon usage in sars coronavirus and other viruses in the 
nidovirales genetic and phenotypic characteristics of enterovirus isolates from patients with encephalitis and with hand, foot and mouth disease molecular epidemiology of enterovirus in peninsular malaysia an epidemic of enterovirus infection in taiwan genetic diversity of epidemic enterovirus strains recovered from clinical and environmental samples in taiwan appearance of intratypic recombination of enterovirus in taiwan from the extent of codon usage bias in human rna virus and its evolutionary origin principal component analysis, nded what drives codon choices in human genes? constrasts in codon usage of latent versus productive genes of epstein-barr virus: data and hypotheses why is cpg suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses? a pause for thought along the co-translational folding pathway ribosome traffic in e. coli and regulation of gene expression codon usage in nucleopolyhedroviruses phylogenetic relationships and molecular adaptation dynamics of human rhinoviruses genetic characteristics of human enterovirus and coxsackievirus a circulating from to in shenzhen, people's republic of china genetic characteristics of human enterovirus and cosackievirus a circulating from to in shenzhen, people's republic of china an outbreak of enterovirus infection in taiwan, : epidemiologic and clinical manifestations analysis of synonymous codon usage in porcine reproductive and respiratory syndrome virus multivariate analysis an overview of the evolution of enterovirus and its clinical and public health significance reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity molecular evolution of the human enterovirus: correlation of serotype with vp sequence and application to picornavirus classification typing of human enteroviruses by partial sequencing of vp codon usage and replicative 
strategies of hepatits a virus genome variability and capsid structural constraints of hepatitis a virus molecular epidemiology of enterovirus over two decades in an australian urban community codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes cpg rna: identification of novel single-stranded rna that stimulates human cd + cd c+ monocytes the 'effective number of codons' used in a gene phylogenetic evidence for inter-typic recombination in the emergence of human enterovirus subgenotypes analysis of synonymous codon usage in human bocavirus isolates enterovirus isolated from china is serologically similar to the prototype ev brcr strain but differs in the -noncoding region codon optimization of human parvovirus b capsid genes greatly increases their expression in nonpermissive cells mutation pressure shapes codon usage in the gc-rich genome of foot-and-mouth disease virus papillomavirus capsid protein expression level depends on the match between codon usage and trna availability analysis of synonymous codon usage in foot-and-mouth disease analysis of synonymous codon usage in h n virus and other influenza a viruses synonymous codon usage in environmental chlamydia uwe reflects an evolution divergence from pathogenic chlamydiae this work was supported in parts by grants from national science & technology key project ( zx - b) and international science & technology cooperation program of china (no. dfa ) and science and technology key project of gansu province (no. nkda ). this study was also supported by national natural science foundation of china (no. and no. ). supplementary data associated with this article can be found, in the online version, at doi: . /j.meegid. . . . 
key: cord- - abp oom authors: lan, yu-ching; liu, hsin-fu; shih, yi-ping; yang, jyh-yuan; chen, hour-young; chen, yi-ming arthur title: phylogenetic analysis and sequence comparisons of structural and non-structural sars coronavirus proteins in taiwan date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: abp oom taiwan experienced a large number of severe acute respiratory syndrome (sars) viral infections between march and july ; by september of that year, sars cases were confirmed by rt-pcr or serological tests. in order to better understand evolutionary relationships among sars coronaviruses (scovs) from different international regions, we performed phylogenetic comparisons of full-length genomic and protein sequences from human scovs (including from taiwan) and two civet scovs. all the taiwanese sars-cov strains which associated with nosocomial infection formed a monophyletic clade within the late phase of the sars epidemic. this taiwanese clade could be further divided into two epidemic waves. taiwan scovs in the first wave clustered with three isolates from the amoy gardens housing complex in hong kong indicating their possible origin. of the human scovs, one isolate from guangdong province, china, exhibited an extra -nucleotide fragment between orf and orf —similar to the civet scov genome. nucleotide and protein sequence comparisons suggested that all scovs of late epidemic came from human-to-human transmission, while certain scovs of early epidemic might have originated in animals. on august , the world health organization (who, ) reported that the sars pandemic infection had spread to more than countries, affecting people and killing . later that year a novel coronavirus (sars-cov) was isolated from sars patients ksiazek et al., ; peiris et al., b; poutanen et al., ) ; an animal inoculation experiment identified a causal relationship between sars and sars-cov infection . zhong et al. 
( ) identified the geographic origin of the epidemic as guangdong province, china, and the originating month as november . the first sars case in taiwan was diagnosed on march . the infection was traced to a trip by the index case to guangdong in mid-february, when the sars epidemic in that province reached its peak (cdc, ; roc cdc, ). the index case transmitted the virus to his wife and son; the first sars coronavirus in taiwan, scov tw , was isolated from the son (hsueh et al., ). on march a male resident of the amoy gardens housing complex in hong kong (hereafter referred to as mr. x) flew to taiwan. on march he took a train from taipei to taichung city to visit his younger brother. that night he experienced a high fever; most likely he also read a local news report of a major sars outbreak in amoy gardens that same day (peiris et al., a). he returned to hong kong on march. after he was admitted to a hospital, mr. x made a phone call to warn his younger brother, but it was too late. the younger brother, who developed symptoms on march, became the first sars-related fatality (tc ) in taiwan. a second index case (scov-twc) was isolated from this patient (the younger brother) by the roc cdc. the third index case was ms. a, a female adult who traveled on the same taiwanese train as mr. x on march. two days later she visited a hospital in taipei complaining of general fatigue. in addition to the local hospital, she visited two private clinics before being referred to taipei municipal hoping hospital on april. she spent less than h in that hospital's emergency room, but she probably transmitted the virus to two patients, an assistant nurse who escorted her to the x-ray room, and a laundry worker who handled her isolation gown. these individuals transmitted the sars virus to other medical personnel and patients, resulting in the entire hospital being shut down for more than months starting on april.
according to the roc cdc, the hoping hospital nosocomial infection resulted in probable and suspected sars cases (wu et al., ). even though the taiwanese government imposed a quarantine on april on all air travelers arriving from china, hong kong, singapore, macau, or toronto, the virus still spread to different parts of the main island of taiwan and the adjacent penghu islands. by september, sars cases in taiwan had been confirmed by rt-pcr or serological tests (who, ). the size of the scov genome is approximately . kb (marra et al., ; rota et al., ). the portion of the genome ( kb, about two-thirds) contains the code for the replicase gene, including two large open reading frames (orfs), referred to as orfs a and b. the other one-third of the genome contains orfs for four structural proteins (spike [s], envelope [e], membrane [m], and nucleocapsid [n]) and nine putative non-structural proteins (orfs , , , , , , , and ). recently, guan et al. ( ) isolated scov-like viruses from himalayan palm civets and raccoon dogs in southern china. according to a comparative analysis of human and animal scov genomes, the three animal scovs (sz , sz and sz ) all retain a -nucleotide sequence inserted between orfs and . for this study, we used phylogenetic analysis to investigate the relationships among taiwanese sars-covs and between these and scovs from other countries. one specific goal was to determine whether the sars-cov isolate from mr. x's younger brother (twc) clustered with the isolate from ms. a (twc ), and whether either one of those isolates clustered with isolates from other amoy gardens residents (chim et al., ). we also compared the amino acid sequences of the s, e, m, and n structural proteins and three of the nine putative non-structural proteins (orfs , , and ) for sars-covs including taiwanese strains. twelve taiwanese scov strains were included in this study: tw (hsueh et al., ), twc, twc , twc , twh, tc , tc , tc , twj, twk, tws and twy.
tw was isolated from a patient whose father spent time in guangdong province in mid-february . twc was isolated from taiwan's first sars-related fatality. twc and twc were isolated from taipei municipal hoping hospital patients, and twc was from ms. a, the third index case. additional full-length genomic sequences from human scov strains were selected from genbank: nine from beijing (bj , bj , bj , bj , pumc , pumc , pumc , sino - and sino - ), six from hong kong (cuhk-w , cuhk-ag , cuhk-ag , cuhk-ag , cuhk-su and hku- ), five from singapore (sin , sin , sin , sin and sin ), two from guangzhou (gd and gz ), two from frankfurt (frankfurt and fra), two from milan (as and hsr ), two from guangdong province (zmy and gd ), and one each from wuhan (whu), zhejiang province (zj ), moscow (sod), toronto (tor ), and hanoi (urbani). a blast search was performed to locate sars cov sequences in the genbank database. a total of full-length nucleotide sequences from sars cov isolates (including two civet isolates) were aligned and edited using the bioedit program (hall, ). phylogenetic analyses were conducted with the phylip . b (felsenstein, ) and mega programs (kumar et al., ) using the neighbor-joining (nj) and fitch and wagner parsimony (pars) methods. evolutionary distances were estimated with the kimura two-parameter model (kimura, ). the robustness of the nj and pars trees was evaluated statistically by bootstrap analysis ( samples). scov nucleotide sequence variation was analyzed with the simplot program (johns hopkins university, baltimore, md). the scovs used for this task were the urbani, cuhk-w , tor , hku- , bj , bj , bj , bj , gd , tw , twc, sin , sin , sin , sin , sin , hsr , cuhk-su , frankfurt , and gz . two civet scovs (sz and sz ) were used as references for comparison. sequence variation distance plots were generated with bp windows, bp steps, and a jukes-cantor correction.
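the distance estimates and the sliding-window comparison described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the phylip, mega or simplot implementation: the function names (`p_distance`, `jukes_cantor`, `kimura_2p`, `sliding_window_distance`) are hypothetical, the two inputs are assumed to be pre-aligned sequences of equal length, and the window and step sizes are left as parameters because the values used in the study are not recoverable from this text.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def _clean_pairs(a, b):
    """Yield aligned base pairs, skipping gaps and ambiguous characters."""
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            yield x, y


def p_distance(a, b):
    """Proportion of compared sites at which the two sequences differ."""
    pairs = list(_clean_pairs(a, b))
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor correction: d = -3/4 * ln(1 - 4p/3)."""
    if math.isnan(p) or p >= 0.75:
        return float("nan")  # correction is undefined at saturation
    if p == 0:
        return 0.0
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(a, b):
    """Kimura two-parameter distance from transition (P) and transversion (Q) fractions."""
    pairs = list(_clean_pairs(a, b))
    if not pairs:
        return float("nan")
    transitions = sum(
        x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES) for x, y in pairs
    )
    transversions = sum(x != y for x, y in pairs) - transitions
    P = transitions / len(pairs)
    Q = transversions / len(pairs)
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        return float("nan")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


def sliding_window_distance(query, reference, window, step):
    """Return (window centre, Jukes-Cantor distance) for each window along the alignment."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        p = p_distance(query[start:start + window], reference[start:start + window])
        results.append((start + window // 2, jukes_cantor(p)))
    return results
```

plotting the second element of each tuple against the first gives a variation profile of the kind shown in the distance plots; windows with high values mark regions where a human isolate diverges most from the civet reference.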
nucleotide sequences for the four structural genes, orf , and orf were edited and translated into amino acid sequences using the bioedit program prior to alignment for comparisons. the accession numbers for the scovs used in this study are urbani: to better understand evolutionary relationships between scovs isolated in taiwan and those isolated in other parts of the world, we constructed phylogenetic trees with two different methods using full-length genomic sequences from human ( taiwanese) and two civet scovs. tree topologies were consistent for the nj (fig. a) and pars (fig. b) methods. two human scov epidemics were identified. the late epidemic scovs formed a well-supported monophyletic clade with bootstrap values of and for the nj and pars trees, respectively. the early epidemic sequences did not cluster into a monophyletic clade, even though they clearly differed from those of the late epidemic.
fig. . human and civet scov phylogenetic trees produced using full-length ( . kb) sequences, with branch bootstrap values from reps: (a) a tree produced with the neighbor-joining (nj) method using the sz civet scov as a root; (b) a tree produced using the parsimony (pars) method.
all early epidemic scovs had chinese origins: beijing (bj , bj , bj and bj ), guangzhou (gd and gz ), and hong kong (cuhk-w ). all taiwanese scov sequences associated with nosocomial infection clustered into a monophyletic clade (bootstrap values and for the nj and pars trees, respectively) within the late epidemic and could be further classified into two epidemic waves. the second wave was a monophyletic clade supported by bootstrap values of and for the nj and pars trees, respectively, while the first wave was not a fully resolved cluster. twc (from mr. x's younger brother) did not cluster with three isolates from amoy gardens (cuhk-ag , cuhk-ag , cuhk-ag ), but did cluster with an isolate (whu) from wuhan, china (bootstrap value for the nj tree) (fig. a).
pairwise comparison methods were used to analyze nucleotide sequence variation within the full-length genomes of human scovs ( from early epidemic and from late epidemic) (fig. ). two civet scov sequences (sz and sz ) were used as references for comparison. our results revealed that the highest variation rate was in the 3′ one-third of the viral genome, especially the nucleotide sequences near the junction between the replicase b and spike genes; orf also had a relatively high sequence variation rate. amino acid sequences for the s, m, e, and n structural proteins of human scovs were compared with those of the sz- civet scov (fig. ). the s protein was divided into s and s domains according to the molecular model proposed by spiga et al. ( ). the s domain (n-terminal - amino acid residues, responsible for receptor binding) had ( . %) amino acid differences; the s domain ( - amino acid residues) had ( . %), giving a total of ( . %) differences in the s proteins of scovs. the s genes of whu and zmy contained several nucleotide insertions that interrupted the open reading frames. the amino acid distances of s proteins were . % ( . / ) for early epidemic scovs and . % ( . / ) for late epidemic scovs in comparison with civet scovs. intra-group sequence variation for early epidemic was . % (n = ) and for late epidemic . % (n = ) (table ). the numbers of amino acid differences were for the e protein ( . %), for m ( . %), for n ( . %), and for orf ( . %) (fig. ). amino acid distances among the human scovs were . % ( . / ) for the e protein, . % ( . / ) for m, . % ( . / ) for n, and . % ( . / ) for orf (table and fig. ). among the human scovs that we analyzed, an isolate (gd ) from guangdong province, china, contained an extra -nucleotide fragment. both whu and twc had dinucleotide deletions at the th and st nucleotides of orf , resulting in a frame shift and premature stop of the putative protein (fig. ).
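the effect of a dinucleotide deletion on a reading frame, as reported above for whu and twc, can be illustrated with a short python sketch. the sequence below is a made-up toy orf, not taken from any scov genome, and the helper names (`translate`, `delete`) are hypothetical; the only borrowed fact is the standard genetic code.

```python
from itertools import product

# Standard genetic code in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(orf):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    seq = orf.upper()
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unrecognised codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete(seq, start, length):
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


# Hypothetical toy ORF: ATG GCT GAT AAA CGT TGG CCA TAA
toy_orf = "ATGGCTGATAAACGTTGGCCATAA"
# Deleting two bases shifts the frame; here a stop codon appears at the third codon.
shifted = delete(toy_orf, 4, 2)
```

comparing `translate(toy_orf)` with `translate(shifted)` shows the truncated product that a two-base deletion can cause; whether a real deletion leads to an early stop depends on the downstream sequence in the new frame.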
in addition, we observed a nucleotide deletion at the nd nucleotide of orf in sin ; this also resulted in a frame shift and premature translation stop. both the nj and pars trees separated the human scovs into two epidemics, even though early epidemic scovs failed to cluster into a well-supported monophyletic clade (fig. a and b). the early epidemic sequences were more closely related than the late epidemic sequences to civet scovs; all seven early epidemic scovs were from either guangdong province or beijing.
fig. . plot analyses were used to compare diversity distributions among genes from human scovs. the average genetic distances of human scovs from the civet scov reference genomes are plotted over the entire scov genome. genomic sequences from the sz and sz civet scovs were used as references. the x-axis is the nucleotide location in the sars-cov genome. the y-axis is the rate of nucleotide differences between human scovs and civet scovs. sequence variation distance plots were generated with the simplot program using a bp window and bp steps.
among all the analyzed human scovs, gd was the only one having an extra nucleotide fragment, which was also found in the civet scovs. furthermore, the average intragroup amino acid distance for the s gene in early epidemic was higher than for late epidemic (table ). we also identified a signature amino acid sequence pattern (amino acid residues and ; fig. ) shared by early epidemic isolates and civet scovs. this evidence suggests that late epidemic scovs were transmitted from human to human, while certain early epidemic scovs (e.g., gd ) might have been transmitted from animals to humans before spreading among various human populations. among the taiwanese scovs, our phylogenetic analysis does not support the hypothesis of an epidemiological link between the first and third index cases (mr. x and ms. a). according to our nj tree, twc (a scov isolate from mr.
x's younger brother) clustered with the whu scov from wuhan, china (bootstrap = ), while twc- (ms. a's isolate) clustered with cuhk-ag and cuhk-ag , both of which originated in hong kong's amoy gardens housing complex. a sequence analysis demonstrated that twc and whu had dinucleotide deletions in orf , resulting in a shift in the open reading frame (fig. ). therefore, even though mr. x and ms. a took the same train from taipei to taichung, the evidence indicates that mr. x was not the source of ms. a's infection; that source has yet to be identified. as shown in the diversity plot, the s gene and orf at the junction between the replicase b and s genes had a higher number of sequence variations compared to other genomic regions (fig. ). this influenced our decision to perform additional sequence comparisons of the s, e, m and n structural genes and orfs and . the s proteins of coronaviruses have been described as large, type i membrane glycoproteins that are responsible for both binding to host cell receptors and membrane fusion (xiao et al., ). the type i glycoproteins of coronaviruses, whose trimers resemble typical viral spikes, are incorporated into virions through noncovalent interactions with m proteins. coronavirus s proteins contain two domains (or two subunits, depending on whether or not s is cleaved) (spiga et al., ). the s domain contains virus-neutralizing epitopes and the receptor-binding domain (leparc-goffart et al., ; sanchez et al., ). xiao et al. ( ) recently localized the scov receptor-binding domain (rbd) to amino acid residues - of the s protein. as shown in fig. , we observed seven amino acid differences in the rbd of the s protein, including amino acid residues , , , , , and . if we assume that the rbd is (a) conserved among different scovs, including civet scovs (bonavia et al., ), and (b) more than - amino acids in length (lasky et al., ), then it is possible that the rbd can be mapped onto amino acid residues - .
identification of a receptor-binding domain of the spike glycoprotein of human coronavirus hcov- e memoir of severe acute respiratory syndrome control in taiwan severe acute respiratory syndrome analysis of the whole-length sequences of ten strains of sars coronavirus in taiwan and its epidemiological implications amino acid comparisons of s proteins from human and civet scovs. the urbani scov was used as a reference. a period (.) indicates concurrence with the top reference sequence (urbani) in the alignment genomic characterisation of the severe acute respiratory syndrome coronavirus of amoy gardens outbreak in hong kong identification of a novel coronavirus in patients with severe acute respiratory syndrome phylip -phylogeny inference package, version . aetiology: koch's postulates fulfilled for sars virus isolation and characterization of viruses related to the sars coronavirus from animals in southern china bioedit: a user-friendly biological sequence alignment editor and analysis program for windows / /nt microbiologic characteristics, serologic responses, and clinical manifestations in severe acute respiratory syndrome a simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences a novel coronavirus associated with severe acute respiratory syndrome mega : molecular evolutionary genetics analysis software delineation of region of the human immunodeficiency virus type gp glycoprotein critical for interaction with the cd receptor targeted recombination within the spike gene of murine coronavirus mouse hepatitis virus-a : q is a determinant of hepatotropism angiotensin-converting enzyme is a functional receptor for the sars coronavirus clinical progression and viral load in a community outbreak of coronavirusassociated sars pneumonia: a prospective study coronavirus as a possible cause of severe acute respiratory syndrome identification of severe acute respiratory syndrome in canada targeted 
recombination demonstrates that the spike gene of transmissible gastroenteritis coronavirus is a determinant of its enteric tropism and virulence molecular modelling of s and s subunits of sars coronavirus spike glycoprotein summary table of sars cases by country epidemiological investigation of the sars outbreak in the taipei municipal hoping hospital. memoir of severe acute respiratory syndrome control in taiwan the sars-covs glycoprotein: expression and functional characterization epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china we thank mr. jon lindemann for editing our manuscript. this work was supported in part by a grant from the national sars research program of the republic of china national science council (grant no. svir ). key: cord- - mg ke authors: gao, zhiru; xu, yinghui; guo, ye; xu, dongsheng; zhang, li; wang, xu; sun, chao; qiu, shi; ma, kewei title: a systematic review of re-detectable positive virus nucleic acid among covid- patients in recovery phase date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: mg ke a large number of coronavirus disease (covid- ) patients have been cured and discharged due to timely and effective treatments. while some discharged patients have been found re-positive nucleic acid again in the recovery phase. until now, there is still a great challenge to its infectivity and the specific potential mechanism which need further discussion. however, more intensive attention should be paid to the prognosis of the recovered patients. in this review, we mainly focus on the characteristics, potential reasons, infectivity, and outcomes of re-detectable positive patients, thereby providing some novel insights into the cognition of covid- . remarkable progress, enabling a great number of patients to be cured and discharged. 
the criteria for discharge in china are listed as follows: 1) temperature returned to normal for longer than consecutive days; 2) respiratory symptoms resolved significantly; 3) improvement of acute exudative lesions on chest computed tomography (ct); 4) two consecutive respiratory specimens tested negative in reverse transcriptase-polymerase chain reaction (rt-pcr) tests (sampling interval of at least h) [http://www.nhc.gov.cn/yzygj/s p/ / c a dfe cef dc f eb .s html]. a recent study reported four medical workers aged - years who were re-detectable positive (rp) for sars-cov- within - days after being cured and discharged, indicating that some recovered patients may still be virus carriers, which caused widespread concern (lan et al., ). however, there is currently insufficient knowledge about the characteristics of rp patients. in this manuscript, we review the characteristics, potential reasons, infectivity, treatment, and outcomes of rp patients in order to explain this phenomenon. according to several reports, some patients had re-positive rt-pcr results for virus nucleic acid - days after discharge. the finding indicated that the rp patients accounted for . % ( / ) of discharged patients during the same follow-up period. they were characterized as young (mostly under years old), with no or minor clinical symptoms and no disease progression after re-admission (an jh, et al., ). a follow-up case of discharged covid- be required for them (fu et al., ). many studies have shown that rt-pcr results of most rp patients, which may not be considered as simple viral relapse or secondary infection (xiao et al., ). the mechanism underlying rp results remains elusive, and the specific reasons need to be explored further. some experts have speculated that the potential reasons might be related to factors such as virology, detection of specimens or the patients' condition.
with respect to virology, the rp phenomenon may be related to the biological characteristics of sars-cov- . viral residue, intermittent viral release, and periodic changes in virus replication are generally considered the main factors (an jh, et al., ). a pathological examination of a patient who reached the discharge standard but died of sudden cardiac arrest found that sars-cov- still remained in the lung cells and caused pathological changes in the lung. although the results of three nucleic acid tests were negative for the patient, there was residual virus in the lungs, so we suppose that such a patient, even if discharged, could test positive again after a period of time (yao et al., ). in addition, the phenomenon may be linked to the genomic diversity of sars-cov- and its recurrent mutations (van dorp et al., ). in other words, we lack a comprehensive understanding of sars-cov- , which may be continuously or repeatedly positive during the course of the disease (chen et al., ). with respect to detection of specimens, the phenomenon may be related to the collection methods, processing procedures, and detection methods (chen et al., ). differences in sample types, improper nucleic acid extraction, insufficient viral levels or inappropriate sample pretreatment can lead to a certain rate of false-negative pcr results (pan et al., ; xie et al., ; zou et al., ). this may allow covid- patients who have not completely cleared the virus to meet the current discharge criteria. after discharge, the virus may continue to replicate at a low level, so that these patients test positive again once viral loads rise to the detection threshold. meanwhile, the virus is mainly concentrated in the lower respiratory tract and the lung, so false-negative results may occur when throat swabs are collected (zhou et al., ).
in addition, initial studies reported that the sars-cov- rna could be detected in the feces of . % recovered patients ( / ), even in those with negative throat swabs (ling et al., ) . and later studies revealed that the viral rna can persist in fecal samples for nearly weeks after the patients' respiratory specimens detected negative (wu et al., b) . other studies have also demonstrated the importance of rectal swab-testing, which should be taken into consideration (wolfel et al., ; xu et al., ) . due to the possible presence of sars-cov- in the digestive tract, the current methods of discharge criteria for oral/nasopharyngeal swab virus detection are not accurate (liu et al., a) . therefore, using more sensitive detection method and collecting different samples to test will be a more effective way to overcome false-negative detection . for patients' condition, it may be related to the underlying diseases, degree of infection, and treatment methods, among which hypertension and diabetes are the most common underlying diseases (liu et al., b) . once infected with sars-cov- , the underlying diseases will be more difficult to control, leading to more complications and dysfunction of more organs and immune system (hussain et al., ). ultimately, the hospital stay will be prolonged, and patients are more likely to relapse or infection after discharge due to their lower immune function. also, some studies have indicated that the use of antiviral drugs may affect the host's cellular immunity. although virus can be cleared by antiviral drugs in the initial phase, patients' immune function decreased. once antiviral therapy discontinued, virus will tend to be activated due to lack of normal cellular immunity, which may be regarded as one of the reasons for recurrence of sars-cov- , but it still needs more evidence to verify that (balachandar et al., ; wu et al., ) . 
theoretically, the infectivity of patients is determined by the existence of the virus in different body fluids, secretions, and excreta (ling et al., ), and viral infectivity mainly depends on the replication state of the virus (wölfel et al., ). in a study from south korea, no active virus was discovered in samples from rp patients (kang, ). this suggests that a re-positive nucleic acid result does not necessarily indicate infectivity. it may also explain why, although sars-cov- rna can be detected in rp patients, no cases of infection from them have been reported so far. for example, all close contacts of rp patients tested negative for nucleic acid and showed no suspicious clinical symptoms (an jh, et al., ). another case report showed that there was no significant change in the chest ct of rp patients and no family members were infected, which suggested that rp patients have no or lower infectivity (lan et al., ). however, the infectivity of rp patients still needs to be verified by more studies and more cases. furthermore, it is important and necessary to continue epidemiological follow-up of rp patients in order to monitor their health status and clarify their infectivity. according to recent research, rp patients usually completed negative conversion again - weeks later, and they recovered without any antibiotics or antiviral drugs, which might be related to the recovery of the body's immunity (an jh, et al., ). in other words, even if the virus nucleic acid tested by rt-pcr is sometimes positive in the recovery phase of covid- , this does not lead to a more serious condition, and antiviral therapy may not be required in most patients. rt-pcr results will turn negative again within a few days as immune function recovers. for asymptomatic rp patients, observation can therefore be used instead of antiviral drug therapy.
a recent study showed that recovered patients acquired relatively stable and sustained immunity after being infected with sars-cov- . all the subjects produced cd + t cell responses to the spike protein on the surface of sars-cov- , which also provided theoretical support for the vaccines under development (grifoni et al., ). detectable and sustained high levels of igm indicate the acute phase of sars-cov- infection, whereas igg suggests that the body has sufficient immune protection against sars-cov- , and igg can persist for a very long time (xiao et al., ). it has been reported that igm results were negative but igg results were positive when three patients were discharged from the hospital, and these results were still the same when the patients were re-admitted to the hospital after their virus nucleic acid tests turned positive again (fu et al., ). therefore, detection of virus nucleic acid combined with antibody testing is useful for determining disease status, treatment and outcome. it has been nearly half a year since the covid- epidemic spread around the world. although many patients from different countries have gradually recovered, it is very important to follow up the patients who recovered from the infection. there are still unknowns regarding recovered patients. in this situation, it is necessary to understand the characteristics of rp patients and determine whether they are potential threats to the public. moreover, more attention should be paid to rp patients, which is important for early control of the epidemic worldwide. analysis of factors associated with disease outcomes in hospitalized patients with novel coronavirus disease potential false-negative nucleic acid testing results for severe acute respiratory syndrome coronavirus from thermal inactivation of samples with low viral loads emergence of genomic diversity and recurrent mutations in sars-cov- .
infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases virological assessment of hospitalized patients with covid- discontinuation of antiviral drugs may be the reason for recovered covid- patients testing positive again prolonged presence of sars-cov- viral rna in faecal samples profile of specific antibodies to sars-cov- : virology chest ct for typical -ncov pneumonia: relationship to negative rt-pcr testing characteristics of pediatric sars-cov- infection and potential evidence for persistent fecal viral shedding pathological evidence for residual sars-cov- in pulmonary tissues of a ready-for-discharge patient key: cord- - moccf c authors: hashemi, seyed ahmad; khoshi, amirhosein; ghasemzadeh-moghaddam, hamed; ghafouri, majid; taghavi, mohammadreza; namdar-ahmadabad, hasan; azimian, amir title: development of a pcr-rflp method for detection of d g mutation in sars-cov- date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: moccf c in late , an outbreak of respiratory disease named covid- started in the world. to date, thousands of cases of infection are reported worldwide. most researchers focused on epidemiology and clinical features of covid- , and a small part of studies was performed to evaluate the genetic characteristics of this virus. regarding the high price and low availability of sequencing techniques in developing countries, here we describe a rapid and inexpensive method for the detection of d g mutation in sars-cov- . using bioinformatics databases and software, we designed the pcr-rflp method for d g mutation detection. we evaluated sars-cov- positive samples isolated in six months in northeastern iran. our results showed that the prevalent type is s-d in our isolates, and a small number of isolated belongs to the s-g type. of samples, ( . %) samples have belonged to type s-d, and ( %) samples typed s-g. the first s-g type was detected on june . 
we have little information about the prevalence of the d g mutation, and it seems that the reason is the lack of cheap and fast methods. we hope that this method will provide more information on the prevalence and epidemiology of d g mutations worldwide. the novel coronaviridae member named sars-cov- causes covid- disease. from to date, this disease has spread to most parts of the world and has become a significant challenge for the world health organization. many studies are under way on the clinical outcomes and epidemiology of this virus and on its co-infection with other microorganisms. some researchers evaluated and compared the whole genome sequences of sars-cov- isolated in various parts of the world and identified some mutations. based on the proteins encoded by the mutant genes, they assumed that the mutations could affect the infectivity of this virus. high-frequency mutations of the sars-cov- genome were seen in the nsp , rna polymerase, helicase, membrane glycoprotein, rna primase, nucleocapsid phosphoprotein, and spike protein genes (yin, ). one of the most critical mutations is d g in the spike protein gene. this mutation leads to a change of aspartate to glycine. studies showed that s-g mutants are more infective than s-d strains due to their high transmission efficacy (hu et al., ). the s protein of coronaviruses is the main determinant of host and tissue tropism and is also a significant target of viral entry inhibitors, neutralizing antibodies, and vaccines (du et al., ; hoffmann et al., ). the s protein is cleaved into s and s subunits by host proteases. s acts in receptor binding and s in membrane fusion and entry into host cells. multiple proteases, including transmembrane serine protease , cathepsin b/l, and furin, are critical for s protein cleavage and cell entry (hu et al., ). recently, researchers found a new cleavage site for a serine protease, elastase- , in s-g mutants (bhattacharyya et al., ).
it leads to an increase in enzymatic cleavage efficiency and enhances infectivity (hu et al., ). since the beginning of the covid- epidemic, many articles have been published on the evaluation of the d g mutation in sars-cov- (bhattacharyya et al., ; biswas and majumder, ; gong et al., ; hu et al., ; isabel et al., ; korber et al., a; korber et al., b; maitra et al., ; yin, ). in almost all of these works, researchers analyzed sars-cov- genome sequences previously published in data banks such as the ncbi and gisaid databases, or their own sequences. although whole-genome sequencing is a sensitive and precise method, it is expensive and constrains the number of samples that researchers can analyze. given that d g is the most important among all detected mutations, here we evaluated and optimized a fast and inexpensive method for detection of this mutation in sars-cov- in clinical samples. in the first step, we used the sequence of the s protein of sars-cov- , published in genbank with accession number mt . , for appropriate restriction endonuclease selection and primer design. at position in the s-d type, the nucleotide is t, which encodes aspartate at position of the amino acid chain. if the t to g mutation occurs at this position, aspartate is replaced by glycine at position of the amino acid chain, giving the s-g type. in the next step, we evaluated the restriction endonucleases whose cleavage region covered the t- in the s protein gene. using the gene runner software, we identified hpai (target sequence gttaac) as a suitable enzyme [fig. ]. we then designed a primer pair whose product includes the cut site position. the forward sequence was 5′-aatctatcaggccggtagcac-3′, and the reverse was 5′-caccaatgggtatgtcacact-3′. the pcr product size was base pairs.
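the enzyme-selection step described above amounts to checking whether an amplicon contains the hpai recognition sequence and, if so, what fragment lengths a digest would produce. the python sketch below illustrates that check under stated assumptions: the function names (`digest`, `call_genotype`) are hypothetical, the two toy amplicons are made up and are not taken from the sars-cov-2 genome or from the primers above, and the only borrowed facts are that hpai recognizes gttaac and cuts bluntly in the middle of the site. whether a given mutation actually creates or abolishes the site in a real amplicon should be confirmed against a reference sequence.

```python
HPAI_SITE = "GTTAAC"   # HpaI recognition sequence
HPAI_CUT_OFFSET = 3    # blunt cut between GTT and AAC


def digest(amplicon, site=HPAI_SITE, cut_offset=HPAI_CUT_OFFSET):
    """Return the fragment lengths produced by cutting at every occurrence of `site`."""
    seq = amplicon.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


def call_genotype(amplicon):
    """Follow the reading used in the text: cut product -> S-D, uncut product -> S-G."""
    return "S-D" if len(digest(amplicon)) > 1 else "S-G"


# Hypothetical toy amplicons (18 nt each); they differ at a single base inside the site.
toy_cut = "ACGTACGTTAACGGATCC"     # contains GTTAAC
toy_uncut = "ACGTACGGTAACGGATCC"   # T -> G change destroys the site
```

on a gel, `digest(toy_cut)` corresponds to two bands and `digest(toy_uncut)` to the single undigested band; a sample showing all three band sizes would be read as a mixture of both types.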
if the t nucleotide is in position , enzymatic digestion produces two fragments of bp and bp; if the nucleotide g is in this position, digestion has no effect on the pcr product, and a single bp band is seen after agarose gel electrophoresis [fig. ]. afterward, we selected one positive sample per day from patients admitted to the intensive care unit between march and august , a total of samples. the primary screening of positive samples was done using lightmix modular sars-cov- probes and primers (tib molbiol, berlin, germany) and addbio one-step rt master mix (add bio inc, daejeon, republic of korea), and also the novel coronavirus ( -ncov) nucleic acid diagnostic kit (sansure, china). in the next step, we performed pcr on sars-cov- positive samples using the addscript rt-pcr kit (addbio, korea) according to the manufacturer's recommendations [tables , ]. after pcr amplification, we performed rflp according to the manufacturer's protocol. in μl reactions, μl buffer ×, . μl enzyme solution, . μl pcr product, and μl dw were mixed and incubated at °c for min. finally, electrophoresis was performed using μl of digestion products on a . % agarose gel. we sent s-d and s-g related pcr products for sequencing to confirm our results. after aligning the sequences of the bp pcr product, we saw t in the s-d isolates and g in the s-g isolates [fig. ]. we selected positive samples from icu-admitted patients over six months. of samples, ( . %) samples belonged to type s-d, and ( %) samples were typed s-g. it should be noted that ( . %) samples had mixed bands related to both the s-d and s-g types. we repeated the test on these samples and got the same results. the first s-g type was detected on june . after that, the number of s-g strains has increased steadily to date [fig. ]. within six months, sars-cov- spread rapidly around the world. many efforts are being made to develop vaccines or monoclonal antibodies against this virus. the viral spike protein is one of the best target molecules for this purpose.
this protein is usually stable; nonetheless, some researchers found mutations in this protein (walls et al., ). the most crucial mutation is a missense mutation at amino acid . this mutation converts aspartate to glycine, yielding a protein that is more easily cleaved by proteinases such as elastase. the d g mutation was first identified in germany (phan, ). after that, becerra-flores et al. concluded that the s-g strains are the more pathogenic form of the virus and lead to a higher fatality rate in patients (becerra-flores and cardozo, ). (bhattacharyya et al., ). to date, reports around the world indicate an increase in the prevalence of the d g mutation. in previous reports, researchers evaluated the whole genome sequences published in databases such as gisaid, and in some papers, researchers worked on their own sequences (gong et al., ; korber et al., b; maitra et al., ; yin, ). whole-genome sequencing is an expensive method and cannot be done for a large number of samples. given this, and given the importance of d g in comparison with other mutations, here we evaluated a rapid and inexpensive pcr-rflp method for the detection of this mutation. this method is cheap and rapid compared to sequencing methods, and it can be performed in many molecular laboratories with common facilities around the world. we evaluated samples from icu-admitted patients collected over six months in north khorasan, iran. the results of this method were reproducible across samples. most of our strains ( . %) belonged to the s-d type, and a small part ( %) belonged to the s-g type. interestingly, . % of samples had a mix of both types. in contrast, bhattacharyya et al. reported that the dominant type of sars-cov- in europe and china is s-g (bhattacharyya et al., ), while in our samples the prevalent type was s-d. gong et al. reported the d g mutation in taiwanese patients and in patients who had a history of travel to europe, turkey, and iran (gong et al., ).
their study is important to us because data on the d g mutation in iran are lacking. they reported sars-cov- clade s-g strains in patients who had returned from iran. all of the samples evaluated in their report were isolated in the first three months of , while our first d g mutant was isolated in june. it should be noted that we evaluated samples from north-eastern iran and not from all parts of the country. in another study, eden et al. reported viral genome sequences that included the d g mutation in travelers who had returned from iran (eden et al., ) . generally, there is little information about the prevalence of mutations in iran and worldwide, and the reason seems to be the lack of cheap and fast methods. we hope that this method will provide more information on the prevalence and epidemiology of d g mutations worldwide. our results showed that the designed method gave reproducible results that were consistent with the s gene sequences of the studied strains. using this method, we also found that the frequency of the g mutant increased over time. we need to test a larger number of positive samples to evaluate the prevalence of the g mutation in this region and its relationship with the transmission rate and severity of the disease. seyed ahmad hashemi: funding acquisition and supervision of data collection. amirhosein khoshi: collected laboratory data, carried out the rflp test. hamed ghasemzadeh-moghaddam: carried out rt-pcr tests. majid ghafouri and mohammadreza taghavi: supervised data collection and reviewed the manuscript. hasan namdar-ahmadabad: carried out rt-pcr tests. amir azimian: conceptualized and designed the study, carried out the rt-pcr and rflp tests, drafted the initial and final revision of the manuscript. all authors approved the final revised manuscript as submitted and agree to be accountable for all aspects of the work. the authors have no conflicts of interest.
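the band-pattern reading used in the pcr-rflp method above (t allele: product cut into two fragments, typed s-d; g allele: product left uncut, typed s-g; both patterns together: mixed) can be sketched as a small classifier. this is a minimal illustration, not the authors' procedure: the function name, the size tolerance, and the fragment sizes in the usage example are hypothetical placeholders, since the actual fragment lengths are not legible in this text.

```python
from typing import Iterable, Tuple


def call_rflp_genotype(bands_bp: Iterable[int],
                       uncut_bp: int,
                       cut_bp: Tuple[int, int],
                       tol_bp: int = 15) -> str:
    """Classify one gel lane from its observed band sizes (in bp).

    uncut_bp, cut_bp and tol_bp are hypothetical parameters; the real
    fragment sizes must be taken from the assay design.
    """
    bands = list(bands_bp)

    def seen(size: int) -> bool:
        # A band counts as present if any observed band lies within tol_bp.
        return any(abs(b - size) <= tol_bp for b in bands)

    has_uncut = seen(uncut_bp)                 # undigested product (g allele)
    has_cut = all(seen(s) for s in cut_bp)     # both digestion fragments (t allele)

    if has_cut and has_uncut:
        return "mixed"
    if has_cut:
        return "s-d"
    if has_uncut:
        return "s-g"
    return "undetermined"


# Illustrative sizes only (hypothetical 300 bp product cut into 200 + 100 bp).
print(call_rflp_genotype([198, 103], uncut_bp=300, cut_bp=(200, 100)))
```

lanes that show neither pattern are reported as undetermined rather than forced into a type, which mirrors the repeat testing the authors describe for ambiguous samples.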
sars-cov- viral spike g mutation exhibits a higher case fatality rate global spread of sars-cov- subtype with spike protein mutation d g is shaped by human genomic variations that regulate expression of tmprss and mx genes analysis of rna sequences of sars-cov- collected from countries reveals selective sweep of one virus type the spike protein of sars-cov-a target for vaccine and therapeutic development an emergent clade of sars-cov- linked to returned travellers from iran sars-cov- genomic surveillance in taiwan revealed novel orf -deletion mutant and clade possibly associated with infections in middle east sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor the d g mutation of sars-cov- spike protein enhances viral infectivity evolutionary and structural analyses of sars-cov- d g spike protein mutation now documented worldwide spike mutation pipeline reveals the emergence of a more transmissible form of sars-cov- tracking changes in sars-cov- spike: evidence that d g increases infectivity of the covid- virus mutations in sars-cov- viral rna identified in eastern india: possible implications for the ongoing outbreak in india and impact on viral structure and host susceptibility genetic diversity and evolution of sars-cov- structure, function, and antigenicity of the sars-cov- spike glycoprotein genotyping coronavirus sars-cov- : methods and implications this work is supported by north khorasan university of medical sciences. grant number is . fig. . alignment of pcr product sequences. in the s-d strain detected with our method, nucleotide t was seen at position , whereas in the s-g strain nucleotide g was seen at this position. s. a. hashemi et al. infection, genetics and evolution ( ) key: cord- -wgeuh i authors: tian, lin; shen, xuejuan; murphy, robert w.; shen, yongyi title: the adaptation of codon usage of +ssrna viruses to their hosts date: - - journal: infect genet evol doi: . /j.meegid. . . 
sha: doc_id: cord_uid: wgeuh i viruses depend on their host's cellular structure to survive. most of them do not have trnas, so their translation relies on their hosts' trna pools. over the course of evolution, viruses needed to optimally exploit the cellular processes of their hosts. thus, the codon usage of a virus should coevolve with that of its host so that the virus can replicate efficiently and rapidly. some viruses can invade a broad spectrum of hosts (bstvs), while others can invade a narrow spectrum only (nstvs). consequently, we test the hypothesis that similarity of codon usage preference and the degree of matching between bstvs and their hosts will be lower than that of nstvs, which only need to coevolve with few hosts. we compare the patterns of codon usage in virus genomes to test this hypothesis. our results show that nstvs have a higher degree of matching to their hosts' trna pools than bstvs. further, analysis of the effective number of codons (enc) indicates that the codon usage bias of nstvs is relatively stronger than that of bstvs. thus, codon usage of nstvs tends to better match their host than that of bstvs. this supports the hypothesis that viruses adapt to the expression system of their host(s). viruses are pure parasites. they depend on their hosts' cellular structure and metabolism to replicate and assemble, i.e., survive. most of their genomes do not encode trnas, thus their translation of viral proteins relies on the hosts' trna pools (kumar et al. ) . a successful infection requires that viruses possess the ability to enter the host cell and efficiently produce new viruses. under the degenerate genetic code, synonymous codons, which code for the same amino acid, are used unequally (cristina et al. ; kanaya et al. ; shackelton et al. ; tsai et al. ). the redundancy of the genetic code provides the opportunity to shape the efficiency and accuracy of protein production, while maintaining the same amino acid sequence (chaney and clark ; plotkin and kudla ; stoletzki and eyre-walker ) .
considering that the translation of viral proteins relies on the host's pool of trnas, codon usage of a virus must coevolve with its host to efficiently use host resources. a higher similarity of codon usage patterns is expected to better facilitate viral replication. the extent of codon usage among viruses and their hosts has been suggested to affect viral survival, fitness, and evasion from the host's immune system (burns et al. ; costafreda et al. ; mueller et al. ) . because the virus relies on the host's cellular machinery for its replication, codon usage bias was suggested to play a role in the adaptation of a virus to its host. codon usage bias is common in viruses (butt et al. ; castells et al. ; cristina et al. ; he et al. ; li et al. ; moratorio et al. ; singh et al. ; su et al. ; xu et al. ; zang et al. ; zhao et al. ) . efficient replication seemingly requires that a virus and host have similar codon usage patterns to share a trna pool. co-evolution of codon usage between an rna virus and its susceptible hosts has been observed in many viruses (franzo et al. ; rahman et al. ; simón et al. ) . some viruses have a broad range of hosts (bstvs), such as arboviruses. these can infect mammals, birds, and insects. other viruses have narrow host ranges (nstvs) and can infect only a limited number of hosts. because bstvs must fit multiple hosts and their diverse trna pools, and their codon usage is related to that of their hosts, a tradeoff exists regarding the extent of codon usage. bstvs must fit their diverse hosts and, thus, the extent of matching for codon usage would be lower than that of nstvs, which must fit few, similar hosts. to test this hypothesis, we analyze viruses from genera of positive-sense single-stranded rna (+ssrna) viruses (e.g., flavivirus, alphavirus, coronavirus, torovirus, arterivirus, rubivirus, pestivirus, hepacivirus, alphamesonivirus).
viruses that can infect both vertebrates and invertebrates, such as most of the flaviviruses and alphaviruses, were classified as bstvs, while other viruses that can infect either vertebrates or invertebrates were classified as nstvs. the complete genome sequences of virus strains from the genera of +ssrna viruses were obtained from the genbank database (http://www.ncbi.nlm.nih.gov). host range information was determined from ncbi (https://www.ncbi.nlm.nih.gov/taxonomy/) and the ninth report of the international committee on taxonomy of viruses (viralzone database: http://viralzone.expasy.org/). accession numbers and other detailed information on these viruses, such as strain names, isolate hosts and host ranges, were also retrieved (supplementary table ) . estimates of codon usage, the relative synonymous codon usage (rscu) (sharp and li ) , and the effective number of codons (enc) (wright ), were calculated using codonw (available at http://sourceforge.net/projects/codonw). in this study, aedes, culex and ixodes represented the main arthropod hosts, and gallus, homo and mus the three major groups of vertebrates. coding sequences of the hosts were obtained from the ensembl database (available at: http://www.ensembl.org) (yates et al. ). copy numbers of trnas for transmission vectors and hosts were obtained from gtrnadb (http://gtrnadb.ucsc.edu/). optimizing codon usage of viruses according to that of highly expressed host genes has been shown to increase the production of viral proteins (chithambaram et al. ; ngumbela et al. ) or transgenic genes (koresawa et al. ) . the degree of similarity for overall codon usage between viruses and their hosts' trna pools was estimated with a parameter based on optimized codon usage and the extent of matching between viral orf's codon usage bias and their hosts' trna pools. orfs were optimized on the basis of trna copy number characteristics of their hosts' expression system.
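the rscu statistic named above can be illustrated with a short sketch. it follows the usual definition (the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid) and assumes the standard genetic code; it is not the codonw implementation the authors used, and the function name and the choice to report 0.0 for amino acids absent from the sequence are assumptions made here for illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard table applies to the coding sequences being analysed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds: str) -> Dict[str, float]:
    """Relative synonymous codon usage for the 59 degenerate sense codons."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by amino acid.
    families: Dict[str, list] = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values: Dict[str, float] = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue                          # skip stop codons, Met and Trp
        total = sum(counts[c] for c in family)
        for codon in family:
            # observed count / (family total / family size); 0.0 if unused
            values[codon] = counts[codon] * len(family) / total if total else 0.0
    return values


example = rscu("CTGCTGCTGCTA")                # toy sequence: four leucine codons
print(example["CTG"], example["CTA"])
```

an rscu of 1.0 means a codon is used exactly as often as expected under equal usage of its synonyms; values above 1.0 indicate preferred codons.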
online optimization software (http://genomes.urv.es/optimizer/) (puigbo et al. ) was utilized. the matching degree (md) was calculated from m and n, where m was defined as the number of bases that differ between the sequence before and after optimization, and n was the total length of the open reading frame. this value could range from zero to . to better quantify the effect of the overall codon usage of the host on the formation of the overall codon usage of the virus, the similarity index d(a,b) reported in a previous study was introduced into our work (zhou et al. ) . d(a,b) represented the potential effect of the overall codon usage of the host on that of the virus. this value potentially ranged from zero to . the matching degree of +ssrna viruses to their hosts' trna pools. md values were calculated for viral orfs. unfortunately, some viruses lacked data on trna copy numbers and coding sequences of their hosts. therefore, only viruses (+ssrna) were used in this analysis. the strains of arboviruses that belonged to bstvs were optimized according to their hosts' trna pool expression systems (host: arthropods, mammals, gallus gallus). md values (mean ± sd) were . ± . , . ± . , and . ± . in the three hosts, respectively (supplementary table ). the md values of nstvs strains had a mean of . ± . (supplementary table ). the wilcoxon-mann-whitney u test showed significantly higher md values in nstvs than in bstvs (z = − . , p < . ; z = − . , p < . ; z = − . , p < . , respectively; fig. a) . among the different genera of arbovirus (fig. b) , the extent of matching for flavivirus to their hosts' trna pools was higher than that of alphavirus. in addition, among the genera of nstvs, the md values of togaviridae (rubivirus) were the highest, followed by similarity index values d(a,b) were obtained for each strain in relation to its host(s) (supplementary tables and ). as shown in fig.
a , the indices of the fourth group (nstvs vs hosts) were higher than those of the groups , , and (bstvs vs arthropods, bstvs vs mammals, bstvs vs gallus gallus). to quantify the degree of similarity of the overall codon usage pattern between different virus genera and their hosts, the similarity index d(a,b) was calculated for all strains (fig. b) (fig. ) . the degeneracy of the genetic code implies that several triplets can code for the same amino acid. the use of synonymous codons in gene coding regions does not occur randomly, and codon usage bias is very common in viruses (butt et al. ; cristina et al. ; he et al. ; moratorio et al. ; singh et al. ) . codon usage is among the determinant factors that influence gene expression levels (chaney and clark ; zhou et al. ) . because viruses do not have trnas and rely on host cell machinery for replication, co-evolution of codon usage between an rna virus and its susceptible hosts has been observed (franzo et al. ; rahman et al. ; simón et al. ). however, ambiguity remains in the co-evolution patterns of different viruses. some viruses infect a broad range of species (bstvs), whereas others infect only a single host (nstvs). viruses have very diverse hosts, and different hosts have very diverse trna pools. the md and d(a,b) values of nstvs are significantly higher than those of the bstvs vs. anopheles gambiae, homo sapiens, gallus gallus, and macaca mulatta ( figs. and ) . thus, and as our hypothesis predicts, nstvs appear to be more precisely adapted to their hosts' codon usage pattern and trna pools than bstvs. each nstv can infect one host only. therefore, these viruses need to fit only one host's trna pool. they appear to have evolved more consistent codon usage patterns with their hosts' expression systems. in contrast, bstvs can infect and replicate in mammals, birds, and insects.
thus, adaptation in bstvs may involve a tradeoff between precise and functional matching to fit the diverse trna pools of multiple hosts. this may explain the relatively lower matching of bstvs to their hosts. enterovirus, hepacivirus and arterivirus are nstvs and their d(a,b) values are relatively low (fig. b ). this indicates a relatively low extent of similarity in overall codon usage between these viruses and their hosts. these viruses may not need to replicate rapidly. other factors, such as mutational pressure, may also play a role in determining codon usage bias (gu et al. ; rahman et al. ; wang et al. ; wong et al. ) . to quantify the extent of variation in codon usage, the enc values were calculated (wright ). most viruses have enc values > , which represents weak codon bias. this may be beneficial for efficient replication of viruses in host cells with potentially distinct codon preferences. the codon preference of nstvs is relatively stronger than that of bstvs (fig. ) . this could be because weak codon preference is advantageous in the adaptation of bstvs to multiple host expression systems. the ability to enter the host cell and replicate efficiently is essential for viral infection. viruses have coevolved many pathways to transcribe their own genetic material in their hosts (harwig et al. ) . codon usage in bstvs may involve a tradeoff between precise and functional matching to fit the diverse trna pools of multiple hosts. as expected, our analysis shows that nstvs are generally more adapted to their hosts' codon usage pattern and trna pools than bstvs. this may help the virus to use the host transcript machinery more efficiently and, therefore, replicate faster. supplementary data to this article can be found online at https://doi.org/ . /j.meegid. . . . (a) group is the similarity degree between bstvs and arthropods. group is the similarity degree between bstvs and mammals. group is the similarity degree between bstvs and gallus gallus.
group is the similarity degree between nstvs and a particular host. (b) the similarity degree of the overall codon usage bias between virus genera (+ssrna) and the hosts. ys: designed the study; lt and xs: analyzed the data; lt, rwm and ys: drafted the manuscript; all authors read and approved the final manuscript. modulation of poliovirus replicative fitness in hela cells by deoptimization of synonymous codon usage in the capsid region genome-wide analysis of codon usage and influencing factors in chikungunya viruses genome-wide analysis of codon usage bias in bovine coronavirus roles for synonymous codon usage in protein biogenesis the effect of mutation and selection on codon adaptation in escherichia coli bacteriophage hepatitis a virus adaptation to cellular shutoff is driven by dynamic adjustments of codon usage and results in the selection of populations with altered capsids a detailed comparative analysis of codon usage bias in zika virus canine parvovirus type (cpv- ) and feline panleukopenia virus (fpv) codon bias analysis reveals a progressive adaptation to the new niche after the host jump analysis of synonymous codon usage in sars coronavirus and other viruses in the nidovirales the battle of rna synthesis: virus versus host codon usage bias in the n gene of rabies virus codon usage and trna genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with cg-dinucleotide usage as assessed by multivariate analysis synthesis of a new cre recombinase gene based on optimal codon usage for mammalian systems revelation of influencing factors in overall codon usage bias of equine influenza viruses evolutionary and genetic analysis of the vp gene of canine parvovirus a detailed comparative analysis on the overall codon usage patterns in west nile virus reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity quantitative 
effect of suboptimal codon usage on translational efficiency of mrna encoding hiv- gag in intact t cells synonymous but not the same: the causes and consequences of codon bias optimizer: a web server for optimizing the codon usage of dna sequences analysis of codon usage bias of crimean-congo hemorrhagic fever virus and its adaptation to hosts evolutionary basis of codon usage and nucleotide composition bias in vertebrate dna viruses codon usage in regulatory genes in escherichia coli does not reflect selection for 'rare' codons host influence in the genomic composition of flaviviruses: a multivariate approach characterization of codon usage pattern and influencing factors in japanese encephalitis virus synonymous codon usage in escherichia coli: selection for translational accuracy synonymous codon usage analysis of hand, foot and mouth disease viruses: a comparative study on coxsackievirus a , a , a , and enterovirus from analysis of codon usage bias and base compositional constraints in iridovirus genomes analysis of codon usage in newcastle disease virus codon usage bias and the evolution of influenza a viruses. codon usage biases of influenza virus the 'effective number of codons' used in a gene comparative characterization analysis of synonymous codon usage bias in classical swine fever virus analysis of the codon usage of the orf gene of feline calicivirus analysis of codon usage bias of envelope glycoprotein genes in nuclear polyhedrosis virus (npv) and its relation to evolution the distribution of synonymous codon choice in the translation initiation region of dengue virus codon usage is an important determinant of gene expression levels largely through its effects on transcription this work was supported by guangdong natural science funds for distinguished young scholar ( a ). 
key: cord- -qjktnnn authors: wille, michelle; wensman, jonas johansson; larsson, simon; van damme, renaud; theelke, anna-karin; hayer, juliette; malmberg, maja title: evolutionary genetics of canine respiratory coronavirus and recent introduction into swedish dogs date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: qjktnnn canine respiratory coronavirus (crcov) has been identified as a causative agent of canine infectious respiratory disease, an upper respiratory infection affecting dogs. the epidemiology is currently opaque, with an unclear understanding of global prevalence, pathology, and genetic characteristics. in this study, swedish privately-owned dogs with characteristic signs of canine infectious respiratory disease (n = ) were screened for crcov and positive samples ( . %, . – . % [ % confidence interval (ci)]) were further sequenced. sequenced swedish crcov isolates were highly similar despite being isolated from dogs living in geographically distant locations and sampled across years ( – ). this is due to a single introduction into swedish dogs in approximately , as inferred by time structured phylogeny. unlike other crcovs, there was no evidence of recombination in swedish crcov isolates, further supporting a single introduction. finally, there were low levels of polymorphisms, in the spike genes. overall, we demonstrate that there is little diversity of crcov which is endemic in swedish dogs. canine infectious respiratory disease (cird) complex, colloquially referred to as kennel cough, is a contagious disease in dogs, particularly prolific in rehoming centers and kennels. dogs suffer from a dry hacking cough, which is usually cleared in - weeks, however, severe bronchopneumonia can develop (appel, ) . 
cird is a multifactorial disease with identified disease agents including canine parainfluenza virus (cpiv) (appel and percy, ) , canine adenovirus type (cav- ) (ditchfield et al., ) , canine pneumovirus (mitchell et al., ) , and the bacteria bordetella bronchiseptica (bemis, ) and mycoplasma cynos (mitchell et al., ) . more recently, canine respiratory coronavirus (crcov) was also identified as a causative agent of cird. this virus was first identified in the uk in a rehoming center with a high incidence of cird (erles et al., ) ; however, a retrospective study has suggested that it may have circulated as early as in canada (ellis et al., ) . additional surveys have identified antibodies in dogs in the uk, ireland, italy, japan, and the us, with antibody prevalence as high as . % in the state of kentucky, usa (an et al., b; decaro et al., ; erles and brownlie, ; knesl et al., ; priestnall et al., priestnall et al., , schulz et al., ) . a small number of isolates have also been sequenced and characterized (an et al., a; erles et al., erles et al., , jeoung et al., ; yachi and mochizuki, infections in a number of avian and mammalian species. in humans, coronavirus infections range from the common cold (e.g. hcov-oc ) to more severe zoonotic diseases such as severe acute respiratory syndrome (sars) (peiris et al., ; van der hoek et al., ) and middle east respiratory syndrome (mers) (cunha and opal, ; de groot et al., ) . coronaviruses have large zoonotic potential, and the ability to cross species boundaries lies not only in mutation, but also in the propensity of these viruses to recombine, with important breakpoints identified around the spike (s) gene in both birds and mammals (vijaykrishna et al., ; woo et al., ) . extensive homologous and heterologous recombination events have been documented in both human and animal group coronaviruses, leading to the generation of various genotypes and strains (woo et al., ) .
crcov is a group coronavirus, or betacoronavirus, a group comprised of mammalian coronaviruses. the most closely related species to crcov is bovine coronavirus (bcov), with many closely related species or strains such as sambar deer coronavirus, waterbuck coronavirus and human enteric coronavirus (hec ), illustrating the proliferation of these bcov-like viruses. in this study, we screened nasopharyngeal swabs from privately owned dogs in sweden with and without cird for crcov. we identified the first crcov positive dogs from sweden, and we used these to assess the evolutionary genetics of crcov, both locally and globally. specifically, we wanted to determine genetic variation and quasispecies of crcov in swedish dogs to clarify diversity within sweden and the relationship of swedish viruses to other crcov isolates in europe and elsewhere. furthermore, we aimed to understand the introduction of these viruses into sweden by using time-structured phylogeny and recombination analysis. finally, given the large number of crcov isolates sequenced in this study, we were able to better elucidate the emergence of crcov through analysis of recombination dynamics of crcov and bcov. this research was reviewed, approved and conducted in accordance with the regulations provided by the swedish board of agriculture and approved by the swedish animal research ethics board (uppsala djurförsöksetiska nämnd, reference numbers c / and c / ). between april and december , privately owned dogs with characteristic upper respiratory signs of cird (dry cough) for up to days were enrolled in a study investigating the cause of cird in sweden. a maximum of two dogs per household were sampled, and sampled dogs had not been treated with antibiotics at the time of sampling. samples were taken from seven veterinary clinics across sweden, with dogs residing in swedish counties (fig. ) .
nasopharyngeal swabs (e-swabs with amies medium and regular nylon flocked applicator, copan italia, brescia, italy) were collected and stored at − °c within - h of collection. a total of dogs with cird were swabbed. as controls, we also swabbed healthy dogs that had not suffered from respiratory signs for the last six months. we used generalized linear models (glm, family = binomial) to evaluate the effect of age, breed, location, and sex on crcov prevalence. explanatory variables were entered in isolation or in combination, and the resulting model improvements were tested for significance using χ tests. statistics were done in r . . (r development core team, ) integrated into r studio . . . viral rna was extracted from nasopharyngeal swabs with the magnatrix + extraction robot (magnetic biosolutions, sweden) and vet viral na kit (nordiag asa, oslo, norway), as described in (jinnerot et al., ) . cdna was subsequently synthesized using superscript™ iii reverse transcriptase (invitrogen, life technologies, carlsbad, ca, usa), and the second strand was synthesized using klenow fragment™ ′-> ′ exo- (new england biolabs, m s, ipswich, ma, usa). four approaches were utilized to sequence viruses: first, traditional pcr followed by sanger sequencing of the pcr products; second, illumina miseq sequencing of pcr products of the partial s gene; third, a viral metagenomics approach applied to two samples (crcov and crcov ) with the aim of getting close to complete genomes; and lastly, a probe-based capture method used on one sample (crcov ) ( table a ). for the first approach, pcr reactions targeting the s, membrane (m) and hemagglutinin-esterase (he) genes were carried out using previously published primers (an et al., a; erles et al., ) (table a ) and the kapa g robust hotstart readymix pcr kit (kapa biosystems, roche sequencing, pleasanton, ca, usa).
thermocycling conditions were °c for min and then cycles of °c for s, a reaction-specific annealing temperature for s, a reaction-specific elongation time at °c, and finally °c for min (table a ). the pcr products were run on a . % agarose gel stained with gelred, visualized under uv transillumination (geldoc, bio-rad laboratories, inc., richmond, ca, usa), purified using genejet gel extraction kit (life technologies, carlsbad, ca, usa) and sequenced at macrogen europe (amsterdam, nl). a bp fragment was amplified using longamp taq dna polymerase ( . u), dntp mix ( . mm), × longamp taq reaction buffer (new england biolabs, m s, ipswich, ma, usa), . μm of each primer, sp f and sp r (table a ) , and μl of template. thermocycling conditions were °c for s and then cycles of °c for s, °c for s, elongation at °c for s, and finally °c for min. the pcr products were purified using the genejet pcr purification kit (thermo fisher scientific, waltham, ma, usa). sequencing libraries were made using the nextera xt library preparation kit (illumina, san diego, ca, usa); normalization and pooling of nm sequencing libraries were done based on concentration measurements from the agilent high sensitivity dna kit ( bioanalyzer, agilent technologies, palo alto, ca, usa). the pool of sequencing libraries was denatured with naoh and further diluted with hybridization buffer to a final concentration of pm and spiked with % phix for diversity. paired-end sequencing was performed with miseq reagent kit v cycles on the miseq instrument (illumina, san diego, ca, usa) at the national veterinary institute, uppsala, sweden. for the viral metagenomics approach, μl of the swab media were freeze-thawed twice and centrifuged at g at °c for min; the supernatant was then transferred to a . μm filter (millipore, burlington, ma, usa) and centrifuged for min at g.
the filtered sample was treated with u turbodnase in turbodnase buffer (invitrogen, thermo fisher scientific, waltham, ma, usa) and dnasei ( μl) (invitrogen, life technologies, carlsbad, ca, usa) for min at °c, followed by rnase cocktail™ enzyme mix (invitrogen, thermo fisher scientific, waltham, ma, usa) treatment for min at room temperature. thereafter, rna was extracted using trizol, chloroform and rneasy kit (qiagen, hilden, germany). the ovation® rna-seq system v (nugen, redwood city, ca, usa) was used to amplify the rna in accordance with the provided protocol, and the genejet pcr purification kit (thermo fisher scientific, waltham, ma, usa) was used for purification. sequencing libraries were constructed at the national sequencing infrastructure in uppsala, sweden, using the ab library builder system (ion xpress™ plus and ion plus library preparation for the ab library builder™ system protocol, thermo fisher scientific, waltham, ma, usa) and size selected on the blue pippin™ (sage science, beverly, ma, usa). library size and concentration were assessed by a bioanalyzer high sensitivity chip (agilent technologies, santa clara, ca, usa) and by the fragment analyzer system (advanced analytical; agilent technologies, santa clara, ca, usa). template preparation was performed on the ion chef™ system using the ion & ion kit-chef (thermo fisher scientific, waltham, ma, usa). samples were sequenced on chips using the ion s ™ xl system (thermo fisher, waltham, ma, usa). probes were designed for feline coronavirus kb -nc_ . , canine respiratory coronavirus kb -jx . , bovine coronavirus kb -u . , kobuvirus kb -km . , porcine rubulavirus kb -nc_ . , african swine fever virus kb -benin / am . , canine parainfluenza virus kb -kc . , by agilent technologies sureselect dna advanced design wizard. this resulted in a . kbp capture based on probes. one sample that was positive for crcov with a cq-value of . as determined by qpcr was selected (crcov ).
rna extracted from nasopharyngeal swabs as described above was used. in total, μl of rna was converted into cdna using superscript™ iii reverse transcriptase (invitrogen, life technologies, carlsbad, ca, usa). thereafter, the cdna was treated with rnase h for min at °c and made double-stranded using klenow fragment™ ′-> ′ exo- (thermo fisher scientific, waltham, ma, usa). the kapa hyperplus library prep kit (kapa biosystems, roche sequencing, pleasanton, ca, usa) was used in combination with an unofficial protocol for using agilent's sureselectxt target enrichment system for illumina paired-end multiplexed sequencing libraries version b (june ), ng dna samples. a × bead clean-up was used on μl of double-stranded cdna prior to fragmentation. the samples were fragmented at °c for min. at the adapter ligation step the sureselect adapter oligo mix was used and the sample was incubated at °c for min. the pre-capture libraries were vacuum centrifuged to a final concentration of ng/μl. hybridization was done according to the sureselect protocol and the samples were incubated at °c for h in a proflex pcr machine (applied biosystems, foster city, ca, usa). thereafter, dynabeads myone streptavidin t (thermo fisher scientific, waltham, ma, usa) were used to capture the sureselectxt enriched libraries. the libraries were then amplified by on-bead pcr with kapa hifi hotstart, using pcr cycles. after a final clean-up with agencourt ampure beads (beckman coulter, indianapolis, in, usa), the libraries were quality assured using the bioanalyzer hs dna assay (agilent technologies, santa clara, ca, usa). prior to sequencing, a nm pool was made, denatured with naoh and further diluted with hybridization buffer to a final concentration of pm and spiked with . % phix for diversity. paired-end, cycles sequencing was performed with miseq reagent kit v cycles on the miseq instrument (illumina, san diego, ca, usa) at the national veterinary institute in uppsala, sweden.
reads were cleaned, forward and reverse reads were aligned, and a consensus per sample was made using dnastar (madison, wi, usa). the sequenced reads were demultiplexed using bcl fastq (https://github.com/brwnj/bcl fastq.) and the adapters were removed using fastp version . . (chen et al., ) (https://github.com/opengene/fastp). the reads were then assembled using megahit version . . (li et al., ), with default settings. for each sample, the only contig produced corresponding to the size of the amplicon was extracted (longest contig). the reads were trimmed for quality using fastp with a phred score threshold of and a minimum length of bp, and aligned to the longest contig obtained by megahit using bwa-mem version . . (li and durbin, ) with default parameters. the alignment was converted to a sorted bam file using samtools version . . (li et al., ). from these alignments, single nucleotide variants were then analyzed using both shorah version . (zagordi et al., ) and quasirecomb version . (topfer et al., ). the reads were assembled using megahit version . . (li et al., ), with default settings. a taxonomic classification of the contigs was then performed using diamond version . . (buchfink et al., ) against the non-redundant protein database from ncbi (nr, release february ). the output files (daa format) were uploaded into megan (version . . ) (huson et al., ) and all the contigs classified as betacoronavirus were extracted and inspected. the longest contigs were selected for further structural and functional annotation. reads were assembled using spades (v . . ) (bankevich et al., ), but as the coverage was too high to obtain a correct assembly, the dataset was randomly reduced to , reads. those reads were then assembled using spades. the obtained contigs were then taxonomically assigned using diamond. the outputs from diamond were visualized in megan and contigs classified as betacoronavirus were retrieved.
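the random reduction of an over-deep read set before assembly, mentioned above, can be illustrated with a short python sketch. the text does not say which tool or procedure the authors used, so this is only an assumption: it uses reservoir sampling to keep a fixed number of reads uniformly at random, and it assumes plain four-line fastq records. the function names, the `n_keep` parameter and the seed are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ lines, assuming exactly four lines per record."""
    block: List[str] = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield (block[0], block[1], block[2], block[3])
            block = []


def subsample_reads(records: Iterable[FastqRecord], n_keep: int,
                    seed: int = 0) -> List[FastqRecord]:
    """Keep n_keep records uniformly at random (reservoir sampling).

    Only the reservoir is held in memory, so the input can be a stream.
    If fewer than n_keep records exist, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < n_keep:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n_keep:
                reservoir[j] = record
    return reservoir
```

in practice the records would come from an open fastq file, for example `subsample_reads(read_fastq(handle), n_keep)`, and paired-end data would need both mates of a pair to be kept together, which this sketch does not handle.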
the crcov , crcov and crcov genomes were annotated using an annotation pipeline for prokaryotic and viral genomes, prokka (seemann, ) . we had previously extracted all the protein sequences belonging to the coronaviridae family from uniprotkb (release april ). this dataset was provided to prokka for annotating the newly assembled coronavirus genomes. all generated sequences from this study have been deposited to the european nucleotide archive at ebi, under the bioproject prjeb . high throughput sequencing (hts) reads have been deposited in the ebi short read archive (accession numbers: spike amplicons datasets: err -err , err -err , err -err . crcov probe-based capture dataset: err , metagenomics datasets: crcov : err and crcov : err ). final full and partial annotated genomes generated from hts have been deposited in european nucleotide archive under the accession numbers: erz (crcov ), erz (crcov ), erz (crcov ). full length genes generated through sanger sequencing have been deposited in european nucleotide archive, accession numbers erz to erz . resulting sequences were aligned using the mafft algorithm (katoh et al., ) within geneious r (biomatters, new zealand). maximum likelihood phylogenetic trees were constructed for each gene (orf ab, hemagglutinin-esterase he, s, and matrix m) using phyml . (guindon et al., ) implementing the best substitution model for each gene. for the full length s gene, we utilized beast . (drummond and rambaut, ) to better infer the evolutionary relationship within crcovs. shortly, we used maximum likelihood trees constructed in mega to test for clock-like behavior in each data set by performing linear regressions of root-to-tip distances across years of sampling in tempest (rambaut et al., ) . 
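the clock-likeness test described above is a linear regression of root-to-tip distance on sampling date. a minimal python sketch of that regression follows; it is not tempest itself, and the function name and the example values in the usage note are hypothetical. the slope approximates the substitution rate, the x-intercept approximates the date of the root, and r² indicates how clock-like the data are.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(dates: Sequence[float],
                           distances: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, intercept, r_squared).
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (date, distance) pairs of equal length")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared
```

for example, with made-up tips sampled in 2000, 2005, 2010 and 2015 at distances 0.010, 0.015, 0.020 and 0.025 substitutions per site, the fitted slope is about 0.001 per site per year and `-intercept / slope` places the root near 1990. a low r² or a negative slope would argue against using a strict clock.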
using beast, time-stamped data were analyzed using both the uncorrelated lognormal relaxed and the strict molecular clock, with the srd codon position model (an hky substitution model with a different rate of nucleotide substitution for the + codon positions and the rd codon position) (bahl et al., ; shapiro et al., ). we implemented the bayesian skyline coalescent tree prior. three independent analyses of million generations were performed and convergence was assessed using tracer . . independent runs were combined in logcombiner v . following a burn-in of %. maximum clade credibility trees were generated using treeannotator v . . the maximum clade credibility trees were visualized in figtree v . . . to assess recombination, a concatenated sequence was generated including the viruses for which there were sequences from the partial non-structural protein a (ns), and full length he, s, envelope (e), and m, resulting in bp for analysis; the genes were placed in genomic order. to assess evolutionary patterns of all segments, a network was constructed with crcov as well as bcov outgroups using splitstree (huson and bryant, ). splitstree builds a network which takes recombination into account. to better understand the recombination process, the concatenated alignment was used in rdp to estimate break points (martin et al., ). specifically, we used the algorithms rdp (martin and rybicki, ) and bootscan (martin et al., ) to detect the recombination window, and used additional algorithms within rdp to cross-reference support for the detected window. to understand genetic variation within each sample, we utilized sequences of the partial spike gene generated using illumina miseq of nine samples. the amplicon sequenced ranged from position to the ′ end ( ) of the spike gene, so variants were only studied in this ′ region. the genetic variant population was estimated using a combination of both shorah version . (zagordi et al., ) and quasirecomb version .
(topfer et al., ) with default parameters. we used a combination of both to ensure the accuracy of the snps found. a total of dogs were sampled as part of this study, comprising dogs with signs of disease and healthy dogs. thirteen samples collected from dogs with signs of disease were positive for crcov ( . %, . - . % [ % confidence interval (ci)]), and were collected from dogs living in the counties of stockholm ( / ), skåne ( / ), västmanland ( / ), västernorrland ( / ) and dalarna ( / ) (fig. ) . none of the healthy dogs tested positive for crcov. overall prevalence of crcov did not vary significantly by dog clinic (x = . , df = , p = . ) or county from which dogs originated (x = . , df = , p = . ). there was further no statistical difference across dog breed (x = . , df = , p = . ), sex (x = . , df = , p = . ), or age category (x = . , df = , p = . ) (fig. a ) . the largest number of positive samples were detected in winter months (n = ) compared to the autumn (n = ) and spring (n = ); no positive samples were detected in the summer (table ) . despite this detection difference, prevalence of crcov was not significantly different across season (x = . , df = , p = . ) (fig. a ) . using probe-based capture one complete genome was generated (crcov ). the complete genome of crcov was , bp and following annotation coding regions were predicted: orf a, orf ab, non-structural protein a, he, s, . kda non-structural protein, e, m and nucleoprotein (np) (fig. a ) . this full genome now brings the number of complete crcov genomes to [crcov-k (an et al., a) and crcov-bj (lu et al., ) ]. viral metagenomics resulted in partial genomes of crcov and crcov covering bp and , bp, respectively. following annotation of the crcov genome, coding regions were predicted: a partial gene coding for the he, and all full genes coding for: s, . kda non-structural protein, e, m and the np. 
annotation of the crcov genome predicted a partial orf ab gene and all the remaining genes located between orf ab and the ′ end of the genome. of the remaining viruses, - genes were sequenced through a combination of sanger sequencing and illumina high-throughput sequencing (table a ). assessment of the s, e, m, ns and he genes generated using sanger and high-throughput sequencing (table a ) demonstrated high genetic similarity of viruses detected in swedish dogs, despite sampling from numerous clinics across sweden and over a period of years. indeed, all samples for all genes were within % identity (fig. a ). phylogenetic reconstruction of the larger genes (s, he and m) indicates that swedish viruses are most closely related to a crcov isolated in italy in (eu ), and in some trees to a crcov isolated in the uk in (dq ) (fig. a ). given discordance between trees, we constructed a consensus tree from a concatenated alignment using splitstree, suggesting that all swedish viruses are likely the result of a single introduction and, despite recombination being a common feature in crcovs, are most similar to the two aforementioned viruses (figs. , a ). time-structured phylogenetic analysis of the s gene suggests that the most recent common ancestor of all swedish crcovs was introduced to sweden at the beginning of [ . with a confidence interval ranging from . to . ], with subsequent proliferation within sweden (fig. ). as suggested by the previous analyses, the viruses isolated in sweden are most closely related to an isolate from italy, but not necessarily to the one from the uk, and crcovs found in europe may have emerged in and been introduced from asia. however, due to the sparseness of sequences, we have not undertaken phylogeographic analysis, and these observations should be interpreted with caution. largely, crcov and bcov form independent lineages in phylogenetic analysis, and the large number of swedish sequences in each tree helps to better define the crcov clade.
this lineage separation is most evident in the s tree (fig. a c) , where all crcovs form a clade that is different from bcov. however, crcov does not always fall into a distinctive clade; some genes like the he and m of some isolates are more closely related to bcov sequences than crcov. this was not the case with the viruses from sweden, which are very similar to each other, and more distantly related to bcov than other crcov sequences (fig. s a, fig. ) . when assessing pairwise identity, it is clear that the percentage identity of all global crcovs only (fig. a b) and a combination of all crcov and bcov sequences (fig. a c) for the partial ns, he and the complete m gene is rather similar. this is in contrast to the s gene, where percentage identity of all crcovs ranges from to % whereas when including bcov outgroups the range is from . to %. this is because for many genes, viruses from korea (k , k , k ) and a recent sequence from china which were mined from genbank are more similar to bcov than to crcovs detected in europe (fig. a) . recombination detection algorithms detect that there is indeed a recombination event in these previously described korean and chinese viruses, where the ns, he, e and m sequence is more similar to bcov than to other crcov sequences (region probability with mc correction: . ) ( table a ). the breakpoint, represented by the end of the he gene and start of the s gene, has a high probability ( . ); however, the certainty of the breakpoint at the end of the s gene is undetermined due to a number of potential breakpoints (fig. a ) , which is revealed in the bootscan analysis. this is because all crcovs, but especially the asian crcovs, have higher signal for bcov at the end of the s gene (fig. b) . whether this is due to recombination within the s gene, or due to evolutionary processes of conservation in this section of the s gene, is unclear given some similarity between swedish crcov and bcov in this region as well. 
regardless, these recombinant isolates have been maintained in asia, as demonstrated by the isolation of a virus in china in sharing the same mosaic crcov-bcov pattern as viruses isolated in korea in . overall, viruses sequenced in sweden as part of this study have a different recombination profile compared to previously described viruses from china and korea. to better understand the variation within each crcov detected in sweden, partial spike amplicon high-throughput sequencing data from all crcov isolates except crcov , and were analyzed for variants with different methods: shorah and quasirecomb. overall, shorah identified more single nucleotide variants (snvs) than quasirecomb (fig. ); however, approximately half of the snvs predicted by shorah were also predicted by quasirecomb (except for the insertions/deletions, which are only predicted by shorah). furthermore, regardless of method, most of the snvs observed were at a frequency lower than . . there was variation in snvs across samples. there were no variants identified in crcov with either method. in contrast, crcov contained different snvs. crcov and crcov had markedly different patterns depending on the method used, whereby snvs were identified with shorah but not with quasirecomb due to differences in detection of indels. no snv positions are common to all samples. only one snv was common to crcov and crcov (position ). this variation analysis suggests that the studied region of the s gene is not variable among the crcov isolates collected from swedish dogs. in this study, we aimed to reveal the epidemiology and evolutionary genetics of crcov, one of the causative agents of canine infectious respiratory disease (cird), in sweden. cird is a major cause of morbidity in dogs and an important animal welfare condition.
this disease is multi-factorial, and a recent european survey reported that crcov, canine pneumovirus and the bacterium mycoplasma cynos, besides the previously established main causes (mainly cpiv and b. bronchiseptica), all played a role in the disease (mitchell et al., ). to date, the highest prevalence of crcov remains in rehoming centers (erles and brownlie, ; erles et al., ; mitchell et al., ), or other instances where dogs may be cohoused, such as in racing greyhounds (sowman et al., ). epidemiology is unclear in pet dogs, such as those sampled in our study, but we predict that occasions where many dogs meet (e.g. dog day-care centers, dog shows and other activities) are likely contributing to spread, similarly to other cird-causing pathogens. overall, we found that % of pet dogs with signs of cird in sweden, ranging from mild (cough only) to severe (affected general condition and deep coughing) disease signs, were positive for crcov, demonstrating the role that this virus plays in the complex aetiology of cird. importantly, we found crcov in dogs of all breeds, ages, and sexes, demonstrating that this virus affects all dogs equally. crcov was isolated between december and march; no viruses were detected in the summer months. this lack of significance of season was likely due to few tests being undertaken in the summer, resulting in a large uncertainty. the evolutionary genetics of coronaviruses is complex, largely driven by high rates of substitution, and more importantly recombination. this has allowed for interspecies variability, interspecies host jumps and novel coronaviruses to emerge under the "right" conditions (decaro et al., ; pyrc et al., ; ren et al., ; su et al., ; woo et al., ; zhang et al., ). lineage a of betacoronaviruses is comprised of a number of closely related viruses including crcov, bcov, bcov-like, and hcov-oc . fig. . splits tree of concatenated ns, he, s, e and m genes. each band of parallel edges indicates a split, which is a collection of sequence differences that separates one group of sequences from another. non-tree like evolution is expected to be from recombination. the distance between taxa represents the sum of weights of all splits that separate taxa. a study by kin et al. demonstrated that bcov and hcov-oc have > % global nucleotide identity, and an earlier study by the same group reported that hcov-oc may be the result of a zoonotic transmission between bovines and humans (kin et al., ; kin et al., ). furthermore, there is a . % global nucleotide similarity between crcov and bcov (erles et al., ), but more importantly, cross-reactivity of bcov antigen with crcov antibodies (erles et al., ), and infectivity of puppies with bcov (kaneshima et al., ; priestnall et al., ). finally, bcov and crcov (and hcov-oc ) use sialic acids or heparan sulphate to attach to the cell surface, and use the same molecule as entry receptors (hla-i; szczepanski et al., ). overall, this provides considerable evidence for a putative host switch event. our splitstree and recombination analyses of global viruses support the results of lu et al., which demonstrate high levels of recombination between bcov and previously described crcov strains, including those from the uk, italy, korea and china (lu et al., ). the viruses sequenced in this study, at a more local level, add an interesting piece to the puzzle. viruses from sweden were most closely related to a sequenced strain from italy in , and therefore the most parsimonious conclusion is that these viruses were introduced to sweden from elsewhere in europe.
despite being detected in a large number of dog breeds, across a large geographic area and over years, there was very little genetic variation in crcov from sweden: very little genetic drift, few snvs and no evidence of recombination, suggesting a single, recent introduction and subsequent spread. potentially the most interesting finding is that the crcov circulating in sweden is the most genetically distant virus to bcov as compared to crcov previously described (particularly those from asia), with the majority of genes clustering into a "crcov clade" rather than with bcov, and most genes being < % similar to bcov. whether this suggests better adaptation to dogs with time, or whether it represents geographical differences, is unclear, and will be revealed with further sequencing of these viruses globally. recent descriptions of novel aetiological causes of cird, such as crcov, call for further studies on epidemiology and pathogen genetics to facilitate future vaccine development and diagnostics. to date, commercial vaccines are only available for the well-established cird pathogens cpiv, cav- and b. bronchiseptica. despite frequent vaccinations in dogs, cird remains a burden for dogs and dog owners, and as such, it is imperative to better understand the epidemiology and evolution to prevent virus introductions and spread, and to provide informed advice on quarantine decisions related to dog housing facilities, ranging from dog day-care centers to show dogs and racing kennels. fig. . recombination between crcov and bcov. (a) percentage identity between crcov and bcov strain kakegawa. white indicates no sequence from the gene was available, and grey/blue shading corresponds to the percentage identity to bcov. asterisk indicates only a partial gene sequence was available. (b) bootscan analysis of concatenated sequence illustrating a recombination where crcov from korea and china are more similar to bcov in the ns, he and m genes.
genes are identified above the plot, and the statistically supported recombination window is further illustrated above the plot. lines plotted are percentage bootstrap support include only potentially recombined viruses, in addition to one of the swedish crcov viruses for reference (black). statistical support for recombination window is presented in table a and chi-squared breakpoint in fig. a . (for interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) the authors declare the following financial interests/personal relationships which may be considered as potential competing interests: the study was in part financed by msd animal health. this sponsor did not have any influence on the design of the study or the decision to publish the results. genetic analysis of canine group coronavirus in korean dogs a serological survey of canine respiratory coronavirus and canine influenza virus in korean dogs canine infectious tracheobronchitis short review: kennel cough sv- -like parainfluenza virus in dogs influenza a virus migration and persistence in north american wild birds spades: a new genome assembly algorithm and its applications to single-cell sequencing bordetella and mycoplasma respiratory infections in dogs and cats fast and sensitive protein alignment using diamond fastp: an ultra-fast all-in-one fastq preprocessor middle east respiratory syndrome (mers): a new zoonotic viral pneumonia middle east respiratory syndrome coronavirus (mers-cov): announcement of the coronavirus study group serological and molecular evidence that canine respiratory coronavirus is circulating in italy recombinant canine coronaviruses related to transmissible gastroenteritis virus of swine are circulating in dogs association of canine adenovirus (toronto a / ) with an outbreak of laryngotracheitis ("kennel cough"): a preliminary report beast: bayesian evolutionary analysis by sampling trees detection of coronavirus 
in cases of tracheobronchitis in dogs: a retrospective study from to investigation into the causes of canine infectious respiratory disease: antibody responses to canine respiratory coronavirus and canine herpesvirus in two kennelled dog populations detection of a group coronavirus in dogs with canine infectious respiratory disease isolation and sequence analysis of canine respiratory coronavirus new algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml . application of phylogenetic networks in evolutionary studies megan analysis of metagenomic data m gene analysis of canine coronavirus strains detected in korea development of a taqman realtime pcr assay for detection of bordetella bronchiseptica the infectivity and pathogenicity of a group bovine coronavirus in pups multiple alignment of dna sequences with mafft genomic analysis of human coronaviruses oc (hcov-oc s) circulating in france from to reveals a high intra-specific diversity with new recombinant genotypes comparative molecular epidemiology of two closely related coronaviruses, bovine coronavirus (bcov) and human coronavirus oc (hcov-oc ), reveals a different evolutionary pattern the seroprevalence of canine respiratory coronavirus and canine influenza virus in dogs in new zealand fast and accurate short read alignment with burrows-wheeler transform genome project data processing, s, . the sequence alignment/ map format and samtools megahit v . 
: a fast and scalable metagenome assembler driven by advanced methodologies and community practices discovery of a novel canine respiratory coronavirus support genetic recombination among betacoronavirus rdp: detection of recombination amongst aligned sequences a modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints rdp : a flexible and fast computer program for analyzing recombination detection of canine pneumovirus in dogs with canine infectious respiratory disease european surveillance of emerging pathogens associated with canine infectious respiratory disease coronavirus as a possible cause of severe acute respiratory syndrome serological prevalence of canine respiratory coronavirus serological prevalence of canine respiratory coronavirus in southern italy and epidemiological relationship with canine enteric coronavirus mosaic structure of human coronavirus nl , one thousand years of evolution r: a language and environment for statistical computing. 
r foundation for statistical computing exploring the temporal structure of heterochronous sequences using tempest (formerly path-o-gen) genetic drift of human coronavirus oc spike gene during adaptive evolution detection of respiratory viruses and bordetella bronchiseptica in dogs with acute respiratory tract infections prokka: rapid prokaryotic genome annotation choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences a survey of canine respiratory pathogens in new zealand dogs epidemiology, genetic recombination, and pathogenesis of coronaviruses canine respiratory coronavirus, bovine coronavirus, and human coronavirus oc : receptors and attachment factors probabilistic inference of viral quasispecies subject to recombination identification of a new human coronavirus evolutionary insights into the ecology of coronaviruses coronavirus diversity, phylogeny and interspecies jumping survey of dogs for group canine coronavirus infection shorah: estimating the genetic diversity of a mixed sample from next-generation sequencing data genotype shift in human coronavirus oc and emergence of a novel genotype by natural recombination we acknowledge the effort of all veterinarians involved in collecting samples for this study, as well as the animal owners and dogs enrolling.the authors would like to acknowledge support of the national genomics infrastructure (ngi)/uppsala genome center and uppmax for providing assistance in massive parallel sequencing and computational infrastructure. work performed at ngi/uppsala genome center has been funded by rfi/vr and science for life laboratory, sweden. we would also like to acknowledge slu global bioinformatics centre for providing resources for all computing.the authors would like to acknowledge the support of johanna hasmats at agilent technologies with the probe-based-capture method development and karin ullman at the national veterinary institute for support with the miseq sequencing. 
this study was funded by agria animal insurances and the swedish kennel club research foundation, jan skogborgs foundation, sveland foundation for animal health and welfare, msd animal health (grant number n - ) and the swedish research council for environment, agricultural sciences and spatial planning, formas (grant number - - ). supplementary data to this article can be found online at https:// doi.org/ . /j.meegid. . . key: cord- -bedfwdyt authors: afelt, aneta; lacroix, audrey; zawadzka-pawlewska, urszula; pokojski, wojciech; buchy, philippe; frutos, roger title: distribution of bat-borne viruses and environment patterns date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: bedfwdyt environmental modifications are leading to biodiversity changes, loss and habitat disturbance. this in turn increases contacts between wildlife and hence the risk of transmission and emergence of zoonotic diseases. we analyzed the environment and land use using remote spatial data around the sampling locations of bats positive for coronavirus ( sites) and astrovirus ( sites) collected in sites. a clear association between viruses and hosts was observed. viruses associated to synanthropic bat genera, such as myotis or scotophilus were associated to highly transformed habitats with human presence while viruses associated to fruit bat genera were correlated with natural environments with dense forest, grassland areas and regions of high elevation. in particular, group c betacoronavirus were associated with mosaic habitats found in anthropized environments. south-east asia (sea) is considered a hotspot for emerging infectious diseases (jones et al., ) . the region is undergoing major demographic and economic development with major impacts on environment and biodiversity (sodhi et al., ) . outbreaks of nipah virus infections and severe acute respiratory syndrome (sars) pandemic in sea both originated from bats (chua et al., ; wang et al., ; field, ) . 
sea hosts a high bat diversity and is home to % of the known global bat fauna (kingston, ). in sea, bats are often hunted for food (mildenstein et al., ). in cambodia, bat guano is collected on guano farms to be used as a plant fertilizer (chhay, ). logging and conversion of forests into agricultural lands, monoculture plantations and urban areas have impacted the land cover configuration (flint, ; sodhi et al., ; defries et al., ). the greater mekong subregion has lost % of its forest since (costenbader et al., ). sea is predicted to lose % of its original forest and % of its mammal species by (sodhi et al., ). the deforestation rates in sea are the highest of any tropical region, which could bring bats even closer to humans (achard et al., ; sodhi et al., ; stibig et al., ). bats were identified as a reservoir for the progenitor virus of the sars-coronavirus (sars-cov) species, responsible for the sars pandemic between and (wang et al., ). coronaviruses belonging to alpha-cov and beta-cov, including mers-like viruses, have been recently detected in cambodia and lao bat populations (lacroix et al., a). several other viral families have also been detected, including astroviruses, flaviviruses, lyssaviruses, bunyaviruses or henipaviruses (salaün et al., ; olson et al., ; osborne et al., ; reynes et al., , ; lacroix et al., b). other works have shed light on how anthropogenic and environmental changes may impact the dynamics of virus transmission and public health (calisher et al., ; han et al., ). bats are usually sensitive to environmental changes, which can modify the dynamics of populations with a consequent impact on the risk of virus transmission. in order to investigate whether bat viruses can be found in any kind of environment or are associated with specific landscapes, we conducted a multivariate analysis of isolated bat viruses, bats and environments previously described (lacroix et al., a, b).
the study was carried out in cambodia and lao pdr, which experience a rainy season from may to october and a dry season from november to april. the thickly forested lao pdr landscape consists of rugged mountains (up to m) surrounded by plains and plateaus. the mekong river represents the "lifeline" of the country, and is the major economic corridor (sisouphanthong and taillard, ). in cambodia, the landscape is characterized by a low-lying central plain including the tonle sap lake and the upper areas of the mekong river delta, dominated by irrigated cultures. these areas are flooded during the rainy season (wildlife, ). they are surrounded by uplands and low mountains, and thinly forested and transitional plains at around m above sea level (wildlife, ). cambodia and lao pdr experienced substantial forest loss and fragmentation: % and % of their forest cover disappeared between and , respectively (wwf, ). in , the forest covered . % and . % of cambodia and lao pdr respectively (fao and world bank, ; world development indicators, ). bats were collected from sites in cambodia and lao pdr between and as previously described (lacroix et al., a, b) (supplementary table ; fig. ). sampling was conducted in two phases (lacroix et al., a, b). phase was performed in by the institut pasteur in cambodia (ipc). bats were captured using harp traps, stored in mist net bags and humanely euthanized under supervision of the national veterinary research institute in full compliance with local ethical and legal guidelines at the sampling sites in the cambodian provinces of ratanakiri, stung treng and preah vihear. phase of sampling was performed by the wildlife conservation society (wcs), as part of the usaid predict project, from to . samples were taken at the human-bat interface and thus mostly from bats destined for human consumption, in markets, restaurants or directly from hunters. fresh guano samples were also taken from guano farms in cambodia.
geographical coordinates of each collection site were recorded using a global positioning system (gps). viral rna was extracted from rectal swabs, oral swabs and lung samples as previously described (lacroix et al., a, b) (table ). the detection was based on broadly reactive reverse transcription-polymerase chain reaction (rt-pcr) assays, following previously described procedures (chu et al., ; watanabe et al., ). land cover data were obtained from the globeland (glc ) service operated by the national geomatics center of china (ngcc, ). initial data were produced in and updated in . images were from landsat . the land cover map showed the main classes of land use as a synthesis of various material types, i.e. natural attributes and features on earth used for globeland (glc ) classification from × m multispectral satellite images. schemes were merged into six land use categories: crop land, forest, grassland, wetland, water bodies and settlement areas. the elevation map was obtained from the shuttle radar topography mission (srtm) of the consortium for spatial information (cgiar-csi). most bats were obtained from markets or directly from hunters and had been hunted in surrounding areas (lacroix et al., a, b). it was thus important to estimate the area of origin of these hunted bats. according to road connectivity and cost effectiveness, the distance between hunting and selling points was assumed to be less than km. a -km radius was in agreement with the ecology and potential flying distances of the collected bats (flemming and eby, ; kunz and fenton, ; iucn, ) and should therefore encompass the original environment of the hunted bats. in this study, this km ( km radius) buffer zone was designated as the "area of interest (aoi)". aois are shown in fig. . data were extracted using the patch analyst extension in the geographical information systems (gis) software arcmap . . .
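the extraction of land cover composition inside a circular aoi can be sketched in python as follows. this is not the patch analyst workflow used in the study; it is a simplified stand-in that assumes the land cover raster has already been flattened to a list of cell centres with a class label, that all cells cover the same area, and that great-circle (haversine) distance is an adequate measure at this scale. the function names and the `radius_km` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# One raster cell: (latitude, longitude, land cover class)
Cell = Tuple[float, float, str]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def land_cover_shares(site: Tuple[float, float], cells: Iterable[Cell],
                      radius_km: float) -> Dict[str, float]:
    """Share of each land cover class among cells whose centre lies inside the buffer."""
    counts = Counter(
        cls for lat, lon, cls in cells
        if haversine_km(site[0], site[1], lat, lon) <= radius_km
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}
```

the resulting shares per class (forest, cropland, settlement and so on) correspond to the percentage land cover variables that enter the later analyses.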
forest cover was examined more precisely in each aoi: (i) as a simple proportion of forest to each land cover type, (ii) using the shannon diversity index (sdi), (iii) using the fg fragmentation index, defined as the ratio of the surface of the forest to the edge length of the forest (fortin et al., ), and (iv) using the ed edge density index, which is the density of forest borders divided by the forest surface. administrative data were obtained from the gadm database of global administrative areas (version . , november ). main road networks in cambodia and lao pdr were obtained from the digital chart of the world (dcw) available on the diva-gis online resource (www.diva-gis.org) and google earth. the network density in each aoi was assessed by measuring the total road length and the number of intersections. the connectivity index expresses the density of road connections for each aoi from the total length of the main roads (l) and the number of intersections (i): connectivity = log (l × i). a hierarchical analysis, also known as cluster analysis, based on environmental parameters in each aoi was performed in order to describe the different types of habitats using the software r version . . (www.r-project.org). the principle of hierarchical analysis is to build a binary tree of data that successively merges similar groups of points. from a statistical point of view, each group is described by an index of similarity. parameters used for hierarchical analysis were: land cover [% forest cover, % settlement cover, % grassland, % surface water cover, % cropland cover and % wetland cover]; mean elevation; and connectivity. similarity is described by the distance on the y axis, in this work a euclidean distance. as a first step, data are grouped into the most similar pairs (expressed by a correlation index). as a second step, pairs are grouped into larger groups of similarity.
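The forest and road indices defined above can be written down compactly. The sketch below is one possible reading under stated assumptions, not the authors' implementation: the Shannon index is computed over cover shares with the natural logarithm, the edge density is taken as forest edge length divided by forest surface, and the logarithm in the connectivity index is assumed to be base 10 (the base is not recoverable from the text). All function names and input values are hypothetical.

```python
from math import log, log10


def shannon_diversity(shares):
    """Shannon diversity index H = -sum(p * ln p) over non-zero shares."""
    return -sum(p * log(p) for p in shares if p > 0)


def fragmentation_index(forest_area, forest_edge_length):
    """FG: forest surface divided by forest edge length, as defined in the text."""
    return forest_area / forest_edge_length


def edge_density(forest_area, forest_edge_length):
    """ED: forest edge length divided by forest surface (assumed reading)."""
    return forest_edge_length / forest_area


def connectivity_index(road_length, intersections):
    """Connectivity = log(L x I); base 10 is an assumption made here."""
    product = road_length * intersections
    if product <= 0:
        raise ValueError("road length and intersection count must be positive")
    return log10(product)


# Made-up values for a single AOI.
sdi = shannon_diversity([0.5, 0.25, 0.25])
fg = fragmentation_index(forest_area=40.0, forest_edge_length=80.0)
ed = edge_density(forest_area=40.0, forest_edge_length=80.0)
conn = connectivity_index(road_length=50.0, intersections=20)
```

Under these definitions FG and ED are reciprocals of each other, which is consistent with the strong correlation among the forest-related factors reported in the text.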
in this work, euclidean distances between and yielded independent clusters, and each cluster comprised to statistically similar aois. principal component analyses (pca) were performed using statistica v. . pca shows statistical, multidimensional links between several variables and is used to emphasize variations. pca requires statistically independent factors. the information is shown as a set of new orthogonal variables, the principal components (p). the pattern of similarity of the observed variables is displayed as points on a surface. in this case we used pca to assess the correspondence of spatial environment and anthropogenic factors with respect to (a) astrovirus and coronavirus found in bats, and (b) to spatial environment and anthropogenic factors. as a first step, all factors corresponding to spatial analysis and considered in each aoi were assessed for correlation using a correlation matrix. out of factors considered, were strongly correlated and dependent, i.e. forest, ed, sdi (r = . - . ). out of these three factors, sdi (shannon index) provided more complete information, integrating both forest cover and density. sdi was therefore retained for pca while forest and ed were excluded from this analysis. pca was therefore performed with environment-linked factors: sdi, fragmentation, connectivity, settlements, cropland, wetland, grassland, water, and elevation. pca was preceded by data normalization. finally, three sets of supplementary variables were investigated, i.e. (i) bats sampled in each aoi, (ii) coronavirus isolated from sampled bats, and (iii) astrovirus isolated from sampled bats. a total of bats were sampled, in cambodia and in lao pdr, at the interface between humans and bats (table , figs. , ). for animals collected in the environment and in guano farms, the exact environmental description of the sampling location was available.
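The screening step described above for the PCA inputs (a correlation matrix to find dependent factors, followed by normalization) can be sketched as follows. This is an illustration only: the threshold of 0.9, the toy values and the function names are hypothetical, and z-score scaling is assumed because the text does not state which normalization was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def correlated_pairs(table, threshold):
    """List factor pairs whose absolute correlation reaches the threshold."""
    names = sorted(table)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(table[first], table[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs


def zscore(values):
    """Centre on the mean and scale by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Made-up factor values for four AOIs.
factors = {
    "forest": [10.0, 20.0, 30.0, 40.0],
    "ed": [1.0, 2.1, 2.9, 4.2],
    "elevation": [300.0, 100.0, 400.0, 200.0],
}
pairs = correlated_pairs(factors, threshold=0.9)  # threshold is illustrative
normalized_forest = zscore(factors["forest"])
```

In this toy table only the forest and ed columns are flagged as strongly correlated, so one of them would be dropped before the remaining factors are normalized and passed to the PCA.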
for samples collected in restaurants and meat shops or directly from hunters, the exact original location of the animals remained unknown, but they were hunted for market trade near the selling place, i.e., mostly in areas at the border of deep forests, mixed agricultural zones with sparse forests or protected forest areas, close to water surfaces, or in limestone karst areas with mountain forests (fig. ). most investigated areas, i.e. % ( sites), were located in lowland areas m including location under m, in cambodia. aoi were located between m and m, mostly in north cambodia and lao pdr (mostly river valleys), whereas aoi were located above m in northern lao pdr. aois were described by nine statistically independent parameters, i.e. water cover, settlements cover, cropland cover, wetland cover, grassland cover, elevation, forest shannon index, forest fragmentation index and connectivity index (table ). hierarchical analysis based on these parameters resulted in clusters of landscape parametrization (fig. ). the main parameters for cluster characterization were: grass cover, cropland, shannon index, fragmentation index and connectivity index. cluster a included aois displaying medium to high elevation, forest percentage of ca. - %, low fragmentation, low water and wetland percentage and high connectivity index ( . %- . %). only one aoi ( ) was negative for viruses. aoi , , , and were positive for coronavirus while only two sites ( and ) were positive for astrovirus. cluster b comprised aois ( , , ), including four sites positive for at least one virus. in these areas, forest cover ( - %) and shannon index were higher than for cluster a, and the human settlements cover was small or not detected at a × m resolution, indicating that villages were small and scattered.
both clusters a and b were located in lao pdr. euclidean distance indicated that clusters a and b segregated independently from the other clusters (fig. ). these remaining clusters were characterized by low cropland cover, low human settlement cover and high shannon index. clusters c, d, e and f corresponded to aois located in cambodia. the cluster pairs c & d and e & f were the most similar. cluster c included aois, in only two of which ( , ) coronaviruses were detected, while no astrovirus was present. the landscape corresponded to to % of forest cover, . to % of grass cover and a similar elevation ( m above sea level). for both aoi and , the connectivity index was high whereas the human settlement cover was less than . %, indicating that villages were small and scattered. cluster d comprised aois ( - ) displaying cultivated areas with high cropland cover and one of the highest settlement covers ( . % for aoi ). three aois ( , , ) were positive for astrovirus, including aoi , which displayed a high wetland cover and one of the highest cropland covers. cluster e covered aois ( - ) dominated by forest ( - %), with the lowest connectivity index, one of the lowest forest fragmentation indices and limited cultivated and grassland areas. the shannon index was also among the lowest. cluster e represented wilderness parts of cambodia. cluster e also displayed the highest virus richness, i.e. coronavirus sub-clusters and astrovirus sub-clusters. the last cluster, f, comprised aois ( - ), out of which were positive for viruses ( , , , , , ). all displayed mostly cultivated land with patches of forest, and the highest connectivity index and wetland cover, corresponding to highly transformed environments. the human settlement cover was rather low ( . - . %), indicating that villages were concentrated in clusters. cluster f also displayed the highest rate of coronavirus detection ( , including αcov_ in aoi ) and astrovirus detection ( bat_astvs, located only in aois and ).
distances indicated that clusters e and f were the most divergent (fig. ). another conclusion from this analysis is the higher bat biodiversity observed in cambodia, associated with a more anthropized environment than in lao pdr, where dense forest was predominant (fig. ). pca were performed with the nine selected active independent variables (i.e. water cover, settlements cover, cropland cover, wetland cover, grassland cover, elevation, forest shannon index (sdi), forest fragmentation index and connectivity index) for coronavirus subclusters (fig. a) and for astrovirus subclusters (fig. b). [table and figure notes: astroviruses from the clusters , , , , , , including only bat astvs (lacroix et al., b); supplementary variables are shown in italic.] owing to the fact that only aois out of harbored bats positive for both viruses, the pca relating to coronaviruses and astroviruses differed slightly. although they were established with the same parameters, the positive sites differed, so the values associated with these parameters were also slightly different. with respect to the coronavirus-positive bats (fig. a), the first axis of the pca (pc ) accounted for . % of the total variability while the second axis (pc ) accounted for %, for a total of . % of the variability. the contributions of the environmental variables to the construction of this axis were the highest for elevation, sdi and grassland, which were correlated on the negative side, and for connectivity, forest fragmentation, cropland cover, water cover, wetland cover and settlement cover, which were distributed on the positive side (fig. a). the principal dimension of variability of the pca opposed anthropogenically transformed lands on the positive side to more natural environments (i.e. forest, higher elevation areas and grassland areas, with low fragmentation) on the negative side (fig. a).
the second component of the pca (pc ) accounted for % of the total variability. fragmentation index, settlement cover and connectivity index were close to the axis and did not participate to the topology of the pca. grassland cover, water cover bodies, wetland cover, elevation and cropland cover participated the most to topology on the negative side. sdi was located on the positive side with a more limited weight (fig. a) . when considering the pca representing the astrovirus-positive aois, contributions were slightly different (fig. b) . the first two axes contributed for a total of . % with axis representing . % of the dispersion and axis representing . %. the distribution with respect to axis was not affected and remained the same as for coronavirus-positive aois. the axis displayed however a slightly different topology (fig. b) . sdi moved to the negative side but very close to axis and therefore did not bear any weight in the analysis. cropland cover remained in the negative side but closer to axis . settlements cover, connectivity index and fragmentation index moved to the positive side to be moderately distant from the axis and gaining thus in representativeness (fig. b) . the betacoronaviruses βcov_d and βcov_d were strongly associated with natural habitats and in particular with grassland and high elevation (fig. a) . the betacoronavirus clusters βcov_d and, to a lower extent, βcov_d and βcov_c, seemed to be more correlated with the forest density gradient. βcov-d was the most influenced by higher forest density (fig. a) . the alphacoronavirus αcov_ and αcov_ were strongly associated to anthropogenic environments with wetlands, whereas αcov_ was in contrary slightly more associated with more natural habitats (fig. b) . astroviruses also displayed a clear differential distribution associated with environmental parameters. 
the un-g_astv cluster was strongly associated to natural environments, in particular grassland and higher elevation, whereas the mur_astv cluster was strongly associated to an anthropogenic habitat (fig. b) . the ba-t_astv cluster was clearly associated to open areas close to water and wetlands (fig. b) . bat genera displayed a strong correspondence with specific environmental parameters (fig. ) . the two main axes of the pca represented a total of . % of inertia, with % and . % for axis and axis , respectively. the genus myotis was strongly correlated with anthropogenic environments, water areas and wetlands. the distribution of scotophilus was similarly influenced by forest fragmentation, water areas and anthropogenic environments. conversely, the genera hipposideros, eonycteris, ia and rhinolophus were correlated with more natural habitats comprising dense forests and grassland area at higher elevation. this correlation was stronger for the former two. the genus rousettus was close to the center, indicating the distribution of this genus was not strongly influenced by the parameters. megaderma and taphozous were strongly opposed to all environmental parameters analyzed indicating that their distribution was influenced by a distinct parameter not considered in this work. this work demonstrates both a clear distribution of bats and batborne viruses depending on environmental factors. for example, the lesser asian bat (scotophilus) shows correlation with the anthropogenic, fragmented environments, close to water areas. the mouse-eared bat (myotis) also tends to be collected in anthropogenic areas around lakes or biggest regional river corridors regularly flooded. these observations are in accordance with their biology and ecology: myotis and scotophilus can adapt to many habitats but preferentially settle in anthropogenic habitats, in old buildings or crevasses. 
they rely on water bodies, which attract prey for these insectivorous bats (bates et al., b; rosell-ambal et al., ). scotophilus bats are also reared in farms for their guano (chhay, ). the presence of hipposideros, macroglossus and rhinolophus in environments with forest areas mixed with other categories of land cover (mosaic landscape) is also in accordance with their ecology (bates et al., a; hutson et al., ; walston et al., ). as a consequence, the two coronavirus sub-clusters αcov_ and αcov_ , detected specifically in scotophilus and myotis respectively (lacroix et al., a), displayed the same trend, i.e. a high correlation with anthropogenic environment. fruit and forest-dependent bat genera testing positive for viruses (rousettus, eonycteris) were correlated with an environment mostly consisting of densely forested areas at a higher elevation. the betacoronavirus clusters βcov_d and βcov_d , which were found mostly in bats from the genera rousettus and eonycteris, also correlated with the same type of environment. finding coronavirus sub-clusters specific to a given bat genus in similar environment types indicates that the distribution of viruses is likely to follow the distribution of their hosts. the highest diversity of bats, observed in anthropized environments, further emphasizes the risk associated with environmental changes. deforestation and anthropization, instead of leading to the elimination of bats, conversely lead to a higher diversity. this might be explained by the complexity of the anthropized environments, which offer opportunities to different groups of ubiquitous bat species, whereas natural environments might be more selective and suited for species with stricter ecological requirements. since this higher biodiversity occurs by definition at the human interface, the risk of virus transmission increases as well.
particular attention has to be paid to the sub-clusters αcov_ and βcov_c. these sub-clusters are composed of viruses which are highly pathogenic for pigs and humans, respectively. the sub-cluster αcov_ comprises the coronaviruses responsible for porcine epidemic diarrhea, which threatens pig farming activities around the world (song et al., ; lacroix et al., a). βcov_c comprises the zoonotic strain of coronavirus responsible for the middle east respiratory syndrome (mers) (omrani et al., ). the sub-cluster αcov_ and lineage βcov_c are linked to anthropogenic habitats, in accordance with the presence of their hosts (myotis and pipistrellus, respectively) in the same environment (rosell-ambal et al., ). the correspondence with host habitat is even more evident with astroviruses. ungulate astroviruses, found in ungulates and bats, are associated with grasslands at higher elevation, while murine astroviruses, found in rodents and bats, are associated with human settlements, indicating that the host bats are present in these same habitats. bat astroviruses, found only in bats, are associated with water surfaces and irrigated crops. this could be explained by the fact that irrigated crops provide an important insect biomass and food resources for insectivorous bat populations, which mainly harbor arboviruses (kunz and fenton, ; lacroix et al., b). fragmentation of the habitat has been reported as a factor of major importance in the spatial distribution of species. the distribution pattern of bat populations has also been found to impact the viral richness of the host (henle et al., ; meyer et al., ; gay et al., ). the fragmentation of bat populations was linked to a decrease in viral species richness (gay et al., ).
however, this work indicates that if fragmentation is an important parameter, it is not sufficient by itself to explain the distribution and must be considered jointly with the shannon diversity index (sdi index) to yield a picture accurate enough. in this work, combining these parameters allowed us to clearly associate certain clusters of viruses to given environments and this is to our knowledge, the first time it is achieved in sea. this correlation is a direct consequence of the association between bats hosting these viruses and specific environments. this work should thus be accompanied by the modeling of the modification of landscapes (jung and threlfall, ) over the next decades in order to assess and predict which part of the current biodiversity may increase or be at risk. from a methodological standpoint, large sampling was not possible because of the status of protected species attached to the reservoir (iucn, ). on another hand, avoiding the sampling of protected reservoirs for the sake of conservation and thus avoiding early detection and risk assessment is putting human populations at risk and contrary to public health policy (morse, ; lipkin and anthony, ) . in this work we considered both aspects at once. the limited sample size and limited quality of the bat material are a direct consequence of their status of protected species which was respected and enforced. indeed, the bat sample collection was opportunistic, and mostly based on animals hunted by local populations (lacroix et al., a (lacroix et al., , b . one of the consequences was the difficulty to reach the level of species identification but this could be easily achieved in the future by implementing procedures integrating molecular typing of bats (clare et al., ; korstian et al., ) . 
however, if the species level could not be reached with certainty in a significant part of the samples, except for ia io, pipistrellus coromandra and megaerops niphanae, the ecology of the species present in south east asia was similar within a given genus. the iucn red list of threatened species provides the basis for a comparative analysis (iucn, ). scotophilus species found in south east asia are reported to roost in crevices, cracks in walls, or roof of old buildings as well as leaves and crowns of palms, hollows of trees and among leaves of banana. this is in conformity with the analysis of the distribution of scotophilus reported in this work as being influenced by forest fragmentation, water areas and anthropogenic environments. the genus myotis was found in this work to strongly correlate with anthropogenic environments, water areas and wetlands. the ecology of myotis species in south east asia is matching this description (iucn, ). m. annectans was found on a river valley, m. horsfieldii and m. ater are found near to water source and streams in lowland forest as well as disturbed forest and agricultural areas. m. rosseti is found in disturbed areas while m. siligorensis has been collected in lowland second growth forests over streams. m. pilosus is a fish-eating bat and therefore strictly dependent on water. we reported in this work that the genera hipposideros, eonycteris, ia and rhinolophus as correlating with more natural habitats comprising dense forests and grassland area at higher elevation. hipposideros scutinares is known only from caves in limestone areas while h. cineraceus is roosting in hollows of trees in forests in south east asia. h. galeritus is found in lowland forests as well as rubber plantations in southeast asia (iucn, ) . h. pomona ecology is not well known but in southeast asia it roosts essentially in caves. similarly not much is known about h. rotalis which is considered to live in dry forests (iucn, ) . 
eonycteris spelaea is found in caves in forested areas. however, it is a forest nectar eating bat which has adapted to agricultural and orchard crops (iucn, ) . with respect to rhinolophus, similar features are observed. r. coelophyllus is found in forest, r. lepidus is associated with intact lowland tropical moist forest. r. malayanus is found in caves in secondary forest and degraded habitat. r. paradoxolophus was found in dry pine forest, while r. affinis is living in primary and secondary forest (iucn, ) . the two rousettus species described in the iucn database as present in lao pdr and cambodia, rousettus amplexicaudatus and rousettus leschenaultia, share the same ecology and roost in caves, old and ruined buildings and disused tunnels in forest, agricultural areas, disturbed habitats and at the forest edge. this matches the position described for rousettus in this work. finally, with respect to taphozous theobaldi and taphozous melanopogon, they both share the same forest habitat with roosting in caves or abandoned buildings and mines (iucn, ) . this short review of the ecology of the bats genera described in this work indicates that the species of a given genus share the same ecological traits and therefore that the lack of identification at the species is not impairing the conclusions on correlation between batborne viruses and environmental patterns. another potentially biasing aspect is that the estimated distance to the actual origin of the sampled bats was based on an assumption. this could have been underestimated or overestimated for some collection sites. each bat genus is likely to present specificities in terms of migrations, flying distance and how they explore their environment (flights during days or nights, occurrences of flights for food search, etc.) (smith et al., ) . one must also take into account if species associated to similar environment shelter together. 
this will influence the presence and prevalence of bat-borne viruses which is not only a consequence of environmental change but also of ethology and ecology of bats. this can thus influence the occurrence of contacts with humans and should be further investigated in order to consider the specificity of each bat genus. however, this problem has been overcome with the definition of aoi considering both economic and ecological aspects. another key methodological aspect is the resolution and quality of spatial data. this resolution must be adapted to the observed differing density and a × m resolution seems to be a good compromise between quality, definition, data access and calculation burden. therefore, despite this limitation, the statistical approach implemented in this work allowed us to obtain the information needed on association with virus and environment. this can help designing prospective risk scenarios based on the expected evolution of landscape. this brings us to the last aspect which is the possibility to model and develop scenarios of risk management. the prediction of landscape modification, climate change and anthropogenic behaviors (urbanization, roads, development of croplands, animal husbandry, etc.) can be achieved by economic planning and using gis analysis. the association between environmental patterns and viral sub-clusters described in this work will therefore permit to develop a typology of situations at risk depending on (i) the virus involved, (ii) on the evolution of risk of contact, and (iii) the risk of emergence in the different situations which could arise. preventive and protective actions could therefore be proposed to reduce this risk and hinder the evolution and development of hazardous contexts. this work is a preliminary step and deeper analyses and modeling must be further investigated. 
nevertheless it provides opportunities for developing prospective approaches capable of managing protection of bats and wildlife while protecting human populations from potentially devastating emerging bat-borne viral diseases.
supplementary data to this article can be found online at https:// doi.org/ . /j.meegid. . . .
determination of deforestation rates of the world's humid tropical forests
hipposideros armiger
scotophilus kuhlii. iucn red list threat. species
bats: important reservoir hosts of emerging viruses
cambodian bats: a review of farming practices and economic value of lesser asiatic yellow house bat scotophilus kuhlii (leach, ), in kandal and takeo provinces, cambodia. camb
novel astroviruses in insectivorous bats
nipah virus: a recently emergent deadly paramyxovirus
neotropical bats: estimating species diversity with dna barcodes
drivers affecting forest change in the greater mekong subregion (gms): an overview. fao, usaid and leaf
deforestation driven by urban population growth and agricultural trade in the twenty-first century
bats and emerging zoonoses: henipaviruses and sars
ecology of bat migration
changes in land use in south and southeast asia from to : a data base prepared as part of a coordinated research program on carbon fluxes in the tropics
forest area (% of land area
species' geographic ranges and distributional limits: pattern analysis and statistical issues
parasite and viral species richness of southeast asian bats: fragmentation of area distribution matters
bats as reservoirs of severe emerging infectious diseases
predictors of species sensitivity to fragmentation
macroglossus sobrinus. iucn red list threat. species
iucn red list of threatened species. version . < www.iucnredlist
global trends in emerging infectious diseases
bats in the anthropocene: conservation of bats in a changing world
research priorities for bat conservation in southeast asia: a consensus approach
using dna barcoding to improve bat carcass identification at wind farms in the united states
bat ecology
genetic diversity of coronaviruses in bats in lao pdr and cambodia
diversity of bat astroviruses in lao pdr and cambodia
virus hunting
ecological correlates of vulnerability to fragmentation in neotropical bats
bats in the anthropocene: conservation of bats in a changing world
public health surveillance and infectious disease detection
meter global land cover dataset
antibodies to nipah-like virus in bats
middle east respiratory syndrome coronavirus (mers-cov): animal to human interaction
isolation of kaeng khoi virus from dead chaerephon plicata bats in cambodia
serologic evidence of lyssavirus infection in bats
nipah virus in lyle's flying foxes
a new virus, phnom-penh bat virus, isolated in cambodia from a short-nosed fruit bat, cynopterus brachyotis angulatus
atlas of cambodia. maps on socio-economic development and environment. save cambodia's wildlife
atlas of laos: the spatial structures of economic and social development of the lao people's democratic republic
landscape size affects the relative importance of habitat amount, habitat fragmentation, and matrix quality on forest birds
southeast asian biodiversity: an impending disaster
porcine epidemic diarrhea: a review of current epidemiology and available vaccines
forest cover change in southeast asia-the regional pattern
rhinolophus affinis. red list threat. species
review of bats and sars
world development indicators: poverty & equity
ecosystems in the greater mekong: past trends, current status, possible futures. world wide fund for nature
a. lacroix was supported by the usaid emerging pandemic threats predict project (cooperative agreement number ghn-a-oo- - - ).
we are very grateful to philippe dussart, institut pasteur in cambodia, for his support. philippe buchy is currently an employee of gsk vaccines.
key: cord- - eatplc authors: wang, yongjin; shi, huiling; rigolet, pascal; wu, nannan; zhu, lichen; xi, xu-guang; vabret, astrid; wang, xiaoming; wang, tianhou title: nsp proteins of group i and sars coronaviruses share structural and functional similarities date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: eatplc
the nsp protein of the highly pathogenic sars coronavirus suppresses host protein synthesis, including genes involved in the innate immune system. a bioinformatic analysis revealed that the nsp proteins of group i and sars coronaviruses have similar structures. nsp proteins of group i coronaviruses interacted with host ribosomal s subunit and did not inhibit irf- activation. however, synthesis of host immune and non-immune proteins was inhibited by nsp proteins at both transcriptional and translational levels, similar to sars coronavirus nsp . these results indicate that different coronaviruses might employ the same nsp mechanism to antagonize host innate immunity and cell proliferation. however, nsp may not be the key determinant of viral pathogenicity, or the factor used by the sars coronavirus to evade host innate immunity.
viral infections are sensed by the host innate immune system through toll-like receptors and retinoic acid-inducible gene-i-like receptors, which trigger innate immune signal transduction, leading to production of type i interferons (ifns), and hundreds of proinflammatory cytokines that suppress viral spread (kawai and akira, ). in return, viruses have evolved at least three mechanisms to evade or antagonize host innate immunity. one is blocking ifn induction.
host innate immunity is initiated by recognition of viral rna by cellular rna sensors, which in turn interact with the adaptor ips- to activate the ifn transcription factors irf- and nf-κb through the kinases tbk- /ikki and ikkα/β (kawai and akira, ). many viruses encode proteins to block the ifn induction pathway. examples include the influenza a virus ns protein (lu et al., ; garcia-sastre, ), the reovirus sigma protein (jacobs and langland, ), the ebola virus vp protein (cardenas et al., ), the poxvirus e l protein (xiang et al., ), the herpes simplex virus us protein (poppers et al., ) and the murine cytomegalovirus m and m proteins (valchanova et al., ). the second mechanism is interfering with ifn-activated signaling, mainly by interacting with the janus kinases jak- and tyk- or with stat- and stat- to block the jak-stat signaling pathway. examples include the ebola virus vp protein (reid et al., ), the paramyxovirus c and v proteins (didcock et al., ; gotoh et al., ), and the rabies virus p protein (brzozka et al., ). the third mechanism is inhibiting the specific antiviral proteins that mediate the antiviral state. viral dsrna-binding proteins are mostly studied for their capability of preventing activation of pkr or the - oas/rnase l system, as demonstrated by the reovirus sigma protein (imani and jacobs, ), the herpesvirus us protein (poppers et al., ), the poxvirus e l protein (langland and jacobs, ), the cytomegalovirus dsrna-binding proteins (hakki and geballe, ), and the influenza virus ns protein (min and krug, ). coronaviruses are important human and animal pathogens that are divided into three groups based on serological criteria. the group ii coronaviruses severe acute respiratory syndrome coronavirus (sars-cov) and mouse hepatitis coronavirus (mhv) encode a number of proteins that antagonize host innate immunity. the orf b and orf proteins inhibit both ifn synthesis and signaling (kopecky-bromberg et al., ).
the nucleocapsid protein inhibits the nf-κb promoter and ifn synthesis (kopecky-bromberg et al., ; ye et al., ) . the papain-like protease interacts with irf- and inhibits its phosphorylation and nuclear translocation (devaraj et al., ) . the a protein causes endoplasmic reticulum stress, and antagonizes ifn responses and innate immunity (minakshi et al., ) . finally, the m protein associates with rig-i, tbk , ikkepsilon, and traf to inhibit the activation of irf- /irf- transcription factors (siu et al., ) . nsp is another intensively studied viral protein. the nsp proteins of sars-cov and mhv promote host mrna degradation and suppress host gene expression (kamitani et al., ; zust et al., ; narayanan et al., ) . nuclear magnetic resonance (nmr) analysis revealed that sars-cov nsp has a novel β-barrel structure mixed with α-helices (almeida et al., ) . more recently, sars-cov nsp was found to block host translational machinery function by binding to the ribosome small subunit (kamitani et al., ) . 
these studies suggest that coronavirus nsp is a major virulence and pathogenicity factor (kamitani et al., ; wathelet et al., ; zust et al., ) . little is known about the biological function of group i coronavirus nsp proteins. in this study, we conducted a comparative study of nsp proteins from group i and group ii coronaviruses. we compared models for nsp proteins of hcov- e and hcov-nl with models of sars-cov. we analyzed the interaction between nsp proteins and cellular proteins by coimmunoprecipitation (co-ip). finally, we systematically investigated the mechanism by which group i coronavirus nsp proteins suppress host protein synthesis, including in the innate immune system.
the cell line was maintained at c in a % co incubator in dulbecco's modified eagle medium (dmem; gibco) supplemented with % fetal bovine serum (excell bio, new zealand). newcastle disease virus (ndv) was harvested from chicken embryos. plasmids pcdna-sarsnsp , pcdna- ensp and pcdna-nl nsp were constructed by reverse-transcription pcr amplification of nsp genes from sars-cov strain tor , hcov- e strain atcc vr- and hcov-nl strain amsterdam i genomic rna, followed by insertion into the pcdna . vector (invitrogen, carlsbad, ca) under the control of the human cytomegalovirus (cmv) promoter. an ha tag was added at the c-terminus of all constructs. plasmids pgl . , pgl . and pgl . , which contain renilla luciferase reporter genes driven by the sv , herpes simplex virus thymidine kinase (hsv-tk) and cmv promoters, respectively, were from promega (madison, wi). plasmids pgl -hifnb-p and pgl -hisg -p were constructed by pcr amplification of the human ifn-β gene promoter region (− to + ) or the human interferon stimulated gene (isg ) promoter region (− to + ) from genomic dna, and cloned into pgl . (promega). 
structures for the nsp proteins of hcov- e and hcov-nl were computed with modeller software (marti-renom et al., ) using the solution structure of nsp - from sars-cov (pdb code hsx) as a template structure (almeida et al., ) . sequence alignment of the nsp proteins from hcov- e, hcov-nl and sars-cov was performed with the program clustalw and refined manually. cells in six-well plates were transfected with µg of nsp -expressing plasmids or pcdna . control plasmid (with ha tag) using lipofectamine (invitrogen). cells were harvested h after transfection and washed three times with phosphate-buffered saline (pbs). half the cells were treated with native lysis buffer (promega) for min on ice and centrifuged, and the supernatant was used for immunoprecipitation using anti-ha antibody-coupled agarose beads (pierce) according to the manufacturer's instructions. eluted proteins were separated by % sds-page followed by immunoblotting analysis using anti-s polyclonal antibodies (genscript). proteins from the other aliquot of cells were separated by % sds-page followed by immunoblotting using anti-ha or anti-β-actin antibodies (abcam, hong kong). transfection of cells was as above for h, followed by ndv infection for h. ifn-β levels in cell supernatants were measured using a human ifn-β enzyme-linked immunosorbent assay (elisa) kit according to the manufacturer's instructions (pbl interferonsource, nj). cells were washed three times with pbs. proteins were separated by % sds-page and immunoblotted with anti-irf- or anti-phospho-irf- (ser ) antibodies (cell signaling, beverly, ma). transfection of cells was as above for h, followed by ndv infection for h. alternatively, cells were cotransfected with nsp -expressing plasmids or pcdna . control plasmid together with pgl -hifnb-p or pgl . for h. cell rna was extracted using a cell total rna isolation kit (axygen), followed by dnase i (fermentas) treatment for min at c to remove genomic dna. 
rna was reverse transcribed using a primescript first-strand cdna synthesis kit (takara, dalian, china), and sybr green-based quantitative real-time pcr was performed in a cfx machine (bio-rad) using a perfect real-time pcr kit (takara). primers are listed in table . lipofectamine (invitrogen) was used to transfect cells with plasmids pgl -hifnb-p or pgl -hisg -p, and stable cell lines were selected with . mg/ml g (invitrogen) and subcloned. the stable cell lines were transfected with nsp -expressing plasmids for h and then challenged with ndv for h, or were co-transfected with nsp -expressing plasmids and poly (i:c) (sigma) for h. cells in microplates were disrupted with ml of passive lysis buffer (promega), and firefly luciferase activity was measured using a dual-luciferase reporter assay system (promega) and a synergy hybrid multi-mode microplate reader (biotek, winooski). alternatively, normal cells were cotransfected with nsp -expressing plasmids and pgl . , pgl . or pgl . for h, and renilla luciferase activity was measured. data were analyzed with the paired student's t-test assuming that the values followed a gaussian distribution. a p-value of < . was considered significant. the three-dimensional structure of group i coronavirus nsp has not been previously solved. to evaluate the general fold of these proteins, we built models for nsp proteins of hcov- e and hcov-nl based on the solution structure of nsp - from sars-cov (almeida et al., ) . sequence alignment showed only % identity and % similarity between the hcov- e and sars-cov nsp proteins. similarly, the nsp proteins of hcov-nl and sars-cov share only % identity and % similarity (fig. a) . nevertheless, a careful inspection of several nsp regions revealed conserved hydrophobic clusters counterbalancing the regions of low identity and low similarity. 
thus, % of the hydrophobic residues of hcov- e nsp , and % of the hydrophobic residues of hcov-nl nsp are conserved, or replaced by other hydrophobic residues, in the sars-cov nsp (fig. a) . these conserved or similar hydrophobic residues are not randomly positioned. most are found within, or very close to the regular secondary structure elements of nsp - of sars-cov (fig. a) . moreover, the hydrophobic residues crucial for the original β-barrel of sars-cov nsp - (almeida et al., ) are conserved or replaced by other hydrophobic residues in the hcov- e and hcov-nl nsp proteins (fig. a and c) . the hydrophobic cluster analysis identified stronger signatures of the nsp fold, centered on the secondary structure elements. modeling of the hcov- e and hcov-nl nsp proteins led to three-dimensional structures that conserved the regular secondary structure elements of the unique β-barrel of sars-cov nsp - , while the loops linking the secondary structures were less similar (fig. b) . intriguingly, hcov- e nsp contains five cysteine residues, four of which are potentially exposed to solvent. taking into account the distances separating the cysteines, formation of disulfide bridges is unlikely. taken together, these findings reinforce the hypothesis that hcov- e and hcov-nl nsp proteins fold in a manner similar to sars-cov nsp - , and suggest that these proteins have similar structural and functional relationships. sars-cov nsp suppresses host cell protein synthesis by binding to the host cell ribosomal s subunit (kamitani et al., ). based on the bioinformatics results, we examined whether nsp proteins of group i coronaviruses also interacted with the ribosomal proteins. ha-tagged nsp proteins of hcov- e, hcov-nl and sars-cov were expressed in cells, and immunoprecipitated with anti-ha antibody. immunoblotting with anti-s antibody detected a band of ~ kda from all three nsp protein-expressing cells, which corresponds to the predicted size for the ribosomal protein s . 
the nsp proteins were detected using anti-ha antibody. the s and nsp proteins were not detected in mock-transfected cells. as controls, β-actin was detected in all cell lysates (fig. a) . we next examined whether the activation of the ifn transcriptional factor irf- was inhibited by nsp proteins. innate immune signal transduction was stimulated by ndv infection in cells transfected with plasmids expressing nsp from hcov- e, hcov-nl or sars-cov, or with a control plasmid. immunoblotting with anti-phospho-irf- (ser ) antibody showed a consistent, homogeneous band for the phosphorylated irf- protein in cells expressing the three nsp proteins, and in mock-transfected cells, but not in negative control cells that were not stimulated with ndv. moreover, no obvious inhibitory effect was observed in nsp -transfected cells compared to the mock-transfected cells. expression of irf- and β-actin proteins was consistently detected in all cell lysates (fig. b). we further investigated whether group i coronavirus nsp proteins inhibit ifn-β and luciferase transcription. nsp -expressing plasmids were transfected into cells, and ifn-β or luciferase transcription was stimulated by ndv or by ifn-β promoter- or cmv promoter-driven plasmids, followed by quantification of ifn-β mrna by real-time pcr. as shown in fig. , the mrna levels for cell ifn-β, ifn-β promoter-driven firefly luciferase or cmv promoter-driven renilla luciferase were inhibited by . - dct values by nsp proteins from hcov- e, hcov-nl and sars-cov. suppression of host protein synthesis by sars-cov nsp protein, including in the innate immune system, has been seen in several studies (kamitani et al., ; zust et al., ; narayanan et al., ) . since the group i coronavirus nsp proteins also interact with the cellular translational machinery, we examined the influence of hcov- e and hcov-nl nsp proteins on host immune and non-immune protein synthesis. 
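As a reading aid for the dct differences reported for the real-time PCR data, the following minimal Python sketch converts a shift in threshold cycle into an approximate fold reduction in template. It assumes ideal amplification (template doubles every cycle, i.e. an efficiency base of 2.0); the function name `fold_reduction` and the example values are hypothetical and are not taken from the study, which does not state its normalization scheme here.

```python
def fold_reduction(delta_ct: float, efficiency: float = 2.0) -> float:
    """Approximate fold reduction in template implied by a Ct shift.

    delta_ct   -- Ct(treated) - Ct(control), in cycles (hypothetical input)
    efficiency -- amplification factor per cycle; 2.0 assumes perfect doubling
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    # Each extra cycle needed to reach threshold means `efficiency`-fold less template.
    return efficiency ** delta_ct


if __name__ == "__main__":
    # Illustrative values only: Ct shifts of 1, 2 and 3 cycles.
    for d in (1.0, 2.0, 3.0):
        print(f"dCt = {d:.1f} cycles -> about {fold_reduction(d):.0f}-fold less mRNA")
```

Under this assumption a shift of a few cycles corresponds to a single-digit fold change, which is why modest dct differences can accompany larger differences at the protein level.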
luciferase reporter assays showed that synthesis of genes driven by the innate immune promoters ifn-β and isg was suppressed - -fold in hcov- e and hcov-nl nsp -expressing cells (fig. a) . synthesis of non-immune promoter-driven genes, including those driven by the sv , hsv-tk and cmv promoters, was inhibited to a similar extent by the two group i coronavirus nsp proteins (fig. b) . in contrast, sars-cov nsp suppressed promoter activity by only - -fold in all assays (fig. ) . titration of the released ifn-β protein in cell supernatant by elisa revealed similar results. the ifn-β levels in coronavirus nsp -expressing cells were - -fold lower than in control cells (fig. c) . studies on nsp proteins of the group ii coronaviruses sars-cov and mhv revealed a novel mechanism for interaction between host and viral proteins. group i coronaviruses also encode nsp proteins of only about amino acids, with relatively high conservation within the group, but with a high degree of polymorphism with sars-cov nsp , which has amino acids. however, the nmr analysis revealed that the segment from residue to of sars-cov nsp determined its core structure (almeida et al., ) , and this corresponded to the overall length of the group i coronavirus nsp proteins. therefore, we built models for the nsp proteins of hcov- e and hcov-nl , two low-pathogenic group i human coronaviruses, for comparison with nsp from the highly pathogenic sars-cov. careful inspection of the sequence alignment showed that most of the structurally important hydrophobic residues are conserved or similar for nsp proteins from hcov- e, hcov-nl and sars-cov. computational modeling demonstrated overall similarity in the three-dimensional nsp structures from the two group i coronaviruses and sars-cov. the bioinformatic analysis suggests that group i coronavirus and sars-cov nsp proteins might have similar biological functions. 
recent studies revealed that sars-cov nsp suppresses host protein synthesis by interacting with the ribosomal s subunit (kamitani et al., ) . in this study, we demonstrated that the group i coronavirus nsp proteins also bound to the ribosomal s subunit, as shown by co-ip. further analysis showed that activation of irf- was not affected, but synthesis of host immune and non-immune proteins was potently suppressed by group i coronavirus nsp proteins. synthesis of these proteins appeared to be more strongly inhibited at the translational level than at the transcriptional level. these results indicate that group i coronaviruses have evolved a mechanism strikingly similar to that of sars-cov for antagonizing host cell proliferation and innate immunity using nsp . our results are generally consistent with those of kamitani et al. ( ) , who first reported that sars-cov nsp inhibited ifn-β mrna transcription and ifn-β promoter-driven luciferase protein synthesis. however, in our study, the inhibition of ifn-β mrna transcription and luciferase protein synthesis was weaker than that reported by kamitani et al. ( ) , most probably due to different expression levels of nsp . in their study, the sars-cov nsp was driven by a chicken β-actin/rabbit β-globin hybrid promoter (ag promoter) in the pcaggs vector, which is known for its robust expression in eukaryotic cells. in this study, nsp was driven by a cmv promoter, which was probably less efficient than the hybrid ag promoter. most human coronaviruses are low-pathogenic viruses that often cause mild lower respiratory tract infections such as the common cold (thiel and weber, ) . however, sars-cov infection appears highly pathogenic and causes severe pneumonia and acute respiratory distress syndrome, characterized by the presence of diffuse alveolar damage (weiss and navas-martin, ) . 
surprisingly, few viral particles are isolated from lung tissues of sars-cov infected patients, but levels of inflammatory cytokines and chemokines are greatly elevated in the lung. a hyperinflammatory response is presumed to be the key determinant of the high pathogenicity of sars-cov, rather than rapid viral spread (de lang et al., ) . the specific mechanism for sars-cov pathogenicity is not known, but a number of virus-encoded proteins may be involved (weiss and navas-martin, ) . the nsp protein of sars-cov is defined as a major pathogenicity factor (kamitani et al., ; wathelet et al., ) . however, our results showed that the nsp proteins from low-pathogenic coronaviruses suppressed host protein synthesis more strongly than nsp from sars-cov, indicating that the nsp protein is actually a virulence factor for facilitating viral spread, but is not a major determinant in coronavirus pathogenicity. although an engineered sars-cov with a mutated nsp was greatly attenuated in an animal model (wathelet et al., ) , caution should be taken in developing this mutant virus as a human vaccine because of the pathogenicity factors still present in the virion. sars-cov and mhv fail to induce, or induce only weak and delayed, innate immune responses (spiegel et al., ; spiegel and weber, ; roth-cross et al., ; zhou and perlman, ) . the underlying mechanism for this phenomenon is not fully understood, but nsp is proposed to be a major factor in the ability of the viruses to antagonize host innate immunity (kamitani et al., ; wathelet et al., ; zust et al., ; narayanan et al., ) . however, surprisingly, activation of host innate immunity and induction of type i ifn were observed for the group i coronavirus tgev (la bonnardiere and laude, ; charley and laude, ; charley and lavenant, ) . we also found that the group i coronavirus hcov- e steadily activated ifn transcription factors, and infected cells produced robust ifn-β (unpublished data) . 
this study clearly showed that the hcov- e and hcov-nl nsp proteins also potently suppressed host innate immune protein synthesis like nsp from sars-cov, implying that group ii coronaviruses do not employ nsp to evade host innate immunity. the mechanism underlying the different innate immune responses for the two groups of coronaviruses remains to be further investigated. novel beta-barrel fold in the nuclear magnetic resonance structure of the replicase nonstructural protein from the severe acute respiratory syndrome coronavirus inhibition of interferon signaling by rabies virus phosphoprotein p: activation-dependent binding of stat and stat ebola virus vp protein binds double-stranded rna and inhibits alpha/beta interferon production induced by rig-i signaling induction of alpha interferon by transmissible gastroenteritis coronavirus: role of transmembrane glycoprotein e characterization of blood mononuclear cells producing ifna following induction by coronavirus-infected cells (porcine transmissible gastroenteritis virus) unraveling the complexities of the interferon response during sars-cov infection regulation of irf- dependent innate immunity by the papain-like protease domain of the sars coronavirus the v protein of simian virus inhibits interferon signalling by targeting stat for proteasome-mediated degradation inhibition of interferon-mediated antiviral responses by influenza a viruses and other negative-strand rna viruses the stat activation process is a crucial target of sendai virus c protein for the blockade of alpha interferon signaling double-stranded rna binding by human cytomegalovirus ptrs inhibitory activity for the interferon-induced protein kinase is associated with the reovirus serotype sigma protein reovirus sigma protein: dsrna binding and inhibition of rna-activated protein kinase a twopronged strategy to suppress host protein synthesis by sars coronavirus nsp protein severe acute respiratory syndrome coronavirus nsp protein suppresses host 
gene expression by promoting host mrna degradation toll-like receptor and rig-i-like receptor signaling sars coronavirus proteins orf b,orf , and nucleocapsid function as interferon antagonists high interferon titer in newborn pig intestine during experimentally induced viral enteritis inhibition of pkr by vaccinia virus: role of the nand c-terminal domains of e l binding of the influenza virus ns protein to double-stranded rna inhibits the activation of the protein kinase that phosphorylates the elf- translation initiation factor comparative protein structure modeling of genes and genomes the primary function of rna binding by the influenza a virus ns protein in infected cells: inhibiting the - oligo (a) synthetase/ rnase l pathway the sars coronavirus a protein causes endoplasmic reticulum stress and induces ligand-dependent downregulation of the type i interferon receptor severe acute respiratory syndrome coronavirus nsp suppresses host gene expression, including that of type i interferon, in infected cells inhibition of pkr activation by the proline-rich rna binding domain of the herpes simplex virus type us protein ebola virus vp binds karyopherin alpha and blocks stat nuclear accumulation inhibition of the alpha/beta interferon response by mouse hepatitis virus at multiple levels severe acute respiratory syndrome coronavirus m protein inhibits type i interferon production by impeding the formation of traf .tank.tbk /ikkepsilon complex inhibition of beta interferon induction by severe acute respiratory syndrome coronavirus suggests a two-step model for activation of interferon regulatory factor inhibition of cytokine gene expression and induction of chemokine genes in non-lymphatic cells infected with sars coronavirus interferon and cytokine responses to sars-coronavirus infection murine cytomegalovirus m and m are both required to block protein kinase r-mediated shutdown of protein synthesis severe acute respiratory syndrome coronavirus evades antiviral 
signaling: role of nsp and rational design of an attenuated strain coronavirus pathogenesis and the emerging pathogen severe acute respiratory syndrome coronavirus. microbiol blockade of interferon induction and action by the e l double-stranded rna binding proteins of vaccinia virus mouse hepatitis coronavirus a nucleocapsid protein is a type i interferon antagonist mouse hepatitis virus does not induce beta interferon synthesis and does not inhibit its induction by double-stranded rna coronavirus non-structural protein is a major pathogenicity factor: implications for the rational design of coronavirus vaccines key: cord- -t z hbox authors: ogawa, hirohito; koizumi, nobuo; ohnuma, aiko; mutemwa, alisheke; hang’ombe, bernard m.; mweene, aaron s.; takada, ayato; sugimoto, chihiro; suzuki, yasuhiko; kida, hiroshi; sawa, hirofumi title: molecular epidemiology of pathogenic leptospira spp. in the straw-colored fruit bat (eidolon helvum) migrating to zambia from the democratic republic of congo date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: t z hbox the role played by bats as a potential source of transmission of leptospira spp. to humans is poorly understood, despite various pathogenic leptospira spp. being identified in these mammals. here, we investigated the prevalence and diversity of pathogenic leptospira spp. that infect the straw-colored fruit bat (eidolon helvum). we captured this bat species, which is widely distributed in africa, in zambia during – . we detected the flagellin b gene (flab) from pathogenic leptospira spp. in kidney samples from of e. helvum ( . %) bats. phylogenetic analysis of flab fragments amplified from e. helvum samples and previously reported sequences, revealed that of the fragments grouped with leptospira borgpetersenii and leptospira kirschneri; however, the remaining flab fragments appeared not to be associated with any reported species. 
additionally, the s ribosomal rna gene (rrs) amplified from randomly chosen flab-positive samples was compared with previously reported sequences, including bat-derived leptospira spp. all rrs fragments clustered into a pathogenic group. eight fragments were located in unique branches; the other fragments were closely related to leptospira spp. detected in bats. these results show that rrs sequences in bats are genetically related to each other without regional variation, suggesting that leptospira are evolutionarily well-adapted to bats and have uniquely evolved in the bat population. our study indicates that pathogenic leptospira spp. in e. helvum in zambia have unique genotypes. leptospirosis is an important reemerging zoonotic disease caused by pathogenic spirochetes of the genus leptospira. the disease is found worldwide, especially in tropical regions. human leptospirosis presents with a variety of signs and symptoms, including a general febrile, influenza-like illness, and can result in liver or kidney failure. as a result, this disease is often confused with other diseases, such as dengue fever, hemorrhagic fever and malaria, all of which are common in tropical and subtropical regions of the world (world health organization, ) . environments (e.g., soil and water) (adler and de la peña moctezuma, ) . humans become infected mainly through leptospira-contaminated water or soil, or from contact with urine from animals infected with this bacterium (adler and de la peña moctezuma, ) . rodents are the most important reservoir of leptospira among a variety of wildlife reservoirs. over the past decade, there have been many reports of bats being an important reservoir and vector of emerging infectious diseases, such as ebola and marburg viral diseases, severe acute respiratory syndrome (known as sars), nipah and hendra viral infections, and rabies (calisher et al., ) . 
bats (order chiroptera) are the second largest order in mammals after rodents (order rodentia) and are geographically widespread. loss of habitat for bats, caused by recent anthropogenic activities, may increase contact between bats and humans, resulting in transmission of various pathogens from peridomestic bats to humans (de jong et al., ) . transmission of viral pathogens from bats to humans has been the main focus of studies in this area; however, there have not been many studies on pathogenic bacteria in bats (mühldorfer, ) . a variety of pathogenic leptospira spp. have been identified in bats worldwide (bessa et al., ; bunnell et al., ; cox et al., ; fennestad and borg-petersen, ; harkin et al., ; lagadec et al., ; matthias et al., ; tulsiani et al., ) ; however, little is known about the role of bats in the transmission of leptospirosis. in this study, we performed a molecular epidemiological investigation of leptospira spp. in straw-colored fruit bats (eidolon helvum) captured from to , which were migrating from the democratic republic of congo to zambia (richter and cumming, ) . a total of kidney samples were collected from captured e. helvum that were roosting in trees (muleya et al., ; ogawa et al., ) in kasanka national park in central province and in ndola in copperbelt province of zambia (table ). this research was performed under the research project ''molecular epidemiology of bacterial zoonoses in zambia'' approved by the zambia wildlife authority, in the republic of zambia. the kidney samples collected from e. helvum were placed directly in korthof or ellinghausen-mccullough-johnson-harris (emjh) media (world health organization, ) and homogenized for dna extraction and leptospira isolation by crushing with beads. dna was extracted from % (w/v) kidney homogenates using a dna isolation kit for mammalian blood (roche diagnostics, indianapolis, in, usa) according to the manufacturer's instructions with minor modifications. 
a nested pcr based on the flagellin b gene (flab) sequence was used to amplify the extracted dna samples (n = ) to detect the flab gene of pathogenic leptospira spp. (koizumi et al., ) . some of the flab-nested pcr-positive samples (n = ) were examined further. to identify leptospira species, we also performed nested pcrs based on the s ribosomal rna gene (rrs) and the preprotein translocase gene (secy) using the primer sets shown in supplementary tables and . the pcr products from the flab-nested pcr ( bp including the bp primer sequence), the rrs-nested pcr ( bp including the bp primer sequence) and the secy-nested pcr ( bp including the primer sequence) were purified and subjected to direct sequencing using a bigdye terminator v . cycle sequencing kit (life technologies, waltham, ma, usa) according to the manufacturer's instructions, and a xl genetic analyzer (life technologies). the sequence data were aligned using the clustal w software, and a maximum-likelihood phylogenetic tree was generated with , bootstrap replications using mega . . software (tamura et al., ) . the ddbj accession numbers for the flab and rrs sequences from the uncultured leptospira spp. detected in e. helvum are lc to lc and lc to lc , respectively (supplementary table ). a bp fragment of the leptospira flab gene was detected in out of e. helvum kidney samples ( . %, table ). among the flab-nested pcr-positive samples, were used for direct sequencing and nine samples could not be sequenced because of insufficient dna. phylogenetic analysis (fig. ) revealed that the flab sequences fell into seven clusters (fc -fc ). six flab fragments (zfb - , zfb - , zfb - , zfb - , zfb - and zfb - ) in the fc cluster were related to the corresponding gene sequences, all of which were identical to leptospira borgpetersenii strains including jules, de , arborea, poi, and veldrat batavia . the six fragments shared sequence identities ranging from . % to . % with the l. 
borgpetersenii strains described above. the nucleotide identity of the flab fragment for zfb - in the fc cluster with the leptospira kirschneri strains moskva v and c was . % and . %, respectively. the nucleotide sequence identities of five flab fragments (zfb - , zfb - , zfb - , zfb - and zfb - ) in the fc cluster with the l. kirschneri strains moskva v and c were . % and . %, respectively. the nucleotide sequence identities of the remaining flab fragments belonging to fc to fc with that of the closest species, l. borgpetersenii, ranged from . % to . %. in previous reports, l. borgpetersenii, l. kirschneri, and leptospira interrogans were isolated predominantly from rodents in africa (ahmed et al., ; nalam et al., ) . a novel species, leptospira mayottensis, strains of which were isolated in mayotte in the comoros archipelago, was reported to be genetically similar to l. borgpetersenii (bourhy et al., , ) . this new species was included in the phylogenetic analysis; however, the flab fragments belonging to the fc -fc clusters were also distantly related to l. mayottensis. the flab fragments belonging to the three clusters fc -fc appear to group with those of l. borgpetersenii and l. kirschneri; however, the remaining flab fragments belonging to the fc -fc clusters appear not to be associated with those of any reported species. accordingly, the leptospira flab sequence data from kidney samples of captured e. helvum bats indicate that leptospires from e. helvum in zambia have genotypes distinct from those previously reported. the results of phylogenetic analysis of the secy gene, which has been used as a valuable tool for discriminating between leptospira spp. (gravekamp et al., ; rahelinirina et al., ; victoria et al., ) , were in accordance with the flab-based phylogenetic tree (supplementary fig. ) , also supporting the hypothesis that leptospires from e. helvum in zambia have unique genotypes. 
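The nucleotide sequence identities quoted above are pairwise comparisons between aligned flab fragments. A minimal Python sketch of such a calculation is given below. It assumes the two sequences have already been aligned to equal length (for example with Clustal W) and that columns containing a gap in either sequence are skipped; whether the study treated gaps this way is not stated, and the function name and example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned nucleotide sequences.

    Columns where either sequence has a gap ('-') are ignored
    (an assumption of this sketch, not a detail from the study).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real flab sequences.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Other conventions (counting gaps as mismatches, or normalizing by the shorter sequence) give slightly different percentages, so the convention should be reported alongside the values.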
fig. . maximum-likelihood phylogenetic tree based on the nucleotide sequences of leptospira spp. flab in e. helvum bats. the dendrogram was constructed with the jc model, and with , replications using mega . . software (tamura et al., ) . numbers at nodes indicate bootstrap supports > %. the sequences determined in this study are shown in bold. the samples colored red were also used in the phylogenetic analysis of rrs (fig. ) . genbank accession numbers are indicated in parentheses. scale bar indicates the number of nucleotide substitutions per site.
subsequently, we examined another gene from leptospira, rrs, which has been used before to identify leptospira spp. (matthias et al., ; postic et al., ) . fragments of the rrs gene (each bp) were amplified and sequenced from samples randomly selected from the seven clusters (fc -fc ) (fig. ) . phylogenetic analysis showed that all rrs fragments from e. helvum kidney samples grouped into a pathogenic group. zfb - , zfb - and zfb - , belonging to the fc cluster, were associated with the l. kirschneri strain, kambale (fj ), and uncultured leptospira sp. (jq ) from triaenops menamena bats captured in madagascar (lagadec et al., ) (fig. ) . zfb - (fc ) and zfb - (fc ) were closely related to l. borgpetersenii and uncultured leptospira sp. (ay ) from bats captured in peru, respectively (matthias et al., ) (fig. ) . the rrs fragments of zfb - (fc ), zfb - (fc ), zfb - (fc ) and zfb - (fc ), all of which were identical, as well as those of zfb - (fc ), zfb - (fc ), zfb - (fc ) and zfb - (fc ), were located on unique branches (fig. ) . the other rrs fragments were closely related to an uncultured leptospira sp. (jq ) from rousettus obliviosus bats captured in comoros; the latter sequence was closely related to l. borgpetersenii (lagadec et al., ) (fig. ) . non-significant coevolutionary congruence was reported between the rrs sequence from leptospira spp. and that of bats at the bat species level (lei and olival, ) . 
however, the rrs sequences from bats are genetically related to each other and show no regional variations in phylogenetic analysis of the rrs sequences from various kinds of hosts (fig. ) , suggesting that leptospira have evolved uniquely in this bat population. dietrich et al. reported that the host is an important factor in leptospira diversification (dietrich et al., ) , also supporting our findings. zambia is bordered by eight countries. epidemiological studies in these countries have been reported; however, almost all of these reports were serological surveys using l. interrogans as the antigen, and most data originated from tanzania and zimbabwe (de vries et al., ) . in zambia, data regarding circulating leptospira spp. are limited. serosurveys of leptospira spp. in rodents and leptospira weilii in pigs have been reported (de vries et al., ) . although e. helvum examined in this study were migrating to zambia from the democratic republic of congo (richter and cumming, ) , data in this country are also lacking and there are no previous reports on l. borgpetersenii that may be related to the leptospira spp. detected in this study. e. helvum captured in kasanka national park were more frequently infected than those captured in nodla (x = . , df = , p < . ). the roosting environment and colony size may influence this difference. no significant difference in the prevalence of the leptospira flab gene was found between males and females. the phylogenetic analyses of flab and rrs infer that genes from potentially pathogenic leptospira spp. were present in the kidney samples of e. helvum in zambia. to the best of our knowledge, this is the first report of pcr detection of leptospira spp. in fruit bats from the african continent. in addition, the nested pcr-positive rate for leptospira ( . %) in e. helvum in zambia was relatively higher than that of previous reports (mühldorfer, ) . 
although isolation of leptospira directly from bat kidney samples using korthof and emjh media was not successful, the relatively high infection rate in the kidneys of e. helvum is likely to result in excretion of leptospira via the urine. contaminated urine has therefore been proposed as the potential transmission pathway of leptospira spp. from fruit bats to rodents (tulsiani et al., ) . it is suggested, therefore, that e. helvum might be a candidate natural reservoir for leptospira in zambia. continued surveillance in e. helvum, as well as in humans and rodents, is required to gain a better understanding of how leptospira is maintained in, and transmitted by, e. helvum bats in zambia. fig. . maximum-likelihood phylogenetic tree based on the nucleotide sequences of leptospira spp. rrs in e. helvum bats. the dendrogram was constructed with the general time reversible model with gamma distribution and invariable sites, and with replications using mega . . software (tamura et al., ) . numbers at nodes indicate bootstrap supports > %. the sequences determined in this study are shown in red. the sequences from bats are shown in bold. the fc clusters shown in fig. and genbank accession numbers are indicated in brackets and parentheses, respectively. scale bar indicates the number of nucleotide substitutions per site. leptospira and leptospirosis multilocus sequence typing method for identification and genotypic classification of pathogenic leptospira species the contribution of bats to leptospirosis transmission in são paulo city human leptospira isolates circulating in mayotte (indian ocean) have unique serological and molecular features leptospira mayottensis sp. nov., a pathogenic species of the genus leptospira isolated from humans detection of pathogenic leptospira spp. 
infections among mammals captured in the peruvian amazon basin region bats: important reservoir hosts of emerging viruses flying foxes as carriers of pathogenic leptospira species investigating the role of bats in emerging zoonoses: balancing ecology, conservation and public health interests leptospirosis in sub-saharan africa: a systematic review diversification of an emerging pathogen in a biodiversity hotspot: leptospira in endemic small mammals of madagascar leptospirosis in danish wild mammals detection of seven species of pathogenic leptospires by pcr using two sets of primers use of pcr to identify leptospira in kidneys of big brown bats (eptesicus fuscus) in kansas and nebraska, usa investigation of reservoir animals of leptospira in the northern part of miyazaki prefecture pathogenic leptospira spp. in bats, madagascar and union of the comoros contrasting patterns in mammal-bacteria coevolution: bartonella and leptospira in bats and rodents diversity of bat-associated leptospira in the peruvian amazon inferred by bayesian phylogenetic analysis of s ribosomal dna sequences bats and bacterial pathogens: a review molecular epidemiology of paramyxoviruses in frugivorous eidolon helvum bats in zambia genetic affinities within a large global collection of pathogenic leptospira: implications for strain identification and molecular epidemiology seroepidemiological prevalence of multiple species of filoviruses in fruit bats (eidolon helvum) migrating in africa interest of partial s rdna gene sequences to resolve heterogeneities between leptospira collections: application to l. 
meyeri first isolation and direct evidence for the existence of large smallmammal reservoirs of leptospira sp first application of satellite telemetry to track african straw-coloured fruit bat migration mega : molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods the role of fruit bats in the transmission of pathogenic leptospires in australia conservation of the s -spc-alpha locus within otherwise highly plastic genomes provides phylogenetic insight into the genus leptospira human leptospirosis: guidelines for diagnosis, surveillance and control. world health organization we thank ms. yuka thomas, drs. emiko nakagawa, akihiro ishii, reiko yoshida, yasuko orba, ichiro nakamura, kimihito ito, and many phd students and postdoctoral fellows at the research center for zoonosis control, hokkaido university for technical assistance. we also thank drs. katendi chabgula, edgar simulundu and musso munyeme of the university of zambia, dr. frank willems of the kasanka trust, the ministry agriculture and livestock, the zambia wildlife authority, and mr. bwalya chisha. this work was supported by the japan initiative for global research network on infectious diseases (j-grid) and the science and technology research partnership for sustainable development (satreps) by the japan science and technology agency (jst) and japan international cooperation agency (jica). supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/ . /j.meegid. . . . key: cord- - pflfznh authors: zang, minghui; he, wanting; du, fanshu; wu, gongjian; wu, bohao; zhou, zhenlei title: analysis of the codon usage of the orf gene of feline calicivirus date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: pflfznh feline calicivirus (fcv) is a highly prevalent pathogen of the domestic cat that causes acute infections of the oral and upper respiratory tract. 
the e region of the orf protein is responsible for the induction of virus-neutralizing antibodies, thus it is important to understand the codon usage of this gene. here, we analysed coding sequences of orf and show that the gene exhibits a low codon usage bias. in addition, although mutational bias is one of the factors shaping the codon usage bias of this gene, natural selection plays a more significant role. our results reveal part of the mechanisms driving fcv evolution and lay a foundation for further research on fcv.

feline calicivirus (fcv) is a highly prevalent pathogen of the domestic cat, with widespread distribution across the world (radford et al., ). it has been suggested that almost all members of the felidae, such as cats, tigers and cheetahs, are susceptible to the virus. fcv has also been isolated from the faeces of dogs (gabriel et al., ), and fcv strains have been shown to circulate among different animal species. fcv belongs to the genus vesivirus of the caliciviridae. it has a small, non-enveloped, positive-sense, single-stranded rna genome of approximately nucleotides, of which the 5′ end is linked covalently to the vpg protein and the 3′ end is linked to poly(a). the genome contains three open reading frames (orfs) referred to as orf , orf and orf (prikhodko et al., ). orf encodes the capsid precursor protein, which is processed by the viral protease to release a small amino acid protein called the leader of the capsid (lc) and the mature capsid protein (vp ). comparative analysis of orf sequences has been used to elucidate phylogenetic relationships among different fcv isolates (prikhodko et al., ). codons that encode the same amino acid are referred to as synonymous codons. although synonymous codons encode the same amino acid, their corresponding trnas may differ in relative abundance in the cell and in the speed at which they are recognized by the ribosome, thus affecting codon usage.
the usage of synonymous codons is a non-random process, with some codons being used more often than others (marín et al., ). this phenomenon, called 'codon usage bias', is found in prokaryotes, eukaryotes and viruses (liu et al., ). codon usage is influenced by two major factors, natural selection and mutation bias (gu et al., ). the relationship between the codon usage of a virus and that of its host affects the overall survival of the virus, its ability to evade the host immune system and its evolution (moratorio et al., ). thus, understanding the codon usage of viruses can provide information about viral evolution and expand our understanding of the regulation of viral gene expression based on codon adoption. this can aid rational vaccine design to achieve efficient viral protein expression and induce long-lasting immunity. because different fcv strains cause disease with a wide range of clinical signs, it is important to characterize the genetic variation, evolution and codon usage pattern of fcv to understand how these viral strains cause disease. the aim of this study was to describe the genetic features of the orf gene of fcv. to this end, we analysed in detail the genetic evolution, the codon usage pattern and the evolutionary characterization of the codon usage pattern of fcv.

. . sequence data

coding sequences (cds) of orf of fcv strains were included in this study, which were retrieved from the national center for biotechnology information (ncbi) genbank database (https://www.ncbi.nlm.nih.gov/nucleotide/). the details of the sequences analysed, including accession number, time of collection and geographical distribution, are shown in table s .

the frequency of each nucleotide (a%, u%, g% and c%) was calculated using bioedit.
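the per-base frequencies described above are simple to compute directly from a coding sequence. the sketch below is only an illustration and is not the bioedit procedure used in the study: the function name `nucleotide_composition` and the example sequence are hypothetical, and ambiguous bases (anything other than a, u/t, g or c) are simply ignored, which is an assumption.

```python
from collections import Counter


def nucleotide_composition(cds: str) -> dict:
    """Return the percentage of A, U, G and C in a coding sequence.

    Hypothetical helper: DNA-style T is counted as U, and characters
    other than A/U/G/C (gaps, ambiguity codes) are skipped.
    """
    seq = cds.upper().replace("T", "U")
    counts = Counter(base for base in seq if base in "AUGC")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/U/G/C bases")
    return {base: 100.0 * counts[base] / total for base in "AUGC"}


# Toy sequence for illustration only (not an FCV sequence).
example = nucleotide_composition("ATGGCTTTTGAA")
for base, percent in example.items():
    print(f"{base}%: {percent:.2f}")
```

applied to every sequence in a data set, the resulting percentages can then be averaged to give mean values and standard deviations of the kind reported in the results.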
the nucleotide composition at the third synonymous codon position (a3s, t3s, g3s, c3s) was calculated using the codonw package. the g + c content at the first (gc1s), second (gc2s) and third codon positions (gc3s) was calculated using the same program, as was the g + c content at the first and second positions combined (gc12s). the rscu value of each codon (except for met, trp and termination codons), which is independent of amino acid composition and sequence length, was calculated to directly reflect the usage characteristics, as first proposed by sharp and li (sharp and li, ). the rscu value of a codon is the ratio of its observed frequency to its expected frequency, assuming that all codons for a particular amino acid are used evenly (peden, ), and it was calculated using the following equation:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{j}^{n_i} g_{ij}} \, n_i$$

where g_ij represents the observed number of the i th codon for the j th amino acid, which has n_i kinds of synonymous codons (nasrullah et al., ). normally, it is considered that a high rscu value reflects a strong codon usage bias. codon usages with rscu values of < . , . and > . stand for negative codon usage bias, no bias and positive codon usage bias, respectively (chen et al., a). the enc is considered a measure of the magnitude of the codon usage bias of a single gene (wright, ). the enc value is not influenced by amino acid composition or gene length (morla et al., ). the expected enc value was calculated using the formula given below:

$$\mathrm{ENC} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$

where s stands for the gc3s composition (chen et al., a). the enc value ranges from 20 (only one of the possible synonymous codons is used for the corresponding amino acid) to 61 (all possible synonymous codons are used equally for the corresponding amino acid) (wright, ). in contrast to the rscu value, the smaller the enc value, the greater the extent of codon usage bias. an enc value equal to or < is considered to be a sign of strong codon bias (comeron and aguadé, ).
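the rscu definition and the standard enc curve can be made concrete with a short sketch. this is an illustration under stated assumptions, not the codonw implementation used in the study: the function names `rscu` and `expected_enc` are hypothetical, the standard genetic code is assumed, and the expected-enc expression is the usual form attributed to wright, enc = 2 + s + 29 / (s^2 + (1 - s)^2), with s the gc3s fraction.

```python
from collections import Counter, defaultdict

# Standard genetic code in the conventional TCAG ordering (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds: str) -> dict:
    """RSCU per codon: observed count times the number of synonymous
    codons, divided by the total count for that amino acid.
    Met, Trp and stop codons are excluded."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "MW*":
            families[aa].append(codon)

    values = {}
    for synonyms in families.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in synonyms:
            values[c] = counts[c] * len(synonyms) / total
    return values


def expected_enc(gc3s: float) -> float:
    """Expected ENC when codon usage is constrained only by GC3s
    (gc3s given as a fraction between 0 and 1)."""
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


toy = rscu("GCTGCTGCTGCC")       # four alanine codons, toy input
curve_mid = expected_enc(0.5)    # value of the standard curve at GC3s = 0.5
```

in an enc plot, `expected_enc` would be evaluated over a grid of gc3s values to draw the standard curve, and each sequence's observed enc would be plotted against its own gc3s for comparison.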
enc plots were drawn to determine the factors (especially mutation pressure) that influence the codon usage bias (wright, ), with the gc3s values on the x axis and the enc values on the y axis. under the null model, if codon usage is only constrained by g + c content, the enc values would sit on or around the standard curve (jiang et al., ). otherwise, if the enc values sit far below the standard curve, other factors such as natural selection play a major role in shaping codon usage bias. pca, a widely used multivariate statistical approach to determine the major trends in codon usage variation of genes, was performed using graphpad prism . (gupta and ghosh, ). the rscu values of each gene, excluding met, trp and termination codons, are represented as a 59-dimensional vector and transformed into a smaller number of uncorrelated factors (lu et al., ). to unravel whether natural selection shapes codon usage bias, the gravy and aroma scores were determined. both indices were obtained from codonw and reflect the frequencies of hydrophobic and aromatic amino acids, respectively (kyte and doolittle, ). a higher gravy or aroma value suggests a more hydrophobic or aromatic amino acid product. neutrality analysis was used to determine the roles of mutational bias and natural selection in shaping codon usage bias. in neutrality plots, gc12s values are plotted against gc3s values, with each dot representing an independent fcv strain. a regression slope close to zero indicates that codon usage bias is constrained by natural selection alone, whereas a slope close to one represents complete neutrality (sueoka ). correlation analysis was performed using graphpad prism . .
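the neutrality analysis described above amounts to a simple linear regression. the following sketch is an assumption-laden illustration rather than the authors' workflow: the function names `gc_by_position` and `neutrality_slope` are hypothetical, the g + c content is taken over all codons (a simplification of the gc3s index, which excludes met, trp and stop codons), and the toy points are invented for demonstration.

```python
def gc_by_position(cds: str) -> tuple:
    """Return (GC12, GC3) as fractions for one coding sequence.

    GC12 is the mean G + C content of codon positions 1 and 2;
    GC3 is the G + C content of codon position 3.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    gc = [0, 0, 0]
    for i in range(usable):
        if seq[i] in "GC":
            gc[i % 3] += 1
    gc1, gc2, gc3 = (count / n_codons for count in gc)
    return (gc1 + gc2) / 2.0, gc3


def neutrality_slope(points: list) -> float:
    """Ordinary least-squares slope of GC12 (y) regressed on GC3 (x).

    `points` is a list of (gc12, gc3) pairs, one per strain.
    """
    n = len(points)
    mean_x = sum(gc3 for _, gc3 in points) / n
    mean_y = sum(gc12 for gc12, _ in points) / n
    sxx = sum((gc3 - mean_x) ** 2 for _, gc3 in points)
    sxy = sum((gc3 - mean_x) * (gc12 - mean_y) for gc12, gc3 in points)
    if sxx == 0:
        raise ValueError("all GC3 values are identical; slope undefined")
    return sxy / sxx


# Toy data for illustration only: three invented (GC12, GC3) points.
toy_points = [(0.40, 0.30), (0.42, 0.40), (0.44, 0.50)]
slope = neutrality_slope(toy_points)
one_sequence = gc_by_position("GCGATA")
```

under the interpretation given in the text, the slope is read as the relative contribution of mutation pressure (relative neutrality), and one minus the slope as the relative constraint attributed to natural selection.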
this indicates that u and a were more abundant than c and g, while u was the most preferred nucleotide. the codon compositions at the third position (a3, u3, g3, c3, gc3) revealed that the mean u3% ( . %) was the highest among the four nucleotides, which is consistent with the nucleotide content of the fcv orf gene (table s ). the gc3 values ranged from . % to . % (mean . %), indicating that a/u-terminated codons are preferred over g/c-terminated codons. the rscu values of all codons were calculated and are displayed in table . among the most frequently used synonymous codons, twelve ended with u (gcu for ala, ugu for cys, gau for asp, uuu for phe, ggu for gly, auu for ile, cuu for leu, ccu for pro, ucu for ser, acu for thr, guu for val, uau for tyr), three ended with a (gaa for glu, aaa for lys, caa for gln), two ended with c (cac for his, aac for asn) and one ended with g (agg for arg). codons ending in u were therefore the most frequently used. this is in accordance with the fact that u was the most abundant nucleotide, demonstrating that codon usage is influenced by compositional constraints. the enc values ranged from . to . (average ± sd of . ± . ), indicating fluctuation among the fcv strains. the high enc values (enc > ) indicate a low codon usage bias. enc plots were drawn with enc values plotted against gc3s values according to the geographical distribution of the strains used in this study (fig. ). all the strains, represented in different colours for each continent, were located below the theoretical curve (fig. a). additionally, strains isolated from the same continent did not cluster together. in particular, all strains collected from countries were distributed widely (fig. b). this indicates that mutational pressure combined with other factors contributes to the codon usage bias of the fcv orf gene (sueoka ).
in addition, there was a correlation between the nucleotide composition (a%, u%, g%, c%, gc%) and the codon contents (a3s, t3s, c3s, g3s, gc3s) (p < . ), except for the relationship between a3s and t. furthermore, there was a significant correlation between the enc values and the nucleotide compositions (p < . ), which indicates that mutational bias influences the synonymous codon usage pattern of the orf gene of fcv. pca, a multivariate method, was employed to unravel the variation in synonymous codon usage (singh et al. ). we found that the first four principal axes accounted for . % of the total variation, with the first, second, third and fourth principal axes accounting for . %, . %, . % and . %, respectively (fig. ). this suggests that the first and second axes contributed most to the variation in rscu of synonymous codons. pca was performed based on the continent and country of isolation (fig. ). based on the distribution of different strains on the first two axes, we found that the distribution of asian strains, especially strains collected from china, was more widespread than the distribution of strains isolated from oceania, europe and north america. moreover, most of the strains isolated from north america were located near the origin, indicating that mutational pressure contributed to the codon usage of the fcv orf gene. it is normally considered that natural selection contributes to some extent to codon usage bias; we therefore evaluated the correlation between the gravy and aroma values and the codon contents (a3s, g3s, c3s, u3s and gc3s) (table ). we found a correlation between the aroma values and u3s, g3s and gc3s (p < . ), confirming that natural selection influences the codon usage bias of the fcv orf gene.

. . natural selection plays a more important role than mutation pressure in shaping the codon usage of fcv

we found that both mutation pressure and natural selection contribute to the codon usage bias of the orf gene of fcv.
thus, to understand which one plays a more important role, the gc12s values (the mean value of gc1s and gc2s) were plotted against the gc3s values (fig. ). we found a correlation between gc12s and gc3s (p < . ) with a correlation coefficient of . , indicating that relative neutrality was % or, conversely, that the constraint from natural selection was %. thus, natural selection plays a major role in shaping the codon usage bias of the orf gene of fcv compared to mutational pressure.

zoonotic rna viruses, such as influenza viruses and coronaviruses, are highly susceptible to recombination and cross-species transmission (su et al., , ). fcv is an rna virus and as such it has experienced a high evolution rate since its emergence. previous studies on fcv have mostly focused on infectivity (pesavento et al. ) and prevalence (knowles et al. ). however, there are no studies on the codon usage bias of fcv. orf is one of the three orfs of the fcv genome and encodes the major capsid protein vp . therefore, the codon usage of orf of fcv was studied here for the first time. previous studies on the codon usage bias of other rna viruses have reported the following enc values: analysis of the g gene of rabies virus (rabv) showed enc values ranging from . % to . % (zhao et al. ); foot and mouth disease virus (fmdv) displayed enc values of . % (zhou et al. ); porcine epidemic diarrhea virus (pedv) of . % (chen et al. b); and severe acute respiratory syndrome (sars) coronavirus of . % (zhao et al. ). however, the mean enc value of orf of fcv reported here was . % (sd ± . ); thus, in comparison with the above viruses, the degree of codon usage bias of fcv is lower. codon usage bias is mainly influenced by natural selection (romero et al. ) and mutation pressure (jenkins et al. ). here, we used enc plots (fig. ) and pca (fig. ) according to the geographical distribution to investigate the major factors shaping the codon usage bias of the fcv orf gene.
we found only one strain isolated from usa, one from japan and one from australia to sit near the standard values, suggesting that mutation pressure contributed to the codon usage bias of these three strains. pca analysis revealed that mutation pressure is the dominant force shaping the codon usage of sequences isolated from north america (fig. ) . furthermore, analysis of the relationships between nucleotide composition and the codon contents at the third base positions suggested that mutation pressure is one of the factors in shaping the codon usage of fcv orf . correlation analysis of gravy, aroma values and codon content (a s, g s, c s, u s and gc s) showed that there is a correlation between aroma and u s, g s and gc s, confirming that natural selection contributes to the codon usage of the orf gene of fcv. since both mutation pressure and natural selection are important in driving the codon usage of orf , we performed neutrality analysis to understand which of the two forces has a bigger impact. we found that natural selection is the major force driving the codon usage of orf . this is the first study analysing the codon usage of the fcv orf gene and describing the forces that drive fcv evolution. in the future, more epidemiological surveys and sequence analysis are required to examine the factors that resulted in fcv evolution. 
characterization of the porcine epidemic diarrhea virus codon usage bias characterization of the porcine epidemic diarrhea virus codon usage bias an evaluation of measures of synonymous codon usage bias isolation of a calicivirus antigenically related to feline caliciviruses from feces of a dog with diarrhea analysis of synonymous codon usage in sars coronavirus and other viruses in the nidovirales gene expressivity is the main factor in dictating the codon usage variation among the genes in pseudomonas aeruginosa evolution of base composition and codon usage bias in the genus flavivirus analysis of synonymous codon usage in aeropyrum pernix k and other crenarchaeota microorganisms prevalence of feline calicivirus, feline leukaemia virus and antibodies to fiv in cats with chronic stomatitis a simple method for displaying the hydropathic character of a protein the characteristics of the synonymous codon usage in enterovirus virus and the effects of host on the virus in codon usage pattern genetic analysis of the pb -f gene of equine influenza virus variation in g + c-content and codon choice: differences among synonymous codon groups in vertebrate genes a detailed comparative analysis on the overall codon usage patterns in west nile virus synonymous codon usage pattern in glycoprotein gene of rabies virus genomic analysis of codon usage shows influence of mutation pressure, natural selection, and host features on marburg virus evolution analysis of codon usage pathologic, immunohistochemical, and electron microscopic findings in naturally occurring virulent systemic feline calicivirus infection in cats genetic characterization of feline calicivirus strains associated with varying disease manifestations during an outbreak season in missouri feline calicivirus the influence of translational selection on codon usage in fishes from the family cyprinidae an evolutionary perspective on synonymous codon usage in unicellular organisms characterization of codon usage pattern 
and influencing factors in japanese encephalitis virus directional mutation pressure and neutral molecular evolution epidemiology, evolution, and recent outbreaks of avian influenza virus in china epidemiology, genetic recombination, and pathogenesis of coronaviruses the 'effective number of codons' used in a gene analysis of synonymous codon usage in human bocavirus isolates analysis of codon usage bias of envelope glycoprotein genes in nuclear polyhedrosis virus (npv) and its relation to evolution the analysis of codon bias of foot-and-mouth disease virus and the adaptation of this virus to the hosts this paper was supported in part by the priority academic program development of jiangsu higher education institutions. supplementary data to this article can be found online at http://dx. doi.org/ . /j.meegid. . . . key: cord- - ho av authors: abolnik, celia title: genomic and single nucleotide polymorphism analysis of infectious bronchitis coronavirus date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: ho av infectious bronchitis virus (ibv) is a gammacoronavirus that causes a highly contagious respiratory disease in chickens. a qx-like strain was analysed by high-throughput illumina sequencing and genetic variation across the entire viral genome was explored at the sub-consensus level by single nucleotide polymorphism (snp) analysis. thirteen open reading frames (orfs) in the order ′-utr- a- ab-s- a- b-e-m- b- c- a- b-n- b- ′utr were predicted. the relative frequencies of missense: silent snps were calculated to obtain a comparative measure of variability in specific genes. the most variable orfs in descending order were e, b, ′utr, n, a, s, ab, m, c, a, b. the e and b protein products play key roles in coronavirus virulence, and rna folding demonstrated that the mutations in the ′utr did not alter the predicted secondary structure. the frequency of snps in the spike (s) protein orf of . % was below the genomic average of . %. 
only three snps were identified in the s subunit, none of which were located in hypervariable region (hvr) or hvr . the s subunit was considerably more variable containing % of the polymorphisms detected across the entire s protein. the s subunit also contained a previously unreported multi-a insertion site and a stretch of four consecutive mutated amino acids, which mapped to the stalk region of the spike protein. template-based protein structure modelling produced the first theoretical model of the ibv spike monomer. given the lack of diversity observed at the sub-consensus level, the tenet that the hvrs in the s subunit are very tolerant of amino acid changes produced by genetic drift is questioned. coronaviruses (family coronaviridae, order nidovirales) are enveloped, single-stranded rna viruses with large genome sizes of $ - kb. the family is split into four genera: alpha-, beta, gamma and deltacoronaviruses, each containing pathogens of veterinary or human importance. a current evolutionary model postulates that bats are the ancestral source of alpha-and betacoronaviruses and birds the source of gamma-and deltacoronaviruses (woo et al., ) . the alphacoronaviruses infect swine, cats, dogs and humans. betacoronaviruses infect diverse mammalian species including bats, humans, rodents and ungulates. the sars coronavirus (sars-cov), which verged on a pandemic in with cases in humans and deaths is a betacoronavirus. another member of this genus, the recently-discovered middle east respiratory syndrome (mers) coronavirus (mers-cov) has claimed human lives from cases since april , and dromedary camels are the suspected reservoir (briese et al., ) . genus gammacoronavirinae includes strains infecting birds and whales (woo et al., ; mcbride et al., ; borucki et al., ) and deltacoronaviruses have been described in birds, swine and cats (woo et al., ) . 
the diversity of hosts and genomic features amongst covs have been attributed to their unique mechanism of viral recombination, a high frequency of recombination, and an inherently high mutation rate (lai and cavanagh, ) . infectious bronchitis virus (ibv) is a gammacoronavirus which causes a highly contagious respiratory disease of economic importance in chickens (cook et al., ) . ibv primarily replicates in the respiratory tract but also, depending on the strain, in epithelial cells of the gut, kidney and oviduct. clinical signs of respiratory distress, interstitial nephritis and reduced egg production are common, and the disease has a global distribution (cavanagh, ; cook et al., ) . the ibv genome encodes at least ten open reading frames (orfs) organised as follows: utr- a- ab-s- a- b-e-m- a- b-n- a- utr. six mrnas (mrna - ) are associated with production of progeny virus. four structural proteins including the spike glycoprotein (s), small membrane protein (e), membrane glycoprotein (m), and nucleocapsid protein (n) are encoded by mrnas , , and , respectively (casais et al., ; hodgson et al., ) . messenger rna (mrna) consists of orf a and orf b, encoding two large polyproteins via a ribosomal frameshift mechanism (inglis et al., ) . during or after synthesis, these polyproteins are cleaved into non-structural proteins (nsp - ) which are associated with rna replication and transcription. the s glycoprotein is post-translationally cleaved at a protease cleavage recognition motif into the amino-terminal s subunit ( kda) and the carboxyl-terminal s subunit ( kda) by the host serine protease furin (de haan et al., ) . the multimeric s glycoprotein extends from the viral membrane, and the globular s subunit is anchored to the viral membrane by the s subunit via non-covalent bonds. proteins a and b, and a and b are encoded by mrna and mrna , respectively and are not essential to viral replication (casais et al., ; hodgson et al., ) . 
a confounding feature of ibv infection is the lack of correlation between antibodies and protection, and discrepancies between in vitro strain differentiation by virus neutralization (vn) tests and in vivo cross-protection results. taken with the ability for high viral shedding in the presence of high titres of circulating antibodies, the involvement of other immune mechanisms are evident, and the roles of cell-mediated immunity and interferon have been experimentally demonstrated (timms et al., ; collisson et al., ; pei et al., ; cook et al., ) . dozens of ibv serotypes that are poorly cross-protective have been discovered and studied by vn tests and molecular characterisation of the s protein gene. most of these serotypes differ from each other by - % at amino acid level in s , but may differ by up to %. s contains the epitopes involved in the induction of neutralizing, serotype-specific and hemagglutinaton inhibiting antibodies (cavanagh, ; darbyshire et al., ; farsang et al., ; ignjatovic and mcwaters, ; meulemans et al., ; gelb et al., ) . most of the strain differences in s occur in three hypervariable regions (hvrs) located between the amino acid residues - (hvr ), - (hvr ) and - (hvr ) (moore et al., ; wang and huang, ) . monoclonal antibody analysis mapped the locations of many of the amino acids involved in the formation of vn epitopes to within the first and third quarters of the linear s polypeptide (de wit, ; kant et al., ; koch et al., ) , which is where closely-related stains (> % amino acid identity) also differ (bijlenga et al., ; farsang et al., ) . cavanagh ( ) proposed that these parts of the s subunit are very tolerant of amino acid changes, conferring a selective advantage. recently, the receptor-binding domain of the ibv m strain was mapped to residues - of the n terminus of s , which overlaps with hvr (promkuntod et al., ) . 
the s subunit, which drives virus-cell fusion, is more conserved between serotypes than s , varying by only - % at the amino acid level (bosch et al., ; cavanagh, ) . although it was initially thought that s played little or no role in the induction of a host immune response, it has since been shown that an immunodominant region located in the n-terminal half of the s subunit can induce neutralizing, but not serotype-specific, antibodies demonstrated by the ability of this subunit to confer broad protection against challenge with an unrelated serotype (kusters et al., ; toro et al., ) . ibvs are continuously evolving as a result of (a) frequent point mutations and (b) genomic recombination events (cavanagh et al., ; kottier et al., ; jackwood et al., ; zhao et al., ; kuo et al., ; liu et al., ) . multiple studies on ibv diversity have focused on inter-serotypic and inter-strain variation, and a few have focused on sub-populations within the s subunit in vaccine strains (gallardo et al., ; ndegwa et al., ) . the present study aimed to explore genetic variation across the entire viral genome at the sub-consensus level. it was anticipated, based on the published literature, that certain regions, and the s subunit hvrs in particular, would display significant sub-genomic variation. this study focused on a qx-like strain, a serotype currently causing significant poultry health problems across europe, asia, south america and south africa. . . origin and isolation of qx-like strain ck/za/ / twenty-eight-day old chickens in a commercial broiler operation presented with acute lethargy, reduced feed consumption and mortality. tracheitis and swollen kidneys were noted on post mortem, as well as a secondary escherichia coli infection. the worst affected houses had mortality rates of . %, . % and . %. ibv was isolated in specific pathogen free (spf) embryonated chicken eggs (ece) as described in knoetze et al. ( ) . 
after an initial two passages in ece, the virus was passaged twice further at the university of pretoria. rna was extracted from allantoic fluid using trizol® reagent (ambion, life technologies, carlsbad, usa) according to the manufacturer's protocol. the genome was transcribed to cdna and amplified using a transplex® whole transcriptome amplification kit (sigma-aldrich, steinheim, germany). illumina miseq sequencing on the cdna library was performed at the arc-biotechnology platform, onderstepoort, pretoria. illumina results were analysed using the clc genomics workbench v . . . paired-end reads were trimmed and a preliminary de novo assembly was performed. the larger segments were analysed by blast to identify the closest genomic reference strain (ita/ / , caz ) . this strain was retrieved and used as a scaffold for assembly-to-reference, generating a consensus sequence for / . trimmed paired-end reads were also mapped against other ibv serotype genomes, subsequently confirming that strain / was a pure culture of a qx-like ibv. the genome was deposited in genbank under the accession number kp . rna folding was predicted using the clc genomics workbench v . . . genetic recombination in the consensus sequence was evaluated using the recombination detection program rdp v . . coding sequence and orf prediction was carried out in vigor (wang et al., ) . trimmed paired-end reads were re-mapped against the / consensus sequence for snp detection. a snp detection table generated in the clc genomics workbench was manually edited to eliminate all snps with a frequency of < %. this conservative cutoff was selected to eliminate any nonspecific pcr errors introduced during preparation of the transcriptome library or deep sequencing, and excluded most of the point insertions producing gaps and frameshift mutations across the genome. nucleotide substitutions in coding regions were manually inspected for changes to the consensus amino acid (table , supplementary data).
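the frequency filter applied to the snp table can be sketched in a few lines of python. this is an illustrative sketch only: the record fields (`position`, `ref`, `alt`, `count`, `coverage`) and the `min_freq` parameter are assumptions made for the example and do not reproduce the clc genomics workbench export format, the records are invented, and the cut-off passed in the usage line is a placeholder rather than the threshold used in the study.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    position: int   # 1-based genome position
    ref: str        # consensus base
    alt: str        # variant base
    count: int      # reads supporting the variant
    coverage: int   # total reads covering the position

    @property
    def frequency(self) -> float:
        """variant frequency as a fraction of coverage (0.0 if uncovered)."""
        return self.count / self.coverage if self.coverage else 0.0


def filter_snps(snps, min_freq):
    """keep only snps whose variant frequency is at least min_freq."""
    return [s for s in snps if s.frequency >= min_freq]


# hypothetical records, not data from the study
calls = [
    Snp(100, "a", "g", 2, 1000),     # 0.2 % of reads
    Snp(250, "c", "t", 80, 1000),    # 8 % of reads
    Snp(900, "g", "a", 300, 1200),   # 25 % of reads
]
kept = filter_snps(calls, min_freq=0.05)  # 0.05 is a placeholder cut-off
print([s.position for s in kept])
```

a fixed frequency threshold is simple and transparent, at the cost of discarding genuine low-frequency variants together with amplification and sequencing errors.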
motifs were predicted using the elm eukaryotic linear motif resource for functional sites in proteins (dinkel et al., ) . protein structures for s and s were predicted in raptorx, a structure prediction server that predicts three dimensional ( d) structures for protein sequences without close homologs in the protein data bank (pdb) (kallberg et al., ) . s and s d structures were annotated and superposed in ccp mg v . . using the secondary structure (ssm) superposition method. this method superimposes pairs of structures by: ( ) finding the secondary structure elements (sses) and representing them as one simple vector spanning the length of the sse; ( ) finding equivalent sses in the two structures using graph-theory matching by geometric criteria of distances and angles between the vectors; ( ) superimposing vectors representing equivalent sses; ( ) finding the most likely equivalent residues in the superposed sses; ( ) superimposing cα atoms of equivalent residues; and ( ) iterating the last two steps.
the genome sequence of qx-like strain ck/za/ / was assembled from , ibv-specific paired-end reads of bp each. the genome was , nt in length with the utr incomplete by ~ nt. thirteen orfs were predicted by vigor in the order shown in fig. . this genome organisation including b, c and b was similar to that of turkey coronavirus (tcov; cao et al., ) , and the orfs b, c and b were also predicted in australian ibv strains (hewson et al., ) . when the sequences of a qx-like strain (jq ) and arkdpi (eu ) were analysed using vigor, a similar genome arrangement was detected. mass (ay ) did not however contain the predicted b, c and b orfs (data not shown). orf b was amino acids (aa) in length and no smart domains were predicted, whereas orf c was aa in length and a low complexity region was identified. the b orf encoded a aa protein with a signal peptide predicted from residues to and two transmembrane domains from residues to and to .
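the first two steps of the secondary-structure superposition procedure described in the methods (representing each sse as a vector, then comparing sses by the angles and distances between those vectors) can be illustrated with a short python sketch. it is a simplified illustration under stated assumptions, not the algorithm as implemented in the superposition software: the function names, the use of sse end points and midpoints, the tolerance values and the coordinates are all hypothetical.

```python
import math


def sse_vector(start, end):
    """vector from the first to the last c-alpha atom of one sse."""
    return tuple(e - s for s, e in zip(start, end))


def midpoint(start, end):
    """midpoint of an sse, used here as its reference position."""
    return tuple((s + e) / 2.0 for s, e in zip(start, end))


def pair_geometry(sse_1, sse_2):
    """angle (degrees) between two sse vectors and distance between midpoints."""
    u, v = sse_vector(*sse_1), sse_vector(*sse_2)
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return angle, math.dist(midpoint(*sse_1), midpoint(*sse_2))


def geometry_matches(geom_a, geom_b, max_angle_diff=15.0, max_dist_diff=2.0):
    """true if two sse pairs have a similar relative arrangement.

    the tolerances are arbitrary illustrative values.
    """
    return (abs(geom_a[0] - geom_b[0]) <= max_angle_diff
            and abs(geom_a[1] - geom_b[1]) <= max_dist_diff)


# hypothetical sse end points (x, y, z) in two structures
helix_a1 = ((0.0, 0.0, 0.0), (0.0, 0.0, 10.0))
helix_a2 = ((5.0, 0.0, 0.0), (5.0, 10.0, 0.0))
helix_b1 = ((1.0, 1.0, 1.0), (1.0, 1.0, 11.0))   # a1 translated by (1, 1, 1)
helix_b2 = ((6.0, 1.0, 1.0), (6.0, 11.0, 1.0))   # a2 translated by (1, 1, 1)
helix_c1 = ((0.0, 0.0, 0.0), (0.0, 0.0, 10.0))
helix_c2 = ((5.0, 0.0, 0.0), (5.0, 0.0, 10.0))   # parallel to c1

geom_a = pair_geometry(helix_a1, helix_a2)
geom_b = pair_geometry(helix_b1, helix_b2)
geom_c = pair_geometry(helix_c1, helix_c2)
print(geometry_matches(geom_a, geom_b), geometry_matches(geom_a, geom_c))
```

comparing pairs of sses rather than single sses makes the test independent of the overall position and orientation of each structure, which is why pairwise geometry is a natural criterion for graph matching before any superposition has been done.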
no recombination was detected across the genome of qx-like strain ck/za/ / . a gap was present between nucleotides and (table , supplementary data) (~aa in the a orf). although the gap was present in the majority ( . %) of reads, the sequence for strain / deposited in genbank contains the minority adenine residue because the gap introduced a frame shift, splitting orf a into two. it may be a legitimate mutation, but until further transcriptional analyses are conducted, the orf a gene has been reported intact here. two hundred and eight snps across the ibv qx-like genome were evaluated at the selected cut-off value. in table the consensus reference is juxtaposed with the allele variations, the relative frequencies of these point mutations, the actual number of counts and coverage at that position, the corresponding orf or region and the mutational effect. coverage ranged from -fold (position , ) up to -fold (position , ). the relative frequencies of missense:silent snps in relation to orf length were calculated in order to obtain a comparative measure of variability in specific genes (table ) . results for the structural genes and polymerase are illustrated in fig. , and the results for the non-structural protein orfs and non-coding regions, which were much shorter in length, are presented in fig. . overall the most variable orfs in terms of total snps, in descending order, were: e, b, utr, n, a, s, ab, m, c, a, b (no snps were detected at the % cut-off in the a and utr regions). the most variable, as assessed by snps leading to missense mutations, in descending order, were: b, e, utr, a, n, m, a, ab, s, b, a/ c/ b. these mutations presumably did not affect the tertiary protein structure and might be advantageous to the virus. the orfs under the strongest positive selection pressure, as indicated by the proportion of synonymous mutations, were, in descending order, c, ab, n, s, e, a/ b/m/ b/ a/ b.
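the length-normalised comparison described above can be sketched as follows. this is a minimal illustration, assuming that variability is expressed as snps per 100 nt of orf; the function name, the dictionary keys, the orf labels and all counts are hypothetical and are not taken from the study's tables.

```python
def snp_rates(orf_length, missense, silent):
    """missense, silent and total snps as a percentage of orf length."""
    if orf_length <= 0:
        raise ValueError("orf_length must be positive")
    total = missense + silent
    return {
        "missense_pct": 100.0 * missense / orf_length,
        "silent_pct": 100.0 * silent / orf_length,
        "total_pct": 100.0 * total / orf_length,
        # share of snps that leave the amino acid unchanged
        "silent_fraction": silent / total if total else 0.0,
    }


# hypothetical orfs: (length in nt, missense snps, silent snps)
orfs = {"orf_x": (300, 5, 1), "orf_y": (1200, 4, 8)}
table = {name: snp_rates(*values) for name, values in orfs.items()}

# rank orfs by missense snps per unit length, most variable first
ranked = sorted(table, key=lambda n: table[n]["missense_pct"], reverse=True)
print(ranked)
```

dividing by orf length keeps a short orf with a handful of snps comparable with a long orf such as the polymerase gene, which would otherwise dominate any ranking based on raw counts.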
the e protein orf had significantly more missense mutations on average, at a frequency of . % of the orf, which is more than threefold higher than the average value ( . ) for the a, ab, s, m and n genes. the e protein gene was the most variable at the sub-consensus level, with missense mutations and only one silent mutation across its bp orf. despite its small size, the cov e protein drastically influences the replication of covs and their pathogenicity. in the sars-cov, it was experimentally demonstrated that the e protein is not essential for genome replication or subgenomic mrna synthesis, but it does affect morphogenesis, budding, assembly, intracellular trafficking and virulence. in fact, in sars-cov the e protein is the main antagonist associated with induction of inflammation in the lung, which causes the acute respiratory distress syndrome from which the virus derives its name (dediego et al., ) . no studies have been published for the ibv e protein, but the high variability demonstrated here suggests that it may be an important virulence factor in poultry, and that a higher mutation rate possibly provides an evolutionary advantage in overcoming host cellular immune responses. although the n protein gene contained one of the highest overall frequencies of snps ( . %), the n gene is evidently under greater selective pressure, since . % of these mutations ( . % as a total of the gene) were silent. the coronavirus n protein is multifunctional, playing vital roles in viral assembly and formation of the complete virion and is required for optimal viral replication. additionally, the cov n protein is implicated in cell cycle regulation and host translational shutoff, displays chaperone activity, activates host signal transduction and aids viral pathogenesis through the antagonism of interferon induction (reviewed by mcbride et al., ) . 
given its fundamental roles in rna binding, formation of the ribonucleoprotein complex and in the virion, it is not surprising that this structural protein is the most conserved, as evidenced by its gene having the highest ratio of silent mutations of all the genes analysed. the importance of maintaining the sequence integrity in the n protein in ibv was demonstrated by kuo et al. ( ) , who reported that two residues within the n-terminal domain of a taiwanese ibv strain were positively selected, and that mutation of either of these significantly reduced the affinity of the n protein for the viral transcriptional regulatory sequence.
fig. . genome organisation of qx-like ibv strain / .
the glycosylated amino terminus of the m protein lies on the outside of the virion and m spans the membrane structure three times (collisson et al., ) . all four snps in the m gene resulted in missense mutations, two of which were located in the predicted transmembrane region. the m protein plays an important role in cov virion formation. ibv m protein co-expressed with s assembled into virus-like particles confirming its major role in virion formation, but cov m proteins also interact with other proteins and perform other roles in the infected cell. for example, m together with the accessory proteins a, b and were all found to prevent the synthesis of ifn-β through the inhibition of interferon promoter activation and irf- function, thus influencing disease outcome (yang et al., ) . coronavirus accessory proteins are generally dispensable for virus replication, but they play vital roles in virulence and pathogenesis by affecting host innate immune responses, encoding pro- or anti-apoptotic activities, or by affecting other signalling pathways that influence disease outcomes (susan & julian, ) .
ibv was demonstrated to induce a considerable activation of the type i ifn response, but it was delayed with respect to the peak of viral replication and accumulation of viral dsrna (kint et al., ) . ibv accessory proteins a and b play a role in the modulation of this delayed ifn response, by regulating interferon production at both the transcriptional and translational levels. interestingly, ibv proteins a and b seem to have opposing effects on ifn production in infected cells: a seems to promote ifn production, and b is involved in limiting ifn production, antagonising each other to tightly regulate ifn production (kint et al., ) . field isolates lacking a and b displayed reduced virulence in vitro and in vivo (mardani et al., ) . orf a in strain / lacked snps, but orf b had the highest frequency of snps relative to its size (n = ; . %). orf b is present in many international ibv strains (hewson et al., ; bentley et al., ) but is rarely mentioned in the literature since a canonical transcription regulatory sequence (trs-b) could not be identified upstream of the encoding rna. however, bentley et al. ( ) demonstrated that ibv was capable of producing subgenomic mrnas from noncanonical trs-bs via a template-switching mechanism with trs-l, the conserved trs in the leader sequence in the utr, which may expand the gammacoronavirus repertoire of proteins. they specifically demonstrated the transcription of the b orf by this mechanism. although no studies have determined orf b's functional role in the pathogenesis of ibv, the homolog in mers-cov is a potent interferon antagonist (yang et al., ) . a single snp causing a missense mutation was present in . % of the sub-consensus population of the b orf in this study. the single mutation in orf c was silent, and the predicted protein contained a low complexity region.
low complexity regions are regions of protein sequences with biased amino acid composition, and may be involved in flexible binding associated with specific functions (coletta et al., ) . orf b, a aa protein with a signal peptide and two transmembrane domains, was identified in the genome of strain / , and orf b was also reported in tcov and australian ibv strains (cao et al., ; hewson et al., ) . the homolog in sars-cov is aa in length and was identified as an endoplasmic reticulum/golgi membrane-localised protein that induces apoptosis. apoptosis may play an important role in promoting cov dissemination in vivo, minimising inflammation and aiding evasion of the host's defence mechanisms (ye et al., ) . protein from sars-cov accelerated the replication of murine cov, increasing the virulence of the original attenuated virus (tangudu et al., ) . presumably, this accessory protein plays a similar role in ibv pathogenesis, although this remains to be determined experimentally. the utrs of cov genomes contain conserved cis-acting sequence and structural elements that play essential roles in rna synthesis, gene expression and virion assembly, and each sub-genomic rna contains a leader segment that is identical to this utr region of the genome (goebel et al., ; sola et al., ) . no snps were detected in the utr in the sub-consensus sequences of strain / , which is consistent with the vital regulatory role that this region plays. conversely, the partial utr sequence of strain / was highly variable. the un-sequenced nucleotides from the end of the genome were extrapolated from the most similar genomic sequence, that of strain ita/ / , and the secondary rna structure of the utr for / was predictively folded (fig. ) . the snps were then systematically substituted into the consensus sequence and rna folding was repeated. delta g values for the predicted rna secondary structures in fig. (a)-(h) varied from - . kcal/mol to - . kcal/mol. apart from the c to g mutation (fig.
(c) ), effects on rna secondary structure were minor and the structures in fig. (b) and (d)-(h) were similar. to assess the effect of combining mutations, an rna containing t(u), t(u), g and c was folded, and this resulted in a similar stem-loop structure to those in fig. (a), (b) and (d)-(h) (data not shown). apart from the mutation c to g, the snps had little effect on the secondary rna structure in the utr.
fig. . predicted structure of the spike protein monomer of qx-like ibv strain / . missense mutations in s (blue) and s (yellow) are indicated as coloured side chains.
twenty-three snps were identified in the bp spike protein orf; of these resulted in missense mutations at the amino acid level, and were silent mutations. the frequency of total snps in the s protein orf was below average, at . %, compared to the genome average of . %. it was anticipated that the majority of mutations in the s orf would be in the s gene, particularly in the hvrs, but, surprisingly, this was not the case. only three of these snps (two missense and one silent) were found in the s gene, and all three were located in the cooh-terminal half of the s protein (fig. ) . only one mutation, a missense mutation, mapped to hvr . no snps were detected in hvr or hvr . the s subunit was considerably more variable, containing % of the polymorphisms detected across the entire s protein. two other notable features of s were detected: the first was a multi-a insertion site located between nucleotides , and , in the genome. the polymorphism involved the insertion of either one or two adenine nucleotides, possibly via a mechanism of polymerase stuttering. the second region of interest was located in close proximity, just downstream of the multi-a insertion site: a stretch of three consecutive mutated amino acids, namely c → w, g → d, s → c followed by silent mutation g (fig. ) .
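the distinction between missense and silent snps used throughout these results comes down to translating the consensus and variant codons with the standard genetic code. the python sketch below illustrates this. the function name and the returned labels are assumptions for the example, and the codons in the usage lines are hypothetical: they were chosen only because they reproduce the same kinds of amino acid change as the stretch described above, and are not the codons of strain / .

```python
from itertools import product

BASES = "TCAG"
# standard genetic code, codons ordered ttt, ttc, tta, ttg, tct, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(codon):
    """translate one dna or rna codon (case-insensitive) to an amino acid."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify(ref_codon, alt_codon):
    """return (effect, consensus amino acid, variant amino acid)."""
    ref_aa, alt_aa = translate(ref_codon), translate(alt_codon)
    if ref_aa == alt_aa:
        effect = "silent"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "missense"
    return effect, ref_aa, alt_aa


# hypothetical codon pairs for illustration
examples = [("TGT", "TGG"), ("GGT", "GAT"), ("AGT", "TGT"), ("GGA", "GGG")]
for ref, alt in examples:
    print(ref, alt, classify(ref, alt))
```

a per-codon comparison of this kind treats each snp independently, so two snps falling in the same codon would need to be combined before translation to obtain the amino acid actually encoded.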
template-based protein structure modelling was used to predict the secondary structure of the ibv spike monomer, based on the available crystal structure for the mers-cov s and s subunits (fig. ). s and s were modelled separately in raptor x and then superposed. the ibv s structure was arranged as two beta barrels and s formed packed a-helices. the s protein was not complete and the transmembrane domain was not represented since there were no sufficiently similar structures on which to model this region, but this is the first model of the spike protein monomer for ibv. hvr and the putative receptor binding domain maps to the apical beta barrel (fig. (a) ) and hvr is located on the flat plane on the base of the apical beta barrel and the peptide connecting it to the basal beta barrel (fig. (b) ). hvr maps to a region in the basal beta barrel of s that was predicted to contact or interact with s ( fig. (c) ). the locations of the missense mutations detected by snp analysis in the s and s subunits are indicated in fig. . many of these snps mapped to codons encoding amino acids on the surface of the predicted structure, but two regions were notable. firstly, the highly variable region in s spanning amino acids - was exposed on the s stalk, although folding of the remainder of the cooh domain may have influenced this conformation. secondly, ile was exposed on a projection at the top of the monomer. this residue precedes the second furin cleavage site in the s subunit with the sequence pisssgr/s . the cleavage of the s /s furin motif ( rrrr/s in strain / ) was found to be non-essential for attachment of ibv to the cell. rather, it promotes infectivity within the cell. 
in studies with the beaudette ibv strain, the second furin cleavage site in the s subunit was required for furin-dependent entry and syncytium formation, and the current hypothesis is that interplay between the s and s subunits determines virus attachment to specific receptors, determining tissue tropism of the virus (promkuntod et al., ) . the exact biological roles of these areas in s that are prone to mutation remain to be experimentally determined.
fig. . the predicted locations of hvr ( a), hvr ( b) and hvr ( c) of qx-like ibv strain / , indicated as coloured side-chains on the s subunit.
archaeological remains of domestic chickens in northeast china and the indus valley date back ~ years (west & zhou, ) . the cov group has been estimated to have arisen around bc, and the gammacoronaviruses diverged from the cov group around bc (woo et al., ) . covs have probably been co-evolving with their gallinaceous hosts for several thousand years. indeed, cook and co-authors ( ) state that "ibv is found everywhere that commercial chickens are kept". the implication is that although ibv was only discovered some years ago, the variety of serotypes we now observe is the result of hundreds if not thousands of years of genetic drift and recombination, accelerated by modern poultry farming practices where chickens are kept in high densities, and by inter-regional trade in poultry and other avian species. studies on antigenic diversity of ibvs are heavily biased towards studies of the s gene, and the hvrs in particular (cavanagh, ; ducatez et al., ; kant et al., ; mork et al., ) . many of these studies cite frequent point mutations in the s gene, but this was not the finding of the present study.
the discovery of a novel -to- exoribonuclease activity in cov nsp , which regulates replication fidelity and diversity in coronaviruses (denison et al., ) , lends weight to the theory that genetic drift is not primarily responsible for the degree of variation and serotypes we observe in poultry nowadays. instead, generation of variation by recombination is likely the main mechanism of serotypic diversity. the high frequency of rna recombination in coronaviruses is likely caused by their unique mechanism of rna synthesis, which involves discontinuous transcription and polymerase jumping (jeong et al., ) . sequencing of many field strains has provided convincing evidence that many, possibly all, ibv strains are recombinants between different field strains (cavanagh, ; kuo et al., ; liu et al., ; hewson et al., ) , driving ibv evolution at a population level. recombination of distinct ibv strains has been experimentally demonstrated in vitro, in ovo and in vivo (kottier et al., ; wang et al., ) . the s subunit hvr contains the ibv receptor-binding site. therefore despite the sequence variability in this region (which includes insertions and deletions), diverse strains must retain this critical biological function. all three hvrs may represent ancient artefacts of recombination, which have been perpetuated because they retain receptor-binding properties, with minimal permissive amino acid changes. this theory contrasts the tenet that the hvrs in the s subunit are very tolerant of amino acid changes produced by genetic drift, thereby conferring a selective advantage (cavanagh, ; de wit, ; kant et al., ; koch et al., ) . whereas s fulfils a primary role in receptor binding (promkuntod et al., ) , a broader role of s in antigenicity and attachment to receptors is emerging. chickens primed with a recombinant-expressed s subunit of a virulent arkdpi strain and boosted with a live mass-type vaccine were protected against challenge with live virulent arkdpi virus . 
although s subunits most likely do not contain an additional independent receptor-binding site, s in association with s forms part of a specific ectodomain which is critical to the binding of the virus to chicken tissues, which implies that both s and s contain determinants important to viral host range (promkuntod et al., ) . the results of the present study demonstrate that s is more predisposed to mutations than s , providing an adaptive advantage and at least one other study has reported higher variability in s compared to s (mo et al., ) . ibv has not been as extensively studied as other covs, and little progress has been made in effectively controlling or eradicating the disease in poultry. experimental and field studies provide substantial evidence that use of a homologous ibv vaccine is best, but sometimes, intriguingly, protection can be offered by an unrelated vaccine, or by the use of two heterologous vaccines (jones, ) . genotyping and phylogenetic analysis of ibv are typically focused on the s subunit sequence, and liu et al. ( ) caution against drawing conclusions based on a single gene sequence, particularly a partial gene sequence. the roles of the ibv e and accessory proteins and their roles in the pathogenesis of ibv have been completely overlooked, even when the roles of the homologs in other covs have been proven significant. accessory proteins of ibv and other covs may also offer a new generation of vaccine targets: the use of codon-deoptimization of non-structural virulence genes in influenza a virus and respiratory syncytial virus resulted in genetically stable viruses that retained immunogenicity but were attenuated (nogales et al., ; meng et al., ) . evidently virulence and immunogenicity in ibv is a multi-genic trait, and future studies must aim to pursue a better understanding and exploitation of the roles of various viral proteins in the host, if any advances are to be made in controlling the disease in poultry. 
identification of a noncanonically transcribed subgenomic mrna of infectious bronchitis virus and other gammacoronaviruses development and use of the h strain of avian infectious bronchitis virus from the netherlands as a vaccine: a review the role of viral population diversity in adaptation of bovine coronavirus to new host environments spike protein assembly into the coronavirion: exploring the limits of its sequence requirements middle east respiratory syndrome coronavirus quasispecies that include homologues of human isolates revealed through whole-genome analysis and virus cultured from dromedary camels in saudi arabia complete nucleotide sequence of polyprotein gene and genome organization of turkey coronavirus gene of the avian coronavirus infectious bronchitis virus is not essential for replication coronaviruses in poultry and other birds coronavirus avian infectious bronchitis virus infectious bronchitis virus: evidence for recombination within the massachusetts serotype lowcomplexity regions within protein sequences have position-dependent roles cytotoxic t lymphocytes are critical in the control of infectious bronchitis virus in poultry the long view: years of infectious bronchitis research taxonomic studies on strains of avian infectious bronchitis virus using neutralisation tests in tracheal organ cultures coronavirus virulence genes with main focus on sars-cov envelope gene characterization of a new genotype and serotype of infectious bronchitis virus in western africa cleavage inhibition of the murine coronavirus spike protein by a furin-like enzyme affects cell-cell but not virus-cell fusion coronaviruses: an rna proofreading machine regulates replication fidelity and diversity detection of infectious bronchitis virus the eukaryotic linear motif resource elm: years and counting molecular epizootiology of infectious bronchitis virus in sweden indicating the involvement of a vaccine strain effects of chicken anaemia virus and infectious bursal disease 
virus-induced immunodeficiency on infectious bronchitis virus replication and genotypic drift antigenic and s- genomic characterization of the delaware variant serotype of infectious bronchitis virus characterization of the rna components of a putative molecular switch in the untranslated region of the murine coronavirus genome infectious bronchitis viruses with naturally occurring genomic rearrangement and gene deletion neither the rna nor the proteins of open reading frames a and b of the coronavirus infectious bronchitis virus are essential for replication monoclonal antibodies to three structural proteins of avian infectious bronchitis virus characterization of epitopes and antigenic differentiation of australian strains a ribosomal frameshift signal in the polymerase-encoding region of the ibv genome data from years of molecular typing infectious bronchitis virus field isolates coronavirus transcription mediated by sequences flanking the transcription consensus sequence viral respiratory diseases (ilt, ampv infections, ib): are they ever under control? 
templatebased protein structure modeling using the raptorx web server location of antigenic sites defined by neutralizing monoclonal antibodies on the s avian infectious bronchitis virus glycopolypeptide activation of the chicken type i ifn response by infectious bronchitis coronavirus two genotypes of infectious bronchitis virus are responsible for serological variation in kwazulu-natal poultry flocks prior to antigenic domains on the peplomer protein of avian infectious bronchitis virus: correlation with biological functions first experimental evidence of recombination in infectious bronchitis virus evolution of infectious bronchitis virus in taiwan: positively selected sites in the nucleocapsid protein and their effects on rna-binding activity analysis of an immunodominant region of infectious bronchitis virus the molecular biology of coronaviruses assembly and immunogenicity of coronavirus-like particles carrying infectious bronchitis virus m and s proteins origin and characteristics of the recombinant novel avian infectious bronchitis coronavirus isolate ck/ch/ljl/ infectious bronchitis viruses with a novel genomic organization the coronavirus nucleocapsid is a multifunctional protein refining the balance of attenuation and immunogenicity of respiratory syncytial virus by targeted codon deoptimization of virulence genes epidemiology of infectious bronchitis virus in belgian broilers: a retrospective study complete genome sequences of two chinese virulent avian coronavirus infectious bronchitis virus variants identification of amino acids involved in a serotype and neutralization specific epitope within the s subunit of avian infectious bronchitis virus differences in the tissue tropism to chicken oviduct epithelial cells between avian coronavirus ibv strains qx and b are not related to the sialic acid binding properties of their spike proteins comparison of vaccine subpopulation selection, viral loads, vaccine virus persistence in trachea and cloaca, and 
mucosal antibody responses after vaccination with two different arkansas delmarva poultry industry-derived infectious bronchitis virus vaccines influenza a virus attenuation by codon deoptimization of the ns gene for vaccine development chicken interferon type i inhibits infectious bronchitis virus replication and associated respiratory illness contributions of the s spike ectodomain to attachment and host range of infectious bronchitis virus mapping of the receptor-binding domain and amino acids critical for attachment in the spike protein of avian coronavirus infectious bronchitis virus coronavirus pathogenesis rna-rna and rna-protein interactions in coronavirus replication and transcription severe acute respiratory syndrome coronavirus protein accelerates murine coronavirus infections cell mediated and humoral immune response in chickens infected with avian infectious bronchitis infectious bronchitis virus s expressed from recombinant virus confers broad protection against challenge the structural and accessory proteins m, orf a, orf b, and orf of middle east respiratory syndrome coronavirus (mers-cov) are potent interferon antagonists experimental confirmation of recombination upstream of the s hypervariable region of infectious bronchitis virus relationship between serotypes and genotypes based on the hypervariable region of the s gene of infectious bronchitis virus vigor, an annotation program for small viral genomes did chickens go north? 
new evidence for domestication discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus analysis of a qx-like avian infectious bronchitis virus genome identified recombination in the region containing the orf a, orf b, and nucleocapsid protein gene sequences adrian knoetze and rainbow veterinary laboratory are thanked for providing strain / for this study. funding was provided by the poultry section, department of production animal studies. supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/ . /j.meegid. . . . key: cord- - zbvgvk authors: kühnert, denise; wu, chieh-hsi; drummond, alexei j. title: phylogenetic and epidemic modeling of rapidly evolving infectious diseases date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: zbvgvk epidemic modeling of infectious diseases has a long history in both theoretical and empirical research. however the recent explosion of genetic data has revealed the rapid rate of evolution that many populations of infectious agents undergo and has underscored the need to consider both evolutionary and ecological processes on the same time scale. mathematical epidemiology has applied dynamical models to study infectious epidemics, but these models have tended not to exploit – or take into account – evolutionary changes and their effect on the ecological processes and population dynamics of the infectious agent. on the other hand, statistical phylogenetics has increasingly been applied to the study of infectious agents. this approach is based on phylogenetics, molecular clocks, genealogy-based population genetics and phylogeography. 
bayesian markov chain monte carlo and related computational tools have been the primary source of advances in these statistical phylogenetic approaches. recently the first tentative steps have been taken to reconcile these two theoretical approaches. we survey the bayesian phylogenetic approach to epidemic modeling of infectious diseases, describe the contrasts it provides to mathematical epidemiology, and emphasize the significance of the future unification of these two fields. molecular phylogenetics has had a profound impact on the study of infectious diseases, particularly rapidly evolving infectious agents such as rna viruses. it has given insight into the origins, evolutionary history, transmission routes and source populations of epidemic outbreaks and seasonal diseases. one of the key observations about rapidly evolving viruses is that the evolutionary and ecological processes occur on the same time scale. this is important for two reasons. first, it means that neutral genetic variation can track ecological processes and population dynamics, providing a record of past evolutionary events (e.g., genealogical relationships) and past ecological/population events (geographical spread and changes in population size and structure) that were not directly observed. second, the concomitance of evolutionary and ecological processes leads to their interaction that, when non-trivial, necessitates joint analysis. arguably the most studied infectious disease agent to date has been human immunodeficiency virus (hiv) and it has been the subject of thousands of phylogenetic studies. these have shed light on many aspects of hiv evolutionary biology, epidemiology, origins, phylogeography, transmission dynamics and drug resistance. 
in fact, the vast body of literature on hiv makes it clear that almost every aspect of the biology of a rapidly evolving pathogen can be better understood in the context of the evolution of the virus. whether it is retracing the zoonotic origins of the hiv pandemic or describing the interplay between the virus population and its host's immune system, a phylogenetic analysis frequently sheds light. although probabilistic modeling approaches to phylogenetics predate sanger sequencing (edwards and cavalli-sforza, ) , it was not until the last decade that probabilistic modeling became the dominant approach to phylogeny reconstruction. part of that dominance has been due to the rise of bayesian inference (huelsenbeck et al., ) , with its great flexibility in describing prior knowledge, its ability to be applied via the metropolis-hastings algorithm to complex highly parametric models, and the ease with which multiple sources of data can be integrated into a single analysis. the history of probabilistic models of molecular evolution and phylogenetics is a history of gradual refinement; a process of selection of those modeling variations that have the greatest utility in characterizing the ever-growing empirical data. the utility of a new model has been evaluated either by how well it fits the data (formal model comparison or goodness-of-fit tests) or by the new questions that it allows a researcher to ask of the data. in this review we will describe the modern phylogenetic approach to the field of infectious diseases, and particularly with reference to bayesian inference of the phylogenetic epidemiology of rapidly evolving viral pathogens such as hepatitis c virus (hcv), hiv and influenza a virus. the review is separated into two main sections. in section we discuss phylogenetic methods for reconstructing the history of infectious epidemics, including identification of origins, dating of common ancestors, relaxed phylogenetics and coalescent-based population dynamics. 
in section we review epidemiological models and finish by outlining progress in the development of phylodynamical models that marry statistical phylogenetics with dynamical modeling. the introduction of an efficient means of calculating the probability of a sequence alignment given a phylogenetic tree (known as the phylogenetic likelihood; felsenstein, ) heralded the beginning of practical phylogenetic tree reconstruction in a statistical framework. at around the same time the coalescent was introduced: a theory relating the shape of the genealogy of a random sample of individuals to the size of the population from which they came (kingman, ; see section . for details). both of these advances have been subsequently developed to the point that, together, they enable the estimation of viral evolutionary histories and past population dynamics. bayesian inference brings together the likelihood, $\Pr(D \mid \theta)$ (the probability of the data given the model parameters) and the prior, $p(\theta)$ (the probability of the model parameters prior to seeing the data), so that the posterior probability of the model parameters ($\theta$) given the data is:
$$p(\theta \mid D) = \frac{\Pr(D \mid \theta)\, p(\theta)}{\Pr(D)}$$
in a standard phylogenetic setting, the probabilistic model parameters include the phylogenetic tree, coalescent times and substitution parameters, and a prior probability distribution over these parameters must be specified. by using kingman's coalescent as a prior density on trees, bayesian inference can be used to simultaneously estimate the phylogeny of the viral sequences and the demographic history of the virus population (drummond et al., , ; see box ). extension of phylogenetic inference methods to accommodate time-stamped sequence data (rambaut, ; drummond et al., ) and relaxation of the assumption of a strict molecular clock (thorne et al., ; kishino et al., ; sanderson, ; drummond et al., ; rannala and yang, ) provided sophisticated methods for ancestral divergence time estimation. 
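as a minimal sketch of how a posterior of this kind is explored in practice, the following python example applies a random-walk metropolis-hastings sampler to a deliberately simple toy problem that is not taken from the review: a single substitution rate, a poisson likelihood for a hypothetical count of observed substitutions, and an exponential prior. all names and numbers (`n_subs`, `exposure`, `prior_rate`, the step size) are illustrative assumptions; real phylogenetic samplers also propose changes to the tree and to many other parameters.

```python
import math
import random


def log_posterior(rate, n_subs, exposure, prior_rate):
    """Unnormalized log posterior: Poisson likelihood times exponential prior."""
    if rate <= 0.0:
        return -math.inf
    expected = rate * exposure  # expected number of substitutions
    log_likelihood = n_subs * math.log(expected) - expected  # constant term dropped
    log_prior = -prior_rate * rate  # exponential prior, constant term dropped
    return log_likelihood + log_prior


def metropolis_hastings(n_subs, exposure, prior_rate,
                        n_iter=20000, burn_in=2000, step=0.0008, seed=1):
    """Random-walk Metropolis-Hastings sampler for a single rate parameter."""
    rng = random.Random(seed)
    rate = n_subs / exposure  # start near the maximum likelihood estimate
    current = log_posterior(rate, n_subs, exposure, prior_rate)
    samples = []
    for i in range(n_iter + burn_in):
        proposal = rate + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, n_subs, exposure, prior_rate)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rate, current = proposal, candidate
        if i >= burn_in:
            samples.append(rate)
    return samples


if __name__ == "__main__":
    draws = metropolis_hastings(n_subs=30, exposure=10000.0, prior_rate=100.0)
    print("posterior mean rate:", sum(draws) / len(draws))
```

for this toy model the posterior is a gamma distribution with mean (n_subs + 1) / (exposure + prior_rate), which gives a simple check that the sampler behaves as intended.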
for virus species that occupy more than one host species (e.g influenza a), models that aim to detect cross-species transmission may provide clues to the origin of a virus strain in a host population (reis et al., ). when a new epidemic emerges, one of the first goals is to trace it back to its genetic and geographic origin. the reconstruction of phylogenetic trees to infer the evolutionary relationships has been a key tool to uncover the origin of regional epidemics such as those resulting from hiv (gao et al., ; santiago et al., ) , hcv markov et al., ) and sars coronavirus (sars-cov) . some studies have also attempted to use phylogenetic trees to draw conclusions about transmission history and geographic spread of viral epidemics (motomura et al., ; santiago et al., ; gilbert et al., ) . however, great care should be taken when coming to conclusions about aspects of the epidemic process that are not explicitly modeled in the reconstruction of the phylogenetic tree and even if they are, the user needs to consider the appropriateness of the underlying model assumptions. one common and straightforward method used to identify the origin of an epidemic involves determining the non-epidemic genotype or lineage most closely related to the epidemic, i.e., the molecular sequences clustered most closely with the epidemic strain on a phylogenetic tree. while the method is intuitive, its success heavily depends on the collected data. the closest simian immunodeficiency virus (siv) relative of hiv- is sivcpz (gao et al., ; santiago et al., ) , which is harbored in chimpanzee sub-species pan troglodytes troglodytes and p.t. schweinfurthii in the form of the respective sub-species specific siv lineages sivcpzptt and sivcpzpts. although sivcpz became the prime candidate for the zoonotic source of hiv- as soon as it was identified, alternative sources could not be ruled out due to the paucity of identified chimpanzee infections (vanden haesevelde et al., ) . 
the source of hiv- was confirmed much later after the collection of sivcpz from fecal samples of wild p. t. troglodytes apes in the cameroon forest . hiv- groups m and n are much more closely related to sequences from the fecal samples than previously identified sivcpz strains. this finding uncovered the distinct origins of hiv- group m (pandemic) and group n (non-pandemic) traced to chimpanzee communities of southeastern and central cameroon respectively. the precise geographic identification of these wildlife chimpanzee reservoirs of hiv- by phylogenetic techniques provided the crucial evidence that sivcpz gave rise to the hiv/aids pandemic. conversely, if strains sufficiently closely related to the epidemic strain cannot be identified then phylogenetic trees are not able to easily provide answers about origins. for example, there has been much heated debate on the origin of the h n influenza a pandemic -whether its source was avian, non-human mammalian or even human. the uncertainty mainly stems from the absence of sequences from the immediate ancestral source population of the virus (gibbs and gibbs, ) . a similar, though less severe problem has been encountered with the search for the origin of hiv- o group. strains of hiv- o group have been revealed to be most closely related to sivgor found in western lowland gorillas (gorilla gorilla gorilla) takehisa et al., ). however, hiv-o sequences are moderately divergent from the known sivgor sequences and consequently, the route of transmission that has given rise to hiv- o group and sivgor is still indeterminate. the interspersion of an emergent viral strain with other strains in a phylogenetic tree is often interpreted as evidence supporting multiple independent viral introductions. for example, hiv lineages are paraphyletic with siv lineages creating several separate clusters of hiv suggesting multiple zoonotic viral transmissions into the human population (santiago et al., ; keele et al., ) . 
while it is intuitive that separate clusters of the emergent virus suggest multiple introductions, it is not clear from the number of clusters alone how many independent events are responsible for the observed pattern. incomplete taxon sampling will lead to undercounting. for example, there may exist an unsampled sequence that will split an emergent viral cluster, or an additional unsampled emergent cluster. both scenarios, if detected, would increase the lower bound of the inferred number of events. the number of events could also be incorrectly estimated due to phylogenetic estimation error. finally, in situations where the event is potentially reversible, such as with drug-resistance mutations, e.g., adamantane resistance in h n influenza virus (nelson et al., ), it is quite possible that reversions are also present in the phylogenetic history, and these are not always detectable by a simple parsimony reconstruction, again leading to undercounting. for all these reasons, the application of bayesian modeling of phylogeography and character evolution on phylogenies is crucial to quantitatively assess the uncertainty generated from these different sources of error (see section . ). in contrast to hiv- , it has been clearly established for almost two decades that the progenitor of hiv- is sivsm from sooty mangabey (cercocebus torquatus atys) (hirsch et al., ; gao et al., ). santiago et al. ( ) suggested that the geographic origins of hiv- groups a and b are in the eastern sooty mangabey range, according to the clear geographic clustering displayed in the phylogenetic tree and the branching position of the hiv- strains. although this heuristic approach to locating phylogeographic origins is commonly used, it has several disadvantages aside from the sampling error mentioned earlier. first, it relies on strong geographic signals to produce an unambiguous geographic clustering pattern in the trees. 
second, the lack of a formal statistical framework results in an inability to quantify the uncertainty associated with the geographic estimates. a number of statistical phylogenetic methods aim to reconstruct the migration process by treating geographic location as another state that evolves down the tree. the states are either discrete (lemey et al., b), denoted by names of cities or provinces, or continuous, represented by the latitude and longitude of the location (biek et al., ; lemey et al., ). even with comprehensive sampling, using a single phylogenetic tree is insufficient to reflect the complex genetic origin of virus species that undergo recombination or reassortment. reassortment arises when segments of the viral genome come from different viruses, while recombination also requires the genetic material from one source to (break and) join with that from another. these two processes enable the generation of novel combinations from two existing genotypes. moreover, these often large genetic changes may provide the potential for adaptation to a new host species (parrish et al., ). reassortment has played an important role in the evolution of the influenza a virus (lindstrom et al., ; holmes et al., ; nelson et al., ). evidence for recombination has also been found in dengue (holmes et al., ), hiv, hcv and sars-cov. there are many phylogenetic methods that aim to detect recombination by identifying discordance in the topologies of different parts of the alignment (grassly and holmes, ; salminen et al., ; lole et al., ; smith, ; robertson et al., ; paraskevis et al., ), which is a potential consequence of recombination. most of these methods use a sliding window approach to compute a summary statistic along the length of the sequence. 
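the sliding window idea can be sketched in a few lines of python. the example below uses the simplest possible summary statistic, the fraction of identical sites between a query and one reference sequence in each window; the function name, window size, step and toy sequences are all hypothetical, and the published methods cited above typically compare the query against several references or compute tree-based support values instead.

```python
def sliding_identity(query, reference, window, step):
    """Return (window start, fraction of identical sites) for each window.

    Assumes the two sequences are already aligned and of equal length;
    gap characters are simply compared like any other symbol.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(query) - window + 1, step):
        pairs = zip(query[start:start + window], reference[start:start + window])
        matches = sum(1 for a, b in pairs if a == b)
        results.append((start, matches / window))
    return results


if __name__ == "__main__":
    # toy alignment: the query matches the reference only in its first half
    query = "AAAAAAAACCCCCCCC"
    reference = "AAAAAAAAGGGGGGGG"
    for start, identity in sliding_identity(query, reference, window=4, step=4):
        print(start, identity)
```

a sharp drop in the statistic between neighbouring windows, as between positions 4 and 8 in the toy alignment, is the kind of signal that would prompt a closer look for a breakpoint.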
phylogenetic approaches are based on estimating either (i) bootstrap values or (ii) clade posterior probabilities for each window and a sudden change in bootstrap value, clade posterior probability or site percentage identity is an indication of the presence of a breakpoint around the region. other methods explicitly estimate the position of the breakpoint in an alignment, providing access to test the strength of support for recombination (holmes et al., ) . finally, some approaches portray the evolutionary history by networks to incorporate horizontal transfer (huson, ) or ancestral recombination graphs (bloomquist and suchard, ) . as a rule, rna viruses mutate rapidly, so that viruses isolated only a few months apart may exhibit measurable genetic differences (drummond et al., a and references therein) . indeed, the mutation rate of some rna viruses is so high that it can result in evolutionary changes within a host during the course of infection. this is particularly true of long term chronic infections caused by viruses such as hiv and hcv. it is therefore not appropriate to consider the analysis of sequences that have been sampled years apart as if they are contemporaneous. sequence data with this type of temporal structure are called heterochronous and from such data the substitution rate can be estimated and divergence times calibrated to a calendar scale. here, a tree with branch lengths in calendar units is termed a ''time tree''. fig. depicts an example of a serially sampled time tree of a rapidly evolving virus. to account for temporal structure in sequence data, the earliest methods estimated the time scale by estimating a gene tree with unconstrained branch lengths and then performing a linear regression of root-to-tip genetic distance against sampling times (see for review drummond et al., b) . 
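the root-to-tip regression described above can be sketched as follows. this is a minimal illustration using ordinary least squares on hypothetical sampling dates and distances, not a reimplementation of any published tool: the slope is read as the substitution rate and the intercept with the time axis as the estimated date of the root.

```python
def root_to_tip_regression(sampling_dates, root_to_tip_distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date): the slope in substitutions per site per
    time unit, and the date at which the fitted line crosses zero distance.
    """
    n = len(sampling_dates)
    if n != len(root_to_tip_distances) or n < 2:
        raise ValueError("need at least two matched date/distance pairs")
    mean_t = sum(sampling_dates) / n
    mean_d = sum(root_to_tip_distances) / n
    sxx = sum((t - mean_t) ** 2 for t in sampling_dates)
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sampling_dates, root_to_tip_distances))
    if sxx == 0.0:
        raise ValueError("sampling dates must not all be identical")
    rate = sxy / sxx
    if rate == 0.0:
        raise ValueError("no temporal signal: fitted rate is zero")
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate  # date at which the expected distance is zero
    return rate, root_date


if __name__ == "__main__":
    # hypothetical heterochronous sample: three tips and their distances from the root
    dates = [1990.0, 2000.0, 2010.0]
    distances = [0.08, 0.10, 0.12]
    rate, root_date = root_to_tip_regression(dates, distances)
    print("rate:", rate, "estimated root date:", root_date)
```

because the tips share ancestry, the points in such a regression are not independent, so the fit is best treated as an exploratory estimate rather than a source of confidence intervals.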
this method was used to provide the first estimate of the time of the most recent common ancestor (t mrca ) of hiv- m group, placing it in the s (korber et al., ). despite its simplicity, this method also accurately estimated the age of the oldest hiv sequence sampled in . a maximum likelihood based method (the single rate dated tips (srdt) model; rambaut, ) estimates ancestral divergence times and overall substitution rate on a fixed tree, assuming a strict molecular clock. the srdt model was used to date the most recent common ancestor of hiv- subtype a in ± and that of subtype b in ± (lemey et al., ). using the serial coalescent as a tree prior in bayesian coalescent methods (drummond et al., ; drummond and rambaut, ) allows the time scale to be simultaneously estimated with other phylogenetic and demographic parameters. recently, a relaxed clock bayesian coalescent analysis that included two historical viral samples from (zr ) and (drc ) (worobey et al., ) pushed back the estimated t mrca of hiv- m group. besides estimating the time of an epidemic outbreak, it may also be important to know how long the ancestors of the epidemic strain had circulated in the source population prior to the epidemic. this can sometimes be indicated by the length of the branch ancestral to the epidemic clade. in the case of the swine-origin influenza a virus, the length of the branch leading to s-oiv strains is estimated to be - years depending on the viral segment analyzed, suggesting roughly a decade of unsampled diversity (smith et al., ). to estimate the age of the common ancestor of sivsm strains, the t mrca of hiv- /sivsm has been dated, indicating that the common ancestry prior to the zoonosis of hiv- groups a and b spans only the last few centuries (wertheim and worobey, ). this does not necessarily indicate that sivsm first arose only centuries ago, just that the common ancestor of all current sivsm may be recent. 
however, even this conclusion has recently been questioned (worobey et al., ) as a result of independent calibration evidence that suggests the t mrca could in fact be greater than , years ago, leading to debate about the fidelity of the statistical substitution models commonly employed for divergence time dating when the true divergence times are very ancient compared to the sampling interval. as demonstrated by wertheim and pond ( ), substitution models that do not take into account the effects of selection can produce underestimated branch lengths, leading to much younger age estimates in the presence of purifying selection. this will be more problematic for data sets for which the total sampling interval is only a small fraction of the total age of the tree. while incorporating sampling dates provides additional information to phylogenetic inference, it also implies that the reliability of those dates has a heavy impact on the validity of the inference. the h n influenza virus that re-emerged in was found to have missed decades of evolution and was genetically remarkably similar to the h n virus (nakajima et al., ). it is thus thought to be descended from a strain that was kept frozen in an unknown laboratory for perhaps decades before becoming a ''wild'' strain again (zimmer and burke, ). if the missing evolution is not corrected for, analyses including the re-emergent strains produce biased date estimates and increased variances of the t mrca of the re-emergent lineages and across the phylogeny (wertheim, ). in cases where the sampling dates of sequences are contentious or unknown, a method that can handle sequences with unknown dates is required. for example, the leaf-dating method estimates the unknown date or age of a sequence as a parameter, treating it the same way as the age of internal nodes (drummond et al., c; nicholls and gray, ; shapiro et al., ). 
unrealistic sampling dates may also be the result of human error and are thus not recognized prior to an analysis. therefore, diagnostics for unrealistic dates are important to pick up errors in the recorded dates. one possible method is to plot the root-to-tip genetic distance against sampling year if the virus does not display significant departure from constant rate (wertheim, ). another is to check calibrations by dropping each calibration point in turn and re-estimating the date to confirm that the estimated dates are consistent (ryder and nicholls, ). early methods that accommodated heterochronous data assumed a strict clock model. however, a comprehensive study of heterochronous rna viral sequences using the srdt model (rambaut, ) demonstrated that the majority of the rna viral species studied rejected the constant rate molecular clock hypothesis (jenkins et al., ). the unrooted phylogeny is the other extreme of the scale of rate variability across branches of a phylogenetic tree. neither the strict clock nor the unrooted phylogeny is a realistic representation of the underlying evolutionary process, and the reality lies somewhere between the two. this has spawned the development of numerous methods that relax the molecular clock assumption and differ in their assumption of the pattern of rate variation across the branches. the local clock model approach assigns different rates to clades/regions of the tree. however, without external information, it is difficult to know a priori what is the best partitioning of the tree into local clock models. bayesian model averaging overcomes the challenge of rate assignment by averaging over all possible local clock models, estimating the substitution rates, and the number and position of changes in substitution rate, simultaneously. 
another category of relaxed clock models is based on 'rate smoothing', including non-parametric rate smoothing (sanderson, ), penalized likelihood (sanderson, ) and bayesian autocorrelated relaxed clock methods (thorne et al., ; kishino et al., ; aris-brosou and yang, ; rannala and yang, ). these methods restrict the rates on parent and descendant branches to be similar by penalizing large departures from parent branch rates. hence, rate variation is expected to occur through small and frequent changes. different bayesian autocorrelated clock models differ in the distribution used to model a branch rate given its parent rate (thorne et al., ; kishino et al., ). however, analyses of sequence data from influenza a and dengue- do not provide any evidence of autocorrelation of branch rates, suggesting that autocorrelated models may not be appropriate when analyzing a genealogy of sequences from a single virus species. whereas lineage-effects may be expected to cause autocorrelation of rates (through incremental changes to life-history, metabolic rate et cetera), the gene-specific action of darwinian selection will also cause apparent rate variation among lineages, by producing a general over-dispersion of the molecular clock over the entire phylogeny (takahata, , ). this second source of rate variation among lineages may be better modeled by uncorrelated relaxed clock models, which make no assumption about the autocorrelation of rates between ancestral and descendant branches. published analyses have provided strong evidence supporting the uncorrelated relaxed clock model (e.g., salemi et al., ; worobey et al., ) over the strict clock model. as well as estimating the age of ancestral divergences, it is also of interest to estimate the time of cross-species transmission if the disease is zoonotic in origin. one method of identifying the time of the host-switch is by applying non-homogeneous substitution models. 
the motivation of non-homogeneous substitution models is to acknowledge possible differences in pattern of substitution in the virus within different host species, which violates the assumptions of homogeneity and stationarity underlying the standard substitution models. therefore it may be more appropriate to apply different substitution models to different parts of the tree (forsberg and christiansen, ) . non-homogeneous substitution models permit the equilibrium frequencies, and hence the model parameters, to change on a branch and all the descendant lineages from the point of change are assumed to have different equilibrium base frequencies to the lineages prior to that point. this technique has been used to suggest that the immediate ancestral population of influenza a virus resided in a mammalian host (reis et al., ). however, it does not indicate whether the most recent common ancestor of the swine influenza virus and the virus resided in humans or other mammals. interpretation of estimated divergence times can be difficult. there may be direct ancestors that are more ancient, but the lineages that would reveal them have not been sampled or did not survive to the present due to processes such as genetic drift. therefore, the estimated t mrca may not answer the question of interest. for epidemics that resulted from a zoonotic transmission, the host switch event is of paramount interest, but estimating the t mrca of the epidemic strain does not directly estimate the time of the transmission, and only serves as a lower bound. likewise, if there have been processes causing a loss of genetic diversity in the past or the sampling is not comprehensive, then the estimated t mrca could be substantially younger than the age of the viral lineage. 
an obvious example of the former occurs in seasonal influenza due to seasonal population fluctuations and also strong positive darwinian selection caused by immune surveillance (fitch et al., ; bush et al., ) , leading to rapid lineage turnover and a recent common ancestor of any single-season sample. similarly, the analysis by worobey et al. ( ) shows that the t mrca of hiv- group m seems to have been pushed back due to the inclusion of an additional pre-epidemic sample from which is highly divergent to the sequence (zr ). in general the inclusion of older samples can increase the estimated age of root by (i) revealing previously unsampled lineages that are outgroup to the t mrca estimated without them, or (ii) simply because more temporal sampling breaks up long internal branches as well as potentially revealing ancient evidence of variants that were assumed modern, resulting in a slower estimated rate and therefore older estimated root height. finally, it is likely that current techniques alone cannot always recover accurate divergence dates in the distant past, as illustrated by recent analyses suggesting a much deeper history of siv (worobey et al., ) than previously suggested (sharp et al., ; wertheim and worobey, ). fig. illustrates the problem with three estimated viral time-trees that have vastly different inferred ages of their most recent common ancestor. we would expect the greatest confidence in the inferred age of the human influenza a time-tree where the sample period is a large fraction of the total age of the time tree, and the least confidence in the inferred age of the hepatitis c time-tree in which the sampling period is a small fraction of the inferred age of greater than years. 
so, apart from better models of rate variation across lineages (see guindon et al., , for early steps in this direction), future research in divergence time dating will likely focus on models that more accurately account for purifying selection and its role in maintaining the structure and function of the encoded genes. the impact of darwinian selection is expressed both in distortions of the genealogy (o'fallon, ; o'fallon et al., ) and the substitution process (e.g., bloom et al., ; cartwright et al., ) from neutral expectations. consideration of the action of pervasive purifying selection is especially important in viral genomes prone to clonal interference and which are compact, information rich and subject to great levels of functional and structural constraint in their evolutionary trajectories, especially when considering long time periods. beyond that there is also a need for more statistically rigorous methods of incorporating diverse sources of calibration information, such as biogeography, archaeology and paleontological evidence. bayesian statistical frameworks are uniquely suited for this sort of integration of multiple sources of information. genealogy-based population genetics can be used to infer demographic parameters including population size, rate of growth or decline, and population structure. when the characteristic time scale of demographic fluctuations are comparable to the rate of accumulations of substitutions then past population dynamics are ''recorded'' in the substitution patterns of molecular sequences. coalescent theory can therefore be combined with temporal information in heterochronous sequences to uncover past epidemiological events and pinpoint them on a calendar time scale. kingman's coalescent (kingman, ) describes the relationship between the coalescent times in a sample genealogy and the population size assuming an idealized wright-fisher population (fisher, ; wright, ) . 
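to make the relationship between coalescent times and population size concrete, the following python sketch simulates the waiting times of kingman's coalescent for a constant population. it assumes a haploid wright-fisher population of size `pop_size`, time measured in generations, and contemporaneous sampling, so that while k lineages remain the waiting time to the next coalescence is exponential with rate k(k-1)/(2 * pop_size). the function names and parameter values are illustrative only.

```python
import random


def simulate_coalescent_intervals(n_samples, pop_size, rng):
    """Draw one set of coalescent waiting times, ordered from the tips to the root.

    The i-th entry is the time (in generations) spent with n_samples - i lineages.
    """
    if n_samples < 2:
        raise ValueError("need at least two sampled lineages")
    intervals = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / (2.0 * pop_size)  # pairwise coalescence rate with k lineages
        intervals.append(rng.expovariate(rate))
    return intervals


def mean_tmrca(n_samples, pop_size, n_replicates, seed=1):
    """Average time to the most recent common ancestor over many simulated genealogies."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_replicates):
        total += sum(simulate_coalescent_intervals(n_samples, pop_size, rng))
    return total / n_replicates


if __name__ == "__main__":
    # theory predicts an expected tMRCA of 2 * N * (1 - 1/n) generations
    print("simulated:", mean_tmrca(n_samples=10, pop_size=1000, n_replicates=5000))
    print("expected: ", 2 * 1000 * (1 - 1 / 10))
```

the wide spread of individual simulated genealogies around this expectation illustrates why a single locus carries limited information about population size.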
the original formulation was for a constant population, but the theory has since been generalized to any deterministically varying function of population size for which the integral $\int_{t_0}^{t_1} N(t)^{-1}\,dt$ can be computed (griffiths and tavaré, ). parametric models with a pre-defined population function, such as exponential growth, expansion and logistic growth models, can easily be used in a coalescent framework (see fig. and box for details). for example a ''piecewise-logistic'' population model was employed in a bayesian coalescent framework to estimate the population history of hcv genotype a infections in egypt. this analysis demonstrated a rapid expansion of hcv in egypt between - , consistent with the hypothesis that public health campaigns to administer anti-schistosomiasis injections had caused the expansion of an hcv epidemic in egypt. the coalescent process is highly variable, so sampling multiple unlinked loci (felsenstein, ; heled and drummond, ) or increasing the temporal spread of sampling times (seo et al., ) can both be used to increase the statistical power of coalescent-based methods and improve the precision of estimates of both population size and substitution rate (seo et al., ). however in many virus species, the entire genome acts as a single locus, or undergoes recombination only when the opportunity arises through superinfection. the lack of independent loci therefore places an upper limit on the precision of estimates of population history. in many situations the precise functional form of the population size history is unknown, and simple population growth functions may not adequately describe the population history of interest. non-parametric coalescent methods provide greater flexibility by estimating the population size as a function of time directly from the sequence data and can be used for data exploration to guide the choice of parametric population models for further analysis. 
non-parametric coalescent methods first cut the time tree into segments, then estimate the population size of each segment separately according to the coalescent intervals within it. the main differences among these methods are (i) how the population size function is segmented along the tree, (ii) the statistical estimation technique employed and (iii) in bayesian methods, the form of the prior density on the parameters governing the population size function. in the 'classic skyline plot' (pybus et al., ) each coalescent interval is treated as a separate segment, so a tree of $n$ taxa has $n - 1$ population size parameters. however, the true number of population size changes is likely to be substantially fewer, and the generalized skyline plot (strimmer and pybus, ) acknowledges this by grouping the intervals according to the small-sample akaike information criterion (AIC$_c$) (burnham and anderson, ). the epidemic history of hiv- was investigated using the generalized skyline plot (strimmer and pybus, ), indicating the population size was relatively constant in the early history of hiv- subtype a in guinea-bissau, before expanding more recently (lemey et al., ). using this information, the authors then employed a piecewise expansion growth model, to estimate the time of expansion to a range of - . while the generalized skyline plot is a good tool for data exploration, and to assist in model selection (e.g., pybus et al., ; lemey et al., ), it infers demographic history based on a single input tree and therefore does not account for sampling error produced by phylogenetic reconstruction nor for the intrinsic stochasticity of the coalescent process. this shortcoming is overcome by implementing the skyline plot method in a bayesian statistical framework, which simultaneously infers the sample genealogy, the substitution parameters and the population size history. further extensions of the generalized skyline plot include modeling the population size by a piecewise-linear function instead of a piecewise-constant population, allowing continuous changes over time rather than sudden jumps. the bayesian skyline plot (drummond et al., ) has been used to suggest that the effective population size of hiv- group m may have grown at a relatively slower rate in the first half of the twentieth century, followed by much faster growth (worobey et al., ). on a much shorter time scale, the bayesian skyline plot analysis of a dataset collected from a pair of hiv- donor and recipient was used to reveal a substantial loss of genetic diversity following virus transmission (edwards et al., ). further analysis with a constant-logistic growth model estimated that more than % of the genetic diversity of hiv- present in the donor is lost during horizontal transmission. this has important implications as the process underlying the bottleneck determines the viral fitness in the recipient host. one disadvantage of the bayesian skyline plot is that the number of changes in the population size has to be specified by the user a priori and the appropriate number is seldom known. one solution is provided by methods that perform bayesian model averaging on the demographic model utilizing either reversible jump mcmc (opgen-rhein et al., ) or bayesian variable selection (heled and drummond, ), in which case the number of population size changes is a random variable estimated as part of the model.
fig. (caption fragment): the sampling interval represents a significant fraction of the overall tree height, but still small enough that the estimated root should be viewed with caution. (c) a phylogeny of human influenza a subtype h n : the sampling interval represents almost the full height of the tree, and all divergence times are likely to be quite accurately estimated, since interpolation between many known sample times is inherently less error prone than extrapolation to ancient divergence times.
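The classic skyline plot described above can be sketched in a few lines. The example assumes the usual per-interval estimator, in which an interval of duration $w_k$ with $k$ lineages gives the estimate $w_k \, k(k-1)/2$ in the time units of the tree. The function name and input layout are hypothetical, and the sketch handles only contemporaneous samples from a single fixed tree.

```python
def classic_skyline(interval_lengths):
    """Classic skyline estimates for a tree of contemporaneous samples.

    interval_lengths[j] is the duration of the coalescent interval with
    n - j lineages, ordered from the tips (n lineages) to the root (2 lineages).
    Returns a list of (number_of_lineages, population_size_estimate) pairs,
    one per coalescent interval, so n - 1 estimates for n taxa.
    """
    n = len(interval_lengths) + 1
    estimates = []
    for j, width in enumerate(interval_lengths):
        k = n - j  # lineages present during this interval
        estimates.append((k, width * k * (k - 1) / 2))
    return estimates
```

Because every interval gets its own parameter, the estimates are noisy, which is the motivation for grouping intervals in the generalized skyline plot.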
the methods for demographic inference discussed so far assume no subdivision within the population of interest. like changes in the size, population structure can also have an effect on the pattern of the coalescent interval sizes, and thus the reliability of results can be questioned when population structure exists. in the next section we will discuss approaches to phylogeographic inference, including coalescent approaches to population structure. phylogeography is a field that studies the evolution and dispersal process that has given rise to the observed spatial distribution of populations or taxa. phylogeographic methods can be divided into two approaches. the first performs post-tree-reconstruction analysis to answer phylogeographic questions, while the second jointly estimates the phylogeny and phylogeographic parameters of interest. when treating geographic location as discrete states, the former approach has been popular in the past couple of decades. it has the advantage of being less computationally intensive, but the outcome of the analysis depends on the input tree. due to its simplicity, the most popular method for inferring ancestral locations has been maximum parsimony (slatkin and maddison, ; swofford, ; maddison and maddison, ; wallace et al., ); however, this method does not allow for any probabilistic assessment of the uncertainty associated with the reconstruction of ancestral locations. a mugration model is a mutation model used to analyze a migration process. a recent study of influenza a h n virus introduced a fully probabilistic 'mugration' approach by modeling the process of geographic movement of viral lineages via a continuous time markov process where the state space consists of the locations from which the sequences have been sampled (lemey et al., b). this facilitates the estimation of migration rates between pairs of locations. furthermore, the method estimates ancestral locations for internal nodes in the tree and employs bayesian variable selection (bvs) to infer the dominant migration routes and provide model averaging over uncertainty in the connectivity between different locations (or host populations). this method has helped with the investigation of the influenza a h n origin and the paths of its global spread, and also the reconstruction of the initial spread of the novel h n human influenza a pandemic (lemey et al., b). however, a shared limitation of models for discrete location states is that ancestral locations are limited to sampled locations. as demonstrated by the analysis of the data set on rabies in dogs in west and central africa, absence of sequences sampled close to the root can hinder the accurate estimation of viral geographic origins (lemey et al., b). phylogeographic estimation is therefore improved by increasing both the spatial density and the temporal depth of sampling. however, dense geographic sampling leads to large phylogenies and computationally intensive analyses. the structured coalescent (hudson, ) can also be employed to study phylogeography. the structured coalescent has also been extended to heterochronous data (ewing et al., ), thus allowing the estimation of migration rates between demes in calendar units.
fig. . the underlying wright-fisher population and serially-sampled genealogies from two populations. the first population has a constant population size over the history of the genealogy, while the second population has been exponentially growing. the coalescent likelihood calculates the probability of a genealogy given a particular background population history (e.g., constant or exponentially growing) and can therefore be employed to estimate the population history that best reflects the shape of the co-estimated phylogeny.
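As an illustration of the continuous-time Markov process underlying the 'mugration' approach, the sketch below computes the probability that a lineage starting in one location is found in another after a branch of length t, given a matrix of pairwise migration rates. The function name, the input layout and the use of uniformization to evaluate the matrix exponential are all choices made for this example rather than details taken from the cited studies. The truncated series is adequate only when the product of the largest exit rate and t is moderate.

```python
import math

def transition_probabilities(rates, t, terms=200):
    """P(t) = exp(Q t) for a discrete-location migration process.

    rates[i][j] (i != j) is the migration rate from location i to j;
    diagonal entries of `rates` are ignored. Uses uniformization:
    P(t) = sum_m Poisson(m; lam * t) * B^m, with B = I + Q / lam.
    """
    n = len(rates)
    q = [[rates[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # rows of a generator matrix sum to zero
    lam = max(-q[i][i] for i in range(n)) or 1.0
    b = [[(1.0 if i == j else 0.0) + q[i][j] / lam for j in range(n)] for i in range(n)]
    p = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weight = math.exp(-lam * t)  # Poisson probability of m = 0 jumps
    for m in range(terms):
        for i in range(n):
            for j in range(n):
                p[i][j] += weight * power[i][j]
        power = [[sum(power[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
        weight *= lam * t / (m + 1)
    return p
```

These transition probabilities are the quantities that enter a likelihood calculation over the tree in the same way as nucleotide substitution probabilities do, which is why the location trait can be treated like an extra character.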
the serial structured coalescent was first applied to an hiv dataset with two demes to study the dynamics of subpopulations within a patient (ewing et al., ) , but the same type of inference can be made at the level of the host population. further development of the model allowed for the number of demes to change over time (ewing and rodrigo, a) . migrate (beerli and felsenstein, ) also employs the structured coalescent to estimate subpopulation sizes and migration rates in both bayesian and maximum likelihood frameworks and has recently been used to investigate spatial characteristics of viral epidemics (bedford et al., ) . additionally, some studies have focused on the effect of ghost demes (beerli, ; ewing and rodrigo, b) , however no models explicitly incorporating population structure, heterochronous samples and nonparametric population size history are yet available. one ad hoc solution involves modeling the migration process along the tree in a way that is conditionally independent of the population sizes estimated by the skyline plot (lemey et al., a) . thus, given the tree, the migration process is considered independent of the coalescent prior. however this approach does not capture the interaction between migration and coalescence that is implicit in the structured coalescent, since coalescence rates should depend on the population size of the deme the lineages are in. as we will see in the following section, statistical phylogeography is one area where the unification of phylogenetic and mathematical epidemiological models looks very promising. in some cases it is more appropriate to model the spatial aspect of the samples as a continuous variable. the phylogeography of wildlife host populations have often been modeled in a spatial continuum by using diffusion models, since viral spread and host movement tend to be poorly modeled by a small number of discrete demes. 
one example is the expansion of geographic range in eastern united states of the raccoon-specific rabies virus (biek et al., ; lemey et al., ). brownian diffusion, via the comparative method (felsenstein, ; harvey and pagel, ), has also been utilized to model the phylogeography of feline immunodeficiency virus collected from the cougar (puma concolor) population around western montana. the resulting phylogeographic reconstruction was used as proxy for the host demographic history and population structure, due to the predominantly vertical transmission of the virus (biek et al., ). however, one of the assumptions of brownian diffusion is rate homogeneity on all branches. this assumption can be relaxed by extending the concept of relaxed clock models to the diffusion process. simulations show that the relaxed diffusion model has better coverage and statistical efficiency over brownian diffusion when the underlying process of spatial movement resembles an over-dispersed random walk. like their mugration model counterparts, these models ignore the interaction of population density and geographic spread in shaping the sample genealogy. however, there has been progress in the development of mathematical theory that extends the coalescent framework to a spatial continuum (barton et al., , a), although no methods have yet been developed providing inference under these models.
box : the anatomy of a bayesian coalescent analysis using mcmc. bayesian phylogenetic inference by markov chain monte carlo (mcmc) (yang and rannala, ; mau et al., ) involves the simulation of the joint posterior distribution of substitution model parameters ($\phi$) and the phylogenetic tree given the sequence data ($D$). by restricting the phylogenetic model to time-trees (see fig.
) and coupling the phylogenetic likelihood with a coalescent prior, the parameters ($\theta$) of the population history, $N_\theta(t)$, can also be estimated simultaneously by sampling from the posterior probability distribution (drummond et al., ):

$$f(g, \theta, \phi \mid D) = \frac{\Pr(D \mid g, \phi)\, f_G(g \mid \theta)\, f_\theta(\theta)\, f_\phi(\phi)}{\Pr(D)}$$

the term $\Pr(D \mid g, \phi)$ is often referred to as the phylogenetic likelihood, and is the probability of the data given the time-tree $g$ and substitution model parameters. it can be computed by the pruning algorithm (felsenstein, ), which efficiently sums over all ancestral sequence states at the internal nodes of the tree. an extension of the likelihood accommodates heterogeneity across sites (yang, ). if the time-tree $g$ relates a heterochronous sample of sequences, then the substitution parameters $\phi$ also include the overall substitution rate $\mu$, and this can be estimated from the heterochronous data, so that the population history is estimated on a calendar scale. the normalizing constant $\Pr(D)$ is also known as the partition function or marginal likelihood and its magnitude provides a measure of model support, although its estimation requires advanced mcmc techniques (e.g., thermodynamic integration or transdimensional mcmc). coalescent models come into play when determining the prior density for the time-tree topology and coalescent/divergence times. the coalescent provides a probability distribution, $f_G(g \mid \theta)$, conditional on a deterministic model of population size history, $N_\theta(t)$. its parameters ($\theta$) can in turn be estimated as hyperparameters. given a time-tree $g = \{E_g, t\}$ of $n$ contemporaneous samples composed of an edge graph $E_g$ and coalescent times $t = \{t_n = 0, t_{n-1}, \ldots, t_2, t_1\}$ the coalescent density is:

$$f_G(g \mid \theta) = \prod_{k=2}^{n} \frac{1}{N_\theta(t_{k-1})} \exp\left(-\int_{t_k}^{t_{k-1}} \frac{\binom{k}{2}}{N_\theta(t)}\,dt\right)$$

the prior distributions $f_\theta(\theta)$ and $f_\phi(\phi)$ are usually selected from standard univariate or multivariate distributions.
in the previous section we have seen that phylogenetics can be used to infer the date of an outbreak, its source population and the viral transmission history, directly from time-stamped genomic data.
whereas phylogenetic models mainly address questions about evolutionary history, dynamical models are often used to make predictions about the future. predictive models are important because they provide the possibility of anticipating certain aspects of the outcome of emerging epidemics and assessing the risk of pandemics, and the potential effects of planned intervention. phylogenetic inference is based on genetic data such as sampled dna sequences from infected hosts. current models using such data to infer information about the past often require simplifying assumptions about the population size e.g., to be constant or to be subject to pure exponential growth. epidemiologists, on the other hand, fit their models to prevalence or incidence data. standard epidemiological models are described by sets of ordinary differential equations tracking the (often non-linear) changes in numbers of susceptible and infected individuals. consequently, the simple prior assumptions for the population sizes (of infected individuals) used in phylogenetics appear inadequate from an ecological perspective. epidemiological models play a major role in deciding which measures of disease control are taken to avoid or stop viral outbreaks. the effects of isolation, vaccination and other measures are estimated through model simulations, serving as a basis for decisions on which public health policies to institute and actions to take. however, knowledge of the phylogenetic history of viral outbreaks can be vital in reconstructing transmission pathways which contributes to effective management and future prevention efforts (e.g., cottam et al., ) . the epidemiological and ecological processes determining the diversity of fast evolving rna viruses act on the same time scale as that on which mutations arise and are fixed in the population (holmes, ) . this implies that genetic sequence data can provide independent evidence on transmission histories. 
whereas epidemiological data typically provide information about who was infected and when, they generally do not provide positive evidence about transmission history. thus the combination of these sources of information should open the way to more detailed epidemiological inference, including bayesian estimation of contact networks and transmission histories (welch et al., ). standard epidemiological models are based on flux between host compartments dividing the host population e.g., into susceptible (s), infected (i) and recovered or removed (r) individuals. standard models are termed si, sis and sir. the choice of model is based on the characteristics of the considered disease, the existence of a latent period, immunity after infection et cetera (see box ) (anderson and may, ; keeling and rohani, ). restricting the focus to the time evolution of the number of individuals in each compartment, these models grasp the overall progress of an epidemic. certain disease characteristics require adaptations or extensions of standard models, for example, the inclusion of asymptomatic infections that account for a sampling bias towards symptomatic infections in case the virus of interest does not always cause noticeable symptoms (e.g., aguas et al., ). an important threshold ratio is the basic reproduction ratio $R_0$, the expected number of secondary infections caused by one primary infection in a completely susceptible population (diekmann et al., ). based on its value epidemiologists make predictions on the effect of the disease. in classical deterministic epidemiological models, if the basic reproduction ratio is larger than one, an epidemic is expected.
box : compartmental models for infectious diseases (keeling and rohani, ). let $S$, $E$, $I$ and $R$ be the fractions of susceptible, exposed, infected and recovered/removed individuals in the host population.
the left hand side of each equation block gives the model equations, the right hand side the (non-trivial) endemic equilibria, which are only obtainable for $R_0 > 1$. the basic reproduction ratio $R_0$ depends on the corresponding model. apart from the si model, the overall population is assumed to be constant, such that the sum of fractions for each model equals one. under the assumption of homogeneous mixing in the population the transmission term $\beta S I$ can be derived, which determines the total rate of new infections.
si model. fatal infections, eventually killing the infected, can be modeled with only two compartments: susceptible and infected. assume a fixed birth rate $m$ and death rate $\mu$.
sir model. transmission of the disease to susceptibles leads to a period of illness until recovery, which in turn implies immunity. demography is described by the birth and death rate $\mu$ and recovery is obtained at rate $\gamma$; its reciprocal $1/\gamma$ is the mean infectious period. here, $R_0 = \beta/(\mu + \gamma)$. the last equation is redundant since $S + I + R = 1$.
sis model. instead of acquiring immunity, after infection the individuals go back to the susceptible stage. therefore, the disease can persist even without including newborns in the population. ignoring demography, the dynamics are characterized by the coupled differential equations $\dot{S} = \gamma I - \beta S I$ and $\dot{I} = \beta S I - \gamma I$. since $S = 1 - I$, they can be replaced by one equation.
seir model. in order to account for a latent period with assumed average duration $1/\sigma$, the sir model can be extended by including exposed individuals composing a fraction $E$ of the population. exposed individuals are infected, but not yet infectious. the differential equations for $S$ (and $R$) are as in the sir model. dynamics in $E$ and $I$ are described as follows.
further models are sirs, seis, msir, mseir, mseirs, etc., where m denotes passively immune infants, allowing for diseases where an individual can be born with a passive immunity from its mother.
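As a worked example of the sir model in the box, the sketch below assumes the standard textbook form with demography, $\dot{S} = \mu - \beta S I - \mu S$ and $\dot{I} = \beta S I - (\gamma + \mu) I$, which is consistent with $R_0 = \beta/(\mu+\gamma)$ as stated above. The function names, the parameter values in the usage note and the forward-Euler integrator are illustrative choices only.

```python
def sir_derivatives(s, i, beta, gamma, mu):
    """Time derivatives of the susceptible and infected fractions."""
    ds = mu - beta * s * i - mu * s
    di = beta * s * i - (gamma + mu) * i
    return ds, di

def basic_reproduction_ratio(beta, gamma, mu):
    """R_0 for the sir model with demography."""
    return beta / (mu + gamma)

def endemic_equilibrium(beta, gamma, mu):
    """Non-trivial equilibrium (S*, I*); exists only if R_0 > 1."""
    r0 = basic_reproduction_ratio(beta, gamma, mu)
    if r0 <= 1.0:
        return None
    return 1.0 / r0, mu * (r0 - 1.0) / beta

def integrate(s, i, beta, gamma, mu, dt, steps):
    """Forward-Euler integration (assumed here for simplicity)."""
    for _ in range(steps):
        ds, di = sir_derivatives(s, i, beta, gamma, mu)
        s += dt * ds
        i += dt * di
    return s, i
```

With, for instance, beta = 1.0, gamma = 0.2 and mu = 0.05, the ratio exceeds one and a trajectory started near the disease-free state spirals into the endemic equilibrium returned by `endemic_equilibrium`. The recovered fraction follows from $R = 1 - S - I$.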
typically, epidemiologists fit a suitable set of deterministic differential equations to empirical data, often the number of infections or related hospitalizations in a population. consequently, the model can be used to estimate if an epidemic can be kept under control by measures such as (i) vaccination and (ii) antiviral prophylaxis for susceptible individuals, (iii) treatment of infected individuals or (iv) isolation of infected individuals from susceptible individuals. decisions on public health policies are often based on these estimates. the simplest epidemiological models assume homogeneous mixing within a population. in many cases this assumption is not valid. due to host contact dynamics viral infections spread easily within social units such as schools, cities and farms, less so among them. integration of population structure is therefore essential. however, even within subpopulations individual dynamics might differ stochastically (see fig. ). such randomness can be accounted for by considering stochastic models (see e.g., survey by britton, ). before introducing stochastic compartmental models thoroughly, we illustrate them based on a stochastic sir model simulation. we simulate the spread of a virus strain in a population divided into $n$ subpopulations which are connected by comparatively rare migration events. let $L = \{0, \ldots, n-1\}$ denote the set of locations. a single infected individual initiates the epidemic in one of the $n$ completely susceptible populations. after an exponentially distributed waiting time one of the following events happens: infection at mass action infection rate $\beta$; migration at migration rate $m_{ik}$ for $i, k \in L$; birth of a susceptible individual at rate $\mu$; death of an individual at rate $\mu$. fig. shows a realization of the simulated dynamics for n = populations. the epidemic starts in population (blue) and many individuals get infected before the first individuals in population (yellow) and eventually population (red) get infected.
let $S_k$, $I_k$ and $R_k$ be the fractions of individuals in each subpopulation $k \in L$. the sum $S_k + I_k + R_k$ equals one for every $k \in L$. the deterministic analogue of our model can be described with a set of differential equations. however, it is important to realize that this set of differential equations cannot capture all of the behaviors of its stochastic counterpart. in fact, starting from a deterministic representation like this, there are multiple stochastic markov processes that exhibit the same deterministic limit, but can potentially have exponentially different behavior in their stochastic properties, such as the time to extinction. formally, two distinct sources of variance can be considered in stochastic models of populations (engen et al., ). the first is environmental stochasticity and is often modeled by admitting temporal variation in the parameters of the population model. the second is demographic stochasticity and describes the stochasticity of fluctuations in populations of finite size due to the inherent unpredictability of individual outcomes. to model demographic stochasticity (also known as internal stochasticity; chen and bokka, ) in the absence of environmental (external) stochasticity, the time-evolution of an epidemic can be represented by a jump process and its corresponding master equation (gardiner, ). the master equation describes the time evolution of the probability distribution over the discrete state space. for the closed sir model (kermack and mckendrick, ) the master equation for the numbers of individuals in each of the three compartments ($n_S$, $n_I$, $n_R$) is:

$$\dot{p}_{n_S, n_I, n_R}(t) = \beta (n_S + 1)(n_I - 1)\, p_{n_S+1,\, n_I-1,\, n_R}(t) + \gamma (n_I + 1)\, p_{n_S,\, n_I+1,\, n_R-1}(t) - (\beta n_S n_I + \gamma n_I)\, p_{n_S, n_I, n_R}(t)$$

a single realization of this epidemic jump process is described by a sequence of timed transition events (individual infection or recovery events).
in the closed sir model, the waiting or sojourn time between a pair of sequential events is exponentially distributed (i.e., the transition process is memoryless), and thus the process is a continuous-time markov process. stochastic models of this form can also be viewed in terms of their reaction kinetics. for the closed stochastic sir model above the two 'reactions' are infection and recovery: $S + I \xrightarrow{\beta} 2I$ and $I \xrightarrow{\gamma} R$, indicating that a susceptible contacts an infectious individual and gets infected at reaction rate $\beta$ whereas an infected recovers at reaction rate $\gamma$. more precisely, the times an individual spends in the susceptible and infected compartments are exponentially distributed with rates $\beta I$ and $\gamma$, respectively. it is the binary infection reaction that leads to the non-linear dynamics of the system. for stochastic models $R_0 > 1$ does not necessarily imply an outbreak of the disease. instead, a higher basic reproduction ratio suggests a higher probability of an outbreak, but the precise relationship depends on the specific model considered and the initial condition. algorithms have been developed that allow exact and approximate simulation of coupled reactions such as the closed sir (bartlett, ; gillespie, , ). fig. shows simulated viral outbreaks under a stochastic sir and sis model with $R_0 \approx$ . in a population divided into three distinct subpopulations. note that there is no outbreak in ( ) although $R_0 > 1$. deterministic epidemic models can be derived from the underlying jump process, and can represent useful macroscopic laws of motion in the appropriate limit. however, such approaches are not adequate for modeling systems in which small numbers of individuals are frequently involved. for a similar reason, it is awkward to reconcile large-limit deterministic models with the small sample genealogies that are obtained with molecular phylogenetic approaches.
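The exact simulation of the closed stochastic sir model can be sketched with a direct-method (gillespie-style) algorithm. The event rates below are read off the master equation, with infection at total rate $\beta n_S n_I$ and recovery at total rate $\gamma n_I$. The function name, argument layout and stopping rule are assumptions made for this illustration, and population structure is deliberately left out.

```python
import random

def gillespie_sir(n_s, n_i, n_r, beta, gamma, rng, t_max=float("inf")):
    """Simulate one realization of the closed stochastic sir jump process.

    Returns a list of (time, n_s, n_i, n_r) tuples, one per event,
    starting with the initial state at time 0.
    """
    t = 0.0
    trajectory = [(t, n_s, n_i, n_r)]
    while n_i > 0 and t < t_max:
        infection_rate = beta * n_s * n_i
        recovery_rate = gamma * n_i
        total_rate = infection_rate + recovery_rate
        # waiting time to the next event is exponential with the total rate
        t += rng.expovariate(total_rate)
        # choose the event type in proportion to its rate
        if rng.random() * total_rate < infection_rate:
            n_s -= 1
            n_i += 1
        else:
            n_i -= 1
            n_r += 1
        trajectory.append((t, n_s, n_i, n_r))
    return trajectory
```

Running the function repeatedly with different seeds, for example `gillespie_sir(99, 1, 0, 0.02, 1.0, random.Random(seed))`, shows that some realizations die out after a handful of events even though the corresponding deterministic model predicts an epidemic.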
therefore, stochastic continuous-time discrete-state formulations of epidemic models may be more suited to forming connections between the two disciplines. the forward simulations of a stochastic epidemic model introduced with fig. demonstrate the relationship between epidemic models and genealogies. knowing the exact parameters and resulting dynamics throughout the simulated outbreak, we can build a full transmission history for the outbreak (which is not unique given only the time evolution of the number of infected individuals, since at each event the infected individuals involved are chosen randomly). an infection event in the forward simulation corresponds to a bifurcation in the transmission tree. restricting the full tree to a ''sample genealogy'' that only includes the individuals that were infectious at a specific sampling time yields very different results for different times during the outbreak, which underlines the importance of sampling methods (see e.g., stack et al., ) . as we can see in the simulations, virus transmission often depends on spatial structure. the interaction among humans living in the same city, for example, differs from among-city interaction, which is important whenever viral transmission exceeds city borders. there are many other social and spatial units this concept applies to: households, schools, or on a larger scale, regions, countries and continents. in fact, most phylogenetic and epidemiological studies model the dynamics of spatially distributed systems, albeit many of them ignore spatial structure for the sake of simplicity. durrett and levin demonstrate that models ignoring spatial structure yield qualitatively different results than spatial models (durrett and levin, ) . phylodynamics is a term used to describe a synthetic approach to the study of rapidly evolving infectious agents that considers the action (and interaction) of both evolutionary and ecological processes. 
the term phylodynamics was introduced by grenfell et al. ( ) to describe the ''melding of immunodynamics, epidemiology, and evolutionary biology'' that is required to analyse the interacting evolutionary and ecological processes especially of rapidly evolving viruses for which both processes have the same time scale. two distinct pursuits have been labeled phylodynamics by recent studies. the first relies on the idea that ecological processes and population dynamics can effectively be tracked by neutral genetic variation, such that past ecological and population events are ''imprinted'' in genetic variation within populations and can be reconstructed along with the reconstruction of evolutionary history. the idea is sound for truly neutral variation, but the compact genomes of rapidly evolving viruses are not simple recording devices. instead they are packed with functional information and mutations play an active role in population and ecological processes through the action of darwinian selection. hence, the more challenging second phylodynamic pursuit is the analysis of the inevitable interaction of evolutionary and ecological processes that requires the joint analysis of both. we will call the former pursuit phylogenetic epidemiology, and reserve the term phylodynamics for approaches that aspire to model the interaction of ecological and evolutionary processes. the effect of novel mutations on population dynamics through their interaction with the immune system or anti-viral drugs are examples of phylodynamics in this stricter sense. the focus of many studies aspiring to combine population genetic and epidemiological approaches is the basic reproduction ratio r , estimates of which are used to develop containment strategies for emerging pandemics. such estimates can be obtained from phylogenetic analysis, e.g., through estimating population growth rates . 
another popular way to infer population dynamic information from genomic data is the application of parametric and non-parametric coalescent models (strimmer and pybus, ; drummond et al., ; minin et al., ). phylogenetic methods can be used to estimate $R_0$, which can then be used to investigate transmission patterns and the number of generations of transmission. depending on the distribution of the generation time (i.e., the duration of infectiousness) the relationship between $R_0$ and the growth rate $r$ of the population can be used to compute the basic reproduction number (wallinga and lipsitch, ). since little is known about generation time distributions, the usual approach is to fit the epidemic models to the observed data. wallinga and lipsitch list the resulting equations for $R_0$ for exponential, normal, or delta distributions of generation time. they show that without knowledge of the generation time distribution an upper bound for the reproductive number can still be estimated. others obtain $R_0$ estimates based on coalescent theory, as for example rodrigo et al. ( ), who estimated it in vivo for hiv- . in a recent study on the influenza a (h n ) outbreak, both epidemiological and bayesian coalescent approaches for the computation of $R_0$ were applied (fraser et al., ). whereas the epidemic approaches gave estimates of . - . for $R_0$, the bayesian coalescent approach yielded a posterior median of . . all estimates are larger than one, correctly indicating that the virus spreads successfully, rather than dying out. however, an age-dependent heterogeneous epidemic model best fits the data and results in an estimate of $R_0$ = . . structures determining host interaction are often modeled as contact networks (welch et al., ). the transmission of foot and mouth disease virus is highly dependent on the interaction among farms and the detection of infected farms is essential. a plausible approach is to consider each farm as an individual in a contact network.
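The relationship between the growth rate $r$ and the reproduction number mentioned above can be sketched for the three generation time distributions. The formulas assume the moment-generating-function relationship described by wallinga and lipsitch, with mean generation time $T_c$ and, for the normal case, standard deviation $\sigma$: $1 + r T_c$ (exponential), $\exp(r T_c - r^2 \sigma^2 / 2)$ (normal) and $\exp(r T_c)$ (delta). Function and argument names are hypothetical.

```python
import math

def r0_exponential(growth_rate, mean_generation_time):
    """Exponentially distributed generation time."""
    return 1.0 + growth_rate * mean_generation_time

def r0_normal(growth_rate, mean_generation_time, sd_generation_time):
    """Normally distributed generation time."""
    return math.exp(growth_rate * mean_generation_time
                    - 0.5 * (growth_rate * sd_generation_time) ** 2)

def r0_delta(growth_rate, mean_generation_time):
    """Fixed generation time (delta distribution).

    For a given mean and a positive growth rate this also serves as the
    upper bound when the shape of the distribution is unknown.
    """
    return math.exp(growth_rate * mean_generation_time)
```

For the same growth rate and mean generation time, the three estimates can differ substantially, which is why the assumed distribution needs to be reported alongside any estimate derived from a growth rate.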
through phylogenetic analysis of consensus sequences (one sequence for each farm) contacts between farms can be traced in order to find infected but non-detected farms (cottam et al., ). changes in effective population size estimated through phylogenetic analyses can indicate past changes in population size. therefore, many recent studies infer the demographic history of a virus using bayesian skyline plot models (drummond et al., ). for example, siebenga et al. ( ) are interested in the epidemic expansion of norovirus gii. which they investigate by reconstructing the changes in population structure using bayesian skyline plots. similarly, hughes et al. ( ) explore the heterosexual hiv epidemic in the uk. analyses of the genomic and epidemiological dynamics of human influenza a virus explore the sink-source theory and investigate the spatial connections of a seasonal global epidemic (rambaut et al., ; lemey et al., b; bedford et al., ). coalescent theory has also been adapted to fit an epidemic sir model to sequence data (volz et al., ). frost and volz ( ) provide an overview of how the appropriate interpretation of coalescent rates differs among the population dynamic approaches with which they are used. interpretation of the coalescent-based skyline plots must be made with caution. as opposed to generation times referring to durations of infection in epidemiological theory, for coalescent approaches being applied to infectious diseases the generation times usually describe times between transmission events. accordingly, although prevalence does affect phylogenetic reconstruction through sampling, the population dynamic patterns are mainly determined by incidence (frost and volz, ).
one early attempt to integrate dynamical and population genetic models used coupled differential equations and markov chain theory to model the within-host time evolution of viral genetic diversity under basic dynamic models of a persistent infection (kelly et al., ) . the main focus was the impact of the dynamical model on the variance in the number of replication cycles, as this is a key determinant of the rate of genetic divergence and thus potential for adaptation. interestingly, the model reveals that multiple cell type infections can decrease viral evolutionary rates and increase the likelihood of persistent infection. genetic diversity within hosts is closely related to between host dynamics: gordo and campos ( ) develop structured population genetic models, explicitly incorporating epidemiological parameters to analyze the relationship between genetic variability and epidemiological factors. a simple sis model is simulated based on two different models of host contact structure, the island model and a scale free contact network. for low clearance rates and low intrahost effective population size, levels of genetic variability turn out to be maximal when transmission levels are intermediate, independent of the host population structure. in a scale free contact network the population consists of many low-connectivity hosts and very few high-connectivity hosts, a common pattern for sexually transmitted diseases (e.g., lloyd and may, ; liljeros et al., ) . in this setting genetic variation appears to be lower in highly connected than in weakly connected hosts. with their study gordo and campos ( ) underline that an integration of population genetics and epidemiology can have important implications for public health policies. in a deterministic framework day and gandon ( ) model the interaction of evolutionary and ecological processes by coupling sis host dynamics with viral evolution. 
the interaction of evolution and ecology is incorporated through the fitness of each virus strain. for strain i they define a fitness $r_i = b_i N_S - l - v_i - c$, where $b_i$ is the strain-specific transmission rate per susceptible, $v_i$ is the strain-specific virulence (determining the increase in mortality rate due to infection), $l$ is the baseline mortality rate and $c$ is the recovery rate. the evolutionary dynamics of strain frequencies are tracked quantitatively and are intimately linked with the overall infection dynamics of the host population via the strain-specific virulence and transmission rates. their analysis provides insight into the mechanistic laws of motion connecting genetic evolution with the evolution of virulence and transmission rates. an exceptional feature of influenza viruses is the limited genetic diversity, which appears to contradict the viruses' high mutation rate. integrating single virus strain features and host immunity into a stochastic transmission model, ferguson et al. ( ) search for an explanation. although epidemiological factors play a role in limiting influenza diversity, strain-transcendent immunity must be relevant as well. through a phylodynamic analysis of interpandemic influenza in humans, koelle et al. ( ) underline the importance of the viral structure for antigenicity and the immune recognition dynamics of influenza epitopes. they consider clusters that contain strains with similar conformations of ha epitopes such that there is high cross-immunity of strains within each cluster. a genotype-phenotype model that implements neutral networks (the clusters) is coupled with an epidemiological transmission model in which the numbers of susceptible, infected and recovered individuals in each cluster are modeled.
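to make the strain fitness of day and gandon ( ) described earlier in this passage more concrete, the sketch below evaluates it for two hypothetical strains and shows how fitness differences translate into changes in strain frequency. several points are assumptions rather than statements from the source: the first term is read as the transmission rate multiplied by the number of susceptible hosts, the frequency update uses a standard replicator form (each frequency changes in proportion to its fitness minus the mean fitness), and all parameter values are made-up toy numbers.

```python
def strain_fitness(transmission_rate, n_susceptible, baseline_mortality,
                   virulence, recovery_rate):
    """Per-capita growth rate of a strain: transmission gain minus all losses.

    Assumes the transmission term is transmission_rate * n_susceptible.
    """
    return (transmission_rate * n_susceptible
            - baseline_mortality - virulence - recovery_rate)


def frequency_derivatives(frequencies, fitnesses):
    """Replicator-style rate of change: dq_i/dt = q_i * (r_i - mean fitness)."""
    mean_fitness = sum(q * r for q, r in zip(frequencies, fitnesses))
    return [q * (r - mean_fitness) for q, r in zip(frequencies, fitnesses)]


# Two hypothetical strains sharing one host population (toy numbers).
n_s, mortality, recovery = 1000, 0.02, 0.5
mild = strain_fitness(0.002, n_s, mortality, virulence=0.1,
                      recovery_rate=recovery)
virulent = strain_fitness(0.003, n_s, mortality, virulence=0.8,
                          recovery_rate=recovery)
print(frequency_derivatives([0.5, 0.5], [mild, virulent]))
```

in this toy setting the more virulent strain still gains frequency because its higher transmission rate outweighs its higher virulence; as the number of susceptibles falls, the balance can shift, which is the coupling between epidemiology and evolution that the model is designed to capture.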
simulations of the model of koelle et al. ( ) result in time series of infected cases that agree with the typical annual outbreaks in temperate regions and the empirical dominance of certain antigenic clusters. according to this model, years in which a formerly dominant cluster is replaced by a new one have the highest numbers of infections. in the following year there are particularly few infections, presumably due to higher host immunity caused by the previous year's outbreak. thereafter follow "average" years until the next cluster-transition occurs, i.e., until another cluster becomes dominant.
fig. . simulated viral outbreak under stochastic sir ( - ) and sis ( ) models among three populations (denoted by blue, yellow and red curves). the initial condition is a single infected individual in the blue population. in ( ) the disease does not break out (numbers of susceptibles in dotted lines and infected in solid lines).
another natural explanation of the contradiction between high mutation rates and constant genetic diversity is the fixation of many deleterious mutations that leads to the extinction of the respective strains. recent population genetic models account for population dynamics, e.g., in order to enhance the understanding of allele fixation processes and the importance of demographic stochasticity (parsons and quince, ; champagnat and lambert, ; parsons et al., ). structured models not only allow for more realistic dynamics, they can also bridge the gap to phylogenetic and phylogeographic methods, since most of them are sample-based, ideally with each sample representing one infected individual. modeling coupled host-virus dynamics, welch et al. ( ) embed an epidemic population model into a branching and coalescent structure, producing a scaled coalescent process that describes the inter-host dynamics given a virus sample genealogy.
their simulations show that, for large sample sizes, the model provides accurate estimates of the contact rate and the selection parameter. overall, phylodynamic methods have been developed and proven useful for the analysis of various viruses. however, phylogenetic reconstruction is still quite restricted by coalescent assumptions. an alternative to the coalescent for cases in which sample sizes are big compared to the overall population is the birth-death with incomplete-sampling model (gernhard, ; stadler, ) , and this framework has recently been extended to include heterochronous data (stadler, ) , opening the way for an alternative approach to phylodynamic inference from timestamped virus data. bayesian phylogenetic inference has led to an explosion of analyses of rapidly evolving viruses in recent years. while this explosion has been fruitful in elucidating the manifold variation in origin, transmission routes and evolutionary rates underlying the present diversity of infection agents, there is a nascent field that promises to extend the conceptual reach of molecular sequence data, through a unification of phylogenetics and mathematical epidemiology. this new field of phylodynamics encompasses both inference of classical epidemiological parameters using phylogenetics as well as exciting new approaches that aim to investigate the consequences of the inevitable interaction between evolutionary (mutation, drift, darwinian selection) and ecological (population dynamics and ecological stochasticity) processes. the research being pursued has broader consequences for evolutionary biology and molecular ecology. this interaction of evolution and ecology will occur whenever a population contains genotypes with different intrinsic dynamical properties (e.g., virulence, transmission rates, recovery rates). 
whereas this condition is almost always met in real populations and is frequently decisive in shaping outcomes, the mathematical and theoretical analysis of darwinian selection within epidemiological models is the most challenging and least studied area within the emerging field of phylodynamics. it is thus ripe for future research. in the meantime, it is likely that phylodynamic research will rapidly develop new methods for statistical phylogeography and structured population dynamics.
prospects for malaria eradication in sub-saharan africa infectious diseases of humans: dynamics and control effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan s ribosomal rna phylogeny measles periodicity and community size neutral evolution in spatially continuous populations a new model for evolution in a spatial continuum a new model for extinction and recolonization in two dimensions: quantifying phylogeography global migration dynamics underlie evolution and persistence of human influenza a (h n ) effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach a virus reveals population structure and recent demographic history of its carnivore host a high-resolution genetic signature of demographic and spatial expansion in epizootic rabies virus thermodynamics of neutral protein evolution unifying vertical and nonvertical evolution: a stochastic arg-based framework stochastic epidemic models: a survey model selection and multimodel inference: a practical information-theoretic approach predicting the evolution of human influenza a history can matter: non-markovian behavior of ancestral lineages evolution of discrete populations and the canonical diffusion of adaptive dynamics stochastic modeling of nonlinear epidemiology
integrating genetic and epidemiological data to determine transmission pathways of foot-and-mouth disease virus applying population-genetic models in theoretical evolutionary epidemiology on the definition and the computation of the basic reproduction ratio r in models for infectious-diseases in heterogeneous populations measurably evolving populations inference of viral evolutionary rates from molecular sequences bayesian random local clocks, or one rate to rule them all relaxed phylogenetics and dating with confidence estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data tools for constructing chronologies: crossing disciplinary boundaries beast: bayesian evolutionary analysis by sampling trees bayesian coalescent inference of past population dynamics from molecular sequences extinction times in autocatalytic systems the importance of being discrete (and spatial) a method for cluster analysis population genetic estimation of the loss of genetic diversity during horizontal transmission of hiv- demographic and environmental stochasticityconcepts and definitions using temporally spaced sequences to simultaneously estimate migration rates, mutation rate and population sizes in measurably evolving populations coalescent-based estimation of population parameters when the number of demes changes over time estimating population parameters using the structured serial coalescent with bayesian mcmc inference when some demes are hidden evolutionary trees from dna sequences: a maximum likelihood approach phylogenies and the comparative method accuracy of coalescent likelihood estimates: do we need more sites, more sequences, or more loci? 
ecological and immunological determinants of influenza evolution positive darwinian evolution in human influenza a viruses a codon-based model of host-specific selection in parasites, with an application to the influenza a virus who rapid pandemic assessment collaboration viral phylodynamics and the search for an 'effective number of infections origin of hiv- in the chimpanzee pan troglodytes troglodytes human infection by genetically diverse sivsm-related hiv- in west africa stochastic methods: a handbook for the natural and social sciences the conditioned reconstructed process molecular virology: was the pandemic caused by a bird flu the emergence of hiv/aids in the americas and beyond a general method for numerically simulating the stochastic time evolution of coupled chemical reactions approximate accelerated stochastic simulation of chemically reacting systems a likelihood method for the detection of selection and recombination using nucleotide sequences unifying the epidemiological and evolutionary dynamics of pathogens sampling theory for neutral alleles in a varying environment modeling the sitespecific variation of selection patterns along lineages the comparative method in evolutionary biology bayesian inference of population size history from multiple loci an african primate lentivirus (sivsmclosely) related to hiv- phylogenetic evidence for recombination in dengue virus the phylogeography of human viruses whole-genome analysis of human influenza a virus reveals multiple persistent lineages and reassortment among recent h n viruses gene genealogies and the coalescent process bayesian inference of phylogeny and its impact on evolutionary biology uk hiv drug resistance collaboration, rambaut, a.uk hiv drug resistance collaboration splitstree: analyzing and visualizing evolutionary data rates of molecular evolution in rna viruses: a quantitative phylogenetic analysis chimpanzee reservoirs of pandemic and nonpandemic hiv- modeling infectious diseases in 
humans and animals linking dynamical and population genetic models of persistent viral infection a contribution to the mathematical theory of infections the coalescent performance of a divergence time estimation method under a probabilistic model of rate evolution epochal evolution shapes the phylodynamics of interpandemic influenza a (h n ) in humans timing the ancestor of the hiv- pandemic strains the molecular population genetics of hiv- group o tracing the origin and history of the hiv- epidemic bayesian phylogeography finds its roots phylogeography takes a relaxed random walk in continuous space and time reconstructing the initial global spread of a human influenza pandemic: a bayesian spatial-temporal model for the global spread of h n pdm bats are natural reservoirs of sars-like coronaviruses animal origins of the severe acute respiratory syndrome coronavirus: insight from ace -s-protein interactions the web of human sexual contact genetic analysis of human h n and early h n influenza viruses, - : evidence for genetic divergence and multiple reassortment events how viruses spread among computers and people full-length human immunodeficiency virus type genomes from subtype c-infected seroconverters in india, with evidence of intersubtype recombination sinauer associates phylogeography and molecular epidemiology of hepatitis c virus genotype in africa bayesian phylogenetic inference via markov chain monte carlo methods smooth skyride through a rough skyline: bayesian coalescent-based inference of population dynamics different subtype distributions in two cities in myanmar: evidence for independent clusters of hiv- transmission recent human influenza a (h n ) viruses are closely related genetically to strains isolated in the origin and global emergence of adamantane resistant a/h n influenza viruses multiple reassortment events in the evolutionary history of h n influenza a virus since dated ancestral trees from binary trait data and their application to the 
diversification of languages a method to correct for the effects of purifying selection on genealogical inference a continuous-state coalescent and the impact of weak selection on the structure of gene genealogies inference of demographic history from genealogical trees using reversible jump markov chain monte carlo slidingbayes: exploring recombination using a sliding window approach based on bayesian phylogenetic inference cross-species virus transmission and the emergence of new epidemic diseases fixation in haploid populations exhibiting density dependence i: the non-neutral case some consequences of demographic stochasticity in population genetics genetic history of hepatitis c virus in east asia the epidemic behavior of the hepatitis c virus the epidemiology and iatrogenic transmission of hepatitis c virus in egypt: a bayesian coalescent approach evolutionary analysis of the dynamics of viral infectious disease an integrated framework for the inference of viral population history from reconstructed genealogies estimating the rate of molecular evolution: incorporating noncontemporaneous sequences into maximum likelihood phylogenies the genomic and epidemiological dynamics of human influenza a virus inferring speciation times under an episodic molecular clock using non-homogeneous models of nucleotide substitution to identify host shift events: application to the origin of the 'spanish' influenza pandemic virus recombination in aids viruses coalescent estimates of hiv- generation time in vivo missing data in a stochastic dollo model for binary trait data, and its application to the dating of proto-indo-european highresolution molecular epidemiology and evolutionary history of hiv- subtypes in albania identification of breakpoints in intergenotypic recombinants of hiv type by bootscanning a nonparametric approach to estimating divergence times in the absence of rate constancy estimating absolute rates of molecular evolution and divergence times: a penalized 
likelihood approach simian immunodeficiency virus infection in free-ranging sooty mangabeys (cercocebus atys atys) from the tai forest a viral sampling design for testing the molecular clock and for estimating evolutionary rates and divergence times a bayesian phylogenetic method to estimate unknown sequence ages origins and evolution of aids viruses: estimating the time-scale phylodynamic reconstruction reveals norovirus gii. epidemic expansions and their molecular determinants a cladistic measure of gene flow inferred from the phylogenies of alleles origins and evolutionary genomics of the swine-origin h n influenza a epidemic analyzing the mosaic structure of genes protocols for sampling viral sequences to study epidemic dynamics on incomplete sampling under birth-death models and connections to the sampling-based coalescent sampling-through-time in birth-death trees exploring the demographic history of dna sequences using the generalized skyline plot paup⁄: phylogenetic analysis using parsimony (⁄ and other methods). 
version on the overdispersed molecular clock statistical models of the overdispersed molecular clock origin and biology of simian immunodeficiency virus in wild-living western gorillas estimating the rate of evolution of the rate of molecular evolution human immunodeficiency viruses: siv infection in wild gorillas sequence analysis of a highly divergent hiv- -related lentivirus isolated from a wild captured chimpanzee phylodynamics of infectious disease epidemics a statistical phylogeography of influenza a h n how generation intervals shape the relationship between growth rates and reproductive numbers statistical inference to advance network models in epidemiology integrating genealogy and epidemiology: the ancestral infection and selection graph as a model for reconstructing host virus histories the re-emergence of h n influenza virus in : a cautionary tale for estimating divergence times using biologically unrealistic sampling dates purifying selection can obscure the ancient age of viral lineages dating the age of the siv lineages that gave rise to hiv- and hiv- direct evidence of extensive diversity of hiv- in kinshasa by island biogeography reveals the deep history of siv evolution in mendelian populations maximum likelihood phylogenetic estimation from dna sequences with variable rates over sites: approximate methods bayesian phylogenetic inference using dna sequences: a markov chain monte carlo method historical perspective-emergence of influenza a (h n ) viruses key: cord- -hf fgtnp authors: vashi, yoya; jagrit, vipin; kumar, sachin title: understanding the b and t cell epitopes of spike protein of severe acute respiratory syndrome coronavirus- : a computational way to predict the immunogens date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: hf fgtnp the novel severe acute respiratory syndrome coronavirus- (sars-cov- ) outbreak has caused a large number of deaths, with thousands of confirmed cases worldwide. 
the present study followed computational approaches to identify b- and t-cell epitopes for the spike (s) glycoprotein of sars-cov- by its interactions with the human leukocyte antigen alleles. we identified peptide stretches on the sars-cov- s protein that are well conserved among the reported strains. the s protein structure further validated the presence of predicted peptides on the surface, of which are surface exposed and predicted to have reasonable epitope binding efficiency. the work could be useful for understanding the immunodominant regions in the surface protein of sars-cov- and could potentially help in designing some peptide-based diagnostics. also, identified t-cell epitopes might be considered for incorporation in vaccine designs. the emerging severe acute respiratory syndrome coronavirus- (sars-cov- ) has caused a recent pandemic, which has been declared a public health emergency by the world health organization (who, b). the disease rapidly spread across the globe and caused havoc to humanity (wu and mcgoogan, ). by the start of may, sars-cov- had spread to countries and infected over , , people (who, a). the who is continuously monitoring and updating health-related plans to curtail the disease spread. the absence of a specific treatment and vaccine worsens the situation and threatens the world. the international committee on taxonomy of viruses (ictv) classified sars-cov- under the family coronaviridae of order nidovirales. the genomic sequence of sars-cov- isolated from the bronchoalveolar lavage fluid of a patient from wuhan, china showed a length of , nucleotides (genbank accession number nc_ ). sars-cov- contains a positive-sense single-stranded rna with ˊ and ˊ untranslated regions. the genome codes for orf a, orf b, spike (s), orf a, orf b, envelope (e), membrane (m), orf , orf a, orf b, orf , orf b, orf , nucleocapsid (n), and orf from ˊ to ˊ (zhu et al., ). the s glycoprotein forms a homotrimer and mediates viral entry into host cells.
the s protein is a potential target for therapeutic and vaccine design against sars-cov- infection in humans (li, ; tortorici et al., ). the s glycoprotein comprises two functional subunits: the s subunit is responsible for binding to the host cell receptor and the s subunit is responsible for fusion of the virus with the cell membrane. usually in covs, s is cleaved at the boundary between s and s subunits, which remain non-covalently bound in the prefusion conformation, to activate the protein for membrane fusion via extensive irreversible conformational changes (burkard et al., ; park et al., ; walls et al., ). setting it apart from other sars-covs, the s glycoprotein of sars-cov- harbors a furin cleavage site at the boundary between the s /s subunits (walls et al., ). by now, it is evident that sars-cov- s uses angiotensin-converting enzyme (ace ) receptor-mediated entry into cells. some studies suggest that the s proteins of sars-cov- and sars-cov bind human ace with similar affinities (letko et al., ; walls et al., ). however, others suggest that sars-cov- binds ace with higher affinity than sars-cov (tai et al., ; wang et al., ; wrapp et al., ). as the situation worsens, there is a growing need for the development of suitable therapeutics, vaccines, and other diagnostics against sars-cov- for effective disease management strategies. vaccines and diagnostic assays based on peptides have become increasingly important and indispensable for their advantages over conventional methods (li et al., ; mohanraj et al., ). the present study aimed to locate appropriate epitopes within a particular protein antigen that can elicit an immune response and could be selected for the synthesis of an immunogenic peptide. using a computational approach, the s glycoprotein of sars-cov- was explored to identify various immunodominant epitopes for the development of diagnostics and vaccines.
besides, the results could also help us to understand the sars-cov- surface protein response towards t- and b-cells. the amino acid sequences (n= ) of s protein available at the time of study on targeted sars-cov- were downloaded from the national center for biotechnology information (ncbi) database. to identify an immunodominant region, it is of extreme importance to select the conserved region within the s protein of sars-cov- . all the sequences were compared among themselves for variability using the protein variability server by the shannon method (garcia-boronat et al., ). the average solvent accessibility (asa) profile was predicted for each sequence using the sable server (adamczak et al., ). the bepipred . linear epitope prediction module incorporated in the immune epitope database (iedb) was used to predict potential epitopes within the s protein (haste andersen et al., ; larsen et al., ; ponomarenko and bourne, ; vita et al., ). the fasta sequence of the targeted protein was used as input with all default parameters. we used two web-based tools for b-cell epitope prediction: the iedb and abcpred servers (saha and raghava, ). the s protein structure from the protein data bank (pdb, vsb) was analyzed for linear and discontinuous b-cell epitopes using the ellipro module on the iedb server with default settings (ponomarenko et al., ; wrapp et al., ). also, the abcpred server was used to detect b-cell epitopes using the artificial neural network (ann) method. t-cell epitopes with a binding affinity towards mhc-i and mhc-ii alleles were selected to boost both cytotoxic t-cell and helper t-cell mediated immune responses. the iedb server was used to predict the major histocompatibility complex (mhc)-i and mhc-ii binding epitopes for the targeted protein.
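the shannon variability measure mentioned in the methods can be sketched as follows: for every column of a multiple sequence alignment, the entropy of the observed residue frequencies is computed, so that fully conserved positions score 0 and more variable positions score higher. this is a minimal illustration of the general technique and not the implementation of the protein variability server; the function name `shannon_variability` and the four toy sequences are assumptions, and gaps are simply treated as an additional symbol.

```python
from collections import Counter
from math import log2


def shannon_variability(aligned_sequences):
    """Per-column Shannon entropy (in bits) of an alignment.

    aligned_sequences: equal-length strings, one per aligned sequence.
    Returns one entropy value per alignment column; 0.0 means the column
    is fully conserved.
    """
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned_sequences):
        total = len(column)
        counts = Counter(column)
        # H = sum over residues of p * log2(1 / p), with p = count / total.
        entropy = sum((n / total) * log2(total / n) for n in counts.values())
        scores.append(entropy)
    return scores


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = ["MFVF", "MFVL", "MFIL", "MFIL"]
conserved = [i for i, h in enumerate(shannon_variability(toy_alignment)) if h == 0.0]
print(conserved)  # zero-based indices of fully conserved columns
```

positions with low entropy across all available sequences are the natural candidates for the conserved regions that the study then screens for surface exposure.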
the reference set of alleles was used for predicting the mhc-i and mhc-ii t-cell epitopes (karosiene et al., ; nielsen et al., ; nielsen et al., ; peters and sette, ; sturniolo et al., ). in our study, we targeted the s glycoprotein of sars-cov- as it is present on the outside of the virus and interacts with the host receptor. at the time of the study, there were sequences available for the targeted protein of sars-cov- . the s glycoprotein sequence is , amino acids long, except for that of the virus isolated from kerala (india), which is a , amino acid long s glycoprotein (genbank accession number mt ). our interest here was to determine conserved regions first and then determine surface-exposed regions, which are potential epitopes to generate an immune response. we found that sequences among all the s proteins in the analysis are least variable and highly conserved, as shown in figure . regions with a high asa value are more surface exposed compared to others. we identified a total of peptides of varying lengths, which were selected based on high asa values (table ). the potential epitope regions were predicted using the sequence of the s protein of sars-cov- that showed the least variability (genbank accession number nc_ ). the potential epitopes are represented by blue peaks, while green-colored slopes represent non-epitopic regions (figure ). the existence of b-cell linear and discontinuous (conformational) epitopes within the identified segments could help us to identify the peptides that can elicit an immune response (purcell et al., ). we identified linear epitopes, predicted by ellipro (iedb), which contained regions from of our selected peptides (highlighted in red in table ). these identified b-cell linear epitopes were placed based on their positional value and scores. epitopes with high scores have more potential for antibody binding.
five of our selected peptides (peptide numbers , , , , and in table ) were not considered as potential linear b-cell epitopes. some parts of our identified epitopes were in accordance with epitopes recognized in an earlier study (ahmed et al., ), which further supports the credibility of our identified epitopes. using the same module, b-cell discontinuous epitopes were predicted, which gave epitope regions that contained regions from of our selected peptides (highlighted in red in table s ). six peptides (peptide numbers , , , , , and in table ) were not predicted as discontinuous b-cell epitopes. to further confirm these predictions, we used the abcpred server to detect b-cell epitopes, with a default threshold of . . it identified various epitopes with different lengths and scores. out of those, the regions that contained our selected peptides are highlighted in red in table . a high score represents good binding affinity with epitopes; most of our peptides scored more than . and were predicted as linear b-cell epitopes. we used the iedb server to determine the binding affinity for the human leucocyte antigen (hla). as recommended by the iedb server, reference hla allele sets were used for the prediction of mhc-i and mhc-ii t-cell epitopes, as they provide comprehensive coverage of the population. all the predictions were made using iedb recommended procedures. the list of binding affinities for mhc-i t-cell epitopes is given in table s , where low rank represents high binding affinity. similarly, the list of binding affinities for mhc-ii t-cell epitopes is given in table . regions from our selected peptides are highlighted in red. the epitopes with rank < %, indicating very high binding affinity, were selected. we also observed that some of the peptides we identified as potential b-cell epitopes were present as t-cell epitopes with good binding affinities.
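the selection step described above (keep predicted t-cell epitopes with a low percentile rank, then check which of them contain a stretch of a selected surface peptide) can be sketched as below. this is only an illustration of the filtering logic, not the iedb procedure itself: the function names, the `min_length` overlap criterion, the cutoff value of 1.0 (the actual cutoff is not legible in the text), and the allele assignments and rank values in the toy predictions are all hypothetical.

```python
def select_binders(predictions, max_percentile_rank):
    """Keep predictions below the rank cutoff, strongest (lowest rank) first."""
    kept = [p for p in predictions if p["rank"] < max_percentile_rank]
    return sorted(kept, key=lambda p: p["rank"])


def shares_stretch(epitope, peptide, min_length=5):
    """True if the epitope contains any min_length-residue window of the peptide."""
    windows = (peptide[i:i + min_length]
               for i in range(len(peptide) - min_length + 1))
    return any(window in epitope for window in windows)


# Toy predictions: alleles and rank values are made up for illustration.
predictions = [
    {"allele": "HLA-A*02:01", "peptide": "KIADYNYKL", "rank": 0.3},
    {"allele": "HLA-A*02:01", "peptide": "YLQPRTFLL", "rank": 0.1},
    {"allele": "HLA-B*07:02", "peptide": "SPRRARSVA", "rank": 2.5},
]
surface_peptide = "VRQIAPGQTGKIADYNYKLPDD"  # one peptide from the figure legend

binders = select_binders(predictions, max_percentile_rank=1.0)
overlapping = [p["peptide"] for p in binders
               if shares_stretch(p["peptide"], surface_peptide)]
print(overlapping)
```

a window-based overlap test is used here because predicted epitopes rarely coincide exactly with the surface-exposed segments; other criteria, such as a minimum fraction of shared residues, would work equally well for this purpose.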
overall, it was found that the regions identified in table not only had good b-cell and t-cell affinities, but the majority of them also overlapped with discontinuous epitopes (table s ). the peptide segments identified from the set of sequences of the sars-cov- s glycoprotein appear to hold reasonable potential to act as immunogens. peptide-based diagnostics and vaccines have previously been proposed against virus outbreaks (dey et al., ; ichihashi et al., ; navalkar et al., ; oany et al., ; zhao et al., ). the availability of a d structure ( vsb) of the sars-cov- s glycoprotein provided an opportunity to inspect the predicted peptides. placement of the peptide segments identified by asa and conserved sequence analysis on the s glycoprotein showed that of the regions we identified lie on the surface (figure ). in order to limit recognition and evade the immune response of the host, coronaviruses use conformational masking and glycan shielding (xiong et al., ). the sars-cov- s trimer also exists in multiple distinct conformational states, which is necessary for receptor engagement, leading to the initiation of fusogenic conformational changes (walls et al., ). the considerable number of peptides at the surface region of the s glycoprotein allows for the potential use of those peptide regions as immunogens. binding to the ace receptor is a critical initial step for sars-cov- to enter target cells. recent studies have also pointed out the vital role of ace in mediating the entry of sars-cov- (hoffmann et al., ). the receptor binding motif (rbm) is part of the receptor-binding domain (rbd) of sars-cov- , which contains most of the contacting residues for ace- binding (lan et al., ). it was observed that some of our identified peptides from table (peptide no. - ) fall in the regions of rbd (amino acid no. - ) and rbm (amino acid no. - ), which makes them potential peptide regions to be used.
the emergence of new viral diseases like sars-cov- represents a substantial global disease burden. over the past few months, there have been increased research efforts for the design and development of diagnostics and vaccines for sars-cov- . some related analyses have been reported in distinct, parallel studies (baruah and bose, ; bhattacharya et al., ; grifoni et al., ). our study leverages the available resources and computational methods and adds to the ongoing research focused on the development of diagnostics and vaccines against sars-cov- . in addition to those already reported, we have identified further peptides, which add to the library of peptides that are likely to be recognized by human immune responses. traditional vaccines based on antibody-mediated protection are often poor inducers of t-cell responses and, given the high mutation rates of viruses, can have limited success (rosendahl huber et al., ). peptide-based sensitive and rapid diagnostic kits are considered a better alternative to the conventional serological tests, including those based on whole antigenic protein (mohanraj et al., ). in our study, we predicted both b-cell and t-cell epitopes for conferring immunity in different ways. we speculate that the identified epitopes with considerably good epitope binding efficiency have the potential to be immunodominant peptides. the study could help us to use the predicted peptides as immunogens for the development of diagnostics and vaccines against sars-cov- . in the present study, peptide segments were identified on s proteins for the development of diagnostics and vaccines against sars-cov- . the recent availability of d data on -cov s glycoprotein has helped the search. sars-cov- , being an rna virus, has a high mutation rate and undergoes active recombination (yi, ).
although the peptides identified are ideal candidates as immunogens for the development of peptide-based diagnostics and vaccines, more refinement and lab trials are essential steps that are yet to be undertaken for early development before the identified epitopes are rendered obsolete.
high asa value means the solvent accessibility score is relatively higher for that region and it is more surface exposed with respect to its neighbours.
netmhccons: a consensus method for the major histocompatibility complex class i predictions structure of the sars-cov- spike receptor-binding domain bound to the ace receptor functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses structure, function, and evolution of coronavirus spike proteins peptide vaccine: progress and challenges peptide based viral detection systems for effective diagnosis of common viral infections in india peptide based diagnostics: are random-sequence peptides more useful than tiling proteome sequences?
prediction of mhc class ii binding affinity using smm-align, a novel stabilization matrix alignment method
reliable prediction of t-cell epitopes using neural networks with novel sequence representations
design of an epitope-based peptide vaccine against spike protein of human coronavirus: an in silico approach
proteolytic processing of middle east respiratory syndrome coronavirus spikes expands virus tropism
generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method
antibody-protein interactions: benchmark datasets and prediction tools evaluation
more than one reason to rethink the use of peptides in vaccine design
t cell responses to viral infections - opportunities for peptide vaccination
prediction of continuous b-cell epitopes in an antigen using recurrent neural network
generation of tissue-specific and promiscuous hla ligand databases using dna microarrays and virtual hla class ii matrices
characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine
structural basis for human coronavirus attachment to sialic acid receptors
the immune epitope database (iedb): update
structure, function, and antigenicity of the sars-cov- spike glycoprotein. cell
tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion
unexpected receptor functional mimicry elucidates activation of coronavirus fusion
structural and functional basis of sars-cov- entry by using human ace
who director-general's opening remarks at the media briefing on covid- -
cryo-em structure of the -ncov spike in the prefusion conformation
a new coronavirus associated with human respiratory disease in china
characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china
glycan shield and fusion activation of a deltacoronavirus spike glycoprotein fine-tuned for enteric infections
novel coronavirus is undergoing active recombination
screening of specific diagnostic peptides of swine hepatitis e virus
a novel coronavirus from patients with pneumonia in china

figure . our selected peptides are highlighted on spike protein of sars-cov- protein structure downloaded from pdb (id: vsb).

ltpgdsssgwtag kynengtitd qtsnfrvqptes vrqiapgqtgkiadynyklpdd nsnnldskvggnyn lkpferdisteiyqagstpcngveg qsygfqptngvgyq patvcgpkkstnl rdiadttdavrdpqt vitpgtntsnq hadqltptwrvystgsnvfqtrag ehvnnsye syqtqtnsprrarsvasqs gaensvaysnnsia tgiaveqdkntqe iyktppikdfgg ilpdpskpskrs nntvydplqpeldsfke dkyfknhtspdvdlgdisg kfdeddsepvlkg

key: cord- - dz bu authors: zhai, bintao; niu, qingli; liu, zhijie; yang, jifei; pan, yuping; li, youquan; zhao, hongxi; luo, jianxun; yin, hong title: first detection and molecular identification of borrelia species in bactrian camel (camelus bactrianus) from northwest china date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: dz bu

comprehensive epidemiological surveys for lyme disease have not been conducted for the bactrian camel in china. in this study, a total of blood specimens collected from bactrian camels from zhangye city in gansu province and yili and aksu in xinjiang province, china, were examined for the presence of borrelia spp. 
species-specificity nested pcr based on the s- s rrna, ospa, flab and s rrna genes revealed that the total positive rate of borrelia spp. was . % ( / , % ci = . – . ). these results were confirmed by sequence analysis of the positive pcr products or positive colonies. this is the first report of borrelia pathogens in camels in china. two borrelia species that cause lyme disease and one that causes relapsing fever were identified in the camel blood samples by sequencing. the findings of this study indicate that the bactrian camel may serve as a potential natural host of lyme disease and/or relapsing fever in china. borrelia species are distributed throughout the world and are maintained in nature within various arthropod vectors and mammalian, avian or reptilian hosts (brisson et al., ; vollmer et al., ) . in humans, borrelia spp. are the causative agents of a major disease: lyme borreliosis (lb) (mainly caused by b. garinii, b. afzelii, and b. burgdorferi sensu stricto). lb is also an important disease of domestic animals and wildlife worldwide. lb-group spirochetes, commonly known as b. burgdorferi s.l., cause one of the most significant natural zoonosis diseases that is carried and transmitted by ixodes spp. ticks (wodecka et al., ) . there are at least genospecies of b. burgdorferi s.l., which are classified based on their genetic differences. more than six of these genospecies have been reported in china (b. burgdorferi s.s., b. garinii, b. afzelii, b. valaisiana, b. sinica and b. yangtze) (yu et al., ) . b. garinii is the main genospecies and is distributed mainly in northern china, while b. burgdorferi s.s. is widely distributed in south china (chen et al., ; wan et al., ) . b. burgdorferi s.l. has been detected in more than mammalian species and seven genera of birds (li, ) . 
studies have shown that in addition to humans, at least six other taxa of mammals (sheep, cattle, horses, dogs, cats and mice) and two types of birds (seabirds and migratory birds) can be infected by borrelia spirochetes in china (keesing et al., ) . previous research used antibodies to detect the borrelia spp. antigen within camel sera in egypt and reported a positive rate of . % (helmy, ) . the genus camelus contains three species: camelus dromedarius (one-hump dromedary), camelus bactrianus (two-hump bactrian camel), and camelus bactrianus ferus (wild two-hump bactrian camel). dromedaries are mainly found in the arabian peninsula, the middle east, and parts of africa, whereas bactrian camels are mainly located in central and northeast asia, northern china, and mongolia . camelus bactrianus ferus is a new species of camel, and it is mainly distributed in china and mongolia (guo, ) . currently, there are an estimated , camels, which are mainly distributed in xinjiang and the inner mongolian autonomous region in china (feng, ). potential spreaders of lyme disease in china. the exposure of camels to the b. burgdorferi s.l. complex was investigated using pcr assay and sequencing. the sequences of the s- s rrna, ospa, flab and s rrna genes obtained from positive dna samples were analysed, and the ticks that potentially act as vectors are discussed.

the bactrian camel is a free-grazing animal that inhabits desert regions. a total of blood specimens from bactrian camels were collected during may and november in two lb-endemic localities at three sites in northwest china (gansu (zhangye) and xinjiang (ili and aksu)) ( fig. ) . each blood sample was collected from the jugular vein of the camel into a sterile tube containing an anticoagulant (ethylenediaminetetraacetic acid, edta). 
genomic dna was extracted from individual specimens using a commercial qiaamp dna blood kit (qiagen, maryland, usa) and a qiaamp dna mini kit (qiagen, hilden, germany) according to the manufacturer's instructions. the extracted genomic dna samples were then stored at − °c until use. a nested pcr for the detection of b. burgdorferi s.l. was carried out using four independent sets of species-specific primers to amplify the s- s rrna, ospa, flab and s rrna genes, as described in previous studies (zhang et al., ; postic et al., ; guy and stanek, ; wodecka, ; ni et al., ; zhai et al., ) . the primer sequences are shown in table . first-round pcr reactions were performed in a thermocycler (biorad, hercules, ca, usa) with a total volume of μl containing . μl of × pcr buffer (mg + plus), . μl of each dntp at . mm, . u of taq dna polymerase (takara, dalian, china), . μl of template dna, . μl of each primer ( pmol), and . μl of distilled water. the pcr conditions were as follows: min of denaturation at °c; cycles of °c for s, annealing for s (annealing temperatures of primers are listed in table ), and °c for - s (depending on amplicon size); with a final extension step at °c for min. nested pcr reactions included μl of the first-round pcr product as template for another cycles with the same parameters and annealing temperature profile as described above and in table . to avoid cross-contamination and sample carryover, pre-and post-pcr processing and pcr amplification was performed in separate rooms. b. garinii sz genomic dna (from lanzhou veterinary research institute) was used as a positive control, while distilled water was used as a negative control. pcr products were separated by . % agarose gel electrophoresis, and some positive amplicons from the second round of pcr amplification were directly used for sequencing (genewiz, inc. beijing china) or cloned into pgem-t easy vector (promega, madison, wi, usa). 
positive colonies were then selected for pcr amplification and sequencing. all the sequences obtained in this study were subjected to blast search via the ncbi website (https://blast.ncbi.nlm.nih.gov/blast.cgi) using the blastn program. multiple sequence alignment was performed with multalin, developed by florence corpet (http://multalin.toulouse.inra.fr/multalin/). phylogenetic analysis was performed using mega . software (tamura et al., ) . phylogenetic trees of borrelia spp. strains were constructed using all the sequences generated in this study and related sequences previously deposited in genbank to show the relationships between different strains. a % confidence interval ( % ci) for the overall prevalence value of each borrelia spp. strain was calculated using ibm spss statistics version . .

fig. . phylogenetic tree of the s rrna gene sequences of borrelia species obtained in the present study and those deposited in genbank from different countries; accession numbers are shown after isolate names. the s rrna gene sequences obtained in this study are indicated by bold triangles. the tree was inferred using the neighbour-joining method of mega . ; bootstrap values are shown at each branch point. numbers above the branch reflect bootstrap support from replications. all sites of the alignment containing insertions-deletions or missing data were eliminated from the analysis (option "complete deletion").

all sequences of borrelia spp. s rrna (including sequences from ili city and sequences from aksu city) were deposited in genbank under the following accession numbers: ky -ky (among them, aku = aku - , aku = aku - , aku = aku - , and aku = aku - ). the sequences of the borrelia spp. s- s rrna gene, including sequences from zhangye city and sequences from aksu city, were deposited in genbank under the following accession numbers: ku -ku . two sequences of the borrelia spp. flab gene from aksu city were deposited in genbank with the following accession numbers: ku -ku . 
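the confidence interval for a prevalence value mentioned in the statistical analysis can be sketched as follows. the source states only that ibm spss statistics was used and does not name the interval method, so the wilson score interval below is an assumption, as are the function name `wilson_interval` and the counts in the example, which are hypothetical and not taken from the study.

```python
import math


def wilson_interval(positives: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval (lower, upper) for a proportion.

    Assumption: the study does not say which interval SPSS produced;
    the Wilson interval is used here only as a common choice for prevalence data.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= positives <= total:
        raise ValueError("positives must lie between 0 and total")
    p = positives / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts for illustration only (not the study's data).
low, high = wilson_interval(positives=12, total=100)
print(f"positive rate 12.0% (95% CI = {100 * low:.1f}-{100 * high:.1f})")
```

unlike the simple normal approximation, the wilson interval stays within 0-100% even when few samples are positive, which is why it was chosen for this sketch.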
two sequences of the borrelia spp. ospa gene from zhangye city were deposited in genbank with the following accession numbers: ku and ky .

fig. . phylogenetic tree of the s- s rrna gene sequences of borrelia species obtained in the present study and those deposited in genbank from different countries; accession numbers are shown after isolate names. the s- s rrna gene sequences obtained in this study are indicated by bold triangles. the tree was inferred using the neighbour-joining method of mega . ; bootstrap values are shown at each branch point. numbers above the branch reflect bootstrap support from replications. all sites of the alignment containing insertions-deletions or missing data were eliminated from the analysis (option "complete deletion").

the blood samples collected from a total of bactrian camels in the field in two chinese provinces were screened for the presence of borrelia spp. by nested pcrs based on four gene loci. the samples were amplified, and the pcr products had lengths of - bp for the four genes. the results of the nested pcr amplification for positive sample screenings are summarized in table . of these specimens, tested positive for s- s rrna, tested positive for ospa, tested positive for flab and tested positive for s rrna, with positive rates of . % ( % ci = . - . ), . % ( % ci = . - . ), . % ( % ci = . - . ) and . % ( % ci = . - . ), respectively. (the sample zy was positive for both s- s rrna and ospa, the samples aku - and aku - were positive for both s- s rrna and s rrna, and the samples aku - and aku - were positive for both flab and s rrna). at least one b. burgdorferi s.l. gene was detected in of the blood samples examined ( . %, % ci = . - . ). sequence analysis of the positive pcr products of the genes assayed in this study revealed that the sequences were most similar to those of b. 
garinii based on the s rrna, s- s rrna and flab gene sequences ( %, - %, and - % identity, respectively) (genbank accession numbers: cp , dq and cp ) and were also similar to b. burgdorferi ( %, %, and - % identity, respectively) (genbank accession numbers: af , kp and cp ). interestingly, a novel borrelia genospecies (genbank accession number: ky ) was identified from the ili region, which had a high identity with b. theileri kat ( . %, genbank accession number: kf ), which was detected from rhipicephalus geigyi, and with borrelia sp., with . % identity (genbank accession number: ab ), which was detected from haemaphysalis ticks collected from wild sika deer (cervus nippon yesoensis) from hokkaido, japan, based on s rrna. phylogenetic trees were constructed based on the identified borrelia spp. s rrna (n = ), s- s rrna (n = ), flab (n = ) and ospa (n = ) gene sequences by the neighbour-joining method using the software mega . ( fig. - ) . the phylogenetic tree based on s rrna sequences indicated that the s rrna gene sequences ( from aksu, from ili) from our study formed three distinct clades. the five aku (aksu) strains cluster within a sub-clade of one of three main clades, forming a sister clade with the b. garinii s rrna sequences from china. interestingly, the strains from the ili area belonged to two different clades, with two sequences of b. garinii and b. burgdorferi belonging to the ld borrelia spp. group and one sub-clade of b. theileri belonging to rf borrelia spp. group. in general, the results show the presence of high heterogeneity among the s rrna sequences of the different borrelia species strains (fig. ) . the s- s rrna sequences formed two distinct clades: strains ( from aksu and from zhangye) formed two distinct sub-clusters in the borrelia s- s rrna phylogenetic tree (fig. ) . the s- s rrna sequences from the aksu strains (aku - = aku of s rrna, aku - = aku of s rrna) and one zhangye borrelia spp. 
strain belong to one clade of the same branch, which are sister to b. garinii s rrna. one zhangye strain of the s- s rrna sequence was located within the same branch (fig. ) . a phylogenetic tree was constructed based on all the borrelia flab sequences deposited in genbank and two sequences (aku - = aku of s rrna and aku - = aku of s rrna) obtained in this study. the flab sequences from borrelia spp. formed two main clades. the sequences of the two aksu strains clustered together with the b. garinii flab sequences (fig. ) . two distinct clades were formed from the sequences of the borrelia spp. ospa phylogenetic tree. the sequences of two strains from zhangye clustered with the b. burgdorferi ospa sequences (fig. ) .

fig. . phylogenetic tree of the flab gene sequences of borrelia species obtained in the present study and those deposited in genbank from different countries; accession numbers are shown after isolate names. the flab gene sequences obtained in this study are indicated by bold triangles. the tree was inferred using the neighbour-joining method of mega . ; bootstrap values are shown at each branch point. numbers above the branch reflect bootstrap support from replications. all sites of the alignment containing insertions-deletions or missing data were eliminated from the analysis (option "complete deletion").

in china, lyme disease is caused by various borrelia spirochetes. many of these agents are highly pathogenic to both humans and animals (liu et al., ) . previous studies reported the prevalence of borrelia in field-collected blood samples from cattle, sheep, dogs, rabbits and rats from different areas in china. these studies primarily used serological detection methods and showed that the distribution of borrelia varied considerably in the different areas (hou et al., ; wan et al., ) . 
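the "complete deletion" option named in the tree captions, and the percent identities reported for the sequence comparisons, can be illustrated with a small sketch. this is not mega's implementation; the function names `complete_deletion` and `percent_identity`, the set of characters treated as missing data, and the toy alignment are all assumptions made for illustration.

```python
def complete_deletion(aligned: dict[str, str], missing: str = "-?Nn") -> dict[str, str]:
    """Drop every alignment column in which any sequence has a gap or missing symbol."""
    seqs = list(aligned.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the columns that are fully resolved in every sequence.
    keep = [i for i in range(length) if all(s[i] not in missing for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equally long, gap-free sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy alignment with hypothetical isolate names (not sequences from the study).
alignment = {
    "isolate_1": "ACG-T",
    "isolate_2": "ACGTT",
    "isolate_3": "A?GTA",
}
cleaned = complete_deletion(alignment)
print(cleaned)
print(percent_identity(cleaned["isolate_1"], cleaned["isolate_3"]))
```

because a single gap removes the whole column for every sequence, complete deletion can discard a large share of a short alignment, which is one plausible contributor to low bootstrap support when amplicons are short.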
the areas of zhangye city, gansu province, and of ili city and aksu city of xinjiang province all include desert regions and are located along the old silk road, halfway between eastern asia and europe, in areas where international livestock trade and travel were frequent (takada et al., ) . bactrian camels were an important means of transportation for trade and travel in the desert within these regions. the bactrian camel can harbour and spread many pathogens, such as middle east respiratory syndrome coronavirus (mers-cov), anaplasma, toxoplasma gondii, onchocerca, trypanosoma evansi, and parabronema skrjabini (luo, ; wang et al., ; yang et al., ; yang et al., ) . ticks are one of the most significant vectors of borrelia burgdorferi s.l. and relapsing fever (rf) spirochetes. domestic animals, rodents, and many other wild animals host ticks, and animals bitten by infected ticks can acquire the pathogen and serve as natural reservoirs. the detection of b. burgdorferi s.l. using pcr is an alternative method that can be used to improve the control and prevention of lyme disease. according to the nested pcr results, field-collected blood samples assayed with primers targeting the s rrna, s- s rrna, flab, and ospa genes revealed ( . %), ( . %), ( . %) and ( . %) positive samples, respectively, from three regions of two provinces in china where these camels live. to our knowledge, this is the first report of borrelia spp. infection in camels in china, suggesting a possible reservoir role for camels in the maintenance of this organism in the environment. based on the s rrna gene, the aksu region had the highest infection rate ( . %), followed by the ili region ( . %).

fig. . phylogenetic tree of the ospa gene sequences of borrelia species obtained in the present study and those deposited in genbank from different countries; accession numbers are shown after isolate names. the ospa gene sequences obtained in this study are indicated by bold triangles. the tree was inferred using the neighbour-joining method of mega . ; bootstrap values are shown at each branch point. numbers above the branch reflect bootstrap support from replications. all sites of the alignment containing insertions-deletions or missing data were eliminated from the analysis (option "complete deletion").

the genetic identity of b. burgdorferi spirochetes can be clarified by their differential reactivity with genospecies-specific pcr primers targeting the s- s rrna intergenic spacer. genetic heterogeneity should be further classified by analysing longer sequence data among b. burgdorferi strains that have been previously identified as the same genospecies of atypical strains of borrelia spirochetes (mathiesen jr et al., ; postic et al., ) . moreover, the s rrna gene was detected at a higher positive rate in blood examined for borrelia spirochetes than other genes in previous research (wodecka et al., ) . our study showed that the positive rate for these genes decreased in the order s rrna > s- s rrna > flab > ospa. two borrelia species, b. garinii and b. burgdorferi s.s., were identified, and b. garinii was found to be widely distributed in camels in china. in the present study, b. garinii and b. burgdorferi s.s. were identified in camels from aksu and ili in xinjiang province. sequence and phylogenetic analysis revealed that those isolates were closely related to the corresponding genotypes based on the s rrna gene with high sequence similarities ( . %- %, genbank accession numbers: cp ; %, genbank accession numbers: ay ), although the bootstrap values of the phylogenetic tree were relatively low. this finding suggested the genetic diversity of b. garinii and b. burgdorferi s.s. in different hosts and geographic locations. interestingly, the sequencing of cloned pcr products from the s rrna gene of borrelia spp. 
from the ili region showed the presence of a new borrelia species belonging to the relapsing fever group. the s rrna gene sequence of borrelia sp. obtained from camel has a . %, . % and . % similarity to the gene of b. theileri kat strain, borrelia sp. d and borrelia sp. _ _hjf, respectively (genbank accession numbers: kf , ab and ab ). b. theileri belongs to the rf borrelia spp. group and is the causative agent of bovine borreliosis. it was initially identified in cattle and subsequently in goats, sheep and deer from africa, south america, mexico and australia (l, ; mathiesen jr et al., ) . most of the rf borrelia spp. are transmitted by soft-bodied ticks, but b. theileri is found in hard-bodied ticks and is transmitted by rhipicephalus spp., including r. annulatus, r. decoloratus, r. microplus and r. evertsi (barbour et al., ; smith et al., ; trees, ) . this study provides the first report of b. theileri in camel blood samples in china. at present, b. burgdorferi has been isolated from nine ixodes ticks: i. acutitarsus, i. persulcatus, i. granulatus, h. longicornis, h. bispinosa, h. concinna, h. formosensis, boophilus microplus and d. silvarum (niu et al., ) . a previous study reported that the borrelia isolates were isolated from d. marginatus collected from camels in xinjiang, china, and these isolates were genetically identified as b. burgdorferi sensu stricto . the blood samples from bactrian camels in this study were donated by dr. li, who reported that there are ticks available to be collected from these bactrian camels that have been identified as belonging to h. asiaticum, h. dromedarii, r. sanguineus group, and d. niveus . thus, these tick species might act as potential vectors to carry and transfer borrelia spp. that cause camel borreliosis in china. further study is required to determine whether these ticks are competent vectors for borrelia spp. 
in conclusion, we successfully identified infection with borrelia spirochetes from camel blood samples from different geographic locations of gansu province and xinjiang province in china. b. garinii and b. burgdorferi s.s. were highly prevalent in the sampling areas of the two provinces surveyed. further studies concerning the prevalence of borrelia spp. groups for both lyme disease and relapsing fever should be performed to confirm the presence of these different borrelia species in camels within china. our findings suggest that borrelia infection in camels could potentially present a concern for public health. more detailed and widespread monitoring of tick populations and the screening for borrelia in a greater variety of hosts are warranted in future studies.

horizontally acquired genes for purine salvage in borrelia spp. causing relapsing fever
genetics of borrelia burgdorferi
tick-borne pathogens and associated coinfections in ticks collected from domestic animals in central china. parasites vectors
the prospects on camel milk industry development in china
origin and evolution on camel
detection of borrelia burgdorferi in patients with lyme disease by the polymerase chain reaction
seasonal abundance of ornithodoros (o.) savignyi and prevalence of infection with borrelia spirochetes in egypt
rats, the primary reservoir hosts of borrelia burgdorferi, in six representative provinces
hosts as ecological traps for the vector of lyme disease
sur la spirillose des bovidés
research progress on animal of lyme disease
anaplasma infection of bactrian camels (camelus bactrianus) and ticks in xinjiang
studies on relation between lyme disease infection and human, livestock, rodents
absence of middle east respiratory syndrome coronavirus in bactrian camels in the west inner mongolia autonomous region of china: surveillance study results from
genetic heterogeneity of borrelia burgdorferi in the united states
lyme borreliosis caused by diverse genospecies of borrelia burgdorferi sensu lato in northeastern china
progress on lyme disease in china
diversity of borrelia burgdorferi sensu lato evidenced by restriction fragment length polymorphism of rrf ( s)-rrl ( s) intergenic spacer amplicons
expanded diversity among californian borrelia isolates and description of borrelia bissettii sp. nov. (formerly borrelia group dn )
borrelia theileri: isolation from ticks (boophilus microplus) and tick-borne transmission between splenectomized calves
lyme disease borrelia spp. in ticks and rodents from northwestern china
mega : molecular evolutionary genetics analysis version .
the transmission of borrelia theileri by boophilus annulatus (say, )
host migration impacts on the phylogeography of lyme borreliosis spirochaete species in europe
preliminary investigation on lyme disease in animals in provinces, cities and autonomous regions of china
toxoplasma gondii infection in bactrian camel (camelus bactrianus) in china
a broad-range survey of ticks from livestock in northern xinjiang: changes in tick distribution and the isolation of borrelia burgdorferi sensu stricto
significance of red deer (cervus elaphus) in the ecology of borrelia burgdorferi sensu lato
a comparative analysis of molecular markers for the detection and identification of borrelia spirochaetes in ixodes ricinus
investigation of parabronema skrjabini disease of camels in inner mongolia region
lamp for detection of trypanosoma evansi in the camels
research progress on lyme disease
identification and molecular survey of borrelia burgdorferi sensu lato in sika deer (cervus nippon
isolation of borrelia burgdorferi in ixodes from four counties, in north xinjiang

key: cord- - oqzsd authors: domanska-blicharz, katarzyna; lisowska, anna; sajewicz-krukowska, joanna title: molecular epidemiology of infectious bronchitis virus in poland from to date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: oqzsd

the presence of infectious bronchitis virus (ibv) was identified for the first time in the poultry population in poland at the end of the s. from this time a few waves of epidemics caused by different ibv variants spread across the country. in order to gain more insight into the molecular epidemiology of ibv in poland, in the present study the s coding region of ibv isolates and nearly whole genome of strains collected over a period of years was characterized. phylogenetic analysis showed that these strains belonged to five recently established ibv lineages: gi- , gi- , gi- , gi- and gi- . 
additionally, two strains from and formed a separate branch of the phylogenetic tree categorized as unique early polish variants, and one strain was revealed to be the recombinant of these and gi- lineage viruses. irrespective of year of isolation and s -dependent genotype, the genome sequences of polish ibv strains showed the presence of six genes and orfs: ′utr- a- b-s- a- b-e-m- b- c- a- b-n- b- ′utr, however their individual genes and putative proteins had different lengths. the phylogenetic analyses performed on the genome of ten polish ibv strains revealed that they cluster into different groups. the polish gi- , gi- and gi- strains cluster with other similar viruses of these lineages, with the exception of the two strains from and which are different. it seems that in poland in the s and s ibv strains with a unique genome backbone circulated in the field, which were then replaced by other strains belonging to other ibv lineages with a genome backbone specific to these lineages. the recombination analysis showed that some polish strains resulted from a recombination event involving different ibv lineages, most frequently gi- and gi- . infectious bronchitis virus (ibv) is the etiological agent of a highly contagious disease of chickens known as infectious bronchitis, but the virus can replicate in epithelial cells of different organs, also affecting the urogenital or digestive tracts beside the respiratory tract (cavanagh, (cavanagh, , . together with genetically similar viruses isolated from other domesticated galliformes, ibv belongs to the igacovirus subgenus within the gammacoronavirus genus (nidovirales order, cornidovirinae suborder, coronaviridae family, orthocoronavirinae subfamily). the nonavian sw gammacoronavirus isolated from beluga whales was recently assigned to the separate cegacovirus subgenus (dong et al., ; king et al., ) . 
the virus genome is an approximately kb long single-stranded, positive-sense rna consisting of several open reading frames (orfs). two thirds of the genome in the ′ end are occupied by two overlapping orfs encoding viral rna-dependent rna polymerase. the a and b orfs encode non-structural polypeptides (nsp - ) which are associated with rna replication and transcription. in the ′ end are genes that among other products encode the four major structural proteins: spike (s), envelope (e), matrix (m), and nucleocapsid (n). the s glycoprotein is post-translationally cleaved into s and s subunits of about and amino acids during viral maturation. the s subunit anchors the spike into the virus membrane whereas s forms the extracellular part of the spike and plays a major role in tissue tropism and induction of protective immunity (cavanagh and gelb, ) . ibv undergoes many genetic changes generated both by recombinations and mutations such as substitutions, deletions and insertions, which could lead to the emergence of new variants. among factors that create favorable conditions for such events are characteristic features of coronaviruses in the genome structure (large singlestranded rna) and virus biology (minimal proofreading activity of viral polymerase) and modern poultry-rearing habits and immunological pressure caused by the worldwide use of vaccines (ovchinnikova et al., ; woo et al., ) . mutations within the s gene particularly result in new geno-or serotypes, and currently there are many such types around the world (de wit et al., ) . their number, diversity and naming and the plurality of methods used for their determination for years have caused much confusion. to avoid it, new classification rules based on the whole s gene phylogeny (about nt) and new nomenclature have been proposed. this system distinguished and named lineages, aggregating into genotypes (gi to gvi) (valastro et al., ) . 
however, in the last three years, two more lineages (gi- and ) and even one more genotype (gvii) have been described in china jiang et al., ; ma et al., ) . in poland, the first suspicion of ib was based on clinical observations, as respiratory symptoms incurable with antibiotics in some flocks and/or misshapen eggs from commercial flocks came to notice. laboratory confirmation of ibv infection was obtained at the end of the s. between and , sera from two hundred ten chicken flocks at the age of - months were examined in an agar gel precipitation test and only % of them were positive although % of flocks contained birds with positive serum (karczewski and cakala, ) . outbreaks of ib with respiratory signs and a drop in egg production and egg quality in non-vaccinated breeding and particularly laying chicken flocks were recorded in the mid- s (bugajak et al., ) . since the mid- s, outbreaks of ib-nephritis have been reported in broiler flocks (minta et al., ) . a multiplex-pcr testing strains isolated between and revealed that the most of them belonged to the b type (capua et al., ) . the emergence of qx ibv was detected in (domanska-blicharz et al., ; domanska-blicharz et al., ) . more recently, the next variant of ibv called var which had been circulating only in the middle-east region for the previous years was also detected in poland . in this study we attempted to molecularly characterize the field ibv strains detected in poland during the period between and . strain determination was accomplished by phylogenetic analysis of the full s coding region sequences against reference strains representing all genotypes and lineages recently described (valastro et al., ) . additionally, we also analyzed the complete genome sequences of ten polish ib viruses. recombination analysis was also performed using the obtained sequences of these strains. thirty four field ibv strains isolated between and in poland were included in the study. 
these strains originated from poultry experiencing clinical forms of the disease such as respiratory or enteric symptoms, nephritis, or problems with egg production. epidemiological information on the studied isolates is summarised in table . the samples were named to fulfill the previously described criteria, but to make it easier to follow the results of the analysis, in subsequent parts of the text they were shortened to the individual symbol given in the laboratory and the year of identification (ducatez, ). the earliest virus materials from the s were available in the form of a lyophilizate of allantoic fluids from commercial chicken eggs. after propagation in specific pathogen-free (spf) embryos, materials from the s were in the form of allantoic fluids stored deep frozen. field materials delivered to the department of poultry diseases for diagnostic purposes between and were isolated in spf chicken eggs as described previously (gelb and jackwood, ). confirmation of virus genome presence and genotype determination preceded the spf egg isolation. materials from to as referred to above were also refreshed using the virus isolation method on spf embryonating eggs. harvested allantoic fluids were processed using an rneasy mini kit (qiagen, hilden, germany) according to the manufacturer's recommended procedure for rna extraction, and isolated rna was stored at − °c until analysis. the rt-pcrs were conducted in a one-step format using the one step rt-pcr kit (qiagen, hilden, germany) according to the manufacturer's instructions. various combinations of primer pairs described recently as well as additional primers specifically constructed for some strains (appendix a supplementary material) were applied for amplification and sequencing of the whole s coding region (binns et al., ; boursnell et al., ; dolz et al., ; dolz et al., ; lisowska et al., ; worthington et al., ).
the reactions were run according to the recommended protocol for the kit with different annealing temperatures depending on the melting temperature of the primer pair used. amplified pcr products were visualized by electrophoresis on a % agarose gel stained with ethidium bromide and then purified using a qiaquick gel extraction kit (qiagen, hilden, germany). typically, for the s coding region of polish ibv strains, - pcr products were sequenced in both directions using sanger sequencing technology by genomed (warsaw, poland). the complete genomes of ten of these ibv strains were generated using illumina miseq technology (illumina, san diego, usa) in several laboratories. the five ibv strains / , / , / , g / and g / were processed in the department of microbiology of the swedish national veterinary institute (sva, uppsala, sweden), four subsequent virus strains / , / , / and g / were analyzed in the department of omics analysis in our institute, and one ibv strain g / was sequenced by genomed (warsaw, poland). analyses in these organizations followed the same standard procedure. briefly, rna extracted directly from the allantoic fluid was reverse-transcribed into cdna using a superscript iv first-strand cdna synthesis kit (invitrogen, waltham, usa) and the second strand was synthesized with the addition of klenow polymerase (new england biolabs, ipswich, usa). a bp-long paired-end dna library was prepared using a nextera xt sample preparation kit (illumina, san diego, usa) and sequencing was performed using a miseq reagent kit v (illumina, san diego, usa). sequences of s coding region fragments obtained by sanger sequencing were trimmed based on quality and assembled into consensus sequences using geneious v . . (biomatters, auckland, new zealand). sequences of polish viruses were searched with blast (basic local alignment search tool) to find those with the highest similarity, which were then included in the phylogenetic analyses.
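as an illustration of the similarity comparison described above, the following python sketch computes the percent nucleotide identity between two already aligned sequences and picks the most similar reference. it is only a minimal sketch and not the blast algorithm used in the study: the function names `percent_identity` and `closest_reference`, the toy sequences, and the convention of skipping gapped columns are all assumptions made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are skipped.
    Other conventions (e.g. counting gaps as mismatches) give other values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return 100.0 * matches / compared


def closest_reference(query: str, references: dict) -> tuple:
    """Return (name, identity) of the reference most similar to the query."""
    scored = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the study.
    toy_references = {"ref_a": "ACGTACGT", "ref_b": "ACGTTTTT"}
    print(closest_reference("ACGTACGA", toy_references))
```

skipping gapped columns keeps the measure comparable between sequences of slightly different length, but it is a design choice; the identity values reported in a given study depend on the alignment and on the convention of the software used.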
then the full s sequences were aligned using clustal w with sequences representing lineages in their ibv genotype groupings and unique variants, as recommended by valastro et al. (valastro et al., ). additionally, eight sequences representing the two newly identified gi- and gi- lineages and one gvii- genotype were included in the analysis. sequencing data from miseq technology obtained from the sva (uppsala, sweden) and genomed were processed with the clc genomics workbench (qiagen, hilden, germany). the reads obtained in the department of omics analysis of our institute were assembled into contigs with the spades assembler using the website at http://spades.bioinf.spbau.ru (bankevich et al., ). for phylogeny of the complete genomes of the polish ibv strains, a preliminary analysis was carried out using all gammacoronaviruses available in the niaid virus pathogen database and analysis resource (vipr) through the website at http://www.viprbrc.org/ (pickett et al., ). next, strains were selected for further analysis, taking into account their clustering in the vipr analysis. alignments of nucleotide sequences were performed using the multiple alignment using fast fourier transform (mafft) method in geneious software, v . . (biomatters, auckland, new zealand). the alignments were then exported to the mega program, v . . (tamura et al., ). maximum likelihood (ml) phylogenetic analyses of the s coding region and of the complete genome were then conducted using the best-fitting nucleotide substitution model. in each analysis the lowest bayesian information criterion (bic) score was obtained for the general time reversible (gtr) model with a discrete gamma distribution (+g) with five rate categories and the assumption that a certain fraction of sites is evolutionarily invariable (+i), i.e. the gtr + g + i model. bootstrap analyses of the resultant trees were performed using replicates. to detect any recombination events in the analyzed sequences, rdp software v . was used (martin et al., ).
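the model choice by lowest bic score can be sketched in a few lines of python. the standard definition bic = k·ln(n) − 2·ln(l) is used, where k is the number of free parameters, n the number of alignment sites and ln(l) the maximized log-likelihood. the function names, the candidate model list, and all numerical values below are hypothetical and serve only to show the selection step; they are not results from this study, and the mega program may count parameters differently.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates: dict, n_sites: int) -> tuple:
    """Pick the model with the lowest BIC.

    candidates maps a model name to (log_likelihood, n_params).
    Returns (best_name, {name: bic_score}).
    """
    scores = {
        name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical log-likelihoods and parameter counts for illustration only.
    hypothetical_fits = {
        "JC": (-10500.0, 1),
        "HKY+G": (-10120.0, 6),
        "GTR+G+I": (-10050.0, 11),
    }
    name, scores = best_model(hypothetical_fits, n_sites=3400)
    for model, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{model}: BIC = {score:.1f}")
    print("selected:", name)
```

the k·ln(n) term penalizes parameter-rich models, so a complex model such as gtr + g + i is selected only when its gain in likelihood outweighs the added parameters.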
the full s coding region sequences of ibv strains were screened to check if unusual clusters formed by polish ibv strains are viruses representing real new ibv lineage or recombinants. ten full genomes of polish ibv strains were also analyzed for recombination events using the complete genomes of representative viruses selected for analysis as described above. the rdp analysis was accomplished using different available methods with their default parameters, however recombination events were only considered proven if detected by at least seven programs (rdp, geneconv, bootscan, maxchi, chimaera, siscan and seq) and the p-value was calculated at below . × e− . full s sequences of the analyzed polish ibv isolates as well as complete genomes of ten of them were submitted to the genbank database and accession numbers were assigned as given in table . the nearly full genome sequences of ten ibv strains were obtained with the ′ and ′utr fragments incomplete. in all genomes the analysis predicted genes consisting of open reading frames (orfs) with a typical order for ibv of ′utr- a- b-s- a- b-e-m- b- c- a- b-n- b- ′utr, but with their individual genes and putative proteins having different lengths ( table ). the orfs/proteins with a constant conservative amount of nt and amino acids (aa) were a ( nt/ aa), b ( nt/ aa), and n ( nt/ aa). the next orfs/proteins of conservative length were b, c and a counting nt/ aa, nt/ aa and nt/ aa respectively in strains, whereas in two ibvs each of these structures was different (longer or shorter by one nt codon/aa). the difference in orf/protein length of accessory b protein was also slight as it fell within - nt/ - aa. the most diverse in terms of length was the orf coding e protein, ranging from to nt ( - aa) in polish gi- and gi- ibv strains to - nt ( - aa) in polish gi- ibvs. 
a similar relationship was also observed in the case of the orf encoding m protein, which was the shortest ( - nt/ - aa) in ibv strains of gi- and g- lineages and the longest ( nt/ aa) in ibvs of gi- lineage. the orf of the s protein varied in length from nt/ aa to nt/ aa, and its length showed no dependence on the identified ibv lineage. the nucleotide identity of the complete genome and the individual orfs between polish and selected ibv strains was - % (appendix b -b supplementary material).
k. domanska-blicharz, et al. infection, genetics and evolution ( )
genotyping based on phylogenetic analysis of full s coding region sequences of polish ibv strains from the years - grouped them into six groups: five distinct, previously known lineages and an additional new one (fig. ). five isolates, four early ones from the s and one identified in , were assigned to the gi- lineage. one strain identified in was affiliated with the gi- lineage. the group of gi- lineage contained eight ibv strains: two from the late s isolated between and , two identified between and , and four strains isolated after . the gi- lineage comprised ten strains detected between and and the group of gi- lineage held eight isolates detected between and . two isolates, / and / , were in the separate cluster designated early polish on the phylogenetic tree. sequence analysis revealed that five polish gi- strains shared nucleotide identities of . - % and formed two clades. in one clade were the dutch h , north american, south african, indian and four polish isolates, but the strain / formed a distinct branch in this gi- subtree with low nucleotide identity of . - . % to the rest of this group. the analysis of the s coding region sequences of eight polish isolates of gi- lineage showed that four strains formed a common branch, sharing . - .
% nt identity with the / strain from the united kingdom, whereas two earlier strains from and which were similar to each other with . % identity form a sister group with the more recent viruses and had . - . % nt similarity with the / strain. the earliest polish gi- strains from and were visibly different and had nt identity of . - . % to the rest of the g- ibv strains. one of them, strain / , occupied positions close to the israeli variant strain from with identity of . %, and its similarity to the moroccan gi- lineage prototype g strain from was . %. the similarity of the other, strain / , was . % and % to strains from israel and morocco, respectively. the gi- lineage contained only one polish strain, g / , which shared . % nt identity with the dutch d virus. eight polish strains were in the gi- lineage and the similarity of their s coding region sequences was between . and . %. their identity with the pathogenic israeli is/ / strain from showed as from . to . %. the subtree of gi- lineage contained polish qx strains in two branches, of which one contained nine strains with nt identity of . - . % to the european qx prototype dutch l- k/ ibv. one strain, g / , with similarity to the previous ones of . - . %, constituted the offshoot branch. the two polish strains / and / , isolated at an interval of years from each other, formed a separate branch in the phylogenetic tree and they shared . % nt identity. the phylogenetic analysis of the analyzed full ibv genomes showed that ten polish ibv strains grouped into four phylogenetic groups (fig. ) . two early strains, / and / , clustered together with massachusetts-like strains (mass , peafowl/gd/kq / and ses ab- ) showing the highest nt identity of . % with the sequence of the prototype gi- lineage beaudette strain. the two most recent strains, g / and g / , clustered with ibvs of gi- lineage. the sequence identities of g / and g / to the previously described polish gammacov/ck/poland/g / strain were . and . 
%, respectively. the three polish strains / , g / and g / were in the same cluster as other qx strains from europe and africa and were distantly related to chinese qx ibvs analyzed in this study (sdzb and p ). they had nucleotide similarity to each other of . - . % and were located in two subclades. a polish strain from clustered together with swe/ / , the first described full-genome ibv strain of qx type in europe, with similarity of . %. the other two polish qx strains were . % similar to each other and formed a common branch on the phylogenetic tree. three early polish strains / , / and / were in a separate branch on the phylogenetic tree and showed nucleotide similarity with each other in the range of . - . %. recombination analysis of all aligned full s sequences was performed to assess the existence of possible recombinants among the analyzed polish ibv strains, especially those with less obvious membership to the lineage, i.e. / , / and / . our analysis identified only one s coding region which resulted from recombination events and it belongs to the / strain; this event having taken place was supported by seven different methods with a very good global ka p-value of . e− . we confirmed this recombination breakpoint with phylogenetic trees. the region from to nt of the / ibv strain clustered together with / and / isolates, the viruses which formed the separate early polish cluster on the full s coding region phylogenetic tree (fig. a) . in turn, the region from to nt clustered with viruses belonging to gi- lineages (strains / , / , ibv india and h ) (fig. b) . the relevant s coding region fragments of the other ibv strains analyzed in this study grouped in the same way as they did in the phylogenetic analysis of the entire, intact s coding region. to check if any of the analyzed genomes of polish ibv strains result from recombination events, their sequences were thoroughly examined using the rdp program. our analysis revealed many such events. 
however, we selected five of them identified in six strains, which were supported by seven different methods (rdp, geneconv, bootscan, maxchi, chimaera, siscan and seq) and a very good global ka p-value below . × e− (table ).

the retrospective phylogenetic analysis of ibv included field strains collected over a period of years, between and . we investigated the s coding region of ibv strains and the whole genome of ten strains. polish ibv strains showed different molecular features of the s coding region allowing their genotype or lineage to be determined, and their appearance in time reflects the history of ibv epidemics in europe (de wit et al., ). the timeline of the detection of various ibv lineages and the introduction of different vaccines to the poultry population in poland is given in fig. . the first identified ibv isolates in europe belonged to the mass type. in the netherlands they were diagnosed in the middle of the s and one of them was even attenuated for vaccine development purposes (bijlenga et al., ). the first ibv material available in our laboratory originates from and its s sequence displayed the features of gi- lineage, although no information was provided about the disease symptoms observed in the chicken flock where it was identified. later on, especially in the middle of the s, numerous cases of a drop in egg production were recorded. the problem was so serious that polish veterinary authorities decided to allow the first ib vaccine introduction, but only for immunization of commercial layer flocks. the health problems in layers were significantly mitigated, but at the end of the s respiratory problems and mortality manifested in broiler chicken flocks, and so the vaccination of chickens of this production type was also started (minta et al., ).

fig. . phylogenetic tree of the s gene of reference and polish ibv strains (bold underlined letters). the tree was constructed in mega using the maximum likelihood method based on the gtr + g + i model and bootstrap replicates (bootstrap values shown on the tree). to make the tree clearer visually, branches with ibv lineages only distantly related to the studied polish strains are collapsed.

we thoroughly examined four virus strains from that time, / / , / / , / / and / / . two of them, strains / / and / , have the s structure typical of the gi- lineage. similarly, their entire genomes revealed the highest identity to mass-like strains such as h or beaudette. these two viruses came from broiler chickens on farms in the silesia region separated by only a few kilometers. the two other viruses, / and / , were identified in broilers delivered to the laboratory in nearly the same period (in june ) but from farms about km from the previous ones. however, our investigation revealed a distinction between them. phylogenetic analysis of the full s coding region showed that strain / forms a separate branch on the tree designated as early polish and was classified as a unique variant of ibv within the gi genotype. detailed analysis of the / strain strongly suggests that its s coding region was created as the result of a recombination event between the mass-like strains and the unique early polish variants circulating in the field at that time. it should be emphasized that the identified recombination breakpoint ( nt) was in the intermediate region between highly variable regions (hvrs) and and hvr , previously described as the most frequent locations of variation between ib viruses; moreover, it exactly matches the breakpoint ( and nt) of recombinants between viruses of gi- and gi- lineages (valastro et al., ). the introduction of vaccines based on the mass-like strains for chicken immunization significantly reduced the economic losses caused by ib. this state of ib control lasted for about years until , when ib disease inducing kidney damage appeared, caused by b-like ibv strains.
the first case of nephritis was in -week-old broilers in the south of poland. the birds showed signs of severe enteritis and the observed gross lesions were congested tracheas and lungs and swollen and pale kidneys with the presence of urine. in subsequent months, further broilers with nephritis were provided for diagnostic purposes and the diseased flocks from which they came were located in all regions of poland; most of them had not been vaccinated against ib, but some had been immunized with mass-like vaccine in the first days of life. the strains identified at that time, / and / , have an s sequence similar to ibv strains of gi- lineage. surprisingly, one of the first isolates known to cause nephritis, strain / , together with strain / , which inflicted respiratory disorders, were located in the early polish branch of the phylogenetic tree with well supported uniformity to others (bootstrap value of ) (hillis and bull, ). the next polish gi- strains identified between and revealed the highest nt similarity to the s sequence of the / strain contained in the most commonly used vaccine in poland at that time (adzhar et al., ). the next epidemic wave of ib in poland was caused by qx strains. the first report of disease induced by this virus type was published in , but our studies showed the presence of the virus in poland in (domanska-blicharz et al., ). it was from this year that qx strains were first identified in holland, germany, belgium and france, and the next year they became dominant in some of these countries (worthington et al., ).

fig. . phylogenetic tree of the s gene fragment between potential recombination breakpoints and (a) and and (b) among ibvs included in the analysis. sequences of polish ibv strains are marked with bold underlined letters and recombinants with black dots. the tree was constructed in mega using the maximum likelihood method based on the gtr + g + i model and bootstrap replicates (bootstrap values shown on the tree). to make the tree clearer visually, branches with ibv lineages only distantly related to the studied polish strains are collapsed.

the analysis of the migration history of gi- strains suggests that most european ones came from a single introduction from china, which then spread in european countries, evolving in them separately, since they tend to cluster by country. however, the genetic variability of gi- ibvs sometimes identified within countries suggests subsequent introduction of the virus in epidemic waves (franzo et al., ). the division of polish gi- strains into four clusters could reflect a separate introduction or epidemic wave of this virus variant into the country. the last large epidemic wave of ib was caused by gi- (var ) strains. the first strain of this lineage was identified in december and in the following months it was the most common virus type detected in field samples delivered to our laboratory for diagnostic purposes apart from strains of b . the s coding region of most polish gi- strains is in the same phylogenetic cluster, except for strain g / , which constitutes a separate one and could result from a separate virus introduction or from its intensive evolution. the single polish g / virus strain of gi- lineage analyzed in our study was identified in a -week-old broiler flock vaccinated with poulvac ib primer, so it is highly probable that the identified strain originated from vaccine virus. although ball et al. (ball et al., ) showed that after vaccination of -day-old broilers with this vaccine only rna of the mass strain was detected in tissues and swabs, and explained this by the higher replication potential of the mass virus, it cannot be ruled out that, like some other ibv strains, the d virus persists in the body of chickens (possibly in the cecal tonsils) and after some time is shed via the cloaca (alexander and gough, ; naqi et al., ).
it should be noted that the ibv strains discussed here in detail are those that caused the greatest losses in polish poultry farming. during this period, strains of other genotypes and lineages also circulated in the field but their detection or type determination was not possible using available methods. the comprehensive studies of polish ibv isolates from to using serological and molecular tests conducted in cooperation with italian researchers showed that the b type was a major component of the ibv population in poland during this period, but serologically the presence of /i isolates was also identified and one isolate even showed no serological cross-reaction in an hi test nor amplification in rt-pcr (capua et al., ) . recently, the /i and q types were determined to affiliate to the gi- lineage, which has been present in europe (italy) since and persists until now (franzo et al., ) . in the period - numerous cases of d ibv (gii- lineage) were detected, however, in subsequent years ( ) ( ) , the number of d -positive samples dropped to . %, and currently we do not detect these viruses at all (domanska-blicharz et al., ; domanska-blicharz et al., ) . the complete genome sequences of all polish field ibv strains showed the presence of six genes and orfs in the order previously reported, irrespective of the year of first isolation (abolnik, ; gomaa et al., ; hewson et al., ) . most accessory proteins are conservative in their lengths, in contrast to the structural ones which differ by even as much as aa (m protein of gi- / and gi- ). an interesting observation is clustering of polish ibvs based on the complete genome sequences. the earliest strains / and / belonging to the gi- lineage cluster with other representatives of this lineage such as the beaudette and dutch h strains, which could indicate the common origin of mass-like viruses. 
the viruses from the next epidemic wave are / and / , and they were on the separate branch of the phylogenetic tree together with the strain / . the / virus was located on the s coding region phylogenetic tree with the / strain in the separate ibv branch of early polish ibv. on the other hand, the third virus of this separate group, / , found its place in the gi- lineage on the phylogenetic tree based on the s coding region. thorough analysis using rdp software revealed that the s gene of this virus was acquired from ibv strains of gi- lineage during a recombination event. the results suggest that in the late s and s two ibv variants circulated in the polish poultry population: gi- and unique early polish ones. these viruses differ not only in the s coding region, which is the basis for the differentiation of lineages, but also in the remaining part of the genome. the viruses with such a genome backbone recombined with other viruses that were donors of the s coding region. grouping on a phylogenetic tree based on the complete genome of the other five polish strains was as expected. three g- strains, / , g / and g / , took positions among other qx-like viruses from europe (sweden and italy) and africa (south africa and sudan) (abolnik, ; abro et al., ; ducatez et al., ; naguib et al., ). in turn, two strains of gi- lineage, the viruses g / and g / , were in the branch with the previously characterized polish g / and iranian is- strains of gi- lineage isolated in . it seems that in poland in the s and s ibv strains with a unique genome backbone circulated in the field, which were then replaced by strains belonging to other ibv lineages with a genome backbone specific to these lineages. in addition to the aforementioned recombination, five further such events were identified in polish ibv strains.
three strains of the gi- lineage had orf a and orf b which revealed a high frequency of recombination events with / and sdzb -like strains (qx type strains from china from ). two strains of the gi- lineage exhibited recombination with the italy/ / -type ibv, and a similar recombination pattern was also previously indicated . in conclusion, phylogenetic analysis performed on the s coding region of polish ibv strains collected during a -year period showed that these strains belonged to five recently established ibv lineages: gi- , gi- , gi- , gi- and gi- . additionally, two strains formed a separate branch of the phylogenetic tree described as unique early polish variants and one strain revealed itself to be the recombinant of gi- lineage viruses and these unique early polish variants. the phylogenetic analyses performed on the complete genome of ten polish ibv strains showed that they cluster into different groups. polish gi- , gi- and gi- strains cluster with other similar viruses of these lineages, with the exception of the strains from to which are different. the recombination analysis showed that polish strains are a mosaic of different parental viruses most likely resulting from recombination events involving different ibv lineages, most frequently gi- and gi- . it should be also stressed that the major epidemics of ib in poland appeared every - years: gi- in , gi- in , gi- in and gi- in . these subsequent ibv lineages could have reached chickens in poland in various ways: carried by wild birds, or as a result of international trade, including uncontrolled movement of animals across borders (domanska-blicharz et al., ; hussein et al., ; kahya et al., ) . despite the apparent regularity in the appearance of subsequent ib epidemics, it is absolutely impossible to predict when the next one will appear. 
the most important impediment to prediction is the visible climate change forcing changes in bird behavior, but another is the extraordinary intensification of the poultry industry in poland. taken as a whole, the molecular characteristics of polish ibvs presented here could help to understand the origin, spread and evolution of ib viruses in europe and the rest of the world.

none.

genomic and single nucleotide polymorphism analysis of infectious bronchitis coronavirus
emergence of novel strains of avian infectious bronchitis virus in sweden
molecular analysis of the /b serotype of infectious bronchitis virus in great britain
a long-term study of the pathogenesis of infection of fowls with three strains of avian infectious bronchitis virus
infectious bronchitis vaccine virus detection and part-s genetic variation following single or dual inoculation in broiler chicks
spades: a new genome assembly algorithm and its applications to single-cell sequencing
development and use of the h strain of avian infectious bronchitis virus from the netherlands as a vaccine: a review
comparison of the spike precursor sequences of coronavirus ibv strains m and / with that of ibv beaudette
completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus
występowanie zakaźnego zapalenia oskrzeli u kur w polsce w latach - (the occurrence of infectious bronchitis virus between - in poland)
co-circulation of four type of infectious bronchitis virus ( /b, /i, b and massachusetts)
coronaviruses in poultry and other birds
coronavirus avian infectious bronchitis virus
infectious bronchitis
identification and molecular characterization of a novel serotype infectious bronchitis virus (gi- ) in china
antigenic and molecular characterization of isolates of the italy infectious bronchitis virus genotype
molecular epidemiology and evolution of avian infectious bronchitis virus in spain over a fourteen-year period
new variant of ibv in poland
molecular studies on infectious bronchitis virus isolated in poland
d -like genotype of infectious bronchitis virus responsible for a new epidemic in chickens in poland
detection and molecular characterization of infectious bronchitis-like viruses in wild bird populations
specific detection of gii- lineage of infectious bronchitis virus
detection of a novel and highly divergent coronavirus from asian leopard cats and chinese ferret badgers in southern china
recommendations for a standardized avian coronavirus (avcov) nomenclature: outcome from discussions within the framework of the european union cost action fa : "towards control of avian coronaviruses: strategies for vaccination, diagnosis and surveillance"
characterization of a new genotype and serotype of infectious bronchitis virus in western africa
think globally, act locally: phylodynamic reconstruction of infectious bronchitis virus (ibv) qx genotype (gi- lineage) reveals different population dynamics and spreading patterns when evaluated on different epidemiological scales
gi- lineage ( /i or q ), there and back again: the history of one of the major threats for poultry farming of our era
a laboratory manual for the isolation and identification of avian pathogens
complete genomic sequence of turkey coronavirus
infectious bronchitis viruses with naturally occurring genomic rearrangement and gene deletion
an empirical-test of bootstrapping as a method for assessing confidence in phylogenetic analysis
sequence analysis of infectious bronchitis virus is/ like strain isolated from broiler chicken co-infected with newcastle disease virus in egipt during
genome characterization, antigenicity and pathogenicity of a novel infectious bronchitis virus type isolated from south china
presence of is/ / genotype-related infectious bronchitis virus in breeder and broiler flocks in turkey
serological study of the infectious bronchitis virus occurrence in poland
changes to taxonomy and the international code of virus classification and nomenclature ratified by the international committee on taxonomy of viruses
first characterization of a middle-east gi- lineage (var -like) of infectious bronchitis virus in europe
novel genotype of infectious bronchitis virus isolated in china
detecting and analyzing genetic recombination using rdp
enzzotic infectious bronchitis in broilers
nephropathogenic form of infectious bronchitis in broiler chickens. proceedings of xi congress of ptnw
full genome sequence analysis of a newly emerged qx-like infectious bronchitis virus from sudan reveals distinct spots of recombination
establishment of persistent avian infectious bronchitis virus infection in antibody-free and antibody-positive chickens
molecular characterization of infectious bronchitis virus isolates from russia and neighbouring countries: identification of intertypic recombination in the s gene
vipr: an open bioinformatics database and analysis resource for virology research
mega : molecular evolutionary genetics analysis version .
s gene-based phylogeny of infectious bronchitis virus: an attempt to harmonize virus classification
infectious bronchitis virus variants: a review of the history, current situation and control measures
coronavirus diversity, phylogeny and interspecies jumping
a reverse transcriptase-polymerase chain reaction survey of infectious bronchitis virus genotypes in western europe from

the authors wish to acknowledge dr. siamak zohari and karin ullman (department of microbiology, national veterinary institute -sva, uppsala, sweden) and dr. ewelina iwan and arkadiusz bomba (department of omics analysis, national veterinary research institute, puławy, poland) for their support while conducting ngs. we also acknowledge justyna opolska for her help in molecular diagnostic tests.

an ethical statement is not required as samples from animals were delivered to our laboratory by the owners or veterinarians for diagnostic purposes.
chickens on the farms were under the supervision of appropriate persons, who took different samples as part of their routine work.this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. supplementary data to this article can be found online at https:// doi.org/ . /j.meegid. . . key: cord- - rdiosxd authors: cuevas, josé m.; combe, marine; torres-puente, manoli; garijo, raquel; guix, susana; buesa, javier; rodríguez-díaz, jesús; sanjuán, rafael title: human norovirus hyper-mutation revealed by ultra-deep sequencing date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: rdiosxd human noroviruses (novs) are a major cause of gastroenteritis worldwide. it is thought that, similar to other rna viruses, high mutation rates allow novs to evolve fast and to undergo rapid immune escape at the population level. however, the rate and spectrum of spontaneous mutations of human novs have not been quantified previously. here, we analyzed the intra-patient diversity of the nov capsid by carrying out rt-pcr and ultra-deep sequencing with , -fold coverage of stool samples from symptomatic patients. this revealed the presence of low-frequency sequences carrying large numbers of u-to-c or a-to-g base transitions, suggesting a role for hyper-mutation in nov diversity. to more directly test for hyper-mutation, we performed transfection assays in which the production of mutations was restricted to a single cell infection cycle. this confirmed the presence of sequences with multiple u-to-c/a-to-g transitions, and suggested that hyper-mutation contributed a large fraction of the total nov spontaneous mutation rate. the type of changes produced and their sequence context are compatible with adar-mediated editing of the viral rna. noroviruses (novs) are one of the most common causes of foodborne viral gastroenteritis, infecting over million people worldwide every year. 
symptoms typically last to h, but complications can occur in immunocompromised patients, resulting in an estimated , deaths per year mainly among elderly people and young children in developing countries (patel et al., ; robilotti et al., ) . novs are positive-stranded rna viruses belonging to the family caliciviridae and, similar to other rna viruses, they exhibit extremely high levels of genetic diversity (debbink et al., ) . novs have evolved into seven highly divergent genogroups (gi-gvii), which are in turn divided into genotypes. the prototypic norwalk virus belongs to genotype gi. , but gii. has become the most prevalent genotype in the last decades, being responsible for the majority of outbreaks (white, ) . the most variable nov genome regions are located in the surface-exposed p domain of the capsid (vp ) protein, which determines antibody escape (lindesmith et al., ; white, ) . differences in genetic diversity and evolution rates among nov genotypes have been attributed to multiple factors, including random genetic drift, receptor usage, the structural plasticity of the vp protein, and replication fidelity (bull and white, ; donaldson et al., ) . however, and despite its purported importance for evolution, immune escape, and the development of efficient control strategies, the rate of spontaneous mutation of human novs has not been experimentally determined. the genetic diversity of rna viruses is ultimately driven by their extremely high rates of spontaneous mutation, which are orders of magnitude higher than those of dna-based microorganisms and range from − to − per nucleotide per round of copying (lauring et al., ; sanjuán et al., ) . such high mutation rates are commonly attributed to the low replication fidelity of rna virus polymerases, since these lack ′ exonuclease activity in all viral families examined except coronaviruses (smith and denison, ; ulferts and ziebuhr, ) . 
however, editing of the viral genome by host-encoded proteins is another possible source of mutations. double-stranded rna-specific adenosine deaminases (adar) have been suggested to edit the genome of a variety of negative-stranded rna viruses, including measles virus (cattaneo et al., ) , human parainfluenza virus (murphy et al., ) , respiratory syncytial virus (martínez and melero, ) , lymphocytic choriomeningitis virus (zahn et al., ) , and rift valley fever virus (suspène et al., ) , and the apolipoprotein b mrna editing enzyme, catalytic polypeptide-like cytidine deaminase family (apobec ) is known to edit hiv- (desimmie et al., ; moris et al., ; santa-marta et al., ) , hepatitis b virus (suspène et al., ) , papillomaviruses , and herpesviruses (suspène et al. ) . in hiv- , % of spontaneous mutations in vivo are produced by apobec , whereas only % are attributable to the viral reverse transcriptase (cuevas et al., ) , but the relative contribution of the viral polymerase and host-mediated editing is unknown for most other viruses. here, we have analyzed the intra-patient genetic diversity of a region of the nov capsid vp by performing ultra-deep sequencing of stool samples obtained from infected patients. unexpectedly, we found a small number of hyper-mutated sequences carrying large numbers of a-to-g or u-to-c base transitions that were not attributable to sequencing errors. in this sense, since the per-base error rate of the employed technology is ca. / (jünemann et al., ) , it is extremely improbable to find multiple mutations in a single read. the natural genetic diversity of novs has been studied previously within individual patients (nilsson et al., ; obara et al., ; vega et al., ) , within defined outbreaks (dingle, ; holzknecht et al., ; sasaki et al., ) , or at larger geographic and temporal scales (bodhidatta et al., ; carlsson et al., ; cotten et al., ; kobayashi et al., ; vega et al., ) . 
however, this diversity depends on multiple factors other than spontaneous mutation rates, including natural selection, the number of replication rounds elapsed and random genetic drift, among others. to discard these confounders and focus on spontaneous mutations, we used a cell culture system in which human cells are transfected with an infectious cdna clone (asanaka et al., ; katayama et al., ) . since these cells do not support viral attachment and entry, this system restricts viral replication to a single infection cycle. whereas this is generally viewed as a limitation, single-cycle viral replication was convenient for the purpose of mutation rate estimation, because it allowed us to minimize the effects of selection and other evolutionary factors. this approach allowed us to observe hundreds of sequences carrying multiple u-to-c or a-to-g substitutions each, suggesting that a large fraction of all spontaneous mutations corresponds to hyper-mutation events. based on the sequence context of the observed changes, we propose that nov hypermutation might be driven by adar-mediated editing of the viral genomic rna of either polarity during replication. viral rna was extracted from % stool suspensions in pbs using the trizol ls reagent (invitrogen), eluted in diethyl pyrocarbonate-treated water containing rnasin (promega) and stored at − °c. rt was performed using superscript iii (invitrogen) and random hexamers for min at °c, min at °c and min at °c. pcr was done with phusion high-fidelity dna polymerase following manufacturer's recommendations and specific primers degenerated either for purines or pyrimidines at a final concentration of μm. 
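the degenerate primers mentioned above rely on the standard iupac nucleotide codes, in which y stands for c or t and r stands for a or g. as a minimal sketch (the helper names and the short example sequences are hypothetical and are not taken from the study), a primer carrying y at every template t position still matches a template in which some of those positions have changed to c, whereas the non-degenerate primer does not:

```python
import re

# Standard IUPAC nucleotide codes (DNA alphabet) needed for this example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}

def primer_to_regex(primer: str) -> re.Pattern:
    """Translate a degenerate primer into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def primer_matches(primer: str, template: str) -> bool:
    """True if the primer matches somewhere in the template without mismatches."""
    return primer_to_regex(primer).search(template.upper()) is not None

# Hypothetical sequences in cDNA notation (T stands for U).
wild_type = "GGATGACTTGCA"
edited = "GGACGACTCGCA"          # two T-to-C changes inside the primer site
plain_primer = "ATGACTTG"        # exact copy of the wild-type site
degenerate_primer = "AYGACYYG"   # every T replaced by Y

print(primer_matches(plain_primer, edited))       # exact primer vs edited template
print(primer_matches(degenerate_primer, edited))  # degenerate primer vs edited template
```

the design choice here is to treat annealing as exact pattern matching, which ignores mismatch tolerance and melting temperature; it only illustrates why degeneracy at purine or pyrimidine positions avoids selecting against edited templates.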
for vp region , two pairs of primers with different degeneration were used: primers ′-aygaagayggcgycgagygacg- ′ (forward, nucleotides - in accession jx ) and ′-ggrrrrtttggtgggrctgctgc- ′ (reverse, nucleotides - in accession jx ) were designed to account for u-to-c mutations in plus-strand rna, and primers ′-rtgrrgrtggcgtcgrgtgrcg- ′ (forward) and ′-ggaa aayyyggygggacygcygc- ′ (reverse) were designed to account for a-to-g mutations in plus-strand rna. for region , the degenerate primer pairs were ′-caagayyccccayyccyyygg- ′ (forward, nucleotides - in accession jx ) and ′-ggrtgrcrccgrctggggtg- ′ (reverse, nucleotides - in accession jx ), and ′-crrgrttccccrttcctttgg- ′ (forward) and ′-ggaygacaccgacyggggyg- ′ (reverse), respectively. pcr conditions were °c s, cycles of °c s, °c s, °c min, and a final elongation step at °c min. a previously described norwalk virus infectious cdna clone (asanaka et al., ) was obtained after an mta with dr. m. k. estes (baylor college of medicine, houston), cloned in escherichia coli by the heat shock method, and purified by midiprep using the purelink hipure plasmid midiprep kit (invitrogen). human embryonic kidney cells hek were obtained from the american tissue culture collection (atcc crl- , ) and cultured in dmem f (dulbecco's modified eagle medium) supplemented with % fbs and antibiotics at °c under % co . norovirus was recovered from the cdna clone as described previously (asanaka et al., ) . briefly, approximately hek cells ( % confluence) were infected with a recombinant vaccinia virus expressing bacteriophage t rna polymerase at a multiplicity of infection of plaque-forming units per cell and, after h incubation, the inoculum was washed and cells were transfected with . μg of the infectious cdna clone using lipofectamine ltx reagent (invitrogen), following manufacturer's instructions. after h incubation at °c, vaccinia replication was inhibited with μg/ml arac (arabinofuranosyl cytidine) and cells were incubated for h. 
a plasmid encoding green fluorescent protein (gfp) was transfected under the same conditions as a control. after h incubation, rna was extracted from transfected cultures using trizol (invitrogen) followed by chloroform and isopropanol purification, and washed with % ethanol. in order to digest any remaining dna, samples were treated with u/μg rnase-free dnase i (thermo scientific) for min at °c. dnase i was heat-inactivated ( min at °c) and the rna was column-purified using nucleospin rna clean-up xs kit (macherey-nagel). purified rna was reverse-transcribed using accuscript high fidelity reverse transcriptase (agilent technologies) and a sequence-specific primer. negative-strand rna was reverse-transcribed using the following primer ′-attactctctgtgcactgtctg- ′ (nucleotides - in accession nc_ ), whereas positive-strand rna was reverse-transcribed using primer ′-cagtgtagaagaggctgttgaa- ′ (nucleotides - in accession nc_ ). reverse transcription conditions used were °c for min, followed by °c for min. the vp gene was then pcr-amplified using phusion high fidelity dna polymerase (new england biolabs) and primers ′-gacgcyacaycaagcgygg- ′ (forward, nucleotides - in accession nc_ ) and ′-ctcrtgttrccrrcccrrcc- ′ (reverse, nucleotides - in accession nc_ ). the pcr conditions used were °c s, cycles at °c s, °c s, and °c min, and a final extension at °c min. controls were carried out in which the pcr was performed without the rt step to ensure that no remaining dna from the infectious clone was amplified. to control for strand-specific amplification, transfection supernatants were cleaned by centrifugation at , × g, min, °c to separate free virions containing plus-strand genomes from cellular pellets, used for rna extraction, and the rt step was performed with the same primer used for amplification of minus strands. as expected, these controls did not yield any visible pcr product. 
to obtain a shorter product for illumina sequencing, a secondary pcr of the indicated size was done with the following cycling conditions: °c s, cycles of °c s, °c s, °c min, and a final extension at °c min. pcr products were sequenced in an illumina miseq machine using paired-end libraries. the quality of the run was first evaluated with fastqc software . . (http://www.bioinformatics.babraham.ac.uk/ projects/fastqc). for clinical samples, a base calling pipeline was run to define a consensus reference sequence for each sample. to do this, illumina adapters and pcr primers were cut with cutadapt software (marcel, ) , fastq files were trimmed using prinseq-lite version . . (schmieder and edwards, ) , mapping was done using the mem algorithm from bwa . . (http://arxiv.org/abs/ . ), sam files were converted to bam format, sorted and indexed using samtools software package (li et al., ) , and sequence variants relative to a common reference were called with varscan . . (koboldt et al., ) using samtools mpileup data as input. for each sample, nucleotide changes detected at a frequency higher than . were used to construct the sample-specific reference sequence. for subsequent steps, paired-end illumina reads were merged using pandaseq (masella et al., ) and an initial trimming of these merged fastq files was performed with prinseq-lite version . . . trimmed fastq files were converted into fasta and standalone blast pair-wise alignments (camacho et al., ) were obtained to map reads and to obtain the number of mutations relative to the reference sequence of the sample. since the number of reads was variable, , reads were randomly chosen for each pcr. to obtain a refined set of mutated reads, a final quality filter was applied, such that only reads with average phred quality score higher than for the specific mutated positions were considered. this filter removed less than % of the original reads in all samples. 
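the last two steps described above, counting mutations relative to the sample reference and keeping only reads whose mutated positions have a high average phred score, can be sketched as follows. this is an illustration only: it assumes reads already aligned to the reference without gaps, and the function names, the toy sequences and the threshold values are hypothetical rather than those of the original scripts.

```python
from statistics import mean

def mutated_positions(reference: str, read: str, ref_base: str, new_base: str) -> list[int]:
    """Positions where the reference has ref_base and the aligned read has new_base."""
    return [i for i, (r, q) in enumerate(zip(reference, read))
            if r == ref_base and q == new_base]

def passes_quality(positions: list[int], phred: list[int], min_mean_q: float) -> bool:
    """Keep a read only if the mean phred score at its mutated positions exceeds the threshold."""
    if not positions:
        return True  # unmutated reads have nothing to check
    return mean(phred[i] for i in positions) > min_mean_q

# Toy data in cDNA notation (T stands for U); one mutated base has low quality.
reference = "ATTGCATTAC"
read = "ACCGCATCAC"
phred = [38, 36, 37, 38, 38, 38, 38, 12, 38, 38]

t_to_c = mutated_positions(reference, read, "T", "C")
print(t_to_c)                             # indices of T-to-C differences
print(passes_quality(t_to_c, phred, 30))  # hypothetical strict threshold
print(passes_quality(t_to_c, phred, 25))  # hypothetical lenient threshold
```

averaging quality only over the mutated positions, rather than over the whole read, targets the filter at the bases that actually support the hyper-mutation call.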
specific shell, python, perl and r scripts were written for these analyses. pcr products were gel-purified and cloned using clonejet pcr cloning kit (thermo scientific) in e. coli by the heat shock method. transformant colonies were pcr-amplified using taq dna polymerase and clonejet-specific primers (forward ′-cgactcactatagggaga gcggc- ′; reverse ′-aagaacatcgattttccatggcag- ′) under the following conditions: °c min, cycles of °c s, °c s, °c min, and a final extension at °c min. colony pcr products were column-purified and sequenced by the sanger method. sequence chromatograms were analyzed using the staden software (http:// staden.sourceforge.net). we used stool samples from patients acutely infected with nov gii. to amplify by rt-pcr a -base region encompassing nucleotides to of the vp gene (reference sequence: genbank jx ; fig. a ). the rt-pcr was successful in / samples. of these, eight belonged to newborns or children under the age of three, two to adults, and for one sample there was no available age information. we performed paired-end illumina sequencing of these pcr products with , -fold coverage (i.e. , reads per patient). for three patients, approximately one in reads contained large numbers of u-to-c or a-to-g base transitions ( - reads with - such mutations out of approx. , total reads; table ). reads with less than five mutations of such type were not considered as hyper-mutants. although the error rate of illumina sequencing precludes analysis of low-frequency polymorphisms, it provides a powerful approach for detecting hyper-mutants. in the most mutated read, of the u residues were substituted for c, a pattern that cannot be explained by sequencing error. interestingly, we found both u-to-c and a-to-g hyper-mutations, but these did not occur in the same reads. 
to extend our analysis, we set out to amplify by rt-pcr another region encompassing nucleotides to of vp ( bases, although only the bases excluding primer regions were considered for subsequent analysis), which maps to the hypervariable domain p . in two out of the three samples showing hyper-mutation in the first region, we also found hyper-mutated reads in the second region, with a maximum of mutations in a single read. furthermore, one sample which failed to amplify for the first region also yielded hyper-mutated reads in the second region (table ). two of the samples showing hyper-mutation belonged to newborns, whereas the other two belonged to adults, with no significant association between age and hyper-mutation at this low sample size (fisher's exact test, p = . ). viruses carrying large numbers of mutations should not be viable and, thus, their population frequency should be strongly reduced by the action of purifying selection. therefore, albeit very rare, hyper-mutants may reflect a relevant mutational process in nov.
fig. . nov genetic map, regions sequenced, and setup of transfection assays. a. in the nov genetic map, the vp capsid gene is shown in red. molecular clones encompassing the entire vp gene were sequenced by the sanger method. illumina sequencing was used to analyze smaller regions mapping to the s domain of vp and the hyper-variable domain p (dark red bars). b. an infectious cdna clone was transfected in hek cells previously infected with a recombinant vaccinia virus expressing t rna polymerase, allowing for transcription of plus-strand nov genomic rna. a primer annealing to minus-strand copies was used for rt-pcr amplification and sequencing. colored circles represent mutations/variants.
to minimize the effects of selection, we transfected a norwalk virus infectious cdna clone into hek cells expressing the t rna polymerase from a recombinant vaccinia virus (atcc vr- ). as previously described, this system supports nov transcription, replication and encapsidation (fig. b), but does not allow released virions to initiate a second infection cycle because hek t cells are not a natural cell target for the virus (asanaka et al., ). after h incubation, total rna was extracted from cells, residual dna was removed with dnase i, a specific primer annealing to the minus-strand of the vp capsid gene was used for reverse transcription, and high-fidelity pcr amplification of a region encompassing positions to of the vp gene ( bases, although only the bases excluding primer regions were considered for subsequent analysis) was carried out. for each of three independent transfection assays, we subjected the pcr products to paired-end illumina sequencing with the same coverage as above. comparison of illumina reads with the sequence of the infectious cdna clone (reference sequence: genbank nc_ ) revealed hundreds of sequences with multiple u-to-c substitutions. some examples of u-to-c hyper-mutants are shown in fig. . to objectively define hyper-mutated sequences, we analyzed the distribution of the number of u-to-c transitions among the , reads obtained for each replicate assay. the distribution clearly deviated from a poisson model of rare random events, showing an excess of sequences with high mutation counts (fig. a). based on this, we defined hyper-mutated sequences as those carrying five or more mutations in the -base region studied. however, the data also showed that hyper-mutation was not an all-or-nothing process and that the number of mutations per read varied continuously. regarding u-to-c substitutions, we found hyper-mutated reads ( , and reads for assays , and , respectively), meaning that approximately one in every reads ( . %) contained u-to-c hyper-mutations (table ). these carried , total u-to-c substitutions, the number of mutations per read varying from to out of the u residues contained in the -base fragment. 
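the comparison with a poisson model can be illustrated as follows. under a model of rare, independent events, the number of mutations per read follows a poisson distribution whose mean equals the observed mean, so the expected number of reads with five or more mutations can be compared with the number observed. the histogram below is invented for illustration and the function names are hypothetical; only the five-mutation cut-off comes from the text.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events under a Poisson model with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def expected_tail(n_reads: int, lam: float, threshold: int) -> float:
    """Expected number of reads carrying at least `threshold` mutations."""
    below = sum(poisson_pmf(k, lam) for k in range(threshold))
    return n_reads * (1.0 - below)

# Hypothetical histogram: mutations per read -> number of reads.
observed = {0: 9500, 1: 450, 2: 20, 5: 10, 8: 12, 15: 8}

n_reads = sum(observed.values())
lam = sum(k * n for k, n in observed.items()) / n_reads   # observed mean per read
observed_hyper = sum(n for k, n in observed.items() if k >= 5)
expected_hyper = expected_tail(n_reads, lam, threshold=5)

print(f"mean mutations per read: {lam:.4f}")
print(f"reads with >= 5 mutations: observed {observed_hyper}, expected {expected_hyper:.2e}")
```

in this toy example the observed upper tail exceeds the poisson expectation by several orders of magnitude, which is the kind of deviation that motivates treating such reads as a separate class.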
since sequences were derived from minus-strand rna, u-to-c substitutions in the reference (plus-strand) genome sequence indicate that the negative-strand template rna contained a-to-g substitutions. ultra-deep sequencing also revealed some a-to-g hyper-mutated sequences in two of the three assays (indicating u-to-c changes in the negative-strand template), but these were times less frequent (i.e. / = . ) than the former (table ). a-to-g hyper-mutants may be a result of plus-strand carry-over amplification during rt-pcr or, alternatively, they may represent a different mutational process. analysis of the location of mutations revealed a widespread distribution along the -base vp region. although all of the u residues showed at least one u-to-c mutation at this high sequencing depth, mutation frequencies varied strongly across sites, the pattern of variation being highly reproducible between the three biological replicates (pairwise spearman ρ > . , p < − ; fig. b). a major determinant of the frequency of u-to-c mutation was the identity of the ′ neighboring base. among the , u-to-c changes observed, the ′ neighbor was u in cases, a in cases, g in cases, and c in only cases. these counts clearly deviated from those of ′ neighbors of non-mutated bases (chi-square test: p < − ; fig. c). after correcting for base composition, the ′ neighbor preferences for u-to-c hyper-mutation were a > u > g > c. interestingly, a-to-g hyper-mutated sequences showed a marked bias in the ′ neighboring base such that, among the total a-to-g mutations, the ′ neighbor was u in cases, a in cases, g in cases, and c in cases (p < − ; fig. c). therefore, a-to-g hyper-mutation had ′ neighbor base preferences (u > a > c > g) which are exactly the reverse complement of those for u-to-c hyper-mutation. 
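the neighbor analysis can be sketched as follows: for every mutated site, tally the adjacent base on a chosen side, then divide by how often each base occupies that position next to any site of the same type, so that base composition does not drive the ranking. the fragment, the mutated positions and the function names below are hypothetical, and which side is tallied is left as a parameter. the last function shows why a ranking read on one side of one strand corresponds to the complementary ranking on the other side of the opposite strand.

```python
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def neighbor_counts(seq: str, positions: list[int], offset: int) -> Counter:
    """Count the base at position + offset (offset -1: 5' side, +1: 3' side)."""
    return Counter(seq[i + offset] for i in positions
                   if 0 <= i + offset < len(seq))

def normalized_preference(seq: str, mutated: list[int], base: str, offset: int) -> dict[str, float]:
    """Neighbor counts at mutated sites divided by neighbor counts at all `base` sites."""
    all_sites = [i for i, b in enumerate(seq) if b == base]
    observed = neighbor_counts(seq, mutated, offset)
    background = neighbor_counts(seq, all_sites, offset)
    return {nb: observed[nb] / background[nb] for nb in background}

def opposite_strand_ranking(ranking: list[str]) -> list[str]:
    """Neighbor ranking on one side of one strand -> ranking on the other side of the other strand."""
    return [COMPLEMENT[b] for b in ranking]

# Hypothetical plus-strand fragment and hypothetical mutated U positions.
fragment = "AUGUCUAUUU"
mutated_sites = [5, 7]
prefs = normalized_preference(fragment, mutated_sites, base="U", offset=+1)

print(prefs)
print(opposite_strand_ranking(["A", "U", "G", "C"]))
```

the normalization matters in the toy example: u is the neighbor of one mutated site but also of two u sites overall, so its corrected preference is lower than that of a.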
this strongly suggests a common biochemical process underlying both u-to-c and a-to-g mutations, the type of change observed depending on whether hyper-mutation occurred in the minus or plus rna strand, respectively. based on the above data, the per-base probability of a u-to-c substitution due to hyper-mutation was ( . ± . ) × − , a value within the typical range of rna virus rates of spontaneous mutation (lauring et al., ; sanjuán et al., ).
table . hyper-mutant (hm) sequences found in stool samples after rt-pcr of two vp regions. a: reads with an average base quality (sanger-scaled phred) score q > at mutated positions. b: rt-pcr failed.
fig. . distribution of u-to-c mutations along a vp region in sequences derived from transfected hek cells. the alignment on top shows two examples of highly mutated reads from each transfection assay. the heat map below indicates, for each nucleotide site, the total number of deep-sequencing reads carrying a u-to-c mutation (see color legend).
to ascertain the contribution of hyper-mutation events to the total nov mutation rate, we sought to estimate the total mutation rate from the above single-cycle transfection assays. since the illumina per-read accuracy is not high enough to reliably infer individual base substitutions at such low frequencies, we performed classical molecular cloning followed by sanger sequencing. using rna extracts from the above transfections, we amplified by high-fidelity rt-pcr the entire vp gene and obtained molecular clones. in total, we found base substitutions in , bases, giving a mutation rate estimate of . × − per nucleotide per cell infection (table ), a value nearly identical to the hyper-mutation rate inferred by illumina sequencing. furthermore, of the mutations were u-to-c base transitions found in a single, hyper-mutated clone. removing this single clone, the estimated mutation rate was . × − , a value seven times lower than the estimated hyper-mutation rate. 
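the rate calculation amounts to dividing the number of observed substitutions by the number of bases inspected. a minimal sketch with invented numbers follows (the clone length and per-clone counts are placeholders, not the study's data, and recomputing the denominator after removing a clone is an assumption of this sketch):

```python
def mutation_rate(n_mutations: int, n_sequences: int, length: int) -> float:
    """Substitutions per nucleotide: mutations / (sequences x bases per sequence)."""
    return n_mutations / (n_sequences * length)

# Hypothetical data: substitutions found in each sequenced clone.
clone_length = 1000
mutations_per_clone = [0, 1, 0, 0, 12, 0, 1, 0, 0, 0]

rate_all = mutation_rate(sum(mutations_per_clone),
                         len(mutations_per_clone), clone_length)

# Same estimate after setting aside clones with five or more mutations.
kept = [m for m in mutations_per_clone if m < 5]
rate_without_hyper = mutation_rate(sum(kept), len(kept), clone_length)

print(f"all clones: {rate_all:.2e} per nucleotide")
print(f"without hyper-mutated clones: {rate_without_hyper:.2e} per nucleotide")
```

with these placeholder counts a single heavily mutated clone accounts for most of the estimate, which mirrors the reasoning in the text about the weight of hyper-mutation in the total rate.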
our results reveal that a large fraction of nov spontaneous mutations consists of u-to-c and a-to-g substitutions occurring as bouts of mutations in the same rna molecule. we argue that, depending on whether the hyper-mutation takes place in the minus or plus strand, u-to-c or a-to-g changes are observed, respectively, in the (plus strand) genomic rna. a likely mechanism underlying these a-to-g mutations is adar, which edits adenosines to inosines that subsequently base-pair with cytosines (samuel, ; valente and nishikura, ). a hallmark of adar and is that editing is more likely when the ′ neighbor of the editable base is a or u and, more precisely, the neighbor base preferences have been shown to be u > a > c > g (dawson et al., ; kuttan and bass, ; lehmann and bass, ; polson and bass, ). our sequence analysis shows these same preferences, thus supporting the involvement of adar in nov hyper-mutation. previous work has shown or suggested adar-mediated hyper-mutation in several viruses, but these were negative-strand viruses as opposed to novs (samuel, ). hyper-mutation should be carried out by the interferon-inducible p isoform of adar , since this is the only adar form located in the cytoplasm (george et al., ) where novs replicate. adar uses double-stranded rna as a substrate and, therefore, the template rna has to adopt a nearly perfect stem-like secondary structure or be a double-stranded replicative intermediate. the secondary structure of the nov genomic rna has not been solved experimentally and, although in silico rna folding shows limited reliability for long molecules, stem-like structures are simple enough to be confidently predicted. however, the minimum free energy structure of the -base region encompassing vp nucleotides to predicted by the mfold algorithm (zuker, ) did not show a stem-like structure. this suggests that adar acts on nov double-stranded replicative intermediates. 
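the strand argument can be made concrete with a small sketch. adar deaminates adenosine to inosine, which pairs with cytosine during copying, so an edited a is read as g on the edited strand and appears as a u-to-c change on the complementary strand. the sequences, the edited positions and the function names below are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (5'->3' in, 5'->3' out)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def edit_a_to_g(rna: str, positions: set[int]) -> str:
    """Model A-to-I editing as A-to-G at the given positions (inosine is read as G)."""
    for i in positions:
        if rna[i] != "A":
            raise ValueError(f"position {i} is not an adenosine")
    return "".join("G" if i in positions else b for i, b in enumerate(rna))

def substitutions(ref: str, seq: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, observed base) for every mismatch."""
    return [(i, r, s) for i, (r, s) in enumerate(zip(ref, seq)) if r != s]

plus = "GAUUCAUG"                            # hypothetical plus-strand genome
minus = reverse_complement(plus)             # minus-strand copy
edited_minus = edit_a_to_g(minus, {1, 4})    # editing on the minus strand
plus_from_edited = reverse_complement(edited_minus)

# Changes expressed in plus-strand (reference) coordinates.
changes = substitutions(plus, plus_from_edited)
print(changes)
```

editing the plus strand directly with the same function would instead yield a-to-g changes in reference coordinates, which is the second class of hyper-mutants discussed in the text.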
adar is ubiquitously expressed in human tissues (kim et al., ) and, although hek cells express relatively low adar levels, this activity was shown to be sufficient to edit % of hepatitis delta virus rna molecules (sato et al., ). in b lymphocytes, which are candidate cell targets for novs in vivo (jones et al., ), adar and adar are more strongly expressed and have been shown to edit thousands of adenosines in cellular mrna and long non-coding rna. a limitation of our study is that, whereas transfection assays were carried out using a cdna clone belonging to genogroup i, viruses isolated from stool samples were all from genogroup ii. however, the type of mutations produced and the neighbor base preferences were very similar in stool samples and in transfection assays. specifically, . % of the ′ neighbors of u-to-c mutations and . % of the ′ neighbors of a-to-g mutations were a or u in clinical samples. after correcting for base composition, the resulting ′ neighbor preferences for u-to-c mutations were a > u > g > c, whereas the ′ preferences for a-to-g mutations were u > a > g > c in clinical samples. the similarities between the results obtained in transfection assays and in vivo support a common underlying mechanism, despite the fact that different genogroups were used for these experiments. still, hyper-mutation was -fold more abundant in the transfection assays than in clinical samples. we attribute this difference to the fact that selection was mild or absent in transfection assays, whereas in stool samples (which should contain mainly free virions) we expect stronger selection against hyper-mutated genomes. alternatively, it is possible that adar activity was lower in the nov target cells in vivo than in hek cells. however, b cells show extensive adar-mediated editing of cellular rnas. work with hiv- has shown that the observed levels of hyper-mutation vary depending on whether intra-cellular or virion-associated sequences are analyzed (russell et al., ). 
apobec massively edits the retroviral cdna, leading to a rate of g-to-a mutation of approximately × − per base per cell, a value two orders of magnitude higher than hiv- reverse transcriptase errors (cuevas et al., ). in contrast, the rate observed in plasma is times lower, consistent with the notion that the vast majority of apobec-edited hiv- genomes is unviable and rapidly removed by selection (ho et al., ).
the strong deviation between observed and expected counts shows that sequence reads carrying multiple mutations were more frequent than expected from the poisson model. based on this, hyper-mutated reads were defined as those carrying five or more mutations. b. reproducibility of u-to-c mutation frequency in three transfection assays. in the graphs, each data point corresponds to a u-containing nucleotide site, and the number of times a u-to-c mutation was observed in deep-sequencing reads is plotted for each pair of transfection assays (also represented in the heat map of fig. ). from left to right, spearman correlations were . , . , and . (p < − in all cases). c. neighbor base preferences for u-to-c and a-to-g hyper-mutation. the histograms show the frequency of u, a, g, and c among ′ neighbors of u-to-c mutations (left), and the frequency of u, a, g, and c among ′ neighbors of a-to-g mutations (right). the crossed lines indicate these same frequencies among non-mutated bases (null expectation). the error bars indicate the sem frequency from three transfection assays.
in human hepatitis b virus, papilloma virus, herpes simplex virus and epstein-barr virus, apobec-edited genomes are usually found at low frequencies and their identification required a modified pcr protocol in which the lower melting temperature of a/t-rich molecules is exploited for selective amplification of hyper-mutants (suspène et al., ; suspène et al., ; vartanian et al., ). 
a variant of this strategy has been devised for adar-edited sequences, which allowed the detection of hyper-mutants in rift valley fever virus (suspène et al., ). therefore, probably with the exception of hiv- , hyper-mutated sequences are generally rare. selective pcr amplification is valuable for detecting these sequences but does not allow an estimation of their population frequency. as a result, few studies have determined the abundance of viral hyper-mutants in an unbiased manner. ultra-deep sequencing provides a powerful tool for achieving this goal. previous work has suggested that the high genetic diversity of rna viruses originates mainly from the low replication fidelity of their polymerases. however, our in-depth analysis of nov spontaneous mutations in clinical samples and laboratory populations supports the notion that host-driven hyper-mutation is a source of diversity comparable to or even greater than polymerase infidelity. hyper-mutation is not necessarily an all-or-nothing process and the number of nucleotide substitutions per sequence varied extensively, suggesting that hyper-mutation may significantly contribute to nov genetic diversity and evolution in nature. analysis of the types of mutations produced in longitudinal studies may help elucidate this contribution.
a: reads with an average base quality (sanger-scaled phred) score q > at mutated positions. b: mutation rates were estimated by dividing, for a given assay, the total number of mutations by the product of reads times the length of the pcr product ( bases).
replication and packaging of norwalk virus rna in cultured mammalian cells molecular epidemiology and genotype distribution of noroviruses in children in thailand from to : a multi-site study mechanisms of gii. 
norovirus evolution blast+: architecture and applications quasispecies dynamics and molecular evolution of human norovirus capsid p region during chronic infection biased hypermutation and other genetic changes in defective measles viruses in human brain infections deep sequencing of norovirus genomes defines evolutionary patterns in an urban tropical setting extremely high mutation rate of hiv- in vivo structure and sequence determinants required for the rna editing of adar substrates norovirus immunity and the great escape multiple apobec restriction factors for hiv- and one vif to rule them all mutation in a lordsdale norovirus epidemic strain as a potential indicator of transmission routes viral shape-shifting: norovirus evasion of the human immune system adenosine deaminases acting on rna, rna editing, and interferon action replication-competent noninduced proviruses in the latent reservoir increase barrier to hiv- cure sequence analysis of the capsid gene during a genotype ii. dominated norovirus season in one university hospital: identification of possible transmission routes enteric bacteria promote human and mouse norovirus infection of b cells updating benchtop sequencing performance comparison plasmid-based human norovirus reverse genetics system produces reporter-tagged progeny virus containing infectious genomic rna molecular cloning of cdna for double-stranded rna adenosine deaminase, a candidate enzyme for nuclear rna editing molecular evolution of the capsid gene in norovirus genogroup i using varscan for germline variant calling and somatic mutation detection mechanistic insights into editing-site specificity of adars the role of mutational robustness in rna virus evolution double-stranded rna adenosine deaminases adar and adar have overlapping specificities the sequence alignment/map format and samtools mechanisms of gii. norovirus persistence in human populations cutadapt removes adapter sequences from high-throughput sequencing reads. 
a model for the generation of multiple a to g transitions in the human respiratory syncytial virus genome: predicted rna secondary structures as substrates for adenosine deaminases that act on rna pandaseq: paired-end assembler for illumina sequences aid and apobecs span the gap between innate and adaptive immunity numerous transitions in human parainfluenza virus rna recovered from persistently infected cells evolution of human calicivirus rna in vivo: accumulation of mutations in the protruding p domain of the capsid leads to structural changes and possibly a new phenotype single base substitutions in the capsid region of the norovirus genome during viral shedding in cases of infection in areas where norovirus infection is endemic systematic literature review of role of noroviruses in sporadic gastroenteritis preferential selection of adenosines for modification by double-stranded rna adenosine deaminase apobec g induces a hypermutation gradient: purifying selection at multiple steps during hiv- replication results in levels of g-to-a mutations that are high in dna, intermediate in cellular viral rna, and low in virion rna adenosine deaminases acting on rna (adars) are both antiviral and proviral viral mutation rates host factors and hiv- replication: clinical evidence and potential therapeutic approaches multiple viral infections and genomic divergence among noroviruses during an outbreak of acute gastroenteritis hepatitis delta virus minimal substrates competent for editing by adar and adar quality control and preprocessing of metagenomic datasets coronaviruses as dna wannabes: a new model for the regulation of rna virus replication fidelity extensive editing of both hepatitis b virus dna strands by apobec cytidine deaminases in vitro and in vivo inversing the natural hydrogen bonding rule to selectively amplify gc-rich adar-edited rnas genetic editing of herpes simplex virus and epstein-barr herpesvirus genomes by human apobec cytidine deaminases in culture 
and in vivo nidovirus ribonucleases: structures and functions in viral replication adar gene family and a-to-i rna editing: diverse roles in posttranscriptional gene regulation evidence for editing of human papillomavirus dna by apobec in benign and precancerous lesions rna populations in immunocompromised patients as reservoirs for novel norovirus variants adar regulates rna editing, transcript stability, and gene expression evolution of norovirus a-to-g hypermutation in the genome of lymphocytic choriomeningitis virus mfold web server for nucleic acid folding and hybridization prediction we thank members of the genomics facility of the university of valencia for assistance with illumina sequencing, and dr. silvia torres for laboratory assistance, and dr. mary estes for the norwalk virus infectious cdna clone. this work was supported by grants from the european research council (erc- -stg- -virmut) and the spanish ministerio de economía y competitividad (bfu - ) to r.s. illumina sequence alignments with hyper-mutated reads have been deposited in genbank under the following accessions: ku -ku (stool sample , region ), ku -ku (stool sample , region ), ku -ku (stool sample , region ), ku -ku (stool sample , region ), ku -ku (stool sample , region ), ku -ku (stool sample , region ), ku -ku (transfection ), ku -ku (transfection ), and ku -ku (transfection ). key: cord- -r hgfsxz authors: chakraborty, supriyo; barman, antara; deb, bornali title: japanese encephalitis virus: a multi-epitope loaded peptide vaccine formulation using reverse vaccinology approach date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: r hgfsxz japanese encephalitis (je) is a serious leading health complication emerging expansively that has severely affected the survival rate of human beings. this fatal disease is caused by je virus (jev). the current study was carried out for designing a multi-epitope loaded peptide vaccine to prevent jev. 
based on reverse vaccinology and in silico approaches, octapeptide b-cell and hexapeptide t-cell epitopes belonging to five proteins of jev, viz. e, prm, ns , ns and ns , were determined. the hydrophilicity, antigenicity, immunogenicity and aliphatic amino acid content of the epitopes were estimated. further, the epitopes were analyzed for different physicochemical parameters, e.g. total net charge, amino acid composition and boman index. out of all the epitopes, four t-cell epitopes, namely kradss, krsrrs, skrsrr and kecpde, and one b-cell epitope, pkpcskgd, were found to have potential for raising immunity in humans against the pathogen. based on the outcome of this study, the pharmaceutical industry could initiate efforts to combine the identified epitopes with an adjuvant or carrier protein to develop a multi-epitope-loaded peptide vaccine against jev. the peptide vaccine, being cost-effective, could be administered as a prophylactic measure and in jev-infected individuals to combat the spread of this virus in the human population. however, prior to administration to human beings, the vaccine must pass through several clinical trials. japanese encephalitis (je) is a major emerging infection worldwide; this often fatal disease results in , deaths and , cases of infection per year (solomon, ; tsai, ). the causal agent of the disease is je virus (jev). jev is transmitted by mosquitoes and is classified in the genus flavivirus, family flaviviridae (westaway et al., ). jev infection severely affects the human central nervous system. the transmission cycle of jev occurs between mosquitoes and birds or swine. however, the transmission of the virus to humans usually takes place through infected mosquitoes of the species culex tritaeniorrhynchus (porterfield, ).
jev is found to prevail in many parts of asia, namely india, nepal, sri lanka, china, japan, korea, vietnam, thailand, myanmar, taiwan, siberia, cambodia, bhutan, bangladesh, malaysia and indonesia. the je epidemic has spread from eastern asia to southeast and southern asia (burke and leake, b; oya, ; vaughn and hoke jr, ; endy and nisalak, ; mackenzie et al., ). apart from asia, je has also affected other geographic regions, namely northern australia and the western pacific (paul et al., ; hanna et al., ; hanna et al., ). the first outbreak of je was observed in japan during the s and the first isolate of jev was obtained by culturing the brain cells of an infected individual in (solomon et al., ). although children are the primary targets of je infection, it also causes severe infection in adolescents and adults. in temperate regions of asia, je outbreaks mainly occur in summer, while outbreaks in the tropical and subtropical zones of asia occur throughout the year and the number of je infections increases rapidly during the rainy season (burke and leake, b; halstead and jacobson, ; fischer et al., ). symptoms of jev infection include fever, meningoencephalomyelitis, aseptic meningitis, seizures or poliomyelitis-like paralysis (solomon et al., ; solomon and vaughn, ). death occurs in about - % of je-infected cases and about - % of survivors have persistent abnormalities of the nervous system, e.g. mental disorientation, mental retardation and hemiparesis (solomon et al., ; fischer et al., ; ooi et al., ). there are five different genotypes of jev (uchil and satchidanandam, ; solomon et al., ) and all the strains belong to only one serotype (tsarev et al., ; erra and kantele, ). the jev genome consists of a positive-sense single-stranded rna molecule of size kb (westaway et al., ). the viral rna is translated into one polyprotein, which undergoes proteolytic cleavage within the jev-infected cells and produces proteins, viz.
envelope (e), capsid (c), membrane (m or precursor membrane i.e. prm), ns , ns a, ns b, ns , ns a, ns b and ns proteins (ns: non-structural) (chambers et al., ; marin et al., ; rice, ; zanotto et al., ). the envelope of jev, made up of glycoprotein, has a size of nm. this envelope encloses the nucleocapsid formed by the association of capsid and rna. the e protein helps the virus adhere to and penetrate the host cell, and also takes part in membrane fusion (allison et al., ; kuhn et al., ). the prm protein is present only during the immature stage of the virion. at a later stage of viral infection, prm protein is cleaved into m protein by proteases. as a result, the virion develops into a mature virion. sometimes, prm protein fails to be cleaved into m protein (bray and lai, ). in jev-infected host cells, ns protein is secreted and plays a role in virion maturation (fan and mason, ; rice, ). two other proteins, namely ns and ns , help jev undergo replication (rice et al., ; bartholomeusz and wright, ). various vaccines have been formulated over time for preventing the spread of je. these include inactivated vaccines derived from mouse brain or vero cell culture, and live-attenuated vaccines developed from jev or from other viruses in the form of chimeras (halstead and thomas, ; baig et al., ; hegde and gore, ). all these vaccines have shown effective results in infected individuals. the who recommends using these vaccines in those regions of the world where jev has caused acute infection (organization, ). novel vaccines based on replicons (kofler et al., ), subviral particles (konishi et al., ; konishi et al., ), and jev-pulsed dendritic cells (li et al., ) have been under development and tested only in animal models.
the reverse vaccinology approach is based on the genomics of a pathogen and elucidates the antigenic determinants in the proteins of the pathogen (rinaudo et al., ; tantray et al., ). it follows an in silico approach to screen the entire genome of an infectious pathogen and determine the protein epitopes that could be used as potential molecules for peptide vaccine formulation and development. in comparison to the traditional technique of vaccine design, which requires laborious and expensive wet lab experiments, the reverse vaccinology approach relies mostly on genomic information and involves much lower cost and time. reverse vaccinology provides an enormous advantage in designing peptide vaccines against highly infectious pathogens, including those that cannot be cultured in the lab because of the high risk posed to researchers and health professionals. antigenic determinants or epitopes are the regions of proteins that interact with the receptors on t-cells or with antibodies produced by b-cells and generate a significant immune response in the host. various bioinformatic tools have been developed to identify potential epitopes for peptide vaccine formulation (weiss and littman, ). peptide vaccines are not only safer, less expensive and easier to manufacture but also require less time to manufacture than traditional vaccines (von hoff, ; tang et al., ). unlike traditional vaccines, peptide vaccines do not cause autoimmune disorders or skin irritations (skwarczynski and toth, ). moreover, newly discovered epitopes need to be well characterized for their biological functions. development of immunity in a host against a pathogen or antigen stems from the presence of t- and b-lymphocytes. t-lymphocytes recognize the antigen only when it is displayed by a class of proteins, called major histocompatibility complex (mhc) molecules, on the surface of antigen-presenting cells (apcs).
the t-cell receptor binds to the antigen and causes its destruction (lafuente and reche, ). on the other hand, the destruction of antigen by b-cells takes place via the secretion of antibodies that interact with the antigen. b-cells are also capable of providing immunity during future exposure to the antigen, as some b-cells differentiate into memory cells in the host. the current study was performed to predict the potential t-cell and b-cell epitopes of five jev proteins, namely e, prm, ns , ns and ns , for designing a multi-epitope loaded peptide vaccine. this study allows the identification of unique peptides as vaccine candidates that might be much cheaper than the existing vaccine formulations against jev. it has been suggested that e, prm and ns proteins are capable of generating an immunological response in the host body (monath et al., ). another study reported that ns and ns proteins can induce an immune reaction in the host body (turtle et al., ). the predicted epitopes identified in the five proteins selected in this study could be promising for formulating a peptide vaccine against jev and hence could help prevent the spread of jev. complete sequences of the proteins, viz. e, prm, ns , ns and ns of jev, were collected from the protein database of the national center for biotechnology information (ncbi) (http://www.ncbi.nlm.nih.gov). the respective accession numbers of the proteins were np_ . , np_ . , np_ . , np_ . and np_ . . the linear hexamer t-cell epitopes in the five selected proteins of jev were identified using the algorithm established by hopp and woods on the basis of wet lab experiments (hopp and woods, ). further, the set of linear octamer b-cell epitopes in the five selected jev proteins was determined using the algorithm outlined by kolaskar and tongaonkar (kolaskar and tongaonkar, ). hydrophilicity score is a measure that describes the hydrophilicity of an epitope or a protein.
it is the mean of the hydrophilicity values of all the amino acids contained in an epitope or a protein. it is useful in determining the structure of the protein (hopp and woods, ). hydropathy index, a measure of the hydrophobicity of a given epitope or protein, was determined as the average of the hydrophobicity values of all amino acids present in the epitope or protein (kyte and doolittle, ). the hydropathy index values of all epitopes were estimated using the abdesigner algorithm (pisitkun et al., ). beta-turn regions within a protein have a significant role in generating immunogenicity. the chou-fasman algorithm was used to determine the secondary structure properties of each peptide (chou and fasman, ). the immunogenicity score reports the immunogenic nature of a given epitope or protein. the immunogenicity scores of all the epitopes were estimated on the basis of the hydropathy index and chou-fasman conformation parameters of each epitope. further, hydrophilicity, antigenicity and immunogenicity scores as well as the number of aliphatic amino acids in each epitope were estimated using a computer program written in perl by sc (corresponding author). the physico-chemical parameters of the epitopes, such as net charge and amino acid composition, were determined with the protparam tool of expasy (https://web.expasy.org/protparam/). further, the cleavage sites in the epitopes were identified using the online tool peptide cutter (https://web.expasy.org/peptide_cutter/) (shown in supplementary tables s and s ). the boman index, i.e. the protein-binding potential of each epitope, was determined using the online tool apd (http://aps.unmc.edu/ap/prediction/prediction_main.php). epitopes with boman index values higher than . kcal/mol are considered capable of attaching to mhc molecules or antibodies. with the help of bioinformatic tools, five hexapeptide t-cell epitopes and five octapeptide b-cell epitopes were identified in each of the five proteins, namely e, prm, ns , ns and ns of jev.
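the hydrophilicity score and hydropathy index described above are both per-residue averages, so they can be sketched in a few lines of python. the block below is an illustrative sketch and not the authors' perl program: the function names `mean_score` and `top_windows` are hypothetical, the two dictionaries hold the published hopp-woods and kyte-doolittle residue values, and ranking sliding windows by their mean score is an assumption about how candidate hexapeptides are picked.

```python
# Hopp-Woods hydrophilicity value for each amino acid (one-letter code).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}

# Kyte-Doolittle hydropathy value for each amino acid (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def mean_score(peptide, scale):
    """Average the per-residue values of `scale` over the peptide."""
    peptide = peptide.upper()
    return sum(scale[aa] for aa in peptide) / len(peptide)


def top_windows(sequence, scale=HOPP_WOODS, width=6, n=5):
    """Score every window of `width` residues and return the n best.

    Each result is (mean score, 1-based start position, window sequence).
    Ties are broken by position (hypothetical choice for this sketch).
    """
    sequence = sequence.upper()
    windows = [
        (mean_score(sequence[i:i + width], scale), i + 1, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    ]
    windows.sort(key=lambda w: (-w[0], w[1]))
    return windows[:n]
```

for example, `mean_score("KRSRRS", HOPP_WOODS)` averages the six residue values of that hexapeptide, and `top_windows("MKRSRRSVLA", n=1)` returns the best-scoring hexamer window of a made-up ten-residue sequence. setting `width=8` would give the octapeptide case, although the text uses a different method (kolaskar and tongaonkar) for the b-cell epitopes.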
hexapeptide t-cell epitopes are expected to raise a strong immune response in the host and could thereby help combat the spread of jev (hopp and woods, ). the epitope positions, hydrophilicity scores and total net charges of all t-cell epitopes were determined (table a). out of the five epitopes in e protein, the epitope ekrads (epitope position - ) had the greatest hydrophilicity score ( . ) but no net charge. two epitopes located within prm protein, namely krsrrs (epitope position - ) and skrsrr (epitope position - ), exhibited the greatest hydrophilicity value ( . ) with a positive net charge (+ ). similarly, within ns protein, two epitopes with the maximum hydrophilicity score of . were identified as eestde ( - ) and reestd ( - ), carrying net charges of − and − , respectively. the epitope rgeekk ( - ) of ns protein had the highest hydrophilicity value ( ) along with a positive net charge (+ ). in ns protein, the epitope deeren ( - ) was recorded with a high hydrophilicity value ( . ) associated with a negative net charge (− ). high hydrophilicity scores of t-cell epitopes usually mark the hydrophilic sections of proteins, and these regions tend to remain exposed on the protein surface. these regions can bind easily to mhc molecules, and the complexes so formed would be displayed by apcs on their surfaces. t-cells can easily recognize these cells and destroy them. the epitope positions, hydrophilicity scores and total net charge of each octapeptide b-cell epitope were evaluated (table b). in e protein, the epitope sgsdgpck ( - ) had the maximal hydrophilicity value ( . ) but no net charge. in the case of prm protein, two epitopes, namely skrsrrsv ( - ) and krsrrsvs ( - ), were identified with the highest hydrophilicity score of . and a net charge of + each. in ns protein, the epitope tkecpdeh ( - ) was recorded with the highest hydrophilicity value ( .
) and a negative net charge (− ). in ns protein, the epitope pkpcskgd ( - ) was found to possess a high hydrophilicity value ( . ) with a positive net charge (+ ). in ns protein, the epitope pyvgkred ( - ) was found to possess the highest hydrophilicity score of . without any net charge. high hydrophilicity scores of b-cell epitopes imply that these epitopes lie in the hydrophilic regions of the respective proteins and can interact with the antibodies produced by b-cells. this interaction might lead to the generation of an immunological response in the host against jev. all t-cell epitopes obtained from the five proteins were evaluated for antigenicity (fig. a). antigenicity values of the t-cell epitopes were found to range from . (deeren) to . (kecpde). likewise, all b-cell epitopes were analyzed for antigenicity score (fig. b), and the antigenicity scores of b-cell epitopes varied from . (skrsrrsv, krsrrsvs, rctrtrhs and cgrggwsy) to . (pspkpcsk). high antigenicity values indicate the highly antigenic nature of the epitopes. notably, a high antigenicity score is treated as a salient feature of an epitope for the formulation of a peptide vaccine. like antigenicity, all the hexapeptide t-cell epitopes were evaluated for their immunogenicity scores (table a). in e protein, the epitope nardrs was found to possess the highest immunogenicity score ( . ) whereas in prm protein the epitope gndped recorded the maximum immunogenicity score ( . ). in ns protein the epitope khnrre was identified with the highest immunogenicity score ( . ) but in ns protein the epitope drqeep was found to possess the maximum immunogenicity score of . . in ns protein two epitopes, viz. deeren and rekrpr, each possessed the highest immunogenicity score ( . ).
table a: t-cell epitopes from five proteins of jev representing their hydrophilicity scores and total net charges (hopp & woods approach, ).
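the total net charges reported alongside the hydrophilicity scores can be approximated by counting charged residues. the sketch below is an assumption about how such totals arise, not the protparam implementation: it counts arg and lys as +1 and asp and glu as -1, and it ignores histidine and the peptide termini. the name `net_charge` is hypothetical.

```python
# Residues counted as positively / negatively charged at neutral pH.
# Histidine and the free termini are ignored in this simplified sketch.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(peptide):
    """Return (number of Arg + Lys) minus (number of Asp + Glu)."""
    peptide = peptide.upper()
    positive = sum(aa in POSITIVE for aa in peptide)
    negative = sum(aa in NEGATIVE for aa in peptide)
    return positive - negative


if __name__ == "__main__":
    # Epitopes named in the text; the sign of each value can be compared
    # with the description given there.
    for epitope in ["EKRADS", "KRSRRS", "EESTDE", "TKECPDEH", "PKPCSKGD"]:
        print(epitope, net_charge(epitope))
```

under this rule `net_charge("EKRADS")` is zero, which agrees with the text's statement that this epitope carries no net charge, while `net_charge("TKECPDEH")` is negative and `net_charge("PKPCSKGD")` is positive, matching the signs described for those b-cell epitopes.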
aliphatic amino acids play an important role in binding affinity. it is believed that epitopes comprising the aliphatic amino acids, viz. ala, gly, ile, lys, leu, met and val, can interact easily with either of the two lymphocytes. in the present study, the t-cell epitopes were examined to calculate their number of aliphatic amino acids (table a). one epitope, gkrekk of ns protein, consisted of aliphatic amino acids. four epitopes, namely grgdkq of e protein, rgeekk and geekkn of ns protein, and krekkp of ns protein, had aliphatic amino acids each. six epitopes were identified to consist of aliphatic amino acids, namely ekrads, nekrad and kradss of e protein; skgenr of prm protein; edcgkr of ns protein; and gperek of ns protein. ten epitopes, i.e. nardrs of e protein; krsrrs, skrsrr, dpedvd and gndped of prm protein; khnrre and kecpde of ns protein; gdrqee of ns protein; and rekrpr and rrarre of ns protein, contained a single aliphatic amino acid. however, four epitopes were identified with no aliphatic amino acid, and these were eestde and reestd of ns protein, drqeep of ns protein, and deeren of ns protein. for all octapeptide b-cell epitopes, the immunogenicity score as well as the number of aliphatic amino acids contained in each b-cell epitope were determined (table b ). in e protein the epitope ysgsdgpc was identified to possess the highest immunogenicity score of . whereas in prm protein the epitope ndpedvdc was recorded with the maximal immunogenicity score ( . ). in ns protein the b-cell epitope tkecpdeh was estimated to have the maximum immunogenicity score of . . further, in ns protein the epitope ypkckngd recorded the highest immunogenicity score ( . ) whereas in ns protein the epitope hkdpehpy recorded the maximum immunogenicity score of . .
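the grouping of epitopes by their number of aliphatic residues can be reproduced with a simple count. the sketch below is illustrative only: the residue set is copied from the list given in the text (ala, gly, ile, lys, leu, met and val) rather than from a textbook definition, and the function names `aliphatic_count` and `group_by_aliphatic` are hypothetical.

```python
# One-letter codes for the residues the text treats as aliphatic:
# Ala, Gly, Ile, Lys, Leu, Met and Val.
ALIPHATIC = set("AGIKLMV")


def aliphatic_count(peptide):
    """Count residues of the peptide that belong to ALIPHATIC."""
    return sum(aa in ALIPHATIC for aa in peptide.upper())


def group_by_aliphatic(epitopes):
    """Map each aliphatic count to the list of epitopes having it."""
    groups = {}
    for epitope in epitopes:
        groups.setdefault(aliphatic_count(epitope), []).append(epitope.upper())
    return groups


if __name__ == "__main__":
    # A few t-cell epitopes named in the text.
    sample = ["GKREKK", "RGEEKK", "EKRADS", "KRSRRS", "EESTDE"]
    for count, members in sorted(group_by_aliphatic(sample).items(), reverse=True):
        print(count, members)
```

with this residue set, `aliphatic_count("KRSRRS")` is one and `aliphatic_count("EESTDE")` is zero, in line with the text's description of those epitopes as containing a single aliphatic amino acid and none, respectively.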
nine epitopes, namely sgsdgpck and lsysgsdg of e protein; kpvgryrs of ns protein; ypkckngd, pkpcskgd and eypkckng of ns protein; and gkenyvdy, pyvgkred and cgrggwsy of ns protein, were identified to contain aliphatic amino acids. seven epitopes were found to possess aliphatic amino acids, i.e. ysgsdgpc and elsysgsd of e protein, skrsrrsv and krsrrsvs of prm protein, drykylpe and kylpetpr of ns protein and pspkpcsk of ns protein. seven epitopes, i.e. eppfgdsy of e protein, ndpedvdc of prm protein, tkecpdeh and ecpdehra of ns protein, dtpspkpc of ns protein, and hkdpehpy and fykpseps of ns protein, contained a single aliphatic amino acid. moreover, two epitopes of prm protein, namely dcwcdnqe and rctrtrhs, contained no aliphatic amino acid. it is believed that epitopes with a high immunogenicity score are highly immunogenic, and the immunogenicity score of a peptide epitope is a determining factor in choosing epitopes for peptide vaccine formulation. boman index values of all hexapeptide t-cell epitopes present in the five jev proteins were estimated (fig. a). we used a boman value of . kcal/mol as the minimum cut-off value. notably, all the t-cell epitopes in our study possessed boman values > . kcal/mol and were considered to have potential for binding successfully to mhc molecules. from these results we concluded that the complexes would be presented to t-cells by apcs and would elicit an immune response in the host. further, we noted that the boman index of t-cell epitopes ranged from . (kecpde) to . (rrarre). similarly, we estimated the boman index of all the b-cell epitopes (fig. b). eight b-cell epitopes, viz. sgsdgpck, ysgsdgpc, eppfgdsy and lsysgsdg of e protein; pspkpcsk and dtpspkpc of ns protein; and fykpseps and cgrggwsy of ns protein, were found to have a boman index < . kcal/mol, which indicated they may not attach to immunoglobulins secreted by b-cells. the remaining epitopes possessed boman index > .
kcal/mol and hence, these epitopes could bind to immunoglobulins effectively. the t-cell epitopes of jev proteins were analyzed for their amino acid composition (fig. a). different amino acids were found in the epitopes. interestingly, some epitopes of e, prm, ns and ns proteins comprised amino acids such as lys, asp, gly, ser, glu and arg constituting nearly . % of the total composition. on the other hand, some epitopes of prm, ns and ns proteins contained arg, asp, glu and lys to the extent of . %. only one epitope, rrarre of ns protein, was found to contain arg constituting . % of the total epitope composition. likewise, the b-cell epitopes of jev proteins were analyzed for amino acid composition (fig. b). all the epitopes had varying amino acid compositions. some b-cell epitopes contained amino acids, namely tyr, his, asp, ser, gly, lys, thr, cys, glu, pro and arg, constituting nearly . % of the total composition. further, some other epitopes of e, prm, ns and ns proteins were found to possess ser, gly, arg, pro and asp constituting . % of the total epitope composition. further, the t-cell and b-cell epitopes of jev are amenable to cleavage by various enzymes and chemicals, and the cleavage results are presented in supplementary tables (s , s ). in this study, potential epitopes for peptide vaccine formulation were identified in five proteins of jev, namely e, prm, ns , ns and ns . all the epitopes were analyzed for various parameters using different bioinformatic tools. only a few epitopes were found to possess the properties essential for generating an immune response in the host against jev, and the selected epitopes might be used in formulating an effective peptide vaccine to combat the menace of jev. earlier, several studies were carried out by other researchers to design peptide vaccines based on the genome-derived or reverse vaccinology approach. in a study conducted by wei et al.
( ), epitopes in the e protein of jev with potential for peptide vaccine development were identified. two t-cell epitopes located at amino acid positions - and - , and six b-cell epitopes located at amino acid positions - , - , - , - , - , and - , were identified (wei et al., ). chikungunya infection, a viral disease spread by aedes mosquitoes, is caused by chikungunya virus (chikv). kori et al. ( ) identified epitopes from the proteins of three chikv strains with potential for peptide vaccine formulation. the t-cell epitopes identified by them included daekeaeeereaelt, aeeereael and kkkpgrrermcmkie, whereas the b-cell epitopes were found to be qvlkaknigl and sskydlecaq (kori et al., ). zika virus (zikv) causes a serious infectious disease that results in malformed babies and reduced survival in humans. yadav et al. ( ) identified epitopes in the envelope glycoprotein sequence of zikv that could be employed for creating a peptide vaccine to prevent zikv infection. the t-cell epitope yrimlsvhg showed potential for developing immunity (yadav et al., ). nipah virus (niv) causes respiratory infection and encephalitis in humans, and transmission usually occurs through pigs or bats. in order to design peptide vaccines against niv disease, two potential epitopes from niv proteins, i.e. glycoprotein (g) and fusion (f) protein, were detected with the help of an immunoinformatics approach. it was reported that either the epitope ewisivpnfilvrnt from g protein or the epitope gpkvslidtsstiti from f protein could be used for vaccine development (sakib et al., ). high-risk human papillomaviruses (hrhpvs) are known to cause viral infections as well as cervical cancers in humans. in one study, probable epitopes were identified from e protein of hrhpvs, and these epitopes were reported to be suitable for preparing a successful peptide vaccine against hrhpvs.
based on an in silico approach, it was suggested that the t-cell epitopes fafrdlcivyr and rrevydfaf or their mixture could be considered for vaccine formulation and development. moreover, another t-cell epitope located at amino acid positions - was also identified as a potential peptide vaccine candidate (khan et al., ).
table a: immunogenicity (ig) and number of aliphatic amino acids in t-cell epitopes of jev (hopp & woods approach, ).
table b: immunogenicity (ig) and number of aliphatic amino acids in b-cell epitopes of jev (kolaskar & tongaonkar approach, ).
human coronavirus (hcov) has been reported to cause pneumonia as well as some diseases of the respiratory and gastrointestinal tracts in humans. for developing peptide vaccines against this virus, oany et al. ( ) identified epitopes in the spike protein that were suggested to boost the immunological response against hcov within the human body. from the study, it was evident that the b-cell epitope positioned at amino acids - and the t-cell epitope ksstgfvyf were suitable for designing a peptide vaccine (oany et al., ). ebola virus disease (evd) is considered to be one of the deadliest viral diseases affecting human beings. epitope-based peptide vaccine formulation against evd using the reverse vaccinology approach was reported, wherein the potential t-cell epitopes, viz. rrtrre, rrkrrd, ktgkkg and dedded, and the potential b-cell epitopes, viz. hlglddq, pdyddch, qpkcnpn, dqekkil, shyeppn, dyddchs, ptsppqd and eytypds, were identified from the viral surface proteins (chakraborty, ). the present study suggested that a multi-epitope-based peptide vaccine against jev could be developed by combining the promising b-cell and t-cell epitopes found in e, prm, ns , ns and ns proteins. in order to design a peptide vaccine, the first criterion is to select the epitopes possessing high antigenicity scores. out of t-cell epitopes examined in five proteins of jev, eight epitopes were identified to possess high antigenicity scores.
these epitopes were ekrads, krsrrs, skrsrr, dpedvd, edcgkr, kecpde and krekkp. the epitope ekrads possessed a low immunogenicity score and so it was excluded from further analysis. the remaining seven epitopes possessed high immunogenicity and hydrophilicity values in addition to the desired boman index value. on the basis of total net charges, three epitopes, i.e. dpedvd, edcgkr and kecpde, were excluded from the study as these might not bind with the virus. we concluded that four t-cell epitopes [...]. among the octapeptide b-cell epitopes, seven epitopes, namely drykylpe, kpvgryrs, pkpcskgd, pspkpcsk, dtpspkpc, hkdpehpy and fykpseps, were estimated to possess high antigenicity scores. only one epitope, pkpcskgd, was identified with high immunogenicity, hydrophilicity and boman index values. this b-cell epitope could be used in formulating the peptide vaccine. pharmaceutical companies could initiate efforts to synthesize a multi-epitope-loaded peptide vaccine by combining the promising four t-cell epitopes and one b-cell epitope in varying proportions with a strong adjuvant or carrier protein. the peptide vaccine could be administered to jev-affected persons following proper clinical trials to ensure its safety and efficacy, and might prevent the spread of jev by raising immunity in humans. in view of the endless struggle between humans and jev in an evolutionary context, it is advisable to construct different sets of peptide vaccines comprising the epitopes in varied combinations. to build up continuing resistance against jev through vaccination, a temporal and spatial deployment strategy (over time and space) of vaccine administration could be followed using different peptide vaccine sets in the countries frequently affected by jev. supplementary data to this article can be found online at https://doi.org/ . /j.meegid. . . the authors declare no conflict of interest in the manuscript.
the authors are grateful to assam university, silchar, assam, india for providing necessary facilities to carry out this research work. further, the research work is dedicated to those great souls who died of jev due to lack of proper medication and affordable medical facility.
key: cord- -qfim nu authors: oem, jae-ku; lee, soo-young; kim, young-sik; na, eun-jee; choi, kyoung-seong title: genetic characteristics and analysis of a novel rotavirus g p[ ] identified in diarrheic feces of korean rabbit date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: qfim nu group a rotaviruses (rvas) are important gastroenteric pathogens that infect humans and animals. 
this study aimed to analyze the complete genome sequence, i.e., genome segments of the lapine rotavirus (lrv) identified in the intestine of a dead rabbit in the republic of korea (rok) and to describe the genetic relationships between this lapine isolate [rva/rabbit-wt/kor/rab / /g p[ ] (rab )] and other lapine isolates/strains. rab possessed the following genotype constellation: g -p[ ]-i -r -c -m -a -n -t -e -h . the p[ ] genotype was found to originate from rabbits and was for the first time identified in the rok. phylogenetic analysis showed that rab possessed vp - and vp genes, which were closely related to those of the bat strain lzhp ; nsp - genes, which were closely related to those of the simian strain rrv; and vp , vp , and nsp genes, which were closely related to the genes obtained from other rabbits. interestingly, a close relationship between rab and simian rva strain rva/simian-tc/usa/rrv/ /g p[ ] for gene segments was observed. rrv is believed to be a reassortant between bovine-like rva strain and canine/feline rva strains. rab and canine/feline rvas shared the genes encoding vp , vp , vp , nsp , and nsp . additionally, the genome segments vp (i ), nsp (n ), and nsp (h ) of rab were closely related to those of bovine rvas. this is the first report describing the complete genome sequence of an lrv detected in the rok. these results indicate that rab could be a result of interspecies transmission, possibly through multiple reassortment events in the strains of various animal species and the subsequent transmission of the virus to a rabbit. additional studies are required to determine the evolutionary source and to identify possible reservoirs of rvas in nature. group a rotaviruses (rvas) are major pathogens associated with acute gastroenteritis in various host species, including birds and mammals, throughout the world (bresee et al., ) . 
rvas belong to the family reoviridae and have genome segments composed of double-stranded rna encoding six structural viral proteins (vp -vp , vp , and vp ) and five or six non-structural proteins (nsp -nsp ) (estes and cohen, ; pesavento et al., ). the infectious rotavirus particle is composed of three concentric layers (the inner, middle, and outer layers). the outer layer is formed by the two capsid proteins vp and vp , which are most frequently used to classify rvas into g (for glycoprotein) and p (for protease-sensitive) genotypes, respectively (abe et al., ; matthijnssens et al., ). this dual classification system has facilitated the comparison of species-specific genotype patterns among various animal species. to date, based on genetic characterization, g and p genotypes have been identified in humans and animals (li et al., ). lapine rotavirus (lrv) strains have been isolated in canada, china, japan, italy, hungary, and the united states (banyai et al., ; bonica et al., ; ciarlet et al., ; guo et al., ; hoshino et al., ; martella et al., ), and those that have been characterized belong to the vp g genotype. the g genotype has been described in various host species, such as humans, rabbits, pigs, birds, bats, cats, dogs, monkeys, horses, mice, cows, and lambs (bonica et al., ). the p[ ] and p[ ] genotypes of vp have mainly been reported in lrvs (banyai et al., ; martella et al., ; martella et al., ). additionally, the p[ ]/[ ] genotype has been identified in pigs (collins et al., ; tonietti et al., ). lrv is recognized as a potential source of human infection (bonica et al., ); however, to date, little is known about the molecular characteristics of rotavirus infection in rabbits in the republic of korea (rok) (matthijnssens et al., ; parreno et al., ; schoondermark-van de ven et al., ). 
the objective of this study was to analyze an lrv isolated from the intestine of a dead rabbit in in the rok by performing a complete genomic sequence analysis of the genome segments and to characterize the phylogenetic relationships between our isolate and other lapine isolates/strains. in , young rabbits died of acute enteritis in a flock of approximately rabbits. a domesticated rabbit that died suddenly was sent to the animal disease diagnostic division, animal and plant quarantine agency, rok, for post-mortem examination, where routine bacterial examination of the intestinal content was also performed. rabbit enteritis-related virological examination was performed, and rotavirus was detected using rt-pcr. the rabbit rotavirus rva/rabbit-wt/kor/rab / /g p[ ] (rab ) was isolated from the intestinal content of the rabbit. viral rna was extracted from fecal suspensions using an rneasy mini kit (qiagen, hilden, germany) according to the manufacturer's instructions. the total rna was eluted in μl of rnase-free water and stored at − °c until use. rt-pcr was performed using × one-step rt-pcr smart mix (solgent, daejeon, korea). the primers used for amplifying vp -vp , vp , vp , and nsp -nsp were as described previously (de leener et al., ; matthijnssens et al., ; zeller et al., ). briefly, rna was denatured at °c for min and quenched on ice. reverse transcription was performed at °c for min, followed by °c for min, and then cycles of amplification [at °c for s, °c for s (vp , vp , vp , and nsp -nsp ), °c for s (vp -vp and nsp ), °c for min (vp -vp and nsp ) and °c for min (vp , vp , and nsp -nsp )], and a final extension step at °c for min (vp -vp and nsp ) and °c for min (vp , vp , and nsp -nsp ). each pcr product was purified using the accupower pcr purification kit (bioneer, daejeon, korea). pcr amplicons were directly cloned into the pgem®-t easy vector (promega, madison, wi, usa), which was then used for direct sequencing (macrogen inc., daejeon, korea). 
the obtained sequence data were analyzed using the basic local alignment search tool of the national center for biotechnology information database. homologous sequences were analyzed using the chromas software (version . , http://www.technelysium.com.au/chromas.html) and aligned using clustalx (version . ). phylogenetic trees were constructed based on the genome segments of lrv using the maximum-likelihood method and a kimura -parameter (kimura, ) substitution model by employing mega software (kumar et al., ). to construct each phylogenetic tree, additional sequences were obtained from genbank (http://www.ncbi.nlm.nih.gov). the complete nucleotide sequences of the genome segments of rab obtained in this study were assigned the accession numbers mk and mk -mk . the intestinal content of the rabbit was tested and found to be positive for rotavirus alone. no other enteric pathogens, such as e. coli and coronavirus, were detected. the nucleotide sequences of the genome segments of rab were completely determined, whereas the vp gene ( bp) of rab was partially sequenced. based on the nucleotide sequence identities, the vp , vp , vp , vp - , and nsp - genome segments of rab were assigned the genotype constellation g -p[ ]-i -r -c -m -a -n -t -e -h , which differs from the constellations observed in previously characterized lrv strains (table ). on comparison of the genotype of rab with that of the italian lapine strain - , chinese lapine strain n , and the dutch lapine strain k , differences were found in five (vp - , vp , nsp , and nsp ), five (vp , vp , nsp , nsp , and nsp ), and eight genes (vp - , vp , nsp , nsp , and nsp ), respectively (table ). interestingly, rab shared eight identical genes with rch (vp - , vp - , nsp - , and nsp ) and rrv (vp - , vp - , and nsp - ). furthermore, rab shared seven genotypes with the canine strain cu- , canine-like human strain hcr a, bat strain lzhp isolated in china, and equine strain e (table ). 
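the gene-by-gene comparison of genotype constellations described above can be sketched in a few lines of python. this is a minimal illustration, not the authors' procedure: the function names and the two example constellations are hypothetical placeholders (the genotype numbers of rab and of the reference strains are not reproduced here), and the only fixed assumption is the standard segment order vp7-vp4-vp6-vp1-vp2-vp3-nsp1-nsp2-nsp3-nsp4-nsp5 behind the gx-p[x]-ix-rx-cx-mx-ax-nx-tx-ex-hx notation.

```python
# Segment order that the Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx notation follows.
GENES = ["VP7", "VP4", "VP6", "VP1", "VP2", "VP3",
         "NSP1", "NSP2", "NSP3", "NSP4", "NSP5"]


def parse_constellation(text):
    """Split a constellation string into a gene -> genotype mapping."""
    parts = text.split("-")
    if len(parts) != len(GENES):
        raise ValueError(f"expected {len(GENES)} genotypes, got {len(parts)}")
    return dict(zip(GENES, parts))


def compare_constellations(a, b):
    """Return (shared_genes, different_genes) for two constellation strings."""
    ca, cb = parse_constellation(a), parse_constellation(b)
    shared = [g for g in GENES if ca[g] == cb[g]]
    different = [g for g in GENES if ca[g] != cb[g]]
    return shared, different


# Hypothetical constellations, for illustration only (not real strain data).
strain_a = "G1-P[1]-I1-R1-C1-M1-A1-N1-T1-E1-H1"
strain_b = "G1-P[2]-I1-R2-C1-M1-A3-N1-T1-E2-H1"

shared, different = compare_constellations(strain_a, strain_b)
print(f"shared genotypes: {len(shared)} ({', '.join(shared)})")
print(f"different genotypes: {len(different)} ({', '.join(different)})")
```

counting shared positions in this way only reproduces the genotype-level tallies; the nucleotide identities reported for each segment still require the aligned sequences themselves.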
the sequences of genome segments of rab were analyzed and the genetic relationships between rab and other known lrvs as well as rvas were compared. the vp genome segment of rab was most closely related to the chinese bat strains, myas ( . %) and lzhp ( . %), and the argentine horse strain e ( . %) but not to the lrvs ( fig. and table ). the vp sequence was compared with the currently known representative rabbit strains. the vp gene of rab exhibited a maximum nucleotide similarity of . % with the lapine strain - from hungary followed by . %, . %, and . % similarity with the - , - , and - italian rabbit strains; phylogenetic analysis revealed that the vp gene from rab clustered in the p[ ] genotype with lrvs ( fig. and table ). the vp gene of rab clustered most closely with the lapine strain - ( . %) as well as with the human lapine-like strains b ( . %) and be ( . %), which were previously demonstrated to have a lapine origin (matthijnssens et al., ) ; this vp gene of either simian, bovine, or human origin belonged to the i genotype (fig. ) . the vp gene of rab was closely related to lzhp ( . %) belonging to the r genotype. the vp gene of rab belonged to the c genotype and included bat, simian, and equine rva strains. rab clustered most closely with the bat strain myas ( . %). the vp gene segment of rab was closely related to rrv ( . %, table ). the nsp , nsp , and nsp genes of rab were closely related to rrv ( . %, . %, and . %) and e ( . %, . %, and . %) and belonged to the a , t , and e genotypes, respectively. the nsp gene segment of rab clustered in the a genotype with rrv ( . %). the nsp and nsp genes of rab showed genetic relatedness to rrv ( . % and . %) and lzhp ( . % and . %), whereas the nsp and nsp genes of the lapine strain - and the human lapine-like strains b and be clustered in the t and e genotypes, respectively. the nsp and nsp genes of rab clustered in the n and h genotypes and were closely related to the strains - ( . % and . %), k (both genes . 
%), b ( . % and . %), and be ( . % and . %), respectively ( fig. and table ). although k originated in a rabbit, it was completely divergent from the n strain and shared three genotypes with rab and six with the - strain (fig. ) . to date, g and p genotypes of rotavirus have been identified (li et al., ) . among them, g has been detected in a broad spectrum of host species, including humans, indicating that g is more likely to undergo interspecies transmission than g , g and g . additionally, g is the only genotype identified in rabbits having vp specificity (banyai et al., ; guo et al., ; martella et al., ) . p [ ] for vp has been found in humans, goats, and rabbits (martella et al., ; matthijnssens et al., ; parreno et al., ) . previous studies have also identified p [ ] in the lapine-like human strains b and be and in the human bovine-like strain rch (bonica et al., ; donato et al., ) . consequently, p[ ] specificity in humans may be attributed to interspecies transmission. however, recently, p[ ] rotaviruses have rarely been reported in humans. moreover, p[ ] has mainly been found in rabbits in hungary and italy (banyai et al., ; martella et al., ) and also identified in porcine (collins et al., ; tonietti et al., ) . this seems to be related to virus evolution. our results revealed that the p[ ] genotype was detected in a rabbit for the first time in the rok, suggesting that p[ ] has greater species specificity than p [ ] . because little lrv sequencing data are available, further studies are necessary to investigate the presence of the p[ ] genotype in other animal species in the rok. rab shared three genes with two lrvs ( - and n ), namely vp (g genotype), vp (m genotype), and nsp (a genotype), whereas the remaining eight genes (vp , vp , vp , vp , nsp , nsp , nsp , and nsp ) were different from each other (table ) . considering the lapine origin, the reason underlying differences in genotypes remains unclear. 
it is therefore presumed that its ancestor/origin is different. interestingly, of the genotypes, rab shares eight identical genotypes with rrv and rch . however, rch showed ≤ % sequence identity for a majority of rab gene segments. rrv, which was isolated from a juvenile rhesus macaque with diarrhea, shared gene segments (vp , vp , nsp - ) and showed ≥ % sequence identity with rab . rrv was most closely related to rab with regard to the vp , nsp , and nsp genes (fig. and table ). additionally, rrv shared eight genotypes with the canine strain cu- , feline strain cat , and canine-like human strain hcr a (mino et al., ; tsugawa and hoshino, ) and three genotypes with the bovine-like rva strain (vp , vp , and nsp ) (matthijnssens et al., ) (table ). rrv may present strong evidence for reassortment among rva strains from different animals via interspecies transmission. rrv appears to have undergone reassortment a long time ago. so far, the infection source and host origin of rab remain unknown. these results indicate that rab is most closely related to the canine/feline ancestor of rrv. our genomic analysis showed that rab was more closely related to strains from other animals, such as simian, bat, and equine strains, than to rabbit strains. this implies that rvas can easily cross between different host species and are able to spread successfully and cause diseases in a new host. several studies have reported that human and animal rvas originate from complex animal-human reassortment or interspecies transmission events, presumably because of the close proximity of humans to livestock and companion animals (banyai et al., a; banyai et al., b; ghosh et al., ; guo et al., ; matthijnssens et al., ; matthijnssens et al., ). a previous phylogenetic analysis revealed that rab showed a high sequence identity for vp ( . %), nsp ( . %), and nsp ( . 
%) genes with the human strain be , representing a rabbit-to-human interspecies transmission that can cause disease in humans; rab was closely related to the bovine strain (bonica et al., ). another lrv strain, k , has a genotype constellation identical to that of a bovine strain, which was isolated from a -month-old boy with gastroenteritis in slovenia (steyer et al., ). interestingly, lapine and lapine-like strains have been derived from a population of previously characterized bovine strains and bovine-like ancestral strains, suggesting a past reassortment event between bovine and lapine strains. rab was closely related to the bat strain lzhp with regard to vp ( . %), vp ( . %), vp ( . %), and nsp ( . %) genes. however, numerous genes of bat strains possessed canine/feline characteristics. bats may be important zoonotic reservoirs that readily transmit animal rvas, facilitating reassortment events. the present results indicate that rab possesses genes derived from various host species; however, it was impossible to accurately determine the host species that transmitted rab . therefore, rab identified in the rok may have originated from other animal strains, instead of rabbits, as a result of multiple reassortment events. in conclusion, the present findings suggest that rab is closely related to the simian strain rrv and not lrvs. g p[ ] identified in this study has newly emerged in rabbits in the rok and may have more species specificity. the results revealed that rab is an interspecies-transmitted virus, which may have developed owing to multiple reassortments among canine, feline, and bovine hosts and subsequent transmission to a rabbit. the results of this study reinforce the important role of animals in the ecology and evolution of rvas as well as highlight the potential of this virus to cross between species. 
additional studies should emphasize the importance of, and the need for, continued surveillance of rvas in animals.
oem, et al. infection, genetics and evolution ( ) -
whole genome characterization of new bovine rotavirus g p[ ] and g p[ ] strains provides evidence for interspecies transmission
identification of the novel lapine rotavirus genotype p[ ] from an outbreak of enteritis in a hungarian rabbitry
genetic diversity and zoonotic potential of human rotavirus strains
genetic heterogeneity in human g p[ ] rotavirus strains detected in hungary suggests independent zoonotic origin
complete genome analysis of a rabbit rotavirus causing gastroenteritis in a human infant
update on rotavirus vaccines
comparative amino acid sequence analysis of the outer capsid protein vp from four lapine rotavirus strains reveals identity with genotype p[ ] human rotaviruses
detection and characterisation of group a rotavirus in asymptomatic piglets in southern ireland
human infection with a p[ ], g lapine rotavirus
genetic characterization of a novel g p[ ] rotavirus strain causing gastroenteritis in year old australian child
rotavirus gene structure and function
full genomic analysis and possible origin of a porcine g rotavirus strain ru
full genomic analysis of rabbit rotavirus g p[ ] strain n in china: identification of a novel vp genotype
characterization of neutralization specificities of outer capsid spike protein vp of selected murine, lapine, and human rotavirus strains
a simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences
mega : molecular evolutionary genetics analysis version . for bigger datasets
identification of novel and diverse rotaviruses in rodents and insectivores, and evidence of cross-species transmission into humans
molecular characterization of the vp , vp , vp , and nsp genes of lapine rotaviruses identified in italy: emergence of a novel vp genotype
lapine rotaviruses of the genotype p[ ] are widespread in italian rabbitries
full genomic analysis of human rotavirus strain b and lapine rotavirus strain / provides evidence for interspecies transmission
are human p[ ] rotavirus strains the result of interspecies transmissions from sheep or other ungulates that belong to the mammalian order artiodactyla?
simian rotaviruses possess divergent gene constellations that originated from interspecies transmission and reassortment
equine g p[ ] rotavirus strain e related to simian rrv and feline/canine-like rotaviruses based on complete genome analyses
molecular characterization of the first isolation of rotavirus in guanacos (lama guanicoe)
rotavirus proteins: structure and assembly
rabbit colony infected with a bovine-like g p[ ] rotavirus strain
whole genome sequence analysis of bovine g p[ ] rotavirus strain found in a child with gastroenteritis
phylogenetic analyses of the vp and vp genes of porcine group a rotaviruses in sao paulo state, brazil: first identification of g p[ ] in piglets
whole genome sequence and phylogenetic analyses reveal human rotavirus g p[ ] strains ro and hcr a are examples of direct virion transmission of canine/feline rotaviruses to humans
full genome characterization of a porcine-like human g p[ ] rotavirus strain isolated from an infant in belgium
the authors declare no conflicts of interest. supplementary data to this article can be found online at https://doi.org/ . /j.meegid. . . . 
key: cord- -ju xqwa authors: xia, jing; he, xiao; yao, ke-chang; du, li-jing; liu, ping; yan, qi-gui; wen, yi-ping; cao, san-jie; han, xin-feng; huang, yong title: phylogenetic and antigenic analysis of avian infectious bronchitis virus in southwestern china, – date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: ju xqwa the aim of this study was to decipher the molecular epidemiological and antigenic characteristics of infectious bronchitis virus strains (ibvs) isolated in recent years in southwestern china. a total of field strains were isolated from diseased chickens between and . phylogenetic analysis based on s nucleotide sequences showed that of the isolates were clustered into four distinct genotypes: qx ( . %), tw ( . %, twi and twii), mass ( . %), and j ( . %). the qx genotype was still the prevalent genotype in southwestern china. recombination analysis of the s subunit gene showed that eight of the field strains were recombinant variants that originated from field strains and vaccine strains. a new potential recombination hotspot [atttt(t/a)] was identified, implying that recombination events may become more and more common. the antigenicity of ten ibvs, including seven field strains and commonly used vaccine strains, were assayed with a viral cross-neutralization assay in chicken embryonated kidney cells (cek). the results showed that the ten ibvs could be divided into four serotypes (massachusetts, b, sczy , and scyb). sczy and b were the predominant serotypes. six of the seven field isolates (all except for ck/ch/scyb/ ) cross-reacted well with anti-sera against other field strains. in conclusion, the genetic and antigenic features of ibvs from southwestern china in recent years have changed when compared to the previous reports. the results could provide a reference for vaccine development and the prevention of infectious bronchitis in southwestern china. 
infectious bronchitis (ib) is a highly contagious disease in chickens that causes significant economic losses to the worldwide poultry industry (colvero et al., ). the etiologic agent of ib is the infectious bronchitis virus (ibv), a member of the coronaviridae family, in the subfamily coronavirinae and genus gammacoronavirus. the ibv genome is . kb and encodes at least four structural proteins, including the spike glycoprotein (s), membrane protein (m), small membrane protein (e), and nucleocapsid protein (n) (ujike and taguchi, ). the major immunogen of ibv is the s subunit protein, which contains epitopes that can induce the production of specific neutralizing antibodies and hemagglutination inhibition antibodies. ibvs from different serotypes usually exhibit poor cross-protection (li et al., ). due to the incomplete proofreading mechanism of the rna polymerase and gene recombination during genome replication, ibv genomes are constantly evolving, and new ibv variant strains are continually arising (baker and lai, ; lai, ). since the early s, ibv has been diagnosed in china by viral isolation. although the wide use of vaccine strains, such as h , m , / , / , and ma , has successfully prevented ib epidemics on most farms, immune failure is still reported frequently as the result of infections with strains that differ serologically from the vaccine strains. therefore, continuing analysis of the genetic evolution and antigenic relatedness among field isolates and vaccine strains may provide critical insight for vaccine strain selection and vaccine development. our previous study revealed that isolates obtained between and from the sichuan province belonged mainly to a group of qx-like strains ( % qx-type; % twi-type) (zou et al., ). in a later report from other researchers, qx-type and twi-type ibvs accounted for % and %, respectively, in the sichuan area during - . 
in southern china, by contrast, the picture was quite different: ck/ch/lsc/ i-type was the predominant genotype and no qx-type strains were isolated during (mo et al., ). the genetic characteristics of ibvs from china therefore varied with time and region. less attention has been given to the antigenic features of ibvs isolated in china in recent years. one report showed that ibvs isolated in guangxi, china, in - could be divided into serotypes (i-vi), although most of the isolates ( / ) were of the qx genotype; thus, ibvs of the qx genotype may belong to different serotypes, and the serotypes of ibvs varied with time and region (qin et al., ). another report showed that the serotype of twi-type strains differed from that of mass-type strains in taiwan in (wang and huang, ). because the genetic and antigenic characteristics of ibvs vary with time and region, and no report has described those of ibvs from southwestern china in recent years, the molecular and antigenic characteristics of ibvs in this region remain unclear. the aim of this study was to decipher the genetic and antigenic characteristics of ibv strains circulating in commercial flocks in southwestern china in recent years. specific pathogen-free (spf) chicken embryos were obtained from beijing merial vital laboratory animal technology co., ltd. (beijing, china). m and h strains were obtained from the china institute of veterinary drug control (beijing, china). the / vaccine was supplied by intervet international b.v. (boxmeer, nl). throughout - , kidney, lung, and trachea samples were collected from broiler or layer chickens suspected of ib infection in southwestern china (table ). samples were homogenized in phosphate-buffered saline (pbs) containing μg/ml penicillin and μg/ml streptomycin in a ratio of : - . after filter sterilizing with a . μm filter membrane, . 
ml sample was inoculated into the allantoic cavity of -to -day-old spf embryos. the embryos were incubated at °c and examined twice daily for viability. the allantoic fluids were harvested after h incubation, and three blind passages were conducted. the presence of ibv was verified by reverse transcription-polymerase chain reaction (rt-pcr) of the n gene (zou et al., ). the presence of five other pathogens in those samples, namely h subtype avian influenza virus (h aiv), newcastle disease virus (ndv), marek's disease virus (mdv), bacteria, and coccidia, was checked following previously published methods (abu-akkada and awad, ; chen et al., ; li et al., ; rui et al., ; tian et al., ). total rna was extracted from ibv-infected allantoic fluid with rnaiso plus (takara biotechnology co., ltd., dalian, china) according to the manufacturer's instructions and dissolved in μl sterile diethylpyrocarbonate (depc)-treated water before being stored at − °c for further use. for the reverse transcription (rt) reaction, μl of template rna, μl of × rt mix, and μl of rnase-free water were added and mixed. the reaction mixture was incubated at °c for min and then at °c for min. pcr amplification and cloning of the s gene were performed as previously described (zou et al., ). the recombinant plasmids containing the target gene were sequenced by shanghai sanggong biological engineering technology & services co., ltd. (shanghai, china). nucleotide sequences of the s gene obtained from the ibv isolates were aligned using the editseq program in the lasergene package (dnastar inc., madison, wi, usa) and compared to the sequences of other reference ibvs using the megalign program in the same package. for the reference ibvs, strains were isolated from china, strains were isolated from the usa, strains were from japan, and the other strains were vaccine strains. a phylogenetic tree of the s gene was created using the neighbor-joining method in mega version . . . 
bootstrap values were determined from replicates of the original data. the s subunit sequences of ibv field strains and reference strains were aligned by megalign, and putative recombinant strains were selected by sequence homology analysis. in order to identify the assumed parent sequences, the s subunit sequences of suspected recombinant isolates were blasted against the genbank database of the national center for biotechnology information (ncbi). recombination analysis of the selected sequences was conducted with the aid of the recombination detection program (rdp . ) and simplot version . . software. potential recombination events and putative parental sequences were identified using the rdp, maxchi, and geneconv methods in rdp . , with significance set at p values < . and the sliding window size set as bp. putative recombination events were further examined using simplot version . . . nucleotide identities were calculated using the kimura -parameter method with a transition-transversion ratio of in each window of bp, and the window was successively moved along the alignment in -bp increments. for the preparation of antisera against the ten ibvs, -week old rabbits (n = ) were immunized subcutaneously with purified eid ibvs mixed with an equal volume of complete freund's adjuvant (sigma, missouri, usa) for the first injection, and with the same antigen emulsified in freund's incomplete adjuvant for the following two booster injections (two-week interval). rabbits were held in separate biosafety level (bsl ) isolators in the laboratory animal center of sichuan agricultural university (ya'an, sichuan, china) with ad libitum access to feed and water and maintained under uniform standard management conditions. approval for these animal studies was obtained from the sichuan provincial laboratory animal management committee [permit number: xyxk(sichuan) - ] and the ethics and animal welfare committee of sichuan agricultural university. 
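the sliding-window identity scan used in the recombination analysis above can be illustrated as follows. this is a rough sketch of the idea behind a simplot-style similarity plot, not the simplot implementation: it computes the kimura two-parameter distance between two aligned sequences in each window and reports one minus that distance. the function names are hypothetical, the window and step sizes are placeholder arguments because the values used in the study are not legible here, and the transition/transversion ratio setting is not modeled.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if term <= 0:
        return math.inf  # sequences too divergent for the model
    return -0.5 * math.log(term)


def sliding_similarity(query, reference, window, step):
    """Return (window_midpoint, 1 - K2P distance) pairs along the alignment."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        d = k2p_distance(query[start:end], reference[start:end])
        points.append((start + window // 2, 1.0 - d))
    return points
```

plotting the similarity of a suspected recombinant against each putative parent, window by window, shows a crossover where the curves swap rank; that crossover is the candidate breakpoint.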
antisera from vaccinated animals were collected at days after the final immunization and stored at − °c. to determine the antigenic relatedness between the field ibv isolates and the vaccine viral strains, double-direction viral cross-neutralization (vn) tests were performed in chicken embryo kidney (cek) cells using constant viral titers and diluted serum. the tested strains came from six different genotypes and included seven ibv field isolates (sczy , ck/ch/scdy/ , ck/ch/scls/ , ck/ch/cqkx/ , ck/ch/scyb/ , ck/ch/scmy/ i, ck/ch/scyb/ ) and the three most commonly used vaccine viral strains (h , m , and / ). before vn testing, ibv strains were adapted to cek cells by serial passaging. briefly, allantoic fluid containing the ibv strain was propagated in monolayer primary cek cells prepared from -to -day-old chicken embryos. infected cek cells were cultured in dulbecco's modified eagle's medium (gibco, grand island, ny, usa) supplemented with % fetal bovine serum (zhejiang tian-hang biological technology stock co., ltd., zhejiang, china) and incubated at °c with % co . the supernatant was harvested h post-inoculation and passaged blindly in cek cells until a characteristic cytopathic effect (cpe), such as syncytia, was observed. determination of the tcid of the cek-adapted ibvs in cek cells was conducted per the method of reed and muench ( ) . for the vn test, equal volumes of tcid of the cek-adapted ibvs and serial two-fold dilutions of antisera were mixed and kept at °c for h. next, . ml of the virus-antisera mixture was then transferred to cek cell cultures in -well plates ( wells for each dilution). the plates were incubated for h, and the % end-point neutralizing titers were calculated by the method of reed and muench ( ) . negative rabbit serum was also incubated with tcid of ibv to calculate its non-specific neutralizing titer to ibvs, and this neutralizing titer was used as a background value for further analysis. 
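the reed and muench end-point calculation used above for both the tcid titration and the neutralizing titers can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' worksheet: the function name and input layout are hypothetical, each row is (log10 dilution, wells showing the response, wells not showing it), rows run from the least to the most dilute, and the response (cpe for a virus titration, protection for a serum titration) is assumed to fall as dilution increases.

```python
def reed_muench_endpoint(rows):
    """Return the log10 dilution at which 50% of wells show the response.

    rows: list of (log10_dilution, n_positive, n_negative), ordered from
    the least dilute to the most dilute sample.
    """
    n = len(rows)

    # A well positive at a high dilution is assumed positive at every lower
    # dilution, so positives accumulate from the most dilute row upward.
    cum_pos = [0] * n
    running = 0
    for i in range(n - 1, -1, -1):
        running += rows[i][1]
        cum_pos[i] = running

    # Negatives accumulate in the opposite direction.
    cum_neg = [0] * n
    running = 0
    for i in range(n):
        running += rows[i][2]
        cum_neg[i] = running

    percent = [
        100.0 * cp / (cp + cn) if (cp + cn) else 0.0
        for cp, cn in zip(cum_pos, cum_neg)
    ]

    # Interpolate between the two dilutions that bracket 50%.
    for i in range(n - 1):
        if percent[i] >= 50.0 > percent[i + 1]:
            proportionate_distance = (percent[i] - 50.0) / (percent[i] - percent[i + 1])
            log_step = rows[i + 1][0] - rows[i][0]
            return rows[i][0] + proportionate_distance * log_step
    raise ValueError("50% end point is not bracketed by the tested dilutions")


# Illustrative counts only (not data from this study): ten-fold dilutions,
# eight wells per dilution.
example = [(-1, 8, 0), (-2, 8, 0), (-3, 6, 2), (-4, 2, 6), (-5, 0, 8)]
endpoint = reed_muench_endpoint(example)
```

for two-fold serum dilutions the same function applies with log10 dilutions spaced about 0.301 apart; the background titer from the negative serum would then be handled separately, as described in the text.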
the vn end-point titers were used to calculate the antigenic relatedness values (arv, r) by the method of archetti & horsfall ( ): $\mathrm{ARV}(r) = \sqrt{r_1 \times r_2} \times 100\%$, where $r_1$ represents the ratio of the heterologous neutralizing titer of virus 2 to the homologous titer of virus 1, and $r_2$ represents the ratio of the heterologous titer of virus 1 to the homologous titer of virus 2. isolates where r < % were considered to be antigenically unrelated, isolates with % ≤ r < % were considered to be antigenically related, and isolates with r ≥ % were considered to be antigenically identical.

a total of clinical samples, including trachea, lung, and kidney samples, were collected from dead or diseased chickens displaying respiratory symptoms and/or nephritis from chicken flocks located in southwestern china. this included the si-chuan, yun-nan, gui-zhou, and chong-qing areas. from these, ibv strains were isolated. typical signs of ibv, including embryo dwarfing and death, were observed during the passaging of samples through embryos. rt-pcrs of the clinical samples showed that only a few samples exhibited co-infection with the h aiv ( / , . %) or the ndv ( / , . %). bacterial isolation showed that e. coli and salmonella were often found in the clinical samples ( / , . %). the case histories of local strains are listed in table .

phylogenetic analysis of s gene

s gene sequences from the ibv isolates were determined and submitted to genbank under the accession numbers ku - , ku , kx - and kx - . the full-length s subunit gene open reading frame (orf) ranged from to bp. phylogenetic analysis based on s nucleotide sequences of the wild strains showed that of the isolates could be grouped into four genotypes: qx, j , tw (twi and twii), and mass. nine field isolates (ck/ch/scdy/ , ck/ch/gzgy/ , ck/ch/scms/ , ck/ch/scmy/ , ck/ch/scyb/ , ck/ch/scmy/ , ck/ch/sclz/ , ck/ch/scem/ , and ck/ch/scya/ ) were included in the qx group, sharing . - .
% nucleotide identity with the s sequences of other qx-like ibvs from pandemics in recent years. four isolates (ck/ch/scms/ , ck/ch/scyb/ , ck/ch/scms/ , and ck/ch/ynkm/ ) belonged to the tw type. among these, ck/ch/scyb/ , ck/ch/scms/ , and ck/ch/ynkm/ belonged to the twi type, sharing . - . % nucleotide identity with those of other twi-type reference strains; ck/ch/scms/ grouped with twii-type sequences, sharing . - . % nucleotide identity with twii reference strains. two field isolates (ck/ch/gzlsy/ and ck/ch/scdy/ ) clustered with mass-type reference strains, sharing . - . % nucleotide identity with mass reference strains. one isolate (ck/ch/scls/ ) clustered with the j group, exhibiting . - . % nucleotide identity with j reference strains. the phylogenetic tree is shown in fig. .

recombination events in the s gene were identified using the rdp . and simplot . . software. simplot results were similar to those from rdp (data not shown). among the field isolates, a total of eight recombinant strains were found (fig. ) and were mainly clustered into four groups, termed variant- , variant- , variant- , and variant- (fig. ) in the phylogenetic tree. the variant- group contained two isolates (ck/ch/scmy/ and ck/ch/scbz/ ), variant- contained four isolates (ck/ch/scyb/ , ck/ch/cqkx/ , ck/ch/gzxf/ , and ck/ch/scem/ ), and the variant- and variant- groups contained only one isolate each (ck/ch/sctq/ and ck/ch/gzlxm/ , respectively). in the variant- group, the nucleotide sequences located at nucleotide positions - of strains ck/ch/scmy/ and ck/ch/scbz/ showed high identity to the sequences of the qx strains ck/ch/lhlj/ ( . %) and ck/ch/lhb/ ( . %), respectively. the nucleotide sequences located at - nt of both strains, however, exhibited . % identity with that of tw / (twii type). the breakpoint site in both strains was located at nt in the s subunit gene.
in the variant- group, nucleotide sequences located at - nt of the four isolates ck/ch/scyb/ , ck/ch/cqkx/ , ck/ch/gzxf/ , and ck/ch/scem/ exhibited . - . % identity with qx-type strain lc , while nucleotide sequences located at - nt of the four isolates shared . - . % identity with those of ck/ch/lsc/ i-type strain saibk. the breakpoint site in these four ibvs was located at nt in the s subunit gene. for the variant- strain ck/ch/sctq/ , positions - nt exhibited . % identity with those of ck/ch/lsc/ i-type strain ck/ch/guangxi/hezhou/ , while positions - nt exhibited . % identity with that of / -type strain ck/ch/ljl/ . the breakpoint site was at position nt in the s subunit gene. finally, for the variant- strain ck/ch/gzlxm/ , positions - nt exhibited . % identity with sequences from the vaccine strain h , while positions - nt exhibited . % identity with sequences from the qx-type strain ck/ch/gd/ly . the breakpoint site was located at nt in the s subunit gene. further analysis of the sequences near the breakpoint sites revealed a potential new a-t-rich hotspot sequence, atttt(t/a), which was found near the breakpoint site of the s subunit gene in all of the variant- , variant- , variant- and variant- groups (fig. ).

the antigenicities of representative isolates from different genotypes were assessed by viral cross-neutralization assays (tables and ). ten ibvs, including seven field strains from different genotype backgrounds and three commonly used vaccine strains (h , / , and m ), were analyzed and grouped into four serotypes: massachusetts (mass hereafter), b, sczy , and scyb. the commonly used vaccines h and / had no cross-neutralization titers and were used as reference serotypes. four ibvs, including the vaccine strain h , field isolates ck/ch/scdy/ (qx) and ck/ch/scls/ (j ), and the standard virulent strain m , commonly used in inactivated vaccine production, were grouped into the mass serotype (table ).
many ibvs, including six field strains and one vaccine strain ( / ) were included in different serotypes: ck/ch/scdy/ (qx) was included in both the mass and sczy serotypes, ck/ch/scls/ and m were included in all serotypes except scyb, ck/ch/scms/ and ck/ch/cqkx/ (variant- ) were included in both b and sczy serotypes, ck/ch/scyb/ was included in all serotypes except mass, ck/ch/scyb/ and / were included in both b and scyb serotypes. only h and the sczy standard were included in a single serotype (table ). in the unidirectional neutralization assay of vaccine-associated strains, immune sera against h could only neutralize three ibvs from three different genotypes, while immune sera against / were able to neutralize six ibvs from five different genotypes. although h and m belong to the same serotype, immune sera against m neutralized ibvs from six different genotypes, and the neutralizing titers were higher than those of h in most cases. the neutralization abilities of antisera against / and m were higher than that against h (table ). in terms of the unidirectional neutralization assays of the seven field strains, immune sera of ck/ch/scdy/ , ck/ch/scls/ , and ck/ch/cqkx/ neutralized all ten of the analyzed strains. immune sera of ck/ch/scyb/ and ck/ch/scyb/ neutralized all of the analyzed strains except h . immune sera of ck/ch/scms/ and ck/ch/scyb/ could neutralize all strains with the exception of one field strain. the neutralization abilities of antisera against most of the field strains were almost identical to that of / , which was higher than that of vaccine strain h (table ) . for cross-neutralization assay, most analyzed ibvs displayed bidirectional neutralization activity, but several ibvs displayed unidirectional neutralization activity. for example, immune sera against ck/ch/ cqkx/ , ck/ch/scmy/ , and ck/ch/scyb/ neutralized h well, but immune sera against h did not react well with these three field strains (table ) . 
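the one-way reactions noted above matter because the relatedness value r combines both directions of neutralization, so a weak titer in either direction pulls r down. a minimal sketch of the archetti and horsfall calculation from the methods follows. the function and argument names are hypothetical, and the classification cutoffs are required arguments because the thresholds used in the study are not recoverable from this text.

```python
import math


def antigenic_relatedness(homologous_1, homologous_2,
                          hetero_virus2_serum1, hetero_virus1_serum2):
    """Archetti-Horsfall relatedness r, in percent, for a pair of viruses.

    homologous_1: titer of serum 1 against virus 1
    homologous_2: titer of serum 2 against virus 2
    hetero_virus2_serum1: titer of serum 1 against virus 2
    hetero_virus1_serum2: titer of serum 2 against virus 1
    """
    r1 = hetero_virus2_serum1 / homologous_1
    r2 = hetero_virus1_serum2 / homologous_2
    return math.sqrt(r1 * r2) * 100.0


def classify(r, unrelated_below, identical_from):
    """Label a pair from r using caller-supplied cutoffs (in percent)."""
    if r < unrelated_below:
        return "unrelated"
    if r < identical_from:
        return "related"
    return "identical"


# Illustrative titers only (not data from this study).
two_way = antigenic_relatedness(256, 128, 64, 32)   # both directions reduced 4-fold
one_way = antigenic_relatedness(256, 256, 256, 4)   # strong one way, weak the other
```

in the second example one serum neutralizes the heterologous virus as well as its own, yet r is lower than in the first example because the reverse reaction is weak; this is the pattern described for h and several field strains.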
although most strains belonging to the same genotype also belonged to the same serotype (e.g. h and m , sczy and ck/ch/scdy/ ), some ibvs in the same genotype exhibited low antigenic relatedness values, such as ck/ch/scyb/ and ck/ch/cqkx/ from the variant- group. in contrast, ibvs from different genotype groups sometimes also exhibited high antigenic relatedness values (r ≥ %), such as h (mass) and ck/ch/scls/ (j ); / ( / ) and ck/ch/cqkx/ (variant- ); and / ( / ) and ck/ch/scms/ (tw) (table ).

fig. . phylogenetic analysis of the s gene from wild strains (filled triangles) and reference strains of infectious bronchitis virus (ibv), starting at the aug translation initiation codon and ending at the cleavage recognition motifs. the phylogenetic tree was constructed using mega version . . with the neighbor-joining method and bootstrap replicates.

in recent years, outbreaks of ib have been reported frequently in southwestern china (zou et al., ). although the mortality rate of a single infection has been low, it could increase as a consequence of secondary infections or co-infection with e. coli, aiv, and ndv (dwars et al., ; hassan et al., ). in our study, co-infection of ibv and bacteria accounted for . % of total cases. control of co-infections or secondary infections with other pathogens is very important for the prevention and control of ib (sid et al., ; smith et al., ). the molecular characteristics of ibvs vary around the world.
over the past ten years, epidemics of different genotypes have been observed in different countries or areas, such as the variant- (is/ ) and mass types in asia (chen et al., ; patel et al., ; promkuntod et al., ; seger et al., ; xu et al., b; zou et al., ) ; the mass and b types in the middle east (ganapathy et al., ; hosseini et al., ; najafi et al., ) ; the mass, qx, and italy- types in africa (fellahi et al., a; fellahi et al., b; knoetze et al., ) ; the / , qx, and italy- types in europe (kiss et al., ; krapez et al., ) ; and the mass type in south america (balestrin et al., ) and north america (mondal et al., ) . the major genotype circulating around the world, however, is the qx type. in our study, ibvs were isolated from h -and / -vaccinated chickens. phylogenetic analysis of the s gene showed that those ibvs could be primarily grouped into four genotypes, with qx-type strains ( / ) dominating. many scholars have recently prepared qx vaccine strains for use as a candidate vaccine in china (huo et al., ; zhao et al., ) , but there is still no official approval for field application of a qx-like vaccine in china. considering the possibility of recombination between vaccine strains or between vaccine strain and field strain to generate variants, field application of new attenuated live vaccine such as qx-like vaccine should be carefully assessed before implementation. in contrast to qx-type epidemics, epidemics of the tw type (twi and twii) and variant- type have been increasing in recent years (mo et al., ; xu et al., a) , a fact that was also confirmed by this study. most scholars classify variant- strains into the ck/ch/lsc/ itype or saibk type, but our rdp and simplot analysis showed that variant- strains are the result of a recombination event between a qx-type and a ck/ch/lsc/ i-type (saibk-type) strain. 
both the twi and twii types were first identified in taiwan, and the isolation of twi-type strains in china has increased in recent years (fu-yan, ; zhang et al., ), while twii-type strains were first identified in china in . in our study, we not only identified twii-type strains among our samples, but we also discovered recombinant ibvs originating from qx-type and twii-type strains. how twii and associated recombinant strains appeared in flocks from southwestern china remains unknown. measures should be taken to prevent the evolution of new variants and recombinant variants, such as strengthening supervision of the poultry trade, limiting the number and type of live vaccine strains, and increasing the biosecurity level of chicken farms.

the s glycoprotein carries most of the neutralizing epitopes in the ibv genome, and the s subunit gene is highly variable. gene mutations may be introduced into the viral genome by viral rna-dependent rna polymerases, which display incomplete proofreading capabilities (denison et al., ), and gene recombination can occur via a genomic template switching mechanism (lai, ). homologous sequences between the donor and acceptor genomes are usually required for copy-choice homologous recombination, and rna secondary structures such as hairpins could also influence the recombination process (lai, ; nagy et al., ). mutation and recombination of the s gene may lead to the emergence of new variants, genotypes, or serotypes. in this study, we found that mutations in the s subunit gene were mainly located in the three hyper-variable regions (hvr) (data not shown), which is consistent with our previous study (zou et al., ). for recombination in the s gene, previous studies have shown that most crossovers occur at a relatively conserved sequence near the hvr (wang et al., ). however, this was not the case for three of the four putative recombination events detected in this study.
the breakpoint site in the variant- group was located at a conserved sequence in the hvr, while that in the variant- group was located in a variable region outside of the hvr. recombination hotspots are generally believed to be located adjacent to putative breakpoint sites, and ct(t/g)aacaa, cttttg, and cttttg(c/t) are usually considered to be potential hotspot sequences. in this study, the hotspot sequences above were not observed near the breakpoint sites, but a potentially new a-t-rich hotspot sequence, atttt(t/a), was observed near the breakpoint sites of the s subunit gene. sequence analysis of recombinant ibvs in other reports (mo et al., ; thor et al., ) showed that atttt(t/a) was also present near the breakpoint sites in some of the strains (data not shown). a previous report showed that au-rich sequences could be important recombination-promoting signals in brome mosaic virus (bmv) (shapka and nagy, ).

to determine the antigenic relatedness between field strains and vaccine strains in southwestern china, viral cross-neutralization tests were performed. as most ibv isolates do not produce significant cpes in cek cells, ibvs should be allowed to adapt to cek cells before neutralization assays. in this study, h , / , and m were used as vaccine controls, as they represent the most commonly used vaccine strains. results of the cross-neutralization assay showed that all of the analyzed strains could be grouped into four serotypes: mass, b, sczy , and scyb. the cross-neutralization ability of / was higher than that of h , which may explain why the immunogenicity of / is higher than that of h in field applications. however, the antigenic relatedness between some field strains and / was also low, which may explain immune failure in flocks vaccinated with / .
sczy and scyb were two serotypes that differed from those of the h and / vaccines, and the cross-neutralization abilities of strains in the sczy serotype were higher than those of strains in the scyb serotype. representative strains in the sczy serotype may therefore be more suitable for vaccine development than those in the scyb serotype. in general, there is a correlation between s gene homology and the level of cross-protection between strains, with strains in the same serotype usually sharing > % amino acid identity and strains from different serotypes sharing < % amino acid identity (cavanagh, ). however, in this study, some strains with high amino acid identity ( . %), such as ck/ch/scyb/ and ck/ch/cqkx/ , exhibited low antigenic relatedness, in line with a previous study showing that strains with low amino acid identity could also display high antigenic relatedness (sjaak de wit et al., ). it is unclear which amino acids play key roles in determining the serotype of ibvs, and genetic analysis alone could not be used to evaluate antigenic differences between ibvs. furthermore, we found that some ibv strains, such as ck/ch/scdy/ , could be grouped into more than one serotype, similar to the results of other reports (cowen and hitchner, ; winterfield and fadly, ). this phenomenon may be explained by the existence of common vn epitopes present in different ibv genes. for example, in ck/ch/scdy/ and sczy , two common vn epitopes ( ppqgmaw and iqtrtep ) (zou et al., ) were observed in the s genes of both strains. although the five vn epitopes in the hvrs were quite different between strains ck/ch/scdy/ and h , other common vn epitopes may be located in the s and n proteins of these two strains, as there are reports that the s and n proteins may induce neutralizing reactions (ignjatovic and sapats, ; koch et al., ).
in conclusion, we have demonstrated that the genetic and antigenic characteristics of ibvs isolated from southwestern china have undergone some changes in recent years. our results provide a reference for the prevention and control of ib in southwestern china. strains with r ≥ % were classified as the same serotype, and strains with r b % were classified as different serotypes. isolation, propagation, identification and comparative pathogenicity of five egyptian field strains of eimeria tenella from broiler chickens in five different provinces in egypt persistent antigenic variation of influenza a viruses after incomplete neutralization in ovo with heterologous immune serum an in vitro system for the leader-primed transcription of coronavirus mrnas infectious bronchitis virus in different avian physiological systems-a field study in brazilian poultry flocks coronaviruses in poultry and other birds phylogenetic analysis of hemagglutinin genes of h n subtype avian influenza viruses isolated from poultry in china from molecular and antigenic characteristics of massachusetts genotype infectious bronchitis coronavirus in china assessing the economic burden of avian infectious bronchitis on poultry farms in brazil serotyping of avian infectious bronchitis viruses by the virus-neutralization test coronaviruses: an rna proofreading machine regulates replication fidelity and diversity progression of lesions in the respiratory tract of broilers after single infection with escherichia coli compared to superinfection with e. 
coli after infection with infectious bronchitis virus prevalence and molecular characterization of avian infectious bronchitis virus in poultry flocks in morocco from to and first detection of italy in africa phylogenetic analysis of avian infectious bronchitis virus s glycoprotein regions reveals emergence of a new genotype in moroccan broiler chicken flocks isolation and identification of a taiwan genotype i avian infectious bronchitis virus and the pathogenicity identification genotypes of infectious bronchitis viruses circulating in the middle east between prevalence of avian respiratory viruses in broiler flocks in egypt epidemiology of avian infectious bronchitis virus genotypes in iran attenuation mechanism of virulent infectious bronchitis virus strain with qx genotype by continuous passage in chicken embryos identification of previously unknown antigenic epitopes on the s and n proteins of avian infectious bronchitis virus survey indicates circulation of / and qx-type infectious bronchitis viruses in hungary in -short communication two genotypes of infectious bronchitis virus are responsible for serological variation in kwazulu-natal poultry flocks prior to antigenic domains on the peplomer protein of avian infectious bronchitis virus: correlation with biological functions circulation of infectious bronchitis virus strains from italy and qx genotypes in slovenia between rna recombination in animal and plant viruses cloning and identification of ipaj gene in salmonella pullorum serotype and genotype diversity of infectious bronchitis viruses isolated during - in guangxi genetic diversity of avian infectious bronchitis coronavirus in recent years in china molecular characterization of major structural protein genes of avian coronavirus infectious bronchitis virus isolates in southern china sequence analysis of infectious bronchitis virus isolates from the s in the united states dissecting rna recombination in vitro: role of rna sequences and the viral 
replicase molecular characterization of infectious bronchitis viruses isolated from broiler chicken farms in iran isolation and molecular characterization of nephropathic infectious bronchitis virus isolates of gujarat state analysis of the s gene of the avian infectious bronchitis virus (ibv) reveals changes in the ibv genetic groups circulating in southern thailand genotypes and serotypes of avian infectious bronchitis viruses isolated during a simple method of estimating fifty percent endpoints phylogenetic characterization of newcastle disease virus isolated in the mainland of china during genotyping of infectious bronchitis viruses from broiler farms in iraq during the au-rich rna recombination hot spot sequence of brome mosaic virus is functional in tombusviruses: implications for the mechanism of rna recombination co-infection with multiple respiratory pathogens contributes to increased mortality rates in algerian poultry flocks infectious bronchitis virus variants: a review of the history, current situation and control measures the experimental infection of chickens with mixtures of infectious bronchitis virus and escherichia coli recombination in avian gamma-coronavirus infectious bronchitis virus comparative analysis of oncogenic genes revealed unique evolutionary features of field marek's disease virus prevalent in recent years in china incorporation of spike and membrane glycoproteins into coronavirus virions relationship between serotypes and genotypes based on the hypervariable region of the s gene of infectious bronchitis virus evolutionary implications of genetic variations in the s gene of infectious bronchitis virus some characteristics of isolates of infectious bronchitis virus from commercial vaccines characterization and analysis of an infectious bronchitis virus strain isolated from southern china in emergence of novel nephropathogenic infectious bronchitis viruses currently circulating in chinese chicken flocks serotype shift of a /b genotype 
infectious bronchitis coronavirus by natural recombination molecular detection and smoothing spline clustering of the ibv strains detected in china during safety and efficacy of an attenuated chinese qx-like infectious bronchitis virus strain as a candidate vaccine genetic analysis revealed lx genotype strains of avian infectious bronchitis virus became predominant in recent years in sichuan area two novel neutralizing antigenic epitopes of the s subunit protein of a qx-like avian infectious bronchitis virus strain sczy as revealed using a phage display peptide library this work was financially supported by the program for chang-jiang scholars and innovative research team in university "pcsirt" (grant no. irt ). key: cord- - lfpy d authors: niu, ting-jiang; yi, shuai-shu; wang, xin; wang, lei-hua; guo, bing-yan; zhao, li-yan; zhang, shuang; dong, hao; wang, kai; hu, xue-gui title: detection and genetic characterization of kobuvirus in cats: the first molecular evidence from northeast china date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: lfpy d feline kobuvirus (fekov), a novel picornavirus of the genus kobuvirus, was initially identified in the feces of cats with diarrhea in south korea in . to date, there is only one report of the circulation of kobuvirus in cats in southern china. to investigate the presence and genetic variability of fekov in northeast china, fecal samples were collected from cats with obvious diarrhea and asymptomatic cats in shenyang, jinzhou, changchun, jilin and harbin regions, northeast china, and viruses were detected by rt-pcr with universal primers targeting all kobuviruses. kobuvirus was identified in fecal samples with an overall prevalence of . % ( / ) of which samples were co-infected with feline parvovirus (fpv) and/or feline bocavirus (fbov). diarrhoeic cats had a higher kobuvirus prevalence ( . %, / ) than asymptomatic cats ( . %, / ). 
by genetic analysis based on partial d gene, all kobuvirus-positive samples were more closely related to previous fekov strains with high identities of . %– . % and . %– % at the nucleotide and amino acid levels. additionally, phylogenetic analysis based on the complete vp gene indicated that all fekov strains identified in this study were placed into a cluster, which separated from other reference strains previously reported, and three identical amino acid substitutions were present at the c-terminal of the vp protein for these fekov strains. furthermore, two complete fekov polyprotein genomes were successfully obtained from two positive samples and designated jz and cc , respectively. the two strains shared . %– . % nucleotide identities and . %– . % amino acid identities to fekov prototype strains. phylogenetic analysis indicated that fekovs were clustered according to their geographical regions, albeit with limited sequences support. this study provides the first molecular evidence that fekov circulates in cats in northeast china, and these fekovs exhibit genetic diversity and unique evolutionary trend. kobuvirus (kov), which belongs to a recently classified genus (kobuvirus) of the family picornaviridae, is a small, non-enveloped, spherical virus approximately - nm in diameter (zell ) . kobuvirus has a single-stranded, positive-sense rna genome of . - . kb consisting of ′ untranslated region (utr), one single open reading frame (orf), ′utr and poly a tail (han et al. ) . this orf encodes a large polyprotein which is cleaved to yield a non-structural protein l, three structural proteins (vp , vp and vp ) and seven other nonstructural proteins ( a- c and a- d) (reuter et al. ) . of these, vp is the most important viral capsid protein determining the antigenicity and pathogenicity for kobuvirus. d gene encodes the rnadependent rna polymerase (rdrp), which plays a critical role in viral replication (lescar and canard ) . 
based on the function of encoded proteins, the kobuvirus genome is generally divided into three functional regions: p (encoding structural protein vp , vp and vp ), p and p (encoding non-structural protein a- c and a- d, respectively) (han et al. ; reuter et al. ) . since human aichi virus (aiv) was first recognized in as the cause of oyster-associated nonbacterial gastroenteritis in humans in aichi prefecture, japan (yamashita et al. ) , the novel kobuviruses have been identified in many mammalian animals. in , bovine kobuvirus (bkv) was first identified from a contaminant of hela cells in japan (yamashita et al. ). subsequently, porcine kobuvirus (pkov) was found in the stools of domestic pigs in hungary in (reuter et al. ) . canine kobuvirus (cakov) that was genetically related to human aiv was first identified in a domestic dog with acute gastroenteritis in usa in (kapoor et al. ) , and was subsequently found in healthy domestic dogs (oem et al. a ) and wild carnivores, including wolves (melegari et al. ) , red foxes (di martino et al. ) , golden jackals, side-striped jackal and spotted hyena (olarte-castillo et al. ) . to date, kobuviruses have been reported in human (yamashita et al. ) , cattle (yamashita et al. ) , sheep (reuter et al. ) , pig (reuter et al. ) , rodents (phan et al. ) , goat (oem et al. b) , wild boars (reuter et al. ) , roe deer (di martino et al. a) , rabbits (pankovics et al. ) , bats (wu et al. ) , ferrets (smits et al. ) , domestic and wild carnivores (olarte-castillo et al. ) and cats (chung et al. ) . 
according to the recent report of international committee on taxonomy of viruses (ictv) in (https://talk.ictvonline.org/taxonomy/), the genus kobuvirus was classified into six officially recognized species, namely aichivirus a (formerly aichi virus), aichivirus b (formerly bovine kobuvirus), aichivirus c (porcine kobuvirus), aichivirus d (kagovirus ), aichivirus e (rabbit kobuvirus) and aichivirus f (bat kobuvirus), respectively (adams et al. ; adams et al. ) . aichivirus a includes six types: human aiv, cakov, murine kobuvirus, roller kobuvirus, kathmandu sewage kobuvirus and feline kobuvirus (fekov). feline kobuvirus (fekov), a member of the species aichivirus a, was first identified in feces of cats with diarrhea in south korea in (chung et al. ) . the genetic analysis based on the partial rdrp gene indicated that fekov strains shared higher nucleotide ( . %- . %) and amino acid identities ( . %- . %) with cakov strains previously reported (kapoor et al. ) . in a study by cho, et al., it was demonstrated that kobuvirus widely circulated in domestic cats and was associated with viral diarrhea (cho et al. ) . recently, fekov infection in cats was reported in italy (di martino et al. b ). in , lu et al. first reported the circulation of fekov in diarrhoeic cats in southern china. fekov rna was found in domestic cats with diarrhea, but was undetected in healthy cats (lu et al. ) . however, only four complete genomes of fekov strains, including fk- (cho et al. ) , d (choi et al. ) , te/ /it/ (di martino et al. b ) and whj- (lu et al. ) , have been sequenced until now. furthermore, there is no data of kobuvirus infections in cats in northeast china. in this study, we provide the first molecular evidence for the circulation of fekov in northeast china, and investigate the prevalent levels, as well as genetic characteristics. 
in total, fresh fecal samples were collected from cats with diarrhea and asymptomatic cats from five different regions in northeast china, including shenyang, jinzhou, changchun, jilin and harbin, from january to november . individual fresh feces were immediately placed in rnase-free tubes and were stored at − °c until further use. fecal samples were suspended in phosphate-buffered saline (pbs, ph = . ) at a concentration of approximately . g/ml, and then the suspension was centrifuged at ×g for min at °c to collect the supernatant. total rna was extracted from μl of supernatant using axyprep body viral rna miniprep kit (corning, china), and reverse transcribed to synthesize cdna using the revertaid first strand cdna synthesis kit (invitrogen, usa) according to the manufacturer's instructions. viral dna of fecal samples was extracted using viral dna extraction kit i (omega, china) according to the manufacturer's instructions. the detection of kobuvirus was performed by rt-pcr using a pair of universal primers previously described (the primer sequences are shown in table ), univ-kobu-f/univ-kobu-r, targeting -bp of the partial d gene for all kobuviruses (reuter et al. ). these samples were also examined for other feline enteric viruses, including feline parvovirus (fpv) and feline bocavirus (fbov), using pcr assays previously described (takano et al. ; yang et al. ). the amplified products were separated by electrophoresis on . % agarose gels at v for min, and were visualized using a gel documentation system (wealtec, usa). to further investigate the genetic diversity of fekov detected in the present study, two pairs of primers for the amplification of the partial d gene and the complete vp gene were designed. the primer sequences are shown in table . the pcr conditions were as follows: initial denaturation at °c for min, followed by cycles of °c for min, °c for s and °c for min, and a final extension at °c for min.
all pcr products were purified using axyprep dna gel extraction kit (corning, china), and then were cloned into pmd- t vector (takara, china). plasmid dna was extracted using axyprep plasmid miniprep kit (corning, china), and positive dh- α clones (three clones per sample) were sent to sangon biotech (shanghai, china) for sanger sequencing. two fekov-positive samples were randomly selected to amplify the complete polyprotein gene sequences using six primer sets designed in the present study (primer sequences are shown in table ). the reaction conditions were described as follows: pre-denaturation at °c for min followed by cycles of denaturation at °c for min, annealing at °c for min, extension at °c for s, and a final extension at °c for min. purification of pcr products, clone of purified fragment, plasmid extraction and sequencing were performed with the same methods as previously described. the nucleotide sequences were assembled using seqman program, and the complete fekov polyprotein gene sequences were deposited in genbank under accession numbers mh ( nt) and mh ( nt). pairwise alignments among different kobuviruses based on the partial d gene and the complete vp gene were performed using online blast (https://blast.ncbi.nlm.nih.gov/blast.cgi). the nucleotide and amino acid identities among all sequences were calculated using bioedit. for the obtained genome sequences, the orf was predicted using orf finder (https://www.ncbi.nlm.nih.gov/orffinder/), and the potential cleavage sites of polyprotein were predicted using netpicorna . and were further verified via the nucleotide and amino acid alignments with reference kobuviruses. similarity plot analysis of the complete polyprotein gene was performed using simplot . software. phylogenetic tree was constructed using the neighbour-joining method with bootstrap replicates in mega . software. 
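as an illustration of the pairwise identity calculation described above, the following is a minimal python sketch of percent identity between pre-aligned sequences. it is not the bioedit implementation used in the study; the function names, the convention of skipping gapped columns, and the toy sequences are assumptions made for illustration only.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    Other tools may treat gaps differently, so values can differ slightly.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (non-gap) columns")
    return 100.0 * matches / compared


def identity_table(aligned):
    """Percent identity for every pair in a {name: aligned_sequence} dict."""
    return {
        (x, y): percent_identity(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the study.
    toy = {"strain_a": "ATGGCT-A", "strain_b": "ATGGCTAA", "strain_c": "ATGACTAA"}
    for pair, value in identity_table(toy).items():
        print(pair, round(value, 1))
```

the same function can be applied to nucleotide or deduced amino acid alignments, since it only compares characters column by column.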
we tested a total of fecal samples from shenyang (n = ), jinzhou (n = ), changchun (n = ), jilin (n = ) and harbin (n = ) in northeast china. background data was available for all cats of which were diarrhoeic and were healthy; were collected from private veterinary clinics and were collected from animal shelter centers. the screening results showed that of the samples ( . %) were positive for kobuvirus. out of these kobuvirus-positive samples, were positive for fpv, were positive for fbov and were co-infected with fpv and fbov. detailed information about kobuviruspositive samples was shown in table . the diarrhoeic cats had a higher kobuvirus prevalence ( . %, / ) than asymptomatic cats ( . %, / ) and the positive rate of cats from animal shelter centers ( . %, / ) were also higher than that of cats from private veterinary clinics ( . %, / ). moreover, there was no significant difference in the prevalence of samples from different regions (from . % to . %). partial d genes of kobuvirus-positive samples were sequenced in this study, and the nucleotide sequences were deposited in genbank under accession numbers mh -mh . the sequences shared . %- % nucleotide identities and . %- % amino acid identities with each other. these sequences had the highest nucleotide ( . %- . %) and amino acid identities ( . %- %) with fekov reference sequences deposited in genbank, suggesting that all twentyeight fecal samples were fekov-positive. furthermore, the sequences were . %- . %, . %- . % and . %- . % similar to cakovs, human aivs and murine kobuvirus at the nucleotide level, respectively. phylogenetic analysis based on the partial d gene showed that the sequences were more closely related to fekovs, clustering in the aichivirus a, which included also human aivs, cakovs and murine kobuvirus. the fekov sequences were divided into two major groups: sequences clustered with the chinese fekov strain, whj- (lu et al. 
) , and formed a major group, while the other sequences and fekov reference strains identified in south korea and italy formed another group. interestingly, only one sequence identified in this study appeared more closely related to the korean fekov strain, d (choi et al. ) , while other sequences were separated from these reference sequences (fig. ) . eight samples were randomly selected from fekov-positive samples, and their complete vp gene sequences were obtained in the present study (genbank accession numbers mh -mh ). the eight sequences shared nucleotide and deduced amino acid identities of . %- % and . %- % with each other and the highest nucleotide identities of . %- . % with the chinese fekov strain, whj- (lu et al. ) , when compared with fekov reference strains. the neighbour-joining tree based on the vp nucleotide sequences showed that our sequences were more closely related to fekov strain whj- and clustered within a major group, while other fekov reference strains identified in south korea and italy formed another group (fig. a) . interestingly, the eight sequences and fekov strain whj- placed in different branches in the phylogenetic tree based on the deduced amino acid sequences (fig. b) . then, we analyzed the amino acid mutation sites in the vp gene between the eight sequences and other fekov sequences previously described, and discovered that three identical amino acid substitutions at amino acid positions (t → a), (p → s) and (s → t) were exhibited in all sequences identified in this study (table ) . two complete fekov polyprotein genomes were successfully sequenced from two kobuvirus-positive samples in the present study using six pairs of primers, and designated jz and cc , respectively. the obtained genome sequence of jz was nt in length and contained one orf ( nt) encoding the polyprotein of aa, while the cc was nt long. the orf of cc was nt in length with one-amino-acid deletion in vp gene. 
moreover, we predicted and verified the genomic organization and potential cleavage sites for the complete polyprotein gene of jz and cc , which were identical to other fekov strains previously described (fig. a) . the jz shared . % nucleotide identity and . % amino acid identity with cc . compared with other fekov reference strains, jz and cc had . %- . % nucleotide identities and . %- . % amino acid identities with the korean strains, fk- and d , and the italian strain, te/ /it/ , and shared a higher sequences homologies with the chinese fekov strain, whj- , at the nucleotide ( . %- . %) and deduced amino acid ( . %- . %) levels. we next compared the similarities of each functional region among jz , cc and other kobuviruses. for jz , the highest nucleotide and amino acid identities were found in the whj- p region, with values of . % and . %, while the highest nucleotide and amino acid divergences were found in the d l region, with values of . % and . %. cc shared amino acid identities of . %- %, . %- . %, . %- . % and . %- . % with fekov reference strains at the l, p , p and p regions, respectively. furthermore, the nucleotide and amino acid homologies between cc and whj- were higher than that between jz and whj- at each region of the polyprotein gene (table ) . in order to further analyze the genetic characteristics of the complete polyprotein gene, the similarity plot analysis which compared polyprotein nucleotide sequences of jz , cc and one cakov sequence (used as a out-group sequence) to fekov reference strain d /kj (used as a query sequence) was performed in this study. the analytical results showed that jz and cc shared similar similarities with reference strain d in the vp , vp , vp , a and from a to c regions, but different similarities in the l, b, c and d regions. in the l, a and b regions, cc was more similar to reference strain than jz , while cc presented considerably lower similarities in other regions of polyprotein gene than jz . 
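the idea behind a similarity plot can be sketched as a sliding-window comparison of each sequence against the query. the code below is a simplified illustration, not the simplot algorithm itself (which can also apply distance corrections); the window and step sizes are hypothetical defaults, not the settings used in the study.

```python
def sliding_window_similarity(query, other, window=200, step=20):
    """Return (window_midpoint, similarity) pairs along a pairwise alignment.

    Similarity is the fraction of identical, non-gap columns in each window.
    `window` and `step` are illustrative defaults (assumptions).
    """
    if len(query) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    points = []
    for start in range(0, len(query) - window + 1, step):
        columns = [
            (a, b)
            for a, b in zip(query[start:start + window], other[start:start + window])
            if a != "-" and b != "-"
        ]
        if not columns:
            continue  # window contains only gapped columns
        similarity = sum(a == b for a, b in columns) / len(columns)
        points.append((start + window // 2, similarity))
    return points
```

plotting the returned points for each compared strain against the same query gives curves whose dips mark the regions of lower similarity.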
moreover, higher range of genetic variability was observed in p and p regions of polyprotein gene (fig. b) . phylogenetic analysis based on the complete polyprotein nucleotide sequences indicated that jz and cc were more closely related to fekov reference strains than other kobuvirus strains and these fekov strains formed a group distinct from cakovs, human aivs and murine kobuvirus, within the aichivirus a. in the group of fekov, jz and cc clustered with the chinese fekov strain, whj- , and formed a tight branch, while other fekov strains also formed different branches according to their geographical regions (fig. ) . these results suggested that genetic diversity of fekov was presented in different geographical regions, albeit with limited sequences support. in the past few years, the circulations of kobuvirus in cats had been reported in south korea, italy and southern china (chung et al. ; di martino et al. b; lu et al. ). however, the related data of fekov in other countries and regions is lacking. this study presents the first identification and genetic characterization of kobuvirus in cats in northeast china. we investigated fecal samples of which ( . %) were positive for kobuvirus. the prevalence rate is similar to previous reports in south korea ( . %, / ), italy ( . %, / ) and southern china ( . %, / ) (chung et al. ; di martino et al. b; lu et al. ) , suggesting that kobuvirus widely circulates in domestic cats in northeast china. the prevalence of kobuvirus in diarrhoeic cats ( . %, / ) is significantly higher than that in healthy cats ( . %, / ), similar to a previous study by cho, et al. (cho et al. ) . moreover, we also tested other enteric viruses (fpv and fbov) for kobuvirus-positive samples of which samples were positive for fpv and/or fbov. in previous investigations, the co-infection of fekov, fpv and feline enteric coronavirus (fecv) in diarrheal cats with a higher prevalence had been reported (di martino et al. 
b), and human aivs and other animal kobuviruses have also been associated with gastroenteritis (yang et al. ; zhai et al. ). these findings suggest that fekov, as a potential enteric virus, may be associated with viral diarrhea in cats. phylogenetic analysis based on the partial d gene indicates that the fekov sequences identified in this study clustered into two large groups: the majority had a higher nucleotide identity with the chinese fekov strain, whj- (lu et al. ), and formed a novel group, while only sequences fell into another group together with other fekov strains identified in south korea and italy (fig. ). moreover, two unique amino acid replacements (amino acid positions and in the complete d gene) were present in whj- and in most sequences identified in the present study (excluding jz , jz , sy , sy , sy and sy ). these results suggest that a unique evolutionary trend is present in fekov strains circulating in china, when compared with fekov strains in other countries. the vp protein of picornaviruses is not only an important capsid protein determining the antigenicity and pathogenicity of kobuviruses, but is also the most variable structural protein in kobuviruses (reuter et al. ). phylogenetic analyses based on the complete vp sequences indicated that the fekov strains identified in this study clustered together and formed a cluster separate from other fekov strains, and identical amino acid mutations were present in the c-terminal of the vp protein. additionally, different amino acid substitutions of the vp protein were observed in fekov strains identified in different regions (table ). these results suggest that the vp protein may serve as a useful indicator of the geographical distribution of fekovs.
furthermore, a recent study indicated that a polyproline helix structure, as integrin binding motifs, is present at the c-terminal of vp protein on the outer surface of human aiv, and predicted this polyproline motif may be associated with signal transduction, antigen recognition, and viral infectivity and pathogenicity ). subsequently, the identical proline-rich motif was also reported in cakovs . a polyproline fragment is present in amino acid positions - of the vp protein for all fekov strains, similar to human aivs and cakovs. interestingly, one amino acid substitution is observed in this polyproline motif for fekov strains identified in this study (one substitution from proline to serine at position ) and that identified in italy (one substitution from proline to alanine at position ). the impact on viral infectivity for this amino acid mutation needs to be further investigated via structure prediction and viral isolation. fig. . phylogenetic analysis based on the nucleotide sequences ( nt) of partial d genes for kobuviruses. the phylogenetic tree was conducted using the neighbour-joining method with , bootstrap replicates using mega . software. black triangles indicate sequences identified in the present study, and black diamond indicates the chinese fekov strain, whj- . the silhouettes of hosts for different kobuviruses were observed on the branches. aiv, aichi virus; bkv, bovine kobuvirus; cakov, canine kobuvirus; fekov, feline kobuvirus; mokov, murine kobuvirus; pkov, porcine kobuvirus. br, brazil; chn, china; de, germany; egy, egypt; hun, hungary; it, italy; jp, japan; nl, nederland; kor, south korea; uk, the united kingdom; usa, the united states. the amino acid positions are referred to the complete vp gene. bold text indicates the identical amino acid substitutions for fekov sequences that identified in the present study. the complete polyprotein gene sequences of jz ( nt) and cc ( nt) were successfully obtained in this study. 
genomic analysis showed that the polyproteins of the two strains were each cleaved into viral proteins, l, vp , vp , vp , a, b, c, a, b, c and d. the predicted cleavage sites were q/g, q/h, q/a and q/s, in accordance with the korean fekov strains, d and fk- , and the chinese strain, whj- (cho et al. ; choi et al. ; lu et al. ). one important finding is a one-amino-acid deletion in the vp gene of cc . in a previous study, a thirty-amino-acid deletion was found in the b gene of pkov from healthy piglets, and such deletions were possibly related to the pathogenicity of pkov (jin et al. ). however, in this study, jz and cc were both identified from diarrhoeic cats, so the pathogenicity of fekov seemed unaffected by this amino-acid deletion. consequently, more complete polyprotein gene sequencing of chinese fekov strains is needed to determine whether this amino-acid deletion is widespread in fekov strains circulating in china, and further targeted research is also needed to demonstrate the effect of this deletion in fekov.
table nucleotide and deduced amino acid sequence identities of the complete polyprotein gene and l, p -p regions between two chinese fekov strains, jz and cc , and other kobuviruses.
phylogenetic analysis based on the complete polyprotein sequences showed that jz and cc shared higher nucleotide and amino acid identities with the chinese strain, whj- , compared to other fekov strains, and all fekov strains were clustered according to their geographical regions, albeit with limited sequence support (fig. ). moreover, jz and cc shared high nucleotide and amino acid identities to each other in the vp , a and c regions, but low identities to fekov reference strains. recent research indicates that the a protein plays a vital role in hijacking host acyl-coa-binding domain-containing protein- (acbd ), which provides a site for the replication of kobuviruses (klima et al. ).
the c protein of most picornaviruses is also important for viral replication (fujita et al. ). therefore, these mutations in the vp , a and b regions of jz and cc may affect viral replication and antigenicity, and more targeted research is needed to determine their effects. taken together, considerable genetic diversity exists in chinese fekov strains. furthermore, these fekov strains shared high amino acid identity with cakovs and human aivs in the complete polyprotein gene and different functional regions, especially in the p region. considering the close genetic relationship of these kobuviruses and the frequent contact among their hosts, the cross-species transmission of kobuviruses is worth investigating. previous studies have provided considerable evidence for the potential risk of cross-species transmission of fekov, including frequent genetic variation in the vp gene of fekov with a substitution rate of . × − substitutions/site/year (cho et al. ), high nucleotide and amino acid identities between fekovs and cakovs (chung et al. ; di martino et al. b; lu et al. ), and the detection of igg antibodies specific for aiv in cats (carmona-vicente et al. ). moreover, serological studies are also needed to investigate kobuvirus pathogenicity and whether the genetic diversity of fekov affects pathogenicity. thus, periodic genetic and serological investigations of fekovs will be helpful for the assessment of cross-species transmission and pathogenicity of fekovs. in conclusion, we provide the first molecular evidence for the circulation of feline kobuvirus in cats in northeast china. our findings indicate that fekov was more prevalent in domestic cats with diarrhea, suggesting that fekov infections may be related to viral diarrhea in cats.
phylogenetic analyses based on partial d gene and complete vp gene indicate that the considerable genetic diversity is exhibited in chinese fekov strains, and novel fekov strains with unique evolutionary trends are circulating in china. moreover, the complete polyprotein genes of fekov strains jz and cc are successfully sequenced in this study. these findings will help us to understand the epidemics and genetics of kobuvirus in cats in china. further epidemiological and molecular investigations are also required to demonstrate the distribution, genetic diversity and potential risk of cross-species transmission of feline kobuvirus. ratification vote on taxonomic proposals to the international committee on taxonomy of viruses changes to taxonomy and the international code of virus classification and nomenclature ratified by the international committee on phylogeny and prevalence of kobuviruses in dogs and cats in the uk molecular characterization of the full kobuvirus genome in a cat genetic characteristics of the complete feline kobuvirus genome detection and genetic characterization of feline kobuviruses molecular evidence of kobuviruses in free-ranging red foxes (vulpes vulpes) molecular detection of kobuviruses in european roe deer (capreolus capreolus) in italy detection of feline kobuviruses in diarrhoeic cats membrane topography of the hydrophobic anchor sequence of poliovirus a and ab proteins and the functional effect of a/ ab membrane association upon rna replication sequence analysis reveals mosaic genome of aichi virus genetic characterization of porcine kobuvirus variants identified from healthy piglets in china characterization of a canine homolog of human aichivirus kobuviral non-structural a proteins act as molecular harnesses to hijack the host acbd protein rna-dependent rna polymerases from flaviviruses and picornaviridae prevalence and genomic characteristics of canine kobuvirus in southwest china first report and genetic characterization of feline 
kobuvirus in diarrhoeic cats in china first molecular identification of kobuviruses in wolves (canis lupus) in italy canine kobuvirus infections in korean dogs novel kobuvirus species identified from black goat with diarrhea molecular characterization of canine kobuvirus in wild carnivores and the domestic dog in africa novel picornavirus in domestic rabbits the fecal viral flora of wild rodents candidate new species of kobuvirus in porcine hosts complete nucleotide and amino acid sequences and genetic organization of porcine kobuvirus, a member of a new species in the genus kobuvirus, family picornaviridae kobuvirus in domestic sheep kobuviruses -a comprehensive review porcine kobuvirus in wild boars (sus scrofa) metagenomic analysis of the ferret fecal viral flora genetic characterization of feline bocavirus detected in cats in japan deciphering the bat virome catalog to better understand the ecological diversity of bat viruses and the bat origin of emerging infectious diseases prevalence of newly isolated, cytopathic small round virus (aichi strain) in japan isolation and characterization of a new species of kobuvirus associated with cattle aichi virus strains in children with gastroenteritis isolation and characterization of feline panleukopenia virus from a diarrheic monkey picornaviridae-the ever-growing virus family a novel porcine kobuvirus emerged in piglets with severe diarrhoea in china structure of human aichi virus and implications for receptor binding this study was supported by the research project of the national key research and development plan of china (grant no. yfd ). this study was also funded by national key r&d program of china (grant no. yfd ). 
jtn, ssy and hlw conceived and designed the experiments; jtn, ssy, hlw and ybg performed the experiments; jtn, ssy, ylz and hd analyzed the data; ybg, kw, sz and gxh contributed reagents/materials/analysis tools; jtn, ssy, hd and gxh drafted the manuscript; jtn, ssy, hd, xw and ylz revised the manuscript; hd and gxh supervised and approved the message for publication. all authors declare that they have no competing interests. key: cord- -fd xi q authors: rojas, miguel a.; gonçalves, jorge luiz s.; dias, helver g.; manchego, alberto; santos, norma title: identification of two novel rotavirus a genotypes, g and p[ ], from peruvian alpaca faeces date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: fd xi q rotavirus a (rva) alp b was detected from a neonatal peruvian alpaca presenting with diarrhea, and the alp b vp , vp , vp , nsp , and nsp genes were sequenced. the partial genotype constellation of this strain, rva/alpaca-wt/per/alp b/ , was determined to be g -p[ ]-i -e -h . rotaviruses (rv) are non-enveloped double-stranded rna viruses in the reoviridae family and in the rotavirus genus, and are classified into eight species (a-h) and two candidate species (i and j) (matthijnssens et al., ; mihalov-kovács et al., ; bányai et al., ) . species a rotavirus (rva) is a major cause of dehydrating diarrhea in humans and animals worldwide (estes and greenberg, ) . the rva genome consists of segments of double-stranded rna (dsrna) encoding six structural proteins (vp - , vp , and vp ) and five or six nonstructural proteins (nsp -nsp / ) depending on the strain (estes and greenberg, ) . the rva genomic classification nomenclature is based on all segments of dsrna (matthijnssens et al., (matthijnssens et al., , . currently, there are vp (g), vp (p), vp (i), vp (r), vp (c), vp (m), nsp (a), nsp (n), nsp (t), nsp (e), and nsp / (h) genotypes (rotavirus classification workgroup (rcwg), ). 
in this study, we described two new rva vp and vp genotypes in strain rva/alpaca-wt/per/alp b/ from a peruvian alpaca. during the first week of february , a diarrhea outbreak occurred that resulted in high rates of morbidity and mortality among neonatal alpacas in a community in silli, peru (rojas et al., a) . this community is located in the southern highlands of peru ( ° ′ . ″s, ° ′ . ″ w;~ m above sea level) in the province of canchis in the state of cusco. the animals were subjected to postmortem examinations at the laboratory of histology, embryology and veterinary pathology, universidad nacional mayor de san marcos, peru. intestinal lavage samples were obtained during necropsy by thoroughly washing the intestines with warm water, and then these samples were analyzed for e. coli, clostridium ssp. eimeria spp., cryptosporidium spp., coronavirus and rva. the sample alp b was positive only for rva (rojas et al., a) . the importation of these alpaca specimens was approved by the brazilian institute of environment (ibama; brasília, df, brazil; license br /df / / ). for rva detection the lavage samples were diluted in % v/v using sterile phosphate-buffered saline and clarified by low speed centrifugation at g for min. total rna was extracted from μl of the supernatant using the totally rna® kit, according to the manufacturer's instructions (applied biosystems/ambion, austin, usa). rva detection was performed by rt-pcr with primers that amplified a small conserved portion of the vp gene (rojas et al., b) . once rva was detected, the sample was subjected to additional rt-pcr amplifications to identify the vp , vp , vp , nsp , and nsp genotypes using specific primers (appendix) that were previously published or designed based on rva sequences available in genbank. overlapping sequences were assembled and edited using seqman, editseq, and megalign in the lasergene software package (dnastar, madison, wi). phylogenetic analysis was performed with mega software version . . 
(kumar et al., ). dendrograms were constructed using the maximum likelihood method based on the kimura two-parameter model. branch support was estimated by bootstrap analysis with pseudoreplicates. the sequences of our strain were compared to the sequences of the rva strains obtained from genbank. genotypes were assigned to each genome segment by the web-based automated rotavirus genotyping tool rotac (http://rotac.regatools.be) (maes et al., ). sequences of the alp b strain were aligned and compared to those of each corresponding gene of rva strains obtained from genbank using megalign, which is available in the lasergene software package (dnastar, madison, wi). multiple alignments were done using the complete open reading frame (orf) of each gene segment. nucleotide and amino acid identities were determined with the megalign p-distance algorithm. sequences generated in this study were deposited into genbank under accession numbers km , km , ky , ky and ky . the genotypes of the vp , nsp , and nsp genes in alp b were identified as i , e and h , respectively. the vp gene was closely related to the human strain ecu ( . % nucleotide identity). the nsp gene was closely related to the vicuña strain c ( . % nucleotide identity), and the nsp gene was % identical to strain sa , which is also from a peruvian alpaca, and closely related to the bat strain ( . % nucleotide identity) (fig. ). the nucleotide sequences of the vp and vp genes from alp b were not closely related to any rva strain available in genbank (fig. ). the highest nucleotide identity found for the vp gene was . % with the human strain ecu , and the highest nucleotide identity for the vp sequence was . %, also with strain ecu . both of these identity values were below the % cutoff value proposed by the rcwg (matthijnssens et al., ).
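the kimura two-parameter model named in the methods above corrects the observed difference between two sequences for unequal rates of transitions and transversions: d = -1/2 ln[(1 - 2p - q) sqrt(1 - 2q)], where p and q are the proportions of transitional and transversional differences. the python sketch below illustrates this formula for a single pair of aligned sequences. it is not the mega or megalign implementation, and the function name and the handling of gaps and ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Assumption: columns with gaps or ambiguous bases are ignored, and U is
    treated as T so that RNA sequences can be compared directly.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    a_norm = seq_a.upper().replace("U", "T")
    b_norm = seq_b.upper().replace("U", "T")
    for a, b in zip(a_norm, b_norm):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * q <= 0 or 1 - 2 * p - q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

for closely related sequences the corrected distance is only slightly larger than the uncorrected p-distance (p + q), and the gap between the two grows as sequences diverge.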
the vp and vp nucleotide sequences were submitted to the rcwg for further analysis and were assigned novel p[ ] and g genotypes, respectively. however, the vp sequence was a borderline new genotype, and some cross-reactivity with g at the serological level is possible. few south american camelid (sac) rva strains have been characterized, but those that have been characterized show great genotype diversity. the vicuña strain rva/vicuña-wt/arg/c / /g p[ ] (badaracco et al., ) and the guanaco strains rva/guanaco-wt/arg/chubut/ /g p[ ] and rva/guanaco-wt/arg/ríonegro/ /g p[ ] possess a bovine-like genome constellation, g -p[ / ]-i . the guanaco strains also possess e and h nsp and nsp genotypes, respectively. the vicuña strain possesses the unique nsp -e genotype, but the nsp gene was not characterized. in contrast, the vp and vp genes of the alpaca strains rva/alpaca-tc/per/k′ayra/ - / /g p[ ] and rva/alpaca-tc/per/k′ayra/ - / /g p[x] showed % and % identity to porcine and human rva strains, suggesting that they resulted from interspecies transmission (garmendia et al., ), but their nsp and nsp genes have not been characterized. only one alpaca strain has been fully characterized, rva/alpaca-tc/per/sa / , and it bears the unique constellation g -p[ ]-i -r -c -m -a -n -t -e -h . this unique genetic makeup suggests that strain sa emerged from multiple reassortment events between bat, equine, and human-like rva strains (rojas et al., b). strain alp b and strain sa were detected from the same location in peru, and alp b also has a unique genetic constellation, g -p[ ]-e -h , with high identity to camelid, bat, and human-like rva strains. the nsp -e genotype was also found in vicuña strain c from a camelid in argentina; thus, it could be a common genotype in camelids.
the nsp -h genotype of alp b was related to alpaca strain sa and bat strain , but vp -i is a rare genotype that has only been found in the human strain ecu , which was detected in ecuador in (solberg et al., ). the vp and vp genes alp b represent two new genotypes with the highest identities to the ecu strain. because of limited sample, the entire genome constellation of alp b could not be characterized and the origin of strain alp b remains unclear. moreover, because the sequences were obtained directly from the clinical sample we could not exclude the possibility of mixed infection with two or more rva strains in the sample. discovery and molecular characterization of a group a rotavirus strain detected in an argentinean vicuña (vicugna vicugna) candidate new rotavirus species in schreiber's bats, serbia rotaviruses molecular characterization of rotavirus isolated from alpaca (vicugna pacos) crias with diarrhea in the andean region of cusco mega : molecular evolutionary genetics analysis version . for bigger datasets rotac: a web-based tool for the complete genome classification of group a rotaviruses recommendations for the classification of group a rotaviruses using all genomic rna segments are human p[ ] rotavirus strains the result of interspecies transmissions from sheep or other ungulates that belong to the mammalian order artiodactyla? 
uniformity of rotavirus strain nomenclature proposed by the rotavirus classification working group (rcwg) vp -sequence-based cutoff values as a criterion for rotavirus species demarcation candidate new rotavirus species in sheltered dogs whole-genome characterization of a peruvian alpaca rotavirus isolate expressing a novel vp genotype outbreak of diarrhea among preweaning alpacas (vicugna pacos) in the southern peruvian highland newly assigned genotypes-update characterization of novel vp , vp , and vp genotypes of a previously untypeable group a rotavirus this study was supported by the conselho nacional de desenvolvimento científico e tecnológico (cnpq, grant number / - ), and the fundação carlos chagas de amparo à pesquisa do estado do rio de janeiro (faperj, grant number e- / . / ), brazil. the funders were not involved in the study design, data collection, data interpretation, or the decision to submit the work for publication. we thank soluza dos santos gonçalves for technical assistance. soluza dos santos gonçalves is a recipient of a fellowship from faperj e- / . / . the authors declare that there are no conflicts of interest. supplementary data to this article can be found online at http://dx.doi.org/ . /j.meegid. . . . key: cord- -tumtzad authors: franco-muñoz, carlos; Álvarez-díaz, diego a.; laiton-donato, katherine; wiesner, magdalena; escandón, patricia; usme-ciro, josé a.; franco-sierra, nicolás d.; flórez-sánchez, astrid c.; gómez-rangel, sergio; rodríguez-calderon, luz d.; barbosa-ramirez, juliana; ospitia-baez, erika; walteros, diana m.; ospina-martinez, martha l.; mercado-reyes, marcela title: substitutions in spike and nucleocapsid proteins of sars-cov- circulating in south america date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: tumtzad sars-cov- is a new member of the genus betacoronavirus, responsible for the covid- pandemic.
the virus crossed the species barrier and became established in the human population, taking advantage of the high affinity of the spike protein for the ace receptor to infect the lower respiratory tract. the nucleocapsid (n) and spike (s) are highly immunogenic structural proteins, and most commercial covid- diagnostic assays target these proteins. in an unpredictable epidemic, it is essential to know about their genetic variability. the objective of this study was to describe the substitution frequency of the s and n proteins of sars-cov- in south america. a total of amino acid and nucleotide sequences of the s and n proteins of sars-cov- from seven south american countries (argentina, brazil, chile, ecuador, peru, uruguay, and colombia), reported as of june , and corresponding to samples collected between march and april , were compared through substitution matrices using the muscle algorithm in mega x. forty-three sequences from colombian departments were obtained in this study using the oxford nanopore and illumina miseq technologies, following the amplicon-based artic network protocol. the substitutions d g in s and r k/g r in n were the most frequent in south america, observed in % and % of the sequences respectively. strikingly, genomes with the conserved position d were almost completely replaced by genomes with the g substitution between march and april . a similar replacement pattern was observed with r k/g r, although more marked in chile, argentina and brazil, suggesting similar introduction history and/or control strategies of sars-cov- in these countries. it is necessary to continue with the genomic surveillance of s and n proteins during the sars-cov- pandemic as this information can be useful for developing vaccines, therapeutics and diagnostic tests.
the recently emerged sars-cov- , responsible for the coronavirus disease pandemic, has caused a significant increase in the number of cases and deaths, such that about , new cases are reported globally each day (who, a, c). the first case of covid- in south america was reported in brazil on february , in a years old man traveling from italy (gob.br, ). in colombia, the first case of covid- was announced on march , in a traveler from italy, after which the number of patients has exceeded , , with over deaths (ins, ). the sars-cov- genome consists of a single, positive-stranded rna (ssrna[+]) , nucleotides long. the virus has been shown to be highly infectious and easily transmitted among human populations, even infecting other vertebrate species under laboratory conditions (shi et al., ). the sars-cov- genome has nine open reading frames (orfs); the first one, subdivided into orf a and orf b by ribosomal frameshifting, encodes the polyproteins pp a and pp ab, which are processed into non-structural proteins involved in subgenomic/genome length rna synthesis and virus replication.
the structural proteins spike (s), envelope (e), membrane (m), and nucleocapsid (n) are encoded in subgenomic mrna transcripts within orfs , , , and , respectively (sib, ; yount et al., ). the spike protein, a type i membrane glycoprotein, is the most exposed viral protein, is recognized by the cellular receptor angiotensin- -converting enzyme (ace ) during the infection of the lower respiratory tract, and is considered the main inducer of neutralizing antibodies. the n protein is associated with the rna genome to form the ribonucleocapsid and is abundantly expressed during infection. both n and s proteins are highly immunogenic and most commercial covid- diagnostic tests (molecular and immunologic) target these proteins (Álvarez-díaz et al., ; lee et al., ). j o u r n a l p r e -p r o o f journal pre-proof furthermore, non-synonymous mutations in the s and n proteins have been reported. their implications for the potential emergence of antigenically distinct and/or more virulent strains remain to be studied, although it was reported that mutations in the receptor-binding domain (rbd) of the s protein of sars-cov related viruses disrupt the antigenic structure and binding activity of rbd to ace (du et al., ). similarly, how non-synonymous mutations could impact the antibody response and the specificity and sensitivity of serological tests for covid- diagnosis is unknown. thus, identifying variable sites in these proteins can provide a valuable resource for choosing the target antigens for the development of sars-cov- vaccines, therapeutics, and diagnostic tests (du et al., ; jacofsky et al., ). the objective of this study is to describe the frequency of substitutions in s and n proteins of sars-cov- in south america.
this work was developed according to the national law / , decrees / and / , which establish that the instituto nacional de salud (ins) from colombia is the reference lab and health authority of the national network of laboratories and that, in cases of public health emergency or those in which scientific research for public health purposes is required, the ins is authorized to use the biological material for research purposes, without informed consent, which includes the anonymous disclosure of results. this study was performed following the ethical standards of the declaration of helsinki and its later amendments. the information used for this study comes from secondary sources of data that were previously anonymized and that protect patient data. nasopharyngeal swab samples from patients with suspected sars-cov- infection were processed for rna extraction using the automated magna pure lc nucleic acid extraction system (roche diagnostics gmbh, mannheim, germany) and viral rna detection was performed by real-time rt-pcr using the superscript iii platinum one-step quantitative rt-qpcr kit (thermo fisher scientific, waltham, ma, usa), following the charité-berlin protocol (corman et al., ) for the amplification of the sars-cov- e (betacoronavirus screening assay) and rdrp (sars-cov- confirmatory assay) genes. ngs of sars-cov- from patients was performed using the amplicon-based illumina and nanopore sequencing approaches, following the artic network protocol (quick, a). following cdna synthesis with superscript iv reverse transcriptase (thermo fisher scientific, waltham, ma, usa) and random hexamers (thermo fisher scientific, waltham, ma, usa), a set of -bp tiling amplicons across the whole genome of sars-cov- was generated using the primer schemes ncov- /v (quick, a).
sars-cov- specific oligonucleotides were used for the generation of amplicons by means of a high-fidelity dna polymerase (q ® high-fidelity dna polymerase, new england biolabs inc., uk, eb), in order to avoid the introduction of artificial substitutions. reads were mapped to the wuhan-hu- reference genome (nc_ . ) using bwa and bbmap (brian-jgi, ); then, assembled sequences were submitted to gisaid. substitution matrices of nucleotides and amino acids of s and n proteins were generated from a multiple sequence alignment of the reference genome against the assembled colombian sars-cov- genomes (table ) using the muscle algorithm (edgar, ) in mega x (kumar et al., ). subsequently, sars-cov- sequences from south american countries, including argentina, brazil, ecuador, peru, uruguay and other sequences from colombia available on the gisaid, ncbi, and gsa databases were analyzed (supplementary table s , and supplementary table s ). several non-synonymous substitutions were observed in the s and n proteins of the colombian sars-cov- sequences generated in this study. three amino acid substitutions were observed in the s protein; d g was present in % ( / ) of the sequences. furthermore, substitutions g v and d y were found at low frequencies of . % ( / ) and . % ( / ) respectively (table ). in the n protein, five amino acid substitutions were found, the most frequent being r k and g r in . % ( / ) of the sequences. the amino acid substitutions r c, r i and g c were found in . % ( / ), . % ( / ) and . % ( / ) of the colombian sequences, respectively (table ). some nucleotide substitutions were synonymous. the genomic resource databases ncbi, gisaid and gsa were consulted to determine the substitutions in s and n proteins of sars-cov- from south america.
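the substitution-frequency step described above can be illustrated with a minimal python sketch. it assumes the sequences are already aligned to the reference (for example with muscle) and are available as equal-length strings; the function names, the toy sequences and the position numbers are hypothetical and are not taken from the study, and this is not the mega x procedure itself.

```python
from collections import Counter

def call_substitutions(ref_aln, seq_aln):
    """List substitutions (e.g. 'D3G') between two aligned protein strings."""
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned sequences must have equal length")
    subs = []
    ref_pos = 0  # 1-based coordinate in the ungapped reference
    for r, s in zip(ref_aln, seq_aln):
        if r == "-":
            continue  # insertion relative to the reference: no reference coordinate
        ref_pos += 1
        if s == r or s in "-X":
            continue  # skip matches, gaps and undetermined residues
        subs.append(f"{r}{ref_pos}{s}")
    return subs

def substitution_frequencies(ref_aln, aligned_seqs):
    """Percentage of sequences carrying each substitution."""
    counts = Counter()
    for seq in aligned_seqs:
        counts.update(call_substitutions(ref_aln, seq))
    n = len(aligned_seqs)
    return {sub: 100.0 * c / n for sub, c in counts.items()}

# toy alignment (hypothetical data, not from the study)
reference = "MKDLR-GA"
samples = ["MKGLR-GA", "MKGLRTGA", "MKDLR-GA", "MKGLK-GX"]
freqs = substitution_frequencies(reference, samples)
print(freqs)
```

in this sketch undetermined residues are simply skipped; the study instead excluded whole sequences with undetermined bases, which would be a filtering step applied before calling these functions.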
a total of genomes reported as of june th , were analyzed, from colombia (including the genomes reported in this study), from argentina, from brazil, from chile, from ecuador, from peru and from uruguay. fifty sequences of s and of n were excluded from the analysis because of the presence of undetermined bases that did not allow the proper identification of the s and n orfs in the amino acid substitution matrices. twenty-eight and twenty-two non-synonymous substitutions were identified in the sequences of the s and n proteins respectively, in genomes of south america (table s and s ). the most frequent in s were d g ( %), v f ( . %) and p l ( . %), while the most frequent in n were r k ( . %), g r ( . %), i t ( . %) and s l ( . %). the remaining substitutions in both s and n occurred in less than % of the sequences. these included g v and d y in s, and r c and g c in n, as observed in the colombian genomes (fig. ). the analysis of substitution frequencies by country shows that the d g substitution in the s protein was frequent in argentina, brazil, chile, colombia and peru, with - % of the reported sequences (fig. a). in ecuador and uruguay the d position was predominant in march; however, by april the g substitution reached % in uruguay. in general, the percentage of genomes in south america with this substitution increased to nearly % from march to april (fig. b). the non-synonymous substitutions r k and g r, which are the hallmarks of the b. . lineage, were the most frequent in the n protein of south american sequences. both substitutions were frequent in argentina and brazil, with % and % of the reported sequences respectively (fig. a). in ecuador and chile the frequency of these substitutions was about %, while in uruguay the frequency was similar to that in colombia. furthermore, the proportion of genomes with this double substitution increased in chile, argentina and brazil from march to april.
in contrast, this proportion increased slightly in colombia and uruguay, and remained below % (fig. b). the substitution i t in the n protein was rare in argentina ( . %), chile ( . %) and uruguay ( . %), and absent in colombia, peru and ecuador. in contrast, this substitution was very frequent in brazil ( . %) (fig. c). the spatiotemporal distribution pattern of this substitution was similar to that of r k and g r, increasing from march to april in chile, argentina and brazil, in contrast to colombia and uruguay, where this substitution was almost absent in genomes registered in april (fig. d). the first covid- case in colombia was confirmed on march , , in a traveler who entered the country from italy on february , (epi_isl_ ). by june , , a total of , confirmed cases and , deaths had been reported (ins, ). this study evidenced the presence of the this lineage has been reported in samples from travelers with connection to italy (gupta and mandal, ), and was also observed in the first confirmed case of sars-cov- in colombia (epi_isl_ ) and in another patient with travel connection to spain (epi_isl_ ) (table ). furthermore, multiple countries outside italy have reported this lineage among their samples, including belgium, switzerland, vietnam, india, nigeria and mexico, demonstrating a wide distribution worldwide (gupta and mandal, ). rna viruses are known to possess high substitution rates compared to dna viruses, leading to high genetic variability and the rapid action of the evolutionary mechanisms of natural selection and genetic drift (tang et al., ). however, sars-cov- and other coronaviruses have proteins with exonuclease activity, such as nsp , with error correcting capacity (romano et al., ; subissi et al., ). although some evolutionary changes may in fact be adaptive, it is important to be careful with conclusions in the absence of an experimental model table s ).
recombinant proteins or synthetic peptides of sars-cov- are widely explored as alternatives to be used in serological tests and therapeutics against sars-cov- and related betacoronaviruses (du et al., ; jacofsky et al., ), considering that s and n proteins are the major immunogenic proteins of sars and mers coronaviruses and the first choice for producing recombinant antigens (yan et al., ). amino acid changes were found in the s and n proteins of sars-cov- circulating in south america, the most frequent being d g in s, and r k-g r and i t in n. it is necessary to continue with genomic surveillance of changes in these proteins during the sars-cov- pandemic, all the more so considering that these proteins are the most commonly used in serological and molecular tests. the identification of nucleotide substitutions, amino acid changes and their frequencies in circulating viruses can be useful for public health decision-making, including vaccine design efforts and the design of sars-cov- diagnostic tests and therapeutic compounds. the authors thank the national laboratory network for routine virologic molecular analysis of several in-house rrt-pcr protocols for sars-cov- detection in the context of genetic variability of the virus in colombia sars-cov- viral spike g mutation exhibits higher case fatality rate global spread of sars-cov- subtype with spike protein mutation d g is shaped by human genomic variations that regulate expression of tmprss and mx genes. biorxiv. brian-jgi distinct viral clades of sars-cov- : implications for modeling of viral spread detection of novel coronavirus ( -ncov) by real-time rt-pcr.
euro surveillance : bulletin europeen sur les maladies transmissibles = european communicable disease bulletin the spike protein of sars-cov-a target for vaccine and therapeutic development muscle: multiple sequence alignment with high accuracy and high throughput loss of epitopes from sars-cov- proteins for nonsynonymous mutations: a potential global threat temporal dynamics in viral shedding and transmissibility of covid- coronavirus (covid - ) en colombia. instituto nacional de salud understanding antibody testing for covid- a novel synonymous mutation of sars-cov- : is this possible to affect their antigenicity and immunogenicity? vaccines mega : molecular evolutionary genetics analysis version . for bigger datasets serological approaches for covid- : epidemiologic perspective on surveillance and control bayesian phylodynamic inference on the temporal evolution and global transmission of sars-cov- the global emergences of multiple sars-cov- sub-strains: digital annotations for human behaviors may assist automated retracing of symptomatic features and origins ncov- sequencing protocol. protocols.io a structural view of sars-cov- rna replication machinery: rna synthesis, proofreading and final capping susceptibility of ferrets, cats, dogs, and other domesticated animals to sarscoronavirus sib, . betacoronavirus. swiss institute of bioinformatics one severe acute respiratory syndrome coronavirus protein complex integrates processive rna polymerase and exonuclease activities on the origin and continuing evolution of sars-cov- phylogenetic interpretation during outbreaks requires caution who director-general's opening remarks at the media briefing on covid- - novel coronavirus ( -ncov) technical guidance: laboratory testing for -ncov in humans. 
world health organization laboratory testing of sars-cov, mers-cov, and sars-cov- ( -ncov): current status, challenges, and countermeasures severe acute respiratory syndrome coronavirus groupspecific open reading frames encode nonessential functions for replication in cell cultures and mice the authors declare no competing interest. this study was funded by the national institute of health, bogota, colombia key: cord- -w ytp q authors: lokman, syed mohammad; rasheduzzaman, m.d.; salauddin, asma; barua, rocktim; tanzina, afsana yeasmin; rumi, meheadi hasan; hossain, m.d. imran; siddiki, a.m.a.m. zonaed; mannan, adnan; hasan, m.d. mahbub title: exploring the genomic and proteomic variations of sars-cov- spike glycoprotein: a computational biology approach date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: w ytp q the newly identified sars-cov- has now been reported from around countries with more than a million confirmed human cases including more than , deaths. the genomes of sars-cov- strains isolated from different parts of the world are now available and the unique features of constituent genes and proteins need to be explored to understand the biology of the virus. spike glycoprotein is one of the major targets to be explored because of its role during the entry of coronaviruses into host cells. we analyzed whole-genome sequences and spike protein sequences of sars-cov- using multiple sequence alignment. in this study, unique variations have been identified among the genomes of sars-cov- including nonsynonymous mutations and one deletion in the spike (s) protein. among the variations detected, variations were located at the n-terminal domain and variations at the receptor-binding domain (rbd) which might alter the interaction of s protein with the host receptor angiotensin converting enzyme- (ace ). besides, amino acid insertions were identified in the spike protein of sars-cov- in comparison with that of sars-cov. 
phylogenetic analyses of spike protein revealed that bat coronaviruses have a close evolutionary relationship with circulating sars-cov- . the genetic variation analysis data presented in this study can contribute to a better understanding of sars-cov- pathogenesis. based on the results reported herein, potential inhibitors against s protein can be designed by considering these variations and their impact on protein structure. wuhan, hubei province of china in december . the death toll rose to more than , among , , confirmed cases around the globe (until april , ) [ ]. the virus causing covid- is named severe acute respiratory syndrome coronavirus (sars-cov- ). based on phylogenetic studies, sars-cov- is categorized as a member of the genus betacoronavirus, the same lineage that includes sars coronavirus (sars-cov) [ ], which caused sars (severe acute respiratory syndrome) in china during [ ]. recent studies showed that sars-cov- has a close relationship with bat sars-like covs [ , , ]. interestingly, s glycoprotein, which consists of two functional subunits, namely s and s , is characterized as the critical determinant for viral entry into host cells. the s subunit recognizes and binds to the host receptor through the receptor-binding domain (rbd) whereas s is responsible for fusion with the host cell membrane [ [ ] , [ ] , [ ] ].
mers-cov uses dipeptidyl peptidase- (dpp ) as entry receptor [ ] whereas sars-cov and sars-cov- utilize ace- (angiotensin converting enzyme- ) [ ], abundantly available in lung alveolar epithelial cells and enterocytes, suggesting s glycoprotein as a potential drug target to halt the entry of sars-cov- . sars-cov- has remarkable properties, such as a glutamine-rich aa long exclusive molecular signature (dsqqtvgqqdgsednqtttiqtivevqpqlemeltpvvqtie) in position - of polyprotein ab (pp ab) [ ], a diversified receptor-binding domain (rbd), and a unique furin cleavage site (prrar↓sv) at the s /s boundary in s glycoprotein, which could play roles in viral pathogenesis, diagnosis and treatment [ ]. to date, few genomic variations of sars-cov- have been reported [ [ ] , [ ] ]. there is growing evidence that spike protein, a amino acid long glycoprotein having multiple domains, possibly plays a major role in sars-cov- pathogenesis. viral entry to the host cell is initiated by the receptor-binding domain (rbd) of the s head. upon receptor binding, proteolytic cleavage occurs at the s /s cleavage site and two heptad repeats (hr) of the s stalk form a six-helix bundle structure, triggering the release of the fusion peptide. as it comes into close proximity to the transmembrane anchor (tm), the tm domain facilitates the membrane destabilization required for fusion between virus and host membranes [ [ ] , [ ] ]. insights into the sequence variations of s glycoprotein among available genomes are key to understanding the biology of sars-cov- infection and to developing antiviral treatments and vaccines. in this study, we have analyzed genomic sequences of sars-cov- to identify mutations between the available genomes, followed by the amino acid variations in the glycoprotein s, to foresee their impact on viral entry to the host cell from a structural biology viewpoint. analysis. the ncbi reference sequence of sars-cov- s glycoprotein, accession number yp_ , was used as the canonical sequence for the analyses of spike protein variants.
variant analyses of sars-cov- genomes were performed in the genome detective coronavirus typing tool version . , which is specially designed for this virus. the dataset was then aligned with muscle [ ]. an entropy (h(x)) plot of nucleotide variations in the sars-cov- genome was constructed using bioedit [ ]. mega x (version . . ) was used to construct the msas and the phylogenetic tree using pairwise alignment and neighbor-joining methods in clustalw [ , ]. tree structure was validated by running the analysis on bootstrap replications [ ] and the evolutionary distances were calculated using the poisson correction method [ ]. variant sequences of sars-cov- were modeled in swiss-model [ ] using the cryo-em spike protein structure of sars-cov- (pdb id vsb; [ ] ) as a template. the overall quality of the models was assessed in the rampage server [ ] by generating ramachandran plots (supplementary table ). pymol and biovia discovery studio were used for structure visualization and superposition [ , ]. . results multiple sequence alignment of the available genomes of sars-cov- was performed and variations were found throughout the , bp long sars-cov- genome, with in total variations in the utr region, synonymous variations that cause no amino acid alteration, non-synonymous variations causing a change in amino acid residue, indels, and variations in the non-coding region (supplementary table ). among the variations, variations ( synonymous, non-synonymous mutations and one deletion) were observed in the region of orf s that encodes s glycoprotein, which is responsible for viral fusion and entry into the host cell [ ]. notably, most of the sars-cov- genome sequences were deposited from the usa ( ) and china ( ) (supplementary fig. ). positional variability of the sars-cov- genome was calculated from the msa of sars-cov- whole genomes as a measure of entropy value (h(x)) [ ].
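as an illustration of the entropy measure, the per-position shannon entropy of an alignment can be sketched in python as below. this is a minimal sketch under stated assumptions, not the bioedit implementation: it uses the natural logarithm, counts only unambiguous bases (a, c, g, t), and the toy alignment is hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy H(x) = -sum(f * ln f) over the bases in one alignment column."""
    counts = Counter(c for c in column.upper() if c in "ACGT")
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        p = n / total
        h -= p * math.log(p)
    return h

def alignment_entropy(aligned_seqs):
    """Entropy profile along an alignment of equal-length sequences."""
    return [column_entropy("".join(col)) for col in zip(*aligned_seqs)]

# toy alignment (hypothetical); gaps and ambiguous bases such as N are ignored
toy = ["ACGT", "ACGT", "ATGT", "ATGN"]
profile = alignment_entropy(toy)
# 1-based positions with non-zero entropy are candidate variable sites
variable_sites = [i + 1 for i, h in enumerate(profile) if h > 0]
print(profile, variable_sites)
```

a fully conserved column gives h(x) = 0, and a column split evenly between two bases gives ln 2; hotspots correspond to the positions with the largest values.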
excluding ′ and ′ utr, ten hotspots of hypervariable positions were identified, of which seven were located at orf ab ( c>t, c>t, c>t, c>t, c>t, a>g, c>t) and one each at orf s ( a>g), orf a ( g>t), and orf ( t>c). the variability at positions and was found to be the highest among the hotspots (fig. ). the phylogenetic analysis of a total of sequences ( unique sars-cov- and different coronavirus s glycoprotein sequences) was performed. the evolutionary distances showed that all the sars-cov- spike proteins cluster in the same node of the phylogenetic tree, confirming that the sequences are similar to refseq yp_ (fig. ). bat coronaviruses have a close evolutionary relationship, as different strains were found in the nearest outgroups and clades (bat coronavirus bm - , bat hp-beta coronavirus, bat coronavirus hku ), suggesting that coronavirus has a vast geographical spread and that bat is the most prevalent host (fig. ). in other clades, the clusters were speculated through different hosts which may describe the evolutionary changes of surface glycoprotein due to cross species transmission. viral hosts reported from different spots at different times are indicative of possible recombination. the s glycoprotein sequences of sars-cov- were retrieved from the ncbi virus variation resource repository and aligned using clustalw. the position of sars-cov- spike protein domains was measured by aligning with the sars-cov spike protein (fig. ) [ , ]. from the sequence identity matrix, unique variants among unique sars-cov- spike glycoprotein sequences were identified to have substitutions and a deletion (fig. a and supplementary table ). sequences were found identical with the sars-cov- s protein reference sequence (yp_ ) while sequences were identical with the same variation of d g (supplementary table respectively due to substitution of amino acid that differs in charge.
the remaining variants were mutated with amino acids that are similar in charge (fig. a). the sars-cov- spike protein variants were superposed with the cryo-electron microscopic structure of sars-cov- spike protein [ ] (fig. ). the s subunit of spike protein, especially the heptad repeat region , fusion peptide domain, transmembrane domain, and cytoplasmic tail, was found to be highly conserved in the sars-cov and the sars-cov- variants while the s subunit was more diverse, specifically the n-terminal domain (ntd) and receptor-binding domain (rbd). the spatial distribution of s protein sequences having different variations over time reveals that most of the variants ( out of s glycoprotein sequences) were reported from the us, followed by out of sequences (including y deletion) and out of sequences from india and china, respectively (fig. ). only one variant was found in the only available sequence in the repository from each of sweden, australia, south korea and peru. interestingly, all variants were unique to the countries from which they were reported, except d g, which was found in the us and peru (fig. ). moreover, we have also analyzed sequences from brazil, italy, nepal, pakistan, spain, taiwan and vietnam, but no variation in the s glycoprotein sequence was found when compared to refseq yp_ . covid is one of the most contagious pandemics the world has ever had, with , , confirmed cases to date (april , ), and the cases have increased as much as times in less than a month [ ]. phylogenetic analysis showed that sars-cov- is a unique coronavirus presumably related to bat coronavirus (bm - , hp-betacoronavirus). during this study, we [ ] , [ ] , [ ] ]. likewise, a number of studies targeting sars-cov- spike protein have been undertaken for therapeutic measures [ ], but the unique structural and functional details of sars-cov- spike protein are still under scrutiny.
we also found a variant (r i) at the receptor binding domain (rbd) that mutated from a positively charged arginine residue to a neutral and smaller isoleucine residue (fig. i). this change might alter the interaction of the viral rbd with the host receptor because the r residue of sars-cov- is known to interact with the ace receptor for viral entry [ ]. similarly, alterations of the rbd (g s, v a, h q, and a s) could also affect the interaction of sars-cov- spike protein with other molecules, which requires further investigation. the qia and qis variants were found to have an alteration of alanine to valine (a v) and of aspartic acid to tyrosine (d y), respectively, in the alpha helix of the hr domain. previous reports have indicated that the hr domain plays a significant role in viral fusion and entry by forming helical bundles with hr , and that mutations including alanine substitution by valine (a v) in the hr region are predominantly responsible for conferring resistance to mouse hepatitis coronaviruses against hr derived peptide entry inhibitors [ ]. this study hypothesizes that the mutation (a v) found in that of sars-cov- might also have a role in the emergence of drug-resistant virus strains. also, the mutation (d h) found in the heptad repeat (hr) of sars-cov- could play a vital role in viral pathogenesis. moreover, we found that variants including one deletion out of were located within s , especially within the ntd and rbd region of glycoprotein s (fig. a), a region responsible for the preliminary interaction with the host cell receptor ace . this indicates that the ntd and rbd are very prone to mutations. however, the ntd and rbd portions harbour potential epitopes that might serve as potential peptide vaccine candidates against sars-cov- as reported in different studies [ ] [ ] [ ].
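the charge-based reasoning used above (for example arginine to isoleucine being a change from positive to neutral) can be made explicit with a small python sketch. the charge table is a simplification assumed for illustration (histidine is treated as neutral), the function name is hypothetical, and the position numbers in the examples are placeholders because the original positions are not recoverable here; only the residue pairs come from the text.

```python
# side-chain charge at physiological ph; residues not listed are treated as neutral.
# treating histidine as neutral is an assumption of this sketch.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}
LABEL = {1: "positive", 0: "neutral", -1: "negative"}

def charge_change(substitution):
    """Classify a substitution written as <ref><position><alt>, e.g. 'R100I'."""
    ref, alt = substitution[0].upper(), substitution[-1].upper()
    before = CHARGE.get(ref, 0)
    after = CHARGE.get(alt, 0)
    return LABEL[before], LABEL[after], before != after

# positions are hypothetical placeholders; only the residue pair matters here
examples = ["R100I", "A200V", "D300Y"]
report = {sub: charge_change(sub) for sub in examples}
for sub, (before, after, changed) in report.items():
    print(sub, before, "->", after, "(charge altered)" if changed else "(charge conserved)")
```

a classification of this kind only flags candidate substitutions; whether a charge change actually affects receptor binding still requires structural or experimental analysis.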
the reason for choosing the sequences from the s protein domains ntd and rbd is that they are situated on the outer surface of the virus and could therefore be more accessible to the immune system (fig. c). the variations reported herein within the outer domains of s glycoprotein could thus help to design effective epitope-based vaccines or antivirals. the sars-cov- s protein contains an additional furin protease cleavage site, prrars, in the s /s domain, which is conserved among all sequences as revealed during this study (supplementary fig. ). this unique signature is thought to make sars-cov- more virulent than sars-cov and is regarded as a novel feature of the viral pathogenesis [ ]. according to previous reports, more efficient processing of the coronavirus s protein by host cell proteases can broaden viral tropism, as is the case in influenza virus [[ ] , [ ] , [ ] , [ ] ]. apart from that, this could also help viruses to escape antiviral therapies targeting the transmembrane protease tmprss (clinicaltrials.gov, nct ), which is a well reported protease that cleaves at s /s of s glycoprotein [ ]. comparative analyses between sars-cov and sars-cov- spike glycoprotein showed % similarity between them where the most diverse region was coronavirus disease (covid- ) situation reports severe acute respiratory syndrome-related coronavirus--the species and its viruses, a statement of the coronavirus study group lim, others, a novel coronavirus associated with severe acute respiratory syndrome bats are natural reservoirs of sars-like coronaviruses, science ( -. ) huang, others, a pneumonia outbreak associated with a new coronavirus of probable bat origin pei, others, a new coronavirus associated with human respiratory disease in china genome composition and divergence of the novel coronavirus ( -ncov) originating in china cryo-em structure of the -ncov spike in the prefusion conformation, science ( -.
) structure, function, and antigenicity of the sars-cov- spike glycoprotein structure analysis of the receptor binding of -ncov fouchier, others, dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc greenough, others, angiotensin-converting enzyme is a functional receptor for the sars coronavirus functional assessment of cell entry and receptor usage for sars-cov- and other lineage b betacoronaviruses a. nitsche, others, sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor an exclusive amino acid signature in pp ab protein provides insights into the evolutive history of the novel human-pathogenic coronavirus (sars-cov ) the spike glycoprotein of the new coronavirus -ncov contains a furin-like cleavage site absent in cov of the same clade genomic variance of the -ncov coronavirus genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex interaction between heptad repeat and regions in spike protein of sars-associated coronavirus: implications for virus fusogenic mechanism and identification of fusion inhibitors muscle: multiple sequence alignment with improved accuracy and speed bioedit: a user-friendly biological sequence alignment editor and analysis program for windows / /nt mega x: molecular evolutionary genetics analysis across computing platforms the neighbor-joining method: a new method for reconstructing phylogenetic trees bootstrap confidence levels for phylogenetic trees evolutionary divergence and convergence in proteins swiss-model: homology modelling of protein structures and complexes structure validation by calpha geometry: phi, psi and cbeta deviation pymol: an open-source molecular graphics tool receptor recognition mechanisms of coronaviruses: a decade of structural studies a 
parvovirus b synthetic genome: sequence features and functional competence cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding long-term protection from sars coronavirus infection conferred by a single immunization with an attenuated vsv-based vaccine human monoclonal antibodies against highly conserved hr and hr domains of the sars-cov spike protein are more broadly neutralizing a truncated receptor-binding domain of mers-cov spike protein potently inhibits mers-cov infection and induces strong neutralizing antibody responses: implication for developing therapeutics and vaccines fusion mechanism of -ncov and fusion inhibitors targeting hr domain in spike protein role of changes in sars-cov- in the interaction with the human ace receptor: an in silico analysis coronavirus escape from heptad repeat (hr )-derived peptide entry inhibition as a result of mutations in the hr domain of the spike fusion protein development of epitope-based peptide vaccine against novel coronavirus (sars-cov- ): immunoinformatics approach in silico identification of novel b cell and t cell epitopes of wuhan coronavirus ( -ncov) for effective multi epitope-based peptide vaccine production epitope-based chimeric peptide vaccine design against s, m and e proteins of sars-cov- etiologic agent of global pandemic covid- : an in silico approach host cell proteases controlling virus pathogenicity role of hemagglutinin cleavage for the pathogenicity of influenza virus host cell proteases: critical determinants of coronavirus tropism and pathogenesis coronaviruses: an overview of their replication and pathogenesis receptor for mouse hepatitis virus is a member of the carcinoembryonic antigen family of glycoproteins laude, others, aminopeptidase n is a major receptor for the enteropathogenic coronavirus tgev 
positional organization of major structural protein-encoding genes in orange color (s = spike protein, e = envelope protein, m = membrane protein, n = nucleocapsid protein) and accessory protein orfs in blue colors. b. variability within sars-cov- genomic sequences represented by entropy (h(x)) value across genomic location key: cord- -zxerb de authors: liu, xiaoli; shao, yuhao; ma, huijie; sun, chuyang; zhang, xiaonan; li, chengren; han, zongxi; yan, baolong; kong, xiangang; liu, shengwang title: comparative analysis of four massachusetts type infectious bronchitis coronavirus genomes reveals a novel massachusetts type strain and evidence of natural recombination in the genome date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: zxerb de four massachusetts-type (mass-type) strains of infectious bronchitis coronavirus (ibv) were compared genetically with the pathogenic m and h vaccine strains using the complete genomic sequences. the results revealed that strains ck/ch/lnm/ and ck/ch/ldl/ were closely related to the h vaccine, which suggests that they might represent re-isolations of vaccine strains or variants of vaccine strains that have resulted from the accumulated point mutations after several passages in chickens. in contrast, strains ck/ch/lhlj/ vii and ck/ch/lhlj/ had a close genetic relationship with the pathogenic m strain. in addition, molecular markers have been identified that distinguish between field and vaccine (or vaccine-like) mass-type viruses, which may be able to differentiate between field and vaccine strains for diagnostic purposes. phylogenetic analysis, and pairwise comparison of full-length genomes and the nine genes, identified the occurrence of recombination events in the genome of strain ck/ch/lhlj/ vii, which suggests that this virus originated from recombination events between m - and h -like strains at the switch site located at the ′ end of the nucleocapsid (n) gene. 
to our knowledge, this is the first time that evidence for the evolution and natural recombination under field conditions between mass-type pathogenic and vaccinal ibv strains has been documented. these findings provide insights into the emergence and evolution of the mass-type ib coronaviruses and may help to explain the emergence of mass-type ibv in chicken flocks all over the world. in , schalk and hawn described ''an apparently new respiratory disease of chicks'' in north dakota in the united states, which was considered to be infectious bronchitis (ib) by later researchers of avian respiratory diseases (schalk and hawn, ). currently, ib still occurs in nearly all poultry-producing countries; it is a highly contagious, acute, and economically important viral disease of chickens. the etiology of ib, which was first demonstrated by beach and schalm ( ), is infectious bronchitis virus (ibv). ibv is grouped in the genus gammacoronavirus of the family coronaviridae in the order nidovirales (de groot et al., ). the coronavirus genomes are the largest among the known rna viruses and are polycistronic, generating a nested set of subgenomic rnas with common and sequences (masters, ). like those of all other coronaviruses, the two-thirds of the ibv genome consists of two large replicase open reading frames (orfs), orf a and orf b. the orf a polyprotein (pp a) can be extended with orf b-encoded sequences via a − ribosomal frameshift at a conserved slippery site (brierley et al., ), which generates the polyprotein pp ab, comprising more than amino acids, which includes the putative rna-dependent rna polymerase (rdrp) and rna helicase (hel) activity (ziebuhr et al., ). the pp a and pp ab of ibv are processed autocatalytically by two virus-encoded proteases, a papain-like protease (plp) and a c-like protease ( cl pro ) (lee et al., ; ziebuhr et al., , ). 
other putative domains, presumably associated with a -to- exonuclease (exon) activity, a poly(u)-specific endo-rnase (xendou) activity, and a -o-methyltransferase ( -o-mt) activity, have been predicted in pp ab (ivanov et al., ; snijder et al., ) . the end of a coronavirus genome includes the viral structural and accessory protein genes: a spike (s) glycoprotein gene; an envelope (e) protein gene; a membrane (m) glycoprotein gene; a nucleocapsid (n) phosphoprotein gene; and several orfs that encode putative non-structural accessory proteins (masters, ) . of the virus-encoded proteins, the s subunit of the s protein carries virus-neutralizing activity, determines the serotype of ibv and is responsible for viral attachment to cells. it is also a major determinant of cell tropism in culture (casais et al., ) . the accumulation of point mutations, deletions, insertions and recombination events that have been observed in multiple structural genes, especially the s gene, of ibv recovered from naturally occurring infections have been considered to contribute to the genetic diversity and evolution of ibv, and consequently, to a number of ibv serotypes (cavanagh, ) . the occurrence and emergence of multiple serotypes of the virus have complicated control by vaccination because many serotypes and variants do not confer complete cross-protection against each other (cavanagh and gelb, ) . the originally discovered massachusetts (mass) type of ibv was identified in the united states, beginning in the s (fabricant, ; johnson and marquardt, ; mondal et al., ) . mass-type strains have been isolated in europe and asia since the s and up to the present day (cavanagh and gelb, ) , together with dozens of other serotypes that have been isolated in africa, asia, india, australia, europe, and south america (cavanagh, (cavanagh, , (cavanagh, , . the first mass-type ''h'' vaccines were developed in about . 
they include h and h (bijlenga et al., ) , and are used very commonly and widely around the world. however, virus of this type is occasionally isolated from massachusetts-vaccinated and -unvaccinated flocks with respiratory clinical signs. some of the viruses have shown close genetic relationships with pathogenic mass-type, rather than vaccine, strains by s gene analysis. however, conclusions based on the genetic analysis of a single gene sequence, and sometimes even a partial gene sequence, require caution because the true phylogeny can only be demonstrated by analyzing complete genomic sequences. herein, we sequenced the complete genome of four ibv mass-type strains that showed s gene diversity (liu et al., ; ma et al., ; sun et al., ) , and we present evidence for in-field recombination between pathogenic and vaccinal strains. furthermore, we characterized the molecular variability of the four mass-type strains to gain insight into the emergence and evolution of these viruses. four mass-type ibv strains were used for complete genomic sequence comparison and analysis in this study. strain ck/ch/lhlj/ vii was isolated in from the kidney of a layer hen vaccinated with h and / in heilongjiang province, china (liu et al., ) . strain ck/ch/lnm/ was isolated in from the swollen proventricular tissues of a broiler vaccinated with h in neimenggu province, china . strains ck/ ch/ldl/ and ck/ch/lhlj/ , both of which were isolated in , were obtained from laying hens in dalian and heilongjiang provinces, respectively, in china; the birds were suffering from nephropathogenic lesions and respiratory signs, respectively. in addition, the diseased birds in both flocks were suffering from proventriculitis (ma et al., ) . all of the ibv strains have been associated with various ib outbreaks in recent years in china and were assigned to the mass-type strains by s sequence analysis. 
to avoid possible mutations in the viral genome after serial passages in specific-pathogen-free (spf) embryonated chicken eggs, the first passage of each original virus stock was used and purified once by propagating in -to -day-old spf chicken eggs with a dose of l -fold dilutions per egg, and the presence of viral particles in the allantoic fluids of inoculated eggs was confirmed by negative-contrast electron microscopy (jem- , ex) and reverse transcriptase-polymerase chain reaction (rt-pcr) as described previously. in addition, since these viruses were isolated from chickens vaccinated with h , it is possible that mixed ibv infections were present in one chicken flock. to exclude this, nine clones of the s gene of each virus, obtained from three independent pcr reactions, were sequenced and compared. for each virus, the sequences obtained were identical to the previous results (liu et al., ; ma et al., ; sun et al., ). fertile white leghorn spf chicken eggs were obtained from the laboratory animal center, harbin veterinary research institute, the chinese academy of agricultural sciences, china. to determine the full-length genomic sequences of the four viruses, pairs of overlapping primers encompassing the entire genome were used. the primers were designed in regions that are conserved among most of the ibv strains available in the genbank database. the sequences and locations of the primers used in rt and pcr in this study are presented in table . viral rna was extracted from µl of infectious allantoic fluid using trizol reagents (invitrogen, grand island, usa), following the manufacturer's protocol. complementary dna (cdna) was synthesized using µl of the first strand mixture (invitrogen) containing µm of primers n (−), . mm each of dntp (takara, dalian, china) and µl of total rna. the mixture was incubated at °c for min and then quick-chilled on ice for min. 
the rt master mix was composed of µl × rt buffer (invitrogen), µl mm dtt, u of m-mlv reverse transcriptase (invitrogen), and u rnase inhibitors (invitrogen). this rt master mix was incubated at °c for h. the reaction was terminated by heating at °c for min then chilling on ice for min. the pcr was performed in a µl reaction containing µl first strand cdna; nmol each of downstream and upstream primers; µl of × pcr buffer (mg + plus, takara); µl of . mmol dntps; u taq polymerase (takara); and µl of water. the reaction was conducted at °c for min, followed by cycles of °c for min; °c for min; °c for min, and a final extension step of °c for min. a product, detectable by ethidium bromide staining, of the expected size was generated. . . the - and -ends of the genome a cdna clone representing the and ends of the genome of the four ibv strains was synthesized according to the race and race system for rapid amplification of cdna ends (takara). pcr was performed according to the instructions accompanying the kits. the sense and antisense primers used to amplify the and -ends of the genome had been designed on the basis of the sequences obtained above that were constant in the four ibv strains, respectively. the outer and inner primers used to amplify the -end of the four ibv strains were -cagctatggcaatgcgcag- and -catctttggtgtctca/tcc- , respectively. the primer used to amplify the -end was -gaggagaggaacaatgcaca- . the dna generated by pcr amplification was cloned using a t-tailed vector, pmd -t (takara), and transformed using jm competent cells (takara) according to the manufacturer's instructions. at least five clones of each fragment in each strain were sequenced and the consensus sequence was determined. the sequences were analyzed using the sequencher . sequence analysis program, and a single contiguous sequence comprising the entire ibv genome of each of the four ibv strains was constructed. 
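the consensus step described above (at least five clones per fragment) can be illustrated with a minimal sketch. this is not the procedure implemented in the sequencher program used in the study; it is a plain column-wise majority vote over clone reads that are assumed to be already aligned to the same length, and the function name `majority_consensus` and the ambiguity symbol `n` are hypothetical choices made for illustration.

```python
from collections import Counter


def majority_consensus(clones, ambiguous="n"):
    """Column-wise majority consensus of pre-aligned, equal-length clone reads.

    Illustrative assumption: reads are already aligned; a tie between the two
    most frequent bases at a position is reported with the `ambiguous` symbol.
    """
    if not clones:
        raise ValueError("need at least one clone sequence")
    length = len(clones[0])
    if any(len(clone) != length for clone in clones):
        raise ValueError("clone sequences must be aligned to the same length")

    consensus = []
    for column in zip(*clones):
        counts = Counter(base.lower() for base in column)
        (top_base, top_count), *rest = counts.most_common(2)
        if rest and rest[0][1] == top_count:
            consensus.append(ambiguous)  # no clear majority at this position
        else:
            consensus.append(top_base)
    return "".join(consensus)


if __name__ == "__main__":
    # Five toy clone reads; two carry isolated differences (e.g. PCR errors).
    reads = ["acgt", "acgt", "aggt", "acgt", "acga"]
    print(majority_consensus(reads))
```

with five clones, an isolated difference in a single clone is outvoted at that position, which is the practical reason for sequencing several clones per fragment.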
the nucleotide and amino acid sequences of the entire genome of the four ibv strains were assembled, aligned, and compared with those of other reference ibv and turkey coronavirus (tcov) strains using the megalign program in dnastar (version , lasergene corp, madison, wi). the orfs were determined using the gene runner program version . (http://www.generunner.com) by comparison with those of other reference ibv and tcov strains. a total of ibv and tcov reference strains, for which entire genomic sequences were available in genbank database, were selected for phylogenetic analysis of full-length genomes. the selected avian coronavirus reference strains and their accession numbers are provided in table . phylogenetic analysis, accurate estimation and comparison of the -utr, gene , s , s , gene , m, gene , n and -utr of the four ibv strains was conducted with those of the mass-type strains selected in this study using the clustal v method of dnastar software and mega (liu et al., ) , and the alignments were edited manually and adjusted to remove mistakes. deletion, insertion and gene recombination were determined according to the results of the phylogenetic analysis and pairwise comparisons. the full genomic sequences of the four mass-type ibv strains described in this report have been deposited in the genbank database with accession numbers ck/ch/lnm/ jf , ck/ ch/lhlj/ vii jf , ck/ch/ldl/ jf and ck/ ch/lhlj/ jf . four mass-type ibv strains were subjected to genome sequencing and phylogenetic analysis in this study. the sequences of each the four strains were assembled into one contiguous sequence to represent the entire viral genomes. sequences of , , and nucleotides were obtained from strains ck/ch/ lnm/ , ck/ch/lhlj/ vii, ck/ch/ldl/ and ck/ch/ lhlj/ , respectively, excluding the polyadenylation tail at the end. the genomes of the viruses were similar overall in their coding capacity and genomic organization to those of other ibvs. 
the genome of each of the viruses contained two large slightly overlapping orfs in the two-thirds of the genome and multiple additional orfs in the one-third of the genome. both termini were flanked with untranslated regions (utrs). ten orfs were identified within the genome. gene contained motifs common to all coronaviruses, including ribosomal frameshifting and slippery sequences, because orf b is translated in the − frame. the typical coronavirus structural genes encoding the s, e, m and n proteins were identified following gene (fig. ). the genome organization was determined to be as follows: -utr-gene (orf a, b)-s-gene (orfs a, b, e)-m-gene (orfs a, b)-n-utr- . the analysis of the complete genome showed that strains ck/ch/lnm/ and ck/ch/ldl/ possessed . % and . % nucleotide identity with h , respectively. however, they shared . % and . % identity with m , respectively. phylogenetic analysis using the full-length genome and the -utr, gene , s , s , gene , m, gene , n and -utr showed that the ibv strains ck/ch/lnm/ and ck/ch/ldl/ consistently formed the same clade with vaccine-related strains of mass-type (figs and ). the analysis of the s gene showed that strains ck/ch/lnm/ and ck/ch/ldl/ had high nucleotide identities ( . % and . %, respectively) with h , while they had . % and . % identity with m . multiple alignments revealed that there were and nucleotide mutations within the s gene between strains ck/ch/lnm/ and ck/ch/ldl/ and h ; however, there were and mutations between strains ck/ch/lnm/ and ck/ch/ldl/ and m . all these results suggest that strains ck/ch/lnm/ and ck/ch/ldl/ are closely related to the h vaccine strain. the percent nucleotide similarity between strain ck/ch/lhlj/ and h for the full-length genomes was . %; however, the percent similarity was up to . % between strain ck/ch/lhlj/ and m . 
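the percent nucleotide identities reported above come from pairwise comparison of aligned sequences. as a rough illustration only, the sketch below computes identity over the columns of a pairwise alignment; it is not the megalign calculation used in the study, and the function name `percent_identity` and the convention for gaps (a base opposite a gap counts as a mismatch, columns that are gaps in both sequences are skipped) are assumptions made for this example.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumed convention (hypothetical, for illustration): columns where both
    sequences carry a gap are ignored; a base opposite a gap is a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.lower(), seq_b.lower()):
        if a == gap and b == gap:
            continue  # shared gap column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments: one substitution over five comparable columns.
    print(percent_identity("acgt-a", "acga-a"))
```

different programs treat gaps and ambiguous bases differently, so values from this sketch would not necessarily match those produced by any particular alignment package.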
in addition, in all of the nine trees constructed for the -utr, gene , s , s , gene , m, orf , n and -utr, the strain ck/ch/lhlj/ constantly fell into the same clusters as the pathogenic m strain, and both belonged to the mass-type. pairwise comparison of the s protein gene revealed that strain ck/ ch/lhlj/ had and nucleotide mutations with respect to m and h , respectively. taken together, these results demonstrate that strain ck/ch/lhlj/ exhibits a close genetic relationship to pathogenic m . pathogenic and non-pathogenic mass-type strains were clustered into different clades by phylogenetic analysis of full-length genomic sequences and the -utr, gene , s , s , gene , m, gene , n and -utr. in addition, insertions and deletions were also observed that distinguished between the genomes of pathogenic and non-pathogenic mass-type strains, as illustrated in fig. and supplementary material . in non-pathogenic strains, five deletions: of nucleotides, nucleotides, nucleotides, nucleotides, and nucleotides, respectively, were observed to be located in the nsp of gene . they were found to occur between genomic positions - , - , - , - , and - , respectively , by comparing the sequences with the homologous regions of pathogenic strains. in contrast, a -nucleotide and a nucleotide insertion were found in nsp and between the m gene and gene , respectively. additionally, a cluster of insertions was found at the -utr region in the non-pathogenic strains. these changes might not only account, at least partly, for viral fitness when the pathogenic virus has become adapted to egg embryos hewson et al., ) , but may act also as molecular markers, able to differentiate between vaccine and field strains, for diagnostic purposes. comparative sequence analysis based on full-length genomic sequences and the sequences of the -utr, gene , s , s , gene , m, gene and n showed that strain ck/ch/lhlj/ vii clustered with pathogenic mass-type strains. 
the exceptions were the trees constructed using the -utr and s gene, in which ck/ch/lhlj/ vii was grouped with non-pathogenic strains; this suggests that a possible recombination event may have occurred. thus the n and -utr of ck/ch/lhlj/ vii were carefully compared pairwise with those of strains ck/ch/lnm/ , ck/ch/ldl/ , ck/ch/ lnm/ , h and m . parallel to the result of the phylogenetic analysis, ck/ch/lhlj/ vii showed high similarity with m at the -end of the n gene; however, it showed high similarity with vaccine strain h at the -end of the n gene (supplementary material ). the data strongly suggest that ck/ch/lhlj/ vii arose from a homologous rna recombination event that involved a template switch between massachusetts pathogenic m -like and non-pathogenic h -like strains. we located the switch site at the -end of the n gene (supplementary material ), which implies that the template switch occurred within the n gene. the percent nucleotide similarity between strain ck/ch/lhlj/ vii and h , and ck/ch/lhlj/ vii and m , for the full-length genomes was . % and . %, respectively. alignment revealed that a -nucleotide insertion was located in nsp of the ck/ch/lhlj/ vii strain between genomic positions and (supplementary material ). in addition, the s gene of strain ck/ch/lhlj/ vii showed extensive mutations by pairwise comparison (supplementary material ), although it was grouped with h by s gene phylogenetic analysis. these and our previous results (liu et al., ) showed that, in addition to the occurrence of recombination events, ck/ch/lhlj/ vii has experienced multiple mutations and deletions in the genome over time. understanding the evolution of mass-type ibv is important because not only is this virus circulating worldwide but information on virus genomics will aid our understanding of the evolution and emergence of ibv with infectious potential in vaccinated chicken flocks. 
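the reasoning used above to place the switch site (similarity to one parental type on one side of the n gene and to the other parental type on the other side) can be sketched as a sliding-window comparison. this is a simplified illustration and not the method used in the study, which relied on phylogenetic trees and manual pairwise comparison; the function names, window size and step are hypothetical, and the three sequences are assumed to be aligned, gap-free and of equal length.

```python
def window_identity(query, parent, start, size):
    """Percent identity between query and parent within one window."""
    q = query[start:start + size]
    p = parent[start:start + size]
    matches = sum(1 for a, b in zip(q, p) if a == b)
    return 100.0 * matches / size


def sliding_similarity(query, parent_a, parent_b, window=200, step=50):
    """Return (start, identity_to_a, identity_to_b) for each window.

    Assumes three aligned, gap-free sequences of equal length (illustrative).
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    rows = []
    for start in range(0, len(query) - window + 1, step):
        rows.append((
            start,
            window_identity(query, parent_a, start, window),
            window_identity(query, parent_b, start, window),
        ))
    return rows


def parent_switches(rows):
    """List window starts where the closer parent changes (candidate breakpoints)."""
    switches = []
    previous = None
    for start, id_a, id_b in rows:
        if id_a == id_b:
            continue  # uninformative window, keep the previous assignment
        current = "a" if id_a > id_b else "b"
        if previous is not None and current != previous:
            switches.append((start, previous, current))
        previous = current
    return switches


if __name__ == "__main__":
    # Toy data: the query matches parent a in the first half, parent b after.
    parent_a = "a" * 40
    parent_b = "c" * 40
    query = "a" * 20 + "c" * 20
    rows = sliding_similarity(query, parent_a, parent_b, window=10, step=10)
    print(parent_switches(rows))
```

a change in the closer parent along the genome only flags a candidate breakpoint; as in the text above, supporting evidence from phylogenetic trees of the flanking regions would still be needed before inferring recombination.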
in this study, we focused on the full-length genomic sequences of four ibv isolates which had been shown to be of the mass-type by s gene analysis (liu et al., ; ma et al., ; sun et al., ) . based on the high degree of similarity in the full genomic sequence, it could be concluded that two ibv strains, ck/ ch/lnm/ and ck/ch/ldl/ , were very similar to the vaccine strain h . they might therefore represent re-isolations of vaccine strains, although they were isolated from vaccinated chickens with respiratory disease. similarly, ibv strains that showed a very close relationship to the h vaccine strain have been isolated from unvaccinated broiler flocks in slovenia with respiratory problems (krapež et al., ) . alternatively, these strains might be variants of vaccine strains that have resulted from accumulated point mutations after several passages in chickens. a few key mutations in the s subunit of the spike protein might result in a change to a new serotype, which is defined as a lack of cross-neutralization with specific sera against different ibv serotypes (cavanagh et al., ) . the point mutations found in the genome that distinguish between the two isolates and vaccine strain h might be the result of adaptive evolution driven by the host immune response when the vaccine strain was transmitted among chickens. adaptive evolution is the process by which genetic changes in the viral genome leading to a more fit virus population become fixed over time, and it has been reported to occur in many coronaviruses (hasoksuz et al., ; lee and jackwood, ; shi et al., ; tang et al., ; zhang et al., ) . as shown in this study, and as also occurs in other countries (dolz et al., ; rimondi et al., ; roussan et al., ) , the isolation of mass-type ibv is expected, because attenuated vaccine strains are used extensively in chicken flocks in china. 
however, vaccination is not likely to be the only explanation for the circulation of mass-type virus, because ck/ch/lhlj/ and ck/ch/lhlj/ vii were most closely related to a massachusetts pathogenic type strain, m . the isolation of a massachusetts pathogenic strain from h -vaccinated chicken flocks may be due to vaccination failure in these flocks. alternatively, molecular studies have shown that only a few changes in the amino acid composition of the s spike protein can result in immune failure, even when the majority of the virus genome remains unchanged (cavanagh et al., ). our findings showed that mutations had occurred in the genomes of both ck/ch/lhlj/ and ck/ch/lhlj/ vii, especially in the s gene of ck/ch/lhlj/ vii, although it grouped with the h strain in the phylogenetic analysis, implying that strain ck/ch/lhlj/ vii has evolved over time. it has been reported that amino acid changes may result from immunological pressure caused by the widespread use of vaccines (cavanagh et al., ). the occurrence of recombination events is another process that allows new strains to emerge, and this has been well documented in ibv (hughes, ; jia et al., ; kottier et al., ; kusters et al., ; wang et al., ) and other coronaviruses (baric, , ; makino et al., ). it is believed that the conditions for recombination amongst ibv strains in the field are as follows: an extremely large number of chickens, most maintained at high density; the ease of spread of the virus; and serotype cocirculation, including proof of co-infection with more than one serotype in a given flock (cavanagh, ). in china, intensive chicken farms are concentrated in many provinces, including heilongjiang, where ck/ch/lhlj/ vii was isolated (liu et al., ). almost all the chickens in china receive mass-type vaccines at a very young age and subsequently receive this vaccine a couple more times during the rest of their life span. 
therefore the vaccine virus exists constantly in chickens; it may persist in various internal organs for days or longer (cavanagh and gelb, ). generally, vaccination using the h vaccine provides full protection against pathogenic mass-type ibvs and prevents the same type of pathogenic strain from replicating and spreading in the flocks. however, a single amino acid substitution at position of the s subunit of the spike has resulted in escape mutants of mass (cavanagh et al., ). this may have occurred in the case of the ck/ch/lhlj/ vii s gene (liu et al., ), which might have made it possible for both pathogenic and vaccine strains to co-exist in a given flock, leading to the occurrence of recombination. similarly, an escape mutant could be a result of adaptive evolution driven by the host immune response. consequently, it is likely that genetic changes due to adaptive evolution and recombination both contributed to the origin and evolution of strain ck/ch/lhlj/ vii: it is possible that adaptive evolution created a mutant, followed by recombination between mass - and h -like strains to create a novel virus. the recombination events from which the ck/ch/lhlj/ vii virus resulted can be explained by a scenario in which the recombination involved two parental viral strains, with initiation of rna replication in an m -like template of either negative or positive polarity (liao and lai, ). this would be followed by switching of the polymerase-nascent crna complex to an h -like virus template. the switch may have occurred at the -end of the n gene. in general, for a recombinant virus to emerge and establish itself in the field, it must be viable and have selective advantages. it has been reported that uptake of canine coronavirus (ccv) sequences by type ii feline coronavirus (fcov) may have led to increased viral fitness when compared with type i fcov (herrewegh et al., ). 
recombination can also result in the emergence of new strains with distinct characteristics, such as pathogenicity and tissue tropism (worobey and holmes, ). in addition, in the cov genome, as with most rna viruses, the -utrs usually harbor important structural elements that are involved in replication and/or translation (chang et al., ; raman et al., ; raman and brian, ; goebel et al., ; züst et al., ). in ibv, the -utr binds to the n protein, which is essential for the synthesis of negative-strand viral rna. perhaps the acquisition of the end of the n gene and the -utr from an h -like virus by an m -like virus (e.g. ck/ch/lhlj/ vii) may alter the efficiency of viral replication. this alteration may in turn affect pathogenicity. however, it remains unknown whether this is the true origin of ck/ch/lhlj/ vii, and therefore this strain is of particular importance to the surveillance of ibv in china. it will be of equal importance to examine future outbreaks of ibv in chickens by full-length genomic sequence analysis in the context of novel recombination events among ibv strains. furthermore, investigations using reverse genetic systems might provide further insight into these issues and increase our understanding of ibv pathogenesis. 
a filterable virus, distinct from that of laryngotracheitis, the cause of a respiratory disease of chicks development and use of the h strain of avian infectious bronchitis virus from the netherlands as a vaccine: a review an efficient ribosomal frame-shifting signal in the polymeraseencoding region of the coronavirus ibv recombinant avian infectious bronchitis virus expressing a heterologous spike gene demonstrates that the spike protein is a determinant of cell tropism commentary: a nomenclature for avian coronavirus isolates and the question of species status severe acute respiratory syndrome vaccine development: experiences of vaccination against avian infectious bronchitis coronavirus coronaviruses in poultry and other birds coronavirus avian infectious bronchitis virus location of the amino acid differences in the s spike glycoprotein subunit of closely related serotypes of infectious bronchitis virus amino acids within hypervariable region of avian coronavirus ibv (massachusetts serotype) spike glycoprotein are associated with neutralization epitopes infectious bronchitis variations in the spike protein of the /b type of infectious bronchitis virus in the field and during alternate passage in chickens and embryonated eggs a cis-acting function for the coronavirus leader in defective interfering rna replication virus taxonomy, classification and nomenclature of viruses, ninth report of the international committee on taxonomy of viruses, international union of microbiological societies, virology division molecular epidemiology and evolution of avian infectious bronchitis virus in spain over a fourteen-year period the early history of infectious bronchitis evidence for variable rates of recombination in the mhv genome map locations of mouse hepatitis virus temperature sensitive mutants: confirmation of variable rates of recombination a hypervariable region within the cis-acting element of the murine coronavirus genome is nonessential for rna synthesis but 
affects pathogenesis a -year analysis of molecular epidemiology of avian infectious bronchitis coronavirus in china biologic, antigenic, and full-length genomic characterization of a bovine-like coronavirus isolated from a giraffe feline coronavirus type ii strains - and - originate from a double recombination between feline coronavirus type i and canine coronavirus the present of viral subpopulations in an infectious bronchitis virus vaccine with differing pathogenicity -a preliminary study recombinational histories of avian infectious bronchitis virus and turkey coronavirus multiple enzymatic activities associated with severe acute respiratory syndrome coronavirus helicase a novel variant of infectious bronchitis virus resulting from recombination among three different strains the neutralizing characteristics of strains of infectious bronchitis virus as measured by the constant virus variable serum methods in chicken tracheal cultures experimental evidence of recombination in coronavirus infectious bronchitis virus molecular analysis of infectious bronchitis viruses isolated in slovenia between and : a retrospective study sequence evidence for rna recombination in field isolates of avian coronavirus infectious bronchitis virus origin and evolution of georgia (ga ), a new serotype of avian infectious bronchitis virus the complete sequence ( kilobases) of murine coronavirus gene encoding the putative proteases and rna polymerase rna recombination in a coronavirus: recombination between viral genomic rna and transfected rna fragments molecular characterization and pathogenicity of infectious bronchitis coronaviruses: complicated evolution and epidemiology in china caused by cocirculation of multiple types of infectious bronchitis coronaviruses genetic diversity of avian infectious bronchitis coronavirus in recent years in china high-frequency rna recombination of murine coronaviruses the molecular biology of coronaviruses isolation and characterization of a novel 
antigenic subtype of infectious bronchitis virus serotype de stem-loop iii in the untranslated region is a cis-acting element in bovine coronavirus defective interfering rna replication stem-loop iv in the untranslated region is a cisacting element in bovine coronavirus defective interfering rna replication molecular characterization of avian infectious bronchitis virus strains from outbreaks in argentina infectious bronchitis virus in jordanian chickens: seroprevalence and detection an apparently new respiratory disease of chicks evolutionary implications of avian infectious bronchitis virus (aibv) analysis unique and conserved features of genome and proteonome of sars-coronavirus, an early split-off from the coronavirus group lineage phylogenetic analysis of infectious bronchitis coronaviruses newly isolated in china, and pathogenicity and evaluation of protection induced by mass serotype h vaccine against strains of the lx -type (qx) differential stepwise evolution of sars coronavirus functional proteins in different host species evidence of natural recombination within the s gene of infectious bronchitis virus evolutionary aspects of recombination in rna viruses adaptive evolution of the spike gene of sars coronavirus: changes in positively selected sites in different epidemic groups virus-encoded proteinases and proteolytic processing in the nidovirales the autocatalytic release of a putative rna virus transcription factor from its polyprotein precursor involves two paralogous papain-like proteases that cleave the same peptide bond genetic interactions between an essential cis-acting rna pseudoknot, replicase gene products, and the extreme end of the mouse coronavirus genome supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/ . /j.meegid. . . . 
key: cord- -iyteg h authors: shesheer kumar, munpally; venkateswara rao, khareedu; mohammed habeebullah, chittor; dashavantha reddy, vudem title: expression of alternate reading frame protein (f ) of hepatitis c virus in escherichia coli and detection of antibodies for f in indian patients date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: iyteg h apart from the core ( kd), a novel hepatitis c virus (hcv) frame shift protein (f ) is synthesized from the initiation codon of the polyprotein sequence followed by ribosomal frame shift into the − /+ reading frame. to date, no information is available on f protein of indian isolates, and hence detection of antibodies for f protein in indian patients assumes great relevance. specific primers have been designed to amplify sequence coding for aa of truncated f (tf ). the amplified tf has been cloned in bacterial expression vector, pet b for expression in escherichia coli. partially purified expressed protein has been subjected to western blot analysis using patients’ sera. three hcv positive sera employed in western analysis showed positive signals to tf , while sera from uninfected individuals failed to give any signals. further, results of western blots, carried out with patients sera titrated with purified core protein, confirmed the presence of antibodies specific to f . the positive signal observed for f in western analysis with hcv infected sera suggests that f protein is synthesized in the natural course of hcv infection in indian patients as well. presence of antibodies against f protein of subtype c has been demonstrated, for the first time, in indian patients. hepatitis c virus (hcv) is the major causative agent of posttransfusion and parenterally transmitted, non-a, non-b hepatitis throughout the world (alter and seeff, ) . hcv is an enveloped rna virus that is classified in the family flaviviridae (robertson et al., ) . 
hcv has high genomic variability, and at least six different genotypes and an increasing number of subtypes have been reported (simmonds, ) . the genome of hcv comprises a single-stranded positive-sense rna of ~ . kb in length and contains a single open reading frame (orf) that encodes a non-functional polyprotein of about amino acids (grakoui et al., ) . this polyprotein is cleaved co- and post-translationally by cellular and viral proteases to yield different functional proteins. structural proteins are the major components of the mature virion; they are coded by one quarter of the orf and arranged as c-e -e and p , while the non-structural proteins are coded by the remaining three-quarters of the orf in the order ns , ns , ns a, ns b, ns a and ns b (barbara and contreras, ) ; these proteins are involved in polyprotein processing and replicative functions of the virus (suzuki et al., ; suzuki et al., ) . translation of the hcv polyprotein sequence was reported to be regulated by a cap-independent mechanism that requires most of the -non-coding region and the first nine codons of the polyprotein-coding sequence to serve as the internal ribosomal entry sequence (ires) (rijnbrand and lemon, ) . initial expression studies indicated that, besides the core protein ( kd), another protein ( kd) was also expressed from the same core protein coding sequence both in vitro and in mammalian cells, and it was thought to be a truncated core protein (lo et al., , basu et al., ) of family flaviviridae, revealed that a novel translation mechanism of a ribosomal frame shift exists within the capsid-encoding region, which results in a frame shift protein (varaklioti et al., ; walewski et al., ; xu et al., ) . the frame shift protein was named f or arfp (alternative ribosomal frame shift protein) based on translation initiated at a non-aug codon in the − /+ reading frame relative to the polyprotein of hcv (baril and brakier-gingras, ) . 
similar synthesis of capsid protein via frame shift was also observed in various other viruses such as sars-cov (baranov et al., ) . it was first demonstrated that the kd protein was synthesized by ribosomal frame shift and was mostly derived from the coding sequence that overlaps the hcv core protein reading frame (xu et al., ; choi et al., ) . the expressed f protein was localized in the cytoplasm of hepg cells, with a notable perinuclear localization (roussel et al., ) , and was found to be associated with the endoplasmic reticulum . this subcellular localization of hcv f protein is similar to that of the hcv core and ns a proteins, raising the hypothesis that the f protein may participate in hcv morphogenesis or replication . in addition, sera from patients who were positive for hcv genotype a or b were shown to react differently to synthetic peptides of f (boulant et al., ) . the present study mainly deals with the cloning and expression of f coding sequence of the hcv indian isolate belonging to genotype c. further, antibodies against f protein have been detected in indian patients. the recombinant plasmid containing hcv core coding sequence, belonging to genotype c (genbank acc. no. ay ), was used as template for amplification of truncated f (tf ) coding sequence employing f f att cat atg gca cga atc cta aac c and f r att aag ctt acc caa att gcg tga cct gc as forward and reverse primers, respectively. the pcr amplification was performed using conditions of c/ s, c/ s, c/ min for cycles and a final extension of c/ min. pcr product was gel purified and subjected to restriction digestion using ndei and hindiii and subsequently cloned at same sites of pet b. the pet b-tf was subjected to automated dna sequencing. e. coli bl (de ) competent cells were transformed with pet b-tf to carryout the expression studies. western blot analysis was carried out using three positive (anti-hcv and hcv rna positive) and three negative sera (anti-hcv and hcv rna negative). 
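the cloning strategy above relies on the forward and reverse primers carrying the ndei and hindiii recognition sites, respectively. a minimal sketch of how this can be checked is given below; the recognition sequences (catatg for ndei, aagctt for hindiii) are standard, while the function name `find_sites` and the dictionary layout are illustrative assumptions and not part of the original protocol.

```python
# recognition sequences of the two enzymes used for cloning (both palindromic)
SITES = {"ndei": "CATATG", "hindiii": "AAGCTT"}

def find_sites(primer: str) -> dict:
    """return {enzyme: 0-based offset} for each recognition site found in a primer.

    spaces (used in the text to group codons) are ignored.
    """
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}

# primer sequences exactly as quoted in the text
forward = "att cat atg gca cga atc cta aac c"
reverse = "att aag ctt acc caa att gcg tga cct gc"

print(find_sites(forward))
print(find_sites(reverse))
```

in both primers the site sits after a three-nucleotide overhang, which is consistent with the double digestion and directional cloning described in the text.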
westerns were also performed employing patients' sera titrated with purified core protein. the deduced amino acid sequences in + reading frame of the standard reference set representing various genotypes were utilized in multiple alignment of f sequences. the phylogenetic tree was generated based on alignment using clustalw program (http://swift.embl-heildelberg.de). analyses of f coding − /+ reading frame indicated a protein product of aa in the indian isolate ay belonging to genotype c. different genotypes of hcv were reported to code for varied lengths of f : the genotype a encoded aa, b coded for aa and a coded for aa (kolykhalov et al., ; lohmann et al., ; yanagi et al., ; xu et al., ) .
fig. . nucleotide sequence of clone pet b-tf with deduced amino acids. the deduced amino acid sequence of f is represented in red and the sequence in black indicates the amino acids derived from pet b plasmid. the sequence in bold represents the sequence of cloning sites.
fig. . phylogenetic analysis based on deduced amino acid sequences of f of various genotypes of hcv. the phylogenetic analysis was done using clustalw and the tree was constructed using treeview program. scale bar shows number of nucleotide substitutions per site.
pcr amplification product of ~ bp region coding for truncated f (tf ) was cloned at ndei and hindiii sites of pet b. the clone pet b-tf released a fragment of ~ bp upon double digestion with ndei and hindiii. the clone having the tf insert when subjected to sequencing revealed the presence of f coding sequence (fig. ) . the deduced aa of tf sequence subjected to blast search exhibited domain based identity of - % with poco , a core frame shift product of hcv isolate h belonging to type a. sequence alignment of deduced aa of f belonging to different genotypes displayed substantial diversity in f sequences. despite these variations, presence of various conserved aa clusters in f indicates the conserved nature of its secondary structure among isolates. 
phylogenetic analysis of f showed close clustering of sequences belonging to various subtypes of specific genotype (fig. ) implying that f sequences are genotype specific. motif search analysis of f revealed the presence of casein kinase -phosphorylation site; protein kinase c-phosphorylation site and ldl class b (ldlrb) receptor binding site. the function of f protein in the life cycle of hcv remains unknown (baril and brakier-gingras, ) . presence of a binding site for ldlrb indicates the possibility of interaction of f with lipids in the natural course of infection. the protein expressed upon iptg induction yielded a ~ kd band which was absent in un-induced samples (fig. a) . the expressed protein tf was in the insoluble fraction as inclusion bodies. inclusion bodies were purified by washing the pellet after lysis using . % triton-x and subsequently the pellet was dissolved in phosphate buffer (ph . ) containing . % sodium lauroyl sarcosine (sls). the partially purified tf was employed in western blot analyses. three hcv positive sera employed in western analysis showed the presence of antibodies to tf , while sera from uninfected individuals failed to give any signals (fig. b) . similar results were observed with patients' sera titrated with purified core protein (fig. c ). purified core protein was electro-transferred on to nitrocellulose membrane from sds-page. several strips of the membrane containing purified core protein were used to titrate out anti-core antibodies in three different positive sera. finally a western blot without a signal for purified core protein, when titrated sera were used, confirmed the absence of anti-core antibodies. the positive signal observed for f in western blot analysis with hcv infected sera and its absence with uninfected sera suggests that f protein is plausibly synthesized in the natural course of hcv infection in indian patients as well. 
an overview of the results amply indicates that f protein is also synthesized in the natural course of hcv in indian patients. phylogenetic analysis of f of various hcv isolates revealed that aminoacid sequences of f are genotype specific. n, n, n : three different negative sera were used as negative controls. (c) western blot analysis of tf using patients' sera titrated with purified c . p, p, p: three different positive sera used as primary antibody after titration with purified core protein. n represents negative sera and p for positive sera. establishment of the presence of antibodies to f , in this investigation, emphasizes the need for further studies dealing with the role of f in hcv pathogenesis. recovery, persistence, and sequelae in hepatitis c virus infection: a perspective on long-term outcome programmed ribosomal frameshifting in decoding the sars-cov genome non-a, non-b hepatitis and the anti-hcv assay translation of the f protein of hepatitis c virus is initiated at a non-aug codon in a + reading frame relative to the polyprotein functional properties of a kda protein translated from an alternative open reading frame of the core encoding genomic region of hepatitis c virus unusual multiple recoding events leading to alternative forms of hepatitis c virus core protein from genotype b triple decoding of hepatitis c virus rna by programmed translational frameshifting expression and identification of hepatitis c virus polyprotein cleavage products transmission of hepatitis c by intrahepatic inoculation with transcribed rna comparative studies of the core gene products of two different hepatitis c virus isolates: two alternative forms determined by a single amino acid substitution differential subcellular localization of hepatitis c virus core gene products replication of subgenomic hepatitis c virus rnas in a hepatoma cell line internal ribosome entry site-mediated translation in hepatitis c virus replication classification, nomenclature, and database 
development for hepatitis c virus (hcv) and related viruses: proposals for standardization characterization of the expression of the hepatitis c virus f protein viral heterogeneity of hepatitis c virus processing and functions of hepatitis c virus proteins molecular biology of hepatitis c virus alternate translation occurs within the core coding region of the hepatitis c viral genome evidence for a new hepatitis c virus antigen encoded in an overlapping reading frame synthesis of a novel hepatitis c virus protein by ribosomal frame shift hepatitis c virus f protein is a short lived protein associated with the endoplasmic reticulum hepatitis c virus: an infectious molecular clone of a second major genotype ( a) and lack of viability of intertypic a and a chimeras msk is highly grateful to the csir, govt. of india, new delhi, for the award of fellowship. authors extend their thanks to prof. t. papi reddy, former head, department of genetics, o.u, for his helpful suggestions in improving the manuscript. key: cord- -bsypo l authors: van dorp, lucy; acman, mislav; richard, damien; shaw, liam p.; ford, charlotte e.; ormond, louise; owen, christopher j.; pang, juanita; tan, cedric c.s.; boshier, florencia a.t.; ortiz, arturo torres; balloux, françois title: emergence of genomic diversity and recurrent mutations in sars-cov- date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: bsypo l sars-cov- is a sars-like coronavirus of likely zoonotic origin first identified in december in wuhan, the capital of china's hubei province. the virus has since spread globally, resulting in the currently ongoing covid- pandemic. the first whole genome sequence was published on january , , and thousands of genomes have been sequenced since this date. this resource allows unprecedented insights into the past demography of sars-cov- but also monitoring of how the virus is adapting to its novel human host, providing information to direct drug and vaccine design. 
we curated a dataset of public genome assemblies and analysed the emergence of genomic diversity over time. our results are in line with previous estimates and point to all sequences sharing a common ancestor towards the end of , supporting this as the period when sars-cov- jumped into its human host. due to extensive transmission, the genetic diversity of the virus in several countries recapitulates a large fraction of its worldwide genetic diversity. we identify regions of the sars-cov- genome that have remained largely invariant to date, and others that have already accumulated diversity. by focusing on mutations which have emerged independently multiple times (homoplasies), we identify filtered recurrent mutations in the sars-cov- genome. nearly % of the recurrent mutations produced non-synonymous changes at the protein level, suggesting possible ongoing adaptation of sars-cov- . three sites in orf ab in the regions encoding nsp , nsp , nsp , and one in the spike protein are characterised by a particularly large number of recurrent mutations (> events) which may signpost convergent evolution and are of particular interest in the context of adaptation of sars-cov- to the human host. we additionally provide an interactive user-friendly web-application to query the alignment of the sars-cov- genomes. on december , china notified the world health organisation (who) about a cluster of pneumonia cases of unknown aetiology in wuhan, the capital of the hubei province. the initial evidence was suggestive of the outbreak being associated with a seafood market in wuhan, which was closed on january . the aetiological agent was characterised as a sars-like betacoronavirus, later named sars-cov- , and the first whole genome sequence (wuhan-hu- ) was deposited on ncbi genbank on january ( ) . human-to-human transmission was confirmed on january , by which time sars-cov- had already spread to many countries throughout the world. 
further extensive global transmission led to the who declaring covid- as a pandemic on march . betacoronaviridae comprise a large number of lineages that are found in a wide range of mammals and birds ( ) , including the other human zoonotic pathogens sars-cov- and mers-cov. the propensity of betacoronaviridae to undergo frequent host jumps supports sars-cov- also being of zoonotic origin. to date, the genetically closest-known lineage is found in horseshoe bats (batcov ratg ) ( ) . however, this lineage shares % identity with sars-cov- , which is not sufficiently high to implicate it as the immediate ancestor of sars-cov- ( ) . the zoonotic source of the virus remains unidentified at the date of writing (april ). the analysis of genetic sequence data from pathogens is increasingly recognised as an important tool in infectious disease epidemiology ( , ) . genetic sequence data sheds light on key epidemiological parameters such as doubling time of an outbreak/epidemic, reconstruction of transmission routes and the identification of possible sources and animal reservoirs. additionally, whole-genome sequence data can inform drug and vaccine design. indeed, genomic data can be used to identify pathogen genes interacting with the host and allows characterization of the more evolutionarily constrained regions of a pathogen genome, which should be preferentially targeted to avoid rapid drug and vaccine escape mutants. there are thousands of global sars-cov- whole-genome sequences available on the rapid data sharing service hosted by the global initiative on sharing all influenza data (gisaid; https://www.epicov.org) ( , ) . 
the extraordinary availability of genomic data during the covid- pandemic has been made possible thanks to a tremendous effort by hundreds of researchers globally depositing sars-cov- assemblies (table s ) and the proliferation of close to real time data visualisation and analysis tools including nextstrain (https://nextstrain.org) and cov-glue (http://cov-glue.cvr.gla.ac.uk). in this work we use this data to analyse the genomic diversity that has emerged in the global population of sars-cov- since the beginning of the covid- pandemic, based on a download of assemblies. we focus in particular on mutations that have emerged independently multiple times (homoplasies) as these are likely candidates for ongoing adaptation of sars-cov-measured via the site specific consistency index. for this analysis all ambiguous sites in the alignment were set to 'n'. to assess whether any particular open reading frame (orf) showed evidence of more homoplasies than expected given the length of the orf, an empirical distribution was obtained by sampling, with replacement, equivalent length windows and recording the number of homoplasies detected (table s ) . homoplasyfinder identified homoplasies ( excluding masked sites), which were distributed over the sars-cov- genome ( figure s , table s ). of these, sites have a derived allele at > % of the total isolates. however, homoplasies can arise due to convergent evolution (putatively adaptive), recombination, or via errors during the processing of sequence data. the latter is particularly problematic here due to the mix of technologies and methods employed by different contributing research groups. we therefore filtered identified homoplasies using a set of thresholds attempting to circumvent this problem (filtering scripts and figures are available at https://github.com/liampshaw/cov-homoplasy-filtering). 
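the window-resampling test described above (comparing the number of homoplasies in an orf against an empirical distribution from equal-length windows drawn with replacement) can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' script: the function names, the 1-based inclusive coordinates, the default of 1000 draws and the fixed seed are all hypothetical choices.

```python
import random
from bisect import bisect_left, bisect_right

def count_in_window(sorted_positions, start, end):
    """number of positions p with start <= p <= end (positions must be sorted)."""
    return bisect_right(sorted_positions, end) - bisect_left(sorted_positions, start)

def window_percentile(positions, genome_len, orf_start, orf_end, n_draws=1000, seed=0):
    """return (observed count, percentile of that count among random windows).

    windows have the same length as the orf and are drawn uniformly,
    with replacement, from all start coordinates that fit in the genome.
    """
    pos = sorted(positions)
    length = orf_end - orf_start + 1
    observed = count_in_window(pos, orf_start, orf_end)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = []
    for _ in range(n_draws):
        start = rng.randint(1, genome_len - length + 1)
        draws.append(count_in_window(pos, start, start + length - 1))
    at_or_below = sum(d <= observed for d in draws)
    return observed, 100.0 * at_or_below / n_draws
```

an orf whose observed count falls in the upper tail of the returned percentile would be flagged as carrying more homoplasies than expected for its length.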
in summary, for each homoplasy we computed pnn, the proportion of isolates with the homoplasy whose nearest neighbouring isolate in the phylogeny also carried the homoplasy (excluding identical sequences). this metric ranges between pnn= (all isolates with the homoplasy present as singletons) and pnn= (no singletons i.e. clustering of isolates with the homoplasy in the phylogeny). we reasoned that artefactual sequencing homoplasies would tend to show up as singletons, so we excluded all homoplasies with pnn< . from further analysis. to obtain a set of high confidence homoplasies, we then used the following criteria: ≥ . % isolates in the alignment share the homoplasy (equivalent to > isolates), pnn> . , and derived allele found in strains sequenced from > originating lab and > submitting lab. we also required the proportion of isolates where the homoplasic site was in close proximity to an ambiguous base (± bp) to be zero. the application of these various filters reduced the number of homoplasies to (table s ) . we also plotted the distributions of cophenetic distances between isolates carrying each homoplasy compared to the distribution for all isolates ( figure s ) , and inspected the distribution of all identified homoplasies in the phylogenies from our own analyses and on the phylogenetic visualisation platform provided by nextstrain. finally, we examined whether ambiguous bases were seen more often at homoplasic sites than at random bases (excluding masked sites), which was not the case ( figure s ). to further validate the homoplasy detection method applied to the alignment of the sars-cov- genome assemblies, we took advantage of the genome sequences for which raw reads were available on the short read archive (sra). 
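the nearest-neighbour proportion pnn defined above can be sketched in a few lines. the sketch assumes that the nearest neighbour of every isolate has already been extracted from the phylogeny (for example from cophenetic distances) and is supplied as a plain dictionary; the function names and the threshold arguments are hypothetical, and no threshold values are filled in because they are not recoverable from the text.

```python
def pnn(carriers, nearest_neighbour):
    """proportion of carrier isolates whose nearest neighbour also carries the homoplasy.

    carriers: iterable of isolate ids carrying the derived allele
    nearest_neighbour: dict mapping isolate id -> id of its nearest neighbour
    """
    carriers = set(carriers)
    if not carriers:
        raise ValueError("at least one carrier isolate is required")
    hits = sum(nearest_neighbour[i] in carriers for i in carriers)
    return hits / len(carriers)

def passes_filters(n_carriers, n_isolates, pnn_value, n_labs, min_fraction, min_pnn, min_labs):
    """apply frequency, clustering and lab-count thresholds (all supplied by the caller)."""
    return (n_carriers / n_isolates >= min_fraction
            and pnn_value > min_pnn
            and n_labs > min_labs)

# toy phylogeny: a and b are each other's nearest neighbours, as are c and d
nn = {"a": "b", "b": "a", "c": "d", "d": "c"}
print(pnn({"a", "b", "c"}, nn))  # a and b cluster, c is a singleton carrier
print(pnn({"a"}, nn))            # a lone carrier can never have a carrier neighbour
```

the second call illustrates why a homoplasy seen in a single isolate necessarily receives the lowest possible pnn.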
a variant calling pipeline (available at https://github.com/damienfr/cov-homoplasy) was used to obtain high-confidence alignments for the (out of as of april ) sra genomic datasets both meeting our quality criteria and matching gisaid assemblies. the topology of the maximum likelihood phylogeny of these samples was compared to that of the corresponding samples from the gisaid genome assemblies using a mantel test and the phytools r package ( ) (figures s -s , see supplementary text). ≥ %), and homoplasies were kept in the sra dataset and in the gisaid dataset, respectively. nine sites were detected in both datasets. for sites which failed the filtering thresholds, this was largely due to the low number of studied accessions, which increases the probability of an isolated strain displaying a homoplasy e.g. if n= isolates have a homoplasy, by definition they cannot be nearest neighbours, so pnn= . the alignment was translated to amino acid sequences using seaview v ( ) . sites were identified as synonymous or non-synonymous and amino acid changes corresponding to these mutations were retrieved via multiple sequence alignment. we assessed the change in hydrophobicity and charge of amino acid residues arising due to homoplastic non-synonymous mutations using the hydrophobicity scale proposed by janin ( ) . the ten most hydrophobic residues on this scale were considered hydrophobic and the rest as hydrophilic. in addition, amino acid residues were either classified as positively charged, negatively charged or neutral at ph . the charge of each residue can either increase, decrease or remain the same (neutral mutation) due to mutation ( figure s ). sars-cov- and mers-cov are both zoonotic pathogens related to sars-cov- , which underwent a host jump into the human host previously. we investigated whether the major homoplasies we detect in sars-cov- affect sites which also underwent recurrent mutations in these related viruses as these adapted to their human host. 
all coronaviridae assemblies were downloaded (ncbi taxid: ) on th of april and human associated mers-cov and sars-cov- assemblies extracted. this gave a total of assemblies for sars-cov- and assemblies for mers-cov. following the same protocol (augur align) as applied to sars-cov- assemblies, each species was aligned against the respective refseq reference genomes: nc_ . for sars-cov- and nc_ . for mers-cov. this produced alignments of , bp ( snps) and , bp ( snps) respectively. the sars-cov- genomes offer an excellent geographical and temporal coverage of the covid- pandemic (figure a-b) . the genomic diversity of the sars-cov- genomes is represented as maximum likelihood phylogenies in a radial (figure c ) and linear layout ( figure s -s ). there is a robust temporal signal in the data, captured by a statistically significant correlation between sampling dates and 'root-to-tip' distances for the sars-cov- ( figure s ; r = . , p< . ). such positive association between sampling time and evolution is expected to arise in the presence of measurable evolution over the timeframe over which the genetic data was collected. specifically, more recently sampled strains have accumulated additional mutations in their genome than older ones since their divergence from the most recent common ancestor (mrca, root of the tree). the origin of the regression between sampling dates and 'root-to-tip' distances ( figure s ) provides a cursory point estimate for the time to the mrca (tmrca) around late . using treedater ( ), we observe an estimated tmrca, which corresponds to the start of the covid- epidemic, of october - december ( % cis) ( figure s ). these dates for the start of the epidemic are in broad agreement with previous estimates performed on smaller subsets of the covid- genomic data using various computational methods ( table ) , though they should still be taken with some caution. 
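the cursory tmrca estimate from the root-to-tip regression described above corresponds to the point where the fitted line crosses zero distance. a minimal sketch is given below; it uses ordinary least squares written out by hand, decimal-year sampling dates, and invented toy numbers, all of which are illustrative assumptions rather than the authors' data or code.

```python
def root_to_tip_regression(dates, distances):
    """least-squares fit of root-to-tip distance on sampling date.

    returns (slope, intercept, x_intercept); the slope is the rate in
    substitutions per site per unit of time, and the x-intercept is the
    date at which the fitted distance is zero, i.e. a point estimate of the tmrca.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, -intercept / slope

# toy data (invented): a perfectly clock-like sample with rate 0.001 per year
toy_dates = [2020.0, 2020.1, 2020.2]
toy_dists = [0.001 * (d - 2019.9) for d in toy_dates]
slope, intercept, tmrca = root_to_tip_regression(toy_dates, toy_dists)
print(slope, tmrca)
```

a point estimate obtained this way carries no confidence interval, which is why the text complements it with a dedicated dating method.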
indeed, the sheer size of the dataset precludes the use of some of the more sophisticated inference methods available. the sars-cov- global population has accumulated only moderate genetic diversity at this stage of the covid- pandemic with an average pairwise difference of . snps between any two genomes, providing further support for a relatively recent common ancestor. we estimated a mutation rate underlying the global diversity of sars-cov- of ~ × - nucleotides/genome/year (ci: x - - x - ) obtained following time calibration of the maximum likelihood phylogeny. this rate is largely unremarkable for an rna virus ( , ) , despite coronaviridae having the unusual capacity amongst viruses of proofreading during nucleotide replication, thanks to the non-structural protein nsp exonuclease, which excises erroneous nucleotides inserted by their main rna polymerase nsp ( , ) . some of the major clades in the maximum likelihood phylogeny (figure c and figure s ) are formed predominantly by strains sampled from the same continent. however, this likely represents a temporal rather than a geographic signal. indeed, the earliest available strains were collected in asia, where the covid- pandemic started, followed by extensive genome sequencing efforts first in europe and then in the usa. the sars-cov- genomic diversity found in most countries (with sufficient sequences) essentially recapitulates the global diversity of covid- from the -genome dataset. figure highlights the proportion of the global genetic diversity found in the uk, the usa, iceland and china. in the uk, the usa and iceland, the majority of the global genetic diversity of sars-cov- is recapitulated, with representatives of all major clades present in each of the countries (figure a-c) . the same is true for other countries such as australia ( figure s a ). 
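the average pairwise difference quoted above is the mean number of snps separating any two genomes in the alignment. a minimal sketch follows, assuming aligned sequences of equal length and assuming that ambiguous ('n') and gap ('-') positions are skipped in each comparison; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations

def snp_distance(a, b, ignore="N-"):
    """number of aligned positions at which two sequences differ.

    positions where either sequence has an ambiguous base or a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(x != y for x, y in zip(a.upper(), b.upper())
               if x not in ignore and y not in ignore)

def mean_pairwise_difference(seqs):
    """mean snp distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(snp_distance(a, b) for a, b in pairs) / len(pairs)

toy = ["ACGT", "ACGA", "NCGA"]  # invented four-site alignment
print(mean_pairwise_difference(toy))
```

the all-pairs loop is quadratic in the number of genomes, so for an alignment with thousands of sequences a vectorised or site-wise allele-count formulation would be preferable.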
this genetic diversity of sars-cov- populations circulating in different countries points to each of these local epidemics having been seeded by a large number of independent introductions of the virus. the main exception to this pattern is china, the source of the initial outbreak, where only a fraction of the global diversity is present (figure d ). this is also to an extent the case for italy (figure s b) , which was an early focus of the covid- pandemic. however, this global dataset includes only sars-cov- genomes from italy, so some of the genetic diversity of sars-cov- strains in circulation likely remains unsampled. the genomic diversity of the global sars-cov- population being recapitulated in multiple countries points to extensive worldwide transmission of covid- , likely from extremely early on in the pandemic. the sars-cov- alignment can be considered as broken into a large two-part open reading frame (orf) encoding non-structural proteins, four structural proteins: spike (s), envelope (e), membrane (m) and nucleocapsid (n), and a set of small accessory factors (figure a ). there is variation in genetic diversity across the alignment, with polymorphisms often found in neighbouring clusters ( figure s ) . a simple permutation resampling approach suggests that both orf a and n exhibit snps which fall in the th percentile of the empirical distribution (table s ) . however, not all of these sites can be confirmed as true variant positions, due to the lack of accompanying sequence read data. we nonetheless closely inspected those sites that appear to have arisen multiple times following a maximum parsimony tree building step. we identified a large number of putative homoplasies (n= excluding masked regions), which were filtered to a high confidence cohort of positions (see methods). these positions in the sars-cov- genome alignment ( . % of all sites) were associated with amino acid changes across all genomes. 
of these amino acid changes, comprised non-synonymous and comprised synonymous mutations. two non-synonymous mutations involving the introduction or removal of stop codons were found (* y, * g). of the remaining non-synonymous mutations involved neutral hydrophobicity changes ( figure s a ). in addition, of the remaining non-synonymous mutations involved neutral changes ( figure s b ). both orf ab and n had a four-fold higher frequency of hydrophilic → hydrophobic mutations than hydrophobic → hydrophilic mutations ( figure s ). in addition, neutral hydrophobic changes were clearly favoured in the s protein. lastly, of the remaining non-synonymous mutations involved neutral charge changes. amongst the strongest filtered homoplasic sites (> change points on the tree), three are found within orf ab (nucleotide positions , , ) and one within s ( ). we exemplify the strongest signal and our approach using position in figure and provide a full list of homoplasic sites, both filtered and unfiltered, in tables s - . the strongest hit in terms of the inferred minimum number of changes required (figure b -c) at orf ab ( , codon ) falls over a region encoding the non-structural protein, nsp , and is also observed in our analyses of the sra dataset (table s ) . we note that some of the hits also overlap with positions identified as putatively under selection using other approaches (http://virological.org/t/selection-analysis-of-gisaid-sars-cov- data/ / , accessed april ), with orf ab consistently identified as a region comprising several candidates for non-neutral evolution. orf ab has orthologues in other human-associated betacoronaviruses, in particular sars-cov- and mers-cov which both underwent host jumps into humans from likely bat reservoirs ( , ) . we performed an equivalent analysis on human-associated virus assemblies available on the ncbi virus platform. 
we identified six putative homoplasic sites within sars-cov- , two occurring within the c-like proteinase just upstream of nsp ( , ) and a further two homoplasies within orf ab at nsp and nsp ( figure s ). in addition, one homoplasy was identified in the spike protein and one in the membrane protein orfs. for mers-cov, multiple unfiltered homoplasies were detected, consistent with previous observations of high recombination in this species ( ) , though only one invoked more than a minimum number of changes on the maximum parsimony tree ( figure s ) . this corresponded to a further homoplasy identified in orf ab nsp (position ). it is of note that this genomic region coincides with the strongest homoplasy in sars-cov- which also occurs in the nsp encoding region of orf ab. codon of orf ab shares a leucine residue in mers-cov and sars-cov- , though a valine in sars-cov. the exact role of these and other homoplasic mutations in human associated betacoronaviruses represents an important area of future work, although it appears that the orf ab region may exhibit multiple putatively adapted variants across human betacoronavirus lineages. the genome alignment of the sars-cov- genomes can be queried through an open access, interactive web-application (https://macman .shinyapps.io/ugi-scov -alignmentscreen/). it provides users with information on every snp and homoplasy detected across our global sars-cov- alignment and allows visual inspection both within the sequence alignment and across the maximum likelihood tree phylogeny. figure illustrates some of the functionalities of the web application using position in the alignment as an example. this particular homoplasy was observed times across the genomes and requires a minimum of character-site changes to become congruent with the observed sars-cov- phylogeny (figure a and b ). pandemics have been affecting humanity for millennia ( ) . 
over the last century alone, several global epidemics have claimed millions of lives, including the / influenza a (h n ) pandemic, the sixth ( - ) and seventh 'el tor' cholera pandemics ( ) , as well as the hiv/aids pandemic ( -today). covid- acts as an unwelcome reminder of the major threat that infectious diseases represent in terms of deaths and disruption. one positive aspect of the current situation, relative to previous pandemics, is the unprecedented availability of scientific and technological means to face covid- . in particular, the rapid development of drugs and vaccines has already begun. modern drug and vaccine development is largely based on genetic engineering and an understanding of host-pathogen interactions at a molecular level. the mobilisation to address the covid- pandemic by scientists worldwide has been remarkable. this includes the feat of the global scientific community, which has already produced and publicly shared well over , complete sars-cov- genome sequences at the time of writing (april ), which we have used here with gratitude. further initiatives in the united kingdom (https://www.cogconsortium.uk/data/) have to date already produced over , genomes, some of which overlap with those already available on gisaid. to put these numbers of sars-cov- genomes in context, it is interesting to consider parallels with the h n pdm influenza pandemic, the first epidemic for which genetic sequence data was generated in near-real time ( , ) . the genetic data available at the time looks staggeringly small in comparison to the amount that has already been generated for sars-cov- during the early stages of the covid- pandemic. for example, fraser et al. considered partial hemagglutinin gene sequences two months after the who had declared h n pdm influenza a pandemic ( ) . this unprecedented genomic resource has already yielded strong conclusions about the pandemic.
for example, analyses by multiple independent groups place the start of the covid- pandemic towards the end of ( table ). this rules out any scenario that assumes sars-cov- may have been in circulation long before it was identified, and hence have already infected large proportions of the population. extensive genomic resources for sars-cov- should in principle also be key to informing on optimal drug and vaccine design, particularly when coupled with knowledge of human proteome and immune interactions ( ) . ideally, drugs and vaccines should target relatively invariant, strongly constrained regions of the sars-cov- genome, to avoid drug resistance and vaccine evasion. therefore ongoing monitoring of genomic changes in the virus will be essential to gain a better understanding of fundamental host-pathogen interactions that can inform drug and vaccine design. the vast majority of mutations observed so far in sars-cov- circulating in humans are likely neutral ( , ) or even deleterious ( ) . homoplasies, such as those we detect here, can arise by product of neutral evolution or as a result of ongoing selection. of the homoplasies we detect (after applying stringent filters), some proportion are very likely genuine targets of positive selection which signpost to ongoing adaptation of sars-cov- to its new human host. indeed, we do observe an enrichment for non-synonymous changes ( %) in our filtered sites. as such, our provided list (table s ) contains candidates for mutations which may affect the phenotype of sars-cov- and virus-host interactions and which require ongoing monitoring. conversely, the finding that % of the homoplasic mutations involve no polarity change could still reflect strong evolutionary constraints at these positions ( , ) . the remaining non-neutral changes to amino acid properties at homoplasic sites may be enriched in candidates for functionally relevant adaptation and could warrant further experimental investigation. 
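the distinction between synonymous and non-synonymous changes that underlies the enrichment reported above can be made concrete with the standard genetic code. the sketch below is illustrative only: the function name `classify_substitution` and the example codons are hypothetical, and it handles a single-base change within one codon rather than the full annotation pipeline used in the study.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change in a codon (position is 0, 1 or 2)."""
    codon = codon.upper().replace("U", "T")
    new_base = new_base.upper().replace("U", "T")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "stop gained"
    if before == "*":
        return "stop lost"
    return "non-synonymous"


print(classify_substitution("CTT", 2, "C"))   # leucine -> leucine
print(classify_substitution("GAT", 1, "G"))   # aspartate -> glycine
print(classify_substitution("TAT", 2, "A"))   # tyrosine -> stop
```

stop-codon gains and losses are reported separately here because they change protein length rather than a single residue.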
one of the strongest homoplasies lies at site in the sars-cov- genome in a region of orf a encoding nsp . this site passed our stringent filtering criteria and was also present in our analysis of the sra dataset (table s ) . interestingly, this region overlaps a putative immunogenic peptide predicted to result in both cd + and cd + t-cell reactivity ( ) . more minor homoplasies amongst our top candidates, identified within orf a (table s ) , also map to a predicted cd t cell epitope. while the immune response to sars-cov- is poorly understood at this point, cd t cells, which activate b cells for antibody production, and cytotoxic cd t cells, which kill virus-infected cells, are known to play key roles in mediating clearance in respiratory viral infections ( ) . of note, we also identify a strong recurrent mutation at nucleotide position , corresponding to the sars-cov- spike protein (codon ). while the spike protein is the known mediator of host-cell entry, our detected homoplasy falls outside of the n-terminal and receptor binding domains. the analyses presented here provide a snapshot in time of a rapidly changing situation based on available data. although we have attempted to filter out homoplasies caused by sequencing error with stringent thresholds, and also used available short-read data to validate a subset of homoplasic sites in a smaller dataset, our analysis nevertheless remains reliant on the underlying quality of the publicly available assemblies. as such, it is possible that some results might be artefactual, and further investigation will be warranted as additional raw sequencing data becomes available. however, given the crucial importance of identifying potential signatures of adaptation in sars-cov- for guiding ongoing development of vaccines and treatments, we have suggested what we believe to be a plausible approach and initial list in order to facilitate future work and interpretation of the observed patterns.
more data continues to be made available, which will allow ongoing investigation by ourselves and others. we believe it is important to continue to monitor sars-cov- evolution in this way and to make the results available to the scientific community. in this context, we hope that the interactive web-application we provide will help identify key recurrent mutations in sars-cov- as they emerge and spread.
figure . global sequencing efforts have contributed hugely to our understanding of the genomic diversity of sars-cov- . a) viral assemblies available from global regions as of / / . b) cumulative total of viral assemblies uploaded to gisaid included in our analysis. c) radial maximum likelihood phylogeny for complete sars-cov- genomes. colours represent continents where isolates were collected (green: asia; red: europe; purple: north america; orange: oceania; dark blue: south america), according to metadata annotations available on nextstrain (https://github.com/nextstrain/ncov/tree/master/data).
• phylogenetic estimates support that the covid- pandemic started sometime around october - december , which corresponds to the time of the host-jump into humans.
• the diversity of sars-cov- strains in many countries recapitulates its full global diversity, consistent with multiple introductions of the virus to regions throughout the world seeding local transmission events.
• sites in the sars-cov- genome appear to have already undergone recurrent, independent mutations based on a large-scale analysis of public genome assemblies.
• detected recurrent mutations may indicate ongoing adaptation of sars-cov- to its novel human host.
• monitoring the build-up and patterns of genetic diversity in sars-cov- has potential to inform targets for drug and vaccine development.
a new coronavirus associated with human respiratory disease in china
the phylogenetic range of bacterial and viral pathogens of vertebrates
a pneumonia outbreak associated with a new coronavirus of probable bat origin
the genomic and epidemiological dynamics of human influenza a virus
unifying the epidemiological and evolutionary dynamics of pathogens
disease and diplomacy: gisaid's innovative contribution to global health
global initiative on sharing all influenza data - from vision to reality
mafft multiple sequence alignment software version : improvements in performance and usability
raxml-ng: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference
ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data
bayesian inference of ancestral dates on bacterial phylogenetic trees
scalable relaxed clock phylogenetic dating
mpboot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation
homoplasyfinder: a simple tool to identify homoplasies on a phylogeny
toward defining course of evolution - minimum change for a specific tree topology
phytools: an r package for phylogenetic comparative biology (and other things)
seaview version : a multiplatform graphical user interface for sequence alignment and phylogenetic tree building
surface and inside volumes in globular proteins
an unusually high substitution rate in transplant-associated bk polyomavirus in vivo is further concentrated in hla-c-bound viral peptides
the evolution of ebola virus: insights from the - epidemic
unique and conserved features of genome and proteome of sars-coronavirus, an early split-off from the coronavirus group lineage
discovery of an rna virus '-> ' exoribonuclease that is critically involved in coronavirus rna synthesis
severe acute respiratory syndrome coronavirus-like virus in chinese horseshoe bats
middle east respiratory syndrome coronavirus in bats, saudi arabia
mers-cov recombination: implications about the reservoir and potential for adaptation
what are pathogens, and what have they done to and for us?
pandemic potential of a strain of influenza a (h n ) : early findings
origins and evolutionary genomics of the swine-origin h n influenza a epidemic
a sars-cov- -human protein-protein interaction map reveals drug targets and potential drug-repurposing
infectious diseases of humans
a dynamic nomenclature proposal for sars-cov- to assist genomic epidemiology
computational inference of selection underlying the evolution of the novel coronavirus, sars-cov-
a sars-cov- vaccine candidate would likely match all currently circulating strains
synonymous mutations and the molecular evolution of sars-cov- origins
looking for darwin in all the wrong places: the misguided quest for positive selection at the nucleotide sequence level
distribution of the strength of selection against amino acid replacements in human proteins
a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov-
immunity to respiratory viruses
transmission dynamics and evolutionary history of -ncov
the first two cases of -ncov in italy: where they come from
genomic epidemiology of sars-cov- in guangdong province
[table : published estimates of the date of onset of the covid- pandemic, with their intervals ( % bci, % ci, % hpd) and clock models (rate-estimated relaxed clock model; unreported clock model, beast; strict clock model, beast v . ; relaxed clock model, beast v . )]
o analysed data and performed computational analyses
l.v.d and f.b. acknowledge financial support from the newton fund uk-china nsfc initiative (grant mr/p / ) and the bbsrc (equipment grant bb/r x/ ). computational analyses were performed on the ucl computer science cluster and the south green bioinformatics platform hosted on the cirad hpc cluster.
we thank jaspal puri for insights and assistance on the development of the alignment visualisation tool and nicholas mcgranahan and rachel rosenthal for their comments on the manuscript. we additionally wish to acknowledge the very large number of scientists in originating and submitting labs who have readily made available sars-cov- assemblies to the research community. key: cord- - s e y s authors: kim, you-jin; kim, dae-won; lee, wan-ji; yun, mi-ran; lee, ho yeon; lee, han saem; jung, hee-dong; kim, kisoon title: rapid replacement of human respiratory syncytial virus a with the on genotype having nucleotide duplication in g gene date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: s e y s human respiratory syncytial virus (hrsv) is the main cause of severe respiratory illness in young children and elderly people. we investigated the genetic characteristics of the circulating hrsv subgroup a (hrsv-a) to determine the distribution of genotype on , which has a -nucleotide duplication in attachment g gene. we obtained hrsv-a positive samples between october and february , which were subjected to sequence analysis. the first on genotype was discovered in august and samples were identified as on up to february . the prevalence of the on genotype increased rapidly from . % in – to . % in – . the mean evolutionary rate of g protein was calculated as . × (− ) nucleotide substitution/site/year and several positively selected sites for amino acid substitutions were located in the predicted epitope region. this basic and important information may facilitate a better understanding of hrsv epidemiology and evolution. human respiratory syncytial virus (hrsv) is recognized by pediatricians as the most common cause of acute respiratory tract infections and is a leading cause of hospital admissions and death among children aged < years worldwide (hall et al., ; cho et al., ; munywoki et al., ; bezerra et al., ; nair et al., ) . 
the world health organization has estimated that the annual global disease burden is more than million hrsv infections and , deaths related to hrsv infection (world health organization (who), ). hrsv infection is a major concern in developed and developing countries, but no effective vaccine is available and immunoprophylaxis is the only treatment for preventing hrsv infection, although access is limited (chang, ; rudraraju et al., ; graham, ; jorquera et al., ; shaw et al., ; wang et al., a; zeitlin et al., ) . two antigenic groups of hrsv have been differentiated based on antigenic variability in the attachment g gene, i.e., hrsv subgroup a (hrsv-a) and hrsv subgroup b (hrsv-b). ten hrsv-a and hrsv-b genotypes have been classified in different geographical regions, which are designated as ga -ga , saa , na , and na for hrsv-a, and gb -gb , sab -sab , and ba -ba for hrsv-b (auksornkitti et al., ; eshaghi et al., ; lee et al., ; khor et al., ; shobugawa et al., ) . most previous molecular epidemiological and divergence studies of hrsv have focused on analyses of nucleotide and/or amino acid changes in part of the g protein, which is a type ii glycoprotein that mediates attachment of the virus to the cell during virus entry, and is one of the targets of the immune response (agoti et al., ; cui et al., ; baek et al., ; tan et al., ; murata and catherman, ) . these studies have yielded significant volumes of partial genomic information for hrsv and primary analyses based on variation in the g protein. in particular, the ba genotype of hrsv-b, which was isolated in buenos aires, argentina during , contains a duplication of nucleotides (nt) in the c-terminal third of the g protein gene and is the predominant strain according to global epidemiological studies (sullender et al., ; trento et al., ; trento et al., ; van niekert and venter, ; zhang et al., ) . 
more recently, a similar duplication was reported in hrsv-a (on ) isolates from canada, germany, malaysia, thailand, kenya, and south korea, which was characterized as a -nt duplication in the c-terminal third of the g gene (munywoki et al., ; auksornkitti et al., ; eshaghi et al., ; lee et al., ; prifert et al., ) . the exact mechanism that allows such duplications to play roles during selective pressure and the factors that may increase their fitness to substitute the ba genotype or other hrsv-b viruses remain to be defined. similarly, the basis of the evolutionary advantage and antigenic dominance of the hrsv-a (on ) strain due to the introduction of identical bases in the g gene also needs to be clarified. in this study, we investigated the emergence of the new hrsv-a on genotype and conducted an in-depth analysis of the genetic predisposition of the g protein gene. in addition, we predicted the epitope of the duplicated g protein with an insertion of amino acids, which we compared with ancestral strains. this analysis is of importance for elucidating the antigenic variation of the hrsv g protein and its relationships with clinical manifestations and vaccine development. the clinical samples used in this study were collected as part of the laboratory surveillance system in south korea, i.e., the acute respiratory infection network (ari-net) conducted until april and the korea influenza and respiratory viruses surveillance system (kin-ress) since may . this study was approved by the korea national institute of health institutional review boards (approval nos. - exp- -r, - con- -c, - exp -c, - exp- - c, and - con- - c) because it involved anonymization of the remaining respiratory tract samples, which were not related to human gene studies. these samples were collected for respiratory virus diagnosis and written informed consent was obtained from the patients, their parents, or legal guardians. 
this study used nasal aspirate specimens and throat swab samples taken from patients enrolled in ari-net and kinress with acute respiratory illness who were diagnosed as hrsv-positive, including samples co-infected with influenza virus, human rhinovirus, adenovirus, human coronavirus, human bocavirus, human enterovirus, parainfluenza virus or human metapneumovirus. ari-net used conventional reverse transcriptase (rt)-pcr (solgent, seoul, south korea) to detect the hrsv-a and -b subgroups simultaneously. by contrast, kinress used an improved real-time one-step rt-pcr to distinguish the hrsv-a and -b subgroups (kogen bio, seoul, south korea) after july , where viral rna was extracted from µl of each respiratory specimen using qiaamp viral rna mini kits (qiagen gmbh, hilden, germany). hrsv-positive clinical samples were subjected to amplification of the partial g gene using a g gene-specific primer set for sequence analysis, i.e., the forward primer g( - )f: ctggcaatgataatctcaacttc and reverse primer f( - )r: caactccattgttatttgcc (da silva et al., ) . the cdna was prepared from the viral rna extracted for the routine respiratory virus test. the reaction mixture contained µl of rna, which was mixed with a final concentration of mm dntps, µm random primer, × rt buffer, u of superscript iii reverse transcriptase (invitrogen, ca, usa), u of rnase-out rnase inhibitor (invitrogen), mm mgcl , and . mm dithiothreitol (dtt), and rnase-free water was added to make a final volume of µl. the mixture was then incubated at °c for min, °c for min, and °c for min to terminate cdna synthesis. next, µl of cdna was added to a pcr mixture containing µl of sp-taq dna polymerase ( . u/µl) (cosmo genetech, seoul, south korea), µl of distilled water, µl of × pcr buffer, µl of mm dntps, and µl each of the forward and reverse primers (both µm) for the g gene.
primary denaturation was conducted at °c for min, which was followed by cycles of pcr where each cycle comprised denaturation for s at °c, annealing for s at °c, and elongation for min at °c, with a final extension cycle of min at °c. the pcr products were separated by electrophoresis using % agarose gel and visualized using × sybr safe dna gel stain (invitrogen). the amplified pcr products were sequenced bidirectionally, with the same primers used for pcr amplification (section . ), using an abi xl dna analyzer (applied biosystems, foster city, ca, usa). the sequences were edited with seqman pro in the lasergene software suite (version . ; dnastar, madison, wi, usa) and aligned using mega (ver. . ). the nt length fragments of the g gene from nt position to the stop codon, based on the a strain, were used for further analysis. to minimize potential biases in the alignment and to obtain a more tractable representation of the dataset, identical sequences were removed using cd-hit before performing the alignments (huang et al., ) . after clustering sequences with an amino acid sequence similarity of > . into one cluster, one sequence was selected from each cluster and the sequences with no identical sequence were also removed. the final representative dataset comprised sequences, which were submitted to genbank and assigned accession numbers of ab -ab . to obtain a comprehensive representation of the diverse hrsv-a subgroup, we downloaded publicly available hrsv sequences: wue/ / (jx ), ng- - (ab ), rsv a (m ), chiba-c/ (ab ), on - a (jn ), mo (af ), sel/ / (af ), cn (af ), ch (af ), ng- - (ab ), ny (af ), mo (af ), ng- - (ab ), sa v (af ), and wue/ / (jx ) (prifert et al., ) . in total, sequences were used in the subsequent analysis, which comprised representative sequences and reference sequences. the phylogenetic trees were constructed using muscle and mega (tamura et al., ) .
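the redundancy-reduction step described above (cluster sequences above a similarity threshold, then keep one representative per cluster) can be sketched as a greedy procedure. this is a simplified stand-in and not the cd-hit algorithm, which relies on short-word filtering and its own alignment step: the names `identity` and `greedy_cluster`, the toy peptides and the thresholds are hypothetical, sequences are assumed to be non-empty and already aligned to equal length, and a sequence joins a cluster when its identity is greater than or equal to the threshold.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold):
    """Assign each sequence to the first representative it matches.

    sequences : dict of name -> aligned sequence (insertion order is used)
    threshold : minimum identity (0-1) required to join a cluster
    Returns a dict of representative name -> list of member names.
    """
    clusters = {}
    for name, seq in sequences.items():
        for rep in clusters:
            if identity(seq, sequences[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:                       # no representative matched: new cluster
            clusters[name] = [name]
    return clusters


# Hypothetical toy peptides.
toy = {"a": "MKTLLA", "b": "MKTLLA", "c": "MKTLLG", "d": "QQQQQQ"}
print(greedy_cluster(toy, 0.8))
print(greedy_cluster(toy, 1.0))
```

the cluster sizes returned here correspond to the per-cluster sequence counts that the study plots next to the phylogenetic tree.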
the maximum composite likelihood for nucleotide sequences and the jtt model for amino acid sequences were used with the neighbor-joining (nj) method to perform the distance calculations. the trees generated were visualized and edited using evolview (zhang et al., ) . to investigate the selective pressure, we used a dataset that comprised c-terminal regions (secondary hypervariable region) of the g genes and ng- - (genbank accession no. ab ) as a reference sequence. in this analysis, we performed a multiple sequence alignment and a phylogenetic tree was generated using clustalw and mega . the nucleotide frequencies in the codon positions were assumed based on unequal codon frequencies. the maximum-likelihood (ml) method was used to analyze the selection pressure with the codeml program in the phylogenetic analysis ml package (paml, http://abacus.gene.ucl.ac.uk/software/paml.html) (yang, ) . codeml was used to estimate the ratio of nonsynonymous (dn) to synonymous (ds) codon changes per site. positive selection was defined as dn > ds (ω = dn/ds > 1). different codon substitution models were tested in this study, i.e., m and m (neutral), and m and m (positive). the likelihood ratio statistics were calculated as twice the difference between the log-likelihood values (2Δl) of the models, which were compared against the χ² distribution (two degrees of freedom). this analysis used the partial g gene sequences of hrsv samples isolated in the present study between and (n = ) and additional sequences (n = ) reported from south korea during - (baek et al., ; choi and lee, ) . after removing % identical sequences using cd-hit (huang et al., ) , the population dynamics of hrsv were estimated over time using a bayesian markov chain monte carlo approach (mcmc in beast version . ), which included the date of virus sampling (drummond et al., ) .
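the likelihood-ratio comparison of nested codon models described above reduces to a short calculation. this is a minimal sketch and not part of codeml: the function name `likelihood_ratio_test` and the log-likelihood values are hypothetical, and the closed-form p-value exp(-x/2) is valid only for the two-degrees-of-freedom case stated in the text.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt, df=2):
    """Return (statistic, p_value) for a pair of nested models.

    The statistic is 2 * (lnL_alt - lnL_null). For df == 2 the chi-square
    survival function is exactly exp(-x / 2), so no external library is needed.
    """
    if df != 2:
        raise NotImplementedError("this sketch only covers df = 2")
    stat = 2.0 * (lnl_alt - lnl_null)
    stat = max(stat, 0.0)            # guard against small negative noise
    return stat, math.exp(-stat / 2.0)


# Hypothetical log-likelihoods for a neutral and a selection model.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0)
print(f"2*delta_lnL = {stat:.2f}, p = {p:.4g}")
```

a small p-value indicates that the model allowing a class of sites with ω > 1 fits significantly better than its neutral counterpart.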
the dataset was analyzed with an uncorrelated log-normal relaxed clock using the general time-reversible substitution model selected by jmodeltest version . (posada, ) . the mcmc chain was run for million steps to achieve convergence, with sampling every , steps. convergence was assessed based on the effective sample size (ess) after a % burn-in using tracer version . (http://beast.bio.ed.ac.uk/tracer) and only parameters where ess > were accepted. the uncertainties of the estimates were indicated by the % highest posterior density intervals. the final tree was visualized and edited with figtree version . . (http://tree.bio.ed.ac.uk/software/figtree). we predicted the epitope for three hrsv-a g gene sequences using seven prediction tools, i.e., bepipred (larsen et al., ) , lbtope (singh et al., ) , bcpred/fbcpred (el-manzalawy et al., ) , antigenic (rice et al., ) , leps (wang et al., b) , and epitopia (rubinstein et al., ) . these tools made the predictions using their default parameter values, and the common epitopes predicted by four or more tools with ≥ consecutive amino acids were selected. we collected , samples during - via kinress, a nationwide surveillance system for outpatients with acute respiratory illness that covers about hospitals located all over korea and includes all ages. of these, , samples were positive for respiratory viral infections and , samples ( . %) had positive results for hrsv. the mean age of the entire study group was . years. the sex ratio and co-infection rate were . % and . %, respectively, and the corresponding data for each period are summarized in table . during the - season, samples ( . % of respiratory virus-positive patients) were hrsv-positive. during the next two seasons, however, the hrsv-positive ratio increased, and ( . %) and ( . %) cases were detected during the - and - seasons, respectively, as shown in fig. a .
there were no specific differences in the gender distribution, mean age, or coinfection rate during these three consecutive seasons. to investigate the distributions of the subgroups, we randomly selected hrsv-positive samples obtained during the - season, which were analyzed by real-time rt-pcr because the rt-pcr methods used in that season could not distinguish subgroups hrsv-a and -b. we tested samples, and only samples ( . %) belonged to the hrsv-a subgroup. in - , the prevalence of the hrsv-a subgroup had increased greatly and over % of the samples were found to be hrsv-a, while the prevalence of the hrsv-a subgroup was also high in the following season ( . %) ( fig. b and table ) . eshaghi et al. ( ) reported the discovery of a novel genotype in canada with a -nt duplication in the g gene during the winter season in - and we also found on genotype strains in south korea, for which we reported the whole-genome sequences . thus, extensive genotype and sequence analyses were performed using the hrsv-a subgroup to further study the prevalence of on genotype strains. in total, we investigated hrsv-a samples obtained during - , where we analyzed the g gene sequences to determine whether a -nt duplication was present or absent. according to the broad sequence analysis, the first on genotype strains were discovered in august . in the season from may to april , samples ( . %) had the on genotype in south korea among hrsv-a samples tested. the major genotype in that season was na ( samples, . %) and the ga genotype was also identified (five samples, . %). these genotype distributions were similar to the results reported by the canadian group (eshaghi et al., ) . in the next season ( - ), however, the prevalence of the on genotypes increased significantly to . % ( / hrsv-a-positive samples) and the ga genotype was not detected (table and fig. c ). na genotype strains were still detected but only at a very low frequency ( samples, . %). 
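checking a g gene sequence for the presence or absence of a tandem duplication, as done above, can be sketched with a sliding-window comparison. this is an illustration under stated assumptions rather than the procedure used in the study: the function name `find_tandem_duplication`, the `max_mismatches` tolerance and the toy sequence are hypothetical, and the duplicated unit length must be supplied by the caller.

```python
def find_tandem_duplication(seq, unit_length, max_mismatches=0):
    """Return the 0-based start of the first tandem duplication, or None.

    A hit at index i means seq[i:i+k] and seq[i+k:i+2k] (k = unit_length)
    differ at no more than max_mismatches positions.
    """
    k = unit_length
    if k < 1:
        raise ValueError("unit_length must be a positive integer")
    for i in range(len(seq) - 2 * k + 1):
        first, second = seq[i:i + k], seq[i + k:i + 2 * k]
        mismatches = sum(a != b for a, b in zip(first, second))
        if mismatches <= max_mismatches:
            return i
    return None


# Hypothetical sequence containing two adjacent copies of a 6-nt unit.
toy_seq = "AAGT" + "CCTGAG" + "CCTGAG" + "TTAC"
print(find_tandem_duplication(toy_seq, 6))
print(find_tandem_duplication("ACGTACGA", 4))
```

allowing a few mismatches is useful because the two copies can diverge by point substitutions after the duplication event.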
our analysis of this newly emerged strain showed that the rapid replacement of the non-on genotypes without a -nt duplication in the g gene by the on genotype was dramatic compared with the earlier spread of the hrsv ba genotype. in a further analysis of the hrsv-a sequences obtained during - , we added unpublished hrsv-a sequences collected between october and april . a clustering approach was applied (as described in the materials and methods section) to remove any redundant sequences and to make the data clear and manageable, while ensuring that the data still represented the diversity of hrsv. the numbers of sequences that belonged to the same clusters (amino acid sequence similarity > . ) were plotted next to the phylogenetic tree. finally, isolates and reference sequences were analyzed and the clusters in the phylogenetic tree formed four major groups: on , na , ga , and others, including the reference sequences. the phylogenies based on the nucleotide and amino acid sequences ( fig. a and b) were in almost complete agreement in topology. the branching patterns differed slightly between groups, but the clade compositions were stable within the groups. however, several sequences (n = ), which were represented by the gg / sequence, had an apparently different branching structure, although there was low statistical support (bootstrap) for how most of these branches were related. based on the alignment analysis, these clusters had a -nt deletion at nt position in the -nt duplicated region. this deletion caused a frame shift from amino acid position in the c-terminal region to the stop codon located at amino acid position (fig. ) . we considered that the gg / clusters represented a subgenotype, which we designated as on _os. in addition, the first on sequences identified, i.e., on - a/ from canada (eshaghi et al., ) and us / in the present study, had a duplicated region identical to the template. fig. .
phylogenetic analysis of the hrsv-a subgroup using partial g sequences based on (a) nucleotide and (b) amino acid sequences. partial g protein sequences were used to study the phylogenetic relationships between hrsv-a subgroups. the nucleotide and protein sequences were aligned using the muscle algorithm and the phylogenetic trees were generated by the neighbor-joining method with mega based on bootstrap replicates. the length of the square bar indicates the number of sequences that belong to the same cluster, while the color indicates the year the sample was isolated.
however, the l p and y h substitutions in other on clusters were the same as those reported by tsukagoshi et al. ( ) from japan. the deduced amino acid sequences contained various substitutions in the secondary variable region of the g protein. in total, representative sequences and an na genotype reference sequence (ng- - , ab ) were subjected to selection pressure analysis, as described in the section . the dn/ds (nonsynonymous to synonymous codon changes per site) ratio averaged over all sites ranged from . to . with different codon substitution models. models m and m , which allowed for positive selection, fitted our dataset better than the two neutral models (m and m ). using the ng- - strain as an outgroup, and sites were found to be under positive selection by the m and m models, respectively, and these positively selected sites are shown in fig. and table . the empirical bayes analysis showed that / (positions , , and ) positively selected sites were under positive selection at the % level in the m model.
table footnotes. posterior probability of positively selected sites with m model: % to % ( , , , , , , , , , , , , ) ; % to % ( , , , , ) ; % to % ( , , , ); and % < (none). posterior probability of positively selected sites with m model: % to % ( , , , , , , , , , , , , ) ; % to % (none); % to % ( , , , , , , , ); and % < ( , , ).
previously, positions , , , and were reported to be flip-flop sites that tend to revert to a previous state over time (botosso et al., ) . we also found that the substitutions of v a, p l, and l p/t were forward replacements, whereas l h/p was a backward replacement. a frame shift occurred because of the -nt deletion at amino acid position , but the results of the analysis showed that there was high pressure for positive selection from amino acid position to the stop codon. to estimate the dynamics of the nucleotide substitutions and variation in hrsv-a, all of the hrsv-a sequences obtained from this study during - and previously published sequences from south korea during - (baek et al., ; choi and lee, ) were analyzed using the bayesian skyline plot method after removing homologous sequences (fig. ) . the calculated mean evolutionary rate was . × (− ) nucleotide substitutions/site/year, which was faster than the whole-genome evolutionary rate of . × (− ) for hrsv-a (tan et al., ) but slower than the rate of . × (− ) nucleotide substitutions/site/year in the hrsv-b g gene during the years since the discovery of the -nt duplicated ba genotypes (trento et al., ) . a previous study reported a similar evolutionary rate for the hrsv-a g gene of . × (− ) nucleotide substitutions/site/year . the time course analysis showed that the genetic diversity was steady for years during the emergence of the na genotype in the late 's, consistent with the analysis of kushibuchi in japan . the relative genetic diversity (population size) increased around . following that period, a limited fluctuation was observed in , with a subsequent dramatic increase in the effective virus population size after .
the specific selective pressure applied to the virus in remains to be clarified, but this increasing trend in the relative genetic diversity of hrsv was followed by the initial appearance of the hrsv-a on strain. from this analysis, we could also infer that the on genotype emerged in , although the first on case was discovered in . the bayesian skyline plot also demonstrated that the growth phase of the virus population size agreed with the rapid emergence of the hrsv-a on strain. the g protein is the major antigenic protein expressed on the surface of hrsv, so we assumed that the -nt duplication, which caused a -amino-acid duplication in the secondary hypervariable region of the ectodomain, may have affected the antigenicity, antigen recognition by antibodies, or virulence. thus, we performed b-cell epitope prediction using the representative prototype a (genbank accession no. m ), the on genotype , and the on _os subgenotype gg / sequences (table ) . for each sequence, - antigenic peptides were identified by computational prediction. epitope positions were determined for all three sequences. the predicted epitopes were similar but the - epitope located in the duplicated region was predicted only in the gg / sequence.
table: epitope prediction results of novel on types (gn / , gg / ) and prototype a strains. bold letters indicate epitope regions common to the three strains.
in this study, we analyzed the distribution in south korea of the novel hrsv on genotype, which has a -nt duplicated genetic region, and its rapid replacement of the non-duplicated strains. our results provide up-to-date sequence information on the prevalent hrsv genotypes in south korea and insights into hrsv evolution.
in south korea, the first on genotype was discovered in august , which was the season after eshaghi et al. identified the novel on genotype in canada (eshaghi et al., ) . however, time-scaled evolutionary analysis estimated that the on genotype emerged in , and it became the predominant strain in south korea in - . studies of hrsv-b show that the prevalence of the -nt duplicated ba genotype has fluctuated over the last years but all of the hrsv-b subgroups now belong to the ba genotype (baek et al., ; trento et al., ; van niekerk and venter, ) . further long-term molecular epidemiological studies are required during consecutive seasons to determine whether the on genotype will replace the non-on genotypes completely. in addition to the introduction of a -nt duplication, the c-terminal region of the g gene has been the target of various amino acid sequence changes via substitutions and deletion. we found l p and y h substitutions, as well as a -nt deletion that caused a frame shift from amino acid position and yielded a final protein two amino acids longer than that of previously reported on strains (munywoki et al., ; auksornkitti et al., ; eshaghi et al., ; lee et al., ; prifert et al., ) . the nucleotide substitution in the duplicated region was also found in japan . however, this is the first report of the -nt deletion in the duplicated region, and we designated this as the on _os subgenotype in the present study. this on _os strain was isolated from eight patients who were enrolled between october and january in five cities in south korea (fig. c) . this finding suggests the replication and spread of these strains. the region of amino acids changed into a novel sequence at the c-terminus is quite long, and the structure of the g protein is still unknown, so structural changes may have occurred because of the -nt duplication or other variations.
in vitro and in vivo functional studies should be performed using this novel g protein during attachment to host cells, or in interactions with immune responses, to understand the effects of this major change. previous reports have shown that a major hrsv type dominates in a given season, although two or more different genotypes cocirculate at the same time with different levels (trento et al., ) . a similar prevalence pattern was also observed in the present study. for example, hrsv-a did not lead the infections of patients with acute respiratory illness during the - season, but hrsv-a was prevalent in the following season of - , which might be related to the emergence of the novel hrsv-a genotype in the human population. hrsv-a remained prevalent in the following season of - , but we found that the prevalent subgroup of the - season is hrsv-b in south korea (data not shown). the g protein is the major antigen of hrsv, and it is known to be highly antigenic with high genetic diversity, which may be related to frequent reinfection with hrsv (johnson et al., ; sullender, ; parveen et al., ) . variation and the positive selection for amino acid changes are focused in two hypervariable regions, and the c-terminal third of the g protein contains multiple epitopes (melero et al., ) . immune escape by new variants may contribute to the diversity and prevalence of hrsv. combinations of novel substitutions and flip-flop changes in amino acids are involved with various epitope transitions, which reflect the immune status of the human population. our epitope prediction analysis suggested the presence of a new epitope in the gg / strain, where a -nt deletion occurred after the -nt duplication. we also found that several positively selected sites, i.e., amino acids , , and , were located in common epitopes with a high probability, which also supported our hypothesis. 
furthermore, protective immunity through neutralizing antibody, which is important for host defense against hrsv, is known to be of short duration (sande et al., ) . immunological evidence is also required to clarify this phenomenon, but periodic substitutions of the prevalent hrsv-a and -b subgroups, as well as variations in the g protein and short-lived immunity, might be accompanied by changes in the herd immune status of human populations with respect to specific genotypes (collins and graham, ) . we used bayesian skyline plots to infer the relative genetic diversity based on hrsv-a g genes collected between and , which may reflect the general trend. previous reports indicate that the rapid substitution rate of the g gene compared with other genes or the entire hrsv genome has contributed to its high evolutionary rate (tan et al., ) . most recently, balmaks et al. ( ) also described the population dynamics of the on genotype based on a limited number of sequences and showed that the effective population size of the on genotype expanded slowly and even decreased before the beginning of the - season. in contrast, our data, which encompassed hrsv-a g genes collected in , strongly indicate that the relative genetic diversity patterns of the hrsv-a g genes were significantly correlated with the emergence and prevalence of the hrsv-a on genotype, which emerged rapidly and was maintained as a predominant strain. in summary, our large-scale analysis of the hrsv-a g gene indicates that the emergence of the hrsv-a on genotype in south korea was correlated with an increase in genetic diversity. it remains to be clarified whether changes in the antigenicity of the g protein and/or substantial changes in the herd immune status have contributed to the rapid dominance of hrsv-a on .
furthermore, it will also be crucial to understand the factor(s) determining the viral phenotype and fitness of hrsv-on and its variants, such as the hrsv-on _os subgenotype, and whether its transmission will lead to complete replacement, as with the ba genotype of hrsv-b. although limited genetic analysis suggested that the emergence of the strain in south korea might have been mediated by overseas transmission (data not shown), cumulative time-scaled evolutionary information is still required to verify the global spread of the on genotype precisely. therefore, meticulous and continuous monitoring of the evolutionary trends in the g gene is essential to obtain insights that may facilitate vaccine development and the amendment of public health responses against hrsv infection. we investigated the emergence of the new hrsv-a on genotype, which has a -nt duplication in the g gene, and its rapid replacement of other strains during years. the time-scaled evolutionary study supports a drastic increase in genetic diversity resulting in the prevalence of the new genotype, which was subdivided around . in addition, we predicted the epitopes of the duplicated g protein with an insertion of amino acids, and the results suggest antigenic variation of the hrsv g protein.
intrapatient variation of the respiratory syncytial virus attachment protein gene molecular characterization of human respiratory syncytial virus, - : identification of genotype on and a new subgroup b genotype in thailand prevalence and genetic characterization of respiratory syncytial virus (rsv) in hospitalized children in korea molecular epidemiology of human respiratory syncytial virus over three consecutive seasons in latvia viral and atypical bacterial detection in acute respiratory infection in children under five years positive selection results in frequent reversible amino acid replacements in the g protein gene of human respiratory syncytial virus current progress on development of respiratory syncytial virus vaccine respiratory viruses in neonates hospitalized with acute lower respiratory tract infections genetic diversity and molecular epidemiology of the g protein of subgroup a and b of respiratory syncytial viruses isolated over consecutive epidemics in korea viral and host factors in human respiratory syncytial virus pathogenesis genetic variation in attachment glycoprotein genes of human respiratory syncytial virus subgroups a and b in children in recent five consecutive years bayesian phylogenetics with beauti and the beast . 
predicting linear b-cell epitopes using string kernels genetic variability of human respiratory syncytial virus a strains circulating in ontario: a novel genotype with a nucleotide g gene duplication biological challenges and technological opportunities for respiratory syncytial virus vaccine development respiratory syncytial virus-associated hospitalizations among children less than months of age cd-hit suite: a web server for clustering and comparing biological sequences the g glycoprotein of human respiratory syncytial viruses of subgroups a and b: extensive sequence divergence between antigenically related proteins advances in and the potential of vaccines for respiratory syncytial virus displacement of predominant respiratory syncytial virus genotypes in malaysia between molecular evolution of attachment glycoprotein (g) gene in human respiratory syncytial virus detected in japan improved method for predicting linear bcell epitopes complete genome sequence of human respiratory syncytial virus genotype a with a -nucleotide duplication in the attachment protein g gene antigenic structure, evolution and immunobiology of human respiratory syncytial virus attachment (g) protein severe lower respiratory tract infection in early infancy and pneumonia hospitalizations among children antibody response to the central unglycosylated region of the respiratory syncytial virus attachment protein in mice global burden of acute lower respiratory infections due to respiratory syncytial virus in young children: a systematic review and meta-analysis genetic diversity among respiratory syncytial viruses that have caused repeated infections in children from rural india jmodeltest: phylogenetic model averaging novel respiratory syncytial virus a genotype emboss: the european molecular biology open software suite epitopia: a web-server for predicting b-cell epitopes respiratory syncytial virus: current progress in vaccine development kinetics of the neutralizing antibody response to 
respiratory syncytial virus infections in a birth cohort the path to an rsv vaccine emerging genotypes of human respiratory syncytial virus subgroup a among patients in japan improved method for linear b-cell epitope prediction using antigen's primary sequence respiratory syncytial virus genetic and antigenic diversity genetic diversity of the attachment protein of subgroup b respiratory syncytial viruses mega : molecular evolutionary distance, and maximum parsimony methods the comparative genomics of human respiratory syncytial virus subgroups a and b: genetic variability and molecular evolutionary dynamics ten years of global evolution of the human respiratory syncytial virus ba genotype with a -nucleotide duplication in the g protein gene natural history of human respiratory syncytial virus inferred from phylogenetic analysis of the attachment (g) glycoprotein with a -nucleotide duplication genetic analysis of attachment glycoprotein (g) gene in new genotype on of human respiratory syncytial virus detected in japan replacement of previously circulating respiratory syncytial virus subtype b strains with the ba genotype in south africa palivizumab for immunoprophylaxis of respiratory syncytial virus (rsv) bronchiolitis in high-risk infants and young children: a systematic review and additional economic modelling of subgroup analyses prediction of b-cell linear epitopes with a combination of support vector machine classification and amino acid propensity identification paml : phylogenetic analysis by maximum likelihood prophylactic and therapeutic testing of nicotiana-derived rsvneutralizing human monoclonal antibodies in the cotton rat model genetic variability of respiratory syncytial viruses (rsv) prevalent in southwestern china from to : emergence of subgroup b and a rsv as dominant strains evolview, an online tool for visualizing, annotating and managing phylogenetic trees this study was supported by the intramural research fund ( -ng - ) of the korea national 
institute of health. key: cord- -p u qa authors: zhang, lei; han, xiaohong; shi, yuankai title: comparative analysis of sars-cov- receptor ace expression in multiple solid tumors and matched non-diseased tissues date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: p u qa the emerging severe acute respiratory syndrome coronavirus (sars-cov- ) poses a global public health emergency. sars-cov- employs the host cell receptor ace for cellular entry. nonetheless, the differences in ace expression pattern in lung versus other normal and solid tumor tissues remain incompletely characterized. here, we analyze a large data set comprising ace mrna expression for tissue samples across types of primary solid tumor and samples across matched non-diseased tissues. our results unravel eight normal tissues and primary solid tumors, which might be at high risk of sars-cov- infection. these findings may provide additional insight into the prevention and treatment of sars-cov- infection, in particular for patients with these vulnerable cancer types. manifestations have been observed (n. guan et al., ; huang et al., ; wang et al., ) ,such as diarrhea, nausea or vomiting, liver abnormality, acute cardiac injury, and acute kidney injury. it is reported that cancer patients might harbor a higher risk of sars-cov- infection and inferior prognosis than those in infection without cancer (liang et al., ) . however, whether a heterogeneity of risk for infection exists among various cancer types remains unclear. here, we retrieved ace mrna expression data of tissue samples across primary solid tumor types in the cancer genome atlas (tcga) and samples across matched nondiseased tissues in genotype-tissue expression (gtex) from ucsc xena (https://xena.ucsc.edu), where tcga and gtex data were co-analyzed by the same toil rna-seq pipeline to eliminate computational batch effects (vivian et al., ) . 
the expression values of ace were quantified by the rna-seq by expectation-maximization (rsem) algorithm (li and dewey, ) and then normalized using the upper quartile method. the normalized values were log -transformed after adding an offset of to avoid taking the log of zero before analysis. further, virus abundances for tcga tumors, quantified by numbers of virus-supporting reads per hundred million reads processed (rphm), were obtained (cao et al., ) . we defined a tumor sample meeting the rphm threshold for a given virus as virus-positive and examined ace expression across seven tumor types with frequent viral presence according to the previous study (cao et al., ) (supplementary table) . the mann-whitney u test was used to compare the expression level between two groups. this study is exempt from ethical review because its data are publicly available and deidentified. as shown in fig. , we observed a widespread distribution of ace in these normal tissues, which is consistent with previous reports (harmer et al., ) . notably, eight normal tissues, including testis, kidney, thyroid, pancreas, breast, esophagus, liver, and ovary, had significantly higher ace levels than lung as a reference (all p < . ), and expression levels in colon and bladder were similar to that in lung (both p > . ). the differences in the expression abundance between these tissues and lung indicate possible sars-cov- infection in extrapulmonary organs. for instance, the highest ace abundance, observed in testis, may imply a high possibility of sars-cov- exposure. of note, a pathological analysis of testes from six patients who died of sars showed that orchitis is a complication of sars (xu et al., ) . thus, we propose strengthening follow-ups for reproductive functions of recovered sars-cov- male patients.
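the transformation and group comparison described above can be sketched in a few lines of python. this is a minimal illustration only: the offset value is not recoverable from the text, so an offset of 1 is assumed, the function names `log2_with_offset` and `mann_whitney_u` are hypothetical, and the p-value uses the normal approximation with tie and continuity corrections rather than whatever software the authors used.

```python
import math


def log2_with_offset(values, offset=1.0):
    """Return log2(x + offset) for each value.

    The offset (assumed to be 1 here) keeps zero expression values defined.
    """
    return [math.log2(v + offset) for v in values]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic of the first group, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the observations, remembering which group each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        midrank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    # Continuity-corrected z score, then a two-sided normal tail probability.
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p_value = math.erfc(z / math.sqrt(2.0))
    return u_x, p_value


if __name__ == "__main__":
    # Hypothetical normalized expression values for two tissues.
    lung = log2_with_offset([0.0, 1.0, 3.0, 2.5, 1.2])
    testis = log2_with_offset([15.0, 22.0, 31.0, 18.0, 27.0])
    print(mann_whitney_u(lung, testis))
```

for small samples an exact test is preferable to the normal approximation; the sketch is meant only to make the two steps concrete.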
additionally, comparably high expression observed in kidney, liver, and colon may partially contribute to acute kidney injury, liver impairment, and diarrhea at onset of covid- , respectively (n. guan et al., ; huang et al., ; wang et al., ) . interestingly, our finding that ace was highly expressed in breast appears to be in contrast to a retrospective study on nine pregnant women with covid- in the third trimester, in which the colostrum from six patients tested negative for sars-cov- (h. . however, considering the small sample size and short duration of the study period, the risk of vertical transmission via breastfeeding deserves further investigation. the number of cancer cases analyzed in the study by liang et al. ( ) is, however, limited (five lung cancers, four colorectal cancers, three breast cancers, two bladder cancers, and four other types of cancer). this restricts its ability to draw conclusions about the risk of sars-cov- infection in subgroups of specific cancer type. we hypothesize that another contributing mechanism may be increased likelihood of sars-cov- entry into certain cancer tissues due to aberrantly abundant ace expression. therefore, during this pandemic, we propose reinforcing personal protection, such as remote medical counselling, minimizing the number of hospital visits, and appropriate isolation procedures when admitted to hospitals, for cancer patients, especially patient subgroups with these solid tumor types. interestingly, our analysis of seven virus-related tumor types revealed that virus-positive hnsc samples showed significantly lower ace abundance than virus-negative ones (p < . ) (fig. ) , suggesting a potential viral influence on ace expression in hnsc.
in conclusion, we performed the first, to our knowledge, large-scale comparative analysis of ace expression across multiple solid tumors and matched non-diseased tissues based on a consistently analyzed expression repository, which highlights eight normal tissues and primary solid tumors with potentially similar or greater risk of sars-cov- exposure compared with lung, and identify a potential association between hnsc-related viruses and ace expression. given the large sample sizes for these risky candidates, our results may be statistically robust and reliable; notably, we did not use normal tissue samples from tcga (adjacent to the tumor), because they are typically limited in number and their proximity to tumor may introduce signals of tumor microenvironment in their ace expression profile (aran et al., ) ; moreover, we used expression data unified by a standardized bioinformatic pipeline (toil rna-seq), enabling the direct comparison of ace expression level from two sources (tcga and gtex). our findings may contribute additional insight into the prevention and treatment of covid- , especially in patient subgroups with certain vulnerable cancer types. however, further clinical and autopsy studies are required to validate these findings. box plots display the median and interquartile range, whiskers extend to . times the interquartile range, and outlier data are shown as dots. **, p < . ; ns, not significant. cesc, cervical squamous cell carcinoma and endocervical adenocarcinoma; coad, colon adenocarcinoma; esca, esophageal carcinoma; hnsc, head and neck squamous cell carcinoma; lihc, liver hepatocellular carcinoma; read, rectum adenocarcinoma; stad, stomach adenocarcinoma. 
comprehensive analysis of normal adjacent to tumor transcriptomes divergent viral presentation among human tumors and adjacent normal tissues clinical characteristics and intrauterine vertical transmission potential of covid- infection in nine pregnant women: a retrospective review of medical records epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study clinical characteristics of coronavirus disease in china quantitative mrna expression profiling of toil enables reproducible, open source, big biomedical data analyses clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention orchitis: a complication of severe acute respiratory syndrome (sars) a pneumonia outbreak associated with a new coronavirus of probable bat origin key: cord- -lyvvtwvg authors: li, huiping; tang, cheng; yue, hua title: molecular detection and genomic characteristics of bovine kobuvirus from dairy calves in china date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: lyvvtwvg in this study, diarrheic and non-diarrheic fecal samples from dairy calves were collected from dairy farms in provinces to investigate the molecular prevalence and genomic characteristics of bovine kobuvirus (bkov) in china. the results showed that the bkov positive rate for the diarrheic feces ( . %) was significantly higher than that for the non-diarrheic feces ( . %, p < . ). interestingly, three potential novel vp lineages were identified from complete vp sequences, and a unique triple nucleotide insertion which can result in an aa insertion, was first observed in the / vp fragments with bp long in this study, compared with known bkov vp sequences. 
moreover, the first chinese bkov genome was successfully obtained from a diarrheic fecal sample, named chz/china. the open reading frame (orf) of the genome from strain chz/china shares . %– . % nucleotide (nt) and . %– . % amino acid (aa) identity, compared with the three known genomes of bkov. interestingly, phylogenetic tree based on aa sequences of these genomes showed that chz/china was clustered into an independent branch, suggesting the strain may represent a novel bkov strain. the findings contribute to better understanding the molecular characteristics and evolution of bkov. bovine kobuvirus (bkov) is a member of aichivirus b, and another member in aichivirus b is the sheep kobuvirus (khamrin et al., ; reuter et al., ) . since bkov was first identified in japan in (yamashita et al., ) , this virus has been detected in bovine with and without diarrhea symptomatology in countries (liu et al., ) . it had been suggested that bkov may be associated with diarrhea in calves (candido et al., ) , but the pathogenicity of bkov still needs to be determined. recently, bkov was also detected in the spinal fluid from the brain of an day-old calf where the animal had a history of diarrhea and neurological disease (moreira et al., ) , which indicates that this virus can cause systemic infections. the bkov genome is approximately . - . kb long and has the typical kobuvirus genome organization comprising a leader (l) protein, followed by structural (capsid proteins vp , vp , and vp ) and nonstructural ( a- c and a- d) proteins. in bkov, vp is the most variable immune determinant protein (yamashita et al., ) , making it appropriate for genetic typing (oh et al., ; pham et al., ; shi et al., ) . the function of vp and vp of bkov protein remains unclear, but vp (residues - ) in aichi virus (aiv) may be involved in cellular receptor recognition (zhu et al., ) . 
the vp viral protein in porcine kobuvirus (pkv) may play an immune evasion role via the ifn signaling pathway (peng et al., ) . the aim of this study was to further investigate the molecular prevalence and genomic characteristics of bkov in china. from january to april , a total of diarrheic and non-diarrheic fecal samples were collected from dairy farms across the four chinese provinces of liaoning (three farms), henan (three farms), shandong (three farms) and shanxi (five farms). the ages of the tested calves ranged from days old to months old. all samples were shipped on ice and stored at − °c in sterile -ml centrifuge tubes. the fecal samples were fully resuspended in phosphate-buffered saline ( : ) and centrifuged at , ×g for min, followed by filtration through a . -μm filter. viral rna was extracted from μl of each fecal suspension using rnaiso plus (takara bio inc., japan) according to the manufacturer's instructions. the cdna was synthesized using the primescript™ rt reagent kit according to the manufacturer's instructions (takara bio inc.) and then stored at − °c. bkov was detected by an rt-pcr assay targeting a bp fragment of the d gene according to a previous report (jeoung et al., ) . to screen for the presence of co-infections with bovine rotavirus (brv), bovine coronavirus (bcov), and bovine viral diarrhea virus (bvdv), all the bkov-positive diarrheic samples were subjected to specific rt-pcr assays for these viruses (guo et al., ; zheng et al., ) . the complete vp ( nt) sequences were amplified from the bkov-positive samples according to a previous report (liu et al., ) . a pair of primers was designed based on known bkov vp sequences, located at positions - in the chz/china genome sequence. moreover, pairs of primers (table s ) were used to amplify the genome sequence of the bkov chz/china strain.
all pcr products were purified using the omega gel kit (omega), cloned into the pmd -t simple vector (takara bio inc.), and then sequenced (sangon biotech) in both directions. sequences were assembled using seqman software (version . ; dnastar inc., wi, usa). single open reading frames (orfs) were identified using the online orf finder at https://www.ncbi.nlm.nih.gov/orffinder/. nt and deduced aa sequence homologies were determined using the megalign program in dnastar . software (dnastar inc.). phylogenetic trees were constructed using the maximum likelihood method with the jukes-cantor model, bootstrap replicates and default parameters in mega . the bkov detection rates for the diarrheic and healthy samples were calculated and compared using the epi info statistical program (version . ), and values of p < . were regarded as statistically significant. among the diarrheic samples, were detected as bkov positive ( . %), and out of non-diarrheic samples were detected as bkov positive ( . %), as shown in table . the bkov detection rate for the diarrheic samples was significantly higher than that for the non-diarrheic samples (p < . ), suggesting that the virus may be associated with diarrhea in calves. however, of the bkov positive diarrheic samples, were confirmed to be co-infection positive for other viruses, as shown in table s . therefore, further investigations are needed to better understand the role of bkov infection in cattle with diarrhea. calf challenge experiments may be the only way to determine whether bkov causes diarrhea. complete vp sequences were successfully obtained for fecal samples (genbank accession numbers mk -mk ), which share . %- . % nt sequence identity ( . %- . % aa identity) with each other, and share . %- . % nt sequence identity ( . %- . % aa identity) with vp sequences in the genbank database.
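the comparison of detection rates between diarrheic and non-diarrheic samples can be illustrated with a short sketch. the text does not say which test epi info applied, so a pearson chi-square test on the 2 x 2 table (without continuity correction) is assumed here; the function name `chi_square_2x2` and the counts in the usage example are hypothetical, not the study's data.

```python
import math


def chi_square_2x2(pos_a, total_a, pos_b, total_b):
    """Pearson chi-square test comparing two detection rates.

    pos_a / total_a and pos_b / total_b are the positive counts and sample
    sizes of the two groups. Returns (chi-square statistic, p-value, 1 df).
    """
    a, b = pos_a, total_a - pos_a  # group A: positive, negative
    c, d = pos_b, total_b - pos_b  # group B: positive, negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 diarrheic vs 10 of 100 non-diarrheic positive.
    chi2, p = chi_square_2x2(30, 100, 10, 100)
    print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```

when any expected cell count is small, fisher's exact test is the usual alternative to this approximation.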
a phylogenetic tree analysis based on all available complete sequences for bkov vp and the vp sequences from the present study shows that all the vp sequences fall into distinct branches (fig. ) . / vp sequences from our study were clustered into the known lineage . interestingly, the remaining / vp sequences from our study were clustered into independent branches distinct from the known lineages, suggesting that these branches may represent novel lineages. the potential lineages , , share . %- . %, . %- . % and . %- . % nt sequence identity with all complete bkov vp , including the vp sequences from the present study, respectively. furthermore, potential lineage contains only strain of b /hn/china strains; potential lineage contains strains of /ln/ china and /ln/china, which share . % nt sequence identity with each other; potential lineage contains strains of /ln/china, /ln/ china, a /sd/china, a /sd/china and b /hn/china strains, which share . %- % nt sequence identity with each other. further analysis found that the strain in potential lineage has out of unique aa mutations (p s, t s, r k, t a, and y l), strains in potential lineage share out of unique aa mutations (r q and i v), and strains potential lineage share out of unique aa mutations (e d), compared to all known complete vp sequences of bkov. a previous study reported that a strain with a nt identity < . % should be considered a novel lineage (liu et al., ) , but the value of . % may be biased, due to the limitation of sequence numbers of bkov vp at that time. currently, the precise biological function of bkov vp remains unclear. however, information from aiv shows that vp is the structural protein to the enteric receptor recognition and may be involved in viral pathogenesis (zhu et al., ) . as vp is the most variable immune determinant protein in kobuvirus, vp in bkov, aiv and pkv had been divided into different lineages (liu et al., ; oh et al., ; pham et al., ; shi et al., ) . 
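the nt identity values used above to separate lineages, and the jukes-cantor model used for the trees, can both be made concrete with a small sketch. it assumes two sequences that are already aligned to equal length and skips gapped columns; megalign and mega may treat gaps and ambiguous bases differently, and the function names and example sequences are hypothetical.

```python
import math


def pairwise_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p),

    where p is the proportion of differing sites.
    """
    p = 1.0 - pairwise_identity(seq_a, seq_b) / 100.0
    if p >= 0.75:
        raise ValueError("distance is undefined when p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Hypothetical aligned fragments, not sequences from the study.
    s1 = "ACGTACGT"
    s2 = "ACGTACGA"
    print(pairwise_identity(s1, s2), jukes_cantor_distance(s1, s2))
```

the corrected distance is always at least as large as the raw proportion of differences, because the model accounts for multiple substitutions at the same site.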
further investigations are needed to better understanding the antigenicity of different vp lineages. vp fragments were successfully obtained from fecal samples (genbank accession numbers mk -mk ). the sequences were found to share . %- % nt and . %- % aa sequence identity with each other, and . %- . % nt and . %- . % aa sequence identity with the only bkov vp sequences in the genbank database. a phylogenetic tree based on the three known vp aa sequences and the sequences from this study revealed that of the bkov strains from the present study clustered on an independent branch. the remaining strain from this study and known strains clustered on an independent branch (fig. ) . further analysis found that the / sequences in an independent branch shared a unique aa insertion in two forms (s or n ) in the vp region, showing that the vp type was the high frequency in chinese bkov. interestingly, a unique aa insertion within the vp region also found in aiv strains (pham et al., ) . hence, whether the variation in vp is a potential evolutionary characteristic in kobuvirus needs further to be investigated. until now, there are complete bkov genomes (u- , sc , egy- ) in the genbank database, contributing to understanding the genetic characteristics of bkov. in this study, we added to a nearly complete bkov strain (genbank accession no. mk ) genome of nt in length which contains the bp complete orf, which is the first bkov genome from china. compared with known bkov, with the exception that the chz/china strain's vp sequence is nt longer, the lengths of the other chz/china genes are identical to those of the other three genomes (table s ) and shares . %- . % nt and . %- . % aa identity. further phylogenetic analysis based on genomic sequences revealed that chz/china clusters on an independent branch, with the three vp , vp , vp protein aa sequences generating the same result (fig. 
) , showing that chz/china displays a larger genetic distance from the other three genomes and indicating that chz/china may represent a novel bkov strain. moreover, the most significant difference between chz/china and other bkov strains is the vp protein. vp from the chz/china strain contains unique aa mutations and a unique triple nt insertion, which can result in an aa insertion. the function of bkov vp remains unclear. however, both vp and vp in aiv may be involved in cellular receptor recognition (zhu et al., ) and viral pathogenesis (adzhubei et al., ) . thus, it is worth studying the functional effects of this unique vp aa mutation in bkov strains.
table . sample collection and bkov detection information.
fig. . the maximum likelihood phylogenetic tree based on complete vp nt sequences. black circles denote isolates from the present study and hollow circles denote isolates from a previous chinese study (liu et al., ) . bootstrap values based on replicates are shown on the nodes.
in conclusion, this study identified three potential novel vp lineages in bkov and found a unique bkov vp sequence type in diarrheic feces. the first nearly complete bkov genome from china was obtained, and phylogenetic analysis shows that this strain may represent a novel bkov strain. these data contribute to further understanding of the molecular characteristics and genetic evolution of bkov. more information about the sequences is available in the genbank database: mk -mk . this study did not involve animal experiments besides the fecal sampling of calves with diarrhea that were brought in from farms for clinical treatment. none. 
polyproline-ii helix in proteins: structure and function molecular characterization and genetic diversity of bovine kobuvirus, brazil detection and molecular characteristics of neboviruses in dairy cows in china three clusters of bovine kobuvirus isolated in korea epidemiology of human and animal kobuviruses prevalence and genetic diversity of bovine kobuvirus in china identification by next-generation sequencing of aichivirus b in a calf with enterocolitis and neurologic signs: a cautionary tale molecular characterization of the first aichi viruses isolated in europe and in south america kobuvirus vp protein restricts the ifn-β-triggered signaling pathway by inhibiting stat -irf and stat -stat complex formation sequence analysis of the capsid gene of aichi viruses detected from japan kobuvirus in domestic sheep molecular characterization of a porcine kobuvirus variant strain in china isolation and characterization of a new species of kobuvirus associated with cattle molecular investigation of bovine viral diarrhea virus infection in yaks (bos gruniens) from qinghai, china structure of human aichi virus and implications for receptor binding this work was funded by the th five-year plan national science and technology support program (grant number yfd ) and the innovation team for animal epidemic diseases prevention and control on qinghai-tibet plateau, state ethnic affairs commission (grant number td ). we thank sandra cheesman, phd, from liwen bianji, edanz group china (www.liwenbianji.cn/ac), for proofreading the english grammar of drafts of this manuscript. supplementary data to this article can be found online at https:// doi.org/ . /j.meegid. . . key: cord- - rsfs d authors: yan, nan; tang, cheng; kan, ruici; feng, fan; yue, hua. title: genome analysis of a g p[ ] group a rotavirus isolated from a dog with diarrhea in china date: - - journal: infect genet evol doi: . /j.meegid. . . 
sha: doc_id: cord_uid: rsfs d genotype g is an emerging genotype among species a rotavirus (rva) circulating in humans and pigs worldwide. in this study, an rva strain designated rva/dog-tc/chn/sccd-a/ /g p[ ] was isolated in cell culture from a pet dog stool sample with acute diarrhea, and its whole genome was sequenced. the genotype constellation of sccd-a was g -p[ ]-i -r -c -m -a -n -t -e -h . all genome segments except the vp gene were closely related to the genes from porcine rva strains or porcine-like human rva strains. on the other hand, the vp gene clustered in a distinct lineage only with that of a g p[ ] porcine-like human rva, preventing the identification of the exact host species origin, but very unlikely to be originated from human rva. in addition, phylogenetic analysis showed that the g vp gene of sccd-a clustered into a novel sublineage within the lineage iii of g . this first isolation of a g p[ ] rva from a pet dog may justify the exploration of the role dogs play in the interaction of rva circulating in pigs and humans. rotavirus a (rva), family reoviridae, are the major pathogens causing diarrhea in animals and children worldwide. the rva virion encapsidates a genome of dsrna segments, encoding six structural viral proteins (vp -vp , vp and vp ) and five or six nonstructural proteins (nsp -nsp / ) (vlasova et al., ) . rva has two outer capsid proteins, vp and vp , which define the g and p genotypes, respectively. to date, at least g and p genotypes have been identified by the rotavirus classification working group (rcwg) (https://rega.kuleuven.be/cev/viralmetagenomics/virus-classification/ rcwg). for highly genetically diverse rva strains, the dual (g/p) typing system was extended to a full-genome sequence classification system, with the notations gx-p[x]-ix-rx-cx-mx-ax-nx-tx-ex-hx used for the genes encoding vp -vp -vp -vp -vp -vp -nsp -nsp -nsp -nsp -nsp / , respectively (matthijnssens et al., ) . 
in dogs, rva is usually associated with mild diarrhea in puppies (greene, ) . previously, only the g p[ ] genotype had been described in dogs, until a bovine g p[ ] genotype was isolated from a young dog in (luchs et al., ; sieg et al., ) . the isolation and characterization of g p [ ] have been extensively reported in dogs, but information about the prevalence of rva in dogs is still limited (otto et al., ; ortega et al., ; alves et al., ) . as canine rvas have zoonotic potential between humans and dogs, further investigation of the molecular prevalence of rva in dogs is needed (wu et al., ; luchs et al., ; papp et al., ) . g rotavirus is an emerging genotype spreading worldwide in pigs and humans (wu et al., ) . in mainland china, the first g strain was detected in , and g strains were uncommon before (li et al., ) . recently, however, g rva has emerged as the predominant genotype in china (dian et al., ; yu et al., ) . the rva g p[ ] genotype has been detected in thailand, the united states, japan, south korea, italy, belgium, brazil, mainland china and taiwan . recently, a g p[ ] strain was isolated from a child with severe diarrhea in thailand, showing that porcine g p[ ] can infect humans directly (komoto et al., ) . in this study, an rva positive sample, detected by rt-pcr as described by ortega et al. ( ) , was used to isolate the virus. moreover, this fecal sample was detected as negative for canine parvovirus type , canine coronavirus and canine distemper virus by pcr assay (decaro et al., ; decaro et al., ; elia et al., ) . the sample was collected from a three-month-old labrador with acute diarrhea in october at the animal hospital of southwest university for nationalities in sichuan province, china. rva isolation was conducted on embryonic rhesus monkey kidney tissue cells line (ma- cells, atcc crl- . ) as previously described . 
a cytopathic effect (cpe), characterized by cell shrinking, cell layer splitting, lysis, detachment, and shedding, was observed at h post-infection (hpi). after four continuous passages, the cpe was more visible, and its time of occurrence was stable. the fourth-passage cultures were used for plaque purification of the virus. in passages - , the time of occurrence of cpe was stable at hpi. the vp , vp and vp genes of the purified virus were sequenced at passages , , and , and they shared % nucleotide sequence identity between passages, which indicated that the virus stock contained only a single virus strain. this rva strain was designated rva/dog-tc/chn/sccd-a/ /g p[ ] . at passage , virus titration was performed in -well plates using tenfold serial dilutions with eight replicates per dilution. the virus titer was determined by the reed-muench method, and endpoints were expressed as % tissue culture infective dose (tcid )/ml. the virus titer was . tcid /ml. to determine the rva genome sequence, pairs of primers were designed (table s , available in the online supplementary material). viral rna was extracted from culture-adapted virus using rnaiso plus® (takara bio inc., japan) and then reverse transcribed into cdna using the primescript™ rt reagent kit (takara bio inc., japan) according to the manufacturer's instructions. the resulting cdna was stored at − °c until pcr amplification. the pcr products were purified and cloned into the pmd -t simple vector prior to sequencing, and the sequences were assembled using seqman software (version . ; dnastar). the open reading frame (orf) was identified by orf finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). genotype assignments were carried out using the rotac v . online tool according to the genotyping recommendations of the rcwg (maes et al., ) . sequence identity analyses were performed on nucleotide and amino acid sequences aligned by the clustalw method using the megalign . program (dnastar). 
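the reed-muench endpoint calculation named above can be sketched as follows. this is a minimal illustration of the standard method, not the authors' own script: the function names, the well counts in the usage note and the inoculum volume are hypothetical, since the source does not report them.

```python
def reed_muench_log10_endpoint(log10_dilutions, infected, total):
    """Return log10 of the 50% endpoint dilution (Reed-Muench).

    log10_dilutions: e.g. [-3, -4, -5], ordered from most to least concentrated.
    infected: wells showing CPE at each dilution.
    total: wells inoculated at each dilution.
    """
    n = len(log10_dilutions)
    uninfected = [t - i for i, t in zip(infected, total)]
    # Cumulative infected wells: summed from the most dilute end upward.
    cum_inf = [sum(infected[k:]) for k in range(n)]
    # Cumulative uninfected wells: summed from the most concentrated end downward.
    cum_uninf = [sum(uninfected[:k + 1]) for k in range(n)]
    pct = [100.0 * ci / (ci + cu) for ci, cu in zip(cum_inf, cum_uninf)]
    for k in range(n - 1):
        if pct[k] >= 50.0 > pct[k + 1]:
            # Proportional distance between the two dilutions bracketing 50%.
            pd = (pct[k] - 50.0) / (pct[k] - pct[k + 1])
            step = log10_dilutions[k + 1] - log10_dilutions[k]  # negative
            return log10_dilutions[k] + pd * step
    raise ValueError("50% endpoint is not bracketed by the dilution series")


def tcid50_per_ml(log10_endpoint, inoculum_ml):
    """Convert the endpoint dilution to a titer per ml (inoculum volume is an assumption)."""
    return 10 ** (-log10_endpoint) / inoculum_ml
```

for a hypothetical plate with eight wells per dilution, `reed_muench_log10_endpoint([-3, -4, -5, -6, -7], [8, 8, 6, 2, 0], [8] * 5)` brackets the endpoint between the third and fourth dilutions; the titer per ml then depends on the volume inoculated per well.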
phylogenetic trees were constructed using the maximum likelihood method with the jukes-cantor model, bootstrap replicates and default parameters in mega . recombination events were assessed using simplot software (version . . ) and rdp . with the rdp, geneconv, chimaera, maxchi, bootscan, siscan, seq and lard methods. the genome of rva/dog-tc/chn/sccd-a/ /g p[ ] was successfully determined, and the constellation of this strain was g -p[ ]-i -r -c -m -a -n -t -e -h . to exclude the possibility of reassortment during cell culture adaptation between rva strains that might have been present in the clinical sample, we amplified fragments of all genome segments from the original fecal sample. we confirmed that the nucleotide sequences were % identical between the virus genome present in the original fecal sample and the culture-adapted strain. the sequences of the segments encoding vp -vp , vp , vp and nsp -nsp were deposited in genbank (mh -mh ). according to the nucleotide identity of the coding region sequences, the nsp , nsp , nsp , vp , vp , vp , vp and vp genes were closely related to cognate genes of porcine rotavirus strains. the sequence closest to the nsp gene of rva/dog-tc/chn/sccd-a/ /g p[ ] was that of human strain r , which was previously shown to be of porcine rotavirus origin (wang et al., ) . similarly, the sequence closest to the nsp gene of sccd-a was that of human strain nt , which was previously shown to be of porcine rotavirus origin (do et al., ) . on the other hand, the sequence closest to the vp gene of sccd-a was that of strain ll , which was reported to be the result of transmission of a porcine rotavirus to a human, but the phylogenetic tree for the vp genes showed that strains sccd-a and ll clustered together in an independent group that was distinct from any of the previously established lineages (fig. ) . 
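the identity comparisons and the jukes-cantor model used above rest on simple arithmetic that can be sketched in a few lines. this is an illustrative sketch only: the function names are hypothetical, gapped columns are skipped by assumption (megalign and mega may treat gaps differently), and the jukes-cantor correction is the textbook formula d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites.

```python
import math


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable columns")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    p = 1.0 - percent_identity(seq_a, seq_b) / 100.0
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

the corrected distance is always at least as large as the raw proportion of differences, because it accounts for multiple substitutions at the same site.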
as the exact host species origin of the vp gene of strain ll was reported to be indeterminable (li et al., ) , so was the origin of the vp gene of sccd-a. the nucleotide and amino acid identities of the genes of strain sccd-a with the closest sequences are shown in table , and the phylogenetic tree of vp is shown in fig. . the phylogenetic trees of the vp and vp genes are shown in fig. s and fig. s (phan et al., ; shi et al., ; esona et al., ) .
fig. . phylogenetic tree based on the complete vp gene coding region nucleotide sequence ( bp). sequence alignment and clustering were performed by clustalw using mega . software. the tree was constructed by the maximum likelihood method with bootstrap values calculated for replicates. ▲ marks the strain in this study.
prevalence of the g p[ ] strain in dogs is required. because rotavirus is transmitted via the fecal-oral route, whether the source of infection in this dog was dietary contamination needs to be further investigated. rva vp defines the g genotype and induces neutralizing antibodies (aoki et al., ) . to accurately understand the antigenicity of g rva, it was proposed that g can be divided into six distinct lineages (i-vi) based on the vp nucleotide sequence (phan et al., ) . lineages i, ii, iv and v are found only in humans, while lineages iii (with sublineages a-d) and vi (with sublineages a-g) are found in both humans and pigs (shi et al., ; esona et al., ) . following previous reports on the division of g lineages (phan et al., ; shi et al., ; esona et al., ) , all strains from these three papers, which represent every lineage and sublineage of g , together with the g sequences in genbank most similar to strain sccd-a, were used for phylogenetic analysis. vp of strain rva/dog-tc/chn/sccd-a/ /g p[ ] clustered into lineage iii, but was located on a unique sub-branch with six g strains (kc . , ky . , kt . , kt . , kj . and kc . 
), which were distinct from the other known sublineages in lineage iii, indicating that these seven strains may represent a novel sublineage of lineage iii (fig. ) . only representative strains and strains of the potential novel sublineage were retained in fig. . the amino acid comparison between the potential novel sublineage and other lineages is shown in table s (available in the online supplementary material). no recombination event was identified in these strains. g lineage iii is emerging, and both porcine and human strains of lineage iii might have a common progenitor (phan et al., ) . in conclusion, a g p[ ] rva strain, named rva/dog-tc/chn/sccd-a/ /g p[ ], was isolated from a young pet dog with acute diarrhea, and the genome constellation of this strain was g -p[ ]-i -r -c -m -a -n -t -e -h . this strain was considered to be of porcine origin, which indicates that dogs may play a role in the circulation of rva g p[ ] among pigs and humans. the potential public health significance this poses, because of close contact between dogs and humans, highlights the necessity of further surveillance for this virus in dogs. to our knowledge, this is the first isolation of the g p[ ] genotype from dogs. the full-length genome of rva/dog-tc/chn/sccd-a/ /g p[ ] has been deposited in genbank under accession no. mh -mh . 
identification of enteric viruses circulating in a dog population with low vaccine coverage structure of rotavirus outer-layer protein vp bound with a neutralizing fab quantitation of canine coronavirus rna in the faeces of dogs by taqman rt-pcr a realtime pcr assay for rapid detection and quantitation of canine parvovirus type in the feces of dogs completely genomic and evolutionary characteristics of human-dominant g p group a rotavirus strains in yunnan molecular epidemiology of rotavirus a, causing acute gastroenteritis hospitalizations among children in nha trang detection of canine distemper virus in dogs by real-time rt-pcr genetic relationships with other g strains and detection of a new g subtype infectious diseases of the dog and cat identification and characterization of a human g p[ ] rotavirus strain from a child with diarrhoea in thailand: evidence for porcine-to-human interspecies transmission molecular characterization of unusual human g p[ ] rotaviruses identified in china molecular epidemiology of g rotavirus strains in children with diarrhoea hospitalized in mainland china from rare g p[ ] rotavirus strain detected in brazil: possible human-canine interspecies transmission rotac: a web-based tool for the complete genome classification of group a rotaviruses full genome-based classification of rotaviruses reveals a common origin between human wa-like and porcine rotavirus strains and human ds- -like and bovine rotavirus strains identification of co-infection by rotavirus and parvovirus in dogs with gastroenteritis in mexico detection of rotavirus species a, b and c in domestic mammalian animals with diarrhoea and genotyping of bovine species a rotavirus strains full-genome sequencing of a hungarian canine g p[ ] rotavirus a strain reveals high genetic relatedness with a historic italian human strain genetic heterogeneity, evolution and recombination in emerging g rotaviruses molecular characterization of a rare g p[ ] porcine rotavirus isolate 
from china a bovine g p[ ] group a rotavirus isolated from an asymptomatically infected dog porcine rotaviruses: epidemiology, immune responses and control strategies full genomic analysis of a porcine-bovine reassortant g p[ ] rotavirus strain r isolated from an infant in china the dynamics of a chinese porcine g p[ ] rotavirus production in ma- cells and intestines of -day-old piglets putative canine origin of rotavirus strain detected in a child with diarrhea novel g rotavirus strains co-circulate in children and pigs prevalence of rotavirus and rapid changes in circulating rotavirus strains among children with acute diarrhea in china the authors declare that they have no conflicts of interest. this study did not involve animal experiments besides the fecal sampling of diarrhea dogs that visited animal hospitals for clinical treatment. supplementary data to this article can be found online at https:// doi.org/ . /j.meegid. . . . key: cord- -lt uo q authors: saha, indrajit; ghosh, nimisha; maity, debasree; sharma, nikhil; sarkar, jnanendra prasad; mitra, kaushik title: genome-wide analysis of indian sars-cov- genomes for the identification of genetic mutation and snp date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: lt uo q the wave of covid- is a big threat to the human population. presently, the world is going through different phases of lock down in order to stop this wave of pandemic; india being no exception. we have also started the lock down on rd march . in this current situation, apart from social distancing only a vaccine can be the proper solution to serve the population of human being. thus it is important for all the nations to perform the genome-wide analysis in order to identify the genetic variation in severe acute respiratory syndrome coronavirus- (sars-cov- ) so that proper vaccine can be designed. 
this fact motivated us to analyze publicly available indian complete or near-complete sars-cov- genomes to find mutation points in the form of substitutions, deletions and insertions. to this end, we performed multiple sequence alignment in the presence of the reference sequence from ncbi. after the alignment, a consensus sequence was built and each genome was analyzed against it in order to identify the mutation points. as a result, we found substitutions, deletions and insertions, in total unique mutation points, in genomes across . k bp. further, the mutation points were classified into three groups: clusters of mutations (mostly deletions), point mutations as substitution, deletion and insertion, and snps. these outcomes are visualized using biocircos and bar plots, as well as by plotting the entropy value of each genomic location. moreover, phylogenetic analysis was also performed to see the evolution of sars-cov- in india. it shows wide variation in the tree, which is also evident in the genomic analysis. finally, these snps can be useful targets for virus classification and for designing a vaccine and defining its effective dose for a heterogeneous population. severe acute respiratory syndrome coronavirus- (sars-cov- ), the cause of covid- , which originated in wuhan, china (zhu et al., ) , has wreaked havoc on human lives, and the outbreak was declared a pandemic by the world health organisation on th march, . symptoms of sars-cov- infection include fever, cough and shortness of breath, among others (chen et al., ) . in more severe cases, infection may lead to pneumonia (zhou et al., ) , kidney failure and eventual death. as of now, no vaccine or medicine has been invented or discovered, and the only protective measures being taken by different countries are lock downs and social distancing. however, even these extreme measures have not been able to contain sars-cov- . every day, thousands of new cases are coming to light. according to the record of th june, globally more than . 
million people are affected by this deadly virus, with a total reveals that the sars-cov- is a single-stranded enveloped rna virus with a genome length of . kilobases (cui et al., , su et al., , weiss & navas-martin, , zhou et al., ). it has coding regions, as reported in ncbi, that encode orf ab polyproteins, spike (s) glycoprotein, envelope (e) protein, membrane (m) glycoprotein, nucleocapsid (n) protein and accessory proteins such as orf a, orf , orf a, orf b, orf and orf . it has also been reported that several non-structural proteins (nsp) are encoded by the open reading frame (orf). the genomic orientation of sars-cov- is shown in fig. . the strain of this virus is novel, and the understanding of its genetic variability, in the form of mutations, in different nations is still very limited, especially for the coding region of the open reading frame (orf). generally, a mutation occurs when an error is incorporated in a viral genome (fleischmann, ) . it can also be considered a mechanism for coping with genomic damage. as a consequence, the resultant mutated strain may cause an outbreak in the human host, as is the case with sars-cov- . a mutation can be of three types: base substitution, deletion and insertion. moreover, if a substitution occurs in more than % of the population, it can be considered a single nucleotide polymorphism (snp). such a polymorphism usually differs from a mutation in that it creates a variant within the population, whereas a mutation leaves the population unchanged (pavlovic-lazetic et al., ) . on the other hand, rna viruses have high mutation rates (jenkins et al., , woo et al., ). thus it is difficult to identify the proper strain of the virus. consequently, designing a vaccine and defining its dose is also a very challenging task (paital et al., ) . in this regard, chothe et al. 
used snps on sequences of bovine herpesvirus- (bohv- ) (chothe et al., ) , which was affecting cattle and causing respiratory illness, to cluster them into three groups: two different vaccine groups and one distinct cluster of field isolates. based on this information, they developed an snp-based pcr assay to show differentiation between

to address the above facts, we have analyzed publicly available indian complete or near-complete sars-cov- genomes in order to find mutation points in the form of substitutions, deletions and insertions. for this purpose, multiple sequence alignment (wallace et al., ) is performed in the presence of the reference sequence from ncbi. thereafter, a consensus sequence is built and each genome is analyzed against it to identify the mutation points. as a result, we have found substitutions, deletions and insertions, in total unique mutation points, in genomes across . k bp. further, the mutation points have been classified into three groups: (a) clusters of mutation points, if the mutation appears more than two times in consecutive genomic positions; (b) point mutations as substitution, deletion and insertion that are not present in clusters; and (c) single nucleotide polymorphisms (snps) that appear in more than % of the population of sars-cov- used in our study. finally, clusters of mutations (mostly deletions), point mutations as substitution, deletion and insertion and snps out of categories (a) and (b) have been identified, as they appeared in more than % of the population, i.e. times in indian sars-cov- genomes. these outcomes are visualized using biocircos and bar plots, as well as by plotting the entropy value of each genomic location. moreover, phylogenetic analysis (stuessy, ) has also been performed to see the evolution of sars-cov- in india. 
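the classification of aligned genomes into substitutions, deletions and insertions, and the selection of frequent substitutions as snps, can be sketched as follows. this is a minimal illustration under stated assumptions, not the authors' software: each genome is assumed to be already aligned column by column to the reference, positions are reported in ungapped reference coordinates, ambiguous `N` bases are skipped, and `min_fraction` is a hypothetical parameter standing in for the population threshold, whose value is not recoverable from the text.

```python
from collections import Counter


def call_mutations(ref_aln, qry_aln, gap="-"):
    """List (ref_pos, kind, ref_base, qry_base) for one aligned genome.

    ref_pos is 1-based in ungapped reference coordinates; an insertion is
    reported at the reference position it follows.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != gap:
            ref_pos += 1
        if r == q or q == "N":  # identical column, or ambiguous base (assumption: skip)
            continue
        if r == gap:
            calls.append((ref_pos, "insertion", gap, q))
        elif q == gap:
            calls.append((ref_pos, "deletion", r, gap))
        else:
            calls.append((ref_pos, "substitution", r, q))
    return calls


def frequent_substitutions(calls_per_genome, min_fraction):
    """Keep substitutions seen in more than min_fraction of the genomes."""
    n_genomes = len(calls_per_genome)
    counts = Counter(
        (pos, ref, alt)
        for calls in calls_per_genome
        for pos, kind, ref, alt in calls
        if kind == "substitution"
    )
    return {key: c for key, c in counts.items() if c / n_genomes > min_fraction}
```

comparing against the reference rather than a consensus is a simplification; the same loop works with a consensus sequence passed as `ref_aln`.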
in this section, we discuss the source of the genomic sequences of the virus and the methods used, in a systematic way, to find mutation points as substitution, deletion and insertion as well as snps. the genomic sequences of indian sars-cov- were collected from the global initiative on sharing all influenza data (gisaid) in fasta format on th june . the dataset contains genomes, each with a sequence id and a sequence in fasta format. we have extracted the date from the sequence id to show the number of sequences uploaded per month. this is shown in fig. for our study. further, we have downloaded the reference sequence (nc . ) from the national center for biotechnology information (ncbi) to conduct the experiment with the indian sars-cov- genomes. this reference genome is also used to map the coding regions as collected from ncbi. the coding regions are reported in table and are used when describing the mutation points in the results section. note that bioedit and mega-x were used for data visualization and editing. the pipeline of the workflow is shown in fig. (a). in order to find the mutations in the indian sars-cov- genomes, the multiple sequence alignment (msa) technique clustalw (thompson et al., ) is used in the presence of the reference sequence from ncbi. clustalw uses a neighbor-joining guide tree, with the bootstrap size set to . it is a widely used msa technique for aligning any number of homologous nucleotide or protein sequences, as in our case. it uses a progressive alignment method in which the most similar sequences, those with the best alignment score, are aligned first. after performing the alignment, a consensus sequence is built in order to extract the mutation points from each genome as substitution, deletion and insertion. the detection scheme for identifying substitution, deletion and insertion is shown in fig. where p a v represents the frequency of each residue a occurring at position p . 
represents the number of possible residues for nucleic acid (in this case ) plus gap. further, to verify such genetic variation in the indian sars-cov- genomes, phylogenetic analysis is performed so that the evolution can be seen. the phylogenetic analysis is conducted using the maximum likelihood technique (stuessy, ), where neighbor-joining is used to construct the tree for the visualization of evolution. the results of the experiment are discussed here. our objective is first to identify point mutations as substitution, deletion and insertion after performing the multiple sequence alignment. in supplementary table s . thereafter, substitutions are considered in order to identify snps that are present in at least virus genomes, corresponding to % of the virus population. as a consequence, we have found snps, out of which snps are in coding regions while of them are present in the ′-utr and ′-utr. however, in table table s . moreover, snps in different coding regions are shown in fig. using bioedit software. note that for each list of mutations in tables and , we have provided the genomic coordinates, the number of occurrences of the mutation in the virus genomes (frequency of mutation), the change in nucleotide, the change in amino acid, the entropy, which measures the information content of the nucleotide change at that genomic location, and the mapping to the coding region, so that each mutation point can be identified precisely. for example, the snp at occurs in virus sequences, where the change in nucleotide is c > t, the change in the corresponding amino acid is s > f and the value of entropy is . . a higher entropy value signifies that the change in nucleotide is more informative. the overall results for the mutations as substitution, deletion, insertion, cluster and snps are shown using biocircos plots in fig. , where each track shows the frequency of occurrence of mutation as a histogram using bar and dot plots. 
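the per-position entropy reported above can be illustrated with the standard shannon entropy of each alignment column. this is a hedged sketch only: the exact formula and normalization used in the study are not recoverable from the text, so plain shannon entropy with the natural logarithm over the four nucleotides plus gap is assumed, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, alphabet="ACGT-"):
    """Shannon entropy (natural log) of one alignment column.

    Characters outside the alphabet (e.g. ambiguity codes) are ignored.
    """
    counts = Counter(ch for ch in column.upper() if ch in alphabet)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def alignment_entropy(aligned_seqs):
    """Entropy of every column of a multiple sequence alignment."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return [column_entropy("".join(col)) for col in zip(*aligned_seqs)]
```

a fully conserved column gives zero, and the value grows as the residues at that position become more evenly mixed, which matches the reading that a higher entropy marks a more variable genomic location.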
in general, it summarizes all the results visually. moreover, the entropy computed at each genomic location, which conveys the information on nucleotide change across the whole population of virus genomes, is also shown in fig. . this is prepared using bioedit software. finally, phylogenetic analysis is shown in fig. for and its bootstrap samples of virus sequences in order to visualize the variation in the trees clearly. it is evident from the trees that the indian sars-cov- genomes show wide variation, which we have also noticed in the genomic analysis. these trees are generated using mega-x software. in addition, the aligned sequences are provided as supplementary material for further use. in this paper we have analyzed indian sars-cov- genomes in order to find mutations as substitution, deletion and insertion as well as snps. our analysis has identified clusters of mutations (mostly deletions), point mutations as substitution, deletion and insertion and snps. out of these snps, are present in the coding regions. the purpose of finding snps is to identify the genomic locations that can be targeted to classify the virus strains in india. apart from this, the major advantage is that, for a personalized vaccine, these snps could be used to define the dose of the vaccine after identifying the proper strain of the virus. moreover, in future research, these snps can be used to model the proteins and to examine their conformational changes so that a potential drug can be designed to target such proteins for indian patients. we are currently working in this direction, and we also hope to help other researchers conduct their research using these snps. ethical approval or individual consent was not applicable. the aligned indian sars-cov- genomes with reference and consensus sequences, the software to find mutations and the supplementary material are available at "http://www.nitttrkol.ac.in/indrajit/projects/covid-mutation-india/". 
moreover, indian sars-cov- genomes used in this work are publicly available at gisaid database. not applicable. this work has been partially supported by crg short term research grant on covid- table : mutation as snps in more than % of population of indian sars-cov- genomes epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study whole-genome sequence analysis reveals unique snp profiles to distinguish vaccine and wild-type strains of bovine herpesvirus- (bohv- ) origin and evolution of pathogenic coronaviruses viral genetics applying next-generation sequencing to unravel the mutational landscape in viral quasispecies inter nation social lockdown versus medical care against covid- , a mild environmental insight with special reference to india. science of the total environment bioinformatics analysis of sars coronavirus genome polymorphism plant taxonomy: the systematic evaluation of comparative data epidemiology, genetic recombination, and pathogenesis of coronaviruses clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice multiple sequence alignments coronavirus pathogenesis and the emerging pathogen severe acute respiratory syndrome coronavirus. microbiology and molecular biology worldometer ( ). coronavirus disease (covid- ) cases in india a pneumonia outbreak associated with a new coronavirus of probable bat origin we thank all those who have contributed sequences to gisaid database and reviewers for the valuable comments to improve the article. key: cord- -swmb ty authors: wang, yong; guo, xu; zhang, da; sun, jianfei; li, wei; fu, ziteng; liu, guangqing; li, yongdong; jiang, shudong title: genetic and phylogenetic analysis of canine bufavirus from anhui province, eastern china date: - - journal: infect genet evol doi: . /j.meegid. . 
sha: doc_id: cord_uid: swmb ty bufavirus is a novel virus associated with canine gastroenteritis. three strains of bufavirus were first detected in dog feces collected from anhui province in eastern china. the near-complete genome sequences were amplified. sequence alignment showed . – . % homology between the three bufavirus strains and reference strains. phylogenetic analysis showed the distributed viruses forming a cluster of close relationships. selective pressure analysis of the vp region indicated that the canine bufavirus (cbuv) was mainly subject to negative selection during evolution. the negative selection site was located on the residue of b-cell epitopes, indicating minimal change to the virus's immunogenicity. since this is the first report of cbuv circulating in anhui province, this study will provide further understanding of the phylogenetic and molecular characteristics of cbuv and serve as a reference for prevention and vaccine development. members of the family parvoviridae are common pathogens, which cause a wide range of animal diseases (lau et al., ) bufaviruses (buv) are part of the protoparvovirus genus (hargitai et al., ; huang et al., ) . it is a small, non-enveloped, single-stranded dna virus with a genome size of . - . kb with complex hairpin structures at the ' and ' ends. buv also contains two open reading frames (orfs), orf and orf . orf encodes non-structural protein, and orf encodes capsid protein (sun et al., ) . in , it was discovered in the fecal samples of children with diarrhea in burkina faso (phan et al., ) . subsequently, buv was found in wild shrews, megabats, wild rats, pigs, dogs, and cats (diakoudi et al., ; huang et al., ; martella et al., ; sasaki et al., ; sasaki et al., ) . in , a virus with a close genetic relationship to the human bufavirus (hubuv) was detected in dogs with either gastroenteric or respiratory disease in italy; it was named canine bufavirus (cbuv) (martella et al., ) . 
in china, cbuv was first detected in shanghai, where it showed a high infection rate in dogs (li et al., ) . presently, cbuv has only been reported in italy and china (di martino et al., ; li et al., ; martella et al., ; sun et al., ) , and its genetic characteristics and pathogenicity are poorly understood. the primary symptom caused by members of the genus protoparvovirus in carnivores is diarrhea (chaiyasak et al., ; piegari et al., ) . recent studies have shown a positive correlation between cbuv and diarrhea, and cbuv dna was also detected in serum samples of dogs with gastroenteritis (li et al., ) . in terms of genetic and phylogenetic characteristics, one report has suggested that cbuv is potentially heterogeneous and that recombination may be a factor in its evolution (di martino et al., ) . as mentioned, cbuv was found in shanghai, china, with a high prevalence. the prevalence of the virus in anhui province, which has close trade links with shanghai, is unknown. to this end, fecal samples from different cities in anhui province, eastern china, were collected in this study to explore the molecular and phylogenetic characteristics of cbuv. the study reveals the epidemic status of cbuv in anhui province and its molecular characteristics, which provide a reference for studies on the evolution and epidemiology of cbuv. dogs (> year old). fifty-two fecal samples ( adult dogs and puppies) were from healthy dogs, and were from puppies with diarrhea. the fecal samples were collected in sterile centrifuge tubes using rectal swabs and stored at − °c until used. the samples were suspended in % phosphate-buffered saline and mixed by shaking. the mixture was centrifuged at , × g for min, and the supernatant was collected. cbuv was detected through conventional pcr (cpcr), as previously described (martella et al., ) . the primers used for detecting canine parvovirus (cpv) are listed (table ).
the pcr products were visualized through agarose gel electrophoresis and then verified by sequencing. the specific primers used to amplify the near-complete genome were designed from other sequences in genbank (accession number: mk . ) using the primer premier software (dnastar, inc.; table ). pcr products were purified using a dna purification kit (tiangen biotech) following the manufacturer's instructions. after ligation of the pcr products into the pmd -t vector (takara bio inc.), the recombinant plasmids were sent to sangon biotech co ltd. (shanghai, china) for sequencing. each plasmid was sequenced three times. near-complete genomes of the cbuv strain obtained for this study were splined together using seqman software (dnastar). orfs were identified using the ncbi orffinder (https://www.ncbi.nlm.nih.gov/orffinder/). sequence identity was analyzed with the megalign . program (dnastar inc.) by aligning nucleotide and amino acid sequences via the mafft method (katoh and standley, ) . phylogenetic analysis was performed using phylosuite software (zhang et al., ) . maximum likelihood phylogenies were inferred using iq-tree for , ultrafast bootstraps (minh et al., ; nguyen et al., ) and shimodaira-hasegawa-like approximate likelihood-ratio test (guindon et al., ) . modelfinder selected the best substitution model (kalyaanamoorthy et al., (table ) . interestingly, a comprehensive analysis of pressure selected b-cell epitopes revealed that codons , , and were located on the predicted b-cell epitopes (fig. ). bufavirus is a potential enteric pathogen that causes diarrhea in children (altay et al., ; chieochansin et al., ) . cbuv was first found in canines with gastrointestinal and respiratory diseases (martella et al., ) . in italy, the positive rate was . % ( / ) (di martino et al., ) . in china, cbuv has been found in shanghai, guangxi province and henan province, and the positive rates were . % ( / ), . % ( / ), and . 
% ( / ), respectively (li et al., ; shao et al., ; sun et al., ) . in this study, three positive samples were detected, and the positive rate was . % ( / ) in anhui province. positive rates in other parts of china are markedly lower than those in shanghai, indicating that cbuv prevalence may differ by region. further studies are needed to confirm the significance of this difference. in previous reports from china and italy, the presence of cbuv was found in [...] more pathogenicity studies are required to clarify whether cbuv plays a major role in diarrhea in dogs. the cbuv sequences were compared with the reference strains and showed high sequence identity. phylogenetic analysis, based on the near-complete sequence of the vp , indicated that cbuv formed a unique cluster, and the three isolated strains were closely related to other reference strains. compared with hubuv, which forms three distinct branches, cbuv was evolutionarily conserved (yahiro et al., ) . although both are members of the genus protoparvovirus, cbuv was not closely related to cpv in the phylogenetic tree, which is consistent with the sequence comparison between cpv and cbuv mentioned above. in addition, genetic heterogeneity has recently been reported in the downstream region, where recombination may occur. this finding suggests that recombination may play a role in the evolution of cbuv. in this study, no significant recombination event was found, which may be due to geographical differences or limited genetic information. hence, the role of recombination in cbuv evolution needs more attention. in the amino acid alignment, we found relatively more amino acid substitutions in vp than in ns . to elucidate whether external selective pressure was associated with these mutations, selection pressure analysis was performed on the existing sequences.
the results showed a low rate of positive selection in the vp but a higher rate of negative selection. negative (purifying) selection means that variation in the gene is constrained under pressure from the external environment, so that the gene sequence tends to be conserved (miller et al., ) . therefore, external selection pressure may not be significantly associated with the presence of mutations. due to the lack of available genetic information, further analysis is limited. we were also concerned with whether external selection pressure affects the immune response to the virus. interestingly, we found that the positively selected codon site did not coincide with a b-cell epitope. however, negatively selected codon sites , , and were located on the predicted b-cell epitopes (fig. ) . this suggests that selective pressure does not significantly change the b-cell epitopes of cbuv. in parvoviruses, the vp region contains major epitopes (lopez de turiso et al., ) . a b-cell epitope is a group of residues on the surface of an antigen that is recognized by either a particular b-cell receptor (bcr) or a particular antibody molecule of the immune system and that determines the immunogenicity of the virus (zhang et al., ) . therefore, the conservation of epitope residues suggests that the immunogenicity of cbuv may not change significantly, which indicates that new serotypes of cbuv may not easily arise. this is beneficial for virus prevention and vaccine development. herein, three of fecal samples were positive for cbuv, providing molecular evidence for the presence of cbuv in anhui province. phylogenetic analysis and sequence alignment showed high sequence identity with other reference strains, indicating that cbuv was relatively conserved. in addition, cbuvs were subject to negative selection, which helped maintain the conservation of viral genes.
the negatively selected codons were located on b-cell epitopes, suggesting that selection has not altered the immunogenicity of cbuvs. this study provides a reference for further understanding of the epidemic and molecular characteristics of the virus in china. table the selected sites of the vp gene according to three methods. table the results of b-cell epitope prediction. identification of a novel parvovirus in domestic cats; new algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml . ; detection and genetic characterization of a novel parvovirus distantly related to human bufavirus in domestic pigs; detection and molecular characterization of novel porcine bufaviruses in guangxi province; modelfinder: fast model selection for accurate phylogenetic estimates; mafft multiple sequence alignment software version : improvements in performance and usability; identification and characterization of bocaviruses in cats and dogs reveals a novel feline bocavirus and a novel genetic group of canine bocavirus; canine bufavirus in faeces and plasma of dogs with diarrhoea; fine mapping of canine parvovirus b cell epitopes; novel parvovirus related to primate bufaviruses in dogs; evolutionary dynamics of newcastle disease virus; ultrafast approximation for phylogenetic bootstrap; divergent bufavirus harboured in megabats represents a new lineage of parvoviruses; distinct lineages of bufavirus in wild shrews and nonhuman primates; genomic sequencing and characterization of a novel group of canine bufaviruses from henan province; first identification of a novel parvovirus distantly related to human bufavirus from diarrheal dogs in china; novel human bufavirus genotype in children with severe diarrhea; svmtrip: a method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity; the relationship between b-cell epitope and mimotope sequences; phylosuite: an integrated and scalable desktop
platform for streamlined molecular sequence data management and evolutionary phylogenetics studies. we would like to thank editage (www.editage.cn) for english language editing. this study was supported by the ningbo health branding subject fund (no. note: sites in bold were predicted by more than one method. key: cord- - wq k y authors: choi, jong-chul; lee, kun-kyu; pi, jae ho; park, seung-yong; song, chang-seon; choi, in-soo; lee, joong-bok; lee, dong-hun; lee, sang-won title: comparative genome analysis and molecular epidemiology of the reemerging porcine epidemic diarrhea virus strains isolated in korea date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: wq k y porcine epidemic diarrhea virus (pedv), a member of the coronaviridae family, is an enveloped, positive-sense, single-stranded rna virus, which causes severe diarrhea and dehydration in suckling pigs. we detected three pedv strains from ten small intestine samples from piglets with acute diarrhea and we determined the complete genome sequences of the reemerging korean pedv field isolates, except for the noncoding regions from both ends. the complete genome sequences of the strains were identical or almost identical (one synonymous single-nucleotide polymorphism (snp) in the orf a/ b genomic sequence). interestingly, comparative genome analysis of recent korean pedv isolates and other strains revealed that the complete genome sequences of recent korean strains were almost identical ( . %) to those of the us pedv strains isolated in . these results suggest that the three reemerging korean strains are distinct from previous endemic korean pedv strains and have most likely been introduced into korea from overseas recently. porcine epidemic diarrhea virus (pedv) is a member of the family coronaviridae, subfamily coronavirinae, and genus alphacoronavirus, which includes some human and bat coronaviruses.
pedv, which has a positive-sense, single-stranded rna genome, causes severe diarrhea and dehydration in suckling piglets (song and park, ) . since the first report of isolation in europe in (pensaert and de bouck, ) , pedv has become an economic concern in the swine industry in europe and asia (song and park, ) . in late , various chinese strains of pedv that were clinically more severe than the classical strains, with - % morbidity and - % mortality in suckling piglets, were detected (li et al., ) . in april , pedv outbreaks were confirmed in the us for the first time, and the isolates showed a very close relationship with the chinese isolate ah . a previous study showed that the emergent us pedv strains were likely introduced into the us through intercontinental transmission from china (huang et al., ) . in korea, pedv was first isolated in , followed by a large two-year-long outbreak. despite the use of vaccines, pedv was frequently detected across the country, mainly during the winter season (kweon et al., ) . since late november , pedv has reemerged in korea and caused significant economic losses in the swine industry. this study aimed to determine the complete genome sequences of the reemerging korean pedv strains and to investigate their genetic relationship with other strains using comparative genome analysis and phylogenetic analysis. ten small intestine samples were collected from dead piglets from two commercial pig farms in korea. the piglets died following acute watery diarrhea. the macroscopic features of the intestines were typical of pedv infections, including yellowish contents and a distended appearance. to detect the pedv genome, m gene-targeted rt-pcr was performed using total rna from mucosal scrapings. three of ten samples were positive in the pedv-specific rt-pcr. to investigate the origin of the reemerging korean pedv strains, complete genome sequences of the three reemerging korean pedv strains were determined using sanger sequencing.
for sanger sequencing, primer pairs were designed for the highly conserved sites of the pedv genome using primer infection, genetics and evolution j o u r n a l h o m e p a g e : w w w . e l s e v i e r . c o m / l o c a t e / m e e g i d (koressaar and remm, ; untergasser et al., ) or designed manually when primer failed to identify optimal primer sites near the appropriate genomic regions (table ) . eighteen dna fragments, which covered the entire genome of pedv except the noncoding regions from both ends, were amplified using the superscript Ò one-step rt-pcr system. sequencing reactions were performed using the bigdye Ò terminator v . cycle sequencing kit (applied biosystems). the products were analyzed using abi xl dna analyzer (applied biosystems). the sequencing results were assembled using geneious v . . software. complete genome sequences of the three reemerging korean pedv strains have been submitted into the genebank database under the accession numbers kj , kj , and kj . complete genome alignment between the reemerging korean strains and ten other available strains were performed using multiple alignment with fast fourier transformation (mafft) v (katoh and toh, ) . to study the relationship between the us and korean pedv outbreaks, us and chinese strains (which were epidemic in ) were included in genome alignment. in addition, all available complete genome sequences of pedv isolates from korea were included in the alignment to compare the recent korean strains with previous endemic korean pedv strains. cv strain was included in the alignment as pedv reference strain. the maximum likelihood phylogenetic trees for the complete genome and the s gene sequence alignments were generated using phyml version . . (guindon and gascuel, ) with the generalized time reversible (gtr) substitution model (rodriguez et al., ) . the best nucleotide substitution model for analysis was confirmed using mega . . (tamura et al., ). 
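the comparison step described above (aligning complete genomes and then reporting percent identity and snps relative to a reference) can be illustrated with a minimal python sketch. it is not the authors' pipeline, which used mafft and phyml; it only assumes that two sequences have already been aligned to the same length with "-" marking gaps. the function names and the short example sequences are hypothetical.

```python
def pairwise_identity(seq_a, seq_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are ignored in this sketch
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def snp_positions(reference, query):
    """1-based alignment columns where the query differs from the reference (gaps skipped)."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (r, q) in enumerate(zip(reference.upper(), query.upper()), start=1)
        if r != "-" and q != "-" and r != q
    ]


# hypothetical toy alignment, not real pedv data
ref = "ATGGCTTCTGTC-AGT"
qry = "ATGGCATCTGTCTAGT"
identity = pairwise_identity(ref, qry)
snps = snp_positions(ref, qry)
```

in this toy alignment one column is gapped and one of the remaining fifteen columns differs, so the query carries a single snp at column 6. whether gapped columns should count as differences is a design choice; here they are excluded, which is an assumption and may differ from how the percent identities in this study were computed.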
the complete genome sequences of the reemerging korean strains showed a typical pedv gene order of utr-orf a/ b-s-orf -e-m-n- utr and were identical or almost identical (one synonymous single-nucleotide polymorphism (snp) in the orf a/ b gene) to each other. multiple alignment with other pedv complete genomes indicated that the reemerging korean strains possess genome sequences that are distinct from those of previous korean field strains (fig. ) . a previous study had discussed evidence of frequent recombination events between different genetic lineages or sublineages of pedv (huang et al., ) . however, genomic sequences of the reemerging korean strains did not show any regions recombined with those of the previous korean strains during the recombination analysis performed using simplot . . (lole et al., ) (data not shown). in addition, phylogenetic analysis of the s gene between the reemerging and previous korean strains of pedv indicated that the reemerging korean strains belonged to a genetic lineage different from those of previous endemic korean pedv strains (fig. ) . these results suggest that the reemerging strains have been recently introduced into korea from another country. interestingly, comparative genome analysis of the reemerging korean pedv isolates and other strains revealed that the complete genome sequences of the recent korean strains were almost identical ( . %) to those of the us pedv strains isolated in (fig. ) . compared with the complete genome of the reemerging korean isolates, genomes of the us strains, usa/iowa/ / and usa/indiana/ / showed five (three non-synonymous and one synonymous in the orf a/ b gene and one nonsynonymous in the s gene) and seven (four non-synonymous and two synonymous in the orf a/ b gene and one non-synonymous in the orf gene) snps, respectively. both us strains have one insertion sequence causing early termination of the translation of polyprotein encoded in the orf a/ b gene. on the other hand, the complete genome of the us strain usa/ia / did not show any indels, but nine (three non-synonymous and three synonymous in the orf a/ b gene, one non-synonymous in the s gene, and one non-synonymous in the n gene) snps, when compared with the complete genome of the reemerging korean isolates. according to the phylogenetic analysis, the reemerging korean pedv isolates were closely clustered with the us strains isolated in and chinese strains isolated in (fig. ) . table primers used for the amplification of full-length genomes of the reemerging korean pedv strains (columns: sequence ( - ), pcr product size). fig. . nucleotide sequence alignment and phylogenetic tree analysis for complete genomes of pedv strains. (a) alignment of the complete genome sequences of the reemerging korean pedv field strains and other strains was performed using mafft. one of the reemerging korean strain sequences was set as the reference sequence. vertical lines indicate the snps compared to the reference sequence and dashes indicate sequence gaps. protein-coding regions are indicated with arrows. (b) a maximum likelihood phylogenetic tree was generated using the alignment. one-hundred bootstrap replicates were used to assess the significance of the tree topology. a bar indicates nucleotide substitutions per site. fig. . phylogenetic tree showing the relationship between the reemerging korean strains, previous endemic korean strains and strains from overseas, based on the analysis of the s gene. a maximum likelihood phylogenetic tree was generated from the alignment of complete s gene sequences. one-hundred bootstrap replicates were used to assess the significance of the tree topology. a bar indicates nucleotide substitutions per site. korean pedv strains are denoted using bold characters.
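the snps reported above are split into synonymous and non-synonymous changes. as a hedged illustration of how such a call can be made, the sketch below translates the affected codon before and after a single-base change using the standard genetic code. it assumes the coding sequence is given in frame as dna letters; the function name and the nine-base example are hypothetical, and this is not claimed to be the procedure the authors used.

```python
# standard genetic code, built from the conventional TCAG ordering
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_cds, position, alt_base):
    """Classify a single-base change in an in-frame coding sequence.

    position is 1-based within ref_cds; returns 'synonymous' or 'non-synonymous'.
    """
    ref_cds = ref_cds.upper()
    alt_base = alt_base.upper()
    idx = position - 1
    if idx < 0 or idx >= len(ref_cds):
        raise ValueError("position outside the coding sequence")
    if alt_base == ref_cds[idx]:
        raise ValueError("alternative base equals the reference base")
    start = idx - idx % 3  # first base of the codon containing the snp
    codon = ref_cds[start:start + 3]
    if len(codon) < 3:
        raise ValueError("snp falls in an incomplete trailing codon")
    offset = idx - start
    alt_codon = codon[:offset] + alt_base + codon[offset + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


# hypothetical toy coding sequence: ATG GCT AAA (met-ala-lys)
cds = "ATGGCTAAA"
third_base_change = classify_snp(cds, 6, "C")   # GCT -> GCC, both alanine
first_base_change = classify_snp(cds, 7, "G")   # AAA -> GAA, lysine -> glutamate
```

the sketch treats each snp independently; if two snps fall in the same codon, the combined effect would need to be evaluated on the doubly mutated codon instead.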
comparative genome analysis and phylogenetic analysis revealed that the reemerging korean pedv strains are practically identical to the us strains. a previous study suggested that the three emergent us strains were most closely related to a strain isolated in from anhui province in china. in addition, the genomes of the reemerging korean pedv strains did not possess any genetic feature from the genomes of the previously sequenced korean field and attenuated vaccine strains. these results suggest that the reemerging korean pedv strains are not variant strains of old korean field or attenuated vaccine strains. there are two possible sources of origin of the reemerging korean pedv strains. first, the same source of origin of the us strains containing chinese pedv-like virus could have been introduced into korea slightly later than the us outbreak events. this hypothesis can explain why the reemerging korean pedv strains are identical to the us strain. another possibility is that us strain has been directly transmitted into korea. during the us outbreak of pedv in , two genetic sublineages of the us strains were isolated. in a previous report, the authors stated that the us strain of pedv diverged during evolution and that evolution generated two genetic sublineages, namely, ia -co/ and mn-ia (huang et al., ) . during complete genome alignment as part of this study, one of the us strains, ia , showed a recombined genomic region in the orf a/ b gene, which closely matched that of the chinese strain js-hz . all reemerging korean pedv strains isolated in this study showed a close relationship with only one of the genetic sublineages of the us strains, namely, mn-ia . we could not detect a pedv strain with a close relationship with the ia -co/ sublineage. this possibly suggests that only one sublineage of the us strain has been directly introduced into korea from the us. 
to identify the exact source of origin of the reemerging korean strain, further investigation and surveillance are required. furthermore, to prevent the introduction of pedv into korea from overseas in future, the quarantine policy on feed ingredients should be reinforced. prevalence of porcine epidemic diarrhoea virus and transmissible gastroenteritis virus infection in korean pigs a simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood origin, evolution, and genotyping of emergent porcine epidemic diarrhea virus strains in the united states recent developments in the mafft multiple sequence alignment program detection and differentiation of porcine epidemic diarrhoea virus and transmissible gastroenteritis virus in clinical samples by multiplex rt-pcr enhancements and modifications of primer design program primer isolation of porcine epidemic diarrhea virus (pedv) in korea new variants of porcine epidemic diarrhea virus full-length human immunodeficiency virus type genomes from subtype c-infected seroconverters in india, with evidence of intersubtype recombination a new coronavirus-like particle associated with diarrhea in swine the general stochastic model of nucleotide substitution porcine epidemic diarrhoea virus: a comprehensive review of molecular epidemiology, diagnosis, and vaccines mega : molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods primer -new capabilities and interfaces key: cord- - n d dh authors: li, zhijie; liu, dafei; ran, xuhua; liu, chunguo; guo, dongchun; hu, xiaoliang; tian, jin; zhang, xiaozhan; shao, yuhao; liu, shengwang; qu, liandong title: characterization and pathogenicity of a novel mammalian orthoreovirus from wild short-nosed fruit bats date: - - journal: infect genet evol doi: . /j.meegid. . . 
sha: doc_id: cord_uid: n d dh mammalian orthoreoviruses (mrvs) have a wide range of geographic distribution and have been isolated from humans and various animals. this study describes the isolation, molecular characterization and analysis of pathogenicity of mrv variant b/ from wild short-nosed fruit bats. negative stain electron microscopy illustrated that the b/ strain is a non-enveloped icosahedral virus with a diameter of nm. sodium dodecyl sulfate-polyacrylamide gel electrophoresis (sds-page) migration patterns showed that the b/ viral genome contains segments in a : : arrangement. the isolate belongs to mrv serotype based on s gene nucleotide sequence data. balb/c mice experimentally infected with b/ virus by intranasal inoculation developed severe respiratory distress with tissue damage and inflammation. lastly, b/ virus has an increased transmission risk between bats and humans or animals. mammalian orthoreovirus (mrv) belongs to the genus orthoreovirus, which includes non-enveloped double-stranded rna viruses, each with a genome comprising genetic segments divided into three size classes (attoui et al., ) . four major mrv serotypes have been characterized by neutralization assays, and all inhibit hemagglutination: type lang (t l), type jones (t j), type dearing (t d) and type ndelle (t n) (kohl et al., ; attoui et al., a attoui et al., , b . mrv isolates were obtained from hosts with or without clinical signs of disease, and the virus can infect a broad range of mammals (dermody et al., ) . mrvs are ubiquitous mammalian pathogens, infecting nearly all mammalian hosts, including humans and other animal species (steyer et al., ; decaro et al., ; attoui et al., ) . infected bats are associated with an increasing number of emerging and re-emerging viruses, including the hendra virus (hev), nipah virus (niv), ebola virus (ebov) and sars coronavirus. 
infected bats threaten public health because they exist in large populations and travel across wide geographical distances (wong et al., ; calisher et al., ) . however, reports on the detection and isolation of orthoreovirus from bats are limited. in , the first orthoreovirus in bats, nelson bay virus (nbv), was isolated from the blood of fruit bats in australia. in , the second bat-borne orthoreovirus, pulau virus (pulv), was isolated from fruit bat urine collected on tioman island, malaysia. since then, bat-borne orthoreoviruses have received much attention. additional orthoreoviruses (melv, kamv, xi-river, broome viruses, kampar, sikamat, hk / , rpmrv-yn , cangyuan virus) have been isolated from or detected in bats and in humans who were likely in contact with bats (chua et al., ; chua et al., ; du et al., ; thalmann et al., ; cheng et al., ; chua et al., ; wang et al., ; hu et al., ) . recently, several groups have reported mrv infection in bats that resulted in visible pathology within tissues (kohl et al., ; lelli et al., ) . the authors speculated that bat-to-human interspecies transmission was possible, but no substantial evidence to support this hypothesis was provided. in this study, we report the characterization of a novel mrv strain (called "b/ ") isolated from healthy, wild short-nosed fruit bats in guangdong province, china. the whole genome sequence of strain b/ was determined. its evolution and evidence of genetic reassortment were analyzed by sequence comparison using phylogenetic analysis. furthermore, we evaluated the pathogenicity of b/ virus using four-week-old female balb/c mice. mrv strain mpc/ was isolated from masked palm civets in guangdong province in southern china by our laboratory and caused a potentially fatal infection of the inoculated host mouse .
vero e cells were obtained from the atcc (atcc® crl- ™) and grown at °c in % co in dmem supplemented with mm glutamine, % fetal calf serum and antibiotics. four-week-old female balb/c mice were obtained from the experimental animal center of harbin veterinary research institute (hvri). all animals were housed in the animal facility at hvri under standard conditions in accordance with institutional guidelines. thirty tissue samples from short-nosed fruit bats were collected from shaoguan city of china's guangdong province and homogenized. the homogenate was filtered through a . μm pore-size filter and used to inoculate confluent monolayers of vero e cells. blind passages were performed until a cytopathic effect (cpe) was observed. the infected cells were plaque purified, and the virus was propagated in vero e cultures. virus was collected from infected cells by three freeze-thaw cycles. aliquots were stored at − °c. one aliquot was titrated on vero e cells to estimate a titer by plaque assay. if cpe was not observed after passages, the result of virus isolation was considered negative. the infected cells were prepared for negative stain and thin section examination by electron microscopy (em). in addition, an indirect immunofluorescence assay (ifa) was used to detect mrv proteins in infected cell cultures. briefly, after washing with pbs, cells were fixed with % paraformaldehyde and incubated with % bsa for h. then, the cells were incubated with a mouse anti-mrv (t d) antibody, followed by a goat anti-mouse igg-fitc secondary antibody (santa cruz, usa). after washing, fluorescence was observed under an amg evos f inverted microscope. normal mouse sera, diluted : , was used as a negative control. viral dsrna was extracted from purified virus particles using trizol reagent according to the manufacturer's protocol. double strand rna (dsrna) segments were separated by electrophoresis in % (w/v) polyacrylamide slab gels. 
approximately μl of each sample was loaded into the gels, and electrophoresis was performed at v for h at room temperature. to further characterize the virus, primers were designed with primer premier . software based on published sequences. all information regarding the primers is provided in table . rt-pcr was performed using the one step rt-pcr kit (qiagen) as described in previous reports . the whole genome of b/ was amplified and sequenced by the sanger method; then, the sequence data were assembled using the seqman program and manually edited. the sequence of b/ was compared with other published mrv sequences. phylogenetic analyses were performed using the neighbor-joining (nj) method with the kimura -parameter model in mega . a total of four-week-old female balb/c mice were randomly divided into ten groups of . the animals in groups - were infected by intranasal (i.n.), intracranial (i.c.), intraperitoneal (i.p.) or intragastric (i.g.) inoculation with and pfu of purified b/ virus diluted in pbs. animals in group received pfu of mpc/ (i.n.). simultaneously, mice were inoculated with pbs (i.n.) as a control. for days, all mice were monitored daily for clinical signs of disease. tissues were harvested from mice euthanized by co narcosis for analysis of viral replication and pathology. a histopathological scoring system was used to characterize pathological lesions in detail. standard histopathological procedures were used to record all observed lesions. samples were fixed in buffered formalin and embedded in paraffin wax. sections ( μm) were stained with hematoxylin and eosin (h-e) for histopathological examination. a scoring system for gross pathology and histopathology was developed, and the severity of lesions ranged from (no lesion) to (severe lesion). data from both scoring systems were analyzed using the kruskal-wallis non-parametric mean comparison test (elliott and hynan, ) , and differences were considered significant at p < . .
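the neighbor-joining trees mentioned above are built from pairwise evolutionary distances. assuming the model referred to is the standard kimura two-parameter model, the distance between two aligned sequences is d = -1/2 ln(1 - 2p - q) - 1/4 ln(1 - 2q), where p is the proportion of transitions and q the proportion of transversions. the sketch below computes this distance in plain python; the function name and example sequences are hypothetical, and mega's own handling of gaps and ambiguous bases may differ from the simple pairwise skipping used here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper().replace("U", "T"), seq_b.upper().replace("U", "T")):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption of this sketch)
        sites += 1
        if a == b:
            continue
        both_purines = a in PURINES and b in PURINES
        both_pyrimidines = a in PYRIMIDINES and b in PYRIMIDINES
        if both_purines or both_pyrimidines:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    x = 1.0 - 2.0 * p - q
    y = 1.0 - 2.0 * q
    if x <= 0 or y <= 0:
        raise ValueError("sequences too divergent for the k2p correction")
    return -0.5 * math.log(x) - 0.25 * math.log(y)


# hypothetical toy pair: one transition (a/g) and one transversion (a/c) in ten sites
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
```

for this toy pair the uncorrected proportion of differing sites is 0.2, and the corrected distance is somewhat larger because the model accounts for multiple substitutions at the same site.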
organs from duplicate animals were collected at each time point, suspended in vol of pbs and disrupted by sonication for min on ice. titration was performed by virus plaque assay. infectious titers were calculated per gram of tissue. the data were plotted as the mean values with variation shown as ± standard error. the animal experiments were approved by the animal ethics committee of hvri of the chinese academy of agricultural sciences (caas) and performed in accordance with animal ethics guidelines and approved protocols. the animal ethics committee approval number was syxk (hei) - . after four passages of the isolated virus on vero e cells, a distinct cpe was observed in infected cells, characterized by granulating, shrinking, rounding, seining, and detaching, as determined by ifa using serum from mice immunized with t d (fig. a) . mrv particles in infected vero cells were also examined by em techniques. as shown in fig. b , negative stain em showed multiple virus-like particles with nonenveloped icosahedral formation in the homogenates. ultra-thin sections of infected vero cells showed typical electrondense virus particles organized in a paracrystalline pattern within the cytoplasm. polyacrylamide gel electrophoresis (page) migration patterns of the genome segments showed that b/ virus contains segments in a : : arrangement, typical of reoviruses (fig. c) . the complete genome sequence of b/ virus was generated, and comparative analysis with other mrv strains was performed. the complete sequences of segments were submitted to a genbank database under accession numbers kx -kx . the complete genome of b/ virus is , bp, and sizes of segments through are as follows: l , bp; l , bp; l , bp; m , bp; m , bp; m , bp; s , bp; s , bp; s , bp; and s , bp. the inferred lengths of eight structural proteins and three nonstructural proteins are as follows: λ ( aa), λ ( aa), λ ( aa), μns ( aa), μ ( aa), μ ( aa), σ ( aa), σ s ( aa), σ ( aa), σns ( aa), and σ ( aa). 
pairwise nucleotide and inferred amino acid comparison between strain b/ and other mrv strains were performed for all ten segments (table ) . nucleotide sequence alignments indicated that the l , m , m , m , and s genes from b/ virus were highly related to those of wiv - virus ( . , . , . , . , and . %, respectively). the l , l and s genes had highest identities ( . - . %) with pig strains and sc-a. the s segment of strain b/ was most similar to strain mpc/ on both the nucleotide and inferred amino acid level. segment s of strain b/ was most similar to strains hb-c and hb-a (isolated from minks) on both nucleotide and amino acid levels. the mrv s gene encodes the viral attachment protein σ , which is unique to each mrv prototype strain and determines the serotype. when compared to the s sequences available in genbank, the b/ strain shares higher identity with that of mrv- than with those of mrv- , mrv- or mrv- (fig. ) . phylogenetic analysis of the l , l , l , m , m , m , s , s and s genome segments for the b/ strain and most related whole-genome strains available in genbank is shown in supplementary . to determine the potential route of infection, balb/c mice were inoculated in four ways with different doses of purified virus b/ ( or pfu). mice infected by i.n. inoculation in groups ( pfu b ), ( pfu b ) and ( pfu mpc/ ) exhibited clinical signs (noticeable respiratory distress and body weight loss), and the time of death and clinical manifestations of infection varied with dose. the highest dose of b/ virus caused disease and resulted in death in of mice starting at days post-infection (fig. ) . the same dose of mpc/ virus also induced clinical symptoms and death in mice. mice infected with pfu of b virus also manifested respiratory crackling and death in of mice starting at days post-infection. in contrast, all of the mice exposed to the same doses by either i.c. i.p. or i.g. 
inoculation did not exhibit any signs of respiratory distress or changes in body weight during the course of the experiment, and no histological changes were observed in their organs at any time point. the control mice remained healthy throughout the trial. these experiments showed that respiratory infection with b/ virus induces disease in balb/c mice. to assess the gross pathological consequences of infection, the liver, lung, brain, intestine, and spleen were subjected to histopathological examination using standard procedures. all surviving animals were euthanized at dpi for inclusion in this analysis. samples were paraffinembedded, sectioned and stained with hematoxylin and eosin. infection of the lung produced signs of inflammation associated with alveolar thickening and lymphocytic infiltration (fig. a) . a composite analysis using the abovementioned histopathological scoring system was performed (table ). these data indicate that the b/ strain induces histopathological changes associated with disease. no statistically significant differences were observed in mice inoculated by other routes or in control animals. to analyze viral replication in different organs of infected mice, viral titers were quantified in tissues days after infection with and pfu of b/ virus. animals inoculated with pfu of b/ virus had higher viral titers (up to pfu/g) than those of mice inoculated with pfu. higher levels of viral rna were detected in the lungs than in the brain, liver, intestine or spleen (fig. b) . however, the inoculated tissue (i.c., i.p. and i.g.) with pfu had - times less virus in each organ compared to those of i.n. inoculated mice (data not shown). no virus was detected in the control animals. the most recent disease outbreaks have been associated with zoonotic transmission events, and newly emerging viruses have originated from wildlife (moratelli and calisher, ; o'shea et al., ) . 
thus, surveillance and evaluation of viruses prevalent in wildlife are of special interest. bats are the natural host reservoir for a number of high-impact zoonotic viruses. more than viruses belonging to families were isolated from or detected in bats. a few of these viruses have been responsible for human disease, including ebola virus (leroy et al., ) , middle east respiratory syndrome coronavirus (mers-cov) (ithete et al., ) , severe acute respiratory syndrome coronaviruses (sars-cov) (ge et al., ) , nipah and hendra viruses (halpin et al., ; marsh et al., ) . several bat orthoreovirus isolates have been obtained from bats in recent years (moratelli and calisher, ; chua et al., ; pritchard et al., ; du et al., ) . a novel bat reassortant mrv, rpmrv-yn , obtained from least horseshoe bats in china resulted from a reassortment of mrvs known to infect humans and animals . six mrv strains were isolated from hipposideros and myotis and grouped into mrv serotypes , , or based on the s gene sequence . three novel mrvs were isolated from european bats, with rather mild or clinically unapparent infections in their hosts (kohl et al., ) . considering the diversity and wide distribution of bats and the potential for transmission of bat viruses to humans and other animals, continued surveillance of mrvs in all host species is urgently needed. in this study, one strain of mrv, b/ , was isolated by in vitro cell culture from thirty wild short-nosed fruit bat samples from shaoguan city of china's guangdong province. the isolate was serially propagated in cell culture and characterized by cell culture cpe, immunofluorescence staining, electropherotype, em and entire genome sequencing. mrv genomes undergo multiple types of genomic alteration, including intragenic rearrangement and reassortment, in both laboratory and natural conditions (dermody et al., ) . 
to molecularly characterize b/ virus, we amplified and sequenced the three large (l -l ), three medium (m -m ), and four small (s -s ) viral genes. based on sequence comparison and phylogenetic analysis, we conclude that the b/ isolate is a novel type bat orthoreovirus, and it might have originated from gene segment mixing during infection with more than one mrv strain in nature. the potential function of these genes may be important for understanding pathogenic mechanisms and should be studied further. mrvs were traditionally believed to be causative agents of mild respiratory and enteric diseases without significant clinical impact (steyer et al., ) . however, several recent studies have suggested that mrv can cause serious illness and even death in humans and other mammals, characterized by upper respiratory tract infection, diarrhea and encephalitis (tyler et al., ; ouattara et al., ) . indeed, the pathogenesis of reovirus infections has been most extensively studied using both suckling and adult mice, and infections lead to systemic viral replication, morbidity, and mortality (dermody et al., ; doyle et al., ; organ and rubin, ) . this study aimed to use balb/c mice to study mrv b/ pathogenesis. balb/c mice were infected by intranasal (i.n.), intracranial (i.c.), intraperitoneal (i.p.) or intragastric (i.g.) inoculation with different doses of b/ virus. we found that mice are susceptible to mrv b/ infection by intranasal inoculation. the highest dose of b/ virus ( pfu) induced signs of disease on day after infection and resulted in the death of of mice starting at days post-infection. we observed approximately % mortality in mice that underwent i.n. inoculation with pfu of b/ virus. in a previous study, we showed that the mpc/ strain (isolated from masked palm civets) is pathogenic to four-week-old female balb/c mice with approximately % mortality and pronounced pathological changes in tissues of mice inoculated (i.n.) with pfu . 
in this study, infection with b/ virus caused obvious lesions in the lungs due to tissue damage and inflammation associated with alveolar thickening and lymphocytic infiltration, as well as accumulation of cellular debris and distended bronchioles and alveoli. our study shows that the b/ strain is pathogenic to balb/c mice. additional studies regarding viral replication, pathogenesis and host interactions are needed to better understand the pathogenesis of this virus. these findings parallel those found by others in rats (gauvin et al., ; morin et al., ) . to characterize viral replication, we analyzed viral titers in the lung, brain, liver, intestine and spleen after i.n. infection with b/ virus. viral loads in the lung were higher ( -fold) than those in other tissues, and this observation was associated with the ability of the virus to cause acute respiratory distress. these data are consistent with pathological changes in the lung. mice in the pfu inoculation group displayed severe acute respiratory symptoms and died starting at day dpi. our results indicate that b/ virus replicates to higher levels in the lung than in other organs, a finding that is in agreement with the induction of acute respiratory distress in infected mice. we also provide evidence that these novel mrv strains are pathogenic to mice, leading to lethal respiratory disease. considering b/ may have resulted from a reassortment of bat, mink, and/or human mrv strains, which can cause severe disease in humans and animals, it is necessary to identify pathogenicity in animal hosts. our data confirm that mice were infected with b/ virus via the respiratory route, causing a potentially fatal respiratory infection. further work is required to understand the full zoonotic potential and pathogenesis of b/ virus. supplementary data to this article can be found online at http://dx. doi.org/ . /j.meegid. . . . 
sequence characterization of ndelle virus genome segments , , , , and : evidence for reassignment to the genus orthoreovirus, family reoviridae
orthoreovirus, reoviridae
bats: important reservoir hosts of emerging viruses
a novel reovirus isolated from a patient with acute respiratory disease
a previously unknown reovirus of bat origin is associated with an acute respiratory disease in humans
identification and characterization of a new orthoreovirus from patients with acute respiratory infections
investigation of a potential zoonotic transmission of orthoreovirus associated with acute influenza-like illness in an adult patient
virological and molecular characterization of a mammalian orthoreovirus type strain isolated from a dog in italy
orthoreoviruses
diminished reovirus capsid stability alters disease pathogenesis and littermate transmission
xi river virus, a new bat reovirus isolated in southern china
a sas(®) macro implementation of a multiple comparison post hoc test for a kruskal-wallis analysis
respiratory infection of mice with mammalian reoviruses causes systemic infection with age and strain dependent pneumonia and encephalitis
isolation and characterization of a bat sars-like coronavirus that uses the ace receptor
isolation of hendra virus from pteropid bats: a natural reservoir of hendra virus
characterization of a novel orthoreovirus isolated from fruit bat
close relative of human middle east respiratory syndrome coronavirus in bat
isolation and characterization of three mammalian orthoreoviruses from european bats
identification of mammalian orthoreovirus type in italian bats
isolation and pathogenicity of the mammalian orthoreovirus mpc/ from masked civet cats
cedar virus: a novel henipavirus isolated from australian bats
bats and zoonotic viruses: can we confidently link bats with emerging deadly viruses?
reovirus infection in rat lungs as a model to study the pathogenesis of viral pneumonia
pathogenesis of reovirus gastrointestinal and hepatobiliary disease
bat flight and zoonotic viruses
novel human reovirus isolated from children with acute necrotizing encephalopathy
pulau virus; a new member of the nelson bay orthoreovirus species isolated from fruit bats in malaysia
high similarity of novel orthoreovirus detected in a child hospitalized with acute gastroenteritis to mammalian orthoreoviruses found in bats in europe
broome virus, a new fusogenic orthoreovirus species isolated from an australian fruit bat
isolation and molecular characterization of a novel type reovirus from a child with meningitis
isolation and identification of a natural reassortant mammalian orthoreovirus from least horseshoe bat in china
bats as a continuing source of emerging infections in humans
isolation and identification of bat viruses closely related to human, porcine, and mink orthoreoviruses

this work was supported by funds from the national natural science foundation of china ( ), the state key laboratory of veterinary biotechnology (sklvbp ), and the basic scientific research operation cost of state-leveled public welfare scientific research courtyard ( ).

key: cord- -tjjkz y authors: wille, michelle; lindqvist, kristine; muradrasoli, shaman; olsen, björn; järhult, josef d. title: urbanization and the dynamics of rna viruses in mallards (anas platyrhynchos) date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: tjjkz y
urbanization is intensifying worldwide, and affects the epidemiology of infectious diseases. however, the effect of urbanization on natural host-pathogen systems remains poorly understood. urban ducks occupy an interesting niche in that they directly interact with both humans and wild migratory birds, and either directly or indirectly with food production birds.
here we have collected samples from mallards (anas platyrhynchos) residing in a pond in central uppsala, sweden, from january to january . this artificial pond is kept ice-free during the winter months, and is a popular location where the ducks are fed, resulting in a resident population of ducks year-round. nine hundred and seventy seven ( ) fecal samples were screened for rna viruses including: influenza a virus (iav), avian paramyxovirus , avian coronavirus (cov), and avian astrovirus (astrov). this intra-annual dataset illustrates that these rna viruses exhibit similar annual patterns to iav, suggesting similar ecological factors are at play. furthermore, in comparison to wild ducks, autumnal prevalence of iav and cov are lower in this urban population. we also demonstrate that astrov might be a larger burden to urban ducks than iav, and should be better assessed to demonstrate the degree to which wild birds contribute to the epidemiology of these viruses. the presence of economically relevant viruses in urban mallards highlights the importance of elucidating the ecology of wildlife pathogens in urban environments, which will become increasingly important for managing disease risks to wildlife, food production animals, and humans. urbanization is intensifying worldwide; most humans live in urbanized areas, and the urban human population is expected to continue to grow (united nations population fund, ) . within the global growth of cities, urbanization increasingly shapes the emergence and trajectory of infectious disease, both human disease and disease and parasitism in wild animals (alirol et al., ; neiderud, ) . 
in association with urbanization, factors affecting pathogen (and parasite) transmission in wild animals include an increase in aggregation and resource availability resulting in increased contact rates, decrease in biodiversity, modulation in host immunity and stress levels (becker and hall, ; becker et al., ; bradley and altizer, ; delgado and french, ; patz et al., ; penczykowski et al., ) . furthermore, in cities, increased contact among humans, domestic animals and wild animals may facilitate cross species spillover of (vertebrate) pathogens, with consequences for wildlife conservation, agriculture, and human health (becker et al., ; bradley and altizer, ; delgado and french, ; patz et al., ) . influenza a virus (iav) is a multi-host virus, wherein spillover between birds, humans and agricultural animals does occur, and dabbling ducks, such as those found in city parks, constitute the main reservoir host for these viruses (olsen et al., ; webster et al., ) . indeed, highly urbanized areas may contain canals and large city parks with ponds housing a wide variety of wild and semi-domestic birds. rna viruses such as iav have a low pathogenicity phenotype in their natural hosts (olsen et al., ) , but have large negative socioeconomic consequences when they spillover into food production animals and humans (alexander and brown, ; fao, fao, , . for example, the most recent remerging highly pathogenic iav h n , which was transported globally by waterfowl, resulted in the culling of hundreds of thousands of chickens and turkeys, and is a risk to human health given the reassortment potential (european food safety authority, ; lee et al., ; pasick et al., ; verhagen et al., ; wu et al., ) . dabbling ducks are a host for a number of rna viruses, including avian coronavirus [cov] , avian paramyxovirus type [apmv- ], and emerging evidence suggests they may also be hosts for an array of avian astroviruses [astrov] (e.g. 
chu et al., ; ramey et al., ; tolf et al., b; wille et al., ; wille et al., b). these viruses do not cause signs of disease in their wildlife hosts, but have closely related forms causing morbidity and mortality in poultry, such as infectious bronchitis (cov) (e.g. domanska-blicharz et al., ; jackwood et al., ; zhuang et al., ), newcastle disease (apmv- ) (e.g. alexander, ; jindal et al., ; ramey et al., ; snoeck et al., ; tolf et al., b), duck hepatitis (astrov) or avian nephritis (astrov) (e.g. chu et al., ; pantin-jackwood et al., ). these viruses have been assessed, to various degrees, in wild migrating waterfowl. in sweden, and globally, the ecology of iav is well described in wild waterfowl, where up to % of mallards (anas platyrhynchos) are infected during the autumn migration (latorre-margalef et al., ; olsen et al., ). recent studies have been instrumental in starting to describe the dynamics and ecology of apmv- and cov in wild birds; - % of migrating mallards have cov infections, compared to a lower prevalence ( %) of apmv- towards the end of the migratory season in sweden (tolf et al., b; wille et al., ). most apmv- detections come from iav studies in which agglutinating agents recovered after culture turn out not to be iav (e.g. jindal et al., ; ramey et al., ), so few true prevalence estimates exist. beyond these viruses, we have a limited understanding of the virodiversity in waterfowl; astroviruses, for example, have only recently been assessed in wild birds, and the results of a single study suggest that waterfowl may be important in the epidemiology of these viruses (chu et al., ). given that waterfowl are hosts for both multi-host viruses and viruses that cause morbidity and mortality in food production birds, combined with the increased contact between waterfowl and humans in urban areas, the dynamics of these viruses in urban bird populations should be explored.
in this study, we followed the dynamics of rna viruses at a pond utilized year round by mallards, located in the centre of sweden's fourth largest city. this pond is on the same migratory route as wild mallards assessed for these viruses in southern sweden, allowing a comparison between urban and wild ducks on a limited spatial scale (latorre-margalef et al., ; tolf et al., a; tolf et al., b; wille et al., ; wille et al., b). thus, this intra-annual dataset allows us to add to the natural history of iav, cov, apmv- , and the rarely assessed astrov. furthermore, we aim to elucidate whether less frequently studied rna viruses follow intra-annual cycles similar to that of the intensively studied iav. in the context of iav, and to a lesser degree cov and apmv- , an assessment of virus prevalence and diversity in an urban population will further allow us to assess whether dynamics in wild birds are reflected in an urban setting.
an urban population of mallards residing in the artificial pond "svandammen" in the centre of the city of uppsala, sweden ( ° ′ ″n, ° ′ ″e) was sampled. the pond is kept ice-free during the winter months, and is a popular location where the ducks are fed, resulting in a resident population of ducks year-round. this pond has a largely constant population size between and individuals through the autumn and winter, with fewer birds occupying the pond during breeding in the summer months (fig. a. ). the low population count in may is likely the result of unfavorable conditions on the day of the count and sampling. slightly higher population counts in the winter, when most of the city ponds are frozen, likely represent the congregation of birds from ponds across uppsala to utilize this ice-free habitat (fig. a. ).
two sampling strategies were employed: following capture, freshly deposited feces were collected from a single-use cardboard box, or, due to difficulties in capturing birds, freshly deposited feces were collected from the ground around the perimeter of the pond. samples were collected with a sterile tipped applicator, and were placed in virus transport media (vtm) and stored at − °c within - h of collection. ethical approval for trapping and sampling was obtained from the uppsala animal ethical committee (reference number c / ), a permit to capture birds was obtained from the city of uppsala, and a permit to ring birds from the swedish museum of natural history. viral rna was extracted from pooled vtm samples, containing samples per pool, with the magnatrix extraction robot (magnetic biosolutions, sweden) and vet viral na kit (nordiag asa, oslo, norway). the rna extraction was performed by the molecular diagnostics department at the swedish national veterinary institute. samples from positive pools were re-extracted individually using the maxwell instrument and viral total nucleic acid purification kit (promega, madison, usa). following extraction, samples were assayed by real time reverse transcriptase pcr (rrt-pcr) for iav, cov, and apmv- using previously published methods. briefly, iav was screened using a rrt-pcr assay targeting a short region of the matrix gene (spackman et al., ), and cov using a pan-coronavirus rrt-pcr assay targeting the rna-dependent rna polymerase (rdrp) gene (muradrasoli et al., ), using the iscript one step rt-pcr kit (biorad, hercules, usa). a rrt-pcr targeting the matrix (m) gene (tolf et al., b; wise et al., ) with the one step rt-pcr kit (qiagen, hilden, germany) was employed to screen for apmv- . a cycle threshold (ct) cutoff of was used for all screens.
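The pooled screening workflow above (test pools first, then re-test the members of positive pools individually against a Ct cutoff) can be sketched as follows. This is an illustrative Python outline, not the laboratory protocol: the function names, the pool size, and the cutoff value are hypothetical, since the actual pool size and Ct cutoff are not given in this text, and the sketch assumes that a Ct value below the cutoff counts as positive.

```python
from typing import Callable, List, Optional, Sequence

Ct = Optional[float]  # None stands for "no amplification detected"


def is_positive(ct: Ct, ct_cutoff: float) -> bool:
    """Assumed rule: a sample is positive when its Ct is below the cutoff."""
    return ct is not None and ct < ct_cutoff


def two_stage_screen(
    sample_ids: Sequence[str],
    pool_size: int,
    ct_cutoff: float,
    assay_pool: Callable[[Sequence[str]], Ct],
    assay_sample: Callable[[str], Ct],
) -> List[str]:
    """Screen pools, then re-test each member of every positive pool."""
    positives: List[str] = []
    for start in range(0, len(sample_ids), pool_size):
        pool = sample_ids[start:start + pool_size]
        if is_positive(assay_pool(pool), ct_cutoff):
            positives.extend(
                s for s in pool if is_positive(assay_sample(s), ct_cutoff)
            )
    return positives


# Hypothetical Ct values standing in for rRT-PCR results (not study data).
HYPOTHETICAL_CT_CUTOFF = 40.0  # placeholder; the study's cutoff is not given here
HYPOTHETICAL_POOL_SIZE = 3     # placeholder; the study's pool size is not given here
fake_ct = {"s1": 30.0, "s2": None, "s3": None, "s4": None, "s5": 38.0, "s6": 45.0}


def fake_pool_assay(pool: Sequence[str]) -> Ct:
    # Idealized pool: reports the strongest (lowest) Ct among its members.
    values = [fake_ct[s] for s in pool if fake_ct[s] is not None]
    return min(values) if values else None


hits = two_stage_screen(
    list(fake_ct), HYPOTHETICAL_POOL_SIZE, HYPOTHETICAL_CT_CUTOFF,
    fake_pool_assay, fake_ct.get,
)
print(hits)
```

Pooling reduces the number of assays when prevalence is low, at the cost of some sensitivity because each sample is diluted within its pool; the idealized pool assay in the sketch ignores that dilution.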
to screen for astrov, cdna was synthesized using superscript iii (invitrogen) and random hexamers (invitrogen) followed by a nested pcr targeting the rdrp (chu et al., ; chu et al., ) using taq polymerase (qiagen). samples positive for iav were propagated in - day old embryonated chicken eggs. eggs were inoculated via the allantoic route, and allantoic fluid was harvested two days following inoculation. the fluid was assayed for the presence of iav using a haemagglutination assay. rna was extracted from positive samples as previously described. egg isolation and extractions from allantoic fluid were performed by the molecular diagnostics department at the swedish national veterinary institute. full length ha, na, and m sequences were generated as described in wille et al. ( ) , and two samples were additionally deep sequenced in-house at the swedish national veterinary institute (virus /h n and /h n ). a fragment of the cov rdrp was sequenced as described in wille et al. ( b) . the rdrp fragment generated during screening of astrov was used and subsequently cloned with pgem-t easy vector system (promega). all pcr products were purified by the wizard clean-up system (promega) and all sequencing was completed at macrogen (the netherlands). in the case of astroviruses, - clones of each sample were sequenced. resulting sequences were aligned using the mafft algorithm (katoh et al., ) within geneious (biomatters, new zealand). phylogenetic models were determined in mega (tamura et al., ) , and maximum likelihood trees were built using phyml (guindon and gascuel, ) implemented in seaview (gouy et al., ) and bootstrapped , times. reference sequences for phylogenetic analysis comprised of the top blast hits for each sequence generated in this study, as well as similar sequences from sweden. outgroup sequences were added to root all trees. all sequences generated in this study have been deposited in genbank under the accession numbers ky - . 
seasonal prevalence for each virus was estimated using generalized additive models (gams) with binomial errors including a spline function of month using the mgcv package in r (r development core team, ). the best order polynomial was evaluated through akaike information criterion (aic), and given similar aic values the least complex model was selected (table a. ). prevalence estimates of iav, cov and apmv- from this study were compared to those from wille et al. ( ), wherein prevalence for these viruses was estimated in wild migratory mallards across the autumn season. we compared data from sept-dec, which represents large sample sizes in both studies. prevalence data were compared with fisher exact tests for the four rna viruses for each month. p-values < . were taken to indicate a significant difference in the compared proportions.
over the course of months, samples were collected from mallards. most of the samples collected were freshly deposited feces from the ground (n = ), though samples were fecal samples collected from captured birds. during the autumn months samples were collected each month, with smaller sample sizes in the spring and winter (fig. a. ). prevalence of iav, apmv- and cov was low, with an overall prevalence of %, %, and . % respectively, across the intra-annual sampling regime. as expected, prevalence for iav peaked during the autumn months, with only a single positive fecal sample collected from the ground outside this period, in april. seasonal prevalence of other rna viruses mirrored patterns of iav, with a prevalence peak in the autumn through to the early winter, as well as a detection in april; cov in particular had a very similar prevalence curve to iav in both the temporal trend and amplitude. prevalence of apmv- was low, even in the autumn, with a single detection in each of august, november and december. interestingly, both iav and cov were detected in april, and this did not represent a co-infected sample (fig. ).
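The Fisher exact comparison of prevalence between the urban and wild datasets, described in the statistical methods, can be sketched as below. This is a minimal standard-library Python illustration of the two-sided test on a 2×2 table, which sums the probabilities of all tables no more likely than the observed one; the counts are hypothetical placeholders, and the study's own analysis was presumably carried out in R, which the methods mention only for the GAMs.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are populations (e.g. urban, wild); columns are (positive, negative).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x positives in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed table;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical monthly counts (positive, negative); not data from this study.
urban_positive, urban_negative = 4, 96
wild_positive, wild_negative = 20, 80
p = fisher_exact_two_sided(
    urban_positive, urban_negative, wild_positive, wild_negative
)
print(f"two-sided p = {p:.4g}")
```

An exact test suits this comparison because several monthly cells contain very few positive samples, where chi-square approximations become unreliable.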
in comparing prevalence [sept-dec] between our urban dataset and a wild bird dataset from southern sweden using the same qpcr methods (wille et al., ), autumnal prevalence for iav (p < . ) and cov (p < . ) is significantly different, where prevalence for both these viruses is lower in urban mallards (fig. ). a more detailed comparison suggests that this effect is strongest in oct/nov (p < . ) for iav and sept/oct (p = . , p = . ) for cov. total autumnal prevalence and monthly comparisons are not significantly different between these datasets for apmv- , but this is driven by sample size constraints (fig. ). the timing of prevalence peaks does vary across years; that is, the prevalence peak may occur in a different month in different years. however, the difference in prevalence between urban and wild ducks does not appear to be driven by a temporal mismatch in prevalence peaks. rather, the overall amplitude of the prevalence curves across the entire autumn for wild ducks and urban ducks are different, where the urban ducks consistently had lower prevalence for iav and cov (fig. ). overall, diversity was detected for both iav and cov, despite the lower prevalence compared to the wild migratory bird system. despite few iav detections, five subtype combinations were detected: h n , h n , h n , h n and h n . furthermore, h and h , representing group ha, were detected earlier in the season (september), followed by h and h , group ha viruses, in october and december, respectively (table a. ). genetically, the ha segment of h and h were similar to viruses previously detected in europe, including sweden (fig. a. -a. ). specifically, the ha segment of h falls into a mixed clade containing viruses from europe, asia and north america (fig. a. ). the ha of h is similar to sequences from sweden, the netherlands, as well as egypt, and the republic of georgia (fig. a. ). the ha segment of both h and h fall into clades dominated by asian viruses (fig. , fig. a. -a. ).
the h n virus is further interesting as the na segment and the m segment are also most similar to asian viruses (fig. ) ; as compared to the n , n and n sequences and the m segment of the other viruses (table a . , fig. a. ). all n sequences were identical, despite being detected in two different months and with different ha types (h n in september and h n in october) ( table a. ). finally, the m segment of all viruses except h n were highly similar (fig. a. ) . similarly, a diversity of cov rdrp fragments was present in this population (fig. ) . all viruses were identified as gammacoronaviruses, and fell into the clade dominated by wild bird viruses and those recovered from domestic ducks. some sequences generated here were very similar to those from mallards migrating through southern sweden in (virus , ). but, virus and were most similar to sequences from waterfowl coronaviruses isolated in hong kong. most interestingly, virus and were identical, despite being isolated months apart ( april and october , respectively), suggesting rdrp sequences falling into this clade were present in sweden despite not being previously detected (fig. ) . we were unable to sequence apmv- given we were unable to culture these viruses and the original material had high ct values. against expectation, the highest rna virus prevalence in this study was that of astrov (fig. d) . furthermore, we detected astrov in / months, making it more pervasive than iav in this population. however, prevalence did follow the general seasonal trend where most detections occurred in the autumn migration period (fig. d) . additionally, / co-infected samples in the entire dataset were co-infected with astrov (astrov:flu = , astrov:cov = , cov:apmv- = ). as with the other viruses, there was a diversity of astrov in the population (fig. ) . indeed, we identified viruses in all three branches of the avian astrov tree. 
the virus detected in group , that is, viruses similar to avian nephritis virus (virus ), was the outgroup to this clade, suggesting undiscovered diversity, potentially in the wild bird reservoir. we similarly found an outgroup virus to group viruses (virus ), which are wild bird astroviruses detected thus far only in waterfowl. two other viruses also fell into group . most of the viruses sequenced were group viruses, both group . and . , and most viruses were similar to duck hepatitis viruses (eu duck hepatitis virus and eu duck hepatitis virus ). viruses were also similar to turkey astrovirus (virus ; e.g. dq ) and chicken astrovirus (virus ; e.g. eu ). our preliminary findings suggest that group . was more common early in the year (february, april, august) and group . was more common towards the end of the study (september, november, december, january) (fig. b); however, a larger dataset is needed to confirm this putative trend.
urbanization is intensifying worldwide, directly affecting interactions between humans and wild animals, in particular wild animals utilizing urban environments. in this study we aimed to characterize the dynamics of four avian rna viruses in wild birds utilizing an urban environment. these viruses, while causing no apparent clinical signs in wild waterfowl, are closely related to or have pathogenic variants, which may cause significant morbidity and mortality in poultry. in wild birds, especially waterfowl, iav has been intensively assessed, and we found that in an urban environment, annual dynamics of iav were similar to the global consensus: very low prevalence in the spring and summer, with a higher prevalence in autumn and early winter when birds are migrating (olsen et al., ). in this study prevalence is lower in urban ducks than in wild migrating ducks, here significantly lower than in wild ducks utilizing a stop-over site in southern sweden (latorre-margalef et al., ; wille et al., ).
there is some evidence that prevalence of iav might be lower in urban and sentinel populations, but this still needs more expansive assessment. verhagen et al. ( ) demonstrated that iav prevalence is inversely correlated with urbanization, and urban mallard prevalence is only . %, which corroborates our findings. however, the bird species composition and temporal sampling between urban and rural areas were mismatched (verhagen et al., ) . in another study, in eastern canada, prevalence of largely non-migratory urban ducks in the city of st. john's, newfoundland, was . %, with higher prevalence only reported when samples sizes were small (huang et al., a) . one hypothesis for low viral prevalence in urban areas is host population structure and migratory propensity. concentrated resources presented in urban environments influence host migration and among/between species contact rates (altizer et al., ; bradley and altizer, ) . specifically, a larger proportion of urban ducks are non-migratory, although local movements do occur, particularly during breeding. due to a more sedentary lifestyle, following the initial input of susceptible ducklings after breeding, there is limited immigration, representing input of susceptible individuals across the autumn. in contrast, at a migratory stopover location such as ottenby, there is continual input of new individuals across the season representing both susceptible and infected birds (latorre-margalef et al., ) . the continual immigration creates a constant pool of susceptible birds and input of diverse ha subtypes. emigration allows for the removal of recovered birds from the system, allowing for higher viral prevalence across the autumn migration (altizer et al., ; avril et al., ) . lack of migration is also a feature of sentinel ducks, and prevalence and viral diversity was low in sentinel ducks being assessed on lake constance (globig et al., ; globig et al., ) and adult sentinel ducks in sweden (tolf et al., a ). 
fig. . comparisons of autumnal prevalence for (a) iav, (b) cov and (c) apmv- between urban ducks (this study) and wild migratory ducks (wille et al., ). mean and % confidence intervals shown for prevalence, and asterisks indicate a significant difference. "total" here is the combined number of viruses detected/total samples screened for the months of september, october, november and december.

an interesting parallel is iav dynamics in africa, where iav seasonality is muted and prevalence is very low. the putative driver is different life history strategies of waterfowl, wherein the classical patterns of waterfowl aggregation and migration in temperate regions are less pronounced, with only the subset of palearctic breeding waterfowl exhibiting long distance migration; afro-tropical waterfowl are resident or partial migrants likely due to more abundant resources. furthermore, an increase in iav prevalence was correlated to the influx of palearctic migrants (gaidet et al., ; gaidet et al., ) . in terms of population structure, urban ponds in uppsala are utilized by a single dabbling duck species, mallards; other dabbling duck and waterfowl species are absent, limiting the breadth and size of the host reservoir. finally, urban mallards have access to more resources than their wild conspecifics due to supplemental feeding, which in turn may allow these individuals to mount a more efficient antiviral response (chandra, ; hall et al., ) , at both the innate (antiviral genes more highly upregulated) (barber et al., ; vanderven et al., ) and acquired level (length of antibody life) (magor, ) . unfortunately, there are few studies assessing the antiviral response to iav, and those that do exist are largely focussed on the response to highly pathogenic iav (barber et al., ; huang et al., b; vanderven et al., ) . 
interestingly, despite the potential for an improved immune response, some studies suggest that an increase of resource may increase transmission potential and pathogen transmission (becker and hall, ; penczykowski et al., ) . despite empirical studies suggesting lower prevalence of iav in urban systems, however limited, theoretical studies imply that pathogen prevalence should be higher in these conditions (e.g. hall et al., ) . these theoretical studies have been verified by empirical work. for example, in monarch butterflies (danaus plexippus) that have lost migratory behaviour there is an increase in infection risk of a protozoan parasite (satterfield et al., ) . the reason for this conflict is unknown, however, one hypothesis is that these studies utilize a chronic disease model, whereas influenza is an acute infectious disease, and dynamics are driven largely by the herd immunity of the population (latorre-margalef et al., ; van dijk et al., ; wille et al., a) . there are a number of factors which may be important drivers in dynamics of diseases in urban environments, including the relationship between provisioning, stress, pollution and immune response which affect susceptibility and ability to fight infection, however these are challenging to disentangle (becker et al., ; bradley and altizer, ; delgado and french, ; patz et al., ) , and these factors in relation to rna virus dynamics need to be assessed. while iav has been intensively assessed, we are only starting to explore dynamics of other avian rna viruses. indeed, this is the first intraannual dataset exploring dynamics of cov, apmv- and astrov and, furthermore, the first comparison between wild and urban settings for cov and apmv- . it is also the second study assessing astrov in wild birds (chu et al., ) . given the economic implications of these viruses, our limited understanding of the dynamics and ecology of these viruses in wild birds is disquieting. 
perhaps unsurprising, overall annual trends in prevalence were similar for all viruses, and shared ecological drivers, such as those identified for iav (van dijk et al., ) , are the most parsimonious explanation for the shared patterns in long term dynamics of these viruses. that is, increased prevalence due to input of immunologically naïve birds into the system after breeding and aggregating of birds for autumn migration, and decreased prevalence in the winter following an increase in herd immunity (latorre-margalef et al., ; olsen et al., ; van dijk et al., ) . this dataset provides further evidence of the importance of waterfowl, urban or wild, in the epidemiology of cov and apmv- . it is only within the last years that cov have been assessed in wild birds and these viruses have largely been assessed using single time point studies, across a range of species, and using an array of different screening methods (e.g. chu et al., ; muradrasoli et al., ; wille et al., b) . given the long history of apmv- research (alexander et al., ) , and "accidental" isolation of this virus in iav studies (e.g. jindal et al., ; ramey et al., ) , it is known that these viruses are present in waterfowl, however accurate prevalence estimates are still rare (tolf et al., b; wille et al., ) . not all economically relevant avian rna viruses have been assessed in wild birds, and astroviruses are such an example. most strikingly, in this population, astrov might be more pervasive than iav, which has long been thought to be one of the most important rna viruses in wild waterfowl. not only was prevalence of astrov higher than iav, viruses were detected over a longer temporal interval. these viruses are particularly interesting due to the importance in poultry including chickens, turkeys and ducks, but the overall lack of assessment in wild birds leads to limited understanding of the epidemiology and ecology beyond food production birds. furthermore, this study corroborates chu et al. 
( ) in that wild birds appear to contribute to the epidemiology of chicken or turkey "adapted" astrovirus strains. not only was the overall prevalence trend conserved across all viruses, the prevalence difference between wild and urban birds was also conserved for cov. this relationship between cov and iav, illustrated here by similar trends in urbanization, could be due to a mutualistic relationship; that is, prevalence of cov in waterfowl has been shown to be higher given infection with iav in wild migrating mallards (wille et al., ) . prevalence for apmv- was not significantly different between this urban population and a wild migratory population, however this could be driven by sample size constraints; that is, for a disease with a prevalence of < %, a much larger sample size is required to adequately assess prevalence with confidence (hoye et al., ) . furthermore, given the scarcity of prevalence studies in wild birds and diversity of methods used, it is not certain if this trend in apmv- is due to methodological constraints or whether there are different drivers of apmv- ecology. indeed, there might be an inverse relationship between apmv- and iav prevalence, where apmv- prevalence increases when iav prevalence decreases in wild mallards (tolf et al., b; wille et al., ) . despite the factors associated with urbanization, this overall seasonal trend for these viruses, likely driven by shared ecological factors, remains clear. this study highlights our limited understanding of rna virus dynamics in birds in general, and more specifically, viruses in mallards. mallards are one of the most common avian species on the planet, owing to their ability to adapt to environments disturbed by human activities, and are a common sight in many cities (cramp and simmons, ; drilling et al., ) . 
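the point about sample size at low prevalence can be made concrete with two textbook formulas. the sketch below is illustrative only: the function names, the 95% confidence level, the normal-approximation interval and the example prevalences are all assumptions introduced here, not values or methods taken from the study or from hoye et al.

```python
import math


def sample_size_to_detect(prevalence, confidence=0.95):
    """Smallest n giving `confidence` probability of at least one positive.

    Assumes independent samples and a perfect test (both simplifications).
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - prevalence))


def sample_size_for_margin(prevalence, margin, z=1.96):
    """Smallest n so the normal-approximation CI half-width is <= margin."""
    if not 0 < prevalence < 1 or margin <= 0:
        raise ValueError("need 0 < prevalence < 1 and margin > 0")
    return math.ceil(z ** 2 * prevalence * (1 - prevalence) / margin ** 2)


if __name__ == "__main__":
    # Hypothetical prevalences, for illustration only (not from the study).
    for p in (0.10, 0.05, 0.01):
        n_detect = sample_size_to_detect(p)
        n_margin = sample_size_for_margin(p, margin=p / 2)
        print(f"prevalence={p:.2f} detect>={n_detect} estimate>={n_margin}")
```

under these assumptions the required number of samples grows quickly as prevalence falls, which is consistent with the caution that a non-significant difference at very low prevalence may reflect limited sampling rather than a true absence of effect.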
mallards and other dabbling ducks are the natural reservoir for iav, are known to harbour high prevalence of iav in the wild, and may be implicated in the spread of highly pathogenic iav (latorre-margalef et al., ; van dijk et al., ; verhagen et al., ) . indeed, the h n iav isolated in this study was more similar to viruses isolated in asia than europe suggesting long distance dispersal prior to circulation in this urban duck pond. this study was undertaken in , prior to the influx of highly pathogenic h n which were carried by apparently healthy birds (verhagen et al., ) . given these urban mallards harbour "asian" iav there is certainly concern for zoonotic spillover. furthermore, it is not a stretch to imagine that mallards may also be reservoirs and important in the spread and dynamics of other economically relevant rna viruses such as cov, apmv- , and astrov. of all pathogens, rna viruses are the most likely to be zoonotic (woolhouse and gowtage-sequeria, ) , and it is in environments where humans are in close proximity to a high density of birds that zoonotic spillover is most likely to occur. for example, live-bird markets are central in the transmission of avian influenza viruses from birds to humans (wan et al., ) . the role of urban ducks, given low virus prevalence, is uncertain, however, to better understand the zoonotic risk a better understanding of the rna virus diversity and wildlife-pathogen dynamics in urban landscapes is crucial. 
newcastle disease in the european union the long view: a selective review of years of newcastle disease research history of highly pathogenic avian influenza urbanisation and infectious diseases in a globalised world animal migration and infectious disease risk capturing individual-level parameters of influenza a virus dynamics in wild ducks using multistate models association of rig-i with innate immunity of ducks to influenza too much of a good thing: resource provisioning alters infectious disease dynamics in wildlife linking anthropogenic resources to wildlife-pathogen dynamics: a review and meta-analysis urbanization and the ecology of wildlife diseases nutrition and immunology: from the clinic to cellular biology and back again avian coronavirus in wild aquatic birds a novel group of avian astroviruses in wild aquatic birds novel astroviruses in insectivorous bats mallard (anas platyrhynchos) parasite-bird interactions in urban areas: current evidence and emerging questions detection and molecular characterization of infectious bronchitis-like viruses in wild bird populations birds of north america online highly pathogenic avian influenza a subtype h n economic and social impacts of avian influenza, fao emergency centre for transboundary animal diseases operations h n highly pathogenic avian influenza global review. 
(empres/glew report understanding the ecological drivers of avian influenza virus infection in wildfowl: a continental-scale study across africa avian influenza viruses in water birds ducks are sentinels for avian influenza in wild birds consecutive natural influenza a virus infections in sentinel mallards in the evident absence of subtype-specific hemagglutination inhibiting antibodies a simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood greater migratory propensity in hosts lowers pathogen transmission and impacts quality matters: resource quality for hosts and the timing of epidemics surveillance of wild birds for avian influenza virus a -year study of avian influenza virus prevalence and subtype diversity in ducks of newfoundland the duck genome and transcriptome provide insight into an avian influenza virus reservoir species molecular evolution and emergence of avian gammacoronaviruses phylogenetic analysis of newcastle disease viruses isolated from waterfowl in the upper midwest region of the united states multiple alignment of dna sequences with mafft long-term variation in influenza a virus prevalence and subtype diversity in a migratory mallards in northern europe novel reassortant influenza a(h n ) viruses, south korea immunoglobulin genetics and antibody responses to influenza in ducks prevalence and phylogeny of coronaviruses in wild birds from the bering strait area (beringia) broadly targeted multiprobe qpcr for detection of coronaviruses: coronavirus is common among mallard ducks how urbanization affects the epidemiology of emerging infectious diseases global patterns of influenza a virus in wild birds molecular characterization of avian astroviruses reassortant highly pathogenic influenza a h n virus containing gene segments related to eurasian h n in british columbia unhealthy landscapes: policy recommendations on land use change and infectious disease emergence poor resource quality lowers transmission 
potential by changing foraging behaviour r: a language and environment for statistical computing. r foundation for statistical computing genetic diversity and mutation of avian paramyxovirus serotype (newcastle disease virus) in wild birds and evidence for intercontinental spread loss of migratory behaviour increases infection risk for a butterfly host genetic diversity of newcastle disease virus in wild birds and pigeons in west africa development of a real-time reverse transcriptase pcr assay for type a influenza virus and the avian h and h hemagglutinin subtypes mega : molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods individual variation in influenza a virus infection histories and long-term immune responses in mallards prevalence of avian paramyxovirus type in mallards during autumn migration in the western baltic sea region unleashing the potential of urban growth minor differences in body condition and immune status between avian influenza virus-infected and noninfected mallards: a sign of coevolution? 
juveniles and migrants as drivers for seasonal epizootics of avian influenza virus avian influenza rapidly induces antiviral genes in duck lung and intestine how a virus travels the world avian influenza a virus in wild birds in highly urbanized areas indications that live poultry markets are a major source of human h n influenza virus infection in china evolution and ecology of influenza a viruses temporal dynamics, diversity, and interplay in three components of the viriodiversity of a mallard population: influenza a virus, avian paramyxovirus and avian coronavirus no evidence for homosubtypic immunity to influenza h in mallards following vaccination in a natural experimental system high prevalence and putative lineage maintenance of avian coronaviruses in scandinavian waterfowl frequency and patterns of reassortment in natural influenza a virus infection in a reservoir host rna-dependent rna polymerase gene analysis of worldwide newcastle disease virus isolates representing different virulence types and their phylogenetic relationship with other members of the paramyxoviridae host range and emerging and reemerging pathogens novel reassortant influenza a(h n ) viruses in domestic ducks, eastern china genomic analysis and surveillance of the coronavirus dominant in ducks in china we acknowledge the contribution of jon hessman and carolina stigwall to sample collection. positive controls were kindly provided by siamak zohari (sva) and camille lebarbenchon (inserm, reunion).ethical approval for trapping and sampling was obtained from the uppsala animal ethical committee (reference number c / ), and a permit was obtained from the city of uppsala to capture, and swedish museum of natural history to ring birds. this work was supported by the swedish research council (grant number - ) vr and the swedish research council formas (grant number - - ). supplementary data to this article can be found online at http://dx. doi.org/ . /j.meegid. . . . 
key: cord- -gk n slx authors: yadav, pragya; sarkale, prasad; patil, deepak; shete, anita; kokate, prasad; kumar, vimal; jain, rajlaxmi; jadhav, santosh; basu, atanu; pawar, shailesh; sudeep, anakkathil; gokhale, mangesh; lakra, rajen; mourya, devendra title: isolation of tioman virus from pteropus giganteus bat in north-east region of india date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: gk n slx bat-borne viral diseases are a major public health concern among newly emerging infectious diseases which includes severe acute respiratory syndrome, nipah, marburg and ebola virus disease. during the survey for nipah virus among bats at north-east region of india; tioman virus (tiov), a new member of the paramyxoviridae family was isolated from tissues of pteropus giganteus bats for the first time in india. this isolate was identified and confirmed by rt-pcr, sequence analysis and electron microscopy. a range of vertebrate cell lines were shown to be susceptible to tioman virus. negative electron microscopy study revealed the “herringbone” morphology of the nucleocapsid filaments and enveloped particles with distinct envelope projections a characteristic of the paramyxoviridae family. sequence analysis of nucleocapsid gene of tiov demonstrated sequence identity of . % and . % nucleotide and amino acid respectively with of tiov strain isolated in malaysia, . this report demonstrates the first isolation of tioman virus from a region where nipah virus activity has been noticed in the past and recent years. bat-borne viruses have become serious concern world-wide. a survey of bats for novel viruses in this region would help in recognizing emerging viruses and combating diseases caused by them. bat-borne viruses are considered to be important emerging viruses, as they can pose a serious threat to human and animal health. henipaviruses, coronaviruses, filoviruses and rabies-causing lyssaviruses are all transmissible from bats to humans. 
bats are primary reservoir host and often the resulting human disease is fatal. they are known to harbor more zoonotic viruses per species than rodents and recognized as a significant source of zoonotic agents (newman et al., ; calisher et al., ; mackenzie et al., ; pavri et al., ; mourya et al., ; raut et al., ; wynne and wang, ) . old world fruit bats of the family pteropodidae, particularly species belonging to the genus pteropus, have been considered as natural hosts for a large number of emerging viruses, especially of the family paramyxoviridae (calisher et al., ) . due to special characteristics, pteropus bats are the perfect reservoir for most of the recently emerging zoonotic pathogens. they often live in large colonies or roosts and travel long distances; thus they are very effective in transmitting viruses among colony members and disseminating them over a considerable distance. interactions among bats, humans and livestock are constantly increasing due to anthropogenic activities, thereby increasing the potential for transmission of viruses. deforestation in tropical areas destroyed the natural habitats of these fruit bat species thus forcing them to live in the vicinity of human settlements. the resulting close contact is responsible for the emergence of highly pathogenic paramyxoviruses, like hendra and nipah virus (niv) in human populations in southeast asia and australia (mackenzie et al., ) . paramyxoviridae is a family of viruses that comprises important pathogens like nipah virus, measles virus, human parainfluenza virus type and human respiratory syncytial virus (aguilar and lee, ) . while investigating niv in urine samples of giant fruit bats of the pteropus genus on tioman island, malaysia, in , researchers isolated a novel virus which was placed in the rubulavirus genus of the paramyxoviridae family. the virus was named as tioman virus (tiov) after the place of isolation from malaysia (chua et al., ) . 
in this communication, we report the isolation and confirmation of a tioman virus isolated from pteropus species of bats from north-east region of india. the scientific advisory committee, institutional biosafety committee, and institutional animal ethical committee of national institute of infection, genetics and evolution ( ) to determine the presence of niv in pteropus bats, a survey was conducted in two states of north-east region, india i.e. west bengal and assam states that share boundaries with bangladesh. the criteria for selection of the study areas were based on earlier reports of a niv seropositive bat from the myanaguri area in west bengal and confirmed human cases from siliguri and nadia districts of west bengal and roosting areas of pteropus bats (yadav et al., ; chadha et al., ) . sixty-eight pteropus bats were collected from jalpaiguri (n = ) and cooch behar (n = ) districts of west bengal and dhubri district (n = ) of assam on two occasions from march to may . mist nets were used to capture the bats. after capturing the bats, species identification and morphometry was done. further, the bats were euthanized and necropsies were performed in the field following proper biosafety measures. blood, organs (kidney, liver, and spleen), throat swabs, rectal swabs and urine samples were collected from bats. waste disposal was done following guidelines and proper precautionary measures. organ specimens were frozen in liquid nitrogen immediately after necropsy, while blood samples were kept at room temperature for min and centrifuged for min at approximately g. separated serum was aliquoted into labeled cryovials. vials of serum were transported at + °c in styrofoam box to national institute of virology (niv), pune, for further investigation. liver/spleen and kidney tissues of bats were homogenized in sterile minimum essential medium (mem; gibco) using a homogenizer (genogrinder ; bt&c inc., lebanon, nj, usa). 
further, tissue homogenates were centrifuged at rpm for min, and . ml of the supernatants was inoculated on to monolayers of vero ccl- cells grown in -well cell culture plates after removing the growth medium. the cells were incubated for h at °c to allow virus adsorption, with rocking every min for uniform virus distribution. after the incubation, the inoculum was removed and the cells were washed with × phosphate buffer saline (pbs). finally mem supplemented with % fetal bovine serum (fbs) was added to each well. the cultures were incubated further in % co incubator at °c and observed daily for cytopathic effects (cpe) under an inverted microscope. cultures that showed cpe were harvested and the suspension was centrifuged at rpm for min at °c; the supernatants were processed immediately or stored at − °c in ml aliquots. viral rna was extracted using tripure reagent and rna extraction kit (qiagen, valencia, ca, usa) as per the manufacturer's instruction. virus isolations were also attempted with other specimens of bats (throat and rectal swabs, urine), following the same protocol (chua et al., ) . cell culture supernatants and pellets of vero ccl- infected cells showing distinct cpe were examined by negative-stain transmission electron microscopy (tem) as described previously (brenner and horne, ; gangodkar et al., ) . to identify the virus isolate, various diagnostic tests were undertaken targeting niv and genus paramyxovirus using specific primers by rt-pcr (guillaume et al., ; tong et al., ) . further, the isolates were screened by rt-pcr using primers targeting the nucleocapsid gene and phosphoprotein gene of paramyxoviruses, as described earlier (chua et al., ) . amplified products were further sequenced targeting nucleocapsid and phosphoprotein gene. the sequences obtained by sequencing were curated using sequencher . (gene codes corporation, ann arbor, mi, usa) version software. 
the curated sequences were aligned using clustal w (embl-ebi, cambridgeshire, uk), and a phylogenetic tree was constructed using the neighbor-joining algorithm (kimura parameter model) with -bootstrap replicates as implemented by mega v . software (tamura et al., ) . in order to study susceptibility of different vertebrate cells to tiov, the infectious virus titer was determined by estimating the % tissue culture infective dose (tcid ) using the reed and muench method (reed and muench, ) . four vertebrate cell lines were used for virus infection: vero e- cells, pipistrellus ceylonicus bat embryo cells, baby hamster kidney- (bhk- ) cells and madin-darby canine kidney (mdck) cells. the - % confluent monolayer of the cells was infected with multiplicity of infection (m.o.i) virus and observed for seven days post-infection. the methodology of cell infection by the virus was the same as described above in section . . two passages of the virus were made in all cell lines in order to confirm the susceptibility of the cells. all the cells were studied for susceptibility based on cpe and further confirmed using real-time rt-pcr. to explore the possibility of propagation in embryonated chicken eggs, . ml of vero ccl- grown tiov was inoculated in the allantoic cavities of -day-old embryonated white-leghorn chicken eggs. the eggs were incubated at °c for days and were observed for sluggishness and mortality after every h. allantoic fluids from infected eggs were harvested after days of incubation and stored at − °c. a first blind passage was performed and allantoic fluid was tested for tioman virus by real-time rt-pcr. cytopathic effect (cpe) was observed in vero ccl- cells inoculated with a kidney tissue homogenate of p. giganteus bat (nivan ). the characteristics of cpe included cell fusion and formation of syncytium with aggregation of the nucleolus. cpe was prominent on post-infection-day and cell detachment was observed on days post-infection- (dpi) (isolation dated st april ) (fig. ). 
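the reed and muench endpoint calculation named above can be sketched as follows. this is a minimal illustration, not the authors' procedure: the function name, the input layout and the well counts in the example are hypothetical, and the dilution series is assumed to be evenly spaced on a log10 scale.

```python
def reed_muench_log10_endpoint(log10_dilutions, infected, totals):
    """Return log10 of the dilution at which 50% of cultures are infected."""
    # Order rows from least dilute (largest log10 value) to most dilute.
    rows = sorted(zip(log10_dilutions, infected, totals), reverse=True)
    inf = [r[1] for r in rows]
    uninf = [r[2] - r[1] for r in rows]

    # Reed-Muench pooling: a culture infected at a high dilution is assumed
    # infected at every lower dilution, and the reverse for uninfected ones.
    fractions = []
    for i in range(len(rows)):
        cum_inf = sum(inf[i:])
        cum_uninf = sum(uninf[: i + 1])
        fractions.append(cum_inf / (cum_inf + cum_uninf))

    for i in range(len(rows) - 1):
        above, below = fractions[i], fractions[i + 1]
        if above >= 0.5 > below:
            step = rows[i][0] - rows[i + 1][0]  # log10 spacing of the series
            distance = (above - 0.5) / (above - below)  # proportionate distance
            return rows[i][0] - distance * step
    raise ValueError("the 50% endpoint is not bracketed by the dilution series")


if __name__ == "__main__":
    # Hypothetical plate: 8 wells per ten-fold dilution (not data from the study).
    dilutions = [-3, -4, -5, -6, -7]
    cpe_positive = [8, 8, 6, 2, 0]
    wells = [8, 8, 8, 8, 8]
    endpoint = reed_muench_log10_endpoint(dilutions, cpe_positive, wells)
    print(round(endpoint, 2), "log10 dilution;",
          round(10 ** -endpoint), "tcid50 per inoculum volume")
```

for this made-up plate the pooled infection fractions are 80% at the 10^-5 dilution and 20% at 10^-6, so the proportionate distance is 0.5 and the endpoint falls at a log10 dilution of -5.5. dividing the reciprocal of the endpoint dilution by the inoculum volume would give the titer per unit volume.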
the supernatant was tested by pcr, sequencing and electron microscopy for identification of the suspected virus isolate. negative contrast electron microscopy of the cell supernatant of vero ccl- infected with virus isolate showed the presence of virus particles with the typical paramyxovirus morphology. the "herringbone" morphology of the nucleocapsid filaments, a characteristic of the paramyxoviridae family, was clearly visible (fig. ) . distinct enveloped paramyxovirus particles with envelope projections of approximately nm in length were also visualized. out of bat samples processed, kidney samples from two bats (nivan , nivan ) were found to be positive by rt-pcr. pcr products of bp were observed for nucleocapsid gene of tiov. amplified products were further confirmed by sequencing partial nucleocapsid and phosphoprotein genes of tiov (genbank accession nos. kt , kt , kt , kt ). sequence analysis showed . % nucleotide sequence identity with both nucleoprotein gene and phosphoprotein gene sequences of the malaysian tiov isolate respectively (bat/ /genbank af ) (fig. ) . partial sequences of tiov phosphoprotein and nucleocapsid gene revealed that tiov strains from india and malaysia belong to one lineage, which also forms a clade with menangle virus, while tuhoko, achimota and sosuga viruses make up a separate clade. tiov isolated from kidney tissue homogenate of bat showed a titer of . / μl by tcid in vero ccl- cell line. cpe-based susceptibility studies showed that all the studied vertebrate cell lines were susceptible to tiov with varying productivity. cpe in vero ccl- cell line became evident by nd dpi and there was total degeneration of cells by th dpi. vero e- cells, pipistrellus ceylonicus bat embryo cells and bhk- cell line showed cpe by th dpi. ps (porcine stable cell line) cells did not show cpe in first passage; however it showed distinct cpe at th dpi in the second passage. 
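the nucleotide identity and distance measures behind the sequence comparison above can be sketched as follows. this is an illustration rather than the mega implementation: the helper names and toy sequences are hypothetical, and the kimura model is assumed to be the two-parameter form because the parameter count is missing from the text.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_aligned(seq_a, seq_b):
    """Return (sites, transitions, transversions) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def percent_identity(seq_a, seq_b):
    """Percentage of compared sites that are identical."""
    sites, ts, tv = compare_aligned(seq_a, seq_b)
    return 100.0 * (sites - ts - tv) / sites


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance from transition/transversion fractions."""
    sites, ts, tv = compare_aligned(seq_a, seq_b)
    p, q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy sequences for illustration only (not tiov data).
    ref, query = "ACGTACGTAC", "ACGTACGTAT"
    print(percent_identity(ref, query), round(k2p_distance(ref, query), 4))
```

a pairwise matrix of such distances is the kind of input a neighbor-joining tree is built from. percent identity treats every mismatch alike, whereas the kimura correction weights transitions and transversions separately and accounts for multiple substitutions at the same site.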
mdck cells showed growth of tiov with rounding and detachment of cells within h post-infection. however, susceptibility study by cpe showed that tiov grows faster in vero ccl- cells in comparison with other vertebrate cells. the study of susceptibility of different vertebrate cells to tiov indicated that vero ccl- cell lines are best suited for propagation of tiov. this may be useful for viral replication studies in future. embryonated eggs did not show any sluggishness or mortality in the initial passage and first blind passage. real-time rt-pcr using tiov specific primers and probe on the allantoic fluid of both the passages did not show any virus amplification. this showed that tiov did not grow in embryonated eggs. the present study reports the isolation of tiov from pteropus giganteus bat from dhubri, assam, india (fig. ) ; this is the second report of tiov isolation besides malaysia (chua et al., ) . tiov is antigenically related to menangle virus (bowden and boyle, ) which is also harbored by pteropid fruit bats; the menangle virus caused an outbreak of fetal deformities in pigs in australia in (philbey et al., ) . all the above-named pteropus-borne viruses group in a single clade, which separates them from other paramyxoviruses. the bats and flying foxes belonging to the order chiroptera are ecologically remarkable. they are among the most abundant, diverse and geographically dispersed vertebrates and are natural reservoirs for a number of highly pathogenic zoonotic viruses. bats are known to have persistent viral infections at a rate higher than other mammals, possibly due to shorter antibody half-life in these animals (calisher et al., ) . detailed studies are needed on their importance as reservoirs of viruses and their potential to harbor important pathogen causing human and animal diseases. 
there is scant information available regarding the hosts, reservoirs and transmission of tiov, though direct transmission via ingestion of fruit by humans has been suggested (lehle et al., ) . however, bat-to-human transmission of tiov has not yet been reported. neutralizing antibodies against tiov have been detected in human serum samples from tioman island in malaysia, from where the virus was first isolated (yaiw et al., ) . tiov's estimated prevalence of . % is suggestive of its potential to cause subclinical infection in humans. experimental studies have shown that tiov is capable of infecting and replicating in pigs and its main cellular targets are lymphocytes, thymic epithelioreticular cells and the tonsillar epithelium in these animals (yaiw et al., ) . hence, pigs could act as an intermediate or amplifying host for human transmission, as has happened during menangle virus and niv outbreaks (parashar et al., ) . during niv outbreaks in malaysia, pigs played a critical role in transmitting the disease to pig handlers by direct contact. pig farms are a source of daily livelihood for a large part of the population in assam and other states like nagaland. the nagaland pig production and marketing project is funded by the national agricultural innovation project with a contribution from the international fund for agricultural development and aims to develop sustainable solutions to livelihood improvement in one of the poorest districts in india. pig farming was rampant during the year , and because pigs are a good reservoir of many diseases, the number of japanese encephalitis cases and outbreaks has increased in these areas in the recent past. undetected mild tiov infection could occur in naturally infected pigs and this could facilitate viral transmission to humans via contact with oral secretions; this transmission could cause serious illness by crossing the species barrier. therefore, the role of bats and pigs in transmitting viruses to humans in asia needs to be determined. 
although no evidence of tiov illness in humans or animals exists, tiov's close relationship to other disease-causing bat paramyxoviruses, including niv, suggests the possibility that it too may cross the species barrier (bowden and boyle, ) . our study has shown the presence of tiov by highlighting its isolation from pteropus bat from dhubri district, assam india. the presence of large colonies of pteropus bats in close proximity of human settlements warrants implementation of necessary steps for detection and identification of emerging bat-borne viruses circulating in north-east region of india. emerging paramyxoviruses: molecular mechanisms and antiviral strategies completion of the full-length genome sequence of menangle virus: characterization of the polymerase gene and genomic -trailer region a negative staining method for high resolution electron microscopy of viruses bats: important reservoir hosts of emerging viruses nipah virus-associated encephalitis outbreak tioman virus, a novel paramyxovirus isolated from fruit bats in malaysia isolation of nipah virus from malaysian island flying-foxes. microbes infect dengue virus induced autophagosomes and changes in endomembrane ultrastructure imaged by electron tomography and whole-mount-grid cell culture techniques specific detection of nipah virus using real-time rt-pcr (taq man) henipavirus and tioman virus antibodies in pteropodid bats, madagascar managing emerging diseases borne by fruit bats (flying foxes), with particular reference to henipaviruses and australian bat lyssavirus malsoor virus, a novel bat phlebovirus, is closely related to severe fever with thrombocytopenia syndrome virus and heartland virus investigating the role of bats in emerging zoonoses: balancing ecology, conservation and public health interest. fao animal production and health manual no. 
case-control study of risk factors for human infection with a new zoonotic paramyxovirus, nipah virus, during a - outbreak of severe encephalitis in malaysia isolation of a new parainfluenza virus from a frugivorous bat, rousettus leschenaulti, collected at poona, india. am an apparently new virus (family paramyxoviridae) infectious for pigs, humans, and fruit bats isolation of a novel adenovirus from rousettus leschenaultii bats from india a simple method of estimating fifty percent endpoints mega : molecular evolutionary genetics analysis (mega) software version . sensitive and broadly reactive reverse transcription-pcr assays to detect novel paramyxoviruses bats and viruses: friend or foe? detection of nipah virus rna in fruit bat (pteropus giganteus) from india serological evidence of possible human infection with tioman virus, a newly described paramyxovirus of bat origin tioman virus, a paramyxovirus of bat origin, causes mild disease in pigs and has a predilection for lymphoid tissues authors express their sincere gratitude to the secretary and director general, indian council of medical research, new delhi for her continuous support. we would like to acknowledge icmr for funding extramural project 'multi-site epidemiological and virological survey of nipah virus: special emphasis on north-east region of india' (grant number : ). authors are grateful to dr. ms chadha (scientist 'f'& head of department), influenza department for continuous guidance and support and dr. r laxminarayanan, senior administrative officer, niv, pune, for rendering logistic support. technical assistance rendered by divya bhattad, kumar bagmare, amita bargat, shital melag and uk shende (laboratory) is gratefully acknowledged. the authors declare that they have no competing interests. 
key: cord- -whw pq f authors: torres, orlando a.; calzada, josé e.; beraún, yasmina; morillo, carlos a.; gonzález, antonio; gonzález, clara i.; martín, javier title: role of the ifng + t/a polymorphism in chagas disease in a colombian population date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: whw pq f genetic susceptibility to trypanosoma cruzi infection and the development of cardiomyopathy is complex, heterogeneous, and likely involves several genes. previous studies have implicated cytokine and chemokine genes in susceptibility to chagas disease. here we investigated the association between the interferon-gamma gene (ifng) + t/a polymorphism and chagas disease, focusing on susceptibility and severity. this study included chagasic patients (asymptomatic, n = ; cardiomyopathic, n = ) and healthy controls from a colombian population where t. cruzi is highly endemic. individuals were genotyped for functional single nucleotide polymorphism (snp; rs ; a/t) of the ifng gene by amplification refractory mutational system pcr (arms-pcr). moreover, clinical manifestations of chagas in patients were analyzed. we found a significant difference in the distribution of the ifng + “a” allele between patients and healthy controls (p = . ; or = . , % ci, . – . ). the frequency of the ifng + genotype a/a, which is associated with reduced production of interferon-gamma, was increased in the patients relative to controls ( . % vs. . %). we compared the frequencies of ifng alleles and genotypes between asymptomatic patients and those with chagasic cardiomyopathy and found no significant difference. our data suggest that the ifng + t/a genetic polymorphism may be involved in susceptibility but not in the progression of chagas disease in this colombian population. chagas disease, also known as american trypanosomiasis, is caused by infection with the protozoan parasite trypanosoma cruzi (who, ) . more than million people carry the protozoan organism t. 
cruzi, which multiplies inside cells, particularly of heart and smooth muscle (who, a,b). chagas disease has a broad spectrum of clinical presentations, ranging from asymptomatic infections to life-threatening cardiac and digestive disease. the type of clinical presentation varies by geographical region (prata, ). following the acute phase, patients enter the chronic phase. up to years after the initial infection, ~ % of infected people develop pathological signs characteristic of chagas disease. autoimmunity, granulocytic cell activation, tissue damage caused by t. cruzi, neurogenic factors, and microvascular disturbance have been reported in association with the development of the chronic features of the disease (kierszenbaum, ). the mechanisms responsible for the susceptibility to infection and the clinical heterogeneity observed among infected individuals are not well understood, but substantial evidence suggests that differences in the expression of genes related to the immune response may be involved. previous studies have implicated cytokine and chemokine genes in determining increased susceptibility and further development of chagasic heart disease (calzada et al., ; ramasawmy et al., ; torres et al., ). nevertheless, genetic susceptibility to t. cruzi infection and the development of cardiomyopathy is complex, heterogeneous, and likely involves several genes (nieto et al., ). interferon-gamma (ifn-g) is a multifunctional cytokine, which is produced by effector t and natural killer cells. ifn-g controls the development of t helper (th ) cells and is critical for host defence against a variety of intracellular pathogens, including t. cruzi infection (silva et al., ; torrico et al., ).
the human ifng gene on chromosome q . spans . kb and contains four exons that encode a -aa protein. several polymorphisms within the ifng non-coding regions, such as + a/t, ca repeat microsatellite and - t/g, have been implicated in numerous autoimmune and chronic inflammatory conditions (chong et al., ; pacheco et al., ; pravica et al., ). a single nucleotide polymorphism (snp) located in the first intron of the human ifng gene at the end, adjacent to a ca repeat region (+ t/a polymorphism rs ), can influence the secretion of ifn-g (pravica et al., ).
analysis of the biological role of this snp suggested that + a allele carriers are low ifn-g producers (lopez-maderuelo et al., ). an association of this snp with susceptibility to other infectious diseases, such as severe acute respiratory syndrome (sars), has also been described (chong et al., ), suggesting that variability in ifn-g production linked to this snp may play a major role in susceptibility to infectious diseases, especially those caused by intracellular pathogens. we therefore selected the + t/a polymorphism of ifng to assess its potential association with susceptibility to and/or clinical features of chagas disease in a colombian population from an endemic area. this study included patients from the province of santander, colombia, divided into serologically negative and positive for t. cruzi antigens. both seropositive and seronegative patients were from a rural area of an endemic region in northeastern colombia; the samples were collected directly in the same villages, where approximately % of individuals are seropositive for t. cruzi infection (gutierrez et al., ). all participants were older than years. the mean age of the seronegative group was years, the mean age of the asymptomatic group was . years, and the mean age of the cardiomyopathic group was . years. a total of % of asymptomatic and % of cardiomyopathic patients were female. the serological diagnosis was based on the results of two independent tests, an enzyme-linked immunosorbent assay and an indirect hemagglutination test (who, a,b). patients were classified according to clinical and electrocardiographic characteristics. those without cardiac symptoms (n = ) and with a normal electrocardiogram (ecg) were classified as asymptomatic.
patients that by clinical evaluation, ecg, holter monitoring ( h) and echocardiogram showed conduction alterations and/or structural cardiomyopathy were included in the cardiomyopathic or symptomatic group (n = ) as follows: cc ii (n = , radiology indicative of light heart hypertrophy or minor ecg alterations), cc iii (n = , moderate heart hypertrophy and considerable ecg alterations, mainly advanced conduction abnormalities) and cc iv (n = , severe cardiomegaly and marked ecg alterations, predominantly frequent and/or complex forms of ventricular arrhythmia) . all the individuals are from the same geographic region and have been living there for more than years and they shared the same environmental and socioeconomic living conditions. the population from this region is homogeneous and there is not concentration of ethnical groups such as indigenous or black population. the population's structuring was determined by the arlequin program (excoffier et al., ) . all the subjects were included in this study after written informed consent. we obtained approval for the study from all local ethical committees. genomic dna was isolated from ml of edta-anticoagulated blood sample using the standard salting-out technique (miller et al., ) . ifng + a/t (rs ) polymorphism was determined by amplification refractory mutational system (arms) pcr method followed by gel electrophoretic analysis as described previously (pravica et al., ) . the following primers were used for amplification: -tcaacaaagctgatactcca- (consensus primer), -ttcttacaacacaaaatcaaatca- (a allele specific), -ttcttacaacacaaaatcaaatct- (t allele specific). amplification yielded a -bp pcr product. primers amplifying human growth hormone (f: -gccttcccaaccattccctta- and r: -tcacggatttctgttgtgtttc- ), yielding a -bp pcr product, were utilised as an internal control. 
the pcr conditions consisted of an initial denaturation step at c for min, cycles of incubation at c for s, c for s and c for s, followed by cycles of incubation at c for s, c for s and c for s, with a final extension at c for min. the amplified products were visualised by electrophoresis using % agarose gels containing ethidium bromide. the power of the sample size was calculated using the quanto software, version . . using an unmatched ( : . ) case-control design and a gene-only hypothesis. we calculated power for the analyzed snp to confirm the effect. in our case-control study, we had a power of . to detect a modest effect size (or = . ), assuming a two-sided α-level of . and a dominant inheritance pattern. allele and genotype frequencies were obtained by direct counting. we assessed the quality of the genotype data by testing for hardy-weinberg equilibrium in the case and control samples, using fisher's exact test (p > . ). differences between allele and genotype frequencies were determined using a χ² test. odds ratios and % confidence intervals were calculated according to woolf's method. the software statcalc epiinfo (centers for disease control and prevention, atlanta, ga) was used for statistical analyses. a p-value < . was considered statistically significant. the ifng + a/t genotype and allele frequencies for chagas patients and healthy controls as well as for cardiac and asymptomatic patients are listed in tables and , respectively. the genotype frequencies of the polymorphism studied were not found to be significantly different from those predicted by the hardy-weinberg equilibrium among healthy controls or patients. to rule out population substructure, we estimated the fst using approximately markers other than ifng; we found that the population from this region is a homogeneous mixture with no concentration of ethnic groups (fst . ).
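the allele counting and woolf odds ratio described in the statistical methods can be sketched as follows. this is a minimal illustration in python, not the statcalc epiinfo procedure the authors used: the function names and the genotype counts are hypothetical, and the z value of 1.96 assumes a 95% interval, which is a common choice but is not recoverable from the text above.

```python
import math


def allele_counts(n_aa, n_at, n_tt):
    """Direct counting: an A/A individual carries two A alleles,
    an A/T individual one A and one T, a T/T individual two T alleles."""
    return 2 * n_aa + n_at, 2 * n_tt + n_at


def woolf_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval for a 2x2 table.

    a, b: counts of the risk / non-risk category in cases
    c, d: counts of the risk / non-risk category in controls
    z:    normal quantile (1.96 is an assumed 95% interval)
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical genotype counts (A/A, A/T, T/T); NOT the study data.
cases = (60, 30, 10)
controls = (40, 40, 20)

case_a, case_t = allele_counts(*cases)
ctrl_a, ctrl_t = allele_counts(*controls)
odds_ratio, lower, upper = woolf_odds_ratio(case_a, case_t, ctrl_a, ctrl_t)
print(f"A allele: OR = {odds_ratio:.2f}, CI = {lower:.2f}-{upper:.2f}")
```

an interval that excludes 1 corresponds to a nominally significant association at the chosen level; the same function applies to a genotype contrast (for example a/a versus the other genotypes) by passing genotype counts instead of allele counts.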
we found a statistically significant difference in the distribution of the a/a genotype (low production of ifn-g) and the a allele at the ifng polymorphism between chagas patient and control groups. these findings suggest a genetic influence of this polymorphism on t. cruzi infection susceptibility. the a/a genotype among individuals with the ifng + a/t polymorphism was significantly more prevalent in chagas patients than in controls (p = . ; or = . , % ci = . - . ) ( table ). in addition, the ifng a allele showed evidence of association with chagas disease (p = . ; or = . , % ci, . - . ). to investigate the possible influence of the ifng + a/t polymorphism on the development of cardiomyopathy, ifng genotype and allele frequencies between asymptomatic patients and those with chagasic cardiomyopathy were compared. no significant difference was observed in the distribution of alleles or genotypes of the ifng + a/t polymorphism among cardiac and asymptomatic individuals, indicating no influence of this polymorphism on chagas disease progression (table ) . a significant amount of evidence indicates that susceptibility to chagas disease or other infectious diseases may be related to genetic variability at cytokine loci (florez et al., ; karplus et al., ; zafra et al., ) . control of chagas infection requires both humoral and cell-mediated immunity directed by a type cytokine response (kumar and tarleton, ) . endogenous ifn-g and tnfa play critical roles in the control of the infection through a mechanism including release of free radicals (silva et al., ) . in this work, we genotyped a snp located within the first intron of the human ifng gene at the end, adjacent to a ca repeat region (+ t/a). the location of this polymorphism coincides with a putative nf-kb binding site, which might have functional consequences on the transcription of the human ifng gene (pravica et al., ) . 
indeed, the t allele at the ifng gene was shown to be associated with higher ifn-g protein production and the a allele with lower ifn-g protein production in healthy individuals (lopez-maderuelo et al., ; pravica et al., ). in this study, the frequency of the a allele or a/a genotype, which is associated with low production of ifn-g, was found to be higher in patients with chagas disease than in healthy individuals, indicating that this allele may be a risk factor for genetic susceptibility to chagas disease. resistance to acute infection with t. cruzi has been shown to be dependent on ifn-g, which activates macrophages to produce nitric oxide (no) and kill the obligate intracellular amastigote form of the parasite (torrico et al., ). in addition, tnf-a provides a second signal that stimulates no production and anti-t. cruzi activity in ifn-g-activated macrophages (silva et al., ). this mechanism would explain the higher susceptibility to t. cruzi infection among individuals carrying the a allele compared with individuals carrying the t allele. similar results have been reported for other infectious diseases, such as pulmonary tuberculosis and severe acute respiratory syndrome (chong et al., ; lopez-maderuelo et al., ). previous studies have shown an association between ifng genetic polymorphisms and the severity or progression of diseases, including severe acute respiratory syndrome and hepatitis b infection (chong et al., ; ribeiro et al., ). contrary to expectations, no significant differences were observed in the distribution of alleles or genotypes of the ifng + t/a polymorphism between cardiomyopathic and asymptomatic patients with chagas disease, indicating no influence of this polymorphism on chagas disease progression. a larger sample size may be required to establish whether a cause-effect association exists between this polymorphism and the development of cardiomyopathy. consistent with our result, d'avila et al.
( ) found no difference in ifn-g production between cardiac and asymptomatic patients. complex interactions take place following parasite infection, predicting that the clinical course of the disease cannot be explained by a single mechanism. consistent with this prediction, interleukin il- and tgf-b are associated with susceptibility to infection (cardillo et al., ) by inhibiting ifn-g-mediated macrophage activation. therefore, not only the presence of ifn-g, per se, but also the secretion levels of others cytokines (e.g., il- , il- , tgf-b, and tnf-a) constitute key factors in the immunoregulation of the host-parasite relationship (gomes et al., ; martin et al., ; rodriguez-perez et al., ) . in conclusion, our data suggest that the ifng + t/a genetic polymorphism may be involved in susceptibility to t. cruzi infection in the south american population studied here. however, the association between polymorphisms and disease progression is still unclear. given the crucial role of ifn-g in the inflammatory response, further studies on other functional polymorphisms of ifng and the genes coding for the ifn-g receptors are required to clarify the role of ifn-g in the pathogenesis of chagas disease. none. transforming growth factor beta (tgfbeta ) gene polymorphisms and chagas disease susceptibility in peruvian and colombian patients chemokine receptor ccr polymorphisms and chagas' disease cardiomyopathy regulation of trypanosoma cruzi infection in mice by gamma interferon and interleukin : role of nk cells the interferon gamma gene polymorphism + a/t is associated with severe acute respiratory syndrome immunological imbalance between ifn-gamma and il- levels in the sera of patients with the cardiac form of chagas disease arlequin (version . 
): an integrated software package for population genetics data analysis interleukin- gene cluster polymorphism in chagas disease in a colombian case-control study evidence that development of severe cardiomyopathy in human chagas' disease is due to a th -specific immune response comparison of four serological tests for the diagnosis of chagas disease in a colombian endemic area association between the tumor necrosis factor locus and the clinical outcome of leishmania chagasi infection chagas' disease and the autoimmunity hypothesis the relative contribution of antibody production and cd + t cell function to immune control of trypanosoma cruzi interferon-gamma and interleukin- gene polymorphisms in pulmonary tuberculosis tgf-beta regulates pathology but not tissue cd + t cell dysfunction during experimental trypanosoma cruzi infection assessment of aryl hydrocarbon receptor complex interactions using pbevy plasmids: expressionvectors with bi-directional promoters for use in saccharomyces cerevisiae hla haplotypes are associated with differential susceptibility to trypanosoma cruzi infection ifng + t/a, il - g/a and tnf À g/a polymorphisms in association with tuberculosis susceptibility: a meta-analysis study clinical and epidemiological aspects of chagas disease in vitro production of ifn-gamma correlates with ca repeat polymorphism in the human ifn-gamma gene a single nucleotide polymorphism in the first intron of the human ifn-gamma gene: absolute correlation with a polymorphic ca microsatellite marker of high ifn-gamma production the monocyte chemoattractant protein- gene polymorphism is associated with cardiomyopathy in human chagas disease association of cytokine genetic polymorphism with hepatitis b infection evolution in adult patients clinical management of chronic chagas cardiomyopathy tumor necrosis factor-alpha promoter polymorphism in mexican patients with chagas' disease interleukin and interferon gamma regulation of experimental trypanosoma cruzi 
infection tumor necrosis factor alpha mediates resistance to trypanosoma cruzi infection in mice by inducing nitric oxide production in infected gamma interferon-activated macrophages association of the macrophage migration inhibitory factor À g/c polymorphism with chagas disease endogenous ifn-gamma is required for resistance to acute trypanosoma cruzi infection in mice control of chagas disease control of chagas disease. world health organ tech rep ser. , i-vi who expert committee on specifications for pharmaceutical preparations polymorphism in the utr of the il b gene is associated with chagas' disease cardiomyopathy we thank sofía vargas for excellent technical assistance and the patients for their essential contribution.financial support: this work was supported by the junta de andalucía, group cts- and grant - - from colciencias. key: cord- - vm fgy authors: lee, in-hee; lee, ji-won; kong, sek won title: a survey of genetic variants in sars-cov- interacting domains of ace , tmprss and tlr / / across populations date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: vm fgy the covid- pandemic highlighted healthcare disparities in multiple countries. as such morbidity and mortality vary significantly around the globe between populations and ethnic groups. underlying medical conditions and environmental factors contribute higher incidence in some populations and a genetic predisposition may play a role for severe cases with respiratory failure. here we investigated whether genetic variation in the key genes for viral entry to host cells—ace and tmprss —and sensing of viral genomic rnas (i.e., tlr / / ) could explain the variation in incidence across diverse ethnic groups. overall, these genes are under strong selection pressure and have very few nonsynonymous variants in all populations. genetic determinant for the binding affinity between sars-cov- and ace does not show significant difference between populations. 
non-genetic factors are likely to contribute to the differences in how populations are affected by covid- . nonetheless, a systematic mutagenesis study on the receptor binding domain of ace is required to understand the difference in host-viral interaction across populations. coronavirus disease caused by sars-cov- is a pandemic as of mar. . initial reports from china revealed diverse risk factors, clinical courses and outcomes for a relatively homogeneous population (zhou et al., a). morbidity and mortality vary between populations (yancy, ). african americans and latinos are disproportionately affected by covid- and show significantly higher mortality compared to the other race and ethnic groups in the us (wadhera et al., ) and in the uk (kirby, ). a "healthcare disparity" must be responsible for the high incidence among minorities, although socioeconomic factors, underlying medical conditions, and the difference in genetic susceptibility to sars-cov- infection may contribute (chen et al., ). of note, a p . gene cluster (slc a , lztfl , ccr , fyco , cxcr and xcr ) is associated with genetic susceptibility for severe covid- cases with respiratory failure (ellinghaus et al., ). to find allelic variation across populations in the genes that are known to be involved in viral entry to the host cells and sensing of viral rna in host immune cells, we surveyed publicly available databases of genomic variants. sars-cov- is an enveloped, positive-sense single-stranded rna (ssrna) virus and initiates human cell entry by binding of the spike (s) protein present on the viral envelope to the angiotensin converting enzyme (ace ) receptor on the host cells (zhou et al., b). the sars-cov s protein/ace interface has been elucidated at the atomic level, and ace was found to be a key factor of sars-cov transmission (li et al., b). the binding mode of the sars-cov- receptor binding domain (rbd) to ace is nearly identical to that of sars-cov (lan et al., ).
the s protein is cleaved into s and s by the type transmembrane serine protease (tmprss ) and endosomal cysteine proteases cathepsin b and l (catb/l) (du et al., ). tmprss is believed to be of utmost importance for sars-cov- entry into host cells. recent studies demonstrated that an inhibitor of the protease activity of tmprss , camostat mesylate, attenuated sars-cov- entry into lung epithelial cells, suggesting a promising candidate for potential intervention against covid- (hoffmann et al., ). the c-terminal domain of s subunit is responsible for binding of sars-cov- to ace and the s subunit undergoes a conformational change that results in virus-membrane fusion and entry into the target cell (du et al., ). viral genomic rna is then released and translated into viral polymerase proteins for viral replication. the innate immune response is the first line of host defense against sars-cov- infection. toll-like receptors recognize viral rna (double-stranded rna (dsrna) by tlr and ssrna by tlr and tlr ) and trigger innate immune responses such as the expression of inflammatory genes for type i interferons and proinflammatory cytokines (iwasaki and pillai, ; iwasaki and yang, ). here we surveyed the genetic variants in functional residues of ace , tmprss , ctsb/l (catb/l), and tlr / / to investigate differences in the genetic predisposition to sars-cov- infection and in the initiation of the innate immune response. for ace , we investigated genetic variants in the residues on the interface to the sars-cov- rbd identified in recent structural analyses (hussain et al., ; lan et al., ; shang et al., ; wrapp et al., ; yan et al., ). given the high sequence similarity between the s proteins of sars-cov- and sars-cov, we also investigated the residues shown to inhibit interactions in in vitro mutagenesis analysis (li et al., b).
we checked two residues reported to cause loss of cleavage activity of tmprss (afar et al., ) and the enzymatically active sites for catb/l. a total of residues of tlr that are necessary for ssrna-induced activation (zhang et al., ) and the residues affecting reaction to ssrnas from in vitro mutagenesis studies for tlr (bell et al., ; de bouteiller et al., ; sarkar et al., ) and for tlr (tanji et al., ) were checked for sequence variation. additionally, we searched for nonsynonymous variants that would cause loss of gene function (i.e., frameshift, in-frame insertion/deletion, stop-gain, splice-disrupting, start-lost and stop-lost). the list of reported genetic variants in the genes and their allele frequencies (afs) were ace is highly conserved with few nonsynonymous variants in the interacting domain with the sars-cov- rbm (lan et al., ). of coding variants in ace , were nonsynonymous variants with the highest af of . % (rs ). within residues interfacing the sars-cov- rbm, variants (including synonymous variants) were found with an average af of . % (range - . %) (table ). only one of the variants (rs ; k r) had a global af greater than . % (af= . %). rs (nc_ . :g. t>c) had the largest af difference across populations: the lowest af ( . %) in east asian and the highest ( . %) in non-finnish european. the impact of this variant has not yet been investigated with structural analysis, but it was not classified as deleterious (of possible impact on the structure and function of the protein) by in silico prediction algorithms such as sift and polyphen . the other variants were either very rare (i.e., population af < . %) or unique to one or two populations. for the five known residues (k , e , d , m and k ) that were reported to significantly change binding affinity to viral s protein (li et al., a), we found three variants: rs (k k), rs (e k), and rs (m i). however, all three were either synonymous or predicted to have little impact on the protein.
rs showed a significant af difference across populations, especially among east asian populations. it is found only among east asian individuals in gnomad (which consists of , korean, japanese, and , other east asian individuals) with an af of . %. the variant is also found in the korean reference genome database (n= , ) with an af of . %, a value similar to that in gnomad. however, it was found with a higher af of . % in the japanese genetic variation database (n= , ). rs at residue was found only in european and east asian populations with very low frequencies: . % and . %, respectively. lastly, rs at residue was found only in the african population (af= . %). nonetheless, protein modeling predicts little topological difference between all ace variants and wild-type ace in their binding to s protein (hussain et al., ). therefore, we expect minimal genetic variance across populations critically affecting the interaction between ace and sars-cov- . figure a illustrates the variants over known functional protein domains of ace . the proteolysis activity of tmprss is crucial for viral entry to host cells (hoffmann et al., ). two residues, v and m , are reported to impact the catalytic activity of tmprss (afar et al., ), but we found no variants at these residues (supplementary table ). reported variants for tmprss contain nonsynonymous variants including loss-of-function variants. all of the loss-of-function variants were very rare (af < . %). the rest of the nonsynonymous variants were also mostly of low frequencies (af < . %). of the only nonsynonymous variants with af > . %, rs (v m, global af= . %) was predicted deleterious, and its af ranged from . % (latino) to . % (east asian). further studies are required to test whether rs could exert a functional impact on tmprss activity. thus, differences in tmprss activity caused either by variants at critical loci or by loss-of-function variants are unlikely.
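the per-population comparison of allele frequencies used throughout this survey can be sketched in a few lines of python. this is only an illustration under stated assumptions: the population labels and the allele counts (ac) and allele numbers (an) below are hypothetical placeholders, not values from gnomad or from the tables of this study, and the helper names are ours.

```python
def allele_frequency(ac, an):
    """Allele frequency = alternate allele count / total alleles genotyped."""
    return ac / an if an else 0.0


def af_range(pop_counts):
    """Return per-population AFs plus the populations with the lowest
    and highest AF. pop_counts maps population -> (ac, an)."""
    afs = {pop: allele_frequency(ac, an) for pop, (ac, an) in pop_counts.items()}
    lowest = min(afs, key=afs.get)
    highest = max(afs, key=afs.get)
    return afs, lowest, highest


# Hypothetical counts for a single variant; NOT taken from any database.
counts = {
    "east_asian": (1, 10000),
    "african": (10, 10000),
    "european_non_finnish": (60, 10000),
}

afs, lowest, highest = af_range(counts)
for pop, af in afs.items():
    print(f"{pop}: AF = {af:.4%}")
print(f"lowest: {lowest}, highest: {highest}")
```

reporting the raw counts alongside each frequency is advisable for rare variants, because a variant seen once or twice in a small population sample yields an unstable af estimate.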
sars-cov- uses both tmprss and the endosomal cysteine proteases cathepsin b and l (ctsb and ctsl) for priming s protein (hoffmann et al., ). uniprot entries for human ctsb and ctsl report active sites. we found variants in the active sites of ctsb (two missense variants and one synonymous variant) and one missense variant in ctsl ( table and figure b). although all missense variants in the active sites of ctsb/l are predicted to be deleterious, they had very low allele frequencies (af < . %). ctsb has nonsynonymous variants, including loss-of-function variants (all with af < . %). ctsl has nonsynonymous variants, including loss-of-function variants. of note, one of the variants in ctsl (rs , nc_ . :g. a>c) is a common allele (global af of . %; population af ranges from . % to . %). the variant changes a stop codon to serine in one ctsl transcript isoform (enst . ) but falls in an intron in the other transcript isoforms. next, we checked genetic variants in tlrs that sense viral rnas and initiate innate immune responses. there were variants ( synonymous and nonsynonymous) in the residues of the ssrna-interacting domain of tlr ( table and figure c). most variants had extremely low frequencies (af < . %), except for one synonymous variant, rs (d d), found only in the east asian population (af= . %). tlr harbors nonsynonymous variants, including loss-of-function variants. as in tmprss , the afs of loss-of-function variants were also very low (af < . %). the uniprot entries for tlr and tlr list sites ( for tlr (bell et al., ; de bouteiller et al., ; sarkar et al., ) and for tlr (tanji et al., )) from in vitro mutagenesis studies that affect their response to viral infection (sensing of dsrna or ssrna, respectively). at these loci, two missense variants in tlr and one missense variant plus one synonymous variant in tlr were found (table and figure c). all of these variants in tlrs were very rare (af < . %) across all populations.
to summarize, the critical loci for host-viral interaction and for sensing viral genomic rna are highly conserved in all populations, with only a few very rare variants. in particular, ace and tlr seem to be under strong selection pressure, as reflected in their having fewer loss-of-function variants than expected in large variant databases such as gnomad (karczewski et al., ): three observed variants out of expected ones for ace and two observed variants out of . expected ones for tlr . moreover, nonsynonymous variants in these genes were mostly of very low frequency, which suggests that altered gene function caused by these variants is unlikely to account for the incidence of covid- around the globe. other factors, such as existing medical conditions and environmental risk factors, could contribute to the regulation of expression of these key genes in susceptible individuals; however, further studies are required to elucidate potential associations. the majority of infected individuals experience no symptoms or mild symptoms of upper respiratory tract infection; however, for some individuals, the consequence of sars-cov- infection could be fatal. one of the contributing factors may be the viral load resulting from differential affinity of viral spike proteins to ace and from the efficiency of cleavage by tmprss , both of which are essential for the virus to enter and replicate inside host cells. we did not find genetic variation between populations, whereas there is a significant difference in incidence and mortality between racial and ethnic groups in the u.s. therefore, considering the allelic spectrum for the key genes associated with viral entry, underlying medical conditions, age, environmental factors (e.g., air pollution, smoking, and humidity), and healthcare disparities are more likely to influence morbidity and mortality from covid- . nonetheless, genetic susceptibility may play a role in severe cases with respiratory failure (ellinghaus et al., ).
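the constraint argument above compares the number of loss-of-function variants observed in a gene with the number expected. a minimal sketch of that comparison follows: the observed/expected ratio, plus the poisson probability of seeing that few variants or fewer if the expected count were the true mean. this is an illustration under a simple poisson assumption, not the procedure gnomad uses for its published constraint metrics, and the counts in the example are hypothetical rather than the values reported here.

```python
import math


def oe_ratio(observed: int, expected: float) -> float:
    """Observed/expected ratio; values well below 1 suggest depletion of variants."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    if k < 0:
        return 0.0
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k + 1))


# Hypothetical counts for illustration only: 3 observed vs. 30 expected.
observed_lof, expected_lof = 3, 30.0
print(oe_ratio(observed_lof, expected_lof))      # ratio of observed to expected
print(poisson_cdf(observed_lof, expected_lof))   # chance of so few under Poisson
```

a small ratio together with a very small tail probability is the pattern the text describes as consistent with strong selection pressure against loss-of-function variants.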
the population-scale genotype databases and datasets used in this study are limited by relatively small sample sizes and by imbalanced and incomplete representation of various human populations. thus, there could be unreported variants in ace , tmprss , and tlr / / that may be associated with a change in susceptibility to covid- . with additional population-scale genomic databases for diverse populations, it will be possible to identify individuals with rare genetic variants such as rs in the interacting domain of ace and with a genetic predisposition to the cytokine storm that causes acute progression of illness in young people. in parallel, a systematic mutagenesis analysis of the rbm of ace is needed to understand the differences in host-viral interaction across populations (lan et al., ).
[table columns: kgp, sgdp, gtex, krgdb, togovar; global, african, european, east asian, south asian]
catalytic cleavage of the androgen-regulated tmprss protease results in its secretion by prostate and prostate cancer epithelia the dsrna binding site of human toll-like receptor epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study the international genome sample resource (igsr): a worldwide collection of genome variation incorporating the genomes project data biospecimen collection source site, n., biospecimen collection source site recognition of double-stranded rna by human toll-like receptor and downstream receptor signaling requires multimerization and an acidic ph the spike protein of sars-cov--a target for vaccine and therapeutic development sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor structural variations in human ace may influence its binding with sars-cov- spike protein innate immunity to influenza virus infection the potential danger of suboptimal antibody responses in covid- krgdb: the
largescale variant database of koreans based on whole genome sequencing the mutational constraint spectrum quantified from variation in , humans evidence mounts on the disproportionate effect of covid- on ethnic minorities structure of the sars-cov- spike receptor-binding domain bound to the ace receptor structure of sars coronavirus spike receptor-binding domain complexed with receptor receptor and viral determinants of sars-coronavirus adaptation to human ace the simons genome diversity project: genomes from diverse populations two tyrosine residues of tolllike receptor trigger different steps of nf-kappa b activation structural basis of receptor recognition by sars-cov- tolllike receptor senses degradation products of single-stranded rna variation in covid- hospitalizations and deaths across new york city boroughs cryo-em structure of the -ncov spike in the prefusion conformation structural basis for the recognition of sars-cov- by full-length human ace covid- and african americans structural analysis reveals that toll-like receptor is a dual receptor for guanosine and single-stranded rna clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study allele frequencies for european are from non-finnish european population expression project (gtex), v whole genomes nbdc's integrated database of japanese genomic variation (togovar) based on mutagenesis studies from uniprot protein information for q byf (ace _human) the ligand-binding sites for small ligands and ssrna from zhang based on active sites from uniprot protein information for p (catb_human) based on active sites from uniprot protein information for p (catl _human) based on mutagenesis studies from uniprot protein information for o (tlr _human) based on mutagenesis studies from uniprot protein information for q nr (tlr _human) ace s [ , ] x: - nc_ . 
the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
key: cord- -sdtxi xw authors: yu, ping; hu, ben; shi, zheng-li; cui, jie title: geographical structure of bat sars-related coronaviruses date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: sdtxi xw
bats are the natural reservoirs of severe acute respiratory syndrome coronavirus (sars-cov), which caused the outbreak of human sars in – . we introduce the genetic diversity of sars-related coronaviruses (sarsr-covs) discovered in bats and provide insights into the bat origin of human sars. we also analyze the viral geographical structure, which may improve our understanding of the evolution of bat sarsr-covs. coronaviruses (covs) are enveloped, positive-sense, single-stranded rna viruses belonging to the subfamily coronavirinae, family coronaviridae, in the order nidovirales, and are further divided into four genera: alpha-, beta-, gamma- and deltacoronavirus (de groot et al., ; payne, ). covs are pathogens of both birds and mammals and have a worldwide distribution, usually causing respiratory diseases when infecting humans. in - , a novel coronavirus termed severe acute respiratory syndrome (sars) coronavirus caused > cases of infection with a mortality of approximately %, drawing attention to covs of zoonotic origin (ksiazek et al., ; peiris et al., ).
subsequently, more covs were identified from humans and different animals, containing human coronavirus nl (hcov-nl ), hcov-hku , middle east respiratory syndrome coronavirus (mers-cov), swine acute diarrhoea syndrome coronavirus (sads-cov), bat-cov hku , bat-cov hku , white-eye coronavirus hku (wecov hku ), sparrow coronavirus hku (spcov hku ), magpie robin coronavirus (mrcov hku ) and so on (raj et al., ; su et al., ; woo et al., ; woo et al., ; zhou et al., ) , indicating that covs have greater diversity and host range than estimated and remain a potential risk for the public health. frequent contacts with humans and animals carrying coronaviruses provide a greater chance to facilitate cross-species viral transmission and emerge new viral variants. in late , sars first emerged in guangdong province in southern china, and rapidly spread to other provinces and other countries, resulting in a global pandemic of severe respiratory diseases (zhong et al., ) . initial investigations and researches indicated that marketplace masked palm civets (paguma larvata) were likely to be the animal origin for sars coronavirus (sars-cov) kan et al., ; song et al., ) , but no sars-cov was detected in farmed or wildcaught civets in the subsequent epidemiological studies, revealing that civets probably served only as intermediate hosts for sars-cov transmission (chan and chan, ; shi and hu, ; tu et al., ) . in , the discovery of novel covs related to human sars-covs in chinese horseshoe bats (genus rhinolophus), named sars-related coronaviruses (sarsr-covs), provided new clue that bats may be the natural host for sars-cov (lau et al., ; li et al., ) . 
since then, genetically diverse sarsr-covs have been discovered in asia, europe, and africa, including china, south korea, thailand, bulgaria, slovenia, italy, luxembourg, nigeria, and kenya (balboni et al., b; drexler et al., ; he et al., ; lau et al., ; lau et al., ; li et al., ; pauly et al., ; ren et al., ; rihtaric et al., ; yang et al., ; yuan et al., ) . importantly, it was reported that some bat sarsr-covs were able to use angiotensin converting enzyme ii (ace ) from humans, civets and chinese horseshoe bats as a receptor for cell entry (ge et al., ) , further supporting human sars-cov originated from chinese horseshoe bats and suggesting that these sarsr-covs had the ability to infect humans immediately without other intermediate hosts. furthermore, serological evidence by elisa of infection of bat sarsr-covs in human who live close to the bat cave in yunnan, china, where diverse sarsr-covs were detected in bats, suggested the potential spillover of sarsr-covs from bats to humans . sars-cov and sarsr-covs belong to lineage b of genus betacoronavirus in the family coronaviridae and share the same genomic organization with other coronaviruses, including genes coding for nonstructural proteins (nsp, in orf ab domain), the structural proteins like spike protein (s), envelope (e), membrane (m), nucleocapsid (n) and other several genes (perlman and netland, ; woo et al., ) . the major distinction between sars-cov and sarsr-cov genomes lies in the non-structural protein (nsp ), orf , s and orf , among which s gene and orf are the most variable (shi and hu, ; wu et al., ) . the s gene coding for spike protein can be further divided into two subunits s and s , responsible for receptor binding and cellular membrane fusion, respectively (belouzard et al., ) . 
the s subunit is composed of the n-terminal domain (ntd) and the receptor-binding domain (rbd), the latter of which is critical for host-receptor binding and plays an important role on determining host range (becker et al., ; de haan et al., ; li, ; schickli et al., ; tusell et al., ) . compared with human/civet sars-cov, most known sarsr-covs had two deletions in the rbd domain such as rp (dq ), while a bulgarian strain bm - (gu ) from rhinolophus blasii had only one deletion in that region (drexler et al., ; he et al., ) . several strains like wiv (kf ) had same sequences length with sars-cov in the rbd regions, which were authenticated to be able to use human ace as a cellular entry receptor (ge et al., ; hu et al., ; yang et al., ) . however, these sarsr-covs without any deletions have so far been merely discovered in yunnan, indicating that the origin of the s genes of the immediate ancestors of sars-cov had been restricted in yunnan. the orf was highly variable during the course of the sars epidemic in china (csme, ) . most bat sarsr-covs (except the strain hku - , rs and african and european bat sarsr-covs) and the early human sars-cov contain a single orf (balboni et al., a) . the hku - (gq ) has a nt deletion in the orf gene which subdivides its orf into orf a, b, c. the orf of rs is split into a and b due to a nt deletion in its orf , similar to the orf a/ b of the middle/late human sars-covs with a -nt deletion in the orf . in the european strain bm - , the orf was entirely absent (drexler et al., ; hu et al., ; lau et al., ) . moreover, compared with other bat sarsr-covs, some viruses such as wiv and wiv had an additional orf (named orfx) in their gene organization, involved in modulation of the host immune response (hu et al., ; yang et al., ; zeng et al., ) . 
sarsr-covs have been detected in bats from a wide range of provinces in china, including guangdong, guangxi, guizhou, hebei, henan, hong kong, hubei, jilin, shaanxi, shanxi, taiwan and zhejiang (table ). except for several from hipposideridae, these viruses were mainly detected in bats from the family rhinolophidae, indicating that rhinolophid bats are likely to be natural hosts for sarsr-covs. we collected the full-length rna-dependent rna polymerase (rdrp) sequences of previously reported sarsr-covs and sars-covs retrieved from genbank (table s ). before performing the phylogenetic reconstruction, we used xia's test, the phi test/rdp and likelihood mapping analysis to check the saturation index, recombination and phylogenetic signal of our data, respectively (huson and bryant, ; martin et al., ; strimmer and von haeseler, ; xia, ). subsequently, we constructed a phylogenetic tree from these nucleotide sequences of the full-length rdrp gene with the maximum likelihood (ml) method under the gtr + i + Γ model of nucleotide substitution as implemented in phyml (version . ) (guindon et al., ). the optimal model of nucleotide substitution was determined using the akaike information criterion (aic) available in jmodeltest (version . . ) (darriba et al., ). three main lineages were found in the phylogenetic tree when hku - (ef ) was set as an outgroup (fig. a). lineage was composed of bat sarsr-covs from the southwestern provinces, including yunnan, guizhou and guangxi, together with human/civet sars-cov. the viruses from other southern regions, comprising guangdong, hong kong, hubei and zhejiang, made up the second lineage (lineage ). the third lineage (lineage ) consisted of the strains from the central and northern areas such as hubei, henan, shanxi, shaanxi, hebei and jilin.
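the model-selection step above uses the akaike information criterion, which for a fitted model with maximized log-likelihood ln L and k free parameters is aic = 2k - 2 ln L; the model with the lowest aic is preferred. the sketch below shows only that ranking step. the log-likelihoods are hypothetical placeholders, the parameter counts cover the substitution model alone (a full analysis also counts branch lengths), and the helper names are introduced here for illustration rather than taken from jmodeltest.

```python
from typing import Dict, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(fits: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float, float]]:
    """Return (model, AIC, delta AIC) tuples sorted from best (lowest AIC) to worst."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    return sorted(
        ((name, score, score - best) for name, score in scores.items()),
        key=lambda row: row[1],
    )


# Hypothetical (log-likelihood, substitution-model parameter count) pairs.
example_fits = {
    "JC": (-10500.0, 0),
    "HKY+G": (-10200.0, 5),
    "GTR+I+G": (-10150.0, 10),
}

for model, score, delta in rank_models(example_fits):
    print(model, score, delta)
```

the penalty term 2k is what keeps a parameter-rich model such as gtr + i + Γ from being chosen automatically: it wins only when its gain in log-likelihood outweighs the extra parameters.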
although sars first emerged in guangdong province, the lineage sarsr-covs from southwestern china were closer to human sars-cov than those from other provinces in china, including guangdong, indicating that guangdong is unlikely to be the geographical origin of sars-cov and that the direct progenitor of human sars-cov may have originated from lineage (hu et al., ). additionally, the sarsr-covs from adjacent provinces grouped together (fig. b), revealing that similar viruses have circulated in neighboring provinces. it is also suggested that the bat hosts of sarsr-covs from southern china were more diverse than those from other locations. coronaviruses are single-stranded rna viruses that mutate readily, which increases their diversity and gives them the ability to adapt rapidly to new hosts (longdon et al., ). nevertheless, the evolution and development of covs are the consequence not only of coronavirus phylogeny and biology, but also of the interaction between covs and their hosts (cui et al., ; graham and baric, ; longdon et al., ; parrish et al., ). bats are the only mammals naturally capable of true and sustained flight. a bat tagging exercise showed that the longest migration distance of chinese horseshoe bats is km and that other rhinolophus species may migrate up to km for hibernation (lau et al., ). such migration distances would help the transmission of sarsr-covs carried by bats within a certain geographical range. in order to identify the relationships between bat covs and their hosts, a tanglegram was made connecting the rdrp phylogeny of the sarsr-covs and the cytochrome b (cytb) phylogeny of their hosts ( fig. ; table s ). different bat species in the same location, such as yunnan, guizhou and zhejiang, harbor closely related sarsr-covs, suggesting the lack of a strict host restriction and the existence of host shifts in bat sarsr-covs (cui et al., ).
in addition, host shifts mostly happened between different species within the same genus, rhinolophus, indicating that genetic distance between hosts is a key factor determining both host shifts and cross-species transmission. besides, even among viruses from the same bat species, the sarsr-covs from adjacent provinces clustered together, further supporting the view that the evolution of sarsr-covs was restricted by geography rather than by bat species. recombination plays a significant role in the evolution of viruses, as it may create emerging viruses and expand their host range (graham and baric, ; vennema et al., ). recombination events have been discovered in sars-cov and bat sarsr-covs (graham and baric, ; hon et al., ). the two major recombination hotspots between bat sarsr-covs and sars-cov are the s gene and orf , which probably contributes to the variability of these two genes (hon et al., ; lau et al., ; wu et al., ). all the genomic constituents of sars-cov, including the hypervariable regions s and orf , were discovered in different bat sarsr-covs in the same cave in yunnan, with evidence of recombination events detected between these bat sarsr-covs (hu et al., ), suggesting that human sars-cov may have originated from a recombinant of bat sarsr-covs in this region. the sarsr-covs without any deletion in the rbd were identified only in yunnan, so the s genes of human sars-covs likely arose from recombination among these viruses in yunnan. as recombination occurs frequently among bat sarsr-covs, further genomic characterization of bat sarsr-covs across a broader range of host species and geographical origins is needed to understand the role that recombination plays in the evolution of sarsr-covs. as bats have been identified as the natural reservoirs of various emerging viruses, the concept of a zoonotic origin of important viral pathogens has become widely accepted (parrish et al., ). deciphering the evolution of a viral pathogen is vital for understanding the context of its emergence.
although sars was controlled and vanished in , the recently identified sarsr-covs that are able to use the human ace receptor pose a potential risk of future emergence (ge et al., ; graham and baric, ; parrish et al., ). in particular, serological evidence of bat sarsr-cov infection in humans was found in yunnan, suggesting that these viruses may have spilled over to humans from bats, either directly or via other intermediate hosts, in yunnan. to date, bat sarsr-covs have been discovered in asia, europe and africa (balboni et al., b; drexler et al., ; he et al., ; lau et al., ; lau et al., ; li et al., ; ren et al., ; rihtaric et al., ; yang et al., ; yuan et al., ). however, for most of the strains from countries other than china, only partial rdrp fragments were obtained, and full-length genome sequences have been determined for only a few of them; thus the available genetic information is insufficient to explore the evolution and spread of these sarsr-covs. phylogeny using these short sequences of currently known sarsr-covs indicated that the bat sarsr-covs from china are closer to human sars-cov than those from other countries (ar gouilh et al., ; drexler et al., ; quan et al., ; rihtaric et al., ), suggesting that human sars-cov may have originated from china. our analysis revealed that human sars-cov may have originated from southwestern china, including yunnan, guangxi and guizhou, and that similar viruses likely circulated in these provinces for an extended period before eventually emerging in humans. in addition, sarsr-covs clustered according to the geographical location of sampling, indicating that geographical range overlap between hosts is likely to play an important role in shaping the evolution of these viruses (faria et al., ). co-phylogeny analysis indicated the lack of a host restriction and the existence of frequent host shifts in bat sarsr-covs, mainly among horseshoe bats (genus rhinolophus), possibly because close relatives of the hosts offer a similar environment for the virus to adapt to (longdon et al., ). however, space presents a greater barrier than host species to the diversification of bat sarsr-covs. most importantly, cross-species transmission and frequent recombination of sarsr-covs within horseshoe bat populations in yunnan could eventually have led to the generation of human sars-cov (graham and baric, ; hon et al., ; hu et al., ). although rhinolophus species may migrate up to km (lau et al., ), it is very unlikely for them to migrate over a long distance such as that from yunnan to guangdong. some gaps remain to be filled regarding the origin of human sars-cov. if human sars-cov originated from bats in southwestern china, including yunnan, guangxi and guizhou, how the virus was transmitted to guangdong, where human sars first appeared, remains unclear and needs to be clarified in the future.
fig. . phylogenetic analysis of sars-covs and bat sarsr-covs. (a) the phylogenetic tree was constructed using the complete rdrp coding sequences and viewed in itol (http://itol.embl.de/). all strains were named using abbreviations of the virus id and sampling province. the strain hku - (nc_ ) was used as the outgroup of the tree. the taxa for lineage , lineage and lineage are highlighted in light red, light green and light blue, respectively. lineage , lineage and lineage are displayed with colored pentagrams. the taxon for the only european strain, bm - (gu ), is displayed in light purple. the branch of sars-cov is marked in red. the strains from zhejiang were collapsed into a triangle named zj-sl/zhejiang. the viruses from hong kong were also collapsed into a triangle named hku /hong kong. the numbers adjacent to the nodes represent the bootstrap values of replicates, and only bootstrap values ≥ % are shown.
although the serological evidence of bat sarsr-cov infection was discovered in human living in proximity to the cave where diverse sarsr-covs are circulating , it is unable to judge that the sarsr-covs infecting those human populations are from bats or other animals inhabiting with bats. in short, it is necessary to carry out continuous surveillance of sarsr-covs in different geographical locations targeting different bat species and surrounding animals. sars-cov related betacoronavirus and diverse alphacoronavirus members found in western old-world the sars-like coronaviruses: the role of bats and evolutionary relationships with sars coronavirus a real-time pcr assay for bat sars-like coronavirus detection and its application to italian greater horseshoe bat faecal sample surveys synthetic recombinant bat sars-like coronavirus is infectious in cultured cells and in mice activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites tracing the sars-coronavirus molecular evolution of the sars coronavirus during the course of the sars epidemic in china evolutionary relationships between bat coronaviruses and their hosts jmodeltest : more models, new heuristics and parallel computing coronaviridae. 
virus taxonomy: ninth report of the international committee on taxonomy of viruses cooperative involvement of the s and s subunits of the murine coronavirus spike protein in receptor binding and extended host range genomic characterization of severe acute respiratory syndrome-related coronavirus in european bats and classification of coronaviruses based on partial rna-dependent rna polymerase gene sequences simultaneously reconstructing viral cross-species transmission history and identifying the underlying constraints isolation and characterization of a bat sars-like coronavirus that uses the ace receptor recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission isolation and characterization of viruses related to the sars coronavirus from animals in southern china new algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml . identification of diverse alphacoronaviruses and genomic characterization of a novel severe acute respiratory syndrome-like coronavirus from bats in china evidence of the recombinant origin of a bat severe acute respiratory syndrome (sars)-like coronavirus and its implications on the direct ancestor of sars coronavirus discovery of a rich gene pool of bat sars-related coronaviruses provides new insights into the origin of sars coronavirus application of phylogenetic networks in evolutionary studies molecular evolution analysis and geographic investigation of severe acute respiratory syndrome coronavirus-like virus in palm civets at an animal market and on farms a novel coronavirus associated with severe acute respiratory syndrome severe acute respiratory syndrome coronavirus-like virus in chinese horseshoe bats ecoepidemiology and complete genome comparison of different strains of severe acute respiratory syndrome-related rhinolophus bat coronavirus in china reveal bats as a reservoir for acute, self-limiting infection that allows recombination events 
severe acute respiratory syndrome (sars) coronavirus orf protein is acquired from sars-related coronavirus from greater horseshoe bats through recombination receptor recognition and cross-species infections of sars coronavirus bats are natural reservoirs of sars-like coronaviruses the evolution and genetics of virus host shifts rdp : detection and analysis of recombination patterns in virus genomes cross-species virus transmission and the emergence of new epidemic diseases. microbiol novel alphacoronaviruses and paramyxoviruses cocirculate with type and severe acute respiratory system (sars)-related betacoronaviruses in synanthropic bats of luxembourg family coronaviridae. in: viruses severe acute respiratory syndrome coronaviruses post-sars: update on replication and pathogenesis identification of a severe acute respiratory syndrome coronavirus-like virus in a leaf-nosed bat in nigeria mers: emergence of a novel human coronavirus full-length genome sequences of two sars-like coronaviruses in horseshoe bats and genetic variation analysis identification of sars-like coronaviruses in horseshoe bats (rhinolophus hipposideros) in slovenia the n-terminal region of the murine coronavirus spike glycoprotein is associated with the extended host range of viruses from persistently infected murine cells a review of studies on animal reservoirs of the sars coronavirus cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment epidemiology, genetic recombination, and pathogenesis of coronaviruses mutational analysis of aminopeptidase n, a receptor for several group coronaviruses, identifies key determinants of viral host range feline infectious peritonitis viruses arise by mutation from endemic feline enteric coronaviruses serological evidence of bat sars-related coronavirus infection in humans comparative analysis of twelve genomes of three novel 
group c and group d coronaviruses reveals unique group and subgroup features coronavirus diversity, phylogeny and interspecies jumping discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus orf -related genetic evidence for chinese horseshoe bats as the source of human severe acute respiratory syndrome coronavirus dambe : a comprehensive software package for data analysis in molecular biology and evolution novel sars-like betacoronaviruses in bats isolation and characterization of a novel bat coronavirus closely related to the direct progenitor of severe acute respiratory syndrome coronavirus intraspecies diversity of sars-like coronaviruses in rhinolophus sinicus and its implications for the origin of sars coronaviruses in humans bat severe acute respiratory syndrome-like coronavirus wiv encodes an extra accessory protein, orfx, involved in modulation of the host immune response epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china this work was funded by cas pioneer hundred talents program to jc, and wiv "one-three-five" strategic program (wiv- -tp ) to jc and zls. supplementary data to this article can be found online at https:// doi.org/ . /j.meegid. . . . key: cord- -jm lj t authors: uddin, md bashir; hasan, mahmudul; harun-al-rashid, ahmed; ahsan, md irtija; imran, md abdus shukur; ahmed, syed sayeem uddin title: ancestral origin, antigenic resemblance and epidemiological insights of novel coronavirus (sars-cov- ): global burden and bangladesh perspective date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: jm lj t sars-cov- , a new coronavirus strain responsible for covid- has emerged in wuhan city, china and still continuing its worldwide pandemic nature. 
considering the severity of the disease, a number of studies are underway, and full genomic sequences have already been released in the last few weeks to enable the understanding of the evolutionary origin and molecular characteristics of this virus. bioinformatics analysis, satellite derived imaging data and epidemiological attributes were employed to investigate origin, immunogenic resemblance and global threat of newly pandemic sars-cov- including bangladesh perspective. based on currently available genomic information, a phylogeny study was employed focusing four types of representative viral proteins (spike, membrane, envelope and nucleoprotein) of sars-cov- , hcov- e, hcov-oc , sars-cov, hcov-nl , hku , mers-cov, hku , hku and bufcov-hku . the findings clearly demonstrated that sars-cov- exhibited evolutionary convergent relation with previously reported sars-cov. it was also found that sars-cov- proteins were highly similar and identical to sars-cov proteins, though proteins from other coronaviruses showed lower level of similarity and identical patterns. the cross-checked conservancy analysis of sars-cov- antigenic epitopes showed significant conservancy with antigenic epitopes derived from sars-cov. the study also prioritized the temperature comparison through satellite imaging alongside compiling and analyzing the epidemiological outbreak information on the novel coronavirus based on several open datasets on covid- (sars-cov- ) and discussed possible threats to bangladesh. covid- has opened a new chapter of human civilization with a lots of tragedy stories. a new strain of coronavirus family, novel coronavirus or sars-cov- has emerged and infected thousands of humans. it is gaining importance due to daily increases in the deaths caused by this disease [ ] [ ] [ ] . 
the virus has already been reported from wuhan (china), thailand, japan, south korea, iran, and the us and is poised to occur in many more areas of the world community causing a pandemic scenario [ ] [ ] [ ] and globally increasing the potential for rapid horizontal spread geographically [ ] . determining the origin, evolution and antigenic resemblance of sars-cov- is urgently needed to study its molecular pathogenesis, perform surveillance, [ , ] were employed for the study. again, as some reports and analyses guessed bats as the probable original host of sars-cov- , we also considered two strains of bat-originated coronavirus (hku and hku ) in this study. from database and literature searches, only a single buffalo-originated coronavirus strain collected from bangladesh (bufcov-hku -m) [ ] was used for the comparative study with covid- strains isolated from wuhan, china [ ] . the global risk of the novel coronavirus (covid- [sars-cov- ]) has recently been addressed by many scientists [ ] [ ] [ ] [ ] . outside china,covid- transmission has been found in over countries and territories [ , ] . the us declared emergency funds because of coronavirus to the countries that are either affected or at high risk of spread, including bangladesh [ ] . as the outbreak of the novel coronavirus (covid- [sars-cov- ]) is expanding rapidly, analysis of epidemiological data of covid- is necessary to explore the measures of burden associated with the disease and to simultaneously gather information on determinants and interventions. therefore, we designed this study to compare the genetic materials of sars-cov- with different previously reported [ ] [ ] [ ] [ ] . accordingly, we also extracted population data of countries and provinces (china) from several websites [ ] [ ] [ ] . 
the retrieved protein sequences were subjected to multiple sequence alignment (msa) by clustalw [ ] and phylogenetic relationship (maximum parsimony, mp) studies by using mega x [ ] to understand the ancestral origin and antigenic resemblance of sars-cov- with other coronaviruses. in addition, pairwise sequence alignment of sars-cov- proteins with other viral strains was performed by the emboss needle online software, which uses the needleman-wunsch alignment algorithm to find the optimum alignment (including gaps) of two sequences along their entire length [ ]. sequence alignments were also visualized and analyzed by using jalview software (https://www.jalview.org/). targeting potential antigens from viral proteins is crucial for constructing peptide-based vaccine molecules that can interact with b lymphocytes [ ]. it was reported that peptide flexibility and proper surface accessibility are prerequisites for being a potential b cell epitope. considering those parameters, the immunogenic peptide sequences from four types of viral proteins were determined by using the b cell epitope prediction tools of the immune epitope database (iedb) [ ], which employs the bepipred linear epitope prediction method [ ]. the vaxijen v . server (http://www.ddgpharmfac.net/vaxijen/) was used for screening out the most immunogenic peptides determined from iedb [ ]. epitope conservancy analysis is an important step to determine the degree of desired epitope distribution in its homologous protein set. in this study, the conservancy pattern of the most immunogenic b cell peptide sequences of covid- was compared with other homologous sequences retrieved from the ncbi database by using blastp [ ]. moreover, the immunogenic peptides predicted from the sars-cov- proteins were also compared for conservancy against other human coronavirus strains (hcov- e, hcov-oc , sars-cov, hcov-nl , hku and mers-cov).
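as an illustration of the global alignment idea behind the pairwise step described above, the following is a minimal python sketch of needleman-wunsch scoring with traceback, plus a percent-identity helper. the linear gap penalty and the match/mismatch scores are simplifying assumptions chosen for brevity; emboss needle itself scores with a substitution matrix and separate gap-open and gap-extend penalties. the function names and the toy sequences are hypothetical and are not taken from the study.

```python
def _pair_score(x, y, match, mismatch):
    # score for aligning residue x against residue y
    return match if x == y else mismatch


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (aligned_a, aligned_b, score).

    Assumption: linear gap penalty and flat match/mismatch scores,
    not the substitution matrix and affine gaps used by real tools.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diag, up, left)

    # trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    # toy sequences, purely illustrative
    top, bottom, total = needleman_wunsch("ACGT", "AGT")
    print(top, bottom, total, percent_identity(top, bottom))
```

identity is computed here over the full alignment length, gaps included; other tools may use a different denominator, so values from this sketch are not directly comparable with reported ones.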
the epitope conservancy analysis tool (http://tools.iedb.org/conservancy/) of the iedb was used to continue the conservancy analysis [ ]. homology modeling of spike glycoprotein (p dtc ), membrane protein (p dtc ), envelope protein (p dtc ) and nucleoprotein (p dtc ) of sars-cov- was performed by using the i-tasser server [ ]. although d structures were generated by multiple threading alignments in the i-tasser server, refinement was conducted using modrefiner [ ] followed by the fg-md refinement server to improve the accuracy of the predicted d modeled structure [ ]. modrefiner allowed for significant improvements in the physical quality of the local structure based on hydrogen bonds, side-chain positioning and backbone topology of the native-state proteins. fg-md, a molecular dynamics-based algorithm for structure refinement, works at the atomic level. the refined protein structure was further validated by rampage [ ] and eraat analyses [ ]. structures were visualized and analyzed by pymol [ ]. we plotted the number of cases and deaths of sars-cov- over consecutive dates to elucidate the pattern of occurrence of those outcomes. we covered country-wise cases and deaths, the onset of global and chinese cases by date, the global death toll per day, and province-wise cases and deaths in china. we calculated the crude mortality rate and case fatality according to the formulas suggested by the cdc [ ] as well as jacob and ganguli [ ]. for better interpretation, the crude mortality rate was expressed per crore persons and was calculated for those countries and chinese provinces that had recorded deaths. it is already known that sars-cov- can multiply even at high temperatures, especially temperatures higher than ° c [ , ]; however, sars-cov- is rapidly inactivated at °c [ ]. therefore, temperature plays a great role in its multiplication.
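the two outcome measures named above can be sketched as follows, assuming the standard epidemiological definitions: case fatality is deaths divided by confirmed cases, expressed as a percentage, and the crude mortality rate is deaths divided by the population at risk, scaled here to one crore (10 million) persons. the function names and the example counts are hypothetical and are not data from the study.

```python
CRORE = 10_000_000  # one crore = ten million persons


def case_fatality_percent(deaths, cases):
    """Deaths among confirmed cases, as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


def crude_mortality_per_crore(deaths, population):
    """Deaths in the whole population, scaled to one crore persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    return CRORE * deaths / population


if __name__ == "__main__":
    # hypothetical counts, for illustration only
    print(case_fatality_percent(deaths=50, cases=1000))
    print(crude_mortality_per_crore(deaths=50, population=20_000_000))
```

the two measures answer different questions: case fatality depends on how many infections are detected, while the crude rate depends only on deaths and population size, so the two can rank regions differently.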
for this purpose, recent environmental temperature data from the place of first occurrence as well as bangladesh were obtained from landsat- satellite data. this satellite provides high spatial resolution ( m) data at -day intervals. using the brightness temperature of band number (tir- ) and emissivity data temperature (in °c) of bands and (l data users handbook), the temperature of a large area (a -km-wide swath) can be obtained at one time with minor deviation from in situ temperature data (maximum . degree celsius sd). therefore, cloudless or less cloudy images (less than %) were obtained from the usgs webpage (www.earthexplorer.usg.gov). a maximum of data points were available for one area in each month. however, neighboring path and row image borders shared some common areas, which provided more frequent observations for those overlapping areas. level- tier- images, which are radiometrically and geometrically corrected, were used in this study. first, all images fulfilling the cloud-related conditions were downloaded. a total of images covering the land areas of wuhan, china, korea, italy and bangladesh were downloaded. then, the digital number (dn) values of band data were converted to emissivity and simultaneously converted to brightness temperature by using "equation " [ ]. then, the emissivity was converted to temperature by using "equation " [ ]. the estimated data were obtained by the landsat thermal infrared sensor (tirs) of band . this information was automatically obtained from metadata. the four phylogenetic trees constructed from four types of representative viral proteins (spike, besides, lower levels of similarity and identity were found with other viral strains, including bufcov-hku of bangladesh origin ( table ). were employed to determine the most antigenic sites by using the b cell epitope prediction tool of iedb and vaxijen scoring. the vaxijen server, which gave a result well above the threshold value ( .
), usually reveals the immunogenic potential to stimulate a protective response in host organisms [ ]. from the analysis, a total of epitopes from s proteins, epitope from m proteins, epitope from e proteins and epitopes from n proteins were found to be the most immunogenic in sars-cov- , with almost % of peptides scoring above the threshold value of the antigenic score of the vaxijen server ( and were subjected to conservancy analysis with the immunogenic epitopes from sars-cov- proteins. it was found that the antigenic sites are largely conserved in all of the homologous protein sequences deposited in the ncbi database ( table ). cross-checked conservancy analysis of covid- antigenic epitopes with sars-cov proteins showed that conservancy when crosschecked with other coronaviruses, including bufcov-hku of bangladesh origin, was not significant ( table ). in china, of provinces experienced deaths from covid- , and the highest death toll occurred in hubei ( , ) province, followed by henan ( ) and heilongjiang ( ); in other provinces, the death toll was below ten up until march (supplementary file ). upon analysis of mortality data over the time period from january to march , therefore, only the eastern part is shown here. similarly, during the study period in the midregion of korea, the estimated temperature was very low, which was caused by the presence of heavy and widespread clouds in that region during satellite image acquisition. however, very little cloud cover was found in the landsat- images acquired in february for the italy areas. in almost all areas temperatures were lower than °c, except for a few places where the temperature did not exceed °c. therefore, the figures for these regions should be interpreted with caution in order to avoid errors. the novel coronavirus sars-cov- became a pandemic because of its global spread [ ].
as the genetic architecture of sars-cov- was highly divergent from that of bufcov-hku (figures and many scientists and pathologists have reported that high temperature and humidity are able to restrict the spread of covid- and that the spread of the disease will be suppressed as the weather warms [ , ]. this also supports our hypothesis. interestingly, coronaviruses that cause colds do tend to subside in warmer months. however, it is highly uncertain whether sars-cov- will behave the same way. it is too early for current research to predict how the virus will respond to changing weather [ ]. immunogenicity and epitope conservancy analyses of coronavirus proteins were performed to determine the potential b-cell epitopes that would interact efficiently with b lymphocytes to initiate the immune response against specific viral pathogens [ ]. the study identified a total of highly immunogenic b-cell epitopes from sars-cov- proteins ( epitopes table ). the antigenic sites of covid- were also crosschecked with the corresponding proteins of other coronaviruses ( table ) respectively. this calculation agrees with the report of wang et al. [ ], who stated that the global case fatality was close to %. however, the global case fatalities of sars ( .
%) and world health organization): coronavirus disease (covid- ) situation reports early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia international journal of infectious diseases the continuing -ncov epidemic threat of novel coronaviruses to global health -the latest novel coronavirus outbreak in development of genetic diagnostic methods for novel coronavirus (ncov- ) in japan transmission of -ncov infection from an asymptomatic contact in germany the global spread of -ncov : a molecular evolutionary analysis passengers' destinations from china: low risk of novel coronavirus ( -ncov) transmission into africa and south america a highly conserved wdypkcdra epitope in the rna directed rna polymerase of human coronaviruses can be used as epitope-based universal vaccine design occurrence of foot and mouth disease (fmd) during - in cattle of sirajganj district first genome sequences of buffalo coronavirus from water buffaloes in a novel coronavirus from patients with pneumonia in china nowcasting and forecasting the potential domestic and international spread of the -ncov outbreak originating in wuhan, china: a modelling study potential for global spread of a novel coronavirus from china preliminary assessment of the international spreading risk associated with the novel coronavirus ( -ncov ) outbreak in wuhan city global health policy: covid- coronavirus tracker outbreak of acute respiratory syndrome associated with a novel coronavirus . and rampage ramachandran plot analysis of sods in gossypium raimondii and g.
arboreum protein-protein docking on molecular models of aspergillus niger rnase and human actin: novel target for anticancer therapeutics pymol: an open-source molecular graphics tool measures of risk, section : mortality frequency measures handbook of clinical neurology the effects of temperature and relative humidity on the viability of the sars coronavirus persistence of coronaviruses on inanimate surfaces and its inactivation with biocidal agents estimation of sea surface temperature (sst) using split window methods for monitoring industrial activity in coastal area retrieval of sea surface temperature over poteran island water of indonesia with landsat tirs image: a preliminary algorithm immunogenicity prediction by vaxijen: a ten year overview real-time estimation of the risk of death from novel coronavirus (covid- ) infection: inference using exported cases recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission severe acute respiratory syndrome coronavirus (sars-cov- ) temperature, humidity, and latitude analysis to predict potential spread and seasonality for covid- transmissibility of covid- and its association with temperature and humidity in silico vaccine strain prediction for human influenza viruses immunoinformatics approaches for designing a novel multi epitope peptide vaccine against human norovirus (norwalk virus) exploring t & b-cell epitopes and designing multiepitope subunit vaccine targeting integration step of hiv- lifecycle using immunoinformatics approach. microbial pathogenesis significance of rna sensors in activating immune system in emerging viral diseases. 
dynamics of immune activation in viral diseases world health organization (who) recombinant modified vaccinia virus ankara expressing the spike glycoprotein of severe acute respiratory syndrome coronavirus induces protective neutralizing antibodies primarily targeting the receptor binding region middle east respiratory syndrome coronavirus (mers-cov): mers monthly summary a novel coronavirus outbreak of global health concern human immunopathogenesis of severe acute respiratory syndrome (sars) conceptualization, data curation, formal analysis, investigation, methodology, software, validation, manuscript writing-original draft, review and editing abdus shukur imran: data curation, formal analysis, investigation, methodology, software, validation, manuscript writing-original draft formal analysis, methodology, project administration, software, supervision, validation, visualization, manuscript writing-original draft all authors read and approved the final version of the manuscript. the descriptions are accurate and agreed by all authors table : template proteins considered for d homology structure predictions by using i-tasser. table hcov -oc the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. key: cord- -lxyo z u authors: di martino, barbara; di profio, federica; melegari, irene; sarchese, vittorio; cafiero, maria assunta; robetto, serena; aste, giovanni; lanave, gianvito; marsilio, fulvio; martella, vito title: a novel feline norovirus in diarrheic cats date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: lxyo z u by screening a collection of fecal samples from young cats housed in three different shelters in south italy, noroviruses (novs) were found in / ( . %) specimens of animals with enteritis signs while they were not detected in samples collected from healthy cats ( / ).
upon sequence analysis of the short rna-dependent rna polymerase (rdrp) region, the three strains displayed the highest nucleotide (nt) and amino acid (aa) identities to the prototype giv. strain lion/pistoia/ / /ita ( . – . % nt and . – . % aa). the sequence of ~ . -kb portion at the ′ end of the genome of a nov strain, te/ - /ita, was determined. in the full-length orf , encoding the vp capsid protein, the virus was genetically closest to the canine gvi. nov strains c /viseu/ /prt and fd / /ita ( . – . % nt and . – . % aa identities), suggesting a recombination nature, with the cross-over site being mapped to the orf -orf junction. based on the full-length vp amino acid sequence, we classified the novel feline nov, together with the canine strains viseu and fd , as a genotype , within the genogroup gvi. these findings indicate that, as observed for giv nov, gvi strains may infect both the canine and feline host. unrestricted circulation of nov strains in small carnivores may provide the basis for quick genetic diversification of these viruses by recombination. interspecies circulation of novs in pets must also be considered when facing outbreaks of enteric diseases in these animals. noroviruses (novs), caliciviridae family, have been identified as the most common cause of viral gastroenteritis in humans. nov infections affect persons of all age groups and are predominantly transmitted through the fecal-oral route, either indirectly through contaminated food, water or surfaces or directly from person to person (patel et al., ) . drop virions are nonenveloped and approximately to nm in diameter. the rna genome is organized into three open reading frames (orfs) (green, ) . 
orf encodes a polyprotein that is cleaved by the virus-encoded protease to produce several nonstructural proteins, including the rna dependent rna polymerase (rdrp), orf encodes a major capsid protein (vp ) and orf encodes a small basic protein (vp ) that has been associated with the capsid stability (bertolotti-ciarlet et al., ) . based on the full-length vp amino acid sequence, novs have been divided into six genogroups (gi to gvi) and several genotypes (zheng et al., ; martella et al., ; green, ) . only gi, gii, and giv novs infect humans, with gii strains being the most prevalent worldwide (green, ) . novs genetically similar to human novs have been recently found in dogs and cats (martella et al., (martella et al., , summa et al., ; pinto et al., ; soma et al., ) , raising public health concerns of potential cross-species transmission due to the strict social interaction between humans and pets. feline novs were first detected in the stools of - -week-old kittens from a feline shelter with an outbreak of diarrhea in new york state (pinto et al., ) . in the vp encoding gene, the feline novs displayed the highest amino acid (aa) identity ( . %) to the prototype nov strain giv. /pistoia/ / /ita, detected in a captive lion cub with severe hemorrhagic enteritis (martella et al., ) and to the canine strain giv. /bari- / /it ( . % aa), detected in a young dog with diarrhea (martella et al., ) . using baculovirus-expressed vp of the lion nov strain giv. /pistoia/ / /ita, antibodies specific for giv novs have been identified in . % of cats in italy (di martino et al., ) , providing indirect evidence for the circulation of these novs in felines. in addition, the rna of giv. novs has been detected in . % of fecal samples of cats with enteritis in japan (soma et al., ) . upon genome sequencing, the feline nov strain cat/gvi. /jpn/ /m (takano et al., ) was found to be more similar ( . % aa identity) in the full-length orf to the canine nov gvi. 
/bari/ / /it (martella et al., ). altogether these findings indicate that diverse nov strains may infect cats, as observed in dogs, and that the feline and canine hosts may be infected by the same nov strains, thus constituting an enlarged host reservoir for these animal novs. in order to draw a more complete picture of nov molecular epidemiology in cats, in this study a collection of fecal specimens from diarrheic and healthy animals was screened using both broadly-reactive primers for caliciviruses and primers specific for novs. a total of stool samples from domestic cats aged - months were collected from april to july in three different shelters located in south italy. the fecal panel consisted of samples from cats with signs of mild to severe gastroenteritis and samples from asymptomatic animals. all the samples were stored at − °c until use. fecal specimens ( %) were re-suspended in phosphate-buffered saline ph . , and the debris was removed by centrifugation at × g for min. dna and rna extracts were prepared using the dnaeasy® and qiaamp® viral rna kit (qiagen gmbh, hilden, germany), according to the manufacturer's instructions and stored at − °c until use. to assess the presence of nov rna, the samples were screened using a broadly reactive primer pair, p -p , targeted to the highly conserved motifs dyskwdst and ygdd of the rna-dependent rna polymerase (rdrp) region of the polymerase complex (jiang et al., ). in the samples yielding amplicons of the expected sizes, the presence of nov was confirmed using the norovirus-specific primer pair jv y-jv i (vennema et al., ). all the fecal samples were also tested by pcr or rt-pcr for feline parvovirus (fpv) (buonavoglia et al., ), feline enteric coronavirus (fecv) (gunn-moore et al., ) and feline kobuviruses (fekov) (di martino et al., ).
the amplicons were excised from the gel and purified using a qiaquick gel extraction kit (qiagen gmbh, hilden, germany). the fragment was then subjected to direct sequencing using bigdye terminator cycle chemistry and dna analyzer (applied biosystems, foster, ca). basic local alignment search tool (blast; http://www.ncbi.nlm. nih.gov) and fasta (http://www.ebi.ac.uk/fasta ) with default values were used to find homologous hits. the sequence of~ . -kb fragment at the ′ end of the genome of one such strain, te/ - /ita, including the partial rdrp gene and the complete orf and orf genes, was determined by ′ race protocol, as previously described (scotto-lavino et al., ) . cdna was synthesized by superscript iii first-strand cdna synthesis kit (invitrogen ltd., milan, italy) with primer qt. pcr was then performed with takara la taq polymerase (takara bio europe s.a.s. saint-germain-en-laye, france) with forward primer p and reverse primers qo and qi. finally, the amplicons were purified and cloned by using topo® xl cloning kit (invitrogen ltd., milan, italy). additional primers were designed to determine the complete . kb sequences by an overlapping strategy (table ) . sequence editing and multiple alignments were performed with the bioedit software package, version . (hall, ) . phylogenetic trees were generated using bayesian analysis with mrbayes (huelsenbeck and ronquist, ; ronquist and huelsenbeck, ) . the appropriate substitution model settings were derived using jmodeltest (posada, ). the sequence obtained was analyzed with simplot (lole et al., ) using a window size of and step size of , with gap strip off and hamming correction on. additionally, recombination analysis was carried out with different algorithms implemented in the recombination detection program v. . (rdp ) (martin et al., ) , with default settings. out of samples, ( . %) contained novs rna, either alone ( . %, / ) or in mixed infections with fekov or fecv ( . %, / ). sixteen samples ( . 
%) were found to contain fpv dna alone. all the nov positive samples were identified from diarrheic cats with a prevalence rate of . % ( / ), while they were not detected from asymptomatic animals ( / ). by sequence comparison in the short rdrp fragment, the viruses te/ - , me/ - and te/ - /ita shared . - . % nt and . - . % aa identities to each other and displayed the highest identity ( . - . % nt and . - . % aa) to the prototype giv. strain lion/pistoia/ / /ita. for the strain te/ - /ita the sequence of~ . -kb fragment at the ′ end of the genome, including the partial rdrp ( . kb) and the complete orf and orf (genbank accession number kt ), was sequenced and the genome organization was determined. phylogenetic analysis was based on the -nt sequence of the cooh terminus of the polymerase complex of the carnivore nov strains available in the databases. also, rdrp sequences of human giv. novs were included in the analysis and used to calculate a nt identity matrix. by visual inspection of the tree, the carnivore novs segregated in at least three different genetic clusters (fig. ) . the strain cat/te/ - /ita was grouped with the feline nov strains lion/giv. /pistoia/ / /ita, cat/giv. / cu e/usa/ and cat/gvi. /jpn/ /m (martella et al., ; pinto et al., ; takano et al., ) , with a nt identity of . - . %. this group shared identity of . - . % to the recombinant nov dog/gvi. / / /ita and to the giv. strains dog/ / / ita and dog/thessaloniki/ / /gr (martella et al., (martella et al., , ntafis et al., ) , which in turn segregated in a second cluster ( . - . % nt identity). a minor group, distantly related to the feline novs ( . - . % nt identity) and to the canine novs gvi. two additional clusters were resolved in the tree that included, respectively, the human giv. strains detected from stool and sewage samples in different geographic settings (fankhauser et al., ; la rosa et al., ; eden et al., ; ao et al., ; han et al., ) and giv. 
novs found in sewage samples in italy in - (la rosa et al., . the nt identity between this two groups was . - . %. the orf of strain te/ - /ita was nt in length and encoded a vp capsid protein with a predicted size of aa. orf was nt long and encoded a vp protein of aa. a -nucleotide (nt) overlap was present in the orf -orf junction region. in the complete vp , the strain te/ - /ita was most closely related ( . - . % nt and . - . % aa) to the strains dog/gvi. /c / viseu/ /prt and dog/gvi. /fd / /ita, while identity to the feline strain cat/nov/gvi. /jpn/ /m and to the canine strain gvi. / bari/ / /ita (martella et al., ) was . - . % nt and . - . % aa. strain te/ - /ita displayed b . % aa identity to animal and human giv novs. phylogenetic analysis was performed with a selection of complete capsid sequences representative of the norovirus genus. in the vp -based tree (fig. ) , strain te/ - /ita segregated with the canine novs gvi. /c /viseu/ /prt and gvi. /fd / /ita into genogroup gvi, genotype . a nucleotide identity plot of the genome of strain te/ - /ita was elaborated, in comparison with the canine strain dog/nov/gvi. /c /viseu/ /prt and the feline strain cat/nov/giv. /cu e/usa/ . by simplot (fig. ) and rdp (fig. s ) analyses, a putative recombination break-point event was mapped to orf -orf junction region at nt with a significant statistical support (p b . ). in this study direct evidence was collected for circulation of novs in cats. novs were detected in cats with enteric signs while they were not identified in samples collected from healthy animals used as control study group. experimental inoculation of specific pathogen free cats with the feline gvi. strain jpn/ /m can induce enteritis signs, diarrhea and vomiting (takano et al., ) . although the pathogenic role of nov in cats should be confirmed in larger epidemiological studies /ita. 
following strictly the outlines of zheng's classification (zheng et al., ) , we classified the novel feline nov, together with the canine strains viseu and fd , as a genotype (n % pairwise aa identity intergenotypes), within the genogroup gvi (n % pairwise aa identity intergenogroups). accordingly, cats and dogs may harbor novs of the same genotypes, giv. , gvi. and gvi. . circulation of novs genetically related in different host species has been already demonstrated. porcine novs cluster in gii (wang et al., ) , but within different genotypes (gii. , gii. , and gii. ) from those infecting humans. giii novs have been detected in large and small ruminants, with giii. and giii. (liu et al., ; oliver et al., ) found in cattle and giii. in sheep (wolf et al., ) . however, unlike small carnivores, circulation of nov strains belonging to the same genotypes in heterologous species has not been reported thus far. this intriguing finding poses several questions. binding of gvi. and gvi novs in dog tissues seems to be mediated by the presence of the h and a antigens of the histo-blood group antigen (hbga) family (caddy et al., ) and therefore to be genetically determined, as observed in humans (marionneau et al., ) . this may suggests that dogs and cats share a similar pattern of hbgas as attachment factor for nov infections. also, virus-like particles of seven different human nov genotypes (gi. , gi. , gi. , gii. , gii. , gii. , and gii. ) have been shown to be able to bind to canine gastrointestinal tissues (caddy et al., ) . it will be interesting to assess whether cats may also be infected by human novs, as observed in dogs (summa et al., ; caddy et al., ) , as this may have implications for the transmission of human novs. recombination among novs of domestic carnivores has been already described. the canine nov strain bari/ / /ita, resembles giv. novs in its polymerase gene while it is genetically unrelated in the vp gene to giv nov (martella et al., ) . 
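the kind of similarity scan that exposes such a mosaic genome (one region resembling one parental lineage, another region resembling a different one) can be sketched in python as below. this is only an illustration of the sliding-window idea behind similarity-plot tools such as simplot, not a reimplementation of that program: it uses plain percent identity after stripping gapped columns, and the window size, step size, function name and toy sequences are all hypothetical, since the values used in the study are not recoverable from the text.

```python
def window_identity(query, ref, window, step):
    """Percent identity of query vs ref in sliding windows over an alignment.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Returns a list of (window_midpoint, percent_identity) tuples, where the
    midpoint is an index into the gap-stripped alignment.
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to equal length")
    # gap stripping: keep only columns where neither sequence has a gap
    cols = [(q, r) for q, r in zip(query, ref) if q != "-" and r != "-"]
    points = []
    for start in range(0, len(cols) - window + 1, step):
        chunk = cols[start:start + window]
        same = sum(q == r for q, r in chunk)
        points.append((start + window // 2, 100.0 * same / window))
    return points


if __name__ == "__main__":
    # toy mosaic: the first half of the query matches ref_a,
    # the second half matches ref_b
    query = "AAAACCCC"
    ref_a = "AAAAGGGG"
    ref_b = "TTTTCCCC"
    print(window_identity(query, ref_a, window=4, step=4))
    print(window_identity(query, ref_b, window=4, step=4))
```

a putative breakpoint shows up where the two identity curves cross; dedicated recombination software adds distance corrections and statistical testing that this sketch leaves out.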
the feline nov strain, jpn/ /m (takano et al., ) , possesses a giv. rdrp region, and a gvi. orf related to the canine virus bari/ / /ita ( . % aa identity). in all the cases, the site of recombination was mapped to the orf /orf junction region. this part of nov genome is highly conserved and has been individuated as a preferential recombination site (bull et al., ) . recombination has been shown to strongly influence the evolution and epidemiology of human novs (ambert-balay et al., ; reuter et al., ) and surely poses a challenge for the development of specific diagnostic tools for nov of carnivores. analysis of the rdrp fragment cannot be used to characterize unequivocally these animal novs and a definitive characterization should rely on the orf . the development of molecular assays for caliciviruses and for novs has allowed gathering epidemiological information about these viruses in several animal species, including domestic carnivores. it is now clear that cats and dogs may harbor novs of several genotypes and genogroups, although the clinical relevance of these viruses remains to be investigated. gathering information on the genetic diversity of animal novs will be useful to optimize/develop direct and indirect diagnostic tools, and to investigate more effectively the epidemiology of novs in carnivores. in addition, as novs of carnivores are suspected to have a zoonotic relevance (peasey et al., ; mesquita et al., ; di martino et al., ; caddy et al., ) , this will be useful to understand the extent of inter-species transmission from cat to dogs, and vice versa, from carnivores to humans. supplementary data to this article can be found online at http://dx. doi.org/ . /j.meegid. . . . all authors declare that there are no financial or other relationships that might lead to a conflict of interest. all authors have seen and approved the manuscript and have contributed significantly to the work. 
characterization of new recombinant noroviruses detection of human norovirus giv. in china: a case report the ′ end of norwalk virus mrna contains determinants that regulate the expression and stability of the viral capsid protein vp : a novel function for the vp protein norovirus recombination in orf /orf overlap evidence for evolution of canine parvovirus type in italy genogroup iv and vi canine noroviruses interact with histo-blood group antigens evidence for human norovirus infection of dogs in the united kingdom seroprevalence of norovirus genogroup iv antibodies among humans detection of feline kobuviruses in diarrhoeic cats detection of antibodies against norovirus genogroup giv in carnivores complete genome of the human norovirus giv. strain lake macquarie epidemiologic and molecular trends of "norwalk-like viruses" associated with outbreaks of gastroenteritis in the united states caliciviridae: the noroviruses caliciviridae: the noroviruses detection of feline coronaviruses by culture and reverse transcriptase-polymerase chain reaction of blood samples from healthy cats and cats with clinical feline infectious peritonitis bioedit: a user-friendly biological sequence alignment and analysis program for windows / /nt detection of norovirus genogroup iv, klassevirus, and pepper mild mottle virus in sewage samples in south korea mrbayes: bayesian inference of phylogeny design and evaluation of a primer pair that detects both norwalk-and sapporo-like caliciviruses by rt-pcr molecular detection and genetic diversity of norovirus genogroup iv: a yearlong monitoring of sewage throughout italy detection of genogroup iv noroviruses in environmental and clinical samples and partial sequencing through rapid amplification of cdna ends molecular characterization of a bovine enteric calicivirus: relationship to the norwalk-like viruses full-length human immunodeficiency virus type genomes from subtype c-infected seroconverters in india, with evidence of intersubtype 
recombination norwalk virus binds to histo-blood group antigens present on gastroduodenal epithelial cells of secretor individuals norovirus in captive lion cub genetic heterogeneity and recombination in canine noroviruses detection and molecular characterization of a canine norovirus rdp : a flexible and fast computer program for analyzing recombination novel norovirus in dogs with diarrhea presence of antibodies against genogroup vi norovirus in humans outbreak of canine norovirus infection in young dogs complete genomic characterization and antigenic relatedness of genogroup iii, genotype bovine noroviruses systematic literature review of role of noroviruses in sporadic gastroenteritis seroepidemiology and risk factors for sporadic norovirus/mexico strain discovery and genomic characterization of noroviruses from a gastroenteritis outbreak in domestic cats in the us selection of models of dna evolution with jmodeltest epidemic spread of recombinant noroviruses with four capsid types in hungary mrbayes : bayesian phylogenetic inference under mixed models ′ end cdna amplification using classic race detection of norovirus and sapovirus from diarrheic dogs and cats in japan pet dogs-a transmission route for human noroviruses? molecular characterization and pathogenicity of a genogroup gvi feline norovirus rational optimization of generic primers used for norwalk-like virus detection by reverse transcriptase polymerase chain reaction porcine noroviruses related to human noroviruses molecular detection of norovirus in sheep and pigs in new zealand farms norovirus classification and proposed strain nomenclature this work was supported with funds from the grant "calicivirus nei carnivori e nell'uomo: caratterizzazione molecolare, epidemiologia, implicazioni zoonosiche -prin " ( f p x_ ). key: cord- -wytog cv authors: panda, somnath; banik, urmila; adhikary, arun k. 
title: bioinformatics analysis reveals four major hexon variants of human adenovirus type- (hadv- ) as the potential strains for development of vaccine and sirna-based therapeutics against hadv- respiratory infections date: - - journal: infect genet evol doi: . /j.meegid. . sha: doc_id: cord_uid: wytog cv human adenovirus type (hadv- ) encompasses – % of all adenoviral respiratory infections. the significant morbidity and mortality, especially among the neonates and immunosuppressed patients, demand the need for a vaccine or a targeted antiviral against this type. however, due to the existence of multiple hexon variants ( hv- to hv- ), the selection of vaccine strains of hadv- is challenging. this study was designed to evaluate hadv- hexon variants for the selection of potential vaccine candidates and the use of hexon gene as a target for designing sirna that can be used as a therapy. based on the data of worldwide distribution, duration of circulation, co-circulation and their percentage among all the variants, hv- to hv- were categorized as the major hexon variants. phylogenetic analysis and the percentage of homology in the hypervariable regions followed by multi-sequence alignment, zpicture analysis and restriction enzyme analysis were carried out. in the phylogram, the variants were arranged in different clusters. the hvr encoding regions of hexon of hv- to hv- showed point mutations resulting in amino acids substitutions. the homology in hvrs was . – %. therefore, the major hexon variants are substantially different from each other which justifies their inclusion as the potential vaccine candidates. interestingly, despite the significant differences in the dna sequence, there were many conserved areas in the hvrs, and we have designed functional sirnas form those locations. we have also designed immunogenic vaccine peptide epitopes from the hexon protein using bioinformatics prediction tool. 
we hope that our developed sirnas and immunogenic vaccine peptide epitopes could be used in the future development of sirna-based therapy and in designing a vaccine against hadv- . in this study, we have analysed all the hexon variants of hadv- based on various criteria, followed by the molecular analysis of their hvrs to assess their appropriateness as potential vaccine candidates. then, we have identified the conserved locations in hvr encoding regions of the hexon gene and from those locations we have designed functional sirnas. next, we have also designed immunogenic vaccine peptide epitopes from the hexon protein that can be used to design a vaccine. we anticipate that our developed sirnas and peptide epitopes could be used in the development of sirna-based therapy and in designing a future vaccine against hadv- . the amino acid (aa) sequences that include the seven hvrs of hexon variants and prototype strain (gb-genbank accession no. ab ) were collected from ncbi (http://www.ncbi.nlm.nih.gov/). hexon variants were selected based on their duration of circulation, co-circulation and worldwide distribution. the selection process is depicted in fig. . to explore the variations, the aa long sequences (extending from to ) that included the seven hvrs of the gb strain and hv- to hv- were aligned with genetyx software (www.genetyx.co.jp). after alignment, the number of aa variations in all the hvrs was recorded. then the differences in aa sequences in all the hvrs were tabulated and the percentage homology of each variant relative to the gb strain was calculated manually. a phylogenetic tree was constructed with the help of the phylogeny.fr website (http://www.phylogeny.fr/documentation.cgi) [ ] using the "one click mode". this website is designed to provide a high-performance platform that transparently chains programs relevant to phylogenetic analysis in a comprehensive and flexible pipeline. 
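The manual homology calculation described above (counting aa differences between each variant and the gb strain after alignment) can be sketched in a few lines. This is only an illustrative sketch, not the authors' procedure: the function names and the toy sequences are hypothetical, and it assumes both sequences come from the same alignment, so they have equal length with gaps written as "-".

```python
from typing import List, Tuple


def percent_identity(reference: str, variant: str) -> float:
    """Percentage of compared alignment columns where variant equals reference.

    Columns in which both sequences carry a gap are skipped.
    """
    if len(reference) != len(variant):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for ref_aa, var_aa in zip(reference, variant):
        if ref_aa == "-" and var_aa == "-":
            continue
        compared += 1
        if ref_aa == var_aa:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def list_differences(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """Return (1-based column, reference residue, variant residue) per mismatch."""
    return [
        (i, r, v)
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


# Toy aligned fragments (hypothetical, not real hexon sequences).
ref = "ACDEFGHIKL"
var = "ACDQFGHIKV"
print(percent_identity(ref, var))
print(list_differences(ref, var))
```

Applying the two functions to each hvr window separately would give a per-region table of substitutions and homology values comparable to the one the text describes.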
by default, the pipeline is already set up to run and connect programs recognized for their accuracy and speed (muscle for multiple alignment and phyml for phylogeny) to reconstruct a robust phylogenetic tree. the aa long (extending from to ) sequence of hexon variants ( hv- to hv- ) and gb strain were uploaded to the website in fasta format to build the phylogram. the variations among the hvr encoding regions of the hexon gene of hv- to hv- were shown by multi-sequence alignment (msa), in silico re analysis numerous tools are available to design functional sirnas such as mysirna-designer [ ] , sidirect [ ] and simax-sirna designer (https://eurofinsgenomics.eu/en/dnarna-oligonucleotides/custom-dna-rna-oligos/simax-sirna/) etc. in this study, sidirect . (http://sidirect .rnai.jp) has been used to design functional sirnas from the conserved regions found in all hexon variants ( hv- to hv- ). sidirect . algorithm eliminates off-target effects by reflecting the recent finding that the capability of sirna to induce off-target effect is highly correlated to the thermodynamic stability, or the melting temperature (tm), of the seed-target duplex. hence, the selection of sirnas with lower seed-target duplex stabilities (benchmark tm < . °c) minimizes the offtarget effects. it generates and filters sirnas in three selection steps: step involves the selection of highly functional sirnas, step involves the reduction of seed-dependent off-target effects and step involves the elimination of near -perfect matched genes [ ] . we used netmhc . (http://www.cbs.dtu.dk/services/netmhc/) to predict mhc class i binding epitopes. for this, the aa long sequences (extending from to ) that included the seven hvrs of hv- to hv- were uploaded in fasta format. we chose mer peptides as most hla molecules have a strong preference for binding to mer peptides. the peptides were identified as a strong binder if the % rank is below the specified threshold for the strong binders, by default . %. 
on the other hand, the peptide was identified as a weak binder if the % rank is above the threshold of the strong binders but below the specified threshold for the weak binders, by default %. among the hexon variants, hv- to hv- comprised % of all the variants. we found that they are most prevalent in countries (japan, korea, taiwan, germany, china, usa and india) as depicted in fig. . hence, considering the percentage among the variants, global distribution and duration of circulation, we considered hv - to hv- as the major hexon variants. the alignment data showed remarkable aa sequence variations among the hvrs of hv- to hv- as compared to the gb strain. the highest number of variations was found in hvr followed by variations in hvr (fig. ) . table ) . the different hexon variants ( hv- to hv- ) were segregated into multiple clusters in the phylogenetic tree as shown in fig. . two of the four major hexon variants ( hv- and hv- ) were incorporated in the same cluster. however, hv- and hv- were included in a different cluster. multiple major clusters are formed due to their heterogeneity of the aa sequence in the hvrs. the nt alignment data of the gb strain and the four major hexon variants of hadv- ( hv- to hv- ) showed a total number of point mutation on the seven hvrs (fig. ). there are substantial numbers of conserved regions within the variation in the gene which are depicted in table . the rest of the variants ( hv- to hv- ) also showed similar conserved regions. in silico dna restriction patterns with restriction endonuclease bcci, bcodi, bsp i and bstni clearly differentiated gb from hv- to hv- (fig. a) . similarly, the zpicture analysis also clearly depicted the variations (fig. b) . we found conserved regions in the hvr encoding portion among all hadv - hexon variants as presented in table . for example, the functional sirnas designed from the first sequence is shown in fig. . 
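The strong/weak binder rule stated at the start of this passage is a simple two-threshold classification on the % rank. The sketch below illustrates that rule together with the enumeration of fixed-length peptides from a sequence. It does not call or reproduce netmhc itself; the function names, the toy sequence, and the numeric thresholds in the usage lines are illustrative assumptions only, and the thresholds are deliberately left as required arguments. Treating a % rank exactly equal to the strong threshold as a weak binder is also an assumption.

```python
from typing import List


def kmers(sequence: str, k: int) -> List[str]:
    """All overlapping peptides of length k, in order of their start position."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def classify_binder(pct_rank: float, strong_threshold: float,
                    weak_threshold: float) -> str:
    """Label a peptide from its % rank using two thresholds.

    strong:     pct_rank below strong_threshold
    weak:       pct_rank from strong_threshold up to, not including, weak_threshold
    non-binder: everything else
    """
    if not 0 <= strong_threshold < weak_threshold:
        raise ValueError("expected 0 <= strong_threshold < weak_threshold")
    if pct_rank < strong_threshold:
        return "strong"
    if pct_rank < weak_threshold:
        return "weak"
    return "non-binder"


# Illustrative values only: peptides, ranks and thresholds are made up.
example_ranks = {"PEPTIDEAA": 0.2, "EPTIDEAAK": 1.4, "PTIDEAAKL": 7.5}
for peptide, rank in example_ranks.items():
    print(peptide, classify_binder(rank, strong_threshold=0.5, weak_threshold=2.0))
```

In practice the % rank values would be read from the prediction output for each peptide and allele, and the thresholds set to whatever the prediction tool specifies.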
the functional sirnas designed from the other conserved regions are also shown in the supplementary data . however, we could not obtain any functional sirnas from the conserved region numbers and when the seed-duplex stability (tm) was < . °c. we did not try to design functional sirnas by relaxing the parameters, as tm < . °c is the minimum requirement for obtaining functional sirnas with reduced off-target effects. mhc class i binding epitopes of mer peptides were predicted with netmhc . . all hla-a alleles were selected for this. a total of epitopes were predicted for hla-a . among them there were strong binders and weak binders based on their %rank as shown in table . the complete dataset of mhc class i binding epitope predictions from hv- is shown in the supplementary data . hadv- respiratory infections have become a global concern, especially among asian countries [ ] . the only adenovirus live vaccine was developed to prevent hadv - and in the case of hadv- , the selection of vaccine strains is complicated by the existence of a large number of hexon variants [ ] . in the present study, we have selected hv- to hv- , as they ) comprise % of all hexon variants, ) have been circulating over longer periods when the circulation of others was shorter and ) were co-circulating among different countries. therefore, we have designated them as the major hexon variants. designing sirnas from the conserved regions to inhibit viral replication has been used against several pathogenic viruses such as hiv, hcv, hbv and sars coronaviruses [ , , ] . after a successful clinical trial, the sirna-based drug (onpattro) is now available for the treatment of systemic diseases such as amyloidosis [ ] . due to the presence of multiple hexon variants of hadv- , we have selected their conserved regions for designing sirnas, as these are expected to work against all the variants. sirnas from conserved regions have been successfully used against influenza and hcv [ , , [ ] [ ] [ ] . 
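The idea of restricting sirna design to stretches shared by all hexon variants can be illustrated with a small scan for fully conserved runs in a multiple alignment. This is a sketch under stated assumptions rather than the method used in the study: the function name, the minimum run length, and the toy sequences are hypothetical, and the input is assumed to be a gapped alignment of equal-length nucleotide strings.

```python
from typing import List, Tuple


def conserved_runs(aligned: List[str], min_len: int) -> List[Tuple[int, int, str]]:
    """Find runs of columns that are identical (and gap-free) in every sequence.

    Returns (1-based start, 1-based end, sequence of the run) for each run
    that is at least min_len columns long.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None
    # Iterate one position past the end so that a run touching the end is closed.
    for i in range(length + 1):
        conserved = (
            i < length
            and aligned[0][i] != "-"
            and len({seq[i] for seq in aligned}) == 1
        )
        if conserved:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start + 1, i, aligned[0][start:i]))
            start = None
    return runs


# Toy alignment (hypothetical sequences); only column 8 varies.
toy = ["ATGCCGTAAGCT", "ATGCCGTTAGCT", "ATGCCGTCAGCT"]
print(conserved_runs(toy, min_len=5))
```

Candidate target sites would then be drawn only from the returned runs, with the minimum length chosen to fit the target length the design tool expects.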
hexon protein constitutes % of the viral capsomere [ ] . if the hexon gene can be knocked down by sirnas [ ] , the formation of complete viral particles will be prevented and, without a complete capsid, hadv will be unable to infect new host cells [ ] . we have also predicted mhc class i epitopes from the aa sequence of the hvrs ( - aa) of each major hexon variant, which should save time and cost in future experimental work on vaccine development. in this study, we have found that the major hexon variants ( hv- to hv- ) are the most appropriate vaccine candidates against hadv- and that the several conserved regions located in the hvr encoding portion of the hexon gene are suitable sites for designing sirnas against all the hexon variants. we have designed functional sirnas from those conserved regions and immunogenic vaccine peptide epitopes from the hexon protein. we expect that our findings could pave the way for the development of vaccine and sirna-based therapeutics against hadv- respiratory infections. the genbank accession nos. are mentioned in the parentheses of each selected hexon variant. 
molecular characterization of human adenovirus associated with acute respiratory infections in cameroon from to molecular epidemiology and clinical features of adenovirus infection in taiwanese children adenovirus: epidemiology, global spread of novel serotypes, and advances in treatment and prevention genomic diversity of human adenovirus type isolated in fukui, japan over a -year period adenovirus serotype and infection with acute respiratory failure in children in taiwan adenoviruses: update on structure and function analysis of adenovirus hexon proteins reveals the location and structure of seven hypervariable regions containing serotypespecific residues structure-based identification of a major neutralizing site in an adenovirus hexon worldwide increased prevalence of human adenovirus type (hadv- ) respiratory infections is well correlated with heterogeneous hypervariable regions (hvrs) of hexon adenovirus infections in immunocompetent and immunocompromised patients diagnosis and treatment of adenovirus infection in immunocompromised patients drug development against human adenoviruses and its advancement by syrian hamster models short interfering rna-directed inhibition of hepatitis b virus replication stable inhibition of hepatitis b virus proteins by small interfering rna expressed from viral vectors caspase small interfering rna prevents acute liver failure in mice fr: robust phylogenetic analysis for the non-specialist zpicture: dynamic alignment and visualization tool for analyzing conservation profiles mysirna-designer: a workflow for efficient sirna design sidirect . 
: updated software for designing functional sirna with reduced seed-dependent off-target effect novel computational approaches to developing potential stat silencing sirnas for immunomodulation of atherosclerosis human adenovirus infection in children with acute respiratory tract disease in guangzhou, china guidelines for the selection of highly effective sirna sequences for mammalian and chick rna interference rational sirna design for rna interference an algorithm for selection of functional sirna sequences therapeutic, for hereditary transthyretin amyloidosis role and application of rna interference in replication of influenza viruses silico design and experimental validation of sirnas targeting conserved regions of multiple hepatitis c virus genotypes protection against lethal influenza virus challenge by rna interference in vivo fields virology -nlm catalog -ncbi inhibition of adenovirus infections by sirna-mediated silencing of early and late adenoviral gene functions table : mhc class i binding epitopes as predicted from the aa sequence of the hvr of hv- for hla-a . the peptides with a % rank below . are strong binders and the peptides with a % rank between . - are weak binders. key: cord- - oi slk authors: naguib, mahmoud m.; höper, dirk; arafa, abdel-satar; setta, ahmed m.; abed, mohamed; monne, isabella; beer, martin; harder, timm c. title: full genome sequence analysis of a newly emerged qx-like infectious bronchitis virus from sudan reveals distinct spots of recombination date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: oi slk infectious bronchitis virus (ibv) infection continues to cause economically important diseases in poultry while different geno- and serotypes continue to circulate globally. two infectious bronchitis viruses (ibv) were isolated from chickens with respiratory disease in sudan. 
sequence analysis of the hypervariable regions of the s gene revealed a close relation to the qx-like genotype which has not been detected in sudan before. whole genome analysis of ibv/ck/sudan/ar – / isolate by next generation sequencing revealed a genome size of , nucleotides harbouring open reading frames: ′- a- b-s- a- b-e-m- b- c- a- b-n- b- ′. highest nucleotide sequence identity of % for the whole genome was found with the chinese ibv strain ck/ch/lhlj/ , the italian ibv isolate ita/ / and the / vaccine strain. phylogenetic analysis of the s gene revealed that the ibv/ck/sudan/ar – / isolate clustered together with viruses of the gi- lineage. recombination analysis gave evidence for distinct patterns of origin of rna in the sudanese isolate in multiple genes. several sites of recombination were scattered throughout the genome suggesting that the sudan-qx-like strain emerged as a unique recombinant from multiple recombination events of parental viruses from / , h and ita/ / genotypes. the sudanese qx-like isolate is plausibly genetically different from ibv strains previously reported in africa and elsewhere. gammacoronavirus in the coronaviridae family (king et al., ) . globally distributed ibv induces an acute, highly contagious infectious disease affecting chickens and results in huge annual losses in the poultry industry. ibv was first reported in north dakota, usa, by schalk and hawn (schalk & hawn, ) as a novel respiratory disease affecting chickens. only chickens and pheasants act as the natural hosts for ibv (ignjatovic & sapats, ) . ibv initially infects the respiratory tract; for some ibv strains further virus spread may involve kidneys and oviduct causing reduction of growth rate, decreased performance and reduction of egg quality and quantity (cavanagh, ) . also another shift of tissue tropism causing proventriculitis has been recorded (yu et al., a) . 
the infection spreads by aerosols, direct contact and indirectly through contaminated fomites (ignjatovic & sapats, ) . ibv harbors a monopartite rna genome of positive polarity which is approximately . kb in size and codes for four structural proteins: the spike (s) glycoprotein, the membrane (m) glycoprotein, the nucleocapsid (n) phosphoprotein, and the envelope (e) protein (spaan et al., ) . the n protein is a major structural protein, which is highly conserved among different ibv serotypes. the spike (s) glycoprotein, an integral membrane protein, is another major structural protein of the ibv; it is post translationally cleaved into the s (n terminal part) and s fragments. in their matured forms the s constitutes trimers of the globular head while the s forms the trimerized stalk domain of the peplomer spikes in the viral lipid envelop (cavanagh, ; belouzard et al., ) . the s protein carries the receptors binding site and thus plays an important role in tissue tropism and induction of protective immunity (belouzard et al., ; wickramasinghe et al., ) . along the s gene three hypervariable regions (hvrs) are distinguishable that are targets of neutralizing and serotype specific antibodies (moore et al., ; cavanagh et al., ) . variation in these epitopes has been infection, genetics and evolution ( ) - implicated in escape from vaccine-induced immunity (belouzard et al., ) . numerous distinct serotypes have been described which are based on the variability of the s protein sequence that differs by - %, and sometimes up to % between serotypes (adzhar et al., ) . consequently, cross protection between these different serotypes is limited (jackwood, ; wickramasinghe et al., ; cavanagh, ; kuo et al., ) . new s genotypes of ibv that are often also showing antigenic variation and, hence, define new serotypes, appear to emerge frequently in different parts of the world (jackwood, ) . 
frequent mutations have been made accountable for the emergence and evolution of multiple s variants (jackwood, ; cavanagh et al., ) including point mutations, insertions, deletions, and also recombination between different strains (adzhar et al., ; hewson et al., ) . recombination tends to be not a rare event during ibv replication and the emergence of chimeric viruses harbouring sequences from two or more viruses were reported previously (jia et al., ; abro et al., ) . the qx ibv genotype, recorded for the first time in china in , is associated with renal infections, proventriculitis, or impaired egg production (yu et al., b; liu & kong, ) . this genotype has spread from asia to europe (monne et al., ) , and recently was reported from the southern part of the african continent, namely zimbabwe and south africa (toffan et al., ; abolnik, ) . qx ibv has become the predominant field strain in many asian and european countries de sjaak et al., ) . in the middle east, qx-like strains have been detected in kurdistan-iraq in (amin et al., ) . qx ibv was shown to be antigenically different from both classical vaccine type and other variant strains. (ducatez et al., ). however, a previous study has reported that vaccination using the ma (mass type) and / ( b) can reduce the clinical impact of qx ib virus infections in spf layers and in commercial broiler chickens (terregino et al., ) . in sudan, ibv was recorded for the first time in (elamin et al., ) . serosurveillance studies reported widespread ibv infection in sudan associated with the / virus strain in - (ballal et al., a ballal et al., b) and again in . widespread occurrence in non-vaccinated poultry of different sectors is assumed (selma & ballal, ) . little is known to date on ibv from sudan; in particular, genetic and virological data are missing. in the current study, an ibv isolate from chickens in sudan is characterized as a qx-like ibv recombinant genotype. 
two ibv viruses (ck/sudan/ar - / and ck/sudan/ar - / ) were isolated from field samples obtained at the national laboratory for veterinary quality control on poultry production (nlqp, ministry of agriculture) from chickens which showed severe respiratory disease in a holding in sudan. viruses were submitted to the friedrich-loeffler-institut, germany, for further molecular and genetic characterization. virus passaging was performed in -day old spf chicken eggs (oie, ). viral rna was extracted using the qiaamp viral rna mini kit (qiagen, hilden, germany) according to the manufacturer's instructions. rna was eluted in μl nuclease-free water, and stored at − °c until use. presence of ibv rna was confirmed using rt-qpcr and conventional rt-pcrs specific for hvrs of the s gene of ibv and sanger sequencing of these regions was performed. primer sequences, amplification and sequencing conditions are available from the author upon request (callison et al., ) . spike gene-specific rt-pcr amplicons were size-separated by agarose gel electrophoresis, excised and purified from gels using the qiaquick gel extraction kit (qiagen). purified pcr products were used directly for cycle sequencing reactions (bigdye terminator v . cycle sequencing kit, applied biosystems, darmstadt, germany). the reaction products were purified using nucleoseq columns (macherey-nagel gmbh & co, düren, germany) and sequenced on an abi prism® genetic analyzer (life technologies, darmstadt, germany). thereafter, megablast (http://blast.ncbi.nlm.nih.gov/blast.cgi) analyses using the s hvrs sequences were carried out. the obtained hvrs sequences of the s gene were assembled and edited using the geneious software, version . . (kearse et al., (kearse et al., - . alignment and identity matrix analyses were performed using mafft (katoh & standley, ) and bioedit (hall, ) . sequences generated in this study were deposited in the genbank database, and assigned accession numbers are shown in table . 
sequences of other viruses required for further analyses were retrieved from public databases. for maximum likelihood analysis of phylogenetic relationships, a best fit model was chosen first on which further calculations and an ultrafast bootstrap equivalent analysis was based, using the models and algorithms implemented in the iq-tree software version . . (minh et al., ; nguyen et al., ) . trees were finally viewed and edited using figtree v . . software (http://tree.bio.ed.ac.uk/software/ figtree/). for full-genome sequencing of ck/sudan/ar - / , rna was extracted using trizol ls reagent (life technologies) and an rneasy minikit (qiagen) with on-column dnase digestion according to the manufacturer's instructions. rna conversion into double-stranded dna was done using a cdna synthesis system (roche, mannheim, germany) according to the genome sequencer rapid rna library preparation manual (roche, mannheim, germany). library preparation was done as previously described (juozapaitis et al., ) . sequencing was performed with an illumina miseq instrument using the miseq reagent kit version (illumina, san diego, ca, usa). the raw sequencing reads were assembled into a single contig representing the complete ibv genome using the genome sequencer software suite (v . , roche). the sequence was analyzed with blastn (blastn; http://blast.ncbi. nlm.nih.gov/blast.cgi) and orfs were detected and the genome annotation was carried out using geneious software, version . . . the s-orf encoding the spike protein was analyzed and compared using programs implemented in geneious to those of previously reported qx and qxlike strains from asia, europe and africa focusing in particular on the highly variable regions (hvrs). in addition, n-glycosylation sites in the spike protein were predicted by services available on http://www.cbs. dtu.dk/services/netnglyc. 
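As a rough cross-check on predicted n-glycosylation sites, candidate sites can be listed by scanning for the canonical sequon N-X-S/T, where X is any residue except proline. The sketch below is a plain motif scan and not the neural-network prediction performed by the netnglyc service, so it will generally report more candidates than that service does. The function names and toy sequence are hypothetical, and the comparison helper assumes both site lists use the same coordinate system (for example, positions taken from one alignment).

```python
import re
from typing import List, Tuple

# N, then any residue except P, then S or T. The lookahead makes the match
# zero-width, so overlapping sequons such as "NNST" are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return 1-based positions of the asparagine in every N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


def gained_and_lost(reference_sites: List[int],
                    query_sites: List[int]) -> Tuple[List[int], List[int]]:
    """Sites present only in the query (gained) and only in the reference (lost)."""
    ref, qry = set(reference_sites), set(query_sites)
    return sorted(qry - ref), sorted(ref - qry)


# Toy sequence (hypothetical): NGT and NNS/NST are sequons, NPS is not.
print(find_sequons("MNGTANPSNNST"))
print(gained_and_lost([2, 9, 10], [2, 10, 15]))
```

Comparing the site lists of two spike sequences in this way gives the kind of gained and lost positions reported for the isolate relative to reference strains.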
phylogenetic trees were also generated based on the obtained full genome sequence and the s gene sequence separately according to methods described in section . . programs embedded in the recombination detection program (rdp ) software suite (martin et al., ) were used to identify recombination events in the full-length ibv genome sequence of isolate ck/ sudan/ar - / through detection of breaking points using specific algorithms implemented in rdp : rdp, genecov, bootscan, maxchi, chimaera, siscan and seq with the highest acceptable p-value adjusted to . . for this purpose, an alignment was produced, as mentioned in the previous section, featuring other complete ibv genomes from north america (usa), asia (china, korea and taiwan), africa (nigeria) and europe (sweden, ukraine and italy) that are relevant as vaccine strains and/or representatives of major ibv lineages. the same alignment was also used to examine nucleotide and amino acid identity for each orf between the ck/sudan/ar - / and other ibv strains. two virus isolates generated from clinical samples collected from diseased chickens in a holding in sudan tested strongly positive in ibv specific rt-qpcrs targeting ′-utr. conventional rt-pcr specific for the hvrs of the s gene amplified fragments of approximately (hvr and ) and (hvr ) nucleotides, respectively. sanger sequencing of the amplicons confirmed the identity of these isolates as ibv (genbank accession number kx - ). nucleotide sequences of the s hvrs were used to conduct a blast search for further characterization. highest identities for the assembled sequences of hvr and − (amino acids to ) were found at the nucleotide level with the ibv isolates slo/ / (slovenia) and kr/ d / (korea) ( %). at the amino acid level, the highest similarities were detected with kr/ / (korea) and rf/ / (russia) ( %). 
for the hvr amplicon (amino acids to ), the highest identity (n %) was found with ita/ / (italy) and az- / (italy), both at the nucleotide and protein levels. these data revealed that the s gene of the two sudanese ibv isolates clusters with qx and qx-like ibv viruses. in order to confirm these findings, phylogenetic analysis was conducted using a maximum likelihood method (iqtree). the hvr - and hvr nucleotide sequences of the s gene of ibv ck/sudan/ ar - / and ck/sudan/ar - / clustered together with qx and qx-like viruses reported previously in asia, europe, middle east and west and south africa (fig. a, b) . since the hvr sequences of the two viruses were very similar to each other ( . % and % identity for hvr - and hvr , respectively), only one virus, ck/sudan/ar - / , was selected for further analysis by full genome sequencing. the complete genome of the ck/sudan/ ar - / strain as obtained by next generation sequencing was found to be , nucleotides (nt) in length, including both utr ′ and the poly (a) tail. the complete genome sequence of the ck/ sudan/ar - / isolate has been assigned into the genbank sequence database in the national center for biotechnology information (ncbi) accession number kx . the sequence obtained showed a classical ibv genome organization with open reading frames (orfs) in the order ′- a- b-s- a- b-e-m- b- c- a- b-n- b- ′ (table , fig. ). across the whole genome the highest identities were seen with the chinese ibv strain ck/ch/lhlj/ ( %; accession kp ), an italian ibv isolate (ita/ / ; %; accession fn ), and the / vaccine strain ( %; accession kf ). overall nucleotide identities of % and % across the whole genome were obtained compared to the most commonly used ibv vaccine strains, h and m , respectively. however, the s gene revealed identities of only . and . % to these latter strains. 
similarity searches using each orf of the ibv/ck/sudan/ar - / separately produced variable rankings with other ibv reference and vaccine strains: ibv/ck/sudan/ar - / revealed higher sequence identities of the b, s, a, b, e and m orfs with qx and qx-like viruses. in contrast, the a, c, a, b and n orfs showed higher sequence identities with the h , / and ibvukr - ( / like) strains (table ) . orf b revealed overall low identities with all genotypes as shown in table . orf b showed mixed identity values with either qx-like viruses or with / . these findings suggested that recombination between different ibv genotypes may have shaped the genome of the sudanese isolate. phylogenetic analysis based on the full genome sequence revealed a close relation with the qx-like viruses from italy, sweden and south africa (fig. s ) . phylogenetic analysis based on the whole s -gene according to valastro et al. ( ) indicated that ibv/ck/sudan/ar - / is closely related to ita/ / and the previously reported recombinant strains detected in south africa (ck/za/ / ) and sweden (ck/swe/ / ), which cluster together within the gi- lineage (fig. ) . the s-orf of the sudanese isolate has a length of nucleotides giving rise to amino acids. the precursor protein harbors an endoproteolytic cleavage site rrrr/s which divides the s-protein precursor into the amino-terminal s and carboxyl-terminal s fragments of and amino acids, respectively. a total of n-linked glycosylation sites was predicted in the s protein of the ck/sudan/ar /qx strain (s = and s = ), similar to the chinese qx strain, which possesses seven more glycosylation sites than the vaccine strain h (at positions , , , , , and ) and lacks two sites (at positions and ). compared to the chinese qx strain, the sudan isolate ar gained an additional glycosylation site at position , but lost a glycosylation site at position . 
the recombination detection methods embedded in rdp indicated that ck/sudan/ar - / has undergone genetic recombination. three long recombined sequence stretches were identified with high reliability by at least five programs embedded in rdp : the first and second recombination regions were observed at positions - and - in the orf a and orf b genes, and the third recombination region was located at position - involving the orf b and s genes. further recombination events at different positions were identified, as shown in fig. , albeit with lower reliability according to rdp . the results showed that the sudanese isolate was a recombinant virus which probably emerged from at least three different genotypes, including the / genotype as a major parent and the h vaccine strain as well as italy/ / -like viruses as minor parents (fig. ) . different ibv variants are in circulation around the world; some of them show a geographic restriction, others are globally distributed (de sjaak et al., ) . different genotypes of ibv are classified based on the genetic variation of the spike protein, in particular its s fragment (cavanagh, ; belouzard et al., ; valastro et al., ) . the hvr - and hvr regions of the s gene encode serotype specific determinants of ibv and harbor antigenic epitopes important for induction of protection (promkuntod et al., ) . little is known about the epidemiology of avian respiratory diseases in poultry in sudan although respiratory disease continues to threaten commercial poultry in the country. in this study, two ibvs were isolated from commercial broiler farms showing respiratory signs with increased mortality. flocks were vaccinated with either mass type alone (h or ma ) or with ma and / . partial s gene analysis of the three hvrs has demonstrated that ibv strains ck/sudan/ar - / and ck/sudan/ar - / are related to the qx-like serotype. qx-like ibv had not previously been reported from sudan. 
the qx variant of ibv apparently emerged in china in and was reported thereafter in europe as qx-like viruses (monne et al., ; wang et al., ). recombination in coronaviruses is likely to occur through their unique mechanism of rna synthesis, which involves polymerase jumping and discontinuous transcription (lai, ).

fig. . examination of putative recombination events in the genome of the ibv isolate sudan ar - / (query sequence) (a). the analysis was conducted using the recombination detection program v . maximum likelihood trees of the selected recombinant regions were estimated using algorithms embedded in rdp (b, c, d).

evidence of recombination between two or more strains, resulting in the emergence of new variants, has been reported among different ibv field strains (abro et al., ; abolnik, ; ammayappan et al., ). further, distinct ibv recombination has been experimentally illustrated in vitro, in ovo and in vivo (wang et al., ). here, the full-length genome of the ck/sudan/ar - / strain determined by next generation sequencing (ngs) revealed thirteen open reading frames (orfs) ( ′utr- a- b-s- a- b-e-m- b- c- a- b-n- b- ′utr), showing sites of variation and recombination with other genotypes located in multiple genes (orf a, orf b and s). the start of the orf a gene revealed a high frequency of recombination events with / . other genes exhibited recombination with the h and italy/ / type ibvs, indicating that recombination might involve more than two strains. taken together, phylogenetic and recombination analyses performed on the complete genome of the sudan/ar - / virus showed that this strain is a mosaic of different parental lineages that has not been described so far. in particular, the virus most likely resulted from a natural recombination event involving at least three distinct ibv variants, namely the qx-like, / and h strains. in conclusion, it is noteworthy that there are no records of the presence of a qx-like variant in sudan prior to this study.
whatever the way in which this virus has reached sudan, identification of this chimeric qx like virus highlights the need to improve monitoring programs for ibv in sudan and neighboring countries for a better understanding of its epidemiology. in addition, updating of vaccines may be required with further experimental studies to demonstrate the efficacy of the currently used vaccine. supplementary data to this article can be found online at http://dx. doi: . /j.meegid. . . . mahmoud m. naguib, timm harder, abdel-satar a. arafa, conceived the study. dirk höper conducted and interpreted the ngs sequencing. mahmoud m. naguib and timm harder produced, analyzed and interpreted genetic and phylogenetic data. ahmed setta provided and analyzed epidemiological data. timm harder and mahmoud m. naguib drafted the manuscript. all co-authors critically analyzed, revised and finally approved the manuscript. the authors declare no conflict of interest. genomic and single nucleotide polymorphism analysis of infectious bronchitis coronavirus characterization and analysis of the full-length genome of a strain of the european qx-like genotype of infectious bronchitis virus molecular analysis of the /b serotype of infectious bronchitis virus in great britain circulation of qx-like infectious bronchitis virus in the middle east. 
the veterinary record complete genomic sequence analysis of infectious bronchitis virus ark dpi strain and its evolution by recombination isolation and characterization of infectious bronchitis virus strain / from commercial layer chickens in the sudan serosurveillance study on avian infectious bronchitis virus in sudan mechanisms of coronavirus cell entry mediated by the viral spike protein development and evaluation of a real-time taqman rt-pcr assay for the detection of infectious bronchitis virus from infected chickens severe acute respiratory syndrome vaccine development: experiences of vaccination against avian infectious bronchitis coronavirus coronaviruses in poultry and other birds coronavirus avian infectious bronchitis virus infectious bronchitis virus: evidence for recombination within the massachusetts serotype amino acids within hypervariable region of avian coronavirus ibv (massachusetts serotype) spike glycoprotein are associated with neutralization epitopes characterization of a new genotype and serotype of infectious bronchitis virus in western africa isolation of infectious bronchitis virus from a disease outbreak in chickens in eastern sudan bioedit: a user-friendly biological sequence alignment editor and analysis program for windows / /nt evaluation of a novel strain of infectious bronchitis virus emerged as a result of spike gene recombination between two highly diverged parent strains. 
avian pathology: journal of the w avian infectious bronchitis virus review of infectious bronchitis virus around the world a novel variant of avian infectious bronchitis virus resulting from recombination among three different strains an infectious bat-derived chimeric influenza virus harbouring the entry machinery of an influenza a virus mafft multiple sequence alignment software version : improvements in performance and usability geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data evolution of infectious bronchitis virus in taiwan: characterisation of rna recombination in the nucleocapsid gene recombination in large rna viruses a new genotype of nephropathogenic infectious bronchitis virus circulating in vaccinated and non-vaccinated flocks in china. avian pathology: journal of the w rdp : detection and analysis of recombination patterns in virus genomes ultrafast approximation for phylogenetic bootstrap qx genotypes of infectious bronchitis virus circulating in europe identification of amino acids involved in a serotype and neutralization specific epitope within the s subunit of avian infectious bronchitis virus iq-tree: a fast and effective stochastic algorithm for estimating maximum likelihood phylogenies avian infectious bronchitis virus mapping of the receptor-binding domain and amino acids critical for attachment in the spike protein of avian coronavirus infectious bronchitis virus an apparently new respiratory disease of baby chicks seroprevalence of selected avian pathogens of backyard poultry in sinar sudan infectious bronchitis virus variants: a review of the history, current situation and control measures coronaviruses: structure and genome expression pathogenicity of a qx strain of infectious bronchitis virus in specific pathogen free and commercial broiler chickens, and evaluation of protection induced by a vaccination programme based on the ma and / serotypes qx-like infectious 
bronchitis virus in africa. the veterinary record s gene-based phylogeny of infectious bronchitis virus: an attempt to harmonize virus classification experimental confirmation of recombination upstream of the s hypervariable region of infectious bronchitis virus isolation and identification of glandular stomach type ibv (qx ibv) in chickens binding of avian coronavirus spike proteins to host factors reflects virus tropism and pathogenicity the avian coronavirus spike protein a reverse transcriptase-polymerase chain reaction survey of infectious bronchitis virus genotypes in western europe from characterization of three infectious bronchitis virus isolates from china associated with proventriculus in vaccinated chickens molecular epidemiology of infectious bronchitis virus isolates from china and southeast asia the authors thank diana wessler and aline maksimov, fli, germany, for excellent technical support. we are grateful to colleagues and coworkers at nlqp, cairo, egypt, contributing to sample preparation. m. naguib is recipient of a doctoral scholarship from the german academic exchange service (daad grant no. a/ / ). key: cord- - jsj sb authors: bodnar, livia; lorusso, eleonora; di martino, barbara; catella, cristiana; lanave, gianvito; elia, gabriella; bányai, krisztián; buonavoglia, canio; martella, vito title: identification of a novel canine norovirus date: - - journal: infect genet evol doi: . /j.meegid. . . sha: doc_id: cord_uid: jsj sb by screening a collection of fecal samples from young dogs from different european countries, noroviruses (novs) were found in / ( . %) animals with signs of enteritis whilst they were not detected in healthy dogs ( / ). an informative portion of the genome ( . kb at the ′ end) was generated for four nov strains. in the capsid protein vp region, strains . / /ita and fd / /ita were genetically related to the canine gvi. strain c /viseu/ /prt ( . – . % nt and . – . % aa). 
strain fd / /ita displayed the highest identity to the gvi. canine strain bari/ / /ita ( . % nt and . % aa). strain / /ita displayed only . – . % nt and . – . % aa identities to the gvi. canine strains fd / /ita and bari/ / /ita and the gvi feline strain m - / /jpn. identity to the other canine/feline novs strains in the vp was lower than . % nt and . % aa. based on the full-length vp amino acid sequence and the criteria proposed for distinction of nov genotypes, the canine nov / /ita could represent the prototype of a third gvi genotype, thus providing further evidence for the genetic heterogeneity of novs in carnivores. noroviruses (novs) are members of the caliciviridae family, and are recognized as one of the leading causes of acute gastroenteritis in humans (atmar and estes, ; glass et al., ) . epidemics associated with novs occur frequently in hospitals, cruise ships and childcare centers (glass et al., ; hall et al., ) and nov infection has become a major public health concern (siebenga et al., ; vega et al., ) . novs have also been identified in several mammalian species, including cows, pigs, lion, cats, dogs, bats, sea lions and harbor porpoises (degraaf et al., ; li et al., ; martella et al., martella et al., , pinto et al., ; scipioni et al., ; sugieda et al., ; wu et al., ) . novs are small, non-enveloped round viruses of approximately to nm in diameter with a positive-sense rna genome. nov genome is organized into three open reading frames (orfs) (green, ) . orf encodes a polyprotein that is cleaved by the virus-encoded protease to produce several nonstructural proteins, including the rna dependent rna polymerase (rdrp). orf encodes a major capsid protein (vp ) and orf encodes a small basic protein (vp ) that has been associated with capsid stability (bertolotti-ciarlet et al., ) . 
based on the full-length vp amino acid sequence, novs can be classified into at least seven genogroups (gi to gvii) and > genotypes (green, ; kroneman et al., ; vinjé, ; zheng et al., ). only gi, gii, and giv novs infect humans, with gii strains being the most prevalent worldwide (green, ). novs genetically similar to human novs have recently been found in carnivores (martella et al., ; pinto et al., ; soma et al., ; summa et al., ), raising public health concerns of potential cross-species transmission due to the close social interactions between humans and pets. interestingly, significant sequence variation has been found across the different canine/feline nov strains identified to date, allowing the classification of at least genotypes from three distinct genogroups, i.e. giv. , gvi. , gvi. and gvii (table ). the first canine nov was reported in a month-old dog with a four-day history of gastroenteritis and in co-infection with canine parvovirus type- (cpv- ) in in italy (martella et al., ). since then, canine novs have been reported in several countries in europe, the americas and asia (azevedo et al., ; mesquita et al., ; nascimento, a, b; ntafis et al., ; soma et al., ; tse et al., ), but the pathogenic role of novs in dogs has not been firmly demonstrated. the prevalence of canine nov in dogs with clinical signs of gastroenteritis has been estimated to vary from . % (martella et al., ) to % (mesquita et al., ). a study in the us identified canine nov at a prevalence of % in canine diarrhea samples (azevedo et al., ). canine nov has also been detected in healthy dogs (mesquita et al., ), a pattern that is commonly observed in human nov infections (ozawa et al., ). serological investigations have confirmed the presence of nov-specific antibodies in dogs, with various prevalence rates. in a small serological survey in italy, antibodies to giv. nov were identified in % of the dogs (di martino et al., ).
in a study in the uk, antibodies specific for different nov genotypes were identified in . % of samples collected between and and . % of samples collected in - (caddy et al., ). a european-wide serological investigation revealed antibodies to gvi. nov in % of the analysed dogs (mesquita et al., ). in order to draw a more complete picture of nov molecular epidemiology in dogs, in this study a collection of fecal specimens from diarrheic and healthy animals obtained from different european countries was screened using both broadly reactive primers for caliciviruses and primers specific for novs. the sequence of a large portion of the genome at the ′ end of two canine nov strains was determined. the sequence of an additional two canine nov strains identified in in italy was determined and analysed in detail. during the years , , and , a total of dog fecal samples (n = from dogs with mild to severe gastroenteritis and n = from asymptomatic animals) were collected from domestic animals aged - months from four european countries: germany (n = ), france (n = ), spain (n = ) and italy (n = ). the samples were submitted to the laboratory of animal infectious diseases, department of veterinary medicine, university of bari, italy, for diagnostic examinations. all the samples were stored at − °c until use. in addition, two stool samples positive for nov and collected in in italy were included in the analysis. fecal specimens were diluted % w/v in phosphate-buffered saline, ph . , and the debris was removed by centrifugation at ×g for min. viral nucleic acid was extracted from μl of each clarified stool suspension with the qiaamp cador pathogen mini kit (qiagen s.p.a., milan, italy), following the manufacturer's protocol, and the nucleic acid templates were stored at − °c until use. to assess the presence of caliciviruses (cvs), the samples were screened using a broadly reactive primer pair, p -p , amplifying a band of bp for nov and a band of bp for vesivirus and sapovirus.
the primers target the highly conserved motifs dyskwdst and ygdd of the rna-dependent rna polymerase (rdrp) region of the polymerase complex (jiang et al., ). in the samples yielding amplicons of the expected sizes, the presence of nov was confirmed using the nov-specific primer pair jv y-jv i (vennema et al., ). all the fecal samples collected from dogs with signs of enteritis were also tested for the presence of common canine viral pathogens such as cpv- and canine coronavirus (ccov), either by gel-based pcr or by quantitative pcr and rt-pcr (buonavoglia et al., ; decaro et al., ; pratelli et al., ; vennema et al., ). the amplicons were excised from the gel and purified using the qiaquick gel extraction kit (qiagen gmbh, hilden, germany). the amplicons were subjected to direct sequencing using bigdye terminator cycle chemistry and a dna analyzer (applied biosystems, foster city, ca). basic local alignment search tool (blast; http://www.ncbi.nlm.nih.gov) and fasta (http://www.ebi.ac.uk/fasta ) with default values were used to find homologous hits. the sequence of~ . -kb fragment of the nov genome (the ′ end of orf , the full-length orf , orf , and the non-coding region through the poly-a tail of canine (zheng et al., ) . , ) . cdna was synthesized with the superscript iii first-strand cdna synthesis kit (invitrogen ltd., milan, italy) with primer qt. pcr was then performed with takara la taq polymerase (takara bio europe, saint-germain-en-laye, france) with forward primer jv y and reverse primer jv i. the cdna was purified and cloned using the topo® xl cloning kit (invitrogen ltd., milan, italy). additional primers were designed to determine the complete . -kb sequences by a primer walking strategy (table ). sequence editing and multiple alignments were carried out with the geneious software package version . . (biomatters ltd., new zealand). genome sequences of nov strains were retrieved from genbank and aligned using clustal w (larkin et al., ).
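the nucleotide and amino acid identity values reported in this study are derived from alignments of this kind. the sketch below shows one common way to compute pairwise percent identity from sequences that are already aligned; it is an illustration only and not the procedure implemented in geneious or clustal w. columns in which either sequence has a gap are skipped, which is an assumption, since other tools count gaps differently and therefore give slightly different values. the function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where either sequence carries a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise percent identity for every pair of named aligned sequences."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }
```

a genotype or genogroup assignment would then compare each value against the published identity cut-off, passed in as a parameter rather than hard-coded.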
phylogenetic trees were generated using bayesian analysis with mrbayes (huelsenbeck and ronquist, ; ronquist and huelsenbeck, ). using the consensus primer pair p -p , cv rna was found in of samples ( . %). a total of samples generated amplicons of the expected size for nov ( bp) and were subsequently confirmed to contain nov rna using nov-specific primers, either alone ( . %, / ) or in mixed infections with cpv- ( . %, / ), ccov ( . %, / ), canine sapovirus ( . %, / ) and vesivirus ( . %, / ). seventeen samples ( . %, / ) were found to contain canine sapovirus alone ( . %, / ) or in mixed infection with cpv- ( . %, / ), ccov ( . %, / ), cpv- /ccov ( . %; / ) and nov ( . %, / ). all the nov- and sapovirus-positive samples were identified from diarrheic dogs, with a prevalence rate of . % ( / ) for novs and . % ( / ) for sapovirus, whilst they were not detected in asymptomatic animals ( / ). eleven nov-positive samples were identified in the italian collection of samples ( / , . %), whilst two nov-positive samples were identified in the spanish collection ( / , . %). nov rna was not detected in the german ( / ) and french ( / ) samples. the sequence of a ~ . -kb fragment at the ′ end of the genome, including the partial rdrp and the complete orf and orf of the strains . / /ita, / /ita, fd / /ita and fd / /ita, was determined and then submitted to genbank under accession no. jf , jf , ky and ky . the sequences were analysed with cognate sequences of animal and human nov reference strains available in the genbank database. as shown in displayed between the canine strains from this study and other animal and human giv novs was < . % aa and it was < . % aa to the chinese gvii canine nov strains. screening by rt-pcr of a collection of fecal samples obtained from young dogs (aged - months) with gastroenteritis identified canine novs in .
% of the symptomatic dogs, either alone or in mixed infections with cpv- , ccov, canine sapovirus and vesivirus, but not in fecal samples from clinically healthy dogs. in previous molecular investigations, canine novs have been detected in diarrheic stool samples with a prevalence rate ranging from . % (martella et al., ) to . % (azevedo et al., ) and to % (mesquita et al., ). canine novs have also been detected in healthy dogs with a prevalence of . % (mesquita et al., ). accordingly, the pathogenic role of these viruses in dog enteritis is not yet firmly established. the prevalence rate of . % observed in this study was lower than those reported in the us and portugal (azevedo et al., ; mesquita et al., ), although different primer sets were used in the various studies. in our investigation, we preferred using broadly reactive primers designed to amplify all the cvs, p -p (jiang et al., ), and subsequently a set of broadly reactive primers able to amplify the majority of nov strains, jv y-jv i (vennema et al., ). this algorithm, while allowing us to identify genetically diverse nov strains, was likely less sensitive than other rt-pcr assays based on specific primers. the majority of the nov-positive samples were identified in the italian collection of samples ( / , . %), whilst two nov-positive samples were identified in the spanish collection ( / , . %). mixed infections were identified, which was not unexpected, as they are not infrequent. a limitation of our study was the non-homogeneous size of the sample collections, chiefly the small number of samples analysed for the non-italian collections. nevertheless, to our knowledge, this is the first report on the presence of nov in dogs in spain. a fragment of about . -kb at the ′ end of the genome of four samples was used for analysis and comparison with a large selection of nov strains belonging to all genogroups available in the databases.
in the full-length vp capsid gene (orf ), two strains (fd / /ita and . / /ita) were characterized as gvi. , whilst strain fd / /ita was characterized as gvi. . a fourth strain, / /ita, although segregating within genogroup gvi, was more diverse genetically in the vp and could not be classified into any established gvi genotype. according to zheng's classification (zheng et al., ), the canine nov strains . / /ita and fd / /ita were classified together with the canine strain c /viseu/ /prt and the feline strain te/ - /ita as a genotype (> % pairwise aa identity inter-genotype) within the genogroup gvi (> % pairwise aa identity inter-genogroup). the canine strain / /ita was also classified into genogroup gvi, but the aa identity to its closest relatives (the canine gvi. strains bari/ / /ita and fd / /ita, and the gvi feline strain m - / /jpn) was lower ( . - . % aa) than the % pairwise aa identity cut-off proposed to discriminate between different nov genotypes. therefore, strain / /ita was tentatively classified as a third genotype (gvi. ) within the genogroup vi. interestingly, in our analysis the canine gvi. strains fd / /ita, bari/ / /ita and / /ita appeared equally distant from the feline nov strain m - / /jpn ( . - . % aa). this feline strain (genbank accession number lc ) is registered in the ncbi sequence database as a gvi. strain, although it does not fit the criteria for being assigned to this genotype, and it was tentatively proposed as a fourth gvi genotype, gvi. (tables and ; fig. ). genetically related novs have been detected in different host species. gii novs have been found in humans and pigs, although porcine novs belong to different genotypes (gii. , gii. and gii. ) (wang et al., ). giii novs infect ruminants, with genotypes giii. and giii. (liu et al., ; oliver et al., ) being identified in cattle and genotype giii. in sheep (wolf et al., ).
however, circulation of nov strains belonging to the same genotype in different host species seems rather uncommon. conversely, it seems clear that carnivores (canids and felids) may be infected by nov strains with the same capsid or polymerase (pol) types (table ). binding of human novs to human tissues seems to be mediated by antigens of the histo-blood group antigen (hbga) family and therefore to be genetically determined (marionneau et al., ). likewise, binding of giv. and gvi novs in dog tissues seems to be mediated by antigens of the hbga family (caddy et al., ). interestingly, virus-like particles (vlps) of seven different human nov genotypes (gi. , gi. , gi. , gii. , gii. , gii. , and gii. ) were also found to bind to canine gastrointestinal tissues (caddy et al., ), and human nov strains of genotypes gii. and gii. were detected in infected dogs (summa et al., ). it could be hypothesized that dogs and cats share similar hbgas that can be used as attachment factors by novs. alternatively, carnivore hbgas could be more "permissive", i.e. they could allow binding of novs of different genotypes. regardless, this intriguing phenomenon could be a key factor explaining the apparent genetic diversity of carnivore novs. based on sequence and phylogenetic analysis of the ′ partial sequence of orf spanning -nt at the cooh terminus of the rdrp, different pol genetic lineages could be distinguished among canine/feline giv and gvi novs, which were indicated, for the limited purpose of this study, with the letters "a" to "d" (table ). a unique lineage (pol a) was formed by feline novs of genogroups giv and gvi and by the lion nov strain giv. /pistoia/ /ita. interestingly, inconsistencies were observed between the pol- and capsid-based phylogenies, which are suggestive of potential recombination events (table ). this phenomenon has been reported repeatedly when studying novs from carnivores (di martino et al., ; martella et al., ; takano et al., ).
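the inconsistency between pol- and capsid-based typing described above can be expressed as a simple discordance check. the sketch below is a simplified illustration and not the analysis performed in this study, which relied on phylogenetic trees: it assumes that each strain has already been assigned a genogroup label for its pol region and for its capsid region, and it flags the strains for which the two labels differ. the class `Typing`, the function `putative_recombinants` and the strain names in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Typing:
    """Genogroup assignments for one strain (hypothetical record layout)."""
    strain: str
    pol_genogroup: str      # genogroup suggested by the RdRp (pol) region
    capsid_genogroup: str   # genogroup suggested by the capsid region


def putative_recombinants(typings: list[Typing]) -> list[str]:
    """Names of strains whose pol and capsid regions point to different genogroups."""
    return [t.strain for t in typings if t.pol_genogroup != t.capsid_genogroup]


# Usage with made-up strain names: only the discordant strain is flagged.
example = [
    Typing("strain_1", pol_genogroup="GIV", capsid_genogroup="GVI"),
    Typing("strain_2", pol_genogroup="GVI", capsid_genogroup="GVI"),
]
flagged = putative_recombinants(example)
```

a discordant label is only a pointer for follow-up; confirming recombination still requires breakpoint analysis on the sequences themselves.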
the canine nov strain bari/ / /ita resembles giv. novs in its pol gene (pol b), whilst it possesses a gvi. capsid (martella et al., ). the feline nov strain m - / /jpn possesses a giv. pol (pol a) and a gvi capsid (takano et al., ), whilst the feline strain te/ - / /ita possesses a giv. pol (pol a) and a gvi. capsid (di martino et al., ). in all cases, the recombination site was located within the orf /orf overlap, a region highly conserved across all nov sequences (bull et al., ). rna recombination is a major force driving viral evolution (worobey and holmes, ; lai, ). recombination in viruses may substantially affect phylogenetic groupings, confusing molecular epidemiologic studies, and can also have major implications for vaccine design (bull et al., ). a recombinant nov can be defined as a strain that clusters with distinct groups of nov strains when different regions (normally the capsid and polymerase) of the genome are subjected to phylogenetic analysis (bull et al., ). accordingly, analysis of the diagnostic regions located on the rdrp cannot be used to characterize these animal novs unequivocally, and a definitive characterization should rely on the orf , as proposed for human nov strains (kroneman et al., ). epidemiological studies are necessary to understand the relevance of newly discovered pathogens. these studies are important as they allow us to obtain information on the genetic diversity of viruses, a step that is propaedeutic to the development of accurate diagnostic tools and necessary for the understanding of viral evolution. analysis of novs identified in carnivores thus far is unveiling a marked genetic diversity and suggests that the evolution of canine and feline nov is tightly intermingled.
this could have implications when enacting prophylaxis measures in animal populations (such as humane shelters, kennels or breeding facilities), as cats and dogs may be infected by a number of common pathogens (decaro et al., ; di martino et al., martella et al., ; pratelli et al., ) . finally, as some animal viruses may be transmitted to humans, generating new threats for human health, defining a baseline of the genetic diversity on animal viruses is important, to understand better and readily identify the origin of novel human viruses. the epidemiologic and clinical importance of norovirus infection detection of norovirus in dogs in arkansas the ′ end of norwalk virus mrna contains determinants that regulate the expression and stability of the viral capsid protein vp : a novel function for the vp protein norovirus recombination in orf /orf overlap evidence for evolution of canine parvovirustype in italy serological evidence for multiple strains of canine norovirus in the uk dog population genogroup iv and vi canine noroviruses interact with histo-blood group antigens evidence for human norovirus infection of dogs in the united kingdom new approaches for the molecular characterization of canine parvovirus type strains characterisation of canine parvovirus strains isolated from cats with feline panleukopenia norovirus infection in harbor porpoises characterization of a strain of feline calicivirus isolated from a dog faecal sample detection of antibodies against norovirus genogroup giv in carnivores a novel feline norovirus in diarrheic cats norovirus gastroenteritis caliciviridae: the noroviruses: specific virus families caliciviridae: the noroviruses acute gastroenteritis surveillance through the national outbreak reporting system, united states mrbayes: bayesian inference of phylogenetic trees design and evaluation of a primer pair that detects both norwalk-and sapporo-like caliciviruses by rt-pcr proposal for a unified norovirus nomenclature and genotyping rna 
recombination in animal and plant viruses clustal w and clustal x version . the fecal viral flora of california sea lions molecular characterization of a bovine enteric calicivirus: relationship to the norwalk-like viruses norwalk virus binds to histoblood group antigens present on gastroduodenal epithelial cells of secretor individuals analysis of the capsid protein gene of a feline-like calicivirus isolated from a dog norovirus in captive lioncub detection and molecular characterization of a canine norovirus genetic heterogeneity and recombination in canine noroviruses molecular epidemiology of canine norovirus in dogs from portugal gastroenteritis outbreak associated with faecal shedding of canine norovirus in a portuguese kennel following introduction of imported dogs from russia novel norovirus in dogs with diarrhea seroprevalence of canine norovirus in european countries an outbreak of canine norovirus infection in young dogs complete genomic characterization and antigenic relatedness of genogroup iii, genotype bovine noroviruses norovirus infections in symptomatic and asymptomatic food handlers in japan discovery and genomic characterization of noroviruses from a gastroenteritis outbreak in domestic cats in the us fatal coronavirus infection in puppies following canine parvovirus b infection genetic diversity of a canine coronavirus detected in pups with diarrhoea in italy mrbayes : bayesian phylogenetic inference under mixed models animal noroviruses ′ end cdna amplification using classic race norovirus illness is a global problem: emergence and spread of norovirus gii. variants detection of norovirus and sapovirus from diarrheic dogs and cats in japan detection of norwalk-like virus genes in the caecum contents of pigs pet dogs -a transmission route for human noroviruses? 
molecular characterization and pathogenicity of a genogroup gvi feline norovirus complete genome sequences of novel canine noroviruses in hong kong genotypic and epidemiologic trends of norovirus outbreaks in the united states rational optimization of generic primers used for norwalk-like virus detection by reverse transcriptase polymerase chain reaction advances in laboratory methods for detection and typing of norovirus porcine noroviruses related to human noroviruses molecular detection of norovirus in sheep and pigs in new zealand farms evolutionary aspects of recombination in rna viruses deciphering the bat virome catalog to better understand the ecological diversity of bat viruses and the bat origin of emerging infectious diseases norovirus classification and proposed strain nomenclature this study was supported by grants from the institution capes, brazil, from the momentum program, hungary, and from the action "future in research" of apulian region, italy.