key: cord-0815550-dtvlo0kz
authors: Satyam, Rohit; Jha, Niraj Kumar; Kar, Rohan; Jha, Saurabh Kumar; Sharma, Ankur; Kumar, Dhruv; Nand, Parma; Ruokolainen, Janne; Kesari, Kavindra Kumar; Kamal, Mohammad Amjad
title: Deciphering the SSR incidences across viral members of Coronaviridae family
date: 2020-09-21
journal: Chem Biol Interact
DOI: 10.1016/j.cbi.2020.109226
sha: ce23cb61267dddb318cc323901842403cb1a6c7a
doc_id: 815550
cord_uid: dtvlo0kz

The presence of Simple Sequence Repeats (SSRs), in both genic and intergenic regions, has been widely studied in eukaryotes, prokaryotes, and viruses. In the current study, we undertook a survey to analyze the frequency and distribution of microsatellites, or SSRs, in multiple genomes of Coronaviridae members. We identified 919 SSRs with length ≥12 bp across 55 reference genomes, the majority of which (838 SSRs) were found in genic regions. The in-silico analysis further identified a preferential abundance of hexameric SSRs over any other size-based motif class. Our analysis shows that the genome size and GC content of the genome had a weak influence on SSR frequency and density. However, we find a positive correlation of SSR GC content with genomic GC content. We also report relatively low abundances of all 501 theoretically possible repeat motif classes in all the genomes of Coronaviridae. The majority of SSRs were AT-rich. Overall, we see an underrepresentation of SSRs across the genomes of Coronaviridae. Besides, our integrative study highlights the presence of SSRs in the ORF1ab (nsp3, nsp4, nsp5A_3CLpro and nsp5B_3CLpro, nsp6, nsp10, nsp12, nsp13, and nsp15 domains), S, ORF3a, ORF7a, N, and 3′ UTR regions of SARS-CoV-2, which harbour multiple mutations (with the 3′ UTR and ORF1ab SSRs serving as major mutational hotspots). This indicates that the genic SSRs are under selection pressure against mutations that might alter the reading frame and are, at the same time, responsible for rapid protein evolution.
Our preliminary results indicate the significance of the limited repertoire of SSRs in the genomes of Coronaviridae.

Coronaviruses are known to cause mild to severe respiratory, gastrointestinal, and central nervous system infections in both humans and other vertebrates (Weiss & Navas-Martin, 2005). The viruses were not considered highly pathogenic until the 2003 outbreak of SARS (Severe Acute Respiratory Syndrome) (Xing et al., 2010), which was followed by the emergence of MERS (Middle East Respiratory Syndrome) in Middle Eastern countries (De Wit et al., 2016) and the SARS-CoV-2 outbreak of 2019 (Guo et al., 2020). The viruses are members of the Coronaviridae family (order Nidovirales) and have host preferences; for instance, Alpha- and Betacoronaviruses predominantly infect mammals, whereas Gamma- and Deltacoronaviruses mainly infect birds (and sporadically mammals). Bats are believed to be a larger reservoir of diverse coronaviruses than other animal species, with domestic and poultry animals acting as intermediate hosts that mediate zoonotic transmission of the virus to humans (Vijaykrishna et al., 2007). The ecological distribution, evolution, and spillover events of various coronaviruses have been extensively reviewed in recent reports (Cui et al., 2019; Drexler et al., 2014; Shereen et al., 2020; Tang et al., 2020).

Simple sequence repeats (SSRs) are tandem repetitions of mono-, di-, tri-, tetra-, penta- and hexanucleotide sequence units of a genome and are widely reported to be the most variable type of short motif within the viral genome. They are ubiquitously present in a variety of genomic regions, including the 3′ and 5′ UTRs (untranslated regions), genic (coding) regions, and intergenic (non-coding) regions, and thereby play diverse roles across viral species (Zhao et al., 2012). SSRs have been widely exploited as neutral markers in a multitude of studies in fields such as ecology and evolutionary genetics, genome mapping, etc.
irrespective of their hypermutability (Tsykun et al., 2017; Vieira et al., 2016). SSRs are characterized by their inherent ability to cause frameshift mutations in genomic regions encoding phenotypic changes and can therefore confer an adaptive advantage in the course of viral mutations (Atia et al., 2016; Y. C. Li et al., 2002; Lin & Kussell, 2012). Their highly polymorphic nature results in the gain or loss of repeat motifs, which makes them important for studying genome evolution.

Despite the deluge of viral genomes in the public databases, SSR incidences and abundances, and their relevance in viral genomes, have received little attention, for coronaviruses in particular. Elucidating the SSR landscape in viral members of Coronaviridae, and its prospective relevance in evolution and pathogenesis, has therefore become crucial in the current scenario of the COVID-19 outbreak (Kiselev et al., 2020). Thus, the aims of the current study were 1) to analyze various facets of the distribution and dynamics of SSRs in the genomes of Coronaviridae members, 2) to identify patterns of SSR incidence across genomes, if any, i.e., the underrepresentation or overrepresentation of specific repeat motif classes, 3) to determine the preferential genomic localization of SSRs, and 4) to investigate whether SSRs serve as mutation hotspots in SARS-CoV-2, the novel SARS-related strain causing the COVID-19 outbreak.

The outcome of our study suggests that SSRs are generally underrepresented in Coronaviridae members and are characterized by low GC content. Additionally, the attributes of SSRs across the genomes under study were quite similar in terms of length (preferentially 12-13 nucleotides long, with polyA repeats of varying lengths), GC composition, abundance (the frequency of an individual repeat class generally did not exceed 2, irrespective of genome size), and localization.
The trends highlighted in the current study are repercussions of the differences in Coronaviridae genome organization and could serve as starting points for understanding the mutation rates in SSRs and how these mutations propagate among the coding and non-coding compartments. Besides, the study attempts to lay the groundwork for the much-needed scientific discussion on SSR incidences in Coronaviridae genomes and endeavours to test their biological significance in pathogenesis, evolution, and immune evasion.

The 55 complete genomes of the Coronaviridae family were retrieved on 23rd March 2020 (see Supplementary Material 1, Sheet 2 for more details; we used only RefSeq nucleotide records with complete annotations) and were scanned for SSRs using a Python package, PERF (Avvaru et al., 2018). The minimum SSR length was set to 12 nt (Mashhood Alam et al., 2015; Srivastava et al., 2019), which represents at least two complete repeating units of a 6-mer motif (hexamer). We used all 501 theoretically possible unique classes of SSRs, as described previously (Srivastava et al., 2019; Subramanian et al., 2003), to identify their presence or absence in Coronaviridae genomes, using the following command: "PERF -i sequence.fasta -a -o sequence_perf_default.tsv". The interactive .html pages were used to manually visualize and analyze the SSR prediction data and understand their attributes. The BED files (e.g., sequence_perf_default.tsv) produced by PERF comprise the SSR genomic coordinates (columns 1-3) followed by repeat class, repeat length, repeat strand, motif number, and actual repeat (more details: https://github.com/RKMlab/perf) and were used for the downstream analysis.

For each genome, we computed a few attributes to measure the prevalence of SSRs in the viral genomes of the Coronaviridae family. These included SSR frequency (or abundance), SSR density, and SSR GC%.
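As an illustration of the column layout described above, the following is a minimal Python sketch of how a PERF-style tab-separated file could be read into records. It is not the authors' script: the `SSR` record type, the `read_perf_tsv` helper, and the accession `NC_000000` in the example rows are hypothetical names introduced here, and the eight-column order is assumed from the description in the text.

```python
import csv
import io
from typing import List, NamedTuple


class SSR(NamedTuple):
    """One repeat record; field order assumed from the column description above."""
    chrom: str
    start: int
    stop: int
    repeat_class: str
    length: int
    strand: str
    motif_number: int
    actual_repeat: str


def read_perf_tsv(handle) -> List[SSR]:
    """Parse tab-separated rows, skipping blank, short, or comment lines."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 8 or row[0].startswith("#"):
            continue
        records.append(
            SSR(row[0], int(row[1]), int(row[2]), row[3],
                int(row[4]), row[5], int(row[6]), row[7])
        )
    return records


# Hypothetical two-row example (made-up accession and coordinates).
example = (
    "NC_000000\t100\t112\tAACCTG\t12\t+\t2\tAACCTGAACCTG\n"
    "NC_000000\t500\t513\tA\t13\t+\t13\tAAAAAAAAAAAAA\n"
)
ssrs = read_perf_tsv(io.StringIO(example))
print(len(ssrs), [s.repeat_class for s in ssrs])
```

In practice the same function could be given an open file handle for each genome's .tsv output instead of the in-memory example.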
The SSR frequency was defined as the total number of SSRs found in each genome. The SSR density was computed as per the formula

$$\mathrm{SSR}_d = \frac{\sum \mathrm{SSR}_L}{G_L} \times 1000$$

where SSR_L is the length of an SSR (in bp), summed over all SSRs in the genome, G_L is the genomic length (in bp), and SSR_d is the SSR density in bp per kb. This normalization was applied to account for the biases that could arise from variable G_L. We use SSR density as the measure of comparison throughout the study unless otherwise mentioned. The SSR GC% for a genome was defined as the GC% of the concatenated strings of SSRs retrieved using the coordinates from the BED file. Briefly, we used samtools (H. Li et al., 2009), bedtools (Quinlan & Hall, 2010), and seqkit (Shen et al., 2016) in combination to compute GC content.

To identify class-specific trends of SSRs, we computed the class-specific SSR frequency, SSR base coverage, and SSR density for each of the 501 repeat classes using in-house scripts. The list of 501 SSR classes was obtained from the additional file of Subramanian et al. (2003). The class-specific SSR frequency was computed and collated in the form of a matrix in which each row represented a repeat class and each column represented the frequency of that class in one genome. The matrix was visualized and analyzed using Morpheus (https://software.broadinstitute.org/morpheus/) from the Broad Institute. The repeat classes were subjected to hierarchical clustering using Euclidean distance. A heatmap of the repeat classes present in at least 10 members was constructed. The color scale on the heatmap ranged from 0 to 3, 3 being the highest SSR frequency observed. We also checked for variation of repeat class abundance with respect to repeat class length using a Python script from Srivastava et al. (2019).

The 501 repeat classes were divided into 5 GC cluster groups based on the GC content of the repeat motif. The 60 bp strings formed by repeating the base motif in tandem were constructed and their GC content was computed using the 'seqkit fx2tab -g' command.
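The attributes defined above can be sketched in a few lines of Python. This is only an illustrative stand-in for the samtools/bedtools/seqkit pipeline and in-house scripts mentioned in the text: the function names are hypothetical, the density is taken as total SSR bases per genome length scaled to bp per kb (consistent with the definition of SSR density as bases covered by SSRs per kb), and the toy sequences and genome length are made up.

```python
def gc_percent(seq: str) -> float:
    """GC% of a nucleotide string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def ssr_frequency(ssr_seqs) -> int:
    """SSR frequency: number of SSRs found in one genome."""
    return len(ssr_seqs)


def ssr_density(ssr_seqs, genome_length: int) -> float:
    """SSR density: bases covered by SSRs per kb of genome."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return sum(len(s) for s in ssr_seqs) / genome_length * 1000


def ssr_gc_percent(ssr_seqs) -> float:
    """SSR GC%: GC% of all SSR sequences concatenated into one string."""
    return gc_percent("".join(ssr_seqs))


def motif_gc_percent(motif: str, total_len: int = 60) -> float:
    """GC% of a string built by repeating the motif in tandem up to total_len bp."""
    tandem = (motif * (total_len // len(motif) + 1))[:total_len]
    return gc_percent(tandem)


# Toy genome (hypothetical): two SSRs in a 25,000 bp genome.
toy_ssrs = ["AACCTGAACCTG", "AAAAAAAAAAAAA"]
print(ssr_frequency(toy_ssrs))
print(round(ssr_density(toy_ssrs, 25000), 3))
print(round(ssr_gc_percent(toy_ssrs), 2))
print(motif_gc_percent("AACCTG"))
```

Because motif lengths 1 to 6 all divide 60, the GC% of the 60 bp tandem string equals the GC% of the motif itself, so the grouping of motifs by GC content does not depend on where the string is cut.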
The GC cluster groups so formed were <=25%, 26-49%, 50-60%, 61-80%, and 81-100%, which encompass 70, 120, 153, 112, and 46 repeat motifs, respectively. Based on the length of the repeat motif, repeat classes were categorized as monomers, dimers, trimers, tetramers, pentamers, and hexamers; motifs of the same length were grouped into the same size category.

The accession list was used to query the Batch Entrez Assembly Database to procure GFF files containing the genomic feature annotations of all viruses under study. SSR annotation was accomplished using an in-house shell script that computes 4 possible overlap scenarios of SSRs with genic regions. Briefly, the script parses both the .tsv files obtained as PERF output (Avvaru et al., 2018) and the .gff files and performs a coordinate-based comparison. SSRs overlapping with two or more genes were counted once when computing the SSR abundances in the genic region. Besides, for every SSR overlapping a genic region, the percentage overlap of the SSR with the genic region is also reported. This step was critical to rule out the skew that would otherwise arise if the majority of SSRs lay on genic-intergenic boundaries. We verified that >95% of exonic SSRs show a complete overlap with exons.

We also carried out a variant analysis of SARS-CoV-2 SSRs to decipher whether the SSRs serve as mutational hotspots. A total of 4935 variant sites, made publicly available by the NGDC (National Genomics Data Center) and stemming from the analysis of 11641 high-quality human-derived SARS-CoV-2 genome sequences, were last downloaded on 28-04-2017. The identified variant sites are graded into three levels (I to III) based on population frequency and mutation density distribution.
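The coordinate-based comparison of SSRs with genic regions described above could be sketched as follows. The text does not name the four overlap scenarios, so the labels used here (`within`, `spans_gene`, `overhangs_start`, `overhangs_end`) are assumptions, as are the function names and the choice of 0-based half-open intervals; this is a Python illustration, not the authors' shell script.

```python
def gff_to_half_open(start: int, end: int):
    """Convert 1-based inclusive GFF coordinates to 0-based half-open ones."""
    return start - 1, end


def classify_overlap(ssr, gene):
    """Return (scenario, percent of the SSR's bases lying inside the gene).

    Both arguments are (start, stop) pairs in 0-based half-open coordinates.
    Scenario labels are illustrative assumptions, not taken from the source.
    """
    s0, s1 = ssr
    g0, g1 = gene
    shared = min(s1, g1) - max(s0, g0)
    if shared <= 0:
        return "none", 0.0
    pct = 100.0 * shared / (s1 - s0)
    if g0 <= s0 and s1 <= g1:
        label = "within"            # SSR lies entirely inside the gene
    elif s0 <= g0 and g1 <= s1:
        label = "spans_gene"        # SSR covers the whole gene
    elif s0 < g0:
        label = "overhangs_start"   # SSR crosses the gene's left boundary
    else:
        label = "overhangs_end"     # SSR crosses the gene's right boundary
    return label, pct


def count_genic(ssrs, genes) -> int:
    """Count SSRs overlapping any gene; an SSR hitting several genes counts once."""
    return sum(
        1 for ssr in ssrs
        if any(classify_overlap(ssr, gene)[0] != "none" for gene in genes)
    )


# Hypothetical coordinates: two overlapping genes and three SSRs.
genes = [(100, 200), (190, 300)]
ssrs = [(120, 132), (195, 207), (400, 412)]
for ssr in ssrs:
    print(ssr, [classify_overlap(ssr, gene) for gene in genes])
print(count_genic(ssrs, genes))
```

Reporting the percentage alongside the scenario label makes it straightforward to check what share of genic SSRs overlap a gene completely rather than sitting on a genic-intergenic boundary.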
The class I variants are those with the highest population frequency (>0.05, more credible); class II variants are sites with moderate population frequency; and class III variants are those with <0.05 population frequency, hence low reliability (detailed in the table 'Variation Annotation', https://bigd.big.ac.cn/ncov/variation/annotation). A custom shell script was deployed to check whether the variants were mainly localized in genic SSRs of SARS-CoV-2.

The Severe Acute Respiratory Syndrome-related coronavirus, Middle East Respiratory Syndrome-related coronavirus, and Human coronavirus OC43 strains were used for primer design. The SSRs with 70 bp of flanking sequence were retrieved using samtools and seqkit and were converted to query files using a customized in-house bash script. We used the primer3_core conda package to retrieve the primer sequences with the custom settings: PRIMER_TASK=generic, PRIMER_PICK_LEFT_PRIMER=1, PRIMER_PICK_RIGHT_PRIMER=1, PRIMER_OPT_SIZE=18, PRIMER_MIN_SIZE=15, PRIMER_MAX_SIZE=21, PRIMER_MAX_NS_ACCEPTED=1, PRIMER_PRODUCT_SIZE_RANGE=75-100, P3_FILE_FLAG=1, PRIMER_EXPLAIN_FLAG=1, PRIMER_MIN_GC=40, PRIMER_OPT_GC_PERCENT=50, PRIMER_MAX_GC=60. Sequences for which primers could not be determined via the automated scripts were handled separately by adjusting the GC content and other settings (PRIMER_MIN_GC=30 and SEQUENCE_TARGET=70,2). The primers so designed were checked for off-targets, if any, using BLASTn.

To screen microsatellites in the 55 Coronaviridae genomes, we used the PERF package, an exhaustive repeat-finding algorithm (Avvaru et al., 2018), to search for occurrences of all 501 theoretically possible SSR motifs (Subramanian et al., 2003) in the genomes. A total of 919 SSRs with length >=12 bp were identified across 55 reference genomes belonging to two subfamilies: Coronavirinae and Orthocoronavirinae. The top 4 strains with the largest number of SSRs were human-infecting coronaviruses viz.
Human coronavirus HKU1 (29 SSRs, NC_006577), Human coronavirus OC43 (26 SSRs, NC_006213), Severe acute respiratory syndrome-related coronavirus (25 SSRs, NC_004718), and Human coronavirus NL63 (25 SSRs, NC_005831). Genome size can influence SSR incidence. Therefore, to account for this variation, we calculated the SSR density, which is the number of bases covered by SSRs per kb, and plotted the results. As evident from Fig. 1, the SSR density is independent of the genome size. We also computed the correlation coefficient (r) between SSR density and genome length, and between SSR frequency and genome length. Unlike eukaryotic genomes (Srivastava et al., (Fig. 2).

A recent study carried out elsewhere highlights the high sequence similarity of the CDS of SARS-CoV-2 with 4 coronavirus strains: a bat relative (bat-SL-CoVZXC21, MG772934), Tor2 SARS-CoV (NC_004718), and HCoV-EMC MERS-CoV (NC_019843) (Grifoni et al., 2020). We therefore decided to compare the SSR repertoire of the aforementioned homologues with the SSRs found in SARS-CoV-2 (NC_045512). The SSR density of Tor2 SARS-CoV (11.05845182 bp/kb) is higher than that of SARS-CoV-2 (9.296726081 bp/kb), followed by bat-SL-CoVZXC21 (8.293482259 bp/kb), and is lowest in HCoV-EMC (4.515422159 bp/kb).

The genomic composition of viruses can vary widely and dictates mutational bias toward AT or GC. We therefore evaluated the genome-wide and SSR-localized nucleotide composition across the 55 genomes of Coronaviridae. Overall, the GC content of Coronaviridae genomes was found to range from ~32% to 48%. Previous studies have highlighted that Coronaviridae genomes have an underrepresented CpG ratio, which might confer on members of the family an advantage in immune evasion in vertebrates, where immune pathways target CpG-rich regions (e.g., TLRs). Moreover, coronaviruses exhibit an atypical nucleotide composition, with high levels of Ts and low levels of Cs, perhaps due to cytosine deamination (Di Giallonardo et al., 2017).
The SSR GC% shows an overall moderate positive relationship with genomic GC% (Pearson, r = 0.510175). Interestingly, we found a significant correlation between genomic GC% and SSR GC% when the organisms were grouped according to their genus (Fig. 3). Betacoronavirus, however, does not show such a correlation. The bat-SL-CoVZXC21 genome has a genomic GC% similar to that of SARS-CoV-2 but a dissimilar SSR GC%.

We calculated the relative abundances of all 501 theoretically possible repeat motifs across the 55 genomes and plotted a heatmap based on the observed motif frequencies. Most of the SSRs were 12-13 nt long, except for the monomeric polyA repeats, which were of varying lengths. We observed a distinct pattern: Coronaviridae members have intrinsically low abundances of SSRs, which do not generally exceed a frequency of 2. The heatmap was plotted for those classes of repeat motifs that were found across a minimum of 10 strains (Fig. 4). We found 16 such classes, which were richer in A than in G/C. A total of 249 repeat classes were altogether absent from all 55 genomes. The polyA monomeric repeats were, however, found in most of the genomes of the Orthocoronavirinae subfamily, and mainly in MERS, SARS, and avian coronaviruses. These repeats were localized at the 3′ end of the genome, in the intergenic region (3′ UTR). The 3′ UTR region, for instance, is reported to be conserved in betacoronaviruses and harbours the cis-acting sequences that form a potential molecular switch required for viral replication. The sequences fold into secondary or higher-order structures that confer RNA stability and facilitate both intra- and inter-molecular interactions (Yang & Leibowitz, 2015).

Canonically, the SSR frequency is expected to decrease with growing repeat length, as longer repeats have a higher propensity for mutation (Srivastava et al., 2019). Therefore, we looked at the length of each SSR across all 55 organisms.
However, we did not observe any variation of repeat class abundance with respect to repeat class length in the predicted SSRs, indicating that SSRs in the Coronaviridae family exhibit no length preference.

To identify whether the genomes of Coronaviridae atypically favoured repeat classes of certain size categories, we divided the repeat classes into six size categories ranging from monomers to hexamers. We found that Coronaviridae genomes were mainly populated with hexamers, followed by pentamers. The monomeric repeats were the least abundant (Fig. 5A). The repeat motifs were also clustered into 5 subgroups based on the GC content of the repeat motifs themselves, as explained in the methodology. Most SSRs belong to the repeat classes with <=25% GC (323 of 919 SSRs) across all genomes, followed by the subgroups with 26-49% GC content (314 of 919 SSRs) and 50-60% GC content (213 of 919 SSRs). This is in alignment with the fact that Coronaviridae genomes are intrinsically AT-rich, which strongly influences the AT- or GC-richness of SSRs in different genomic regions, in addition to the nucleotide distribution across genomes (Di Giallonardo et al., 2017; Victoria et al., 2011).

Earlier studies highlight the abundance of hexanucleotide SSRs in exonic regions of eukaryotic genomes (B. while mono- and dinucleotide repeats are abundant in viruses (Mashhood Alam et al., 2015). However, unlike other viruses, the Coronaviridae genomes have a different genome architecture. We therefore suspected a non-random distribution of SSRs across the genomes, as pointed out by several studies (Katti et al., 2001). To test our hypothesis, we investigated whether there was a significant bias toward harbouring SSRs in genic regions. To compute the overlap of SSRs and genic regions, we wrote a customized shell script covering four overlap scenarios. Interestingly, we found that 99.4% (833 of 838 genic SSRs) of the SSRs exhibit a 100% overlap with the genic regions.
This led us to infer that the genic regions of Coronaviridae are populated mainly with hexanucleotide SSRs, followed by pentamers (Fig. 5A & B). The comparative under-representation of dimers, trimers, and monomers can be explained by the destabilizing and disruptive effect these repeats impart to the coding region. Moreover, a body of evidence suggests that mutations in the CDS (coding sequence) region can potentially disrupt protein function or lead to protein truncation (Y. C. . Also, CDS are reported to selectively comprise tri- and hexanucleotide SSR motifs, which can lower the incidence of translational frameshift mutations (Fujimori et al., 2003; Metzgar et al., 2000; Subramanian et al., 2003). Besides, SSRs in the CDS region are under strong evolutionary pressure and tend not to expand, so as to maintain the stability of the protein encoded by the CDS (Qi et al., 2016).

The current outbreak of COVID-19 has affected 212 countries and several territories across the globe, and various reports underline the ongoing evolution of SARS-CoV-2 (Cagliani et al., 2020; Phan, 2020; Tang et al., 2020). In our analysis of SARS-CoV-2, 13 SSRs were found in the ORF1ab region, 3 in the S gene, 2 in ORF3a, and 1 each in ORF7a and the N gene. Overall, the variations account for 29.14% of the total bases covered by SSRs in SARS-CoV-2. This indicates that the genic SSRs are under selection pressure against non-beneficial mutations. Indeed, earlier studies have reported that tandem repeats are common in protein-coding regions, thereby facilitating the rapid evolution of proteins (Huntley & Golding, 2000; Romero & Arnold, 2009). The SSRs of SARS-CoV-2 comprised repeats from 17 repeat classes. Therefore, we plotted a class-wise tree-map of the observed frequency of variants (Fig. 6). The intergenic repeat class (polyA repeats) harboured more variant loci (34) than any of the genic classes. The polyA produces a polyU tail in negative-sense viral RNA.
The polyuridine sequence, which would otherwise activate the host's immune cells, is cleaved by the EndoU endonuclease. Mutations in the polyA region of SARS-CoV-2 can prevent formation of the secondary structure that polyU makes with other A/G-rich domains in negative-sense RNA, a structure that is otherwise recognized by pattern recognition receptors (PRRs), and might thereby confer a selective advantage in immune evasion (Hackbart et al., 2020).

The number of SSRs versus the number of variations recorded in each gene was plotted (Fig. 7A). To see which types of single-nucleotide variations were more prominent, we charted the frequencies of variations in the form of a radar map. We observed that A->G, A->T, and A->C mutations were more frequent in SSR regions (Fig. 7B). To be transparent, all these variants belong to evidence class III as per the limited whole-genome set of 11641 high-fidelity sequences from globally collected samples, and evidence classes are subject to change as more sequences are deposited and analyzed at the NGDC. The current analysis identifies the intergenic SSRs, i.e., the polyA repeats (3′ UTR region of the genome), as mutational hotspots relative to the genic SSRs, followed by ORF1ab (see Fig. 7A).

The exoribonuclease (ExoN) encoded by coronavirus genomes plays an essential role in the high-fidelity replication and synthesis of RNA (Fung et al., 2020). A study carried out on CoVs lacking ExoN activity showed the accumulation of A->G and U->C variations in the viral genomes (Chen et al., 2009; Smith et al., 2013). A mutation in the ExoN coding region (18040..19620 in SARS-CoV-2) might disrupt the proofreading mechanism, leading to the progressive accumulation of mutations. This offers a possible explanation: the observed A->G mutations in genic SSRs might be a result of ExoN attenuation due to mutational burden. To check our hypothesis, we revisited the variants dataset to look for mutations in the nsp14 coding region.
Surprisingly, we found one high fidelity and three moderate fidelity

To facilitate further research into the SSR repertoire of Coronaviridae members, we identified primers for the SARS-related coronavirus, MERS-related coronavirus, and Human coronavirus OC43 using Primer3; these are provided in Supplementary Material 2 and can be validated using SSR-PCR (Atia et al., 2016). For a few sequences surrounded by AT-rich regions and mononucleotide (A) repeats (polyA tail), primer design could not be achieved, and these were therefore orphaned. Genes harbouring more than one SSR lying adjacent to each other, and hence having similar primers, are demarcated as "Common Primers" (Atia et al., 2016). The complete set of SSRs identified and the customized scripts for batch primer identification are available upon request.

The present study screened 55 genomes of the Coronaviridae family for the incidence, abundance, and composition of microsatellites. The informatic analysis revealed that SSR incidence and density were independent of the genome sizes of Coronaviridae members. We observe an overall moderate positive correlation between genomic GC% and SSR GC%. A strong positive correlation in GC percentages was observed when the genomes and SSRs were grouped at the genus level. Our preliminary findings suggest a dearth, rather than a complete absence, of SSRs in Coronaviridae genomes. The underrepresentation of SSRs in Coronaviridae genomes can serve as an additional explanation for the progressive slowing of the nonsynonymous mutation rates in the SARS 2003 outbreak and the current SARS-CoV-2 outbreak, besides other reasons such as the role of the 3′ exonuclease (ExoN) in proofreading activity during replication (He et al., 2004). The SSRs were found to populate preferentially the genic regions of the genomes analyzed and are predominantly hexameric repeat motifs.
Our study highlights that SSRs are present in the ORF1ab (nsp3, nsp4, nsp5A_3CLpro and nsp5B_3CLpro, nsp6, nsp10, nsp12, nsp13, and nsp15 domains), S, ORF3a, ORF7a, N, and 3′ UTR regions of SARS-CoV-2 and harbour multiple mutations (with the 3′ UTR and ORF1ab SSRs harbouring the largest number of variants). Though limited in SARS-CoV-2 and other Coronaviridae genomes, SSRs have the potential to become mutational hotspots (given their well-known reputation as hypermutable regions) (Lin & Kussell, 2012) as the virus explores genotypic space and evolves to find beneficial mutations (Elena et al., 2006). However, further in-vitro and in-vivo studies are required for a detailed investigation of the role of SSRs in the genomes of coronaviruses in terms of pathogenesis, evolution, and immune evasion.

Author contributions: RS and NKJ designed and wrote the manuscript. RK, SKJ, AS, DK, PN, JR, KKK and MAK analyzed, coordinated and drafted the manuscript. All authors read and approved the final manuscript.

Funding: There are no relevant funding sources to report.

The authors declare that they have no conflicts of interest.

Fig. 1. Overview of SSR density variation with respect to genome size. No correlation was observed between SSR density and genome size (Pearson, r = 0.149987561).
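The Pearson coefficients quoted in this study (for example, r for SSR density versus genome size in Fig. 1) follow the standard product-moment definition. The sketch below is a minimal, dependency-free Python version of that definition; the function name `pearson_r` and the toy genome sizes and densities are hypothetical, and the source does not state which tool was actually used to compute r.

```python
import math


def pearson_r(xs, ys) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up genome sizes (bp) and SSR densities (bp/kb), for illustration only.
toy_genome_sizes = [26000, 27500, 29000, 30500, 31500]
toy_densities = [9.1, 6.4, 10.2, 7.8, 8.9]
print(round(pearson_r(toy_genome_sizes, toy_densities), 3))
```

A value of r near 0, as reported in the Fig. 1 caption, indicates no linear relationship between the two variables, whereas values near +1 or -1 indicate a strong linear relationship.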
Genome-wide in silico analysis, characterization and identification of microsatellites in Spodoptera littoralis multiple nucleopolyhedrovirus (SpliMNPV)
PERF: An exhaustive algorithm for ultra-fast and efficient identification of microsatellites from large DNA sequences
Computational inference of selection underlying the evolution of the novel coronavirus, SARS-CoV-2
Functional screen reveals SARS coronavirus nonstructural protein nsp14 as a novel cap N7 methyltransferase
Origin and evolution of pathogenic coronaviruses
SARS and MERS: Recent insights into emerging coronaviruses
Dinucleotide Composition in Animal RNA Viruses Is Shaped More by Virus Family than by Host Species
Ecology, evolution and classification of bat coronaviruses in the aftermath of SARS
Mechanisms of genetic robustness in RNA viruses
A novel feature of microsatellites in plants: A distribution gradient along the direction of transcription
A tug-of-war between severe acute respiratory syndrome coronavirus 2 and host antiviral defence: lessons from other pathogenic viruses
A Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2
The origin, transmission and clinical therapies on coronavirus disease 2019 (COVID-19) outbreak-An update on the status
The Sequence Alignment/Map format and SAMtools
Coronavirus endoribonuclease targets viral polyuridine sequences to evade activating host sensors
Molecular Evolution of the SARS Coronavirus, during the Course of the SARS Epidemic in China
Evolution of simple sequence in proteins
Differential distribution of simple sequence repeats in eukaryotic genome sequences
Current trends in diagnostics of viral infections of unknown etiology
Analysis on frequency and density of microsatellites in coding sequences of several eukaryotic genomes
Microsatellites: Genomic distribution, putative functions and mutational mechanisms: A review
Microsatellites within genes: Structure, function, and evolution
Evolutionary pressures on simple sequence repeats in prokaryotic coding regions
Analysis of Simple and Imperfect Microsatellites in Ebolavirus Species and Other Genomes of Filoviridae Family
Selection against frameshift mutations limits microsatellite expansion in coding DNA
Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
Genetic diversity and evolution of SARS-CoV-2. Infection
Distinct patterns of simple sequence repeats and GC distribution in intragenic and intergenic regions of primate genomes
BEDTools: A flexible suite of utilities for comparing genomic features
Exploring protein fitness landscapes by directed evolution
SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses
Coronaviruses Lacking Exoribonuclease Activity Are Susceptible to Lethal Mutagenesis: Evidence for Proofreading and Potential Therapeutics
Patterns of microsatellite distribution across eukaryotic genomes
Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions
On the origin and continuing evolution of SARS-CoV-2
Comparative assessment of SSR and SNP markers for inferring the population genetic structure of the common fungus Armillaria cepistipes
In silico comparative analysis of SSR markers in plants
Microsatellite markers: What they mean and why they are so useful
Evolutionary Insights into the Ecology of Coronaviruses
Coronavirus Pathogenesis and the Emerging Pathogen Severe Acute Respiratory Syndrome Coronavirus
Anatomy of the epidemiological literature on the 2003 SARS outbreaks in Hong Kong and Toronto: A time-stratified review
The structure and functions of coronavirus genomic 3' and 5' ends
Coevolution between simple sequence repeats (SSRs) and virus genome size