key: cord-0744514-dcjdhyo6
authors: Chechetkin, Vladimir R.; Lobzin, Vasily V.
title: Evolving ribonucleocapsid assembly/packaging signals in the genomes of the human and animal coronaviruses: targeting, transmission and evolution
date: 2021-06-13
journal: J Biomol Struct Dyn
DOI: 10.1080/07391102.2021.1958061
sha: 4c56df20b0bb794d27d07363fe6b1d312fb619b5
doc_id: 744514
cord_uid: dcjdhyo6
The worldwide COVID-19 pandemic has strongly intensified studies of the molecular mechanisms related to coronaviruses. The origin of coronaviruses and the risks of human-to-human, animal-to-human, and human-to-animal transmission of coronaviral infections can be understood only on a broader evolutionary level through detailed comparative studies. In this paper, we studied ribonucleocapsid assembly/packaging signals (RNAPS) in the genomes of all seven known pathogenic human coronaviruses (SARS-CoV, SARS-CoV-2, MERS-CoV, HCoV-OC43, HCoV-HKU1, HCoV-229E, and HCoV-NL63) and compared them with RNAPS in the genomes of related animal coronaviruses, including SARS-Bat-CoV, MERS-Camel-CoV, MHV, Bat-CoV MOP1, TGEV, and one of the camel alphacoronaviruses. RNAPS in the genomes of coronaviruses evolved due to weakly specific interactions between genomic RNA and N proteins in helical nucleocapsids. Combining transitional genome mapping and Jaccard correlation coefficients allows us to perform the analysis directly in terms of the underlying motifs distributed over the genome. In all coronaviruses, RNAPS were distributed quasi-periodically over the genome with a period of about 54 nt, biased toward 57 nt for genomes longer than that of SARS-CoV and toward 51 nt for shorter genomes. The comparison with the experimentally verified packaging signals for MERS-CoV, MHV, and TGEV showed that the distribution of particular motifs is strongly correlated with the packaging signals.
We also found that many motifs were highly conserved in both their characters and their positioning on the genomes throughout the lineages, which makes them promising therapeutic targets. The mechanisms of encapsidation can affect recombination and co-infection as well.
The COVID-19 pandemic triggered a worldwide crisis through the rapid spread of the infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). There are strong indications that coronaviral infections will become a permanent and significant factor affecting the life of human society. Currently, coronaviruses cause 30% of upper and lower respiratory tract infections in humans, with SARS-CoV, SARS-CoV-2 and MERS-CoV causing severe disease and deaths. Like the other diseases caused by coronaviruses, COVID-19 is thought to be of zoonotic origin (Contini et al., 2020; Decaro & Lorusso, 2020; Mahdy et al., 2020; Swelum et al., 2020; Zhou et al., 2020). Assessing the risks of human-to-human, animal-to-human, and human-to-animal transmission of coronaviral infections and developing efficient medications and vaccines against coronaviruses require knowledge of the main molecular mechanisms in the virus life cycle and virus-host interactions (Mishra & Tripathi, 2021; O'Leary et al., 2020; Rabi et al., 2020; Saxena, 2020). The long (about 30,000 nt) non-segmented plus-sense single-stranded RNA genome of the coronaviruses is packaged within a filament-like helical nucleocapsid, while the whole ribonucleocapsid is packaged within a membrane envelope with spike glycoproteins (Chang et al., 2014; Chen et al., 2007; Gui et al., 2017; Masters, 2019; Neuman & Buchmeier, 2016).
The nucleocapsid (N) proteins of coronaviruses are one of the promising therapeutic targets because they are critical for viral replication and assembly and are among the most conserved proteins (Bai et al., 2021; Chang et al., 2014; Dinesh et al., 2020; Dutta et al., 2020; Gao et al., 2021; Kwarteng et al., 2020; Lin et al., 2020; Matsuo, 2021; Peng et al., 2020; Tilocca et al., 2020; Yadav et al., 2020; Ye et al., 2020; Zinzula et al., 2021). Cryogenic electron microscopy (cryo-EM) has revealed that the ribonucleocapsid of SARS-CoV is helical with an outer diameter of 16 nm and an inner diameter of 4 nm (Chang et al., 2014). The turn of the nucleocapsid is composed of two octamers polymerized from dimeric N proteins (Chen et al., 2007). The pitch of the SARS-CoV nucleocapsid is 14 nm. The packaging of the SARS-CoV ssRNA genome near the internal surface of a helical nucleocapsid with such parameters should correspond to 54-56 nt per helical turn (or 6.75-7 nt per N protein) (Chang et al., 2014; Chechetkin & Lobzin, 2020). The parameters of the ribonucleocapsid for murine hepatitis virus (MHV) are close to those of SARS-CoV (Gui et al., 2017), in accordance with the evolutionary conservation of the main structural characteristics of N proteins. Due to the transitional symmetry of a helix, the weakly specific cooperative interaction between ssRNA and nucleocapsid proteins should lead to the natural selection of specific quasi-periodic motifs in the related genomic sequences. Indeed, quasi-periodic motifs with a period close to 54 nt were detected and appeared to be strongly pronounced in the genomes of SARS-CoV and SARS-CoV-2 (Chechetkin & Lobzin, 2020). The ribonucleocapsid assembly/packaging signals (RNAPS) in RNA genomes, together with variations in N protein conformations, ensure the specificity of encapsidation.
In this paper we studied RNAPS in the genomes of the known endemic and pandemic human coronaviruses and compared them with RNAPS in a batch of related genomes of animal coronaviruses. The analysis of RNAPS is performed directly in terms of the underlying motifs distributed over the genome and in the context of the general genome organization. Such RNAPS may serve for therapeutic targeting or for the assessment of evolutionary divergence between the viruses. They may also shed additional light on the mechanisms of recombination and co-infection for human coronaviruses as well as on the basic problems of viral molecular evolution.
In our study, we chose a scheme of analysis that provides the output data directly in terms of related motifs distributed over the genome, which are most important for genetic and medical applications. The primary objects in our approach are correlational motifs. Generally, they cannot be reduced to sparse or tandem repeats with gaps and alignments. As periodic features produce decaying or persistent oscillations in correlational motifs separated by distances that are multiples of the period, the periodic features can also be detected by the suggested method. The method can detect and reconstruct tandem repeats (both complete and incomplete) as well. The distribution of correlational motifs over the genome provides information about large-scale genome organization. As the correlational motifs are approximately robust with respect to point mutations and indels, they are especially suitable for the study of viral genomes with an inherently high frequency of mutations. The characteristic correlational and quasi-repeating motifs can be used, e.g., for therapeutic targeting, for subtyping of viruses or for the assessment of evolutionary divergence between species.
The algorithm for transitional automorphic mapping of the genome on itself (TAMGI) (Chechetkin & Lobzin, 2020) used throughout this paper is defined as follows.
Let a nucleotide $N_{m,\alpha}$ of the type α be positioned at a site m of the genomic sequence. Then, a pair of s-neighbors, $N_{m-s}$ and $N_{m+s}$, is searched for around $N_{m,\alpha}$. The nucleotide $N_{m,\alpha}$ is retained if it has at least one s-neighbor, $N_{m-s,\alpha}$ or $N_{m+s,\alpha}$, of the same type and is replaced by a void otherwise (denoted traditionally by a hyphen). All s-neighbors of the same type, $N_{m-s,\alpha}$ and/or $N_{m+s,\alpha}$, should also be retained. The separation distance s is called the step of TAMGI. The resulting sequence after TAMGI with the step s is composed of the nucleotides of four types (A, C, G, T) and the hyphens "-" denoting voids. We will call such sequences TAMGI components and denote them as $\Gamma_s$, whereas the whole genomic sequence will be denoted as Γ. The TAMGI components can be studied separately or be united. Further analysis after TAMGI is reduced to the study of all complete words of length k (k-mers) composed only of nucleotides (voids within the complete words are prohibited) and surrounded by the voids "-" at the 5'- and 3'-ends, $-N_k-$. By definition, the complete words are non-overlapping. At the next stage, the mismatches with hyphens to the complete words can be studied. The correlation motifs (or TAMGI motifs) are defined as a set of k-mers generated by TAMGI. To avoid end effects and to ensure homogeneity of the mapping, the linear genomes will always be circularized,
$$N_{m+M,\alpha} = N_{m,\alpha},$$
where $N_{m,\alpha}$ denotes the nucleotide of the type α ∈ (A, C, G, T) occupying the site m and M is the genome length. The circularized version of TAMGI can also be described as follows.
(i) Circularize a linear genome and superimpose two identical circular genomes over each other.
(ii) Rotate one of the genomes clockwise by a step s and count all coincidences between the two genomes.
(iii) Rotate one of the genomes counterclockwise by a step s and count all coincidences between the two genomes.
(iv) Unite all coincidences into one sequence and fill the voids with hyphens.
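The mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `tamgi` and `complete_words` are hypothetical, the genome is treated as circular through modular indexing, and complete words that would wrap around the end of the linear string are not merged.

```python
import re


def tamgi(genome, s):
    """Return the TAMGI component of a circularized genome for step s.

    A nucleotide at site m is kept if the nucleotide at site m - s or
    m + s (indices taken modulo the genome length) is of the same type;
    otherwise it is replaced by the void symbol "-".
    """
    m_len = len(genome)
    out = []
    for m, nt in enumerate(genome):
        left = genome[(m - s) % m_len]
        right = genome[(m + s) % m_len]
        out.append(nt if nt == left or nt == right else "-")
    return "".join(out)


def complete_words(component):
    """List (start, word) for maximal runs of nucleotides between voids.

    Assumption: positions are 0-based and runs crossing the end of the
    linear string are reported as two separate words.
    """
    return [(m.start(), m.group()) for m in re.finditer(r"[ACGT]+", component)]


if __name__ == "__main__":
    component = tamgi("AACG", 1)
    print(component)                  # retained nucleotides and voids
    print(complete_words(component))  # k-mers with their start positions
```

Because the retention rule is symmetric, keeping every nucleotide that has a same-type s-neighbor automatically keeps those neighbors as well, so a single pass over the sites is sufficient.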
The theory and simulations show that for even M the step M/2 should be considered apart from the other steps. Therefore, the range of steps can be chosen from 1 to
$$[(M-1)/2],$$
where the brackets denote the integer part of the quotient. Any sequence can be expanded via the complete set of TAMGI components with the steps from s = 1 to [M/2]. This means that $\Gamma_s$ can be considered as the generalized genome coordinates, whereas the highest $\Gamma_s$ can be associated with the principal components related to the genome organization. The frequencies of nucleotides after TAMGI with the step s,
$$f_{\alpha,s} = N_{\alpha,s}/M, \qquad f_{total,s} = \sum_{\alpha} f_{\alpha,s}, \qquad (4)$$
where $N_{\alpha,s}$ is the number of nucleotides of the type α retained in $\Gamma_s$, should be properly normalized to assess their statistical significance. The normalization ought to be performed against the counterpart characteristics in the random sequences of the same nucleotide composition. The partial fractions of nucleotides after TAMGI for the randomly reshuffled genomic sequences are given by
$$f_{\alpha,random} = \varphi_\alpha^2 (2 - \varphi_\alpha), \qquad (5)$$
where $\varphi_\alpha$ is the frequency of nucleotides of the type α, which is retained under reshuffling (the detailed theory for TAMGI, including the derivation of Eq. (5), can be found in the paper by Chechetkin & Lobzin (2021)). The frequencies given by Eq. (5) are independent of the steps s. For random sequences, the variances for the frequencies defined by Eq. (5) obey the binomial distribution,
$$\sigma^2(f_{\alpha,random}) = f_{\alpha,random} (1 - f_{\alpha,random})/M. \qquad (6)$$
The total frequency defined by Eq. (4) can be presented in terms of a normalized deviation,
$$\kappa_s = (f_{total,s} - f_{total,random})/\sigma(f_{total,random}), \qquad f_{total,random} = \sum_{\alpha} f_{\alpha,random}. \qquad (7)$$
In random sequences, the deviations defined by Eq. (7) obey approximately Gaussian statistics with zero mean and unit variance.
The mutual selection of functional motifs in viral genomes, their merging and decaying during molecular evolution strongly affect the molecular mechanisms of the virus life cycle. To assess such effects, we will use the Jaccard correlation coefficient (Baharav et al., 2020; Chung et al., 2019; Jaccard, 1912; Vorontsov et al., 2013).
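The two statistics introduced above, the normalized deviation of the retained fraction and the Jaccard coefficient between two TAMGI components, can be illustrated with a short Python sketch. Several points are assumptions rather than statements from the source: the random baseline is taken as φ_α²(2 − φ_α), which follows from treating sites as independent (a nucleotide of type α keeps its place with probability 1 − (1 − φ_α)²); the variance of the total fraction is taken as binomial; and the Jaccard coefficient is computed on the sets of retained positions. All function names are hypothetical.

```python
import math
from collections import Counter


def retained_mask(genome, s):
    """True at every site that survives TAMGI with step s (circular genome)."""
    n = len(genome)
    return [
        genome[m] == genome[(m - s) % n] or genome[m] == genome[(m + s) % n]
        for m in range(n)
    ]


def random_fraction(genome):
    """Expected retained fraction for a reshuffled sequence.

    Assumption: independent sites, so type alpha contributes
    phi**2 * (2 - phi) to the total fraction.
    """
    n = len(genome)
    return sum((c / n) ** 2 * (2.0 - c / n) for c in Counter(genome).values())


def normalized_deviation(genome, s):
    """Observed minus expected retained fraction in units of sigma.

    Assumption: binomial variance f * (1 - f) / M for the total fraction.
    """
    n = len(genome)
    f_obs = sum(retained_mask(genome, s)) / n
    f_rand = random_fraction(genome)
    sigma = math.sqrt(f_rand * (1.0 - f_rand) / n)
    return (f_obs - f_rand) / sigma


def jaccard(genome, s1, s2):
    """Intersection over union of the positions retained for s1 and s2."""
    a = retained_mask(genome, s1)
    b = retained_mask(genome, s2)
    inter = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    return inter / union if union else 0.0


if __name__ == "__main__":
    toy = "ACGT" * 10  # strictly periodic toy sequence with period 4
    print(normalized_deviation(toy, 4))
    print(jaccard(toy, 4, 8))
```

The sketch does not guard against degenerate input such as a single-letter sequence, for which the assumed variance vanishes.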
The number of nucleotides in a sequence obtained after TAMGI with the step s will be denoted as $N_s$. For two TAMGI components with the steps $s_1$ and $s_2$, the numbers of nucleotides in their union and intersection are related by the equality
$$N_{s_1 \cup s_2} = N_{s_1} + N_{s_2} - N_{s_1 \cap s_2}.$$
The corresponding frequencies are calculated by the standard definitions, and the Jaccard coefficient is defined as the ratio
$$J_{s_1 s_2} = N_{s_1 \cap s_2}/N_{s_1 \cup s_2}. \qquad (14)$$
The related frequencies for random sequences follow from Eq. (5). This yields an approximate expression for the Jaccard coefficient in random sequences (see Eqs. (5) and (6) for notation). Therefore, the Jaccard coefficient defined by Eq. (14) can be conveniently normalized as a deviation from its random counterpart, Eq. (18); the normalized Jaccard coefficient (JC) deviation for the steps s 1 and s 2 is denoted below as ι s1^s2. A set of JC is calculated at s 1 fixed and s 2 running from 1 to S 2 , the value for s 2 = s 1 being discarded. The sets of JC provide an additional criterion for quasi-periodicity of motifs. In particular, if s 1 = p and s 2 = 2p, the value of ι p^2p should be high; or, more symmetrically, for s 1 = 2p and s 2 = p or s 2 = 3p, the values of ι 2p^p and ι 2p^3p should both be high. For persistent quasi-periodicity, there is a series of high values for ι p^kp , where p is the period and k = 2, 3, ... The multiplication and modification of repeated patterns is one of the molecular mechanisms responsible for the generation of RNAPS. The normalized deviations defined by Eqs. (7) and (18) unify the comparison of genomes with different lengths and nucleotide compositions. Primarily, we are interested in the most significant effects. For this reason, all data in the Supplements are additionally presented in ranked form. The related features for the values of top ranks are definitely non-random (the corresponding Gaussian probabilities are less than 10^-10). The robustness of the ranking with respect to mutations was assessed on limited sets of isolates. The corresponding variations in the normalized deviations were not higher than 0.5. Therefore, the ranked values were additionally grouped by grades (with a grade width of 1) for the semi-quantitative assessment of divergence between these values.
The values belong to the same grade if the absolute difference between them does not exceed 1; a value that differs from a reference by between 1 and 2 should be attributed to a lower grade, etc.
As the main information in viral genomes is related to the coding for different proteins needed for virus proliferation, the signals corresponding to genome packaging have mainly evolved using the redundancy of the genetic code. The quasi-periodicity with period p = 3 appears to be the most pronounced in viral genomic sequences (for a review and further references see, e.g., Lobzin & Chechetkin, 2000; Marhon & Kremer, 2011). The features related to packaging and other molecular mechanisms are displayed as peaks in the sets of normalized deviations for the steps that are multiples of three. The encapsidation of coronaviral genomes is assumed to be performed immediately after the replication stage (Chang et al., 2014) in order to protect viral RNA genomes from the action of host enzymes and from damage during packaging of the genome within the membrane envelope. Similarly to the viruses with icosahedral capsids (Stockley et al., 2016), the encapsidation of viruses with helical capsids should also include a more rapid assembly stage and a subsequent slower packaging rearrangement of the RNA genome within the helical capsid. Taking into account the filament-like geometry of the helical capsid, the dynamics of the latter process should resemble the reptation of polymers (De Gennes, 1979). The corresponding assembly and packaging signals may generally be different (Chechetkin & Lobzin, 2019). Chang et al. (2009) found multiple (at least three) nucleic acid binding sites in N proteins that may be related to such a difference (see also Cubuk et al., 2021).
We restricted ourselves to the analysis of the genomes of alpha- and betacoronaviruses including all known pathogenic human coronaviruses.
A batch of selected related genomes for animal coronaviruses was chosen for comparison. The grouping of the genomes is performed according to their evolutionary closeness. The first three groups belong to betacoronaviruses, whereas the fourth group comprises human, bat, pig, and camel alphacoronaviruses. For all human coronaviruses we used the standard reference genomic sequences from GenBank, whereas the accession numbers for the animal isolates will be presented separately in the text. Strictly, each set of TAMGI motifs for a particular step s should be considered as a correlational entity. The mutual overlapping of sets for different steps assessed by JC deviations provides information about their potentially coordinated action in different molecular mechanisms. As discussed above, the overlapping sets with steps s 1 and s 2 yield contribution into the sets with s 1 + s 2 and |s 1 -s 2 |. The latter combinations will be called channels and provide insight into a partial mechanism of the motif generation in the network of overlapping motifs. We begin the study of RNAPS with the genomes of SARS-CoV (GenBank accession: NC_004718.3), SARS-CoV-2 (NC_045512.2) and SARS-related bat coronaviruses (accession numbers: DQ071615.1, DQ412043.1, DQ412042.1, MT782115.1, FJ588686.1, GQ153548.1, GQ153547.1, and DQ022305.2). The isolate with the accession FJ588686.1 (henceforth, SARS-Bat-CoV) was chosen for the presentation in the main text and Supplements. SARS-related coronaviruses belong to the lineage B of betacoronaviruses (Luk et al., 2019) . Fig. 1 shows the normalized deviations defined by Eq. (7) for the steps s within the interval 1-550 (left column) and the JC normalized deviations defined by Eq. (18) for the step s 1 = 54 and steps s 2 within the interval 1-600 (right column). As we are interested in the most pronounced effects, the dynamical range for the TAMGI and JC deviations here and below was restricted to approximately four grades (see Section 2.3). 
Figure 1. The spectra for the TAMGI (on the left) and JC (on the right) normalized deviations defined by Eqs. (7) and (18) for the genomes of the SARS-related coronaviruses. The TAMGI deviations were calculated within the range s = 1-550, whereas the deviations for JC were calculated for s 1 = 54 and s 2 = 1-600. A, SARS-CoV; B, SARS-CoV-2; C, SARS-Bat-CoV. The characteristic steps s are explicitly marked by arrows.
The deviation κ s with s = 54 appeared to be the highest in the complete TAMGI spectrum for the genome of SARS-CoV and is more than a grade higher than the nearest ranked deviations (κ 54 = 9.25, κ 87 = 7.89, and κ 108 = 7.53). The quasi-periodic character of the deviations κ 54×k (k = 1, 2, ...) is clearly seen in Fig. 1A, in accordance with the previous results (Chechetkin & Lobzin, 2020). The characteristic top ranked deviations are marked explicitly by arrows. The JC deviations shown on the right display the TAMGI motifs coordinated with those for s = 54. The motifs for the steps s 2 corresponding to the higher JC at s 1 = 54 can be called co-evolving with the motifs associated with s 1 = 54. As expected for quasi-periodic motifs with a period of about p = 54, the highest Jaccard correlations are observed for s 2 = 108 (see Fig. 1A, right). The association with the octamer units corresponding to half of the period, s 2 = 27, is four grades lower, i.e. relatively weak. The characteristic values s 2 associated with the pronounced JC deviations are shown by arrows. As discussed in Section 2.2, the superimposed TAMGI motifs contribute to the generation of motifs corresponding to the sum and absolute difference of their steps. In particular, the motifs with s 2 = 45 and s 2 = 9 contribute to the motifs with s 1 = 54 because 45+9=54. The same concerns the motifs with s 2 = 63 and s 2 = 9 because 63-9=54. The multiplication of the motifs with s 2 = 9 may also contribute to the motifs with s 1 = 54, as 9×6=54.
Other combinations such as 45+63, 21+87, 9+99 contribute to the motifs with the doubled period, 2p = 108. Modification of motifs during quasi-random molecular evolution makes the periodic patterns for s = 54 fuzzy. Indeed, significant correlations between patterns for s 1 = 54 and s 2 = 51 or 57 as well as the correlations with s 2 = 105 are also seen in Fig. 1A, right. The comparison with the cryo-EM data (Chang et al., 2014) indicates that the motifs for s = 54 should be associated with the packaging signals, whereas the motifs for s = 51, 57, 105, 45, 9 (via 45+9 or 9×6), 87, 21, and 99 can be associated in part with the assembly signals. Such multi-channel participation of the assembly signals ensures a higher rate of encapsidation.
In the genome of SARS-CoV-2 the deviations κ s with the steps s = 9 and 54 are the highest and belong to the same grade (κ 9 = 9.57 and κ 54 = 9.37). The next nearest deviations correspond to the steps s
For a batch of SARS-related bat genomes, the deviations for the steps s = 54, 108, 87, 9, and 3 were persistently among the top ranked and commonly belonged to the same grade (except DQ412042.1, for which the deviation for s = 9 was a grade higher in comparison with the others). The ranking for these steps may permute depending on a particular isolate. These features are also typical of the chosen isolate FJ588686.1 (κ s = 7.87, 7.87, 7.77, 7.41, and 7.20 for s = 54, 108, 9, 87, and 3, respectively; see also Fig. 1C, left). Though the JC deviation for s 2 = 108 was not the highest, the JC deviations for s 2 = 63, 9, 45, 87, 108, 51, and 105 belonged to the same top ranked grade (Fig. 1C, right), providing the combinations associated with the nucleocapsid assembly similarly to the genome of SARS-CoV.
Figs. 1A-C reveal several reproducible features. The deviations for s = 54 and 108 were invariably among the highest, while the higher TAMGI and JC deviations were approximately clustered within the range s ≤ 200.
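The channel arithmetic used in this section (pairs of steps whose sum or absolute difference equals a target step, plus steps that divide the target) can be enumerated mechanically. The sketch below is only an illustration of that bookkeeping: the function names are hypothetical, and the list of candidate steps is taken from the examples quoted in the text for SARS-CoV rather than from the full spectra.

```python
from itertools import combinations


def channels(steps, target):
    """Pairs of distinct steps whose sum or difference equals the target.

    Each entry is (larger_or_first, second, sign), where sign is "+" for
    a sum channel and "-" for a difference channel.
    """
    found = []
    for a, b in combinations(sorted(set(steps)), 2):  # a < b
        if a + b == target:
            found.append((a, b, "+"))
        if b - a == target:
            found.append((b, a, "-"))
    return found


def multiples(steps, target):
    """Steps s (other than the target itself) with s * k == target."""
    return [(s, target // s) for s in sorted(set(steps))
            if s != target and target % s == 0]


if __name__ == "__main__":
    candidate_steps = [9, 21, 45, 63, 87, 99]  # steps quoted in the text
    print(channels(candidate_steps, 54))
    print(channels(candidate_steps, 108))
    print(multiples(candidate_steps, 54))
```

Whether a numerically admissible channel is actually relevant still has to be judged from the corresponding JC deviations; the enumeration only lists the combinations that are arithmetically possible.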
The corresponding sets for the JC deviations were also calculated for the steps s 1 = 21, 24, 27, 30, 42, 48, 51, 57, 60, and 63 and the steps s 2 running from 1 to 600. These data are collected in Supplement S1 together with the TAMGI deviations. The JC deviations at these steps s 1 were invariably among the top ranked for s 2 = 54, indicating the involvement of the packaging periodicity p = 54 in various regulatory mechanisms and the multi-channel generation of RNAPS.
The distribution of motifs over the genome was calculated using windows of width w = 216 and a sliding step of 108 (half-overlapping windows). For the human and bat SARS-CoV the TAMGI motifs corresponded to s = 9, 54, and 87, whereas for SARS-CoV-2 the step s = 87 was replaced by the more typical s = 84. The profiles for the distribution of motifs over the genome were calculated both for all k-mers and separately for k-mers with k ≥ 5. The resulting profiles are shown in Fig. 2. The Pearson correlations between counterpart profiles for all k-mers and for k-mers with k ≥ 5 were about 0.6-0.7. The corresponding peaks and troughs on the profiles may indicate the involvement of related motifs in particular regulation mechanisms (cf. the comparison of such features with the experimentally verified examples below). Interestingly, the profile for k ≥ 5 and s = 9 revealed the highest peak in the region of the gene coding for N protein in the genome of SARS-CoV-2 (windows #267 and 268 in Fig. 2B). The binding of N proteins to this region might autocontrol the expression of the gene coding for N protein via a feedback loop. The profiles for all three lineage B coronaviruses display pronounced peaks within the region of the gene coding for S protein (windows #200-220 in Fig. 2). The detailed data related to the profiles are collected in Supplement S2.
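The windowed profiles described above can be sketched as follows. This is a hedged illustration rather than the procedure used for Fig. 2: it assumes that a profile value is the number of nucleotides in a window that belong to complete words of length at least `k_min`, that windows are taken on the linear sequence without wrap-around, and that the input is a TAMGI component string made of A, C, G, T and "-". The function names are hypothetical.

```python
import math
import re


def motif_mask(component, k_min=1):
    """Mark with 1 the sites inside complete words of length >= k_min."""
    mask = [0] * len(component)
    for match in re.finditer(r"[ACGT]+", component):
        if match.end() - match.start() >= k_min:
            for i in range(match.start(), match.end()):
                mask[i] = 1
    return mask


def window_profile(mask, width=216, step=108):
    """Sum the mask over half-overlapping windows (defaults from the text)."""
    return [sum(mask[start:start + width])
            for start in range(0, len(mask) - width + 1, step)]


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    component = "AC--GGGGG-"  # toy TAMGI component
    all_words = window_profile(motif_mask(component, 1), width=4, step=2)
    long_words = window_profile(motif_mask(component, 5), width=4, step=2)
    print(all_words, long_words, pearson(all_words, long_words))
```

With the default width of 216 and step of 108, window number n (counted from 1) starts at site 108 × (n − 1) + 1 of the linear sequence under these assumptions.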
All k-mer motifs with k ≥ 6 corresponding to s = 9, 54, and 87 for the human and bat SARS-CoV and to s = 9, 54, and 84 for SARS-CoV-2 are explicitly reproduced in Supplement S3. We found that the motifs for s = 54, k = 9 contained transcription regulatory sequences ACGAAC typical of SARS-related coronaviruses (Woo et al., 2010) in the genome of SARS-CoV-2), and 25469 (close to the start at 25478 of the gene coding for E protein in the genome of SARS-Bat-CoV). Besides, the motif CTAAACAT with k = 8 and s = 9 positioned at 16049 on the genome of SARS-CoV-2 (upstream from the gene coding for nsp13) contained the transcription regulatory sequence CTAAAC typical of alphacoronaviruses and of lineage A betacoronaviruses (Woo et al., 2010).
Figure 2. The distributions of motifs over the genomes of SARS-CoV, SARS-CoV-2 and SARS-Bat-CoV calculated with windows of width w = 216 and sliding step of 108 (half-overlapping windows). The corresponding TAMGI steps are shown in the insets. The upper profiles correspond to all motifs (k ≥ 1), whereas the lower profiles correspond to k ≥ 5.
The relationship between particular RNAPS and transcription regulatory sequences corresponds to the experimentally established multifunctional role of N proteins, which participate not only in the assembly/packaging of the ribonucleocapsid but also in the regulation of the replication-transcription processes (Grossoehme et al., 2009; Hurst et al., 2010, 2013; McBride et al., 2014; Verheije et al., 2010; Yang et al., 2021). The complete RNAPS repeats with k ≥ 6 were rather rare in the genomes of particular SARS-
MERS-CoV, discovered in 2012, is responsible for acute respiratory syndrome in humans with overall mortality around 35.7% (Azhar et al., 2019).
Though bats and alpacas are considered to be the potential reservoirs for MERS-CoV, dromedary camels seem to be the only animal host responsible for the camel-to-human virus transmission and spread of human infections (Azhar et al., 2014; Chan et al., 2015; Mohd et al., 2016; Omrani et al., 2015). This endemic disease was found in dromedary camel populations of East Africa and the Middle East. Human and camel MERS-CoV belong to the lineage C of betacoronaviruses (Luk et al., 2019). Despite the divergence of amino acid sequences, N protein for MERS-CoV retains the main structural similarity with N proteins for the other coronaviruses (Chang et al., 2014; Nguyen et al., 2019; Peng et al., 2020). In this section we study RNAPS in the genomes of MERS-related coronaviruses. We used the reference sequence for human MERS-CoV (
105 were higher than those between motifs for s 1 = 54 and for s 2 = 108, ι 54^105 = 12.10 (ranked position 4); and ι 54^108 = 11.50 (position 23). The JC value ι 54^171 = 12.12 corresponded to the ranked position 2, whereas the top ranked value, ι 54^96 = 13.59, was a grade higher. The channels 96-42, 99+9, 96+9, 93+15, 36+69 may be associated with the assembly signals. Nearly all these channels correspond to the doubled period. The motifs associated with s = 21 can be attributed to quasi-periodic ones. The correlations between motifs for s = 42 with the motifs for s = 21 and s = 63 are, respectively, one and two grades lower than that for s = 84 (Fig. 4C), indicating approximately independent quasi-periodicity for s = 42. The correlations ι 21^171 = 13.19 belonged to the top ranked for s 1 = 21 (see Fig. 4B and Supplement S1).
In this section we study RNAPS in the genomes of HCoV-OC43, HCoV-HKU1 and MHV. All these viruses belong to the lineage A of betacoronaviruses (Woo et al., 2010; Luk et al., 2019). If the lengths of
were the highest for s 1 = 54, whereas ι 54^105 = 14.76 corresponded to the ranked position 5.
The correlations ι 57^102 = 15.51 and ι 57^108 = 15.08 were on the positions 2 and 3, respectively, and were more than a grade higher than the correlations with s 2 = 114 (ι 57^114 = 13.84). The combined analysis of TAMGI and JC spectra indicates that the fuzzy periodicity p = 54 is coordinated mainly with the channels 9×6, 33+21, 63-9 and the doubled periods 105, 9+96, 33+75. The motifs for s = 21 may also be attributed to quasi-periodic ones and yield contributions into the generation of motifs with s = 54 and 57 (Fig. 7A) . Figure 6 . The spectra for the TAMGI (on the left) and JC (on the right) normalized deviations defined by Eqs. (7) and (18) for the genomes of the lineage A betacoronaviruses. The TAMGI deviations were calculated within the range s = 1-550, whereas the deviations for JC were calculated for s 1 = 54 and s 2 = 1-600. A, HCoV-OC43; B, HCoV-HKU1; C, MHV. The characteristic steps s are explicitly marked by arrows. The genome of HCoV-HKU1 contains a fragment with the complete tandem repeats of 30 nt coding for amino acids NDDEDVVTGD (Woo et al., 2006) . Such a feature is very rare in viral genomes. In the reference genome, the complete tandem repeats occupy the sites from 3038 to 3460 and in the partly incomplete form persist up to 3517 within the gene coding for nsp3 (sites 2633-8719 within ORF1ab). Strongly modified by mutations, such repeats are scattered over the genome of HCoV-HKU1 and form larger correlational units as is seen from TAMGI deviations, κ 210 = 11.10 (ranked position 2); κ 240 = 11.05 (4); κ 120 = 11.00 (5); κ 60 = 10.56 (8) comparison with ι 54^24 . The channels related to s = 54 may be attributed to 9×6, 24+30, 9+45, whereas those for s = 57 may be attributed to 24+33, 81-24. In both cases the correlations 2 1^s s ι for s 1 = 54 and 57 were higher for s 2 = s 1 ×k with k ≥ 3 rather than for k = 2 (with an approximate exception s 2 = 102 for s 1 = 57) ( Fig. 7C and Supplement S1). 
The TAMGI spectrum for MHV strain ML-10 (AF208067.1) was nearly coincident with that for MHV-A59 for the top rank deviations (data not shown).
HCoV-HKU1 (B) and MHV (C) calculated with windows of width w = 216 and sliding step of 108 (half-overlapping windows). The corresponding TAMGI steps are shown in the insets. The upper profiles correspond to all motifs (k ≥ 1), whereas the lower profiles correspond to k ≥ 5. The two-sided arrows for HCoV-HKU1 (B) indicate the middle of a region with tandem repeats, whereas the arrows for MHV (C) indicate the middle of the experimentally verified packaging signal located at the sites 20273-20416.
Profiles for the motif distributions over the genome were calculated for half-overlapping windows of width w = 216 and the TAMGI steps s = 21, 24, 54, 57, 60, 63, 66. These data are presented in Fig. 8 and Supplement S2. We begin with the profiles for MHV because in this case there are experimentally verified data on the packaging signal (Fosmire et al., 1992
The profiles for motifs in the genome of HCoV-HKU1 reveal high peaks for all steps s within the windows #29-32 corresponding to the tandem repeats (Fig. 8B). Generally, tandem repeats are always simultaneously a source of correlational TAMGI motifs, which may participate in various molecular mechanisms. In particular, this region may contain RNAPS as well. The local peaks in this region were also found for MHV, s = 24, 60, 63 (Fig. 8C) and for HCoV-OC43, s = 21, 24, 54, 57 (Fig. 8A). The local peak for the profile s = 24 in the case of HCoV-HKU1 was found within the same windows #188 and #189 as for the MHV packaging signal (Fig. 8B). The closely positioned triple coincidences were rather rare and comprised only motifs no longer than k = 7. The TAMGI motifs for HCoV-HKU1 also contained transcription regulatory sequences CTAAAC (Woo et al., 2010), s = 57, CTAAAC (15155); s = 66, ACCTAAAC (26466); this feature was absent for the other two viruses.
The human alphacoronaviruses HCoV-229E and HCoV-NL63 cause the common cold in healthy adults (Dijkman & van der Hoek, 2009). Although phylogenetically close, they share only 65% sequence identity. Being less virulent than SARS- and MERS-related viruses, these alphacoronaviruses have been suggested as safer models for the development of drugs against SARS- and MERS-related viruses in laboratory conditions (Chakraborty & Diwan, 2020). The structure of N protein for HCoV-NL63 retains the major structural features inherent to N proteins of the other coronaviruses (Chang et al., 2014; Peng et al., 2020; Szelazek et al., 2017). The mode of oligomerization during formation of the nucleocapsid for HCoV-229E is also close to that in the others (Lo et al., 2012). In this section we study RNAPS in the genomes of HCoV-229E and HCoV-NL63 (accessions: NC_002645.1 and NC_005831.2, respectively) and compare them with RNAPS in the genomes of
were distinctly higher than those for s = 57, while their mutual ranking depended on the particular alphacoronavirus, κ 51 = 7.16; 9.10; 9.44; 7.01 and κ 54 = 6.87; 9.31; 7.56; 7.65 for HCoV-229E, HCoV-NL63, Bat-CoV MOP1, and camel α-CoV, respectively. The related deviations for the doubled steps were κ 102 = 4.91; 8.04; 7.81; 5.24 and κ 108 = 6.98; 9.08; 7.74; 6.88, i.e., except for Bat-CoV MOP1, they were always in favor of the quasi-periodicity p = 54. The high deviations for s = 60 appear to be the other common feature for the alphacoronaviruses in this group. The motifs for s = 21, 54, 60 can be considered as fuzzy quasi-periodic ones.
Figure 9. The spectra for the TAMGI (on the left) and JC (on the right) normalized deviations defined by Eqs. (7) and (18) for the genomes of the alphacoronaviruses. The TAMGI deviations were calculated within the range s = 1-550, whereas the deviations for JC were calculated for s 1 = 54 and s 2 = 1-600. A, HCoV-229E; B, HCoV-NL63; C, Bat-CoV MOP1; and D, TGEV. The characteristic steps s are explicitly marked by arrows.
The JC spectra for s_1 = 51 and s_2 = 1-600 are shown separately in Fig. 10. The combined TAMGI and JC spectra reveal a trend toward the formation of long 150-450 nt correlational patterns in the genomes of all studied alphacoronaviruses including TGEV. Brief comments on the particular alphacoronaviruses are presented below.
for the other alphacoronaviruses), and 162 (=3×54) and can be attributed to the same grade. The deviation for s = 51 was a grade higher than that for s = 54 (6.82 and 5.72, respectively), while both were distinctly higher than the deviation for s = 57 (κ_57 = 5.04). The related deviations for the doubled steps were κ_102 = 6.21 and κ_108 = 7.20 (note also κ_105 = 7.25 and that s = 105 = 21×5 = 51+54). The rank position for the deviation κ_21 = 7.17 was 15 (on the lower boundary of the same grade as for the top rank
The profiles for the characteristic TAMGI motifs in the genomes of HCoV-229E, HCoV-NL63, Bat-CoV MOP1, and TGEV are shown in Fig. 11. We begin with the latter profiles because there is an experimentally verified packaging signal for TGEV (Escors et al., 2003; Morales et al., 2013). As shown by Morales et al. (2013), the first 598 nt from the 5'-end of the genome function as the packaging signal. The last 494 nt at the 3'-end enhance the packaging efficiency yet are not crucial for packaging. As is seen from Fig. 11D, the most pronounced peak in the packaging signal region was detected for the motifs s = 51 associated with the fuzzy helical periodicity in the genomic sequences. The abundance of these motifs
Generally, the motifs in the 5'- and 3'-end regions may be associated with other functional mechanisms (Yang
coding for N protein in the genome of HCoV-NL63 (windows #245-246 and #248-249, respectively, in Fig. 11B) and for s = 51 in the regions of the genes coding for HCNV63gp2 and HCNV63gp3 proteins (or S protein and protein 3, respectively; windows #212-213 and #231-232).
A similar high peak in the region of the gene coding for S protein was detected for the profile of Bat-CoV MOP1, s = 51 (windows #230-231 in Fig. 11C). We also studied coincidences of 6-mer motifs for the counterpart steps in the different genomes (Supplement S3). We restricted ourselves to complete coincidences and to coincidences with up to one mismatch. In the latter case the complete coincidences were discarded. The complete coincidences also included the self-coincidences (i.e. each motif was counted at least once). Each pair of coincident motifs with different positions on the genome was counted once. Within this scheme, more than half of the motifs for HCoV-229E and camel α-CoV were completely coincident. The pairwise complete coincidences for the other alphacoronaviruses were at the level of 10-20%. The complete coincidences were commonly the lowest between motifs for TGEV and the other four viruses, with the minimum of coincidences for the steps s = 51 and 54 associated with the packaging periodicity. The complete coincidences for the motifs in the genome of HCoV-NL63 were the highest with Bat-CoV MOP1, at a level of about 20%.
TAMGI motifs are intrinsically related to the general organization of the genome, RNAPS being one of the elements of the genome organization. The choice of putative targets in viral genomes is strongly hampered by the high frequency of point mutations and indels. As has been shown earlier, TAMGI motifs are robust with respect to point mutations and indels (Chechetkin & Lobzin, 2020). For the steps s < 500-600 used in this paper, the maximum number of indels should not exceed 25-30, whereas the real numbers of indels are about 1-5. Therefore, the longer motifs with k ≥ 5-6 can be used as putative therapeutic targets.
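The distinction between complete coincidences and coincidences with one mismatch for 6-mer motifs can be illustrated by a minimal Python sketch. It is a simplified stand-in and not the counting scheme used for Supplement S3: it counts cross-genome pairs only and does not reproduce the treatment of self-coincidences. The function names and the toy motif lists are hypothetical; only the motif CTAAAC is taken from the text.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching characters between two equal-length motifs."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def count_coincidences(motifs_a, motifs_b):
    """Count motif pairs between two genomes.

    Each argument is a list of (position, sequence) tuples.
    Returns (complete, one_mismatch); pairs with zero mismatches are
    excluded from the one-mismatch count, as stated in the text.
    """
    complete = 0
    one_mismatch = 0
    for (_, seq_a), (_, seq_b) in product(motifs_a, motifs_b):
        d = hamming(seq_a, seq_b)
        if d == 0:
            complete += 1
        elif d == 1:
            one_mismatch += 1
    return complete, one_mismatch


# Toy data (positions and most sequences are invented for illustration).
genome_a = [(100, "CTAAAC"), (250, "ACGTAC")]
genome_b = [(120, "CTAAAC"), (300, "CTAAAT"), (410, "GGGGGG")]

complete, one_mismatch = count_coincidences(genome_a, genome_b)
print(complete, one_mismatch)
```

For the toy lists above the sketch finds one complete coincidence (CTAAAC with CTAAAC) and one coincidence with a single mismatch (CTAAAC with CTAAAT).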
It would be insightful to study the impact on virus viability of mutations related to the longer motifs with k ≥ 5-6, since the impact of mutations within reconstructed motifs and/or of mutations leading to the elongation of motifs is expected to be stronger. The overlapping TAMGI motifs generate a complicated network of coordinated motifs. The RNAPS around the step s = 54 can be affected by the other overlapping motifs for the different steps and vice versa. The character of overlapping can be assessed by the Jaccard coefficients. Combining the normalized deviations for TAMGI and JC provides results directly in terms of underlying motifs within genomic sequences. The vocabulary of motifs and their positioning in the genome can be conveniently assessed against the annotated genomic sequences and specialized databases (Ibrahim et al., 2018; Sharma et al., 2016). This can be attributed to the molecular mechanisms of the friend-or-foe recognition which should be highly specific. Some of the peaks and troughs in the motif profiles can be attributed to the mechanisms of initiation and elongation of encapsidation, replication and gene expression. The spectra for TAMGI and JC deviations and related profiles for motifs depend on the particular lineage and on the particular virus within the lineage. However, some of the features appear to be highly reproducible for all coronaviruses. As has been discussed in Section 3.3, the region corresponding to the windows #29-32 (sites 3025-3564) is occupied by 30-nt tandem repeats in the genome of the coronavirus HCoV-HKU1. This region is within the gene coding for nsp3 for all coronaviruses. The long stretches with tandem repeats are simultaneously a source of a variety of correlational TAMGI motifs which can participate in different molecular mechanisms (see Fig. 8B, Supplement S3 and the general discussion by Chechetkin & Lobzin (2021)).
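The idea of assessing the overlap between motifs for two different steps by a Jaccard coefficient can be sketched as follows. This is the generic Jaccard index (size of the intersection divided by size of the union) applied to sets of genomic sites; it is not the normalized JC deviation of Eq. (18), which is not reproduced in this section. The function name and the toy site sets are assumptions made for illustration.

```python
def jaccard(sites_a, sites_b):
    """Generic Jaccard index |A & B| / |A | B| for two collections of sites.

    Returns 0.0 when both collections are empty.
    """
    a, b = set(sites_a), set(sites_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy example: genomic sites occupied by motifs for two different steps
# (the site numbers are invented for illustration).
sites_step_54 = {10, 11, 12, 13, 64, 65}
sites_step_9 = {12, 13, 14, 21}

overlap = jaccard(sites_step_54, sites_step_9)
print(overlap)  # 2 shared sites out of 8 distinct sites -> 0.25
```

A value close to 1 would indicate that the motifs for the two steps occupy nearly the same sites, whereas a value close to 0 would indicate nearly disjoint motif sets.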
Though the region with such tandem repeats is inherent only to HCoV-HKU1, peaks in the corresponding regions were also found in the motif profiles for all coronaviruses (see Figs. 2, 5, 8, and 11). These peaks were pronounced in the motif profiles for s = 54 associated with the period of the helical nucleocapsid for the most pathogenic coronaviruses SARS-CoV and SARS-CoV-2, whereas for MERS-CoV the pronounced peak in this region was for s = 105. Targeting the quasi-repeating motifs in genomic RNA and/or the related proteins should be more efficient because of the larger statistical weight of such motifs, which makes nsp3 a promising therapeutic target. The complete encapsidation of the SARS-CoV genome needs 4.4×10^3 N proteins if the nucleocapsid helix pitch is associated with 54 nt (Chechetkin & Lobzin, 2020). This result agrees with cryo-EM data (Chang et al., 2014; Gui et al., 2017). The fuzzy shifts of the periodicity p = 54 to 57 for the longer genomes of the lineage A betacoronaviruses and to 51 for the shorter genomes of alphacoronaviruses leave the estimate above unchanged. This means that the expression of the N gene should be approximately the same for all coronaviruses. Our bioinformatic analysis and cryo-EM data indicate that N proteins should be the most abundant. The abundant N proteins induce a strong immune response, and the antibodies against N proteins can be used for the diagnostics of coronaviral infection and for the development of vaccines (Chang et al., 2016; Dutta et al., 2020). There are also indirect indications in favor of such a conclusion based on the mechanism of expression of structural and accessory proteins in coronaviruses via nested subgenomic mRNAs (sgmRNAs) (Sawicki et al., 2007; Wu & Brian, 2010; Yang & Leibowitz, 2015; Zhang et al., 1994). This implies some regulatory feedback mechanisms for expression. The experimental data and TAMGI motif profiles indicate that N proteins can mediate the regulation.
Together with frequent mutations, recombination is one of the major sources of virus variability and diversity. The recombination region for coronaviruses is mainly attributed to the receptor-binding domain of the spike (S) glycoprotein (Bobay et al., 2020; Tao et al., 2017; V'kovski et al., 2021; Zhu et al., 2020). Taking into account the order of the structural genes in the genome, S-E-M-N, the fragment downstream from such a recombinant region should code for all structural and accessory proteins. In this section we discuss the compatibility of recombination and encapsidation. Consider a recombinant genome composed of ORF1ab coding for the RNA replicase (fragment I) and sgRNA coding for all structural and accessory proteins (fragment II). The experimentally established packaging signals in the genomes of coronaviruses are located within fragment I (Escors et al., 2003; Fosmire et al., 1992; Hsin et al., 2018; Kuo & Masters, 2016; Morales et al., 2013), whereas fragment II codes for M and N proteins that are homologous to, yet different in some features from, their counterparts in the primary genome without recombination. All experiments and the results of our bioinformatic analysis indicate that specific recognition between the genomic RNA and M/N proteins is needed for the proper transport of RNA into the membrane envelope and for encapsidation. In the case of a recombinant genome, such recognition should occur between RNA from fragment I and M/N proteins translated from fragment II. Efficient RNA transport, encapsidation and turnover of the whole virus life cycle should therefore require that the homology between the fragments II of the virus genomes and/or the structural similarity between the M/N proteins before and after recombination exceed some threshold, so that the friend-or-foe recognition between the RNA and the M/N proteins remains sufficiently fuzzy after recombination.
In some cases the 3'-end may also affect the efficiency of packaging (Morales et al., 2013; Yang & Leibowitz, 2015). This means that a recombinant virus should be suboptimal (if viable) from the packaging point of view in the overwhelming majority of cases. The recombinant virus can be adapted and selectively optimized during subsequent evolution or go extinct (Jensen & Lynch, 2020; Vignuzzi & López, 2019). Some of the pronounced peaks in the motif profiles in the region of the gene coding for S protein may be related to recombination. As is seen from Figs. 2, 5, 8, and 11, this feature is typical of the coronavirus genomes. The location of a packaging signal within the recombinant fragment II coding for the structural and accessory proteins would make recombination and encapsidation more compatible. It would be interesting to investigate whether the peaked region 28729-29052 within the gene coding for N protein in the genome of SARS-CoV-2 (see the profile for s = 9 in Fig. 2B; windows #267-268) affects the packaging. The other peaked region 13933-14256 (windows #130-131 in Fig. 2B) for s = 9 lies within the gene coding for nsp12. The similar peaked regions 23005-23328 (windows #214-215 in Fig. 2A) and 6589-6912 (windows #62-63 in Fig. 2A) for s = 9 in the genome of SARS-CoV were found within the genes coding for S protein and for nsp3, respectively. The neighboring peaks for s = 54 and 9 in the windows #113 and #114 (sites 12097-12420 on the profiles shown in Figs. 2A and 2B) were reproducible for both SARS-CoV and SARS-CoV-2 and lie within the gene coding for nsp8. A synergistic action of these regions on packaging cannot be excluded either. Note the depletion of the other motifs in all noted peaked regions for s = 9. The corresponding motifs for s = 9, k ≥ 6 can be found in Supplement S3. Recombination requires co-infection of one host cell by two or more coronaviruses.
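The correspondence between window numbers and genomic sites quoted in the text follows from the half-overlapping window scheme (width w = 216, sliding step 108). The sketch below makes this mapping explicit. It assumes 1-based window numbering with window #1 starting at site 1; this convention is inferred from the site ranges quoted in the text rather than stated explicitly, and the function names are hypothetical. The end of the genome is not taken into account.

```python
WIDTH = 216  # window width w, as stated in the text
STEP = 108   # sliding step (half-overlapping windows)


def window_sites(first, last=None, width=WIDTH, step=STEP):
    """1-based inclusive site range covered by windows first..last.

    Assumes that window #1 starts at site 1 (inferred convention).
    """
    if last is None:
        last = first
    if first < 1 or last < first:
        raise ValueError("window numbers must satisfy 1 <= first <= last")
    start = (first - 1) * step + 1
    end = (last - 1) * step + width
    return start, end


def windows_containing(site, width=WIDTH, step=STEP):
    """List of 1-based window numbers whose range contains the given site."""
    if site < 1:
        raise ValueError("site must be >= 1")
    last = (site - 1) // step + 1
    # Ceiling division of (site - width) by step, written with floor division.
    first = max(1, -(-(site - width) // step) + 1)
    return list(range(first, last + 1))


print(window_sites(29, 32))       # tandem-repeat region in HCoV-HKU1
print(window_sites(267, 268))     # peaked region in the N gene of SARS-CoV-2
print(windows_containing(20344))  # approximate middle of the MHV signal 20273-20416
```

Under this convention the sketch reproduces the ranges given in the text, for example windows #29-32 for sites 3025-3564 and windows #267-268 for sites 28729-29052, and it places the middle of the MHV packaging signal in windows #188 and #189.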
Another aspect of co-infection is related to competitive interactions between M/N proteins and genomic RNAs of two or more coronaviruses within one host cell (Sungsuwan et al., 2020). The molecular mechanisms of co-infection for human coronaviruses are still poorly understood (Chaung et al., 2020; Lai et al., 2020; Ou et al., 2020). The specific friend-or-foe recognition between RNA and M/N proteins should play an important role during co-infection as well. A year after the outbreak of the COVID-19 pandemic, the efforts of scientists have led to the development of several vaccines ready or nearly ready for practical medical applications (Artese et al., 2020; Dong et al., 2020; Karpiński et al., 2021; Logunov et al., 2021; Padron-Regalado, 2020; Singh & Gupta, 2020). In this section we discuss several possibilities related to RNAPS. As has been established in Section 3.1, the transcription regulatory sequences with the motifs TACGAACTT coordinated with the helical capsid periodicity p = 54 were strictly conserved for SARS-CoV, SARS-related Bat-CoV and SARS-CoV-2 and were similarly located with respect to the starts of the genes coding for E proteins. Targeting this motif by any agent (ligands, proteins, aptamers, modified N proteins or their fragments) should switch off the expression of the E gene and interrupt the virus life cycle simultaneously for all three viruses. The same applies to the motif CCCTTTGTC (27596) near the start of the gene coding for E protein in the genome of MERS-CoV (Section 3.2). Replacement of the transcription regulatory sequences (rewiring of the regulatory network) has been shown to attenuate the virulence of SARS-CoV (Graham et al., 2018). The total number of relatively long motifs with k ≥ 9 does not exceed 15-30 for the particular steps (Supplement S3). Targeting such long motifs may also prove efficient for the attenuation of viral activity.
Note, in particular, the long 17-mer motif TATTCAAACAATTGTTG (start at 3268) within the gene coding for nsp3 and the 12-mer motif ACTGCCACTAAA (29060) within the gene coding for N protein (both for s = 54) in the genome of SARS-CoV-2. Hybridizing the RNA palindrome CAATTG within the 17-mer motif with targeted DNA can be used for cutting this site by a restriction enzyme. In contrast to the progress in the development of vaccines, the development of antiviral drugs remains much less successful (Artese et al., 2020; Rohilla, 2021; Twomey et al., 2020). A search for agents efficiently targeting RNAPS or N proteins would be highly desirable. Other strategies can be based on preventing the oligomerization of N proteins (Chang et al., 2013, 2016; He et al., 2004). N proteins strongly binding to RNA can affect various regulatory mechanisms in which RNA participates. In particular, it has been shown that the SARS-CoV N protein can activate the AP-1 pathway, which regulates many cellular processes, including cell proliferation, differentiation, and apoptosis (He et al., 2003; Zhu et al., 2021). It was also shown that binding of SARS-CoV-2 N proteins to the host 14-3-3 protein in the cytoplasm can regulate nucleocytoplasmic N shuttling (Tugaeva et al., 2021). The multifunctional role of N proteins can be important not only for the virus life cycle but also for the pathways and general disease progression. The risks of animal-to-human and human-to-animal transmission of viruses can be assessed not only by the general homology between related genomic sequences of the human and animal coronaviruses but also by the correspondence between functional motifs. We found that some of the motifs were highly reproducible both in characters and in positioning on the genome throughout the lineages (Supplement S3). Among the studied genomes, the closest homology was between the camel and human MERS-CoV. They were nearly coincident and differed as one isolate differs from another.
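The palindromic character of CAATTG and its location within the 17-mer motif quoted above can be verified with a short Python sketch. The sequences and the start coordinate 3268 are taken from the text; the sequences are written in the DNA alphabet, as in the text. The helper names are hypothetical, and the position arithmetic assumes that 3268 is the 1-based coordinate of the first nucleotide of the motif.

```python
# Translation table for complementing a sequence written in the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a sequence in the DNA alphabet."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


motif = "TATTCAAACAATTGTTG"  # 17-mer motif from the text (s = 54)
motif_start = 3268           # start site from the text (assumed 1-based)
site = "CAATTG"              # palindrome named in the text

offset = motif.find(site)          # 0-based offset of the palindrome in the motif
site_start = motif_start + offset  # genomic coordinate of the first C

print(len(motif), is_palindromic(site), offset, site_start)
```

Under the stated assumption on the coordinate convention, the palindrome begins 8 nt downstream of the motif start, i.e. at site 3276.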
In this case one can speak of adaptation of the animal virus within the human organism. Two other pairs with the closest correspondence between motifs were the human and bat SARS-CoV as well as HCoV-229E and camel α-CoV. The homology between the camel and human coronaviruses for MERS-CoV and HCoV-229E enhances the risk of co-infection by these viruses. Combining detection and reconstruction of correlational and quasi-periodic motifs by TAMGI provides a convenient tool for the study of viral genomic sequences. Supplementing TAMGI with the Jaccard coefficients elucidates the relationships within the intricate network of mutually connected regulatory motifs. In the case of coronaviruses, such a unified technique displays the relationships between RNAPS and the other motifs within the genome organization. The first, more general, level of characterization of viral genomes comprises the ranked TAMGI deviations and the study of the mutual relationships between them by JC. These characteristics remained mainly reproducible within the same lineage of the coronaviruses but differed between the lineages. The second, more specific, level of characterization is related to the repertoires of longer TAMGI motifs (typically with k ≥ 5-6), their sequential ordering and their distribution over the genome. In this form, the TAMGI-JC technique can be applied to the assessment of the evolutionary divergence between viruses and to the problem of subtyping. The application of the TAMGI motifs to the subtyping of viruses is close to the general discriminant genomic analysis with k-mers (Ounit et al., 2015; Tomović et al., 2006). Motifs reconstructed by TAMGI can also be used for the assessment of the mutation impact and for therapeutic targeting. Further progress in the study of the (multi)functional role of TAMGI motifs can be achieved by additional experimental work. The vocabulary of functional motifs for the coronaviruses is of basic interest and can help in practical medical applications.
Current status of antivirals and druggable targets of SARS CoV-2 and other human pathogenic coronaviruses Evidence for camel-to-human transmission of MERS coronavirus The Middle East respiratory syndrome (MERS) Spectral Jaccard similarity: a new approach to estimating pairwise sequence alignments The SARS-CoV-2 nucleocapsid protein and its role in viral structure, biological functions, and a potential target for drug or vaccine mitigation Recombination events are concentrated in the spike protein region of Betacoronaviruses A proposed role for the SARS-CoV-2 nucleocapsid protein in the formation and regulation of biomolecular condensates NL63: A better surrogate virus for studying SARS-CoV-2. Integrative Molecular Medicine Middle East respiratory syndrome coronavirus: another zoonotic betacoronavirus causing SARS-like disease The SARS coronavirus nucleocapsid protein -Forms and functions Transient oligomerization of the SARS-CoV N protein -implication for virus ribonucleoprotein packaging Multiple nucleic acid binding sites and intrinsic disorder of severe acute respiratory syndrome coronavirus nucleocapsid protein: implications for ribonucleocapsid protein packaging Recent insights into the development of therapeutics against coronavirus diseases by targeting N protein Coinfection with COVID-19 and coronavirus HKU1-The critical need for repeat testing if clinically indicated Genome packaging within icosahedral capsids and large-scale segmentation in viral genomic sequences Ribonucleocapsid assembly/packaging signals in the genomes of the coronaviruses SARS-CoV and SARS-CoV-2: detection, comparison and implications for therapeutic targeting Combining detection and reconstruction of correlational and quasiperiodic motifs in viral genomic sequences with transitional genome mapping: Application to COVID-19 Structure of the SARS coronavirus nucleocapsid protein RNA-binding dimerization domain suggests a mechanism for helical packaging of viral RNA Jaccard/Tanimoto 
similarity test and estimation methods for biological presence-absence data The novel zoonotic COVID-19 pandemic: an expected global health concern The SARS-CoV-2 nucleocapsid protein is dynamic, disordered, and phase separates with RNA Novel human coronavirus (SARS-CoV-2): A lesson from animal coronaviruses Scaling Concepts in Polymer Physics Human coronaviruses 229E and NL63: close yet still so far Structural basis of RNA recognition by the SARS-CoV-2 nucleocapsid phosphoprotein A systematic review of SARS-CoV-2 vaccine candidates The nucleocapsid protein of SARS-CoV-2: a target for vaccine development Transmissible gastroenteritis coronavirus packaging signal is located at the 5' end of the virus genome Identification and characterization of a coronavirus packaging signal Identification and functional analysis of the SARS-CoV-2 nucleocapsid protein Evaluation of a recombinationresistant coronavirus as a broadly applicable, rapidly implementable vaccine platform NTD) specifically binds the transcriptional regulatory sequence (TRS) and melts TRS-cTRS RNA duplexes Electron microscopy studies of the coronavirus ribonucleoprotein complex Analysis of multimerization of the SARS coronavirus nucleocapsid protein Activation of AP-1 signal transduction pathway by SARS coronavirus nucleocapsid protein Nucleocapsid protein-dependent assembly of the RNA packaging signal of Middle East respiratory syndrome coronavirus Characterization of a critical interaction between the coronavirus nucleocapsid protein and nonstructural protein 3 of the viral replicase-transcriptase complex An interaction between the nucleocapsid protein and a component of the replicase-transcriptase complex is crucial for the infectivity of coronavirus genomic RNA The distribution of the flora in the alpine zone Considering mutational meltdown as a potential SARS-CoV-2 treatment strategy A new era of virus bioinformatics Genomic RNA elements drive phase separation of the SARS-CoV-2 nucleocapsid The 2020 
race towards SARS-CoV-2 specific vaccines Functional analysis of the murine coronavirus genomic RNA packaging signal Targeting the SARS-CoV2 nucleocapsid protein for potential therapeutics using immuno-informatics and structure-based drug discovery techniques Co-infections among patients with COVID-19: The need for combination therapy with non-anti-SARS-CoV-2 agents SARS-CoV-2: vaccines in the pandemic era Structure-based stabilization of non-native protein-protein interactions of coronavirus nucleocapsid proteins in antiviral drug design Order and correlations in genomic DNA sequences. The spectral approach Safety and efficacy of an rAd26 and rAd5 vector-based heterologous prime-boost COVID-19 vaccine: an interim analysis of a randomised controlled phase 3 trial in Russia The SARS-CoV-2 nucleocapsid phosphoprotein forms mutually exclusive condensates with RNA and the membrane-associated M protein Molecular epidemiology, evolution and phylogeny of SARS coronavirus An overview of SARS-CoV-2 and animal infection Gene prediction based on DNA spectral analysis: a literature review Coronavirus genomic RNA packaging Viewing SARS-CoV-2 nucleocapsid protein in terms of molecular flexibility The coronavirus nucleocapsid is a multifunctional protein One year update on the COVID-19 pandemic: Where are we now? 
Middle East Respiratory Syndrome Coronavirus (MERS-CoV) origin and animal reservoir Transmissible gastroenteritis coronavirus genome packaging signal is located at the 5' end of the genome and promotes viral RNA incorporation into virions in a replication-independent process Cooperation of an RNA packaging signal and a viral envelope protein in coronavirus RNA packaging Supramolecular architecture of the coronavirus particle Structure and oligomerization state of the C-terminal region of the Middle East respiratory syndrome coronavirus nucleoprotein Unpacking Pandora from its box: deciphering the molecular basis of the SARS-CoV-2 coronavirus Middle East respiratory syndrome coronavirus (MERS-CoV): animal to human interaction. Pathogens and Global Health A severe case with co-infection of SARS-CoV-2 and common respiratory pathogens CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers Vaccines for SARS-CoV-2: lessons from other coronavirus strains Structures of the SARS-CoV-2 nucleocapsid and their perspectives for drug design SARS-CoV-2 nucleocapsid protein phase-separates with RNA and with human hnRNPs SARS-CoV-2 and coronavirus disease 2019: what we know so far Designing therapeutic strategies to combat severe acute respiratory syndrome coronavirus-2 disease: COVID-19 Nucleocapsid protein of SARS-CoV-2 phase separates into RNA-rich polymerase-containing condensates A contemporary view of coronavirus transcription Medical Virology: From Pathogenesis to Disease Control Unraveling the web of viroinformatics: computational tools and databases in virus research SARS-CoV-2 therapeutics: how far do we stand from a remedy? 
Bacteriophage MS2 genomic RNA encodes an assembly instruction manual for its capsid Nucleocapsid proteins from other swine enteric coronaviruses differentially modulate PEDV replication COVID-19 in human, animal, and environment: a review Structural characterization of human coronavirus NL63 N protein Surveillance of bat coronaviruses in Kenya identifies relatives of human coronaviruses NL63 and 229E and their recombination history Comparative computational analysis of SARS-CoV-2 nucleocapsid protein epitopes in taxonomically related coronaviruses n-Gram-based classification and unsupervised hierarchical clustering of genome sequences The mechanism of SARS-CoV-2 nucleocapsid protein recognition by the human 14-3-3 proteins COVID-19 update: The race to therapeutic development The coronavirus nucleocapsid protein is dynamically associated with the replication-transcription complexes Defective viral genomes are key drivers of the virus-host interaction Coronavirus biology and replication: implications for SARS-CoV-2 Jaccard index based similarity measure to compare transcription factor binding site models Coronavirus genomics and bioinformatics analysis. 
Viruses Comparative analysis of 22 coronavirus HKU1 genomes reveals a novel genotype and evidence of natural recombination in coronavirus HKU1 Subgenomic messenger RNA amplification in coronaviruses Virtual screening and dynamics of potential inhibitors targeting RNA binding domain of nucleocapsid phosphoprotein from SARS-CoV-2 The structure and functions of coronavirus genomic 3' and 5' ends Structural insight into the SARS-CoV-2 nucleocapsid protein C-terminal domain reveals a novel recognition mechanism for viral transcriptional regulatory sequences Architecture and self-assembly of the SARS-CoV-2 nucleocapsid protein Coronavirus leader RNA regulates and initiates subgenomic mRNA transcription both in trans and in cis A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the spike protein Mining of high throughput screening database reveals AP-1 and autophagy pathways as potential targets for COVID-19 therapeutics Genomic recombination events may reveal the evolution of coronavirus and the origin of SARS-CoV-2 High-resolution structure and biophysical characterization of the nucleocapsid phosphoprotein dimerization domain from the Covid-19 severe acute respiratory syndrome coronavirus 2