key: cord-0992743-srz64562 authors: El-Shehawi, Ahmed M.; Alotaibi, Saqer S.; Elseehy, Mona M. title: Genomic Study of COVID-19 Corona Virus Excludes Its Origin from Recombination or Characterized Biological Sources and Suggests a Role for HERVS in Its Wide Range Symptoms date: 2021-01-15 journal: Cytol Genet DOI: 10.3103/s0095452720060031 sha: d4eddd41dbfd06d3d3b3803b07377289c08ba4f9 doc_id: 992743 cord_uid: srz64562

The COVID-19 coronavirus has become a global pandemic, which started in December 2019 in Wuhan, China, with no confirmed biological source. Various countries reported the genomic sequences of different isolates obtained from infected patients. This allowed us to obtain the full genomic sequences of 38 isolates. Alignment of the nucleotide (nt) sequences was carried out using the Clustal Omega multiple alignment service at the EBI website. The nt sequence alignment and the phylogenetic relationships revealed that COVID-19 is a new viral strain whose biological source has not yet been detected. The expected orf pattern differed among isolates obtained from the same country or from different countries, as well as from SARS-CoV isolates and bat CoV, suggesting different virus-human interaction possibilities during infection and different severity. All isolates had the five main orfs (1ab, S, M, N, E), whereas they differed in the expected accessory orfs. With the biological source of COVID-19 undetected, we discuss the role of human endogenous retroviruses (HERVs) in regulating host cell gene expression or in encoding products that could modulate COVID-19 infection and the spectrum of its symptoms.

Coronaviruses belong to the family Coronaviridae, genus betacoronavirus, and subgenus sarbecovirus. Coronaviridae includes numerous bird and mammalian coronaviruses [1, 2]. Human-to-human transmission of a coronavirus was detected after its outbreak in Southern China in 2003 [3] [4] [5].
It was associated with severe acute respiratory syndrome (SARS) and was therefore named SARS-Coronavirus (SARS-CoV) [1, 6]. Its worldwide spread in the 2003 outbreak caused more than 8000 infections and more than 774 confirmed deaths [1]. It was detected in Himalayan palm civets [7]. Genome comparison confirmed that the civet viral isolate contained 29 nucleotides of open reading frame 10 (orf10) that were missing in most of the characterized human isolates of the 2003 outbreak [7]. This led to the suggestion that the missing nucleotides caused the transmission of the virus from civets to humans [1]. Another version of the virus (Bat-SARS-CoV) was isolated from horseshoe bats [8], with a 29-nucleotide insertion in orf8 compared to most characterized human isolates. This genomic relationship suggested a common ancestor for the civet, bat, and human SARS-CoV genomes [8]. After the SARS outbreak in 2003, bats were considered the reservoir for future human CoV pandemics [9]. In 2012, the Middle East respiratory syndrome coronavirus (MERS-CoV) was detected in Saudi Arabia [10, 11]. It is believed to have been transmitted from dromedary camels to humans [12], but its origin was also linked to bats [13]. It caused 2521 infections and 919 deaths (35%) [14]. In 2019, a novel coronavirus (COVID-19) appeared in China (Wuhan City, Hubei Province). It is believed that COVID-19 originated from fresh seafood [15, 16]. This coronavirus was able to transmit from human to human [17, 18]. It has spread to 193 countries, with more than 10 million confirmed infections and more than 500 000 confirmed deaths [19]. Analysis of the full COVID-19 genome showed that it is similar to betacoronaviruses, yet different from the previous SARS-CoV and MERS-CoV [15]. COVID-19 clustered with Bat_SARS-CoV in a separate group of sarbecovirus [15].
A genome study of COVID-19 and the bat SARS-CoV (isolate BatCoV RaTG13) revealed that, despite the genetic similarity between COVID-19 and RaTG13, RaTG13 is not the exact variant that led to the outbreak in China. However, COVID-19 could have originated from bats. This study also confirmed that COVID-19 did not result from recombination and is not a mosaic [14]. Bioinformatics analysis of the nucleotide sequence of the COVID-19 genome isolated from patients revealed that COVID-19 has 89% nt identity with a bat coronavirus (Bat SARS-like-CoVZXC21) and 82% with SARS-CoV. Analysis of the amino acid sequences of the expected orfs of COVID-19 showed that it clustered with bat, civet, and human SARS-CoV. Yet, unlike other coronaviruses, its orf3b produces a shorter protein and its orf8 encodes a secreted protein, which leaves the source of COVID-19 undetectable [20]. The interaction between the COVID-19 spike (S) protein and its host receptor, angiotensin-converting enzyme 2 (ACE2), was investigated based on similar information obtained from SARS-CoV. The amino acid (aa) sequence of the COVID-19 S protein, including the receptor-binding domain (RBD) that interacts with ACE2, is similar to that of SARS-CoV. This supports the view that COVID-19 uses ACE2 as its receptor and has greater affinity for the ACE2 of humans and other animals, explaining its capability of infecting human cells and of human-to-human transmission [21]. The question now is where COVID-19 came from and how similar the isolates from different patients and different countries are. The wide spectrum of symptoms of the virus, from no symptoms to death, raises a second key question. These fundamental questions need to be answered for a better understanding of the origin, transmission, and severity of the virus. In this study, we investigated the similarity of the nucleotide sequences of 38 COVID-19 isolates from 6 countries to evaluate the differences among them.
Similarity among COVID-19 isolates at the level of the nt sequence and of the predicted orfs was investigated. The role of human endogenous retroviruses (HERVs) in the wide range of COVID-19 symptoms is also discussed.

All complete genome nt sequences of COVID-19 and SARS-CoV isolates were obtained from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). Isolates included 17 from China, 10 from USA, 5 from Japan, 2 from Hong Kong, 2 from Taiwan, 1 from South Korea, and 1 from Australia (Table 1). The sequence of the first reported COVID-19 isolate from China (HZ-1, MT039873.1) was used in a BLAST search to determine the identity of its sequence with other sequences reported from China or other countries in the nucleotide database. The nt sequences of the isolates were aligned using the Clustal Omega (ClustalO) multiple alignment service (https://www.ebi.ac.uk/Tools/msa/clustalo/). The phylogenetic tree of the isolate sequences was constructed using the same ClustalO service. Nucleotide SNPs were detected manually in the aligned sequences. The expected orfs of each COVID-19 isolate were obtained from the NCBI graphics view of the nucleotide accession at the NCBI nucleotide database website (https://www.ncbi.nlm.nih.gov/nuccore).

and Other Corona Viruses

The first reported Chinese sequence of COVID-19 (MT039873.1) was used in a BLAST search. This search revealed high identity to the other 38 COVID-19 isolates (Table 1). These included 16 other reported sequences from China, 11 from USA, 5 from Japan, 2 from Hong Kong, 2 from Taiwan, and 1 from Australia. The identity of these isolates to the Chinese isolate was high, ranging from 99.91 to 100% (Table 1), with query coverage ranging from 99 to 100%. Interestingly, the first reported Chinese case showed 96.11% identity and 99% coverage with the Chinese BatCoV-RaTG13 (MN996532.1) isolate, the closest identity in this study.
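The manual SNP detection described above amounts to a column-by-column comparison of aligned rows. The following is a minimal sketch of that comparison, not the authors' procedure: the function name `find_differences` and the toy sequences are hypothetical, and it assumes the two rows come from the same multiple alignment, with `-` marking gaps.

```python
from typing import List, Tuple


def find_differences(reference: str, isolate: str) -> Tuple[List[Tuple[int, str, str]], List[int]]:
    """Compare two rows of one alignment and report substitutions and deleted columns.

    Positions are 1-based alignment columns. Columns where the reference row
    itself has a gap (insertions in the isolate) are skipped in this sketch.
    """
    if len(reference) != len(isolate):
        raise ValueError("aligned rows must have the same length")
    snps: List[Tuple[int, str, str]] = []
    deletions: List[int] = []
    pairs = zip(reference.upper(), isolate.upper())
    for column, (ref_base, iso_base) in enumerate(pairs, start=1):
        if ref_base == iso_base or ref_base == "-":
            continue
        if iso_base == "-":
            # Every gapped column is listed separately, so a multi-base
            # deletion appears as several consecutive columns.
            deletions.append(column)
        else:
            snps.append((column, ref_base, iso_base))
    return snps, deletions


# Toy example with made-up sequences (not taken from any isolate).
snps, deletions = find_differences("ATGCGTAC", "ATGTG-AC")
print(snps, deletions)
```

Running the same comparison for every isolate against the reference row gives a per-isolate list of changes that could then be tabulated.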
More importantly, its identity to the closest SARS-CoV isolate (AY395003.1) was 82.34%, with 88% query coverage (Table 1). The phylogenetic relationship among the 38 COVID-19 isolates reported from different countries showed random clustering, without any noticeable country-specific grouping of isolates from China or any other country on the various clades of the phylogenetic tree (Fig. 1). Clade A has 1 Chinese isolate. Clade B has 2 Chinese isolates. Clade C has 14 isolates: 1 from Australia, 3 from USA, 6 from China, 1 from Taiwan, 2 from Japan, and 1 from Korea. Clade D has 3 isolates: 2 from China and 1 from USA. Clade E has 18 isolates: 7 from USA, 6 from China, 2 from Hong Kong, and 3 from Japan (Fig. 1). This random distribution of isolates from the same country, specifically the Chinese isolates, indicated that they belong to the same strain.

of COVID-19 Isolates

Using BLAST search, the first reported Chinese COVID-19 isolate had 17.66% difference from the closest SARS-CoV isolate and 3.89% difference from the closest bat coronavirus isolate (Table 1). Similarly, alignment of the COVID-19 and SARS-CoV isolates as one group resulted in tremendous differences in the nt sequence spread all over the genome; therefore we investigated the nucleotide SNPs within the COVID-19 isolates and within the SARS-CoV isolates. The 38 COVID-19 isolates and the 3 SARS-CoV isolates were compared as separate groups. Among the 38 COVID-19 isolates, 108 nucleotide changes (103 SNPs and 5 deletions) were detected (Table 2). Seven Chinese isolates did not have any SNPs, whereas the other isolates had different numbers of SNPs, ranging from 1 to 9 (Table 2). The Korean isolate SNU01 came on top with 9 SNPs, followed by USA isolate USA-IL1, USA isolate USA-IL1, and the Chinese isolate IPBCAMS-WH-02 with 8, 7, and 6 SNPs, respectively (Table 2). All Japanese isolates had between 3 and 5 SNPs. The nucleotide SNPs were distributed between transitions (66) and transversions (37).
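The split of SNPs into transitions and transversions follows the standard definition: a transition exchanges two purines (A, G) or two pyrimidines (C, T), and any other substitution is a transversion. The sketch below illustrates that rule. The helper names and the example substitutions are hypothetical, and only the counts 66 and 37 come from the text.

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref_base: str, alt_base: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref_base, alt_base = ref_base.upper(), alt_base.upper()
    valid = PURINES | PYRIMIDINES
    if ref_base not in valid or alt_base not in valid or ref_base == alt_base:
        raise ValueError(f"not a simple substitution: {ref_base}>{alt_base}")
    # Same chemical class on both sides means a transition.
    same_class = (ref_base in PURINES) == (alt_base in PURINES)
    return "transition" if same_class else "transversion"


def count_classes(substitutions: Iterable[Tuple[str, str]]) -> Counter:
    """Tally substitution classes over (reference, alternative) base pairs."""
    return Counter(classify_substitution(ref, alt) for ref, alt in substitutions)


# Made-up substitutions, for illustration only.
counts = count_classes([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")])
print(counts["transition"], counts["transversion"])

# Transition/transversion ratio implied by the counts reported in the text.
reported_ratio = 66 / 37
print(round(reported_ratio, 2))
```

With the reported counts, transitions outnumber transversions by a ratio of about 1.78.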
The number of detected SNPs indicated that the base substitution rate for all studied COVID-19 isolates was $103/1135284 = 9.07 \times 10^{-5}$. A similar alignment among three SARS-CoV isolates (DQ182595.1, China; AY323977.2, Italy; AY310120.1, Germany) revealed that the nucleotide sequence of the Chinese isolate (DQ182595.1) had 99.97 and 99.95% identity with the Italian (AY323977.2) and German (AY310120.1) isolates, respectively. Nucleotide sequence alignment resulted in 12 SNPs and 1 deletion among the three SARS-CoV isolates (Table 2), indicating a base substitution rate of $12/89197 = 12.22 \times 10^{-5}$ among SARS-CoV isolates. This appears to be higher than the SNP rate in the COVID-19 isolates, possibly because of the low number of isolates used.

Five main orfs are usually produced by all coronavirus isolates: the orf1ab polyprotein, orfS, orfN, orfM, and orfE. Another seven orfs have been reported for various isolates: the orf1a polyprotein, orf3a, orf6, orf7a, orf7b, orf8, and orf10 (Table 3). Usually, polyprotein 1ab and orf1a are processed into smaller accessory orfs (Table 4). The accessory orfs are not produced in all coronavirus isolates. We investigated the expected orfs of different isolates from the same country or from different countries to check whether different coronavirus isolates differ in their expected orf pattern, although they have similar genome sizes and high identity in their genome nucleotide sequences (Tables 1, 2). Interestingly, the orf pattern produced by isolates from the same country or from different countries differed greatly (Table 5, Fig. 2). All COVID-19, SARS-CoV, and BatCoV-RaTG13 isolates have the five main orfs (1ab, S, E, M, N). Also, all of these isolates have orf3a except the Chinese isolate WHU01 (MN988668.1). This isolate is expected to produce only the five main orfs, the minimum number of orfs detected in this study.
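Comparing expected orf patterns across isolates is essentially a set comparison: the orfs shared by every isolate form the core, and the remainder are isolate-specific accessory orfs. The sketch below illustrates this with three isolates. Only the WHU01 entry (five main orfs) is stated directly in the text above. The HZ-1 and Yunnan-01 sets are assumptions inferred from the orf counts given in the text (10 orfs, and those 10 plus orf1a and orf7b), not values read from Table 5 or Fig. 2.

```python
from typing import Dict, FrozenSet, List, Set

MAIN_ORFS: FrozenSet[str] = frozenset({"orf1ab", "S", "E", "M", "N"})
OTHER_ORFS: FrozenSet[str] = frozenset(
    {"orf1a", "orf3a", "orf6", "orf7a", "orf7b", "orf8", "orf10"}
)

# Expected orf pattern per isolate. HZ-1 and Yunnan-01 are ASSUMED sets,
# inferred from the counts given in the text rather than from the tables.
patterns: Dict[str, Set[str]] = {
    "WHU01": set(MAIN_ORFS),
    "HZ-1": set(MAIN_ORFS | (OTHER_ORFS - {"orf1a", "orf7b"})),
    "Yunnan-01": set(MAIN_ORFS | OTHER_ORFS),
}


def shared_orfs(orf_patterns: Dict[str, Set[str]]) -> Set[str]:
    """Orfs expected in every isolate."""
    return set.intersection(*orf_patterns.values())


def accessory_by_isolate(orf_patterns: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Orfs each isolate has beyond the shared core, sorted by name."""
    core = shared_orfs(orf_patterns)
    return {name: sorted(orfs - core) for name, orfs in orf_patterns.items()}


print(sorted(shared_orfs(patterns)))
print(accessory_by_isolate(patterns))
```

Under these assumed sets the shared core equals the five main orfs, and WHU01 has no accessory orfs at all.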
Only two Chinese isolates (Wuhan-Hu-1 and Yunnan-01) of the 38 COVID-19 isolates had orf1a, which is expected in the three SARS-CoV isolates and the BatCoV-RaTG13 isolate (Fig. 2). On the other hand, another Chinese isolate (Yunnan-01, MT049951.1) is expected to produce orf1a and orf7b besides the 10 orfs expected in isolate HZ-1 (Fig. 2). In addition, the expected orf pattern of the Chinese isolate WIV02 (MN996527.1) is similar to that of isolate Yunnan-01 except the (Fig. 2).

The high identity (99.91 to 100%) in nucleotide sequence among COVID-19 isolates from various countries or the same country (Table 1) and their random clustering on the phylogenetic tree (Fig. 1) indicated that the reported COVID-19 isolates from different countries are highly similar and belong to one COVID-19 strain. Also, the difference between COVID-19 and SARS-CoV (17.66%) or between COVID-19 and the bat coronavirus isolate BatCoV-RaTG13 (3.89%) distinguishes COVID-19 as a novel viral strain, with a different genome context, that has not been identified before. In addition, the low differences in nt sequence indicated by the nt SNPs among COVID-19 isolates, and their distinction from SARS-CoV and bat coronavirus, support the same idea. Interestingly, the collective base substitution rate for the studied isolates was $9.07 \times 10^{-5}$. The base substitution rate of RNA viruses is the number of changed bases per cellular infection (generation). This is very difficult to determine because it is not known how many generations (infections) these isolates had gone through before they were sequenced; therefore this number is an overestimate of the SNP rate in the studied strains, because they should have gone through a huge number of infections before being isolated from patients with symptoms. RNA viruses have mutation rates from $1 \times 10^{-6}$ to $1 \times 10^{-4}$ [22] [23] [24]. Our overestimated mutation rate of COVID-19 is still within the range of RNA virus mutation rates, indicating that COVID-19 is a new viral strain.
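The rate and percent-difference figures discussed above are simple arithmetic on counts and identities reported in the text. The sketch below reproduces them. The function names are hypothetical, and the denominators 1135284 and 89197 are taken as given from the text without assuming how they were derived.

```python
from typing import Tuple

# Range of RNA virus mutation rates quoted in the text [22-24].
RNA_VIRUS_RANGE: Tuple[float, float] = (1e-6, 1e-4)


def substitution_rate(n_changes: int, total_bases: int) -> float:
    """Changed bases divided by the total number of bases compared."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return n_changes / total_bases


def percent_difference(percent_identity: float) -> float:
    """Percent difference implied by a BLAST percent identity."""
    return round(100.0 - percent_identity, 2)


def within_range(rate: float, bounds: Tuple[float, float] = RNA_VIRUS_RANGE) -> bool:
    """True if the rate lies inside the quoted mutation-rate range."""
    low, high = bounds
    return low <= rate <= high


covid_rate = substitution_rate(103, 1_135_284)   # 103 SNPs in the COVID-19 group
sars_rate = substitution_rate(12, 89_197)        # 12 SNPs in the SARS-CoV group

print(f"COVID-19 rate: {covid_rate:.2e}, within range: {within_range(covid_rate)}")
print(f"SARS-CoV quotient: {sars_rate:.2e}")
print(percent_difference(96.11), percent_difference(82.34))
```

The COVID-19 quotient reproduces the reported 9.07 × 10^-5. The direct quotient 12/89197 evaluates to about 1.35 × 10^-4, slightly above the printed SARS-CoV value, so one of the printed SARS-CoV figures may have been rounded or entered differently. The percent differences come out as 3.89 (RaTG13) and 17.66 (SARS-CoV).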
COVID-19 isolates showed differences in the expected orf pattern despite their highly similar genomes, suggesting a high level of expected complexity of the COVID-19 genome and its host cells. This is in agreement with previous reports. Production of extra orfs besides the main orfs by different retroviruses has been reported previously. Human endogenous retrovirus K (HERV-K) produces two variant proteins (np9, rec) from its full sequence or from the 292 bp deficient gene, respectively [25]. Our results are in agreement with results reported in several other studies, which indicated that COVID-19 is a novel coronavirus and did not originate from other previously existing strains [15]. Similarly, it was reported that COVID-19 is not a mosaic virus, nor did it originate from recombination events [14]. Along the same lines, a third study revealed that COVID-19 had 89% nt identity with a bat coronavirus (Bat SARS-like-CoVZXC21) and 82% with SARS-CoV. Its orf3b produces a shorter protein and its orf8 encodes a secreted protein, leaving the source of COVID-19 undetectable [20]. Therefore, the most probable scenario is that this strain was transmitted from an unknown organism and developed the ability to infect humans and to transmit from human to human [16]. Based on this scenario, future studies are needed to screen a wide range of animals that come in contact with humans in search of the possible source of this viral strain, COVID-19. On the other hand, in the absence of its biological source, the possibility that it is synthetic and became public through a leak from unknown biological facilities cannot be ruled out at this time. This possibility is supported by the detection of a unique isolate reported in 2004. The sequence of a new SARS-CoV strain was reported in 2004 and filed by the Centre National de la Recherche Scientifique (CNRS), Institut Pasteur, and Universite Paris Diderot as a patent at the European Patent Office (Patent no. EP1694829B1).
This strain was isolated from a patient from Hanoi, Vietnam. The sequence of this strain was not deposited in the nucleotide database or anywhere else except in the patent itself. When we blasted the nt sequence of this strain against the nucleotide database, it returned the SARS-CoV Urbani isolate icSARS-MA (Acc no. MK062180.1) as the closest sequence, with only 89.65% identity, indicating its difference from the SARS-CoV isolates reported at that time and consequently from any other reported coronavirus or COVID-19 isolates.

Interaction with Human Biology

It is well known that COVID-19 has a wide range of symptoms in humans, ranging from no symptoms to death. The valid question here is: what makes people differ in their response to COVID-19 infection? Based on the distinction of the COVID-19 genome from SARS-CoV and bat CoV, the unique characteristics of COVID-19, and the similarity among COVID-19 isolates at the nt level, some possible scenarios could be suggested for the discrepancies among humans in their response to infection. In addition to the age and health of the host, some genomic scenarios are summarized in the following sections based on current studies of human endogenous retroviruses (HERVs). HERVs are DNA sequences that originated from recurrent integrations of earlier exogenous retroviruses [26, 27]. HERVs are one type of highly conserved transposable element (TE). TEs and HERVs make up 40 and 8% of our genome, respectively [28]. HERVs were first detected in the human genome in the 1970s [29]. HERVs are classified into three main groups, I (gammaretrovirus- and epsilonretrovirus-like), II (betaretrovirus-like), and III (spumaretrovirus-like), based on their phylogenetic relationships [30, 31]. Their integration allowed the vertical transmission of retroviral genomes along with the human genome across generations [32].

CYTOLOGY AND GENETICS Vol. 54 No. 6 2020 EL-SHEHAWI et al.

[Figure: six genome maps with position axes running from 1 to 29833, 29903, 29881, 29825, 29855, and 29706 nt.]

HERVs are inserted in the genome through the reverse transcription of viral RNA into double-stranded DNA (provirus) by the viral reverse transcriptase [33], followed by integration of the provirus into the host genome by the viral integrase and other host proteins [34]. Integrated copies can be activated and become an active infection. After integration, the proviral DNA produces mRNA that either encodes various viral proteins or is reverse transcribed by the viral reverse transcriptase into proviral DNA capable of a new integration cycle. HERVs have a structure similar to that of exogenous retroviruses, comprising two long terminal repeats (LTRs) with the internal viral genes gag (matrix protein), pro-pol (protease, reverse transcriptase, and integrase), and env (envelope) [32]. Besides these main retroviral proteins, some retroviruses produce extra proteins. Accordingly, the env gene of HERV-K encodes two different protein variants (np9, rec) from its full sequence or from the 292 bp deficient variant, respectively [25]. HERVs have several different impacts on their host cells. Production of RNA and proteins from HERV sequences could have a role in the regulation of human genes and could modulate the immunity of the host [35, 36]. Although most TEs have been silenced by the accumulation of mutations or by hypermethylation, some of them have been domesticated and are still active in human biology [37]. For example, syncytins are a group of env proteins produced by different HERVs in mammals [38].
In the human genome, two env genes, of HERV-W and HERV-FRD, are involved in the production of the env proteins syncytin-1 and syncytin-2, respectively [39]. They are involved in placental syncytiotrophoblast development and homeostasis [39, 40] and in maternal immune tolerance to the growing fetus [41], respectively. At the DNA level, a huge number of HERVs are integrated in the human genome and function as binding sites for transcription factors, alternative promoters, or splicing signals for cellular genes [37, [42] [43] [44] [45] [46], which indicates their role in the regulation of transcription and in human genome development. This could lead to upregulation, downregulation, suppression, or tissue-specific splicing of cellular genes [42, 45, 47]. They also represent a plethora of cis-acting regulatory elements that function as binding sites for the host trans-acting elements. The interplay between both types of elements makes up the gene regulation network in a cell [48, 49]. Along the same lines, solitary LTRs, remnants of complete HERVs, can also regulate host gene expression. Recurrent insertions of HERVs cause insertional mutations in the target genes and allelic homologous recombination [32]. For example, recombination between homologous HERV-I elements on chromosome Y causes a microdeletion in the azoospermia factor and consequently male infertility [50]. In addition, HERVs can produce non-coding RNAs (ncRNAs), including microRNA and long ncRNA, which furnish recognition motifs for RNA-binding proteins or modulate the function of transcription factors [32]. Accordingly, HERV ncRNAs that have sequence similarity to human miRNA work as RNA sponges that bind other miRNAs involved in the post-transcriptional regulation of gene expression [51]. This was the case in the regulation of embryonic stem cells, in which an ncRNA (HPAT5) produced by HERVH interacts with the let-7 miRNA sequence [52].
Furthermore, a HERV may produce a protein that functions as a regulator of host gene expression during the virus life cycle and provides cellular functions during the cycle [36]. An interesting example is the HERV Gag and Rec proteins, which are involved in the stability and translation of host cell mRNA [36]. For example, HML2 Rec was able to bind to 1600 nt mRNAs of host embryonic cells and regulate their translation by the ribosome in an early developmental process [53]. Along the same lines, the Arc Gag-like protein produced by the Ty3/gypsy retrotransposon was suggested to coordinate brain neural cell communication, indicating its role in nervous system development [54, 55]. Specifically, Arc has been proposed to form capsids that carry mRNA between neurons via extracellular vesicles, to be translated in the target neuron [56]. A group of HERVs spread across the human genome can form a coordinated regulatory network that simultaneously regulates the expression of many host genes involved in the same pathway [35, 47, 57]. For example, more than 30% of the human genome binding sites for the protein p53 were distributed in the genome by HERV sequences and became the target network of the p53 protein [58], leading to human genome plasticity and cellular networking. An interesting example of this plasticity is the MHC (major histocompatibility complex) locus, which has been shown to have heavy integration of HERVs, leading to its tremendous plasticity and hyper genetic variability [59]. Accordingly, HERVK (HERVKC4) was integrated in the 9th intron of the human complement C4A gene, leading to its hypervariation [60, 61]. One vital example is the role of HERVs in the interferon (IFN) antiviral pathway of innate immunity in the induction of the adaptive immune response [62]. HERV integrations were involved in the development of the IFN network of IFN-inducible transcription enhancers in various mammalian genomes [35].
It was shown that deletion of a HERV sequence near an IFN gene suppressed the linked pathway [35]. Also, sequences of the HERV LTRs function as promoter or enhancer sites in response to IFN-based activation [63]. The HERVK LTRs that have two IFN-stimulated response elements (ISREs) were induced by the IFN cascade in response to inflammation [64]. HERV products are self molecules that can be tolerated by the human immune system or can induce human immunity, giving rise to autoimmune diseases. The innate immune pathways induced by HERV products are the same ones that function against exogenous viral infection [65]. In humans, Toll-like receptors (TLRs) and cytosolic pattern recognition receptors (cytPRRs) can recognize HERV products and lead to the induction of an immune response. This was reported in the case of autoimmune diseases and cancer [66, 67]. Recognition of viral molecules by innate immune receptors induces inflammatory molecules, including IFN, cytokines, and chemokines, invoking the antiviral response. This group of molecules activates the adaptive immune response through the activation of T and B cells. Both immune responses are required to fight exogenous viral infection, and the activated response is finally stopped after infection. In the case of HERV products, their continuous presence in the host cells provokes chronic stimulation of the host immune response, resembling the chronic stimulation of the immune response in autoimmune and inflammatory diseases caused by exogenous retroviral molecules [67] [68] [69] [70]. The antiviral response activated by HERV products causes a vicious circle in which the produced inflammatory molecules and epigenetic dysregulation further upregulate HERV expression [65, 71, 72]. Also, peptides produced from HERVs were implicated in the suppression of the immune response. These include the env proteins that have the conserved immunosuppressive domain (ISD) of retroviral env proteins.
For example, the ISD from HERVs functions in maternal immune tolerance during pregnancy [38, 41]. It is well documented that HERVs can contribute negatively or positively during exogenous viral infection [67]. Infection by some viruses, including HIV, herpesviruses, and influenza, changed HERV expression [73] [74] [75]. In this regard, exogenous infection could cooperatively upregulate HERV expression and increase the immune response [67]. HERV products could also play a protective role against exogenous viral infection [36]. For example, production of HERV antisense RNA provides protection against exogenous infection by viruses with complementary RNA [65, 76]. Some studies reported that HERV products function as pathogen-associated molecular patterns (PAMPs), which are able to induce receptors of the host defense system [49, 65]. In addition, some of their products mimic antigens and stimulate specific B and T cells [77, 78]. This explains the role of HERVs in autoimmune and inflammatory diseases. On the other hand, they have a role in suppressing the immunity of host cells, as they have been involved in maternal immune suppression and in protection from excessive immune activation [79, 80]. HERVs could modulate infection and symptoms in the case of exogenous COVID-19 infection in different possible ways. First, HERVs or their products could compromise the immune system and facilitate the infection and penetration of the virus into human cells. Also, individuals with high levels of the ACE2 receptor could be an easy target for the virus, especially those with high blood pressure and various types of stress. Second, different isolates of the virus can use the host cell to produce different protein sets (orf patterns) that exploit the host cells and compromise the host immune system with different efficiencies. This will result in a spectrum of disease severity and possibly death.
In this study, different isolates from the same country (China) or from different countries are expected to produce various orf patterns. Some of the produced orfs include the enzyme responsible for methylation of the 2' carbon of the ribose sugar of viral RNA. This modification of the viral RNA makes it undetectable by the host immune system and allows it to infect human cells effectively [81]. Third, HERVs could produce protein products that complement the viral set of orfs in its entry, infection, replication, packaging, and integration in the human genome. In addition, partial proviral genomes from previous integrations can produce some enzymes required for the replication of viral isolates that do not have the ability to infect. For example, an animal isolate that does not have the capability to infect humans could transfer to a human and find in this individual's genome some proviral genes that complement the animal strain, making it infectious and able to cause symptoms. Fourth, the coronavirus genome can only produce its effective proteins for viral reproduction with -1 ribosomal slippage at the translation start site. HERVs may produce proteins or miRNAs that modulate the translation start for the ribosome, changing the pattern of COVID-19 orfs in different human hosts. This leads to a different course of symptoms and severity of COVID-19 infection. Long-term studies of COVID-19 and other retroviruses that attack humans are urgently needed to validate all of these possibilities, for future safety and better management of future pandemics like COVID-19. Also, intensive studies are needed to survey human populations (especially the elderly and the immunocompromised) for their HERV loads and to link this to their predisposition to other autoimmune diseases, cancer, and their risk of exogenous viral infection. Our results conclude that COVID-19 did not originate from a known biological source or from other previously characterized strains.
COVID-19 isolates used in this study showed high similarity at the nt sequence level, yet they differed greatly in the expected orf pattern from their similar genomes. The most probable scenario is that this strain was transmitted from an unknown organism and has, or has developed, the ability to infect human cells as well as to transmit from human to human. On the other hand, in the absence of its biological source, the possibility that it is synthetic and became public from unknown biological facilities cannot be ruled out at this time.

This article does not contain any studies involving animals or human participants performed by any of the authors.

Authors have contributed equally to this manuscript.

History and recent advances in coronavirus discovery
Coronaviruses: an overview of their replication and pathogenesis
Identification of a novel coronavirus in patients with severe acute respiratory syndrome
A novel coronavirus associated with severe acute respiratory syndrome
Coronavirus as a possible cause of severe acute respiratory syndrome
Coronavirus infections more than just the common cold
Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China
Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats
Origin and evolution of pathogenic coronaviruses
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
Middle East respiratory syndrome coronavirus (MERS-CoV): a perpetual challenge
Middle East respiratory syndrome coronavirus infection in dromedary camels in Saudi Arabia
Close relative of human Middle East respiratory syndrome coronavirus in bat
Full-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of emergence as a result of a recent recombination event
A novel coronavirus from patients with pneumonia in China
Another decade, another coronavirus
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health-the latest 2019 novel coronavirus outbreak in Wuhan, China
Novel Coronavirus (2019-nCoV) Situation Report-162
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
Receptor recognition by novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS
Rates of evolutionary change in viruses: patterns and determinants
Viral mutation rates
Ecological and evolutionary processes shaping viral genetic diversity
A novel gene from the human endogenous retrovirus K expressed in transformed cells
The evolutionary dynamics of human endogenous retroviral families
Classification and characterization of human endogenous retroviruses
Initial sequencing and analysis of the human genome
Induction of retrovirus particles in human testicular tumor (Tera-1) cell cultures: an electron microscopic study
Use of endogenous retroviral sequences (ERVs) and structural markers for retroviral phylogenetic inference and taxonomy
Classification and nomenclature of endogenous retroviral sequences (ERVs): problems and recommendations
Human endogenous retroviruses are ancient acquired elements still shaping innate immune responses
HIV-1 reverse transcriptase still remains a new drug target: structure, function, classical inhibitors, and new inhibitors with innovative mechanisms of actions
Past and future. Current drugs targeting HIV-1 integrase and reverse transcriptase-associated ribonuclease H activity: single and dual active site inhibitors
Regulatory evolution of innate immunity through co-option of endogenous retroviruses
Co-option of endogenous viral sequences for host cell function
Endogenous viruses: insights into viral evolution and impact on host biology
Paleovirology of "syncytins," retroviral env genes exapted for a role in placentation
Syncytin is a captive retroviral envelope protein involved
Expression of HERV-WEnv glycoprotein (syncytin) in the extravillous trophoblast of first trimester human placenta
Placental syncytins: genetic disjunction between the fusogenic and immunosuppressive activity of retroviral envelope proteins
Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions
Evolution of the mammalian transcription factor binding repertoire via transposable elements
Widespread contribution of transposable elements to the innovation of gene regulatory networks
Transposable elements are the primary source of novelty in primate gene regulation
Systematic identification and characterization of regulatory elements derived from human endogenous retroviruses
The contribution of transposable elements to the evolution of regulatory networks
CTRL+INSERT: retrotransposons and their contribution to regulation and innovation of the transcriptome
The double-edged sword of (re)expression of genes by hypomethylating agents: from viral mimicry to exploitation as priming agents for targeted immune checkpoint modulation
Two long homologous retroviral sequence blocks in proximal Yq11 cause AZFa microdeletions as a result of intrachromosomal recombination events
Endogenous miRNA sponge lincRNA-RoR regulates Oct4, Nanog, and Sox2 in human embryonic stem cell self-renewal
The primate-specific noncoding RNA HPAT5 regulates pluripotency during human preimplantation development and nuclear reprogramming
Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells
Arc/Arg3.1 is a postsynaptic mediator of activity-dependent synapse elimination in the developing cerebellum
Structural basis of arc binding to synaptic proteins: implications for cognitive disease
The neuronal gene arc encodes a repurposed retrotransposon gag protein that mediates intercellular RNA transfer
Regulatory activities of transposable elements: from conflicts to benefits
Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53
Retroelements in the human MHC class II region
Identification of a novel HERV-K(HML10): comprehensive characterization and comparative analysis in non-human primates provide insights about HML10 proviruses structure and diffusion
Detection of retroviral antisense transcripts and promoter activity of the HERV-K(C4) insertion in the MHC III region
Dynamic evolution of immune system regulators: the history of the interferon regulatory factor family
Association of endogenous retroviruses and long terminal repeats with human disorders
NF-kB and IRF1 induce endogenous retrovirus K expression via interferon-stimulated response elements in its 5' long terminal repeat
Activation of the innate immune response by endogenous retroviruses
HERV envelope proteins: physiological role and pathogenic potential in cancer and autoimmunity
Type W human endogenous retrovirus (HERVW) integrations and their mobilization by L1 machinery: contribution to the human transcriptome and impact on the host physiopathology
The role of molecular mimicry and other factors in the association of human endogenous retroviruses and autoimmunity
Viruses as potential pathogenic agents in systemic lupus erythematosus
Identification of a HERV-K env surface peptide highly recognized in Rheumatoid Arthritis (RA) patients: a cross-sectional case-control study
Endogenous retrovirus-K promoter: a landing strip for inflammatory
transcription factors? Epigenetic control of human endogenous retrovirus expression: focus on regulation of long-terminal repeats (LTRs Transactivation of elements in the human endogenous retrovirus W family by viral infection Transcriptional derepression of the ERVWE1 locus following influenza A virus infection Microarray analysis reveals global modulation of endogenous retroelement transcription by microbes Innate immune detection of microbial nucleic acids DNA-demethylating agents target colorectal cancer cells by inducing viral mimicry by endogenous transcripts Potential molecular mimicry between the human endogenous retrovirus W family envelope proteins and myelin proteins in multiple sclerosis From ancestral infectious retroviruses to bona fide cellular genes: role of the captured syncytins in placentation Human endogenous retrovirus envelope proteins target dendritic cells to suppress T-cell activation Thiel V Ribose 2'-O-methylation provides a molecular signature for the distinction of self and non-self mRNA dependent on the RNA sensor Mda5