key: cord-0826559-bgh729s5 authors: Cárdenas‐Conejo, Yair; Liñan‐Rico, Andrómeda; García‐Rodríguez, Daniel Alejandro; Centeno‐Leija, Sara; Serrano‐Posada, Hugo title: An exclusive 42 amino acid signature in pp1ab protein provides insights into the evolutive history of the 2019 novel human‐pathogenic coronavirus (SARS‐CoV‐2) date: 2020-03-20 journal: J Med Virol DOI: 10.1002/jmv.25758 sha: dece103d46a48ee00e1dc57e4dd01de0742066b0 doc_id: 826559 cord_uid: bgh729s5

The city of Wuhan, Hubei province, China, was the origin of a severe pneumonia outbreak in December 2019, attributed to a novel coronavirus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]), which had caused a total of 2761 deaths and 81109 cases as of 25 February 2020. SARS-CoV-2 belongs to the genus Betacoronavirus, subgenus Sarbecovirus. The polyprotein 1ab (pp1ab) has not been studied thoroughly because it is similar to that of other sarbecoviruses. In this short communication, we performed a phylogenetic-structural sequence analysis of the pp1ab protein of SARS-CoV-2. The analysis showed that the viral pp1ab has not changed in most isolates over the course of the outbreak, but, interestingly, a deletion of 8 aa in the virulence factor nonstructural protein 1 was found in a virus isolated from a Japanese patient who did not display critical symptoms. While comparing the pp1ab protein with those of other betacoronaviruses, we found a 42 amino acid signature that is present only in SARS-CoV-2 (AS-SCoV2). Members of clade 2 of sarbecoviruses have traces of this signature. The location of AS-SCoV2 in the acidic domain of the papain-like protein of SARS-CoV-2 and bat-SL-CoV-RaTG13 led us to suggest that the novel 2019 coronavirus probably emerged by genetic drift from bat-SL-CoV-RaTG13. The implications of this amino acid signature for the structural arrangement and function of the papain-like protein are worth exploring.
The polyprotein 1ab (pp1ab) is the largest protein of coronaviruses; through proteolytic cleavage it is divided into 16 mature nonstructural proteins (nsps). The nsps are involved in replication and transcription of the viral genome and are responsible for the cleavage of the polyprotein, which makes them attractive antiviral drug targets.[7] Because pp1ab of SARS-CoV-2 shows no remarkable differences from that of other sarbecoviruses,[3,5] it has not been thoroughly analyzed. Despite the high similarity between pp1ab proteins, it may be possible to identify distinguishable regions that serve as molecular signatures for the specific detection of virus strains or for tracking their evolutionary history. In this short communication, we present a comparative sequence analysis of the pp1ab protein of SARS-CoV-2.
The analysis was performed using phylogenetic-structural sequence analysis, in which sequence comparisons are made in phylogenetic order.[8] Thus, pp1ab proteins from SARS-CoV-2 isolates were compared first; then the polyprotein of SARS-CoV-2 was contrasted with those from clade 2 of sarbecoviruses; finally, the protein was compared with those from clades 1 and 3. Protein alignments were performed using the alignment tool [...]

Following this approach, we first compared the pp1ab proteins of 144 SARS-CoV-2 isolates from patients around the world (Table S1). The analysis showed that most pp1ab proteins have not changed; only six amino acid changes were detected (Table S2). We counted an amino acid change only if two or more sequences shared the same mutation. One of these mutations (L3606F), located at position 37 of the nsp6 protein (L37F), is shared by ten sequences from viruses isolated in China, USA, France, Hong Kong, Italy, and Singapore (Table S2). Coronavirus nsp6 is a transmembrane protein that associates with nsp3 and nsp4 to form the organelle-like replicative structures (double-membrane vesicles).[9] Prediction of transmembrane helix (TMH) segments in nsp6 showed that L37F does not alter the secondary structure of the adjacent transmembrane domains (Figure S1). In fact, the L37F mutation is predicted to lie outside the membrane, in an unstructured coil segment (32SLFFFL/FYEN) that connects the first (residues 12-31) and second (residues 41-60) TMHs (Figure S1). Strikingly, position 37 of nsp6 is a Val residue that is conserved in all analyzed sarbecoviruses (Data S1) except SARS-CoV-2 (Leu).
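The counting rule used above (a change is reported only when at least two sequences share it) and the renumbering from pp1ab to mature-protein coordinates (L3606F in pp1ab, L37F in nsp6) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequences, and the gap handling are assumptions made for illustration, and the offset of 3569 is inferred only from the L3606F/L37F correspondence stated in the text.

```python
from collections import Counter


def shared_changes(reference, isolates, min_count=2):
    """Return {change: count} for substitutions seen in >= min_count isolates.

    Hypothetical helper: `reference` and every isolate are assumed to be
    pre-aligned strings of equal length. Changes are named like 'L3606F'
    using 1-based alignment positions; gap characters ('-') are ignored.
    """
    counts = Counter()
    for seq in isolates:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the same length")
        for pos, (ref_aa, aa) in enumerate(zip(reference, seq), start=1):
            if aa != ref_aa and aa != "-" and ref_aa != "-":
                counts[f"{ref_aa}{pos}{aa}"] += 1
    return {change: n for change, n in counts.items() if n >= min_count}


def to_mature_numbering(change, offset):
    """Renumber a substitution from polyprotein to mature-protein coordinates.

    `offset` is the number of polyprotein residues preceding the mature
    protein (assumed to be 3569 for nsp6, from L3606F <-> L37F).
    """
    ref_aa, pos, aa = change[0], int(change[1:-1]), change[-1]
    return f"{ref_aa}{pos - offset}{aa}"


# Toy data (not real pp1ab sequences): L4F occurs twice, E10K only once.
reference = "MESLVPGFNE"
isolates = ["MESLVPGFNE", "MESFVPGFNE", "MESFVPGFNE", "MESLVPGFNK"]

print(shared_changes(reference, isolates))          # {'L4F': 2}
print(to_mature_numbering("L3606F", offset=3569))   # L37F
```

With `min_count=1` the singleton E10K would also be reported, which shows how the two-sequence threshold filters out changes seen in a single isolate.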
The replacement of the aliphatic Leu residue by the aromatic Phe residue at this conserved position of nsp6 (L37F) therefore probably has functional implications: although Leu and Phe are both hydrophobic residues, Phe can also engage in cation-π interactions, which could affect protein-protein interactions in the L37F mutant. The structural impact of this mutation cannot be determined, because no experimental data are available in the Protein Data Bank for homology modeling of nsp6 using single or multiple templates (eg, SWISS-MODEL server, Phyre2, etc.). Interestingly, a virus isolated from a Japanese male (GISAID: EPI_ISL_407084) without critical pneumonia (patient status as described in the GISAID database) has a deletion of eight amino acids, at positions 32 to 39 of pp1ab (nsp1) (Table S2). Since nsp1 is a virulence factor that inhibits host gene expression,[10] [...] (Figure 1). We found the same results when we compared the pp1ab of members of the genus Betacoronavirus (Table S3; Data S1). [...] (Figure 1). These findings suggest that the pp1ab of SARS-CoV-2 is more closely related to the pp1ab of bat-SL-CoV-RaTG13 than to the pp1ab of coronaviruses isolated from Chinese pangolins. Since the region that encodes the pp1ab protein represents about 71% of the SARS-CoV-2 genome, we suggest that it is less likely that the novel human coronavirus arose directly from the viruses isolated from pangolins. The first reports on the genetic characterization of SARS-CoV-2 suggested that this virus has a recombinant origin.[2] Our results indicate that, most probably, no recombination event occurred in the first half of the viral genome (ORF1ab). Under this view, an alternative explanation for the origin of SARS-CoV-2 is that bat-SL-CoV-RaTG13, collected 6 years ago, is the progenitor of SARS-CoV-2, which has evolved by genetic drift since the collection date before infecting humans. Three observations support the hypothesis: 1.
The high pairwise identity between pp1ab of SARS-CoV-2 and pp1ab of bat-SL-CoV-RaTG13 is preserved throughout its length (Figure S2). 2. The exclusive AS-SCoV2 of the novel coronavirus is conserved in bat-SL-CoV-RaTG13 (Figure 1). 3. The high pairwise identity (96.3%) shared by SARS-CoV-2 and bat-SL-CoV-RaTG13 is preserved across the whole genome; only a region of slight dissimilarity appears in the ORF of the spike protein.[6] Although the origin of SARS-CoV-2 does not appear to involve recent genetic recombination, at least one acquisition of genetic material must have given rise to AS-SCoV2. Since members of subgenus [...]

FIGURE 1 Alignment of pp1ab proteins from sarbecoviruses. Pp1ab proteins of sarbecoviruses (clade 1: green; clade 2: blue; clade 3: red) were aligned using the multiple sequence alignment program MAFFT v7 and manually edited to maximize coincidences. The figure shows the AC domain of pp1ab. Conserved residues are highlighted in yellow. Conserved AS-SCoV2 residues are highlighted in blue. The N-terminal region of the papain-like protein is represented above the alignment. AS-SCoV2, amino acid signature of SARS-CoV-2; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2

[...] The star indicates the probable acquisition of ancestral AS-SCoV2. The evolutionary distances were computed using the number of differences method.
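The pairwise identity figures and the number of differences method mentioned above rest on the same simple count of differing aligned sites. The Python sketch below shows that generic calculation, plus a windowed variant for checking whether identity is preserved along the length of an alignment. It is not the authors' software: the function names and toy sequences are hypothetical, and skipping sites with a gap in either sequence (pairwise deletion) is an assumption, since the source does not state how gaps were treated.

```python
def n_differences(a, b):
    """Count differing sites between two aligned sequences.

    Sites with a gap ('-') in either sequence are skipped (assumed
    pairwise deletion). Returns (differences, compared_sites).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sites = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        sites += 1
        if x != y:
            diffs += 1
    return diffs, sites


def percent_identity(a, b):
    """Percent of compared (non-gap) sites that are identical."""
    diffs, sites = n_differences(a, b)
    if sites == 0:
        raise ValueError("no comparable sites")
    return 100.0 * (sites - diffs) / sites


def windowed_identity(a, b, window, step):
    """Percent identity in successive windows along the alignment.

    Returns a list of (0-based window start, percent identity) pairs.
    A window made only of gap sites raises ValueError.
    """
    return [
        (start, percent_identity(a[start:start + window], b[start:start + window]))
        for start in range(0, len(a) - window + 1, step)
    ]


# Toy sequences (not real pp1ab data): one substitution at position 6.
seq_a = "MKTAYIAKQR"
seq_b = "MKTAYLAKQR"

print(n_differences(seq_a, seq_b))                       # (1, 10)
print(percent_identity(seq_a, seq_b))                    # 90.0
print(windowed_identity(seq_a, seq_b, window=5, step=5)) # [(0, 100.0), (5, 80.0)]
```

A flat profile from `windowed_identity` corresponds to identity that is preserved throughout the sequence, whereas a local dip would mark a dissimilar region such as the one reported in the spike ORF.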
1. A new coronavirus associated with human respiratory disease in China
2. Homologous recombination within the spike glycoprotein of the newly identified coronavirus may boost cross-species transmission from snake to human
3. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
4. A novel coronavirus from patients with pneumonia in China
5. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
6. A pneumonia outbreak associated with a new coronavirus of probable bat origin
7. The SARS-coronavirus papain-like protease: structure, function, and inhibition by designed antiviral compounds
8. Evolution of light-regulated plant promoters
9. Biogenesis and dynamics of the coronavirus replicative structures
10. Coronavirus nonstructural protein 1 is a major pathogenicity factor: implications for the rational design of coronavirus vaccines
11. Nuclear magnetic resonance structure of the N-terminal domain of nonstructural protein 3 from the severe acute respiratory syndrome coronavirus

The authors declare that there are no conflicts of interest.

Yair Cárdenas-Conejo http://orcid.org/0000-0002-0190-244X