key: cord-0328634-i0l962ck authors: Pierros, Vasileios; Kontopodis, Evangelos; Stravopodis, Dimitrios J.; Tsangaris, George Th. title: Unique Peptide Signatures Of SARS-CoV-2 Against Human Proteome Reveal Variants’ Immune Escape And Infectiveness date: 2021-10-04 journal: bioRxiv DOI: 10.1101/2021.10.03.462911 sha: 0aaf9ad1093d843a6aca564c3ba4deb99c84ee8b doc_id: 328634 cord_uid: i0l962ck SARS-CoV-2 pandemic has emerged the necessity of the identification of sequences sites in viral proteome appropriate as antigenic sites and treatment targets. In the present study, we apply a novel approach for deciphering the virus-host organism interaction, by analyzing the Unique Peptides of the virus with a minimum amino acid sequence length defined as Core Unique Peptides (CrUPs) not of the virus per se, but against the entire proteome of the host organism. The result of this approach is the identification of the CrUPs of the virus itself, which do not appear in the host organism proteome. Thereby, we analyzed the SARS-CoV-2 proteome for identification of CrUPs against the Human Proteome and they are defined as C/H-CrUPs. We found that SARS-CoV-2 include 7.503 C/H-CrUPs, with the SPIKE_SARS2 being the protein with the highest density of C/H-CrUPs. Extensive analysis indicated that the P681R mutation produces new C/H-CrUPs around the R685 cleavage site, while the L452R mutation induces the loss of antigenicity of the NF9 peptide and the strong(er) binding of the virus to ACE2 receptor protein. The simultaneous existence of these mutations in variants like Delta results in the immune escape of the virus, its massive entrance into the host cell, a notable increase in virus formation, and its massive release and thus elevated infectivity. Covid-19 pandemic has emerged the urgent necessity of the identification of sequence sites of the SARS-CoV-2 viral proteome that serve as appropriate treatment targets and antigenic sites suitable for production of therapeutic vaccines. In our previous studies, we have defined as Unique Peptides (UPs) the peptides that their amino acid sequence appears only in one protein across a given proteome. We have also introduced the term of Core Unique Peptides (CrUPs), which are the peptides with a minimum amino acid sequence length that appear only in one protein across a given proteome, being thus a unique signature for the particular protein identification (Alexandridou et al., 2009; Kontopodis et al., 2019) . Thereby, to map the UP landscape of a proteome under examination, we have herein developed a novel and advanced bioinformatics tool including big data analysis (Supp. Materials and Methods). Its employment to analysis of the 20.430 reviewed Homo sapiens proteins resulted in the identification of 7.263.888 CrUPs, which are parts of the Human Uniquome that is defined as the total set of unique peptides belonging to the Human proteome (Kontopodis et al., 2019) . Recently, in order to elucidate the virus-host organism interaction, we designed an advanced bioinformatics approach to analyze the CrUPs of the virus against the host organism proteome. These peptides are different from the virus CrUPs per se, which are defined as the minimum amino acid sequence length peptides appeared only in one protein across the virus proteome. The virus CrUPs against the host organism proteome have two distinct properties: (a) they are unique in the virus proteome and (b) they do not exist in the host organism proteome. Based on these properties, the virus CrUPs against the host organism proteome illuminate our knowledge about the virus-host interaction, the infectiveness and the pathogenicity of the virus, and, most importantly, they can be used as antigenic and diagnostic peptides, and possible treatment targets. Furthermore, these unique peptides constitute a completely new entity of peptides able to advance our knowledge about the construction of viral and Human Uniquomes (Kontopodis et al., 2019) . Since human cells can host the SARS-CoV-2 virus, we have herein engaged our novel bioinformatics platform not only for the profiling of CrUPs in SARS-CoV-2 proteome per se, but, most importantly, for their identification against the human proteome (C/H-CrUPs). Remarkably, C/H-CrUPs can likely serve as targets for the immune response upon infection, and antigenic sites with major pharmaceutical and diagnostic potential for the successful clinical management of the Covid-19 pandemic. The SARS-CoV-2 proteome is structurally quite simple. In the UNIPROT database (version 7/2021), 16 reviewed and 75.714 unreviewed proteins have been included (Jungreis et al., 2021) . For the present study, only the 16 reviewed proteins are examined, since the unreviewed proteome components contain (among others) duplicate registrations, unverified sequences and protein fragments, which could lead to unreliable data regarding the uniqueness of a protein sequence. To recognize all the CrUPs being embraced in SARS-CoV-2 proteome against the human proteome, we in silico constructed a new, artificial, "hybrid-proteome" that contained all the reviewed human proteins (20.430 proteins) plus the one protein derived from SARS-CoV-2 viral proteome (20.431 proteins). Thus, 16 "hybrid proteomes" including the 16 SARS-CoV-2 proteins were constructed. Hence, these "hybrid proteomes" were bioinformatically searched one by one for the identification of SARS-CoV-2-specific CrUPs in human protein sequence environments (C/H-CrUPs). Strikingly, 7.503 C/H-CrUPs were detected, with 4.213 of them being presented one time in the SARS-CoV-2 proteome, 3.289 being observed two times in the viral proteome and only one peptide ("VNNATN") with a 6 amino acid length being recognized three times (Table 1 and Data S1). Data processing and analysis unveiled that C/H-CrUPs retain a length range from 4 to 9 amino acids, while longer peptides could not be identified in the SARS-CoV-2 virus proteome. Length distribution showed that the majority of C/H-CrUPs have a 6 amino acid length, whereas only one with 4 amino acids and only two with 9 amino acids C/H-CrUPs were observed (Fig 1) . The proteome of the virus constituted the β coronavirus group SARS-CoV-2, SARS-CoV and MERS-CoV were analyzed for core unique peptides against the human proteome. The CrUPs of each virus against the Human proteome were presented. The identified CrUPs fwere urther analyzed for the times by which they appeared in the virus proteome. The CrUPs density, is defined as the percentage of the total Amino Acids contained in CrUPs of each virus to the total number of the virus Amino Acids. The analysis of the SARS-CoV-2, SARS-CoV and MERS-CoV virus is presented. All the proteins viruses were analyzed and each protein is shown by its EntryID, Entry Name and Protein Name according to the UNIPTOT data base. The AA length of each protein and the CrUPs of each protein against the human proteome are shown. Density is defined as the percentage of the total Amino Acids contained in CrUPs of each protein to the total number of the protein's Amino Acids. Among all SARS-CoV-2 proteins, the SPIKE_SARS2 (P0DTC2) one (Spike) has received the greatest attention as a key element for virus attachment to the host cell, and as such it has become a principal target for therapeutic vaccine development (Papa et al., 2021; Xia 2021 Most importantly, SARS-CoV-2 Spike protein has presented a significant mutational diversity (Sanches et al., 2021; Tzou et al., 2020) . Hitherto, 9 main variants with adaptive mutations and high spread to human populations, named from Alpha to Lambda, respectively, have been thoroughly mapped and characterized. These 8 variants are divided in 39 sub-variants, while other 32 sporadic variants have also been described (Tzou et al., 2020) . To investigate the association of mutational profiling with C/H-CrUP landscaping of SARS-CoV-2 Spike protein, the 39 sub-variants together with the wild-type Spike protein (SPIKE_SARS2, P0DTC2) were suitably aligned ( Fig S3A) . This multiple alignment illustrates all the herein identified Universal Peptides (Table S2 ) and all the mutations previously announced per isolated variant ( Fig S3B) . Notably, it seems that almost all the hitherto characterized mutations are identified in regions being located outside the Universal Peptides group. Their majority are clustered in the S1 domain of Spike protein, with two critical mutations being detected in the S1-S2 bridge region, at the amino acid residue 681 that resides in proximity to the first cleavage position by Furin protease, in between the 685 th and 686 th amino acid residue ( Fig S3C) (Davidson et al., 2020; Coutard et al., 2020) . Remarkably, all the examined mutations herein prove to create new CrUPs against the human proteome compared to the wild-type Spike protein, thus indicating that the mutant virus strains need novel clinical treatments. This is an important finding, since these new C/H-CrUPs do not exist in the human proteome, but are observed exclusively in the mutant virus proteomes, thereby justifying the great attention Alpha, Delta, Kappa, Lambda and Mu variants have recently received at the worldwide level (Tzou et al., 2020) . Hraber etal., 2020). The molecular mechanism of Spike protein's proteolytic activation has been shown to play a crucial role in the selection of host species, virus binding to the ACE2 receptor, virus-cell fusion and viral infection of human lung cells (Peacock et al., 2021; Whittaker 2021; Shang et al., 2020a) . SPIKE_SARS2 (P0DTC2) contains three cleavage sites; the R 685 ↓S and R 815 ↓S positions that serve as direct targets of the Furin protease, and the T 696 ↓M position that can be recognized by the TMPRSS2 protease (Hoffmann et al., 2020a; Hoffmann et al., 2020b; Takeda, 2021) . Analysis of the wild-type C/H-CrUPs and the newly formed, mutation-induced, C/H-CrUPs in Spike protein unveiled that the mutation-driven, novel, peptides are created exclusively, around the R 685 ↓S cleavage site, by the two pathogenic mutations P681H and P681R (Table S2 ). The new CrUPs of SARS-CoV-2 spike protein (SPIKE_SARS2, P0DTC2) across the variants Alpha, Delta, Kappa and Lambda were presented. In the first column the position of the mutation in the spike protein sequence is shown. In the second column the mutation is recorded. In the third column is recorder the SARS-CoV-2 main variant in which the mutation appeared. In the fourth column the position of the first amino acid of the new C/H-CrUP created by the mutation is shown. Notably, among these four new peptides, the only one new peptide that embraces Furin's cleavage site is the "SRRRAR↓S" C/H-CrUP, which is solely generated by the P681R mutation carried by the Delta and Kappa coronavirus variants, while at the same time the peptide "PRRARSV" conserve its uniqueness even after the replacement of Proline with Arginine and its transformation to "RRRARSV" (Fig. 2) . The Furin cleavage site R 685 ↓S has been characterized as a 20 amino acid motif that corresponds to the amino acid sequence A672-S691 of the SPIKE_SARS2 (P0DTC2) protein (Fig 2) (Wu and Zhao, 2020) . The 8 amino acid sequence peptide "SPRRAR↓SV" (S680-V687) serves as the core region of the motif, while two flanking solvent-accessible regions of 8 amino acids (A672-N679) and 4 amino acids (A688-S691) long, respectively, are recognized (Takeda, 2021; Wu and Zhao, 2020) . Pro-protein Convertase (PC) Furin and/or Furin-like PCs act as sequence-specific proteases, and can cleave the Spike protein in a position recognizing the unique, and positively charged by the Arginine, motif "R-x-x-R↓S" ( Wu and Zhao, 2020) . Since Furin and/or Furin-like PCs are secreted from host cells and bacteria in the airway epithelium, while other PCs, such as the PC5/6A and PACE4, exhibit widespread tissue distribution, it is likely that their activities may be critically implicated in the SARS-CoV-2-induced damage and pathology of multiple infected organs (Örd et al., 2020) . It seems that Furin's cleavage site essentially contributes to the infection process and disease progression, and offers a powerful target for immunogenetic, antigenic and therapeutic interventions, as corroborated by the recently developed new antibody against Furin's cleavage site (Braun et al., 2019; Zahradník et al., 2021; . Most importantly, the SARS-CoV-2 Delta variant that carries the critical mutation P681R seems to be more infectious and pathogenic than the wild-type virus form, while the importance of that mutation has very recently begun to be recognized . Replacement of Proline with Arginine at position 681 causes the loss of amino acid sequence uniqueness that characterizes the wild-type "PRRARSV" C/H-CrUP and likely increases the possibility of Furin's cleavage site (core region) to be significantly stabilizing its conformation, thus facilitating a more efficient Spike protein cleavage process by the Furin protease (Whittaker, 2021; Callaway, 2021) . To the same direction, novel SLiMs, such as "SRRR", "RRR", "RRRAR" and "RRRARS", can be produced by the mutant C/H-CrUPs, which may act as specific targets of other than Furin PCs, thereby enabling the stronger (and quicker) binding of the mutant virus to its host ACE2 receptor that likely leads to a comparatively more generalized infection and massive mutant virus production (Table S3 ) (Shorthouse et al.,2021; Davey et al., 2015) . That fact seems to be evident by the dramatic increase of the total number of motifs created by the P681R mutation identified within the Human proteome (Table S3 ). Of note, the mutant C/H-CrUP-derived new SLiMs, in the SARS-CoV-2 Delta variant, could render Spike protein antigenically weak or defective, fostering it to lose its capacity to serve as antibody target promotes the virus immune escape (Davey et al., 2015; Almehdi et al., 2021 ). of the virus to the host cell surface. SARS-CoV-2 belongs to the β coronavirus group and, like SARS-CoV, uses the same cellular receptor, the Angiotensin-Converting Enzyme 2 (ACE2) (Walls et al., 2020; Wang et al., 2020) . The SARS-CoV-2 Spike protein attaches to ACE2 receptor by a Receptor-Binding Domain (RBD) defined in the Spike protein from positions F318 up to F541 (Shang et al., 2020b) . Nowadays, this region has received great attention, as it seems to be the target of antibodies against the virus and other therapeutic interventions (Chen et al., 2021; Zahradník et al., 2021; Hastie et al., 2021) . Additional studies have shown that from the amino acid residue W436 up to the Q506 one the RBD contains the Receptor-Binding Motif (RBM), which carries 12 contact positions with ACE2 (Hatmal et al., 2020) . Mutation analysis revealed that in 10 positions of the RBD region 13 mutations were described ( Fig. S3 and Table S4 ). In RBM, 10 mutations in 6 sequence positions were described in different virus variants (Table S4) , while from the 10 contact positions only the P501Y in Alpha, Beta, Gamma and Mu variants was found to be mutated (Table S5) . The most important region in RBM is the peptide NYNYLYRLF (from 448 to 456 position). This tyrosine-enriched peptide contains two contact site (Y449 and Y453) and is known as the NF9 peptide (Motozono et al., 2021) . It seems to affect antigen recognition, by being an immunodominant HLA*24:02-restricted epitope identified by CD8 + T cells. Furthermore, NF9 stimulation also increases cytokine production produced from CD8 + T cells, such as IFN-γ, TNF-α and IL-2 (Kared et al., 2021) . Analysis of C/H-CrUPs of the NF9 peptide showed that it contains 3 unique peptides ( Fig. 2D and E, and Table S6 ). Mutation analysis indicated that in the NF9 peptide the mutation L452R occurs in the variants Alpha, Delta, Iota and Kappa, while the mutation L452Q appears in the variant Lambda. Further analysis unveiled that these mutations are observed in the amino acid that resides at position 5, exactly in the middle of the peptide, creating 3 and 4 new C/H CrUPs, respectively (Table S6 ). These mutations have a dramatic effect in the uniqueness of the NF9 peptide(s). Namely, the 6 amino acid length C/H-CrUPs "NYNYLY" losses its uniqueness against the human proteome, while only by the mutation L452Q a new core unique peptide with 5 amino acid length is surprisingly created ( Fig. 2D and E, and Table S6 ). The loss of uniqueness of this peptide, which notably is located at the beginning of NF9 peptide, seems to be crucial, as it leads to the loss of the antigenic capacity of the NF9 peptide, thus evading the HLA-A24restricted immunity and inducing the immune escape of the virus. Interestingly, related studies have shown that the L452R mutation (and subsequently the newly created C/H-CrUPs herein characterized) increases the infectiveness of SARS-CoV-2, by strengthening the electrostatic interactions of this region on Spike protein with the ACE2 virus receptor (Motozono et al., 2021) . Hitherto, epidemiological data indicated that the dominant variant of SARS-CoV-2 is the Delta variant (Mlcochova et al., 2021) . Under the light of the aforementioned findings, variant's enhanced pathogenicity seems to be the outcome of the simultaneous existence (accumulation) of two critical mutations; the L452R and P681R ones, in Delta variant. The mutation L452R, through the loss of NF9 peptide uniqueness, causes virus immune escape and stronger binding of the virus to its cognate receptor, while at the same time mutation P681R facilitates the Spike protein cleavage process by different proteases, inducing a generalized infection and a massive virus release. Therefore, the Delta variant gains a significant advantage of escape from the immune system per se, as well as from the vaccination-induced immunity, together with an increased infectiveness, as a result of virus entrance into the host cell, and an increase of virus formation and its massive release. Interestingly, although mutations outside the Spike protein locus in SARS-CoV-2 coronavirus genome have not been yet completely mapped, in a systematic manner, our study also reveals novel and useful information of all the remaining (Spike protein-independent) C/H-CrUPs that seem to hold strong promise and open a new therapeutic window for the Covid-19 pandemic. Finally, the approach of virus-host unique peptide signature identification could prove a useful tool for the elucidation of virus infectiveness, prevention of virus immune escape, domination of pathogenic variants, and identification of new antigenic and pharmacological targets. A new bioinformatic tool was developed which can extract from a given proteome the Core Unique Peptides (thus creating it's Uniquome). The user can specify the min and max peptide lengths that the tool will analyze. The tool will split each protein to all possible peptides of length min to length max thus generating a very large set of peptides (for a protein of length L with a window of size W a set of C = L -W + 1 will be generated). In the next step all these peptides starting from smallest to largest will be searched against the rest of the proteome to decide whether the peptide exists on another protein or not. Since the search is for the smallest possible peptide (Core Unique Peptide) the tool will first make sure that the peptide under examination does not already contain a smaller Core Unique Peptide. This is ensured by examining if any of the already identified Core Unique Peptides of the protein is contained within the peptide under examination. All peptides that conform to these 2 rules are Core Unique Peptides. The following diagram describes the algorithm we use to identify these Core Unique Peptides. In the following figure a sliding window of 9 aminoacids is applied on O00400 ACATN_HUMAN protein generating candidate peptides VYVKNFGRR and YVKNFGRRK. Those peptides will be searched against the rest of the proteome to determine their uniqueness once we have determined that they don't already contain a smaller Core Unique Peptide. The latter is determined by examining whether an already defined Core Unique Peptide is contained within the peptide. To address the hypothesis of the current study, the aforementioned tool was expanded by developing a new feature where the user can give a reference and a target proteome. This new feature allows the tool to search all the peptides of the target proteome against the reference proteome thus creating a set of Core unique peptides of Target vs Reference proteomes. To accomplish that the tool will (like on the initial implementation) split all proteins in the target proteome to all possible peptides of length min to length max. Now instead of searching for the uniqueness of each peptide within the same proteome, it performs that search against the reference proteome. Like before the peptide under examination must not contain any smaller peptides already identified as Core Unique Peptides. The following diagram describes the algorithm we use to identify these Core Unique Peptides. For Motifs and SLiMs identification and search, the tool offers the user the ability to perform a motif search to identify possible SLiMs. User gives an N length peptide as well as the number of aminoacids that can vary in the given peptide. The tool then creates all possible combinations of peptides that can be produced by considering in each combination exactly N aminoacid(s) as unknown. Once those combinations are produced an exhaustive search using regular expressions is performed against the reference proteome to locate all possible proteins containing such peptides. To better highlight the process, if the user provides the peptide TQYILG and N=2 the following combinations will be produced: Exploration of antigenic determinants in spike glycoprotein of SARS-CoV2 and identification of five salient potential epitopes UniMaP: finding unique mass and peptide signatures in the human proteome SARS-CoV-2 spike protein: pathogenesis, vaccines, and potential therapies Furin-mediated protein processing in infectious diseases and cancer The mutation that helps Delta spread like wildfire ACE2-targeting monoclonal antibody as potent and broad-spectrum coronavirus blocker A neutralizing human antibody binds to the N-terminal domain of the Spike protein of SARS-CoV The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade Short linear motifs -ex nihilo evolution of protein regulation Characterisation of the transcriptome and proteome of SARS-CoV-2 reveals a cell passage induced in-frame deletion of the furin-like cleavage site from the spike glycoprotein Defining variant-resistant epitopes targeted by SARS-CoV-2 antibodies: A global consortium study Comprehensive Structural and Molecular Comparison of Spike Proteins of SARS-CoV-2, SARS-CoV and MERS-CoV, and Their Interactions with ACE2 SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells Resources to Discover and Use Short Linear Motifs in Viral Proteins SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes SARS-CoV-2-specific CD8+ T cell responses in convalescent COVID-19 individuals Data processing approach for the construction and evaluation of an organism's UNIQUOME with comparative analysis for the Human, Rat and Mouse Uniquomes. Paper presented at XIII. Annual Congress of the European Proteomics Association: From Genes via Proteins and their Interactions to Functions SARS-CoV-2 B.1.617.2 Delta variant replication and immune evasion SARS-CoV-2 spike L452R variant evades cellular immunity and increases infectivity The sequence at Spike S1/S2 site enables cleavage by furin and phospho-regulation in SARS-CoV2 but not in SARS-CoV1 or Furin cleavage of SARS-CoV Spike promotes but is not essential for infection and cell-cell fusion The furin cleavage site in the SARS-CoV-2 spike protein is required for transmission in ferrets Recent advances in SARS-CoV-2 Spike protein and RBD mutations comparison between new variants Alpha (B.1.1.7, United Kingdom), Beta (B.1.351, South Africa), Gamma (P.1, Brazil) and Delta (B.1.617.2, India) Flexible, Functional, and Familiar: Characteristics of SARS-CoV-2 Spike Protein Evolution Cell entry mechanisms of SARS-CoV-2 Structural basis of receptor recognition by SARS-CoV-2 SARS-CoV-2 variants are selecting for spike protein mutations that increase protein stability Proteolytic activation of SARS-CoV-2 spike protein CoV-RDB): An Online Database Designed to Facilitate Comparisons between Candidate Anti-Coronavirus Compounds Classification of intrinsically disordered regions and proteins Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2 SARS-CoV-2 spike and its adaptable furin cleavage site Furin cleavage sites naturally occur in coronaviruses Furin: A Potential Therapeutic Target for COVID-19 Domains and Functions of Spike Protein in Sars-Cov-2 in the Context of Vaccine Design SARS-CoV-2 variant prediction and antiviral drug design are enabled by RBD in vitro evolution All proteomes and proteins were obtained from: Uniprot SARS-Cov_2 wild type, variants sequences and mutations were obtained from Stanford COVID Database Motifs were taken from the Eukaryotic Linear Motif resource for Functional Sites in Proteins SLiMs containing proteins were taken from Davey lab SLiMs servers (The Institute of Cancer Research Competing interests: Authors declare that they have no competing interests Data and materials availability: All data are available in the main text or the supplementary materials