key: cord-0902618-in48pd8t
authors: Antonio, E. C.; Meireles, M. R.; Bragatte, M. A. S.; Vieira, G. F.
title: Can the protection be among us? Previous viral contacts and prevalent HLA alleles could be avoiding an even more disseminated COVID-19 pandemic.
date: 2020-06-17
journal: nan
DOI: 10.1101/2020.06.15.20131987
sha: 60090b31d8890af2882d798ad5076010da720b8a
doc_id: 902618
cord_uid: in48pd8t

Background: COVID-19 is bringing scenes of sci-fi movies into real life, and it seems to be far from over. Infected individuals exhibit variable severity, with no consistent relation between the number of cases and mortality, suggesting that the genetic constitution of populations and previous cross-reactive immune contacts are involved in individual disease outcomes.

Methods: A clustering approach was used to investigate the involvement of human MHC alleles in individual outcomes. HLA frequencies from affected countries were used as input for the Hierarchical Clusterization Analysis (HCA). The resulting groups were compared regarding their death rates. To prospect the T cell targets in SARS-CoV-2 and, by consequence, the epitopes that are conferring cross-protection in the current pandemic, we modeled 3D structures of HLA-A*02:01 presenting immunogenic epitopes from SARS-CoV-1, retrieved from the Immune Epitope Database. These pMHC structures were also compared with models containing the corresponding SARS-CoV-2 epitopes, with alphacoronavirus sequences, and with a panel of immunogenic pMHC structures contained in CrossTope.

Findings: The combined use of HLA-B*07, HLA-B*44, HLA-DRB1*03, and HLA-DRB1*04 allowed the clustering of affected countries presenting similar death rates, based only on their allele frequencies. SARS-CoV HLA-A*02:01 epitopes were structurally investigated. The analysis reveals molecular conservation between SARS-CoV-1 and SARS-CoV-2 peptides, enabling the use of previously described SARS-CoV-1 experimental epitopes to inspect the actual targets that are conferring cross-protection.
Alpha-CoVs and, impressively, other viruses involved in human infections share fingerprints of immunogenicity with SARS-CoV peptides.

Interpretation: Wide-scale HLA genotyping in COVID-19 patients should improve prognosis prediction. Structural identification of previous triggers paves the way for herd immunity examination and wide-spectrum vaccine development.

Funding: This work was supported by the National Council for Scientific and Technological Development (CNPq) and the National Council for the Improvement of Higher Education (CAPES).

COVID-19 is bringing scenes of sci-fi movies into real life. Considering the more than seven million cases already diagnosed and the growing number reported in the past few months, it seems to be far from over. The Coronaviridae family also includes other causative agents of respiratory syndromes in humans, the SARS-CoV and MERS-CoV viruses 1 [...] Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University 3 (retrieved on June 04, 2020). Nevertheless, there is no regular correlation between the number of diagnosed individuals and mortality among the affected countries, as shown in Table 1. The United States, for example, shows the highest number of cases, which may be due to a better diagnosis rate, which in turn influences the death rate calculation. In this context, Russia has the third-highest number of cases but a low mortality rate (one death per 107 cases). Other countries among those with the most infected populations also present a low proportion of deaths among diagnosed individuals, such as Germany (1:21). These values are quite different from those found in Italy (1:7) and France (1:6.6), for example. It becomes even more alarming when we realize that each number is a human life. (Values calculated from data retrieved on May 26, 2020.)

What elements could be dictating these differences? Social and cultural differences? The diverse starting point of government measures?
Or the genetic background of the different populations and SARS-CoV-2 strains around the world, causing distinct immune response profiles in critical patients? The present work will discuss this last question. In the current pandemic, the first clues about the importance of the cellular response are [...] This great variability enables animal species to cope, especially at the population level, with the comparable mutational potential of pathogens such as viruses. Briefly, the MHC locus is involved in the processing and presentation on cell surfaces of small peptides derived from intracellular proteins, allowing the immune system to discriminate self from non-self and thus eliminate pathogens and tumors. At this point, a critical question is raised. We mention the MHC locus intentionally, not only MHC genes, considering that other proteins fundamental to producing true epitopes are encoded in this genomic region. Many studies aiming to prospect tumoral or vaccine targets focus their predictions only on the MHC ligandome of the pathogen or cancer sample 5. The potential to bind to different alleles does not by itself make a peptide a T cell epitope. Full triggering of the T cell synapse demands additional requirements, such as epitope immunodominance and pMHC:TCR physicochemical complementarity. Thus, in silico analyses that consider additional steps of the antigen processing pathway, and comparisons between putative targets and known immunogenic epitopes, could perform better at prospecting actual T cell epitopes, as in the current situation where no previous information is available. Playing a central role in this process, the HLAs (the peptide-presenting molecules in humans) constitute a link between immune system surveillance and the status of the intracellular space.
It is known that different HLA alleles can bind and present a specific viral protein region (peptide) with different efficiencies, which could lead to either susceptibility or resistance to the disease caused by a specific pathogen, depending on the type of MHC that the individual possesses 6, 7. Thus, some HLA allotypes are unable to present some immunodominant epitopes (those responsible for initiating T cell responses) from a specific virus, precluding detection and response by the immune system. The opposite also occurs: some HLAs have an improved potential to present optimal viral targets, allowing control of the infection 8 [...] where we are dealing with a vast number of alleles (variables), as indicated above, each contributing minor effects that could be important at a global level. The virus's ability to infect and spread in this pandemic should be related to a shared global characteristic, as mentioned before. So, it is clear that the infection propagates regardless of populations' habits and origins, but the lethality varies between them. Important information arises from looking at the most frequent and the rarest MHC alleles in the most affected regions. The frequencies of these different HLA alleles vary widely among the countries with the highest number of cases 12 [...] Additional negative symbols were artificially attributed to DRB1*03 values, intending to highlight its allele frequencies in the HCA. This action resulted from a previous visual inspection that correlated its subtle prominence with countries presenting a worse prognosis, classifying the allele as a potential determinant factor. [...] affects countries population. Italy and France are the two most closely related countries, forming a ternary group with Spain. They have, respectively, a mortality index for closed cases of 1:5.24 and 1:3.41, while in Spain it is 1:7.87.
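The text does not define the "mortality index for closed cases" explicitly. One plausible reading, stated here purely as an assumption, is that a ratio 1:N means one death per N cases, with "closed cases" counting only deaths and recoveries. Under that assumption, the symbols D, R, and A below (deaths, recovered, and active cases) are introduced for illustration only:

```latex
% Assumed definitions (not given in the source):
%   D = deaths, R = recovered, A = active cases
\[
  N_{\text{closed}} = \frac{D + R}{D},
  \qquad
  N_{\text{all}} = \frac{D + R + A}{D}
\]
\[
  N_{\text{all}} - N_{\text{closed}} = \frac{A}{D} \;\ge\; 0
\]
```

If this reading is right, adding active cases to the denominator can only increase N, so the apparent death-per-case rate 1:N can only go down when active cases are taken into account.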
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted June 17, 2020.

Taking into account the active cases, the death-per-case rate decreases to 1:7.06, 1:6.60, and 1:10.04 in Italy, France, and Spain, respectively (all values were calculated considering numbers retrieved by May 23) 3. This group comprises three relevant nations considered former significant pandemic epicenters, presenting high death proportions relative to the number of reported cases. Fortunately, the death and case rates in these countries have been falling, probably due to protective measures improving their prognosis. A related cluster includes the United States, the United Kingdom, and Brazil. Currently, these countries are in a prominently alarming situation, with high mortality indexes and an elevated level of contagion. Its [...] Table 1). Trying to establish an initial validation of the observed allele effects, we included Chile, Portugal, and Peru (low death rate indexes) and Romania in four independent analyses (Table S1 and Figure S1). Portugal (Figure [...] An alternative hypothesis is that the T cell response governing SARS-CoV-2 recovery and clearance acts through alleles that are prevalent in populations, such as the HLA-A*02 supertype, which could be a good sign. Initially, this seems to be contradictory reasoning, but in
light of the mortality rates in other SARS causative viruses (around 10% or above) 16 [...] In our analysis, most of the recovered HLA-A*02:01 epitopes described in past epidemics seem to present conserved sequences compared to their equivalents in SARS-CoV-2 (Table 2), as described in other works 20, 21. Nevertheless, in the case of discordant peptide sequences, a simple sequence comparison between SARS-CoV and SARS-CoV-2 may provide little information about the impact of these alterations on the immunogenic potential of the current putative viral targets. The analysis of structural and physicochemical features of the peptide-MHC (pMHC) surfaces that contact the T cell receptors considers the combined molecular elements involved in T cell activation. Such structural investigation has already demonstrated its potential, explaining differential immunogenicity among epitopes from diverse viral strains or tumoral origins [...] immunodominant epitope M1 58-66 GILGFVFTL, from the Influenza virus. Both epitopes share 2/9 amino acids, reinforcing the importance of structural investigation to prospect cross-reactive targets. In our analysis, to make it evident that the comparisons did not result from structural biases, we provide a small sample of unrelated models from our CrossTope database (www.crosstope.com), containing TCR-interacting surfaces from HLA-A*02:01 structures presenting immunogenic epitopes (Figure S2). These first comparisons uncovered interesting scenarios. Firstly, even those peptides with sequence alterations in SARS-CoV-2, but without molecular modifications in the TCR-interacting surfaces, remain good candidates for immunization strategies.
The examples where SARS-CoV-2 peptides showed subtle alterations compared to their corresponding SARS-CoV targets could indicate a change in immunogenicity or even a complete loss of it. However, in this situation they resemble highly immunogenic epitopes from other viral organisms. This evidence emphasizes the need to investigate this new face of the immunogenic prism. The observed molecular similarity between pMHC complexes containing peptides from SARS-CoV-2 and Influenza viruses leads us to another attractive hypothesis: a universal pre-existing cytotoxic response, present in populations all over the world and triggered by previous infections. The first suspects to investigate were past contacts with HCoVs from the alphacoronavirus genus (229E, OC43, and NL63). Epidemiological studies report that around 15-30% of common colds are caused by this group of pathogens 24. Even considering that they are viruses with a zoonotic origin, we would expect many spillover events throughout the history of the human species, maintaining regular contact with our species 25. A codon usage analysis involving BCoV and HCoV-OC43 suggests that an ancestral coronavirus could have been present as early as 200 kyr ago, in early humans 26. Therefore, we would expect that this group of pathogens has also contributed to shaping our current immune repertoire. Guided by this supposition, we compared the immunogenic SARS-CoV epitopes with the corresponding 229E, OC43, and NL63 protein sequences, looking for shared elements involved in triggering immunogenicity. This analysis provided a clear example in which sequence comparison might hide shared patterns that are not detectable by amino acid identity alignment. In Table 2, the identities are around 50%, a value usually not detected by regular sequence-based prospection methods.
Nevertheless, when we inspected pMHC structural models harboring peptides from SARS-CoV-1, SARS-CoV-2, and alphacoronaviruses in HLA-A*02:01, intriguing fingerprints arose. A similar electrostatic distribution and topography on the TCR-interacting surfaces of the pMHCs can be observed among SARS-1, SARS-2, and other coronavirus members (229E and OC43) (Figure 3A). It is important to stress that both the 229E and OC43 peptides were predicted to be strong binders to HLA-A*02:01 (data not shown), strengthening their potential as actual triggers of SARS-CoV cross-reactivity. The peptides from the beta-CoVs are quite similar (GLMWLSYFL and GLMWLSYFV, for SARS-1 and 2, respectively). However, the peptides from 229E (LVMWVMYFA) and OC43 (IIMWIVYFV) are not so conspicuously identical, evidencing the importance of the structural investigation. Furthermore, other peptides derived from alpha-CoVs presented a lower similarity in physicochemical features to immunogenic SARS-CoV-1 epitopes (data not shown). They are also potential targets to investigate. A work recently deposited in bioRxiv showed that 34% of SARS-CoV-2 seronegative healthy individuals presented S-reactive CD4 T cells. These cells react almost exclusively with the C-terminal epitope region, characterized by higher homology with the spike protein of human endemic common cold coronaviruses. None of the putative cross-reactive epitopes is pointed out, nor is a structural basis hypothesized, but the finding reinforces our propositions 27. Evidence of many CD4+/CD8+ cross-responses against many SARS-CoV proteins in unexposed individuals was extensively described in Grifoni et al. 20, without the specific identification of sequence targets.
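The sequence side of the peptide comparison above can be made concrete with a position-by-position identity count over the four 9-mers quoted in the text. The following is a minimal sketch, not the authors' procedure: the function names `positional_identity` and `identity_table` are hypothetical, and the structural (electrostatic and topographic) comparison is not reproduced here.

```python
from typing import Dict, Tuple

# Peptide sequences exactly as quoted in the text.
PEPTIDES: Dict[str, str] = {
    "SARS-CoV-1": "GLMWLSYFL",
    "SARS-CoV-2": "GLMWLSYFV",
    "229E": "LVMWVMYFA",
    "OC43": "IIMWIVYFV",
}


def positional_identity(a: str, b: str) -> Tuple[int, int]:
    """Return (matching positions, peptide length) for two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches, len(a)


def identity_table(reference: str = "SARS-CoV-1") -> Dict[str, Tuple[int, int]]:
    """Compare every other peptide in PEPTIDES against the reference peptide."""
    ref_seq = PEPTIDES[reference]
    return {
        name: positional_identity(ref_seq, seq)
        for name, seq in PEPTIDES.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, (matches, length) in identity_table().items():
        print(f"SARS-CoV-1 vs {name}: {matches}/{length} identical positions")
```

With these sequences, SARS-CoV-2 matches SARS-CoV-1 at 8 of 9 positions, while 229E and OC43 each match at 4 of 9, which is why identity alone gives little hint of the shared surface features.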
Taking into consideration the previous identification of a target from Influenza virus (M1 58-66 GILGFVFTL) similar to a SARS-CoV epitope, the next step was to scrutinize the CrossTope T cell epitope database (http://crosstope.com/ 28) in search of structural fingerprints common to SARS-CoV-1 epitopes and unrelated viruses. As demonstrated before, given that SARS-CoV-1 and SARS-CoV-2 peptides are structurally related, the comparison with experimentally described epitopes from SARS-CoV-1 seems appropriate. The results of these comparisons were remarkable. When we looked at pHLA-A*02:01 structures, not only did the previous example of the M1 58-66 IAV epitope match SARS epitopes, but 13 out of 20 CD8+ coronavirus epitopes retrieved from the Immune Epitope Database have counterparts among targets from commonly circulating viruses with respect to the depicted molecular features. Three examples of SARS-CoV peptides presenting striking structural similarity to viral epitopes are presented in Figure 3B. The matched targets are derived from viruses belonging to three different families (Herpesviridae, Poxviridae, and Flaviviridae). These targets would probably not be investigated in an approach using regular methods, given that no apparent sequence identity is shown by any of these structurally related sequences. The remaining comparisons can be viewed in Figure S3. Importantly, when we consider these image correspondences, the comparisons with pMHCs from unrelated viruses were more conspicuous than those with alpha-CoVs. This seems a paradox, given the natural expectation (considering their phylogenetic proximity) of a closer relation between alpha- and beta-CoVs. In COVID-19, the aetiological agent, SARS-CoV-2, is well established.
Nevertheless, [...] for beta-CoVs with the potential to spill over from their natural hosts to humans. Peptide sequences from these putative HCoV pathogens could be structurally compared in search of cross-reactive T cell targets to be used in the eventual occurrence of a new coronavirus spillover. Such a preventive strategy could shorten the steps needed to develop immunotherapeutic methods, helping to avoid the emergence of new pandemics. The second line of investigation resulted in an even more attractive hypothesis. Previous infections with alpha-CoVs and unrelated common viruses may be generating memory T cells against SARS-CoV-2 in a significant portion of the population. This pool of cells in different individuals may be providing a universal immunogenic shield against SARS-CoV-2 and, probably, against other potential emergent viruses. This mechanism seems to be built by regular cross-reactive contacts. Moreover, the defense appears to be associated with prevalent alleles, which probably present peptides harboring fingerprints of immunogenicity shared by epitopes from pathogens that regularly infect humans.

We thank the National Council for Scientific and Technological Development (CNPq) and the National Council for the Improvement of Higher Education (CAPES) for their support.

The authors declare no competing interests.

To start the investigation, we picked the ten countries most affected in number of COVID-19 cases, according to the dataset 3 (retrieved on May 26, 2020), plus China, which was the first pandemic epicenter.
The alleles with a frequency equal to or higher than 10% (in at least one of the selected countries) were examined. Some HLAs were rejected due to a lack of information for some countries or because of their similar frequencies across nations, which could interfere with the subsequent use of the clusterization method. We searched the IEDB database for experimental coronavirus epitopes, finding 20 immunogenic epitopes restricted to the HLA-A*02:01 allele. Those epitopes were derived from the N (nucleocapsid) protein, the Surface (spike) protein, and the Membrane glycoprotein. To [...] We selected viruses from the Alphacoronavirus genus (HCoV-229E, HCoV-NL63, and HCoV-OC43) and checked whether these strains possess epitopes that could generate surfaces similar to those observed in the SARS viruses. To this end, we inspected the protein sequences of the different alphacoronaviruses and searched for amino acid identity with immunogenic targets described for SARS-CoV-1, aiming to verify whether they present structural conservation across different coronaviruses. All selected epitopes were modeled in the HLA-A*02:01 receptor with the DockTope tool. The 3D models were used as input to the PyMOL software to calculate their electrostatic surfaces, in order to verify how much each amino acid change impacted the overall charge distribution in those models. We compared the pMHC complexes of both viruses with several other complexes deposited in the CrossTope database, which harbors previously described epitope sequences of immunogenic targets from the IEDB. All images of SARS-CoV and CrossTope HLA-A*02:01 epitopes were used to perform the comparisons. We used ImageJ to extract the Red-Green-Blue (RGB) values corresponding to the electrostatic surface regions that contact the T cell receptor.
These regions were inferred by contact calculations on ternary crystal structures comprising p:HLA-A*02:01:TCR. The regions present colors that can be more positive, neutral, or negative with regard to the electric charges, and this color information was converted into numeric values of mean, mode, and standard deviation to feed the hierarchical clusterization program pvclust, run in RStudio, to perform the pairwise comparison and group the most similar surfaces together.

[...] human populations. pMHC models were compared in terms of topography and electrostatic distribution. In A, two beta (SARS-CoV) and two alpha (229E and OC43) coronaviruses show similar electrostatic distribution and topography despite their sequence divergences. In B, a panel of three SARS-CoV peptides is compared to pMHCs containing viral epitopes from members of other families. An unexpected structural similarity arises, regardless of their lack of sequence identity and phylogenetic relationship. Electrostatic calculations are represented as negative (red) and positive (blue) charges. Sequence identity is depicted as red letters in the peptide sequences.
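The surface-comparison step described in the methods (RGB values from the TCR-contacting regions summarized as mean, mode, and standard deviation, then grouped by similarity) can be illustrated with a small sketch. This is not the authors' pipeline, which used ImageJ and the R package pvclust. The sketch below uses only the Python standard library, replaces hierarchical clustering with a plain nearest-neighbor search on Euclidean distance, and runs on invented pixel values. The names `surface_features`, `most_similar`, and `EXAMPLE_SURFACES` are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0-255

# Hypothetical pixel samples; real input would come from the electrostatic images.
EXAMPLE_SURFACES: Dict[str, List[Pixel]] = {
    "pMHC_A": [(200, 30, 30), (200, 40, 40), (180, 30, 30)],  # mostly red
    "pMHC_B": [(190, 30, 30), (200, 40, 40), (190, 30, 30)],  # mostly red
    "pMHC_C": [(30, 30, 200), (40, 40, 210), (30, 30, 200)],  # mostly blue
}


def surface_features(pixels: Sequence[Pixel]) -> List[float]:
    """Summarize a region as mean, mode and standard deviation per color channel."""
    features: List[float] = []
    for channel in range(3):
        values = [pixel[channel] for pixel in pixels]
        features.append(statistics.mean(values))
        features.append(statistics.mode(values))
        features.append(statistics.pstdev(values))
    return features


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(surfaces: Dict[str, Sequence[Pixel]], query: str) -> str:
    """Return the name of the surface whose feature vector is closest to the query."""
    feats = {name: surface_features(px) for name, px in surfaces.items()}
    candidates = [name for name in feats if name != query]
    return min(candidates, key=lambda name: euclidean(feats[query], feats[name]))


if __name__ == "__main__":
    print(most_similar(EXAMPLE_SURFACES, "pMHC_A"))
```

A nearest-neighbor search only pairs each surface with its closest match. A full hierarchical clustering with bootstrap support, as pvclust provides, would additionally indicate how stable each grouping is.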
IEDB-experimental positive epitopes from coronaviruses A | Peptides in other coronaviruses B
Epitope ID | Description | Antigen Name | Organism Name | Experimental Assays (Pos. [...]

A new coronavirus associated with human respiratory disease in China
A pneumonia outbreak associated with a new coronavirus of probable bat origin
An interactive web-based dashboard to track COVID-19 in real time
HLA-DRB1 the notorious gene in the mosaic of autoimmunity
Machine Learning for Cancer Immunotherapies Based on Epitope Recognition by T Cell Receptors
Major histocompatibility complex genomics and human disease
HLA and Associated Important Diseases
The Compatibility Gene: How Our Bodies Fight Disease, Attract Others, and Define Our Selves by Daniel M. Davis
The proximal origin of SARS-CoV-2
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
Epidemiological and genetic correlates of severe acute respiratory syndrome coronavirus infection in the hospital with the highest nosocomial infection rate in Taiwan in 2003
Update: Gold-Standard Data Classification, Open Access Genotype Data and New Query Tools
How the SARS coronavirus causes disease: host or organism?
Association of Human-Leukocyte-Antigen Class I (B*0703) and Class II (DRB1*0301) Genotypes with Susceptibility and Resistance to the Development of Severe Acute Respiratory Syndrome
HLA-B, HLA-C and KIR improve the predictive value of IFNL3 for Hepatitis C spontaneous clearance
Understanding the T cell immune response in SARS coronavirus infection
HLA-A*0201 T-cell epitopes in severe acute respiratory syndrome (SARS) coronavirus nucleocapsid and spike proteins
Virus-specific memory CD8 T cells provide substantial protection from lethal severe acute respiratory syndrome coronavirus infection
T-Cell Epitopes in Severe Acute Respiratory Syndrome (SARS) Coronavirus Spike Protein Elicit a Specific T-Cell Immune Response in Patients Who Recover from SARS
A Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2