key: cord-0714761-kyx422j1
authors: Nerli, Santrupti; Sgourakis, Nikolaos G.
title: Structure-based modeling of SARS-CoV-2 peptide/HLA-A02 antigens
date: 2020-03-28
journal: bioRxiv
DOI: 10.1101/2020.03.23.004176
sha: 046766c0dda14273031e63b9899791cb44fba31c
doc_id: 714761
cord_uid: kyx422j1

As a first step toward the development of diagnostic and therapeutic tools to fight the Coronavirus disease (COVID-19), it is important to characterize CD8+ T cell epitopes in the SARS-CoV-2 peptidome that can trigger adaptive immune responses. Here, we use RosettaMHC, a comparative modeling approach which leverages existing high-resolution X-ray structures from peptide/MHC complexes available in the Protein Data Bank, to derive physically realistic 3D models for high-affinity SARS-CoV-2 epitopes. We outline an application of our method to model 439 9mer and 279 10mer predicted epitopes displayed by the common allele HLA-A*02:01, and we make our models publicly available through an online database (https://rosettamhc.chemistry.ucsc.edu). As more detailed studies on antigen-specific T cell recognition become available, RosettaMHC models of antigens from different strains and HLA alleles can be used as a basis to understand the link between peptide/HLA complex structure and surface chemistry with immunogenicity, in the context of SARS-CoV-2 infection.

groove (termed A-F pockets) define a repertoire of 10^4 to 10^6 peptide antigens that can be recognized by each HLA allotype (9, 10). Several machine-learning methods have been developed to predict the likelihood that a target peptide will bind to a given allele (reviewed in (11)). Generally, these methods make use of available data sets in the Immune Epitope Database (12) to train artificial neural networks that predict peptide processing, binding and display, and their performance varies depending on peptide length and HLA allele representation in the database.
Structure-based approaches have also been proposed to model the bound peptide conformation de novo (reviewed in (13)). These approaches utilize various algorithms to optimize the backbone and side chain degrees of freedom of the peptide/MHC structure according to an all-atom scoring function, derived from physical principles (14-16), that can be further enhanced using modified scoring terms (17) or mean field theory (18). While these methods do not rely on large training data sets, their performance is affected by bottlenecks in sampling of different backbone conformations, and any possible structural adaptations of the HLA peptide-binding groove.

Predicting the bound peptide conformation whose N- and C-termini are anchored within a fixed-length groove is a tractable modeling problem that can be addressed using standard comparative modeling approaches (19). In previous work focusing on the HLA-B*15:01 and HLA-A*01:01 alleles in the context of neuroblastoma neoantigens, we have found that a combined backbone and side chain optimization approach can yield accurate pMHC-I models for a pool of target peptides, provided that a reliable template of the same allele and peptide length can be identified in the database (20). In this approach (RosettaMHC), a local optimization of the backbone degrees of freedom is sufficient to capture minor (within 0.5 Å heavy atom RMSD) changes of the target peptide backbone relative to the conformation of the peptide in the template, used as a starting suggesting that a similar principle can be applied to produce models of candidate epitopes directly from the proteome of a pathogen of interest. Here, we apply RosettaMHC to all HLA-A*02:01 epitopes predicted directly from the ~30 kbp SARS-CoV-2 genome, and make our models publicly available through an online database.
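To make the "within 0.5 Å heavy atom RMSD" criterion above concrete, the following is a minimal sketch of how such a deviation could be computed between a modeled peptide and its template. It is not taken from RosettaMHC: the function names `heavy_atom_rmsd` and `within_tolerance` are hypothetical, and the sketch assumes that both coordinate lists contain the same heavy atoms in the same order and that the two structures have already been superimposed (for example on the MHC groove).

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in Å


def heavy_atom_rmsd(model: Sequence[Coord], template: Sequence[Coord]) -> float:
    """Root-mean-square deviation (Å) over matched atoms.

    Assumes the atoms are paired by index and the structures are already
    superimposed; no alignment is performed here.
    """
    if len(model) != len(template) or len(model) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_t in zip(model, template)
        for a, b in zip(atom_m, atom_t)
    )
    return math.sqrt(squared / len(model))


def within_tolerance(model: Sequence[Coord], template: Sequence[Coord],
                     cutoff: float = 0.5) -> bool:
    """True if the deviation does not exceed `cutoff` (0.5 Å, as in the text)."""
    return heavy_atom_rmsd(model, template) <= cutoff


if __name__ == "__main__":
    # Toy coordinates for illustration only, not taken from any structure.
    template = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model = [(x, y + 0.3, z) for x, y, z in template]
    print(heavy_atom_rmsd(model, template), within_tolerance(model, template))
```

In practice the coordinates would come from a structure parser, and the choice of atoms to superimpose on affects the reported value.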
The computed binding energies of our models can be used as an additional validation layer to select high-affinity epitopes from large peptide sets. As detailed epitope mapping data from high-throughput tetramer staining (23-25) and T cell functional screens (26) become available, the models presented here can provide a toehold for understanding links between pMHC-I antigen structure and immunogenicity, with actionable value for the development of peptide vaccines to combat the disease.

Identification of SARS-CoV-2 peptide epitopes

The SARS-CoV-2 protein sequences (https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2) were obtained from NCBI and used to generate all possible peptides of lengths 9 and 10 (9,621 9mer and 9,611 10mer peptides). We used NetMHCpan-4.0 (27) to derive binding scores to HLA-A*02:01, and retained only peptides classified as strong or weak binders (selected using the default percentile rank cut-off values). The binding classification was performed using eluted ligand likelihood predictions. While in this study we use NetMHCpan-4.0 predictions as inputs to select candidate epitopes for structure modeling, our workflow is fully compatible with any alternative epitope prediction method.

made available through an online database (see data availability). The website that hosts our database was constructed using the Django web framework.

Template identification for structure modeling using RosettaMHC

Our full workflow for template identification and structure modeling is outlined in Figure 1a Table 2 ). Inspection of Rosetta binding energies derived from models in this set shows a similar distribution to the epitopes classified by NetMHCpan-4.0 as strong binders, with the energies of 19/28 peptides falling well within the distribution of the refined PDB templates (red dots in Figure 3e).
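The peptide enumeration and binder selection described under "Identification of SARS-CoV-2 peptide epitopes" can be illustrated with a short sketch. This is not the authors' script: all function names are hypothetical, the percentile rank values are assumed to come from an external predictor such as NetMHCpan-4.0, and the cut-offs of 0.5 (strong) and 2.0 (weak) are an assumption about that tool's defaults, since the text only says "default percentile rank cut-off values".

```python
from typing import Dict, Iterator, List, Tuple

# Assumed default percentile-rank cut-offs (not stated explicitly in the text).
STRONG_RANK_CUTOFF = 0.5
WEAK_RANK_CUTOFF = 2.0


def enumerate_peptides(protein: str, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, peptide) for every contiguous window of `length`."""
    for start in range(len(protein) - length + 1):
        yield start + 1, protein[start:start + length]


def classify_binder(percentile_rank: float) -> str:
    """Map a percentile rank to 'strong', 'weak' or 'non-binder'."""
    if percentile_rank < STRONG_RANK_CUTOFF:
        return "strong"
    if percentile_rank < WEAK_RANK_CUTOFF:
        return "weak"
    return "non-binder"


def select_candidates(
    proteins: Dict[str, str],
    ranks: Dict[str, float],
    lengths: Tuple[int, ...] = (9, 10),
) -> List[Tuple[str, int, str, str]]:
    """Return (protein, start, peptide, class) for peptides ranked as binders.

    `ranks` maps peptide -> percentile rank from an external predictor;
    peptides without a rank are skipped.
    """
    selected = []
    for name, sequence in proteins.items():
        for length in lengths:
            for start, peptide in enumerate_peptides(sequence, length):
                rank = ranks.get(peptide)
                if rank is None:
                    continue
                label = classify_binder(rank)
                if label != "non-binder":
                    selected.append((name, start, peptide, label))
    return selected


if __name__ == "__main__":
    # Placeholder sequence and made-up ranks, for illustration only.
    toy_proteins = {"toy": "MKTLLVAGAVLLA"}
    toy_ranks = {"MKTLLVAGA": 0.3, "KTLLVAGAVL": 1.5, "TLLVAGAVL": 7.0}
    for row in select_candidates(toy_proteins, toy_ranks):
        print(row)
```

A protein of N residues yields N - k + 1 peptides of length k, so the total peptide count depends on the number and lengths of the input proteins. Any other epitope predictor could supply the `ranks` mapping, in line with the statement that the workflow is compatible with alternative prediction methods.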
Based on these observations, we further classified all epitopes in the original set provided by However, due to substantial sequence variability in surface-exposed residues at the P2-P8 Table 1 .

References

Subangstrom accuracy in pHLA-I modeling by Rosetta FlexPepDock refinement protocol
GradDock: rapid simulation and tailored ranking functions for peptide-MHC Class I docking
MFPred: Rapid and accurate prediction of protein-peptide recognition multispecificity using self-consistent mean field theory
High-Resolution Comparative Modeling with RosettaCM
A Recurrent Mutation in Anaplastic Lymphoma Kinase with Distinct Neoepitope
Distinguishing functional polymorphism from random variation in the sequences of >10,000 HLA-A, -B and -C alleles
The Protein Data Bank
Large-scale detection of antigen-specific T cells using peptide-MHC-I multimers labeled with DNA barcodes
Empty peptide-receptive MHC class I molecules for efficient detection of antigen-specific T cells
High Throughput pMHC-I Tetramer Library Production Using Chaperone
Quantitating T Cell Cross-Reactivity for Unrelated
NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand Peptide Binding Affinity Data
Amino acid substitution matrices from protein blocks
PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
Alternate states of proteins revealed by detailed energy landscape mapping
A new coronavirus associated with human respiratory disease in China
The Human Genome Browser at UCSC
The length distribution of class I restricted T cell epitopes is determined by both peptide supply and MHC allele specific binding preference
The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design
Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2
ViPR: an open bioinformatics database and analysis resource for virology research
Antigen Receptor Recognition of Antigen-Presenting Molecules
How TCRs bind MHCs, peptides, and coreceptors
ATLAS: A database linking binding affinities with structures for wild-type and mutant TCR-pMHC complexes
Structure Based Prediction of Neoantigen Immunogenicity
Predicting Humoral Alloimmunity from Differences in Donor and Recipient HLA Surface Electrostatic Potential
Electrostatics of nanosystems: application to microtubules and the ribosome
Improvements to the APBS biomolecular solvation software suite. Protein Sci
The PyMOL Molecular Graphics System