key: cord-0724651-0y7xbgun authors: Wierbowski, Shayne D.; Liang, Siqi; Chen, You; Andre, Nicole M.; Lipkin, Steven M.; Whittaker, Gary R.; Yu, Haiyuan title: A 3D Structural Interactome to Explore the Impact of Evolutionary Divergence, Population Variation, and Small-molecule Drugs on SARS-CoV-2-Human Protein-Protein Interactions date: 2020-10-13 journal: bioRxiv DOI: 10.1101/2020.10.13.308676 sha: 816d2ed394f7deee65e4d0a4d432b9a82bf9c937 doc_id: 724651 cord_uid: 0y7xbgun The recent COVID-19 pandemic has sparked a global public health crisis. Vital to the development of informed treatments for this disease is a comprehensive understanding of the molecular interactions involved in disease pathology. One lens through which we can better understand this pathology is through the network of protein-protein interactions between its viral agent, SARS-CoV-2, and its human host. For instance, increased infectivity of SARS-CoV-2 compared to SARS-CoV can be explained by rapid evolution along the interface between the Spike protein and its human receptor (ACE2) leading to increased binding affinity. Sequence divergences that modulate other protein-protein interactions may further explain differences in transmission and virulence in this novel coronavirus. To facilitate these comparisons, we combined homology-based structural modeling with the ECLAIR pipeline for interface prediction at residue resolution, and molecular docking with PyRosetta. This enabled us to compile a novel 3D structural interactome meta-analysis for the published interactome network between SARS-CoV-2 and human. This resource includes docked structures for all interactions with protein structures, enrichment analysis of variation along interfaces, predicted ΔΔG between SARS-CoV and SARS-CoV-2 variants for each interaction, predicted impact of natural human population variation on binding affinity, and a further prioritized set of drug repurposing candidates predicted to overlap with protein interfaces†. 
All predictions are available online† for easy access and are continually updated when new interactions are published.

† Some sections of this pre-print have been redacted to comply with current bioRxiv policy restricting the dissemination of purely in silico results predicting potential therapies for SARS-CoV-2 that have not undergone thorough peer-review. The results section titled “Prioritization of Candidate Inhibitors of SARS-CoV-2-Human Interactions Through Binding Site Comparison,” Figure 4, Supplemental Table 9, and all links to our web resource have been removed. Blank headers are left in place to preserve structure and item numbering. Our full manuscript will be published in an appropriate journal following peer-review.

The ongoing global COVID-19 pandemic caused by the infection of SARS-CoV-2 has to date infected [...] impact of mutations on interaction binding affinity and performed a comparison of protein-protein and protein-drug binding sites. We compile all results from our structural interactome into a user-friendly web server allowing for quick exploration of individual interactions or bulk download and analysis of the whole dataset. Further, we explore the utility of our interactome modeling approach in identifying key interactions undergoing evolution along viral protein interfaces, highlighting population variants on human interfaces that could modulate the strength of viral-host interactions to confer protection from or susceptibility to COVID-19, and prioritizing drug candidates predicted to bind competitively at viral-human interaction interfaces.

Enrichment of divergence between SARS-CoV and SARS-CoV-2 at spike-ACE2 binding interface

To highlight the utility of computational and structural approaches to model the SARS-CoV-2-human interactome, we first examined the interaction between the SARS-CoV-2 spike protein (S) and human angiotensin-converting enzyme 2 (ACE2) (Fig 1.a).
This interaction is key for viral entry into human cells [3] and is the only viral-human interaction with solved crystal structures available in both SARS-CoV [47] and SARS-CoV-2 [48-50]. Comparison between SARS-CoV and SARS-CoV-2 revealed that sequence divergence of the S protein was highly enriched at the S-ACE2 interaction interface (Fig 1.a; Log2OddsRatio=2.82, p=1.97e-5), indicating functional evolution around this interaction. To explore the functional impact of these mutations on this interaction, we leveraged the Rosetta energy function [51] to estimate the change in binding affinity (ΔΔG) between the SARS-CoV and SARS-CoV-2 versions of the S-ACE2 interaction (Fig 1.b and 1.c). The predicted negative ΔΔG value of -14.66 Rosetta Energy Units (REU) indicates an increased binding affinity using the SARS-CoV-2 S protein, driven by better optimized solvation and hydrogen bonding potential fulfillment along the ACE2 interface. Our result is consistent with the hypothesis that increased stability of the S-ACE2 interaction is one of the key reasons for elevated transmission of SARS-CoV-2 [52]. Moreover, recent experimental energy kinetics assays have shown that SARS-CoV-2 S protein binds ACE2 with 10-20-fold higher affinity than that of SARS-CoV S protein [53], supporting the conclusions from our computational modeling.

[...] among individuals [6, 7, 54]. Several hypotheses for genetic predisposition models have been proposed, including that expression quantitative trait loci (eQTLs) may up- or down-regulate host response genes and that functional coding variants may alter viral-human interactions [55, 56].
For instance, a recent RNA- [...]

In order to add a structural component to our interactome map, and thereby enable modeling of the binding affinity for these interactions, we additionally performed docking in PyRosetta using our ECLAIR interface likelihood predictions to refine the search space (Supplemental [...]

After constructing the 3D interactome between SARS-CoV-2 and human, we first looked for evidence of interface-specific variation by mapping both gnomAD [58] reported human population variants (Supplemental Table 3) and sequence divergences between SARS-CoV and SARS-CoV-2 (Supplemental Table 4) onto the predicted interfaces. In general, conserved residues have been shown to cluster at protein-protein interfaces [63], and a recent analysis of SARS-CoV-2 structure and evolution likewise concluded that highly conserved surface residues were likely to drive protein-protein interactions [64]. Consistent with these prior findings at an interactome-wide level, we observed significant depletion for both viral and human variation along the predicted interfaces comparable to that observed on solved human-human interfaces (Fig 2.a).

Nonetheless, considering each interaction individually, our analysis uncovered 13 interaction interfaces enriched for human population variants (Fig 2.b), and 7 enriched for recent viral sequence divergences (Fig 2.c). A breakdown of variant enrichment on each interface is provided in Supplemental Table 5. The individual viral interfaces showing an unexpected degree of variation may, like the previously discussed S-ACE2 interface, be indicative of recent functional evolution around the viral-human interaction. Considering the slower rate of evolution in humans, enrichment of population variants along the human interfaces is unlikely to be a selective response to the virus.
Rather, these interfaces with high population variation may represent edges in the interactome whose strength may fluctuate among individuals or between populations. Alternatively, enrichment and depletion of variation along the human-viral interfaces could help distinguish viral proteins that bind along existing (and therefore conserved) human-human interfaces from those that bind using novel interfaces that would be less likely to be under selective pressure.

To further explore the functional impact of naturally occurring variants on the human interactors of SARS-CoV-2, we considered variants with phenotypic associations as reported in HGMD [65], ClinVar [66] or the NHGRI-EBI GWAS Catalog [67]. Interactors of SARS-CoV-2 were significantly more likely than the rest of the human proteome to harbor phenotypic variants in each of these databases (Fig 2.d). Notably, among the individual disease categories enriched in this gene set, several were consistent with reported comorbidities including heart disease, respiratory tract disease, and metabolic disease [68, 69] (Fig 2.e; Supplemental Table 6). Disruption of native protein-protein interactions is one mechanism of disease pathology, and disease mutations are known to be enriched along protein interfaces [70, 71]. Human population variants on the predicted human-viral interface were more likely to be annotated as deleterious by SIFT [72] and PolyPhen [73] but showed identical allele frequency distributions compared to those off the interfaces (Supplemental Figure 4). However, mapping annotated disease mutations onto the protein interfaces only revealed significant enrichment along known human-human interfaces; no such enrichment was found on human-viral interfaces (Fig 2.f).
This is likely because, unlike with human-human interactions, mutations disrupting human-viral interactions would not disrupt natural cell function, and therefore would be unlikely to be pathogenic. Our finding that disease mutations and viral proteins affect human proteins at distinct sites is consistent with a two-hit hypothesis of comorbidities whereby proteins whose function is already affected by genetic background may be further compromised by viral infection.

We next sought to explore the impact of sequence divergence in SARS-CoV-2 relative to SARS-CoV on viral-human interactions. Mutations between the two viruses were identified by pairwise alignment and the impacts of these mutations on the binding energy (ΔΔG) for 250 interactions amenable to docking were predicted using a PyRosetta pipeline [46, 59, 60]. Although the binding energy for most interactions was unchanged, either because no mutations occurred near the interface or because the mutations that did had marginal effect, we observed an increased likelihood of the divergence from SARS-CoV to SARS-CoV-2 resulting in decreased binding energy (i.e. more stable interaction) (Fig 3.a; Supplemental Table 7). The significant outliers in these ΔΔG predictions can help pinpoint key differences between the viral-human interactomes of SARS-CoV and SARS-CoV-2. We further note a wide range of affinity impacts among various human interactors of a single viral protein (Fig 3.d) and hypothesize that these differences may help identify the most important interactions.

To further explore the significance of these changes in interaction affinity, we considered those interactions with the largest decrease in binding energy, corresponding to the largest predicted increase in affinity. Specifically, we highlight the interaction between coronavirus orf9c and human mitochondrial NADH [...] scanning mutagenesis along all docked interfaces in PyRosetta.
We identified as binding energy hotspot mutations all mutations with a predicted ΔΔG at least one standard deviation away from the mean for identical amino acid substitutions across the rest of the interface. In total, out of 2,241 population variants on eligible interfaces, 161 (7.2%) were identified as hotspots predicted to disrupt interaction stability, and 116 (5.2%) were identified as hotspots predicted to contribute to interaction stability (Fig 3.b). Most of the hotspot mutations were predicted to be driven by solvation or repulsive forces, with disruptive hotspots primarily being driven by repulsive forces and stabilizing hotspots primarily being driven by solvation forces (Fig 3.c). Results summarizing the predicted impact of all 2,241 population variants on the docked interfaces are provided in Supplemental Table 8.

The current version of the SARS-CoV-2 human structural interactome web server describes 332 viral-human interactions reported by Gordon et al. [20]. We will continue support for the web server with periodic updates as additional interactome screens between SARS-CoV-2 and human are published. As we update, a navigation option to select between the current or previous stable releases of the web server will be provided.

Overall, we present a comprehensive resource to explore the SARS-CoV-2-human protein-protein interactome map in a structural context. Analysis through this framework allows us to consider the recent evolution of SARS-CoV-2 in the context of its interactome map and to prioritize key interactions for further functional characterization. Likewise, our consideration of underlying variation in the human proteins that interact with SARS-CoV-2 may be valuable in explaining differences in response to infection.
We particularly note that our observation that perturbations from underlying disease mutations and from viral protein binding occur at distinct sites on the protein is of clinical interest. Further investigation into the combined role of these two sources of perturbation to better understand the mechanisms linked to comorbidities is warranted.

However, our work is not without limitation. Firstly, we note that although structural coverage from our homology modelling of SARS-CoV-2 proteins was robust (Supplemental Figure 1), the same [...] done to orient the most likely interface residues on each structure towards each other, protein-protein docking using incomplete protein models introduces some bias and low coverage may exclude some true interface residues. For this reason, the initial ECLAIR interface annotations, which are less subject to structural coverage limitations, may provide orthogonal value. We additionally note that direct [...]

Perhaps most importantly, we emphasize the importance of further experimental characterization to confirm the predictions made here. Nonetheless, we believe our 3D Structural SARS-CoV-2-human Interactome web server will prove to be a key resource in informing hypothesis driven exploration of the mechanisms of SARS-CoV-2 pathology and host response. The scope and potential impacts of our web server will continue to grow as we incorporate the results of ongoing and future interactome screens between SARS-CoV-2 and human. Additionally, we note that our 3D structural interactome framework can be rapidly deployed to analyze future viruses.

Homology-based modeling of all 29 SARS-CoV-2 proteins was performed in Modeller [90] using a multiple template modeling procedure. In brief, a list of candidate template structures for each protein to be modelled was obtained by running BLAST [91] against a reference containing all sequences in the Protein Data Bank (PDB) [92].
Templates were filtered to only retain those with at least 30% identity to the protein to be modelled, and the remaining templates were ranked using a weighted combination of percent identity and coverage as described previously [93]. To compile the final set of overlapping templates for modeling, first the top ranked template was selected as a seed. Overlapping templates were iteratively added to the set so long as 1) the new template increased the overall coverage by at least 10%, and 2) the new template retained a total percent identity no more than 25% worse than the initial seed template. Pairwise alignments between the protein to be modelled and the template set were generated using a [...] regions with large gaps (at least 5 gaps in the alignment in a 10 residue window). Finally, alignment was [...]

To accommodate predictions between SARS-CoV-2 and human, slight alterations were made.

Using the predicted interface probabilities reported by ECLAIR, we set up the initial docking conformation to explore a restricted search space for each docking simulation. In cases where multiple structures were available for the human protein, all structures were weighted based on the ECLAIR scores for the covered residues in each structure to maximize both coverage and inclusion of likely interface residues. For each protein in the interaction, we performed a linear regression classification in scikit-learn [102] to optimally separate the likely interface residues from likely non-interface residues. The plane defined by this linear regression served as a reference to orient the structures along the y-axis [...] the x-z plane and separated a distance of 5 Å along the y-axis. For each docking attempt, a series of random perturbations from these initial conformations was made to search the nearby space.
First, the human protein was rotated up to 360° about the y-axis to allow full exploration of different rotations of the two interfaces relative to each other. Second, to apply some flexibility to the plane predicting the interface vs. non-interface sides of each protein, up to 30° of rotation about the x- and z-axes was allowed for both the viral and human proteins. Finally, a random translation up to 5 Å in magnitude was applied to the human protein along the x-z plane so that the docking could explore contact points other than the centers of mass along these axes.

After initializing these guided starting conformations, docking was simulated in PyRosetta [46] using a modified version of the protein-protein docking methodology provided by Gray 2006 [103]. The initial demo (https://graylab.jhu.edu/pyrosetta/downloads/scripts/demo/D100 Docking.py) takes two chains from a co-crystal structure, applies a random perturbation, and re-docks them. Because randomized initial orientation was already handled as described previously, these steps were removed from our docking runs. In brief, the protein models were converted to centroid representation, slid into contact using the "interchain_cen" scoring function, and converted back to full atom representation, before having their side-chains optimized using the predefined "docking" and "docking_min" scoring [...] Table 2.

To annotate interface residues from atomic resolution docked models, we used a previously described and established definition for interface residues [45]. In brief, the solvent accessible surface area (SASA) for both bound and unbound docked structures was calculated using NACCESS [100]. We define as [...] accessibility) and 2) in contact with the interacting chain (defined as any residue whose absolute accessibility decreased by ≥ 1.0 Å²).

The full list of SARS-CoV-2 mutations is reported in Supplemental Table 4.
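To make the guided starting conformations described above more concrete, the following is a minimal NumPy sketch of how one set of random perturbations could be drawn: a rotation of the human protein of up to 360° about the y-axis, tilts of up to 30° about the x- and z-axes for both proteins, and a translation of the human protein of up to 5 Å within the x-z plane. It is an illustration only and not the pipeline used for this work. The function names (`rotation_matrix`, `sample_perturbation`, `apply_perturbation`), the symmetric ±30° tilt range, the order in which rotations are composed, rotation about each structure's centroid, and uniform sampling over a disk for the translation are all assumptions made for the example.

```python
import numpy as np


def rotation_matrix(axis, angle_deg):
    """Right-handed rotation matrix about the x, y, or z axis."""
    t = np.radians(angle_deg)
    c, s = np.cos(t), np.sin(t)
    if axis == "x":
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    if axis == "y":
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    if axis == "z":
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    raise ValueError(f"unknown axis: {axis}")


def sample_perturbation(rng, max_tilt_deg=30.0, max_shift=5.0):
    """Draw one hypothetical starting perturbation for a viral-human pair."""

    def tilt():
        # Assumption: tilts are drawn uniformly in [-max_tilt, +max_tilt].
        ax, az = rng.uniform(-max_tilt_deg, max_tilt_deg, size=2)
        return rotation_matrix("z", az) @ rotation_matrix("x", ax)

    # Spin of the human protein about the y-axis (0 to 360 degrees).
    spin = rotation_matrix("y", rng.uniform(0.0, 360.0))

    # Translation in the x-z plane with magnitude at most max_shift
    # (assumption: uniform over the disk, hence the square root).
    radius = max_shift * np.sqrt(rng.uniform())
    phi = rng.uniform(0.0, 2.0 * np.pi)
    shift = np.array([radius * np.cos(phi), 0.0, radius * np.sin(phi)])

    return {
        "viral_rot": tilt(),
        "human_rot": tilt() @ spin,  # spin first, then tilt
        "human_shift": shift,
    }


def apply_perturbation(coords, rot, shift=None):
    """Rotate an (N, 3) coordinate array about its centroid, then translate."""
    center = coords.mean(axis=0)
    moved = (coords - center) @ rot.T + center
    return moved if shift is None else moved + shift
```

In use, one would call `sample_perturbation` once per docking attempt and pass the resulting matrices and shift to `apply_perturbation` for the viral and human coordinate arrays before handing the structures to the docking protocol.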
Human population variants in all 332 human proteins shown to interact with SARS-CoV-2 430 proteins were obtained from gnomAD 58 reported as the most general term with no more significant ancestor term (Supplemental Table 6 , sheet 465 1). Raw enrichment values for all terms are also predicted (Supplemental Table 6 , sheet 2). For curation of disease and trait associations from NHGRI-EBI GWAS Catalog (http://www.ebi.ac.uk/gwas/) 67 The scoring function used for these calculations is as described previously 59 using the following weights; Protein-ligand Docking Using Smina The previous viral-human interactome screen by Gordon et al. 20 Supplemental Table 9 . List of all predicted drug-target binding sites . Comparison of the percentage of human genes that interact with (green) or do not interact with (orange) Coronaviruses: an overview of their replication and pathogenesis A pneumonia outbreak associated with a new coronavirus of probable bat origin Including Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS). 
Mandell, Douglas, and Bennett's Principles and Practice of Infectious Diseases A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein Extrapulmonary manifestations of COVID-19 Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study Severe obesity, increasing age and male sex are independently associated with worse in-hospital outcomes, and higher in-hospital mortality African-American COVID-19 Mortality: A Sentinel Event Characteristics Associated with Hospitalization Among Patients with COVID-19 Greater risk of severe COVID-19 in Black, Asian and Minority Ethnic populations is not explained by cardiometabolic, socioeconomic or behavioural factors, or by 25(OH)-vitamin D status: study of 1326 cases from the UK Biobank Disparities in Incidence of COVID-19 Among Underrepresented Racial/Ethnic Groups in Counties Identified as Hotspots During Racial demographics and COVID-19 confirmed cases and deaths: a correlational analysis of 2886 US counties The SARS-coronavirus-host interactome: identification of cyclophilins as target for pan-coronavirus inhibitors Global landscape of HIV-human protein complexes Protein Interaction Mapping Identifies RBBP6 as a Negative Regulator of Ebola Virus Replication Comparative Flavivirus-Host Protein Interaction Mapping Reveals Mechanisms of Dengue and Zika Virus Pathogenesis A SARS-CoV-2 protein interaction map reveals targets for drug repurposing Structure of the human receptor tyrosine kinase met in complex with the Listeria invasion protein InlB. 
Cell SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor Chemokine receptor CCR5 antagonist maraviroc: medicinal chemistry and clinical applications Inhibiting HIV-1 integrase by shifting its oligomerization equilibrium Small molecule inhibitors of the LEDGF site of human immunodeficiency virus integrase identified by fragment screening and structure based design Virus-Receptor Interactions: The Key to Cellular Invasion Structural Insights into the Interaction of Coronavirus Papain-Like Proteases and Interferon-Stimulated Gene Product 15 from Different Species Mechanism of inhibition of retromer transport by the bacterial effector RidL Solution structure of the complex between poxvirus-encoded CC chemokine inhibitor vCCI and human MIP-1beta Structural properties of the promiscuous VP16 activation domain Crystal structure of a gamma-herpesvirus cyclin-cdk complex Metabolic Syndrome and Viral Pathogenesis: Lessons from Influenza and Coronaviruses A unifying view of 21st century systems biology The molecular sociology of the cell Network medicine: a network-based approach to human disease Small molecules, big targets: drug discovery faces the protein-protein interaction challenge Small-molecule inhibitors of protein-protein interactions: progressing toward the reality AlphaSpace: Fragment-Centric Topographical Mapping To Target Protein-Protein Interaction Interfaces The Development and Current Use of BCL-2 Inhibitors for the Treatment of Chronic Lymphocytic Leukemia Identification of protein-protein interaction inhibitors targeting vaccinia virus processivity factor for development of antiviral agents Inhibition of human papillomavirus DNA replication by small molecule antagonists of the E1-E2 protein interaction Optimization and determination of the absolute configuration of a series of potent inhibitors of human papillomavirus type-11 E1-E2 protein-protein interaction: a combined medicinal chemistry, NMR and 
computational chemistry approach Protein-Protein Interactions in Virus-Host Systems. Front Microbiol Interactome INSIDER: a structural interactome browser for genomic studies PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta Stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2 SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design Structural basis of receptor recognition by SARS-CoV-2 Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation Who is most likely to be infected with SARS-CoV-2? The Lancet Infectious Diseases Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations Genetic predisposition models to COVID-19 infection The mutational constraint spectrum quantified from variation in 141,456 humans A simple physical model for binding energy hot spots in protein-protein complexes Spatial chemical conservation of hot spot interactions in protein-protein complexes Tests of Concrete Strength across the Thickness of Industrial Floor Using the Ultrasonic Method with Exponential Spot Heads The sequence of human ACE2 is suboptimal for binding the S spike protein of SARS coronavirus 2. bioRxiv Conserved residue clusters at protein-protein interfaces and their use in binding site identification SARS-CoV2 (COVID-19) Structural/Evolution Dynamicome: Insights into functional evolution and human genomics. 
bioRxiv Human Gene Mutation Database (HGMD): 2003 update ClinVar: improving access to variant interpretations and supporting evidence The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019 Characteristics Associated with Hospitalization Among Patients with COVID-19 Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis Widespread macromolecular interaction perturbations in human genetic disorders Three-dimensional reconstruction of protein networks provides insight into human genetic disease SIFT web server: predicting effects of amino acid substitutions on proteins A method and server for predicting damaging missense mutations Analysis of intraviral protein-protein interactions of the SARS coronavirus ORFeome Human mitochondrial complex I assembly is mediated by NDUFAF1 CIA30 complex I assembly factor: a candidate for human complex I deficiency? Hum Genet Human genome-wide RNAi screen reveals a role for nuclear pore proteins in poxvirus morphogenesis Mitochondrial reactive oxygen species control T cell activation by regulating IL-2 and IL-4 expression: mechanism of ciprofloxacin-mediated immunosuppression The Landscape of Human Cancer Proteins Targeted by SARS-CoV-2 Genome-wide siRNA screen identifies the retromer as a cellular entry factor for human papillomavirus The master regulator of the cellular stress response (HSF1) is critical for orthopoxvirus infection A genome-wide small interfering RNA (siRNA) screen reveals nuclear factor-kappaB (NF-kappaB)-independent regulators of NOD2-induced interleukin-8 (IL-8) secretion Architecture of the human interactome defines protein communities and disease networks TMED2 Potentiates Cellular IFN Responses to DNA Viruses by Reinforcing MITA Dimerization and Facilitating Its Trafficking Role of the early secretory pathway in SARS-CoV-2 infection Tom70 mediates activation of interferon regulatory 
factor 3 on mitochondria A whole-genome association study of major determinants for host control of HIV-1. Science Extensive disruption of protein interactions by genetic variants across the allele frequency spectrum in human populations Comparative protein structure modeling using Modeller Basic local alignment search tool The Protein Data Bank Interactome3D: adding structural details to protein networks Protein Identification and Analysis Tools on the ExPASy Server The Proteomics Protocols Handbook Divergence measures based on the Shannon entropy Predicting functionally important residues from sequence conservation Direct coupling analysis for protein contact prediction Evolutionarily conserved pathways of energetic connectivity in protein families ModBase, a database of annotated comparative protein structure models and associated resources The interpretation of protein structures: estimation of static accessibility Accelerating protein docking in ZDOCK using an advanced 3D convolution library Scikit-learn: Machine Learning in Python High-resolution protein-protein docking UniProt: a worldwide hub of protein knowledge Analysis of multimerization of the SARS coronavirus nucleocapsid protein A new coronavirus associated with human respiratory disease in China Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan A general method applicable to the search for similarities in the amino acid sequence of two proteins Amino acid substitution matrices from protein blocks The Ensembl Variant Effect Predictor Explaining odds ratios. 
J Can Acad Child Adolesc Psychiatry LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants Open Babel: An open chemical toolbox Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise

[...] SARS-CoV-2 that contain disease annotations in HGMD (Log2OR=0.57, p=1.70e-4) [...] SARS-CoV-2 proteins were significantly more likely to harbor disease mutations than non-interactors [...] Error bars indicate ± SE. e, A sample of individual disease terms enriched in human genes targeted by [...] Comparison of the enrichment of HGMD, ClinVar, and GWAS annotated mutations on human-viral interfaces or human-human interfaces for the same gene set. Although disease mutations were enriched on human-human interfaces (HGMD, Log2OR=0 24), no enrichment was observed on human-viral interfaces (HGMD, Log2OR=0.21, p=0.13; ClinVar [...] The GWAS category was removed from this analysis because most lead GWAS SNPs occurred in non-coding regions. Error bars indicate ± SE.

Predicted changes in binding affinity from sequence divergences between SARS-CoV and SARS- [...] An overall representation of these ΔΔG predictions is reported (mean=-1.40 REU, std=6.16 REU) with interactions sorted from those with the largest decrease in binding energy (most stabilized relative to SARS-CoV) to those with the largest increase in binding energy (most destabilized relative to SARS-CoV) [...] < z-score ≤ -1, n=85), or strongly stabilizing (z-score ≤ -2, n=31) [...] score < 1, n=1,964) showed minimal impact on binding affinity. c, Breakdown of the contribution of each term in the PyRosetta energy function used for in-silico scanning mutagenesis for all population variants [...] A breakdown of which term contributed most heavily to the classification of all 277 interface hotspot population variants is shown on the right.
d, Individual SARS-CoV-2-human interactions involving the same viral protein can have distinct interfaces with distinct predicted changes in binding affinity between SARS-CoV and SARS-CoV-2 versions of the protein. An example involving orf9b is highlighted where some interactions (e.g. TOMM70 and PTBP2) are predicted to be more stabilized in SARS-CoV-2 whereas others (e.g. BAG5, SLC9A3R1, and MARK2) are predicted to be unaffected. e, Docked structure for the interaction between SARS-CoV-2 orf9c and human NDUFAF1 [...] SARS-CoV-2 (bottom) orf9c. Interface residues are colored by their predicted energy contribution from blue (stabilizing) to white (no impact) to red (destabilizing) [...] -2 are labeled in red, while other residues with a major contribution to the binding affinity are labeled in green (NDUFAF1) or blue (orf9c). The overall predicted change in binding energy (ΔΔG=-21.7 REU) suggests the interaction is more stable (lower energy) in the SARS-CoV-2 version of the interaction.

Supplemental Figure 4. Summary of human population variant frequency and deleteriousness [...] Summary of allele frequency for human population variants either on or off the predicted human-viral interface presented as either a raw distribution or a cumulative density respectively. Variants in either category had roughly identical allele frequency distributions. c, d, Summary of the SIFT deleteriousness score for human population variants either on or off the predicted human-viral interface presented as either a raw distribution or a cumulative density respectively [...] Population variants on the interface were significantly more likely to be classified deleterious. f, g, Summary of the PolyPhen deleteriousness score for human population variants either on or off the predicted human-viral interface presented as either a raw distribution or a cumulative density respectively.
Plots are colored based on the split between PolyPhen benign, possibly damaging, and probably damaging categories. e, Pie chart breakdown of these categories. Pie chart outlines distinguish interface (green) from non-interface (orange) [...]

For each SARS-CoV-2-human interaction with 3D structure available for both proteins, 50 independent guided docking trials were used to select a final docked configuration. The structure for the viral protein is colored from white to blue with darker blue corresponding to higher ECLAIR prediction. The structure for the human protein is colored similarly using a green to white gradient. Initial semi-random docked configurations were generated using five steps. First, a plane separating ECLAIR predicted likely interface from likely non-interface residues was drawn to divide each protein. Second, the two protein chains were separated 5 Å apart on the y-axis using the previously defined plane to orient the likely interface sides of each protein towards each other. Third, the human protein was randomly rotated up to 360° about the y-axis to sample different orientations of the two interfaces relative to each other.