key: cord-0810760-x7hx530s
authors: Raybould, Matthew I. J.; Kovaltsuk, Aleksandr; Marks, Claire; Deane, Charlotte M.
title: CoV-AbDab: the Coronavirus Antibody Database
date: 2020-05-15
journal: bioRxiv
DOI: 10.1101/2020.05.15.077313
sha: 9cfe8161a5aedb0b44552212a3d72681e6683d4b
doc_id: 810760
cord_uid: x7hx530s

The emergence of a novel strain of betacoronavirus, SARS-CoV-2, has led to a pandemic that has been associated with hundreds of thousands of deaths. Research is ongoing around the world to create vaccines and therapies to minimise rates of disease spread and mortality. Crucial to these efforts are molecular characterisations of neutralising antibodies to SARS-CoV-2. Such antibodies would be valuable for measuring vaccine efficacy, diagnosing exposure, and developing effective biotherapeutics. Here, we describe our new database, CoV-AbDab, which already contains data on over 380 published/patented antibodies and nanobodies known to bind to at least one betacoronavirus. This database is the first consolidation of antibodies known to bind SARS-CoV-2 and other betacoronaviruses such as SARS-CoV-1 and MERS-CoV. We supply relevant metadata such as evidence of cross-neutralisation, antibody/nanobody origin, full variable domain sequence (where available) and germline assignments, epitope region, links to relevant PDB entries, homology models, and source literature. Our preliminary analysis exemplifies a spectrum of potential applications for the database, including identifying characteristic germline usage biases in receptor-binding domain antibodies and contextualising the diagnostic value of the SARS-CoV binding CDRH3s through comparison to over 500 million antibody sequences from SARS-CoV serologically naive individuals. Community submissions are invited to ensure CoV-AbDab is efficiently updated with the growing body of data analysing SARS-CoV-2. CoV-AbDab is freely available and downloadable on our website at http://opig.stats.ox.ac.uk/webapps/coronavirus.
To respond effectively to the recent Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) pandemic, it is essential to understand the molecular basis for a successful immune response to coronavirus infection (1). In particular, characterising the B-cell response is important as the identification of potent neutralising antibodies could pave the way for effective treatments, aid in prior exposure diagnosis, or assist in predicting vaccine efficacy (2-5).

Molecular characterisations of binding/neutralising antibodies to SARS-CoV-2 antigens are only just beginning to emerge. However, the SARS-CoV-2 and SARS-CoV-1 (the virus responsible for the 2003 epidemic) spike protein receptor binding domains (RBDs) target the same human receptor and share high sequence and structural homology (2). As a result, collating data on SARS-CoV-1 binders may lead to the identification of potent cross-neutralising antibodies, as suggested in some early SARS-CoV-2 studies (6, 7). Solved crystal and cryo-EM structures indicate a relatively discrete set of neutralising RBD epitopes (possibly resulting from substantial glycan coverage (8)), with paratopes tending to span both the heavy and light chain complementarity-determining regions (6, 9-12). Other SARS-CoV-2 surface proteins also display homology to more distantly related betacoronaviruses such as the Middle East Respiratory Syndrome coronavirus (MERS-CoV). Therefore, knowledge of antibodies that bind to MERS-CoV antigens could be relevant in treating SARS-CoV-2 infection, and indeed the anti-MERS-CoV combination therapy REGN3048/REGN3051 is already being trialled on SARS-CoV-2 patients in the USA (13). Given this, a central database facilitating molecular-level comparisons between published and patented anti-coronavirus antibodies would be a valuable tool in the fight against COVID-19.
This resource would also act as a central hub to consolidate knowledge and coordinate efforts to identify novel antibodies that neutralise SARS-CoV-2. As the number of known binders builds up over time, researchers could harness this repository for many purposes, including deriving crucial sequence/structural patterns that distinguish neutralising from non-neutralising SARS-CoV-2 binders (1), or deducing independent neutralising epitopes exploitable by combination therapies.

We have built CoV-AbDab, a new database that aims to document molecular information and metadata on all published or patented anti-coronavirus antibodies. Academic papers and patents containing coronavirus-binding antibodies were primarily sourced by querying PubMed, BioRxiv, MedRxiv, GenBank, and Google Patents with relevant search terms. Several review articles were helpful in ensuring maximal coverage, in particular those by Coughlin and Prabhakar (14), Du et al. (18). If the variable domain sequence was available, ANARCI (19) was used to number sequences in the IMGT (20) numbering scheme, and to assign V and J gene origins. Where the full Fv sequence was not supplied, we could in some cases obtain germline assignments and/or CDR3 sequences from the source literature. Our Structural Antibody Database (21), which tracks all antibody structures submitted to the Protein Data Bank (22) (PDB), was mined to identify relevant solved structures. Our antibody/nanobody homology modelling tool, ABodyBuilder (23), was used to generate full Fv region structural models where no solved structures were available.

CoV-AbDab is an effort to document all coronavirus binding/neutralising antibodies and nanobodies reported in academic publications and commercial patents. Where possible, the following information is documented for each entry:

1. The published name of the antibody/nanobody
2. Antigens that the antibody/nanobody has been proven to bind and/or neutralise
3. The protein domain targeted by the antibody/nanobody (e.g. spike protein receptor binding domain)
4. The developmental origin of the antibody/nanobody (e.g. engineered/naturally raised, species information, etc.)
5. Sequence information including: (a) the entire variable domain sequence for the antibody/nanobody, highlighting the CDR3 regions, and (b) V and J gene germline assignments
6. Links to any available structures involving the antibody/nanobody
7. (If Fv sequence available) A homology model of the antibody/nanobody
8. References to the primary literature on the antibody/nanobody
9. Timestamps to show when the antibody/nanobody was added and last updated
10. Any steps we are taking to follow up on the entry (e.g. to source its sequence and/or add further metadata)

As of 14th May 2020, CoV-AbDab contains 385 entries across 46 publications (6, 7, 9, 11, 12, 24-64) and 19 patents. Of these, 156 entries are associated with MERS-CoV, 149 are associated with SARS-CoV-1, and 105 are associated with SARS-CoV-2 (each entry may be tested against multiple coronaviruses). It lists 263 unique full variable domain antibody/nanobody sequences and 56 links to relevant PDB structures, which include coronavirus spike proteins bound to their native receptors (35, 65-72). We are continuing to contact authors to confirm whether missing sequences can be recovered and added to existing entries. If sequences have been lost or cannot be released, the corresponding entries have been removed from the database and recorded as such in a separate list on the CoV-AbDab homepage.

The following analysis was carried out on the CoV-AbDab database as of 10th May 2020. For clarity, we use the term "SARS-CoV-1" to refer specifically to the virus that caused the 2003 epidemic, and "SARS-CoV" to refer to binders to SARS coronaviruses in a general sense.

Developmental Origins and Targets.
We first analysed the developmental origins of antibody/nanobody binders to SARS-CoV-1/2 (Figure 1a) and MERS-CoV (Supplementary Figure 1a). The vast majority of the SARS-CoV antibody binders have human genetic origin (88.5% with sequence information aligned to human germlines), and derive from a mixture of isolated B-cells from infected or convalescent patients, transgenic mice, or recombinant human immune or non-immune phage display libraries. We soon expect the proportion from infected human B-cells to increase, as papers characterising and panning the adaptive immune responses of SARS-CoV-2 patients continue to emerge (24-26). A relatively small portion of antibodies were detected by challenging mice with SARS antigens, and a few of these were subsequently humanised. All but one of the SARS-CoV binding nanobodies were obtained using phage display. MERS-CoV antibodies followed a similar distribution of origins, but nanobodies were sourced from the B-cells of infected/convalescent camels or immunised llamas (Supplementary Figure 1b).

We also evaluated the distribution of protein targets (and epitope regions, for spike protein binders) for all anti-SARS-CoV-1/2 binders (Figure 1b).

At the time of writing, sequence information has been released for three antibodies (CR3022, S309, and S315) and one nanobody (VHH-72) that have been proven to neutralise SARS-CoV-2. These all target the RBD, and can cross-neutralise SARS-CoV-1.

In constructing our database, we evaluated/collected the gene transcript origins of as many of the anti-SARS-CoV and anti-MERS-CoV antibodies as possible. Here, we analyse IGHV gene usage, as this transcript encodes two of the three heavy chain complementarity-determining regions (CDRH1 and CDRH2). Analysis of the CDRH3 region, which lies at the junction of IGHV, IGHD, and IGHJ genes, is performed in the next section.
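As an illustration of the kind of IGHV usage tally described here, the following is a minimal sketch rather than the analysis code used for this paper. The record layout (the `heavy_v_gene` and `epitope` keys), the annotation format of the gene strings, and the toy entries are all assumptions made for the example; alleles are collapsed to gene level, which is also an assumption.

```python
from collections import Counter


def normalise_v_gene(raw):
    """Reduce an assignment such as 'IGHV3-30*01 (Human)' to 'IGHV3-30'."""
    parts = raw.strip().split()
    gene = parts[0] if parts else ""
    return gene.split("*")[0]  # drop the allele suffix, if present


def binds_rbd(epitope):
    """True if 'RBD' appears as its own token (so 'non-RBD' does not match)."""
    tokens = [t.strip() for t in epitope.replace(";", ",").split(",")]
    return "RBD" in tokens


def ighv_usage(entries, rbd_only=False):
    """Return the fraction of entries assigned to each IGHV gene."""
    counts = Counter()
    for entry in entries:
        gene = normalise_v_gene(entry.get("heavy_v_gene", ""))
        if not gene.startswith("IGHV"):
            continue  # skip entries with no usable heavy-chain assignment
        if rbd_only and not binds_rbd(entry.get("epitope", "")):
            continue
        counts[gene] += 1
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.most_common()} if total else {}


# Toy records (hypothetical, not taken from CoV-AbDab).
toy_entries = [
    {"name": "ab1", "heavy_v_gene": "IGHV3-30*01 (Human)", "epitope": "S; RBD"},
    {"name": "ab2", "heavy_v_gene": "IGHV3-30 (Human)", "epitope": "S; non-RBD"},
    {"name": "ab3", "heavy_v_gene": "IGHV1-69*02 (Human)", "epitope": "S; RBD"},
    {"name": "ab4", "heavy_v_gene": "ND", "epitope": "S; RBD"},
]

usage_all = ighv_usage(toy_entries)
usage_rbd = ighv_usage(toy_entries, rbd_only=True)
print(usage_all)
print(usage_rbd)
```

Computing the tally once over all entries and once with `rbd_only=True` mirrors the two views compared in the IGHV analysis (all targets versus RBD binders only).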
Figure 2a shows the distribution of IGHV genes in SARS-CoV binding antibodies against all targets (left-hand side), and after filtering only for antibodies known to bind the RBD (right-hand side). In both cases, over half of the antibod-

To see whether RBD binding CDRH3s displayed any sequence biases, we used WebLogo plots (73) to visualise residue/position distributions (Figure 3). The MERS-CoV RBD binders displayed slightly higher homology at central loop positions, but neither set showed a strong signal that implicates a particular interaction type. The SARS-CoV RBD binders have a slight tendency to exploit a poly-tyrosine tail towards the end of the CDRH3, hinting at a role for the IGHJ6 germline that bears this motif. IGHJ6 was independently implicated in a clone convergent in four of six SARS-CoV-2 patients in the study by Nielsen et al. (26).

Finally, we evaluated the closest sequence identity match between all SARS-CoV binding CDRH3s and the over 500 million CDRH3s in our Observed Antibody Space (OAS) database (78). The OAS database is a regularly updated project to catalogue all publicly available immune repertoire sequencing experiments (currently over 60 studies), providing cleaned amino acid sequence datasets binned by individual, together with other useful metadata. We assume that the vast majority of this sampled population is serologically naive to SARS-CoV-1 and SARS-CoV-2, given both the high infection rate and that there is currently no evidence to suggest that exposure to common cold coronaviruses yields SARS-CoV cross-reactive antibodies (79). It follows that CDRH3s shown to bind SARS-CoV that also have high sequence identity matches in OAS may be less useful for diagnosing SARS-CoV-2 exposure. to be proximal to sequences isolated in the recent Stanford SARS-CoV-2 patient serum investigation (26).
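A closest-identity search of this kind can be sketched as follows. This is an illustrative sketch only: the text does not state how sequence identity was computed, so the example assumes identity is the fraction of matching positions between CDRH3s of equal length (and zero otherwise). The function names and the toy sequences are hypothetical and are not drawn from CoV-AbDab or OAS.

```python
def sequence_identity(a, b):
    """Fraction of identical positions; 0.0 if lengths differ (an assumption)."""
    if not a or len(a) != len(b):
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def closest_match(query, repertoire):
    """Return (sequence, identity) for the best match to `query` in `repertoire`."""
    best_seq, best_identity = None, 0.0
    for seq in repertoire:
        identity = sequence_identity(query, seq)
        if identity > best_identity:
            best_seq, best_identity = seq, identity
    return best_seq, best_identity


# Toy data (made-up CDRH3 strings used purely for illustration).
query_cdrh3 = "ARDYGMDV"
toy_repertoire = ["ARDYGLDV", "ARGGYFDY", "ARDYYYYGMDV"]

best_seq, best_identity = closest_match(query_cdrh3, toy_repertoire)
print(best_seq, best_identity)
```

A linear scan like this is only practical for small sets; against hundreds of millions of sequences one would first bin the repertoire by CDRH3 length so that each query is compared only with same-length candidates.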
Exact clonal matches (V gene + high CDRH3 identity) were considerably rarer, implying that full clonotyping may need to be performed on SARS-CoV-2 repertoires in order to identify genuine responding antibodies. Conversely, some CDRH3s from SARS-CoV-2 neutralising antibodies found in SARS-CoV-1 (mAb S309 (6)) and SARS-CoV-2 (mAb 32D4 (25)) responding repertoires have considerably lower than average closest sequence identity matches to OAS (70% and 67% respectively).

We have attempted to identify all existing published information on SARS-CoV and MERS-CoV binding antibodies, but we encourage users to inform us of any historical investigations we may have missed. We are also reaching out to authors of new studies characterising coronavirus binding antibodies, asking them to send us their data in Excel or CSV format. Data and queries may be sent to us by email (opig@stats.ox.ac.uk). Minimum requirements for addition to our database are the full antibody/nanobody variable domain sequence, binding or neutralising data for at least one specified coronavirus protein, and a link to a relevant preprint, publication, or patent. Through these submissions and our own efforts to track the scientific literature, we hope to provide a central community resource for coronavirus antibody sequence and structural information.

Currently, the database can be queried by a search term (e.g. SARS-CoV-2) and ordered by any metadata field for maximum interpretability. Users can download the entire database as a CSV file and bulk download all ANARCI numberings, IMGT-numbered PDB files, and IMGT-numbered homology models. CoV-AbDab is free to access and download without registration and is hosted at http://opig.stats.ox.ac.uk/webapps/coronavirus.
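For users working with the downloaded CSV offline, the search-and-order behaviour described above could be reproduced along the following lines. This is a hedged sketch: the column names and the three toy rows are invented for the example and should be replaced with the headers found in the actual download, and the helper names `load_rows`, `search`, and `order_by` are hypothetical.

```python
import csv
import io

# Toy stand-in for the downloaded file (column names and rows are assumptions).
TOY_CSV = """Name,Binds to,Protein + Epitope,Heavy V Gene
ab_A,SARS-CoV-1;SARS-CoV-2,S; RBD,IGHV3-30
ab_B,MERS-CoV,S; RBD,IGHV1-69
ab_C,SARS-CoV-1,N,IGHV4-59
"""


def load_rows(handle):
    """Read a CSV file handle into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


def search(rows, term):
    """Keep rows in which any field contains `term` (case-insensitive)."""
    needle = term.lower()
    return [r for r in rows if any(needle in (v or "").lower() for v in r.values())]


def order_by(rows, field):
    """Return the rows sorted by one metadata field."""
    return sorted(rows, key=lambda r: r.get(field) or "")


rows = load_rows(io.StringIO(TOY_CSV))
sars2_names = [r["Name"] for r in search(rows, "SARS-CoV-2")]
by_gene = [r["Name"] for r in order_by(rows, "Heavy V Gene")]
print(sars2_names)
print(by_gene)
```

To run this against a real download, pass an opened file to `load_rows` (for example `open(path, newline="")`) in place of the `io.StringIO` object.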
CoV-AbDab uses the following patents as a primary source of antibody/nanobody sequences: CN1664100, CN1903878, CN100374464, CN104447986, CN106380517, EP2112164, KR101828794, KR101969696, KR20190122283, KR20200020411, US7396914, WO2005/012360, WO2005/054469, WO2005/060520, WO2006/095180, WO2008/035894, WO2015/179535, WO2016/138160, and WO2019039891.

The potential danger of suboptimal antibody responses in COVID-19
The trinity of COVID-19: immunity, inflammation and intervention
The landscape of lung bronchoalveolar immune cells in COVID-19 revealed by single-cell RNA sequencing. medRxiv
Immune Cell Profiling of COVID-19 Patients in the Recovery Stage by Single-Cell Sequencing. medRxiv
Ningshao Xia, and Zheng Zhang. Antibody responses to SARS-CoV-2 in patients of novel coronavirus disease 2019
A non-competing pair of human neutralizing antibodies block COVID-19 virus binding to its receptor ACE2. medRxiv
Godelieve de Bree, Rogier Sanders, and Marit van Gils. Potent neutralizing antibodies from COVID-19 patients define multiple targets of vulnerability. bioRxiv
Rapid in silico design of antibodies targeting SARS-CoV-2 using machine learning and supercomputing. bioRxiv
Unexpected receptor functional mimicry elucidates activation of coronavirus fusion
Coronavirus treatment: Vaccines/drugs in the pipeline for COVID-19
Neutralizing human monoclonal antibodies to severe acute respiratory syndrome coronavirus: target, mechanism of action, and therapeutic potential
MERS-CoV spike protein: a key target for antivirals
Advances in MERS-CoV Vaccines and Therapeutics Based on the Receptor-Binding Domain
Perspectives on monoclonal antibody therapy as potential therapeutic intervention for Coronavirus disease-19 (COVID-19). Asian Pac
Neutralizing Antibodies against SARS-CoV-2 and Other Human Coronaviruses
ANARCI: antigen receptor numbering and receptor classification
IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains
SAbDab: the structural antibody database
The Protein Data Bank
ABodyBuilder: Automated antibody structure prediction with data-driven accuracy estimation
Potent human neutralizing antibodies elicited by SARS-CoV-2 infection. bioRxiv
Human monoclonal antibodies block the binding of SARS-CoV-2 spike protein to angiotensin converting enzyme 2 receptor
B cell clonal expansion and convergent antibody responses to SARS-CoV-2. Research Square (Nature Preprint)
Potent antibody binding to an unexpected highly conserved cryptic epitope of the SARS-CoV-2 Spike
A human monoclonal antibody blocking SARS-CoV-2 infection. bioRxiv
Synthetic nanobodies targeting the SARS-CoV-2 receptor-binding domain. bioRxiv
Loes van Schie, VIB-CMB COVID-19 Response Team
Human Monoclonal Antibody Combination against SARS Coronavirus: Synergy and Coverage of Escape Mutants
Structural Insights into Immune Recognition of the Severe Acute Respiratory Syndrome Coronavirus S Protein Receptor Binding Domain
Structural Basis for Potent Cross-Neutralizing Human Monoclonal Antibody Protection against Lethal Human and Zoonotic Severe Acute Respiratory Syndrome Coronavirus Challenge
Broadening of Neutralization Activity to Directly Block a Dominant Antibody-Driven SARS-Coronavirus Evolution Pathway
Structural Basis of Neutralization by a Human Anti-severe Acute Respiratory Syndrome Spike Protein Antibody, 80R
Structure of Severe Acute Respiratory Syndrome Coronavirus Receptor-binding Domain Complexed with Neutralizing Antibody
Therapy with a Severe Acute Respiratory Syndrome-Associated Coronavirus-Neutralizing Human Monoclonal Antibody Reduces Disease Severity and Viral Burden in Golden Syrian Hamsters
Molecular and Biological Characterization of Human Monoclonal Antibodies Binding to the Spike and Nucleocapsid Proteins of Severe Acute Respiratory Syndrome Coronavirus
Molecular characterization of a panel of murine monoclonal antibodies specific for the SARS-coronavirus
Generation and characterization of human monoclonal neutralizing antibodies with distinct binding and sequence features against SARS coronavirus using XenoMouse
An efficient method to make human monoclonal antibodies from memory B cells: potent neutralization of SARS coronavirus
Structural Basis for Potent Neutralization of Betacoronaviruses by Single-domain Camelid Antibodies
Structural definition of a neutralization epitope on the N-terminal domain of MERS-CoV spike glycoprotein
Structural Definition of a Neutralization-sensitive Epitope on the MERS-CoV S1-NTD
Towards a solution to MERS: protective human monoclonal antibodies targeting different domains and functions of the MERS-coronavirus spike glycoprotein
Structural definition of a unique neutralization epitope on the receptor-binding domain of MERS-CoV spike glycoprotein
Importance of neutralizing monoclonal antibodies targeting multiple antigenic sites on the Middle East respiratory syndrome coronavirus spike glycoprotein to avoid neutralization escape
Ultrapotent human neutralizing antibody repertoires against Middle East respiratory syndrome coronavirus from a recovered patient
A novel nanobody targeting Middle East respiratory syndrome coronavirus (MERS-CoV) receptor-binding domain has potent cross-neutralizing activity and protective efficacy against MERS-CoV
Chimeric camel/human heavy-chain antibodies protect against MERS-CoV infection
Human neutralizing monoclonal antibody inhibition of Middle East Respiratory Syndrome coronavirus replication in the common marmoset
Immunogenicity and structures of a rationally designed prefusion MERS-CoV spike antigen
Single-dose treatment with a humanized neutralizing antibody affords full protection of a human transgenic mouse model from lethal Middle East respiratory syndrome (MERS)-coronavirus infection
Junctional and allele-specific residues are critical for MERS-CoV neutralization by an exceptionally potent germline-like antibody
Prophylactic and postexposure efficacy of a potent human monoclonal antibody against MERS coronavirus
A humanized neutralizing antibody against MERS-CoV targeting the receptor-binding domain of the spike protein
Evaluation of candidate vaccine approaches for MERS-CoV
Pre- and postexposure efficacy of fully human antibodies against Spike protein in a novel humanized mouse model of MERS-CoV infection
A human SARS-CoV neutralizing antibody against epitope on S2 protein
Exceptionally potent neutralization of Middle East respiratory syndrome coronavirus by human monoclonal antibodies
Potent neutralization of MERS-CoV by human neutralizing monoclonal antibodies to the viral spike glycoprotein
Identification of human neutralizing antibodies against MERS-CoV and their role in virus adaptive evolution
A conformation-dependent neutralizing monoclonal antibody specifically targeting receptor-binding domain in Middle East respiratory syndrome coronavirus spike protein
Structural bases of coronavirus attachment to host aminopeptidase N and its inhibition by neutralizing antibodies
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains
Structure of SARS Coronavirus Spike Receptor-Binding Domain Complexed with Receptor
Structure of MERS-CoV spike receptor-binding domain complexed with human receptor DPP4
Molecular basis of binding between novel human coronavirus MERS-CoV and its receptor CD26
WebLogo: A Sequence Logo Generator
Rapid isolation and profiling of a diverse panel of human monoclonal antibodies targeting the SARS-CoV-2 spike protein. bioRxiv
Characterization of neutralizing antibodies from a SARS-CoV-2 infected individual. bioRxiv
Rapid isolation of potent SARS-CoV-2 neutralizing antibodies and protection in a small animal model. bioRxiv
IGHV1-69 polymorphism modulates anti-influenza antibody repertoires, correlates with IGHV utilization shifts and varies by ethnicity
Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires
A serological assay to detect SARS-CoV-2 seroconversion in humans

This work was supported by an Engineering and Physical Sciences Research Council (EPSRC) and Medical Research Council (MRC) grant [EP/L016044/1] awarded to MIJR, a Biotechnology and Biological Sciences Research Council (BBSRC) grant [BB/M011224/1] awarded to AK, and funding from GlaxoSmithKline plc, UCB Pharma Ltd., AstraZeneca plc, and F. Hoffmann-La Roche. We would like to thank the many authors who have responded in a timely and helpful manner to our requests for data.