key: cord-0781134-7vj8heqx
authors: Starr, Tyler N.; Greaney, Allison J.; Hilton, Sarah K.; Ellis, Daniel; Crawford, Katharine H.D.; Dingens, Adam S.; Navarro, Mary Jane; Bowen, John E.; Tortorici, M. Alejandra; Walls, Alexandra C.; King, Neil P.; Veesler, David; Bloom, Jesse D.
title: Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding
date: 2020-08-11
journal: Cell
DOI: 10.1016/j.cell.2020.08.012
sha: 4ecd94de09179989c3190071fa6f192fe789794f
doc_id: 781134
cord_uid: 7vj8heqx

Summary

The receptor binding domain (RBD) of the SARS-CoV-2 spike glycoprotein mediates viral attachment to the ACE2 receptor, and is a major determinant of host range and a dominant target of neutralizing antibodies. Here we experimentally measure how all amino-acid mutations to the RBD affect expression of folded protein and its affinity for ACE2. Most mutations are deleterious for RBD expression and ACE2 binding, and we identify constrained regions on the RBD's surface that may be desirable targets for vaccines and antibody-based therapeutics. But a substantial number of mutations are well tolerated or even enhance ACE2 binding, including at ACE2 interface residues that vary across SARS-related coronaviruses. However, we find no evidence that these ACE2-affinity enhancing mutations have been selected in current SARS-CoV-2 pandemic isolates. We present an interactive visualization and open analysis pipeline to facilitate use of our dataset for vaccine design and functional annotation of mutations observed during viral surveillance.

potential for viral escape from antibody neutralization?

The SARS-related (sarbecovirus) subgenus of betacoronaviruses comprises a diverse lineage of viruses that circulate in bat reservoirs and spill over into other mammalian species (Figure 1A) (Bolles et al., 2011; Cui et al., 2019). Sarbecoviruses initiate infection by binding to receptors on host cells via the viral spike protein. The entry receptor for both SARS-CoV-2 and the original SARS-CoV (which we refer to here as SARS-CoV-1) is the human cell-surface protein angiotensin converting enzyme 2 (ACE2). The receptor binding domain (RBD) of spike from both these viruses binds ACE2 with high affinity (Hoffmann et al., 2020; Letko et al., 2020; Li et al., 2003; Walls et al., 2020; Wrapp et al., 2020a). Because of its role in viral entry, the RBD is a major determinant of cross-species transmission and evolution (Becker et al., 2008; Frieman et al., 2012; Letko et al., 2020; Li, 2008; Li et al., 2005b; Qu et al., 2005; Ren et al., 2008; Sheahan et al., 2008a, 2008b; Wu et al., 2012). In addition, the RBD is the target of the most potent anti-SARS-CoV-2 neutralizing antibodies identified to date (Cao et al., 2020; Ju et al., 2020; Pinto et al., 2020; Rogers et al., 2020; Seydoux et al., 2020; Shi et al., 2020; Wu et al., 2020b; Zost et al., 2020), and several promising vaccine candidates use RBD as the sole antigen (Chen et al., , 2020b; Mulligan et al., 2020; Quinlan et al., 2020; Ravichandran et al., 2020; Yang et al., 2020; Zang et al., 2020).

Despite its important function, the RBD is highly variable among sarbecoviruses, reflecting the complex selective pressures shaping its evolution (Demogines et al., 2012; Frank et al., 2020; MacLean et al., 2020). Furthermore, RBD mutations have already appeared among SARS-CoV-2 pandemic isolates, including some near the ACE2-binding interface, but their impacts on receptor recognition and other biochemical phenotypes remain largely uncharacterized. Therefore, comprehensive knowledge of how mutations impact the SARS-CoV-2 RBD would aid efforts to understand viral evolution and guide the design of vaccines and other countermeasures.

To address this need, we used a quantitative deep mutational scanning approach (Adams et al., 2016; Fowler and Fields, 2014; Weile and Roth, 2018) to experimentally measure how all possible SARS-CoV-2 RBD amino-acid mutations affect ACE2-binding affinity and protein expression (a correlate of protein folding stability). The resulting sequence-phenotype maps illuminate the forces that shape RBD evolution, quantify constraint on antibody epitopes, and suggest that purifying selection is the main force acting on RBD mutations observed in human SARS-CoV-2 isolates to date. To facilitate use of our measurements in immunogen design and viral surveillance, we provide interactive visualizations, an open analysis pipeline, and complete raw and processed data.

To enable rapid functional characterization of thousands of RBD variants, we developed a yeast surface-display platform for measuring expression of folded RBD protein and its binding to ACE2 (Adams et al., 2016; Boder and Wittrup, 1997). This platform enables RBD expression on the cell surface of yeast (Figure 1B), where it can be assayed for ligand-binding affinity or protein expression levels, a close correlate of protein folding efficiency and stability (Kowalski et al., 1998a, 1998b; Shusta et al., 1999). Because yeast have protein-folding quality control and glycosylation machinery similar to mammalian cells, they add N-linked glycans at the same RBD sites as human cells (Chen et al., 2014), although these glycans are more mannose-rich than mammalian-derived glycans (Hamilton et al., 2003). Yeast-expressed RBD from SARS-CoV-1 has similar antigenic and structural properties to RBD expressed in mammalian cells (Chen et al., 2014, 2017) and binds to ACE2 as expected (Chen et al., 2014).

To validate the yeast-display platform, we selected RBDs from the Wuhan-Hu-1 SARS-CoV-2 isolate and six related sarbecoviruses (Figure 1A). These other sarbecoviruses include the closest known relatives of SARS-CoV-2 from bats and pangolins (RaTG13 and GD-Pangolin), SARS-CoV-1 (Urbani strain) and a close bat relative (LYRa11), and two more distantly related bat sarbecoviruses (BM48-31 and HKU3-1). Based on prior work, all these RBDs are expected to bind human ACE2 except those from BM48-31 and HKU3-1 (Letko et al., 2020; Shang et al., 2020). We cloned the RBDs into a vector for yeast display, induced RBD expression, and incubated the cells with varying concentrations of fluorescently labeled human ACE2 (Figure 1B). We then used flow cytometry to measure ACE2 binding across 11 ACE2 concentrations, enabling the calculation of a dissociation constant for the binding of each RBD to ACE2 (Figure 1C). Because we used ACE2 in its native dimeric form (Yan et al., 2020), we refer to the measured constants as apparent dissociation constants (K_D,app), which are affected by binding avidity. We report log binding constants ∆log10(K_D,app) relative to the wildtype SARS-CoV-2 RBD, signed such that a positive value reflects stronger binding (Figure 1D).
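
The sign convention for the reported binding constants can be made concrete with a short sketch. It assumes the definition ∆log10(K_D,app) = log10 K_D,app(wildtype) - log10 K_D,app(variant), which is one way to satisfy the statement that positive values reflect stronger binding. The function name and the K_D values are hypothetical and are not measurements from this study.

```python
import math


def delta_log10_kd(kd_variant_molar, kd_wildtype_molar):
    """Return log10(K_D of wildtype) - log10(K_D of variant).

    A positive result means the variant binds more tightly (smaller K_D)
    than the wildtype reference.
    """
    if kd_variant_molar <= 0 or kd_wildtype_molar <= 0:
        raise ValueError("dissociation constants must be positive")
    return math.log10(kd_wildtype_molar) - math.log10(kd_variant_molar)


# Hypothetical dissociation constants in molar units, for illustration only.
KD_WILDTYPE = 1e-10
KD_WEAKER = 1e-8    # binds 100-fold more weakly than the reference
KD_TIGHTER = 1e-11  # binds 10-fold more tightly than the reference

weaker = delta_log10_kd(KD_WEAKER, KD_WILDTYPE)
tighter = delta_log10_kd(KD_TIGHTER, KD_WILDTYPE)
```

Under this convention, a homolog that binds two orders of magnitude more weakly than the reference receives a value of about -2.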

All RBDs exhibited ACE2 binding affinities consistent with prior knowledge. We measure K_D,app = 3.9×10^-11 M for the SARS-CoV-2 RBD (Figure 1C), which is tighter than affinities reported for monomeric ACE2 (Walls et al., 2020; Wrapp et al., 2020a) due to avidity effects caused by our use of native dimeric ACE2. Consistent with previous studies (Walls et al., 2020; Wrapp et al., 2020a), the SARS-CoV-1 RBD binds ACE2 with lower affinity than SARS-CoV-2 (Figures 1C,D). The SARS-CoV-1-related bat strain LYRa11 binds with even lower affinity, while the more distant bat RBDs (HKU3-1 and BM48-31) have no detectable binding. These measurements are consistent with the ability of these RBDs to enable viral particles to enter cells expressing human ACE2 (Letko et al., 2020) (Figure 1D). Within the newly described SARS-CoV-2 clade, GD-Pangolin binds ACE2 with slightly higher affinity than SARS-CoV-2, while the bat isolate RaTG13 binds with two orders of magnitude lower affinity, consistent with prior reports (Wrobel et al., 2020). These results validate our yeast surface-display platform for RBD affinity measurements, and map variation in ACE2 affinity within the SARS-CoV-2 clade and the broader sarbecovirus subgenus.

We next integrated the yeast-display platform with deep mutational scanning to determine how all amino-acid mutations to the SARS-CoV-2 RBD impact expression and binding affinity for ACE2. We constructed two independent mutant libraries of the RBD using a PCR-based mutagenesis method that introduces all 19 mutant amino acids at each position (Bloom, 2014) . To facilitate sequencing and obtain linkage among amino-acid mutations within a single variant, we appended 16-nucleotide barcodes downstream of the coding sequence (Hiatt et al., 2010) , bottlenecked each library to ~100,000 barcoded variants, and linked each RBD variant to its barcode via long-read PacBio SMRT sequencing (Matreyek et al., 2018) ( Figure S1A ). By examining the concordance of RBD variant sequences for barcodes sampled by multiple PacBio reads, we validated that this process correctly determined the sequence of >99.8% of the variants ( Figure S1B ). RBD variants contained an average of 2.7 amino-acid mutations, with the number of mutations per variant roughly following a Poisson distribution ( Figure S1C ). Our libraries covered 3,804 of the 3,819 possible RBD amino-acid mutations, of which 95.7% were present as the sole amino-acid mutation in at least one barcoded variant (Figures S1D, E) . To provide internal standards for our measurements, we spiked the mutant libraries with a barcoded panel of 11 unmutated sarbecovirus RBD homologs (strains in color in Figure 1A ), including those tested in Figure 1C .
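
The library statistics quoted above can be checked with a few lines of arithmetic. The sketch below takes the counts from the text and assumes an ideal Poisson distribution of mutations per variant (the text says only that the distribution is roughly Poisson), so the derived fractions are approximations rather than reported results.

```python
import math

# Values taken from the text.
N_MUTANT_AMINO_ACIDS = 19     # mutant amino acids introduced per position
N_POSSIBLE_MUTATIONS = 3819   # all possible RBD amino-acid mutations
N_COVERED_MUTATIONS = 3804    # mutations present in the libraries
MEAN_MUTATIONS = 2.7          # average amino-acid mutations per variant

# 3,819 mutations at 19 per position implies 201 mutagenized positions.
n_sites = N_POSSIBLE_MUTATIONS // N_MUTANT_AMINO_ACIDS
coverage = N_COVERED_MUTATIONS / N_POSSIBLE_MUTATIONS


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Assumption: mutations per variant follow an ideal Poisson distribution.
frac_unmutated = poisson_pmf(0, MEAN_MUTATIONS)
frac_single = poisson_pmf(1, MEAN_MUTATIONS)
```

Under the Poisson assumption, roughly 7% of variants would be unmutated and roughly 18% would carry exactly one amino-acid mutation, which is why multiply mutated variants must be used to estimate single-mutation effects.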

To determine how mutations affect RBD expression and ACE2 binding, we combined fluorescence-activated cell sorting (FACS) with deep sequencing of variant barcodes (Adams et al., 2016; Peterman and Levine, 2016). To measure expression, we fluorescently labeled RBD protein on the yeast surface via a C-terminal epitope tag and used FACS to collect ~15 million cells from each library, partitioned into four bins from low to high expression (Figures 2A, S2A). We sequenced the barcodes from each bin and reconstructed each variant's mean fluorescence intensity (MFI) from its distribution of reads across bins (Figure S2C). We represent expression as ∆log(MFI) relative to the unmutated SARS-CoV-2 RBD, such that a positive ∆log(MFI) indicates increased expression. To measure ACE2-binding affinity, we incubated yeast libraries that had been pre-sorted for RBD expression with 16 concentrations of fluorescently labeled ACE2 (10^-6 to 10^-13 M, plus 0 M ACE2), and used FACS to collect >5 million RBD+ yeast cells at each concentration, partitioned into four bins from low to high ACE2 binding (Figures 2B, S2B). We again sequenced the barcodes from each bin, reconstructed the mean ACE2 binding of each variant at each concentration (Figure S2C), and used the resulting titration curves to infer dissociation constants K_D,app (Figure S2D), which we represent as ∆log10(K_D,app) relative to the unmutated SARS-CoV-2 RBD, with positive values indicating stronger binding.
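
The two computational steps described above, summarizing a variant's reads across sort bins and fitting a titration curve, can be sketched as follows. This is a simplified illustration and not the authors' pipeline: the weighted-mean-bin estimator, the one-site binding model with fixed baselines, the half-log concentration spacing, and the grid search are all assumptions chosen to keep the example self-contained.

```python
def mean_bin(read_counts):
    """Weighted mean bin index (1-based) from a variant's per-bin read counts."""
    total = sum(read_counts)
    if total == 0:
        raise ValueError("variant has no reads")
    return sum((i + 1) * n for i, n in enumerate(read_counts)) / total


def binding_curve(conc, kd, low, high):
    """One-site binding model: signal rises from `low` to `high` around conc = kd."""
    return low + (high - low) * conc / (conc + kd)


def fit_log10_kd(concs, signals, low=1.0, high=4.0):
    """Grid-search log10(K_D) from -14.00 to -5.00 minimizing squared error.

    The baselines `low` and `high` are fixed here (assumption); a fuller fit
    would estimate them together with K_D.
    """
    best_log_kd, best_err = None, float("inf")
    for step in range(-1400, -499):
        log_kd = step / 100
        kd = 10 ** log_kd
        err = sum(
            (binding_curve(c, kd, low, high) - s) ** 2
            for c, s in zip(concs, signals)
        )
        if err < best_err:
            best_log_kd, best_err = log_kd, err
    return best_log_kd


# Hypothetical variant: reads in bins 1-4 at one ACE2 concentration.
example_mean_bin = mean_bin([10, 20, 30, 40])

# Simulated titration: 0 M plus 15 half-log steps from 1e-13 to 1e-6 M.
concs = [0.0] + [10 ** (-13 + 0.5 * i) for i in range(15)]
true_kd = 1e-10  # hypothetical K_D used to simulate noise-free signals
signals = [binding_curve(c, true_kd, 1.0, 4.0) for c in concs]
fitted_log10_kd = fit_log10_kd(concs, signals)
```

On noise-free simulated data the grid search recovers the K_D used for the simulation; real sort-seq data would additionally require read-count filtering and a treatment of measurement noise.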

These high-throughput measurements of expression and ACE2 binding were consistent with expectations about the effects of mutations. RBD variants containing premature stop codons universally failed to express folded full-length protein (Figure 2C). Unmutated variants and those with synonymous mutations had a tight distribution of neutral expression and binding measurements (Figure 2C,D). Variants containing amino-acid mutations had a wide range of expression and binding phenotypes, with variants containing just one mutation tending to have milder functional defects than those with multiple mutations (Figure 2C,D). These trends are consistent with the fact that most mutations are deleterious to protein folding or function (Soskine and Tawfik, 2010); however, some mutated variants exhibit expression or binding that is comparable to or even higher than that of the parental SARS-CoV-2 RBD. The panel of RBD homologs from other sarbecovirus strains all expressed well but exhibited a wide range of ACE2 binding affinities (Figure 2C,D, Table S1), as expected since only some are derived from viruses that can enter cells using human ACE2 (Letko et al., 2020).

These measurements show that the RBD possesses considerable mutational tolerance ( Figure  2C , D). For instance, 46% of single amino-acid mutations to SARS-CoV-2 RBD maintain an affinity to ACE2 at least as high as that of SARS-CoV-1, suggesting that there is a substantial mutational space consistent with sufficient affinity to maintain human infectivity. Many single amino-acid mutants also maintain expression comparable to unmutated SARS-CoV-2, indicating that a large mutational space is compatible with properly folded RBD protein.

We next aggregated the measurements on all variants to quantify the effects of individual amino-acid mutations. Because many variants contain multiple mutations, we used global epistasis models to determine the effects of individual mutations from all singly and multiply mutated variants (Otwinowski et al., 2018) (Figure S2E-K). The resulting single-mutant ∆log(MFI) and ∆log10(K_D,app) measurements correlated well between the independent library duplicates (R^2 = 0.93 and 0.95, respectively; Figures 2E,F). Throughout the rest of this paper, we report single-mutant effects as the average of the duplicate measurements. Overall, we obtained expression measurements for 99.5% and binding measurements for 99.6% of all 3,819 single amino-acid mutations.
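
The idea behind a global epistasis model can be illustrated with a small forward model. In models of this family, mutations are assumed to combine additively on an unobserved latent scale, and the measured phenotype is a nonlinear function of that latent score. The sigmoid shape, its parameters, and the mutation names and effects below are hypothetical choices for illustration; they are not the fitted model from this study.

```python
import math


def observed_phenotype(latent, floor=-5.0, ceiling=0.0, midpoint=-2.0, slope=2.0):
    """Sigmoid mapping a latent additive score to a bounded measurement."""
    return floor + (ceiling - floor) / (1.0 + math.exp(-slope * (latent - midpoint)))


def predict_variant(mutations, latent_effects, wildtype_latent=1.0):
    """Sum latent effects of a variant's mutations, then apply the nonlinearity."""
    latent = wildtype_latent + sum(latent_effects[m] for m in mutations)
    return observed_phenotype(latent)


# Hypothetical latent effects for two placeholder mutations.
latent_effects = {"mutA": -1.5, "mutB": -1.5}

wildtype = predict_variant([], latent_effects)
single = predict_variant(["mutA"], latent_effects)
double = predict_variant(["mutA", "mutB"], latent_effects)

single_drop = wildtype - single
double_drop = wildtype - double
```

In this toy example each mutation alone is nearly neutral on the measured scale, yet the double mutant loses far more than twice the single-mutant effect. Fitting the latent effects and the nonlinearity jointly is what allows single-mutation effects to be estimated from multiply mutated variants.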

The complete measurements of how amino-acid mutations affect expression and ACE2 binding represent rich sequence-to-phenotype maps for the RBD. We visualize the data in several ways. Figure 3 provides heatmaps that show how each mutation affects expression or ACE2 binding, with sites annotated by whether they contact ACE2, their relative solvent accessibility, and their amino-acid identities in SARS-CoV-2 and SARS-CoV-1. Interactive versions of these heatmaps are in Data S1 and at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS, and enable zooming, subsetting by functional annotations, and mouse-selection-based readouts of numerical measurements. As an alternative representation, Figure S3 provides logo plots that enable side-by-side comparison of how mutations affect expression and ACE2 binding. Finally, interactive structure-based visualizations using dms-view (Hilton et al., 2020) are at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/structures/, and project the effects of mutations onto a crystal structure of the ACE2-bound RBD (Lan et al., 2020) and a cryo-EM structure of the full spike ectodomain. The underlying raw data are in Table S2.

The sequence-phenotype maps reveal tremendous heterogeneity in mutational constraint across the RBD. Many sites are highly tolerant of mutations with respect to one or both of expression and ACE2 binding, while other sites are constrained to the wildtype amino acid. A substantial number of sites (e.g., 382 to 395) are tolerant of mutations with respect to ACE2 binding, but are constrained with respect to expression, consistent with folding and stability being global constraints common to many sites (Fane et al., 1991; Poteete et al., 1997). There are also a handful of sites where ACE2 binding imposes strong constraint but expression does not (e.g., 489, 502, and 505). Moreover, at some sites there are mutations that clearly enhance expression or ACE2-binding affinity (blue colors in Figure 3).

We performed a series of experiments to confirm the dynamic range of our assays and their relevance for RBD expressed in mammalian cells or full spike trimer on pseudotyped lentiviral particles (Figures 4, S4).

To validate the dynamic range of our deep mutational scanning, we re-cloned and tested RBD mutants in isogenic yeast-display assays. These experiments recapitulated the deep mutational scanning ( Figures 4A-C) , including confirmation that some mutations enhance expression (V367F and G502D) or ACE2 affinity (N501F, N501T, and Q498Y) in the context of yeast-expressed RBD.

We next compared our deep mutational scanning to measurements on mammalian-expressed RBDs. We purified mammalian-expressed RBDs from six sarbecoviruses (SARS-CoV-2, SARS-CoV-1, WIV1, RaTG13, ZXC21, and ZC45), and measured their 1:1 binding affinities for monomeric human ACE2 using biolayer interferometry, which agreed with the measurements from our deep mutational scan ( Figures 4D, S4A -F). Moreover, we observed that using a natively dimeric ACE2 enables detection of binding by the RaTG13 RBD, which can support ACE2-mediated cell entry even though the 1:1 affinity is too weak to detect ( Figure S4D ).

We also validated that mutations enhancing yeast surface expression improve soluble yield and stability of mammalian-expressed RBD protein. We tested five expression-enhancing mutations, and found that each greatly increased soluble RBD yield (2.3- to 4.8-fold increase, Figures 4E,F, S4G). Four of the mutations also increased RBD stability (Figures 4G, S4H), including one (V367F) that increased the melting temperature by 3.9°C. All five mutations also maintained ACE2 binding and antigenicity (Figure S4I), suggesting they could be useful for enhancing production of RBD-based vaccine immunogens.

Finally, we validated the deep mutational scanning measurements in the context of spike-pseudotyped lentiviral particles (Figures 4H, S4J) (Crawford et al., 2020). The trends observed for entry by the spike-pseudotyped lentiviral particles generally confirmed the deep mutational scanning: three of four mutations that were detrimental for RBD expression or ACE2 binding reduced pseudovirus entry, while a mutation that had little phenotypic effect in the deep mutational scan did not affect viral entry. We also tested two ACE2 affinity-enhancing mutations and found that both increased pseudovirus entry. Note that this result with single-cycle pseudovirus does not necessarily imply that these mutations would increase growth of authentic SARS-CoV-2, since multi-cycle viral replication often involves tuning of receptor affinity to simultaneously optimize viral attachment and release (Callaway et al., 2018; Hensley et al., 2009; Lang et al., 2020). Taken together, these experiments help validate the accuracy and relevance of the deep mutational scanning.

Interpreting mutation effects in the context of the RBD structure

To relate our sequence-phenotype maps to the RBD structure, we mapped the effects of mutations onto the ACE2-bound SARS-CoV-2 RBD crystal structure (Lan et al., 2020), coloring each residue's Cα by the mean effect of mutations at that site on expression (Figure 5A) or binding (Figure 5B). Interactive structure-based visualizations of specific residue sets discussed in the following sections can be found at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/structures/.
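
The per-site summary used for structural mapping, the mean effect of all mutations at a site, can be sketched as below. The mutation-name format (wildtype letter, site number, mutant letter), the function name, and the effect values are assumptions for illustration; the numbers are placeholders and not measurements from this study.

```python
from collections import defaultdict


def site_mean_effects(mutation_effects):
    """Average mutation-level effects by site.

    Keys are assumed to look like 'A10G': one-letter wildtype amino acid,
    integer site number, one-letter mutant amino acid.
    """
    by_site = defaultdict(list)
    for mutation, effect in mutation_effects.items():
        site = int(mutation[1:-1])
        by_site[site].append(effect)
    return {site: sum(values) / len(values) for site, values in by_site.items()}


# Placeholder effects at two hypothetical sites.
example_effects = {
    "A10G": 0.25,
    "A10P": -1.75,
    "C11S": -4.0,
    "C11A": -3.0,
}

site_means = site_mean_effects(example_effects)
```

A site whose mean is strongly negative is constrained, because most substitutions there are deleterious, whereas a mean near zero indicates a mutationally tolerant site.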

The two subdomains of the RBD differ in mutational constraint on expression and binding. The core-RBD subdomain consists of a central beta sheet flanked by alpha-helices, and presents a stably folded scaffold for the receptor binding motif (RBM, residues 437-508; Li et al., 2005a), which encodes ACE2 binding and receptor specificity (Letko et al., 2020). The RBM consists of a concave surface anchored by a β-hairpin and a disulfide bond stabilizing one of the lateral loops, which cradles the ACE2 α1 helix and a β-hairpin centered on ACE2 residue K353. Consistent with the modularity of core-RBD-encoded stability and RBM-encoded binding, constraint on expression primarily focuses on buried residues within the core-RBD (Figure 5A), while constraint on binding focuses on the RBM-proximal core-RBD in addition to the RBM itself (Figure 5B), particularly on RBM residues that contact ACE2 residues K31 and K353, which are "hotspots" of binding for SARS-CoV-1 and SARS-CoV-2 (Li, 2008; Wu et al., 2012).

Several ACE2-contact residues exhibit binding-stability tradeoffs, as has been seen in the active sites and binding interfaces of other proteins (Julian et al., 2017; Tokuriki et al., 2008; Wang et al., 2002). For example, several mutations to G502 enhance RBD expression (Figure 3A) but abolish binding (Figure 3B) due to steric clashes with ACE2 (Figure S5A). Similarly, mutations to polar amino acids enhance expression at interface residues Y449, L455, F486, and Y505 (Figure 3A), consistent with the destabilizing effect of surface-exposed hydrophobic patches (Schwehm et al., 1998); however, these hydrophobic residues form ACE2 packing contacts and are required for binding (Figures 3B, S5B).

However, our data also indicate that global RBD stability contributes to ACE2-binding affinity. In general, mutation effects on RBD binding and expression are correlated ( Figures 5C, S5C) , with residues that deviate from this trend clustering at the ACE2 interface ( Figure 5C , cyan points). This correlation between expression and binding is consistent with studies on antibodies, where mutations that improve stability and rigidity accompany increases in binding affinity (Davenport et al., 2016; Ovchinnikov et al., 2018; Schmidt et al., 2013) . Because ACE2 binding is influenced by both global RBD stability and interface-specific constraints, a site's tolerance to mutation is better explained by its extent of burial in the ACE2-bound RBD structure than its burial in the free RBD structure alone ( Figure  S5D ). The contribution of RBD stability to ACE2 binding may be influenced by other factors in the full spike trimer, though our measurements on pseudotyped lentiviral particles ( Figure 4H ) indicate that a destabilizing RBD mutation (C432D) reduces ACE2-mediated cellular entry in the context of spike trimer.

Our data also reveal the importance of other sequence features. For example, the four disulfide bonds in the RBD have varying tolerance to mutation ( Figures 5A,B , S5E), with the RBM C480:C488 disulfide completely constrained for ACE2 binding. The two RBD N-linked glycans contribute to RBD stability, as mutations that ablate the NxS/T glycosylation motif decrease RBD expression ( Figure S5F ). The SARS-CoV-1 RBD contains a third glycan, but its introduction at the homologous N370 in SARS-CoV-2 is mildly deleterious for expression ( Figure S5F ). However, there are other surface positions where introduction of NxS/T glycosylation motifs is tolerated or even beneficial for RBD expression ( Figure S5G -I); adding glycans at some of these sites could be useful in resurfacing RBDs as antibody probes (Wu et al., 2010; Zhou et al., 2020c) or epitope-focused immunogens (Duan et al., 2018; Eggink et al., 2014; Jardine et al., 2016; Kulp et al., 2017; Weidenbacher and Kim, 2019) .

An initially surprising feature of SARS-CoV-2 was that its RBD tightly binds ACE2 despite differing in sequence from SARS-CoV-1 at many residues that had been defined as important for ACE2 binding (Andersen et al., 2020; Wan et al., 2020) . Our map of mutational effects explains this observation by revealing remarkable degeneracy at ACE2 contact positions, with many interface mutations being tolerated or even enhancing affinity ( Figure 5D ). Mutations that enhance affinity are notable at RBD sites Q493, Q498 and N501. Although these residues are involved in a dense network of polar contacts with ACE2 (Figure 5E ), our measurements show there is substantial plasticity in this network, as mutations that reduce the polar character of these residues can enhance affinity.

Within the SARS-CoV-2 clade of sarbecoviruses, our maps of mutational effects on binding explain variation in ACE2 affinity among different viruses. For example, GD-Pangolin has higher affinity for ACE2 than SARS-CoV-2 (Figures 1C, 2D), and this can be explained by the affinity-enhancing Q498H mutation present in this virus's RBD sequence (Figure 5F). In contrast, RaTG13 has substantially lower affinity for ACE2 than SARS-CoV-2 (Figures 1C, 2D), consistent with the presence of affinity-decreasing mutations including Y449F and N501D (Figure 5F). The fact that differences in binding affinity of GD-Pangolin and RaTG13 are well explained by summing the effects of individual mutations relative to SARS-CoV-2 suggests that our deep mutational scanning is useful for sequence-based predictions of the ACE2-binding potential of future viruses isolated from the SARS-CoV-2 clade.
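
The additive prediction described above amounts to summing single-mutation effects over the substitutions that separate a homolog from SARS-CoV-2. The sketch below shows that calculation under the assumption of no epistasis between substitutions; the substitution names and effect values are hypothetical placeholders, not values from the dataset.

```python
def predict_delta_log10_kd(substitutions, single_effects):
    """Predict a homolog's binding change as the sum of single-mutation effects.

    Assumes effects are additive on the delta-log10(K_D,app) scale, which is
    only a reasonable approximation for closely related sequences.
    """
    missing = [s for s in substitutions if s not in single_effects]
    if missing:
        raise KeyError(f"no measurement for: {missing}")
    return sum(single_effects[s] for s in substitutions)


# Placeholder single-mutation effects (positive = stronger binding).
single_effects = {"sub1": 0.25, "sub2": -1.5, "sub3": -0.75}

# Hypothetical homologs defined by their substitutions relative to the reference.
tighter_homolog = predict_delta_log10_kd(["sub1"], single_effects)
weaker_homolog = predict_delta_log10_kd(["sub2", "sub3"], single_effects)
```

Raising an error for unmeasured substitutions, rather than silently treating them as neutral, keeps a prediction from looking more complete than the underlying data allow.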

In contrast, the ACE2 binding interface of SARS-CoV-1 has many more mutations relative to SARS-CoV-2, and this increased divergence causes shifts in the actual effects of mutations on ACE2 binding. In particular, our deep mutational scanning shows that most SARS-CoV-1 amino-acid states are individually deleterious in SARS-CoV-2, despite being compatible with high-affinity binding by SARS-CoV-1 (Figure 5F). This shift in the effects of mutations between more distantly related RBDs is consistent with studies of protein evolution demonstrating that epistatic entrenchment causes amino-acid preferences to change as proteins diverge (Hilton and Bloom, 2018; Lee et al., 2018; Pollock et al., 2012; Povolotskaya and Kondrashov, 2010; Shah et al., 2015; Starr and Thornton, 2016; Starr et al., 2018). Therefore, our current SARS-CoV-2 deep mutational scanning data are likely to be most useful for predicting the effects of mutations to RBDs closely related to that of SARS-CoV-2.

The RBD is the dominant target of neutralizing antibodies to SARS-CoV-2 (Brouwer et al., 2020; Cao et al., 2020; Ju et al., 2020; Pinto et al., 2020; Premkumar et al., 2020; Rogers et al., 2020; Suthar et al., 2020; Yuan et al., 2020a; Zhang et al., 2020; Zost et al., 2020) . It is unclear to what extent the RBD will evolve to escape such antibodies in a manner reminiscent of some other viruses (Smith et al., 2004; Trkola et al., 2005) , although in vitro studies suggest that SARS-CoV-2 and SARS-CoV-1 RBDs are capable of fixing mutations that escape neutralizing antibodies (Baum et al., 2020; Rockx et al., 2010) . To better define the RBD's evolutionary capacity for antibody escape, we examined mutational constraint in the epitopes of antibodies that bind the SARS-CoV-1 or SARS-CoV-2 RBD (Figures 6A, S6A,B) (Hwang et al., 2006; Pak et al., 2009; Pinto et al., 2020; Prabakaran et al., 2006; Walls et al., 2019; Wrapp et al., 2020b; Wu et al., 2020b; Yuan et al., 2020b) .

Many antibodies have epitopes that overlap the RBD ACE2 contact interface and are therefore strongly constrained by mutation effects on binding. For instance, antibodies B38 and 80R engage the two constrained patches that comprise the ACE2-binding interface, while S230, F26G19, and m396 engage either one of these ACE2-binding patches. However, none of the currently characterized antibodies have epitopes as constrained as the ACE2-contact surface itself (Figure 6B), suggesting further epitope focusing could be achieved. The importance of such focusing is demonstrated by a recent study that identified RBD mutations enabling escape from RBM-directed neutralizing antibodies (Baum et al., 2020); our data indicate that the escape occurs at sites that have high mutational tolerance (Figure S6C,D).

Epitopes of core-RBD-directed antibodies tend to be mutationally constrained with respect to expression rather than binding (Figures 6A,B). These core-RBD epitopes are conserved across the sarbecovirus alignment (Figure S6E), explaining the possible cross-reactivity of these antibodies between SARS-CoV-1 and SARS-CoV-2 (Huo et al., 2020; Pinto et al., 2020; Wrapp et al., 2020b). Although residues in these epitopes are constrained for stability even in our measurements on the isolated RBD, some of them likely exhibit additional constraint due to quaternary contacts in the full spike trimer (Wrapp et al., 2020a; Yuan et al., 2020b). We identified an additional core-RBD patch centered on residue E465 that is also mutationally constrained (Figure 6C) and evolutionarily conserved (Figure S6E) but is not targeted by any currently known antibody and might represent a promising target.

Taken together, our results identify multiple mutationally constrained patches on the RBD surface that can be targeted by antibodies. These findings provide a framework that could inform the formulation of antibody cocktails aiming to limit the emergence of viral escape mutants (Baum et al., 2020; Pinto et al., 2020; Wu et al., 2020b; Zost et al., 2020), particularly if deep mutational scanning approaches like our own are extended to define antibody epitopes in functional as well as structural terms (Dingens et al., 2019).

Using sequence-phenotype maps to interpret genetic variation in SARS-CoV-2

An important question is whether any mutations that have appeared in circulating SARS-CoV-2 isolates have functional consequences. Despite intense interest in this question, experimental work to characterize the effects of SARS-CoV-2 mutations has lagged far behind their identification in viral sequences. Our comprehensive maps of the phenotypic effects of mutations provide a direct way to interpret the impact of current and future genetic variation in the SARS-CoV-2 RBD.

To assess the phenotypic impacts of mutations that have appeared in the SARS-CoV-2 RBD to date, we downloaded all 31,570 spike sequences available from GISAID (Elbe and Buckland-Merrett, 2017) on May 27, 2020, and identified RBD amino-acid mutations present in high-quality clinical isolates. All observed RBD mutations are at low frequency, with 56 of the 98 observed mutations present only in a single sequence. The observed mutations are significantly less deleterious for ACE2 binding and RBD expression than random single-nucleotide-accessible mutations (Figures 7A, S7A,B; P-value < 10^-6, permutation tests), consistent with the action of purifying selection. Purifying selection against deleterious mutations is especially apparent for mutations that are observed multiple times in circulating variants, with a substantial number of singletons being mildly or moderately deleterious whereas mutations observed multiple times are largely neutral. This general pattern of increased purifying selection on more common mutations is consistent with theoretical expectation and empirical patterns observed for other viruses (Pybus et al., 2007; Xue and Bloom, 2020).
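
A comparison of observed mutations against random accessible mutations can be framed as a resampling test, in the spirit of the permutation tests mentioned above. The sketch below is an assumption about how such a test could be set up, not the authors' exact procedure: it repeatedly draws random sets of the same size from a pool of effects and asks how often their mean is at least as high as the observed mean. All effect values are synthetic.

```python
import random


def resampling_p_value(observed, pool, n_draws=2000, seed=0):
    """One-sided P-value that `observed` is less deleterious than random draws.

    Counts how often a random sample from `pool` (same size as `observed`,
    drawn without replacement) has a mean at least as high as the observed mean.
    """
    rng = random.Random(seed)
    k = len(observed)
    observed_mean = sum(observed) / k
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(pool, k)
        if sum(sample) / k >= observed_mean:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


# Synthetic pool of effects for accessible mutations, from -3.0 up to -0.03.
pool = [-3.0 + 0.03 * i for i in range(100)]

# Synthetic "observed" sets: one nearly neutral, one typical of the pool.
nearly_neutral = [-0.1, -0.05, 0.0, -0.2, -0.15]
typical = pool[::20]

p_neutral = resampling_p_value(nearly_neutral, pool)
p_typical = resampling_p_value(typical, pool)
```

With a finite number of draws, the smallest reportable P-value is 1/(n_draws + 1), so resolving a threshold as small as the one quoted in the text would require a correspondingly large number of draws.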

Our discovery of affinity-enhancing mutations to the SARS-CoV-2 RBD raises the question of whether positive selection favors such mutations, since the relationship between receptor affinity and fitness can be complex for viruses that are well-adapted to their hosts (Callaway et al., 2018; Hensley et al., 2009; Lang et al., 2020) . Affinity-enhancing mutations are accessible via single-nucleotide mutation from SARS-CoV-2 ( Figure S7C ), but none are observed among circulating viral sequences ( Figure 7A ), and observed mutations do not enhance ACE2 affinity more than randomly drawn samples of single nucleotide mutations ( Figure S7D ). Taken together, we see no clear evidence of selection for stronger ACE2 binding, consistent with SARS-CoV-2 already possessing adequate ACE2 affinity at the beginning of the pandemic.
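
Whether an amino-acid mutation is accessible by a single-nucleotide change depends on the codon at that site. The sketch below enumerates the amino acids reachable from a codon by changing one nucleotide, using the standard genetic code. The example codon is an arbitrary asparagine codon chosen for illustration and is not taken from the text.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered by BASES at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def accessible_amino_acids(codon):
    """Amino acids reachable from `codon` by exactly one nucleotide change.

    Synonymous changes and stop codons are excluded.
    """
    codon = codon.upper()
    wildtype = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            amino_acid = CODON_TABLE[mutant]
            if amino_acid not in (wildtype, "*"):
                reachable.add(amino_acid)
    return reachable


# Illustrative asparagine codon (assumption, chosen for the example).
from_aat = accessible_amino_acids("AAT")
```

Each codon has nine single-nucleotide neighbors, so at most nine of the 19 possible amino-acid changes are accessible at any site, and usually fewer after synonymous and stop codons are removed.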

Last, we validated our deep mutational scanning for mutations that are especially prevalent among naturally occurring sequences in GISAID. The deep mutational scanning suggests small phenotypic effects for the most prevalent mutations, with the exception of V367F, which substantially enhances expression (Figure 7B). We re-cloned and tested most of these prevalent mutations for expression and ACE2 binding in isogenic yeast display assays. Consistent with the deep mutational scanning, the only large phenotypic effect was increased expression of V367F (Figure 7C,D), which we also validated enhances thermal stability of mammalian-expressed RBD (Figures 4G, S4H). The relevance of V367F's stability-enhancing effect for viral fitness is unclear, though this mutation has independently arisen multiple times (van Dorp et al., 2020). We also validated that N439K, the most prevalent RBD mutation, which may have a very slight affinity-enhancing effect (Figures 7B,C), has no measurable impact on entry of spike-pseudotyped lentiviral particles (Figure 4H). Taken together, our results suggest that there is little phenotypic diversity in ACE2 binding among circulating variants at this early stage of the pandemic, although it will be interesting to use our maps to continually assess the phenotypic effects of future mutations as the virus evolves.

Vast numbers of viral genomes have been sequenced in almost real-time during the SARS-CoV-2 pandemic. These genomic sequences have been useful for understanding viral emergence and spread (Andersen et al., 2020; Bedford et al., 2020; Fauver et al., 2020), but the lack of corresponding high-throughput functional characterization means that speculation has outpaced experimental data when it comes to understanding the phenotypic consequences of mutations. Here, we take a step toward providing phenotypic maps commensurate with the scale of genomic data by experimentally characterizing how all amino-acid mutations to the RBD affect the expression of folded protein and its affinity for ACE2, two key factors for viral fitness. These maps show that RBD mutations that have appeared in SARS-CoV-2 to date are nearly neutral with respect to these two biochemical phenotypes, with the exception of one mutation (V367F) that increases RBD stability. Notably, there has been no selection to date for any of the evolutionarily accessible mutations that enhance ACE2 binding affinity. The genetic diversity of SARS-CoV-2 is likely to increase as it continues to circulate in the human population, and so our phenotypic maps should become increasingly valuable for viral surveillance as mutations accumulate over time.

It is important to remember that our maps define biochemical phenotypes of the RBD, not how these phenotypes relate to viral fitness. There are many complexities in the relationship between biochemical phenotypes of yeast-displayed RBD and viral fitness. First, there are subtle differences in glycan structures between yeast and human cells (Hamilton et al., 2003), though the overall role of glycans in RBD stability is preserved in yeast systems (Chen et al., 2014). Second, the RBD is just one domain of the viral spike, which engages in complex dynamic movements to mediate viral entry (Huo et al., 2020; Walls et al., 2019, 2020; Wrapp et al., 2020b). Finally, spike-mediated entry is just one component of fitness, which involves a myriad of incompletely understood factors that determine how well a virus spreads from one human to another (Kutter et al., 2018). To some degree, these caveats apply to all experimental studies, as even sophisticated animal models are imperfect proxies for true fitness (Louz et al., 2013), but they are especially true for basic biochemical phenotypes like the ones we measure. However, on a hopeful note, our measurements correlate well with cellular entry by spike-pseudotyped viral particles expressing sarbecovirus RBD homologs (Figure 1D) and single mutants of the SARS-CoV-2 RBD (Figure 4H). Fitness ultimately arises from the concerted action of biochemical phenotypes, which are in turn determined by genotype (Dean and Thornton, 2007; Harms and Thornton, 2013; Russell et al., 2014). By making the first link from mutations to biochemical phenotypes, we have taken a step towards enabling better interpretation of viral genetic variation.

One important area where our maps do have clear relevance is assessing the potential for SARS-CoV-2 to undergo antigenic drift by fixing mutations at sites targeted by antibodies, as occurs for some other viruses such as influenza (Smith et al., 2004). The RBD is the dominant target of neutralizing antibodies (Cao et al., 2020; Ju et al., 2020; Pinto et al., 2020; Rogers et al., 2020; Seydoux et al., 2020; Shi et al., 2020; Wu et al., 2020b; Zost et al., 2020), and so any antigenic drift will be constrained by its mutational tolerance. Our results show that many mutations to the RBD are well-tolerated with respect to both protein folding and ACE2 binding. However, the ACE2 binding interface is more constrained than most of the RBD's surface, which could limit viral escape from antibodies that target this interface (Rockx et al., 2010). In this respect, our maps enable several important observations. First, no characterized antibodies have epitopes as constrained as the actual RBD surface that contacts ACE2, suggesting that there is room for epitope focusing to minimize viral escape. Second, there are a number of RBD mutations that enhance ACE2 affinity, which implies evolutionary potential for compensation of deleterious mutations in the ACE2 interface in a manner reminiscent of multi-step escape pathways that have been described for other viruses (Bloom et al., 2010; Friedrich et al., 2004; Gong et al., 2013; Lynch et al., 2015; Wu et al., 2017). It should be possible to shed further experimental light on the potential for antigenic drift by extending our deep mutational scanning methodology to directly map immune-escape mutations as has been done for other viruses (Dingens et al., 2019; Lee et al., 2019; Wu et al., 2020a).

RBD-based antigens represent a promising vaccine approach (Chen et al., , 2020b Mulligan et al., 2020; Quinlan et al., 2020; Ravichandran et al., 2020; Zang et al., 2020). Our sequence-phenotype maps can directly inform efforts to engineer such vaccines in several ways. First, we identify many mutations that enhance RBD expression and thermal stability, a desirable property in vaccine immunogens. Second, our maps show which mutations can be introduced into the RBD without disrupting key biochemical phenotypes, thereby opening the door to resurfacing immunogens to focus antibodies on specific epitopes (Duan et al., 2018; Eggink et al., 2014; Jardine et al., 2016; Kulp et al., 2017; Weidenbacher and Kim, 2019; Wu et al., 2010). Finally, our maps show which surfaces of the RBD are under strong constraint and might thereby be targeted by structure-guided vaccines to stimulate immunity with breadth across the sarbecovirus clade: in addition to the ACE2 interface itself, these surfaces include several core-RBD patches targeted by currently described antibodies and a previously undescribed core-RBD patch surrounding residue E465.

Finally, our work should be useful for understanding the evolution of sarbecoviruses more broadly, including the potential for more spillovers into the human population. There is a dizzying diversity of RBD genotypes and phenotypes among sarbecoviruses within bat reservoirs (Boni et al., 2020; Demogines et al., 2012; Frank et al., 2020; Hu et al., 2017; Latinne et al., 2020; Letko et al., 2020; MacLean et al., 2020). A prerequisite for these viruses to jump to humans is the ability to efficiently bind human receptors (Becker et al., 2008; Letko et al., 2020; Menachery et al., 2015, 2016). Our maps are immediately useful in assessing the effects on ACE2-binding of mutations to viruses within the SARS-CoV-2 clade, and extensions to account for epistasis and genetic background could further inform understanding of the evolutionary trajectories that enable sarbecoviruses to efficiently infect human cells.

RBDs included in the present study are in bold colored text. Node labels indicate bootstrap support. (B) RBD yeast surface-display enables fluorescent detection of RBD expression and ACE2 binding. (C) Yeast displaying the indicated RBD were incubated with varying concentrations of human ACE2, and binding was measured via flow cytometry. Binding constants are reported as K D,app from the illustrated titration curve fits. (D) Comparison of yeast display binding with previous measurements of the capacity of viral particles to enter ACE2-expressing cells. Relative binding is ∆log10(K D,app ) measured in the current study; relative cellular entry is infection of ACE2-expressing cells by VSV pseudotyped with spike containing the indicated RBD, reported by Letko et al. (Letko et al., 2020) in arbitrary luciferase units relative to SARS-CoV-1 RBD; n.d. indicates not determined. Inset, relative quantitation of protein yield from SEC. Open bar reflects the relative quantity of the earlier eluting peak, which corresponds to oxidized dimer ( Figure S4G ). (G) Thermal stability of RBD variants. See Figure S4H for raw melting curves. (H) Effects of mutations on transduction of ACE2-expressing cells by lentiviral particles pseudotyped with SARS-CoV-2 spike. Mutants are colored by their effects on ACE2 binding as measured in the deep mutational scan ( Figure 3B ). Titers that fell below the limit of detection (dashed horizontal line) are plotted on the x-axis. Measurements were made in biological triplicate, and reflect the integrated effects of mutations on pseudovirus production and cellular entry; transduction efficiency normalized by pseudovirus production is presented in Figure S4J and gives highly similar results. See also Figure S4 . Mutational constraint mapped to the SARS-CoV-2 RBD structure. A sphere at each site C ɑ is colored according to the mean effect of mutations with respect to expression (A) or binding (B), with red indicating more constraint. 
RBD structural features and the ACE2 K31 and K353 interaction hotspot residues are labeled. Yellow sticks indicate disulfide bridges. Interactive structure-based visualizations of these data are at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/structures/ (C) Relationship between mutational constraint on binding and expression. The structural view (cyan spheres) shows sites that are under strong constraint for ACE2 binding but are tolerant of mutations for expression. (D) Heatmap as in Figure 3B , subsetted on sites that directly contact ACE2 in the SARS-CoV-2 or SARS-CoV-1 RBD structures, plus interface site 494 which is a key site of adaptation in SARS-CoV-1. (E) RBD sites 493, 498, and 501, which have many affinity-enhancing mutations, participate in polar contact networks involving the ACE2 interaction hotspot residues K31 and K353. (F) Variation at ACE2 contact sites in sarbecovirus RBDs. Circles show the effects of individual mutations that differentiate a virus ACE2 interface from SARS-CoV-2, while x shows the mean effect of all mutations at that site. The sum of individual mutation effects at interface residues is shown, compared to the actual RBD binding relative to unmutated SARS-CoV-2. See also Figure S5 . The distribution of mutation effects is shown for all amino-acid mutations accessible via single-nucleotide mutation from the SARS-CoV-2 Wuhan-Hu-1 gene sequence, compared to the distributions for subsets of mutations that are observed in sequenced SARS-CoV-2 isolates deposited in GISAID at increasing observation count thresholds. n, number of mutations in each subset. (B) Summary of most frequent mutations among GISAID sequences, reporting our deep mutational scanning measured effect on binding and expression, the number of GISAID sequences containing the mutation, and the number of geographic regions from which a mutation has been reported. 
(C, D) Validation of the mutational effects on binding (C) and expression (D) for 4 of the 5 most frequent circulating RBD variants. S477N rose to high frequency after we began our validation experiments, and so was not included. Error bars in (D) are standard error from 11 samples. See also Figure S7 . Figure 2A and 2B, respectively. For (A), the P4 "RBD+" gate was used to enrich the library for expressing variants, which were grown up and re-induced for binding experiments as in (B). (C) Empirical estimates of variance in FACS-seq measurements. Barcodes encoding wildtype SARS-CoV-2 RBD were grouped by total cell count across sort bins, and the variance in estimates of expression mean fluorescence (left) or binding mean bin (right, corresponding to a single point in the subsequent titration curve fit) were determined. Black dashed lines indicate the median cell count for which each phenotype was measured among library genotypes. (D) Example variant-specific titration curves inferred from the deep mutational scanning experiment. Randomly sampled titration curves are illustrated across the range of fit K D,app binding constants, with variant genotype listed above each panel. Because curves that were fit with K D,app between 10 -4 to 10 -6 were virtually indistinguishable non-responsive curves, we truncated all K D,app measurements in this range to a censored >10 Analysis of binding to dimeric human ACE2, incorporating avidity effects, was also analyzed for the RBDs that did not bind monomeric ACE2 (D-F, right). (G) Reducing (top) and non-reducing (bottom) SDS-PAGE gels of expression-enhancing mutant RBDs illustrate that the early SEC peak ( Figure 4F ) is an oxidized dimer species.

(H) Raw thermal melting traces for determination of non-equilibrium thermal stability, summarized in Figure 4G. Mutations to polar residues at positions Y449, L455, F486, and Y505 would enhance expression but reduce binding, consistent with specific geometric constraints imposed by the close packing of these residues at the ACE2 surface. (C) Relationship between barcode expression and titration response plateau parameters. The correlation between mutation effects on binding and expression in Figure 5C could emerge from trivial correlation between phenotypes (e.g. yeast with higher RBD surface expression can bind more ACE2). However, our multiple-concentration titration approach should in principle remove this trivial correlation (Adams et al., 2016), because each binding phenotype is determined from a self-referenced titration curve, for which the free plateau response parameter can vary to account for different levels of saturated binding due to RBD expression (see Figure S2D). Consistent with this premise, the response parameter from the titration fit with K D,app < 10^-7 (as lower-affinity titration curves do not adequately sample the titration plateau) for each library variant correlates with its expression phenotype. (D) Relationship between mutational constraint on binding and residue solvent accessibility (RSA). Black dots indicate RSA in the full ACE2-bound RBD structure, and when sites have changes in RSA in the unbound structure, then their RSA in that structure is also shown in orange. (E) Mutation effects on binding (left) and expression (right) at disulfide cysteine residues. Details as in Figure 3. RBD sites are grouped by disulfide pair and labeled according to location in the core-RBD or RBM sub-domains. (F) Mutation effects on expression at N-linked glycosylation sites (NLGS). RBD sites are grouped by NLGS motif (NxS/T, where x is any amino acid except proline). Boxed amino acids indicate those that encode a NLGS motif. 
NLGS motifs are labeled according to whether they are present in both the SARS-CoV-2 and SARS-CoV-1 RBD (N331 and N343 glycans), or in SARS-CoV-1 only (N370 glycan). Introduction of the N370 glycan in SARS-CoV-2 is mildly deleterious for stability. (G) Effects of putative N-linked glycosylation site (NLGS) knock-in mutations. Heatmap details as in Figure 3 . There are 10 surface-exposed asparagines for which RBD expression is unaffected or enhanced (top) when an NLGS motif is introduced via mutations to S or T at the i+2 site; for eight of these putative NLGS knockins (blue labels), the putative glycan is also tolerated for ACE2 binding (bottom), but for two (red labels), introduction of the NLGS motif is not tolerated for ACE2 binding. (H) Mapping of these ten asparagines to the RBD structure illustrates that these two binding-constrained asparagines (red) cluster to the ACE2 interface. Representations as described in Figure 6A . (C,D) Mutational constraint and observed antibody escape mutations. Baum et al. (Baum et al., 2020) selected SARS-CoV-2 escape mutations from RBD-directed antibodies. We compare the average mutational tolerance of the sites at which these escape mutations accrue (C), and the effects of the specific escape mutations themselves (D) to all RBM and ACE2-contact sites/mutations. The antibody escape involved mutations that were better tolerated than typical mutations in the RBM or ACE2-binding interface. (E) Evolutionary diversity in antibody epitopes and our newly described E465-centered surface patch among the sarbecoviruses in Figure 1A . Diversity is summarized as the effective number of amino acids (N eff ), which scales from 1 for a site that is invariant, to 20 for a site in which all amino acids are at equal frequency. 
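The legend gives only the scaling of N eff (1 for an invariant site, 20 for uniform amino-acid usage) and not its formula. One definition with exactly that scaling is the exponential of the Shannon entropy of the amino-acid frequencies at a site. The sketch below assumes that definition; the function name and the input format (one string of residues per alignment column) are illustrative.

```python
import math
from collections import Counter


def effective_num_amino_acids(column):
    """Exponentiated Shannon entropy of the residues observed at one alignment column.

    Assumed definition: N_eff = exp(-sum_a p_a * ln p_a), where p_a is the
    frequency of amino acid a among the aligned sequences at this site.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)
```

Under this definition a column such as "AAAA" gives 1, and a column containing each of the 20 amino acids once gives 20, matching the two limits stated in the legend.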
For each threshold of GISAID observation counts, 1 million random sub-samples of single-nucleotide-accessible amino acid changes were generated at the same sample size as the true mutation set (n=98, 42, and 13 for the ≥1, ≥2, and ≥6 thresholds). A P-value was determined as the fraction of sub-samples with median mutational effect on binding or expression equal to or greater than that of the actual GISAID mutation set (dashed vertical line). The observation that the set of mutations observed in GISAID have a more favorable median mutational effect on binding and expression than randomly sampled mutations indicates the action of purifying selection for ACE2 binding and RBD stability. (C) Heatmaps depicting effects of mutations on ACE2 binding, indicating only those mutations that are accessible via single-nucleotide mutation from the SARS-CoV-2 Wuhan-Hu-1 isolate gene sequence. Amino-acid mutations that require more than one nucleotide change are in gray. (D) Permutation tests for positive selection for enhanced ACE2 affinity. Random sub-samples were generated as in (B), and the maximum affinity-enhancing effect of mutations in each sub-sample was compared to that in the actual GISAID mutation set. A P-value was determined as the fraction of sub-samples with a maximum effect on binding equal to or greater than in the actual GISAID mutation set (vertical dashed line). We do not see evidence for selection for enhanced ACE2 binding, as randomly sampled mutations generally contain mutations with stronger affinity-enhancing effects than observed in the GISAID mutation set.
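The permutation tests in this legend can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes sub-samples are drawn without replacement from the single-nucleotide-accessible set, and the function and argument names are hypothetical. Passing the median as the statistic corresponds to the purifying-selection test, and passing the maximum corresponds to the test for enhanced ACE2 affinity.

```python
import random
import statistics


def permutation_p_value(observed, accessible, n_draws=10_000,
                        stat=statistics.median, seed=0):
    """Fraction of random sub-samples whose statistic is >= that of the observed set.

    observed   : effects of the mutations actually seen in circulating sequences
    accessible : effects of all single-nucleotide-accessible mutations
    stat       : summary statistic, e.g. statistics.median or max
    """
    rng = random.Random(seed)
    observed_stat = stat(observed)
    hits = 0
    for _ in range(n_draws):
        # Draw a sub-sample of the same size as the observed set (without replacement).
        sample = rng.sample(accessible, len(observed))
        if stat(sample) >= observed_stat:
            hits += 1
    return hits / n_draws
```

A small P-value with the median statistic means that the observed mutations are more favorable than random draws. A large P-value with `stat=max` means that random draws routinely contain a mutation at least as affinity-enhancing as the best observed one.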

Further information and requests for reagents and resources should be directed to and will be fulfilled by the Lead Contact, Jesse Bloom (jbloom@fredhutch.org).

SARS-CoV-2 mutant libraries generated in this study will be made available on request by the Lead Contact with a completed Materials Transfer Agreement.

We provide all data and code in the following ways:

• Raw data tables of our replicate functional scores at the level of single mutations (Table S2 and GitHub: https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/single_mut_effects/single_mut_effects.csv)
• Raw data tables of our replicate functional scores among sarbecovirus homologs (Table S1 and GitHub: https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/single_mut_effects/homolog_effects.csv)
• Interactive heatmaps for lookup of individual mutational effects and related information (Data S1 and GitHub: https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/)
• Illumina sequencing counts for each barcode among FACS bins (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/counts/variant_counts.csv)
• The complete variant:barcode lookup table (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/variants/codon_variant_table.csv)
• The complete computational workflow to generate and analyze these data, including reproducible code within a programmatically constructed computational environment (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS)
• A Markdown summary of the organization of analysis steps, with links to key data files and Markdown summaries of each step in the analysis pipeline (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/summary.md), with specific Markdown summaries linked in the relevant Methods sections below
• All raw sequencing data are uploaded to the NCBI Short Read Archive (BioProject PRJNA639956).

Saccharomyces cerevisiae strain AWY101 (Wentz and Shusta, 2007) was cultured at 30°C (except where indicated) in baffled flasks while shaking at 275 rpm. Selective media contained 6.7 g/L Yeast Nitrogen Base, 5.0 g/L Casamino acids, 1.065 g/L MES, and 2% w/v carbon source (dextrose for routine maintenance, galactose supplemented with 0.1% dextrose for RBD induction). HEK-293T cells (ATCC CRL-3216) were cultured in D10 growth media (DMEM with 10% heat-inactivated FBS, 2 mM l-glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin) at 37°C in a humidified 5% CO2 incubator. Expi293F (Thermo Fisher Cat No. A14527) suspension cells were grown at 37°C in a humidified 8% CO2 incubator rotating at 130 rpm. Cell lines were not authenticated.

The Spike receptor binding domain (RBD) from SARS-CoV-2 (isolate Wuhan-Hu-1, Genbank accession number MN908947, residues N331-T531) and additional sarbecovirus homologs (RaTG13, Genbank MN996532; GD-Pangolin consensus from ; SARS-CoV-1 Urbani, Genbank AY278741; WIV1, Genbank KF367457 (identical RBD sequence to WIV16); LYRa11, Genbank KF569996; Rp3, Genbank DQ071615; HKU3-1, Genbank DQ022305; Rf1, Genbank DQ412042; ZXC21, Genbank MG772934; ZC45, Genbank MG772933; and BM48-31, Genbank NC014470) were ordered as yeast codon-optimized gBlocks (IDT) and cloned into the pETcon yeast surface-display expression vector. The destination vector was modified downstream from the yeast surface-display fusion construct to include a barcode landing pad for subsequent library generation, along with Illumina sequencing priming handles for downstream barcode sequencing and NotI digestion sites for downstream PacBio sequencing preparation. This plasmid sequence is provided on GitHub at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/tree/master/data/plasmid_maps/2649_pETcon-SARS-CoV-2-RBD-201aa.gb.

RBD variant plasmids were transformed into the AWY101 Saccharomyces cerevisiae strain (Wentz and Shusta, 2007), selecting for the plasmid Trp auxotrophic marker on SD-CAA selective plates (6.7 g/L Yeast Nitrogen Base, 5.0 g/L Casamino acids, 1.065 g/L MES acid, and 2% w/v dextrose). Single colonies were inoculated into 1.5 mL liquid SD-CAA media, and grown overnight at 30°C. Then 1 OD unit of yeast were back-diluted into 1.5 mL SG-CAA+0.1%D induction media (2% w/v galactose supplemented with 0.1% dextrose), and incubated for 16-18 hours at room temperature.

Induced cells were spun down at 250,000 cells per sample and washed in PBS-BSA (0.2 mg/mL). Samples were resuspended in primary labeling solutions across a range of concentrations of biotinylated human ACE2 ectodomain (ACROBiosystems AC2-H82E6), which contains its natural dimerization domain. Primary labeling reactions were conducted in sufficient reaction volumes for each concentration to avoid ligand depletion effects of greater than 10%. For instance, the lowest sample concentration of 10^-13 M was scaled to 25 mL, at which volume 2.9% of total ligand molecules are estimated to be titrated in RBD:ACE2 complexes given the wildtype KD,app and an estimated 50,000 surface RBDs per cell (Boder and Wittrup, 1997). Following overnight equilibration of ACE2 binding at room temperature, cells were washed in ice-cold PBS-BSA, and resuspended in PBS-BSA containing 1:200 diluted FITC-conjugated anti c-Myc antibody (Immunology Consultants Lab, CMYC-45F) to label for RBD surface expression via a C-terminal c-Myc epitope tag, and 1:200 diluted PE-conjugated streptavidin (Thermo Fisher S866) to detect bound biotinylated ACE2 ligand. Following 1 hour of secondary labeling at 4°C, cells were washed twice in ice-cold PBS-BSA, and resuspended in PBS. RBD surface expression and ACE2-binding levels were determined via flow cytometry using a BD LSRFortessa X-50. For flow cytometry, 10,000 cells were analyzed at each ACE2 concentration across a titration series. Cells were gated to select for singleton events, FITC labeling was used to subset RBD+ cells, and PE labeling was measured within this FITC+ population. 
To mimic the subsequent library sorting experiments in which we are blinded to exact PE fluorescence within a given PE fluorescence bin (since we only sequence barcodes within a bin), we analyzed isogenic titration data by drawing equivalent bins of PE fluorescence that capture 95% of unbound unmutated SARS-CoV-2 cells (bin1), 95% of saturated SARS-CoV-2 cells (bin4), and a bin2/bin3 boundary evenly spaced on the log-scale between the boundaries of the bin1 and bin4 partitions (see Figure 2B ). For each ACE2 concentration, we determine the mean bin of PE fluorescence as a simple weighted mean value across integer-weighted bins:

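$$\text{bin}_{[\text{ACE2}]} = \frac{\sum_{i=1}^{4} i \, n_{i,[\text{ACE2}]}}{\sum_{i=1}^{4} n_{i,[\text{ACE2}]}}$$

where $n_{i,[\text{ACE2}]}$ is the number of cells that fall into bin i at a given ACE2 concentration, and i is the simple integer value of a bin from 1 to 4.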
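The weighted mean bin described above amounts to a one-line calculation. The sketch below is illustrative only; the function name and the list-of-counts input are assumptions.

```python
def mean_bin(counts):
    """Weighted mean bin for the cell counts in bins 1..len(counts) at one ACE2 concentration."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no cells were counted at this concentration")
    # Each bin contributes its integer index (1-based) weighted by its cell count.
    return sum(i * n for i, n in enumerate(counts, start=1)) / total
```

For example, a sample in which every cell falls in bin 1 has a mean bin of 1, and a sample spread evenly over the four bins has a mean bin of 2.5.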

We determined the binding constant KD,app describing the affinity of each RBD variant for human ACE2 ligand along with free parameters a (titration response range) and b (titration curve baseline) via nonlinear least squares regression using a standard non-cooperative Hill equation relating the mean bin response variable to the ACE2 labeling concentration:

$$\text{bin}_{[\text{ACE2}]} = \frac{a \, [\text{ACE2}]}{[\text{ACE2}] + K_{D,\text{app}}} + b$$
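A dependency-free sketch of such a fit is given below. It is not the regression routine used in the study: instead of a general nonlinear least-squares solver, it scans KD,app over a log-spaced grid and, for each candidate value, solves for a and b in closed form, which is possible because the model is linear in a and b once KD,app is fixed. The grid range and step are illustrative assumptions.

```python
def fit_titration(concs, mean_bins, log10_kd_grid=None):
    """Fit mean_bin = a * c / (c + Kd) + b; returns (Kd, a, b).

    Kd is chosen from a grid; a and b come from ordinary linear least squares
    on the transformed predictor x = c / (c + Kd).
    """
    if log10_kd_grid is None:
        # Illustrative scan: 10^-15 M to 10^-5 M in steps of 0.01 log10 units.
        log10_kd_grid = [-15 + 0.01 * k for k in range(1001)]
    n = len(concs)
    y_mean = sum(mean_bins) / n
    best = None
    for log10_kd in log10_kd_grid:
        kd = 10 ** log10_kd
        x = [c / (c + kd) for c in concs]  # fraction bound at each concentration
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0:
            continue  # no variation in the predictor, slope undefined
        a = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, mean_bins)) / sxx
        b = y_mean - a * x_mean
        sse = sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, mean_bins))
        if best is None or sse < best[0]:
            best = (sse, kd, a, b)
    if best is None:
        raise ValueError("could not fit: concentrations give no usable signal")
    _, kd, a, b = best
    return kd, a, b
```

The resolution of the fitted KD,app is limited by the grid spacing, so a gradient-based nonlinear solver would be preferable for production use.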

We report apparent KD values (KD,app) that do not take into account the stoichiometry of the multivalent yeast-displayed RBD interaction with dimeric ACE2. Following this "apparent" nomenclature, we report ACE2 concentrations as molarity of the monomeric subunit. Computational notebooks detailing the fits of all isogenic RBD titrations are provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/isogenic_titrations/homolog_validations.md and https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/isogenic_titrations/point-mut-validations.md).

Mutagenesis of the SARS-CoV-2 RBD was performed in two independent replicates via the method described in (Bloom, 2014), with the modifications that primer lengths were adjusted to ensure equal melting temperatures as described in (Dingens et al., 2017) and that we used NNS rather than NNN primers. Our general library generation and sequencing workflow is outlined in Figure S1A. Briefly, we designed mutagenic primers containing degenerate NNS codons that tile across the SARS-CoV-2 RBD, which were ordered as oPools from Integrated DNA Technologies. The script used to design the mutagenic primers and the resulting primers are available at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/tree/master/data/primers/mutational_lib. We conducted three rounds of mutagenesis, each consisting of 7 mutagenic PCR cycles and 20 joining PCR cycles. The final joined products were amplified for 10 cycles with primers that append a unique identifier N16 barcode sequence to the 3' end of each mutagenized insert, downstream from the RBD stop codon and mRNA 3' UTR. Barcodes were also PCR appended to the un-mutagenized RBD homologs via the same primer addition PCR. Primers used in library assembly are provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/tree/master/data/primers).
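The text does not state why NNS was preferred over NNN. A commonly cited property of the NNS scheme (N = A/C/G/T, S = C/G) is that it still encodes all 20 amino acids while including fewer stop codons. The sketch below enumerates both schemes against the standard genetic code to illustrate this; it is not the primer-design script linked above, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand_degenerate(first, second, third):
    """All codons with the allowed bases at each of the three positions."""
    return ["".join(c) for c in product(first, second, third)]


def summarize(codons):
    """Return (number of codons, number of amino acids encoded, number of stop codons)."""
    encoded = [CODON_TABLE[c] for c in codons]
    return len(codons), len(set(encoded) - {"*"}), encoded.count("*")


nns = expand_degenerate("ACGT", "ACGT", "CG")
nnn = expand_degenerate("ACGT", "ACGT", "ACGT")
print("NNS:", summarize(nns))  # (32, 20, 1)
print("NNN:", summarize(nnn))  # (64, 20, 3)
```

NNS therefore halves the codon space and leaves TAG as the only stop codon, which reduces the fraction of library members carrying premature stops.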

Mutagenized SARS-CoV-2 libraries and pooled wildtype homolog RBDs were cloned into EcoRI-HF/SacI-HF digested pETcon 2649 vector (sequence linked above) using NEBuilder HiFi DNA Assembly (NEB E2621). Assembled products were Ampure purified and electroporated into electrocompetent NEB10-beta cells. Electroporated cells were plated on 15cm LB+ampicillin plates at an estimated bottleneck of 100,000 (SARS-CoV-2 mutant libraries) or 1,000 (pooled RBD homologs) colony forming units to limit library size. After approximately 18 hours of outgrowth, colonies were scraped into liquid LB+ampicillin, and grown for 2.5 hours in liquid culture prior to plasmid purification.

Plasmid pools were transformed into the AWY101 strain of Saccharomyces cerevisiae via the protocol of Gietz and Schiestl (Gietz and Schiestl, 2007). SARS-CoV-2 mutant libraries were transformed at 50 µg scale and the pooled RBD homolog controls were transformed at 10 µg scale. Colony forming unit counts from plated serial dilutions indicate transformation yield of >1 million cfus. Transformed yeast grew for 14 hours post-transformation in 100 mL selective SD-CAA media, and were subsequently back-diluted into 100 mL fresh SD-CAA at 1 OD600 for an additional 9 hours passage, to enable further resolution of multiple vector transformants (Scanlon et al., 2009). Transformed yeast libraries were flash frozen in 1e8 cfu aliquots and stored at -80°C.

PacBio sequencing was used to acquire long sequence reads spanning the N16 barcode and the RBD gene sequence. PacBio sequencing inserts were prepared from bacterially-purified plasmid pools via NotI-HF restriction digest followed by gel purification and SMRTbell ligation. The use of restriction digest rather than PCR eliminates the possibility of PCR strand exchange scrambling barcodes. Each SARS-CoV-2 RBD mutant library was spiked to 1% frequency with the internal standard pool of RBD homologs. Each replicate library was sequenced in two SMRT Cells on a PacBio Sequel using 20-hour movie collection times. PacBio circular consensus sequences (CCSs) were generated from the raw subreads using the ccs program (https://github.com/PacificBiosciences/ccs, version 4.2.0), setting the parameters to require 99.9% accuracy and a minimum of 3 passes. The resulting CCSs are available on the NCBI Sequence Read Archive at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA639956.

We then processed the CCSs to identify the RBD sequence (SARS-CoV-2 or one of the 11 homologs), call any mutations in the RBD sequence, and determine the associated 16-nucleotide barcode. To do this, we used alignparse (Crawford and Bloom, 2019) , version 0.1.3, which in turn makes use of minimap2 (Li, 2018) , version 2.17. We only retained CCSs that matched the parental RBD sequence with no more than 45 nucleotide mutations (corresponding to up to 15 codon mutations), had a barcode of the expected 16 nucleotide length, and had no more than one mismatch in the flanking regions expected in the sequenced amplicon. A computational notebook providing full details is available on GitHub at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/process_ccs.md.
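The three retention criteria above can be expressed as a simple predicate. The sketch below does not use the alignparse API; the `ParsedCCS` record and its field names are hypothetical stand-ins for whatever per-read summary the parsing step produces, and the flank criterion is interpreted here as a total mismatch count across the flanking regions, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ParsedCCS:
    """Hypothetical per-read summary (field names are illustrative only)."""
    barcode: str
    n_gene_nt_mutations: int   # nucleotide differences from the parental RBD
    n_flank_mismatches: int    # mismatches in the expected flanking regions


def passes_filters(ccs, max_nt_mutations=45, barcode_length=16, max_flank_mismatches=1):
    """Apply the three retention criteria described in the text to one read."""
    return (
        ccs.n_gene_nt_mutations <= max_nt_mutations
        and len(ccs.barcode) == barcode_length
        and ccs.n_flank_mismatches <= max_flank_mismatches
    )
```

A read with a 15-nucleotide barcode, for example, is rejected regardless of how well its RBD sequence matches the parent.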

We next used these processed CCSs to generate a codon-variant lookup table that links each barcode to its associated codon mutations in the RBD sequence. To do this, we first filtered only for CCSs where the PacBio ccs-reported accuracy was at least 99.99% in both the RBD gene sequence and the barcode (the vast majority of CCSs passed this filter). We then determined the empirical accuracy of the CCSs by determining the concordance between the RBD gene sequence called by CCSs with the same barcode using the method implemented at https://jbloomlab.github.io/alignparse/alignparse.consensus.html#alignparse.consensus.empirical_accuracy. For both libraries, the empirical accuracy of the entire region of the CCS covering the RBD sequence was 99.8% if we ignored those with indels ( Figure S1B ). Most barcodes were covered by multiple CCSs (Figure S1B ), and in that case we built a consensus of these CCSs after discarding any barcodes for which the CCSs differed often or at many sites using the method implemented at https://jbloomlab.github.io/alignparse/alignparse.consensus.html#alignparse.consensus.simple_mutconsensus. Finally, we discarded any variants with indels in the RBD. Therefore, more than 99.8% of the final barcode-linked variants should have the correctly determined RBD sequence, since 99.8% is the accuracy for those covered by just one CCS and most variants were called by the consensus of multiple CCSs. For further analysis of the barcoded variants, we then created a codon variant table using dms_variants (https://jbloomlab.github.io/dms_variants/, version 0.6.0). The final barcode-variant lookup table (which associates each barcode with its RBD sequence) is at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/raw/master/results/variants/codon_variant_table.csv. 
Some summary statistics about the final composition of the libraries are in Figure S1 , and the complete code used to generate the barcode-variant lookup table and many additional plots characterizing the composition of the libraries are on GitHub at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/build_variants.md.
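The idea of building a per-barcode consensus and discarding barcodes whose reads disagree can be illustrated with a deliberately simplified rule. The sketch below is not the logic of the alignparse consensus functions linked above: it uses a plain majority call, and the `min_fraction` threshold and the representation of each read as a (barcode, mutation string) pair are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def barcode_consensus(reads, min_fraction=0.75):
    """Map each barcode to its consensus mutation string.

    reads        : iterable of (barcode, mutation_string) pairs, one per CCS
    min_fraction : hypothetical agreement threshold; barcodes whose most common
                   call is supported by a smaller fraction of reads are dropped
    """
    by_barcode = defaultdict(list)
    for barcode, mutations in reads:
        by_barcode[barcode].append(mutations)

    consensus = {}
    for barcode, calls in by_barcode.items():
        top_call, top_count = Counter(calls).most_common(1)[0]
        if top_count / len(calls) >= min_fraction:
            consensus[barcode] = top_call
        # Otherwise the reads for this barcode are too discordant and it is discarded.
    return consensus
```

Barcodes covered by a single CCS pass trivially under this rule, which is why their accuracy is bounded by the single-read empirical accuracy discussed in the text.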

Yeast libraries were thawed and grown overnight at 30°C in 180mL SD-CAA media at an initial OD600 of 0.1. We spiked our SARS-CoV-2 mutant libraries with the barcoded RBD homolog pool at a total fraction of 0.6% yeast density, such that each RBD homolog barcode should be present at a frequency on the same order of magnitude as the typical SARS-CoV-2 variant barcode. To induce RBD surface expression, yeast were back-diluted to 50mL (expression experiments) or 200mL (binding experiments) SG-CAA+0.1%D induction media at 0.67 OD600 and incubated at room temperature for 16-18 hours with mild agitation.

For library expression experiments, 45 OD units yeast were washed twice with PBS-BSA and labeled in 3mL 1:100 diluted anti-Myc-FITC antibody for 1hr at 4°C with gentle mixing. Labeled cells were washed twice in PBS-BSA and resuspended in 5mL PBS for FACS. For library binding experiments, 8 OD units yeast per titration concentration (10^-13 M to 10^-6 M ACE2 at half-log intervals, plus a 0M ACE2 sample) were washed twice with PBS-BSA, and incubated with ACE2 ligand overnight at room temperature with gentle agitation. Labeling volumes were scaled at low ACE2 concentration to limit ligand depletion effects, as with isogenic titrations described above. Following equilibration of ACE2 labeling, cells were kept chilled while being washed once with PBS-BSA, labeled for one hour in 1mL PBS-BSA with 1:100 diluted Myc-FITC and 1:200 Streptavidin-PE, washed two more times with PBS-BSA, and resuspended to 1mL in PBS.

Yeast libraries were sorted into bins of FITC or PE fluorescence using a BD FACS Aria II. Cells were sorted into 5mL FACS tubes containing 1mL of 2xYPAD supplemented with 1% BSA. Tubes were pre-wet with collection media prior to sample collection, to reduce sticking and improve post-sort yield. For expression sorts, cells were gated for singleton events ( Figure S2A ), followed by partitioning into four bins of FITC fluorescence (Figure 2A): bin 1 captures 99% of unstained cells, and bins 2-4 split the remaining library fraction into tertiles. We sorted >50 million cells from each library into these bins. From these same inductions, we also sorted 15 million RBD+ cells from each library (P4 population, Figure S2A ), to enrich RBD-expressing cells within our libraries for our titration sorting experiments.
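The expression gating scheme described above (bin 1 set by unstained cells, bins 2-4 as tertiles of the remainder) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the synthetic log-fluorescence values are hypothetical, and the actual gates were drawn on the sorter rather than computed this way.

```python
import random
import statistics

def expression_bin_boundaries(unstained, library):
    """Return the three boundaries that separate four FITC sort bins.

    Boundary 1 is the 99th percentile of unstained cells (top of bin 1);
    boundaries 2 and 3 are the tertile cut points of the library cells
    that fall above boundary 1 (separating bins 2, 3 and 4).
    """
    b1 = statistics.quantiles(unstained, n=100)[98]  # 99th percentile
    positive = [x for x in library if x > b1]
    t1, t2 = statistics.quantiles(positive, n=3)     # tertile cut points
    return b1, t1, t2

# Hypothetical synthetic data: log-fluorescence of unstained and library cells.
rng = random.Random(0)
unstained = [rng.gauss(2.0, 0.3) for _ in range(10000)]
library = ([rng.gauss(2.0, 0.3) for _ in range(3000)]
           + [rng.gauss(4.5, 0.6) for _ in range(7000)])
bounds = expression_bin_boundaries(unstained, library)
```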

For ACE2-binding titrations, we gated cells for singleton events and RBD+ expression ( Figure S2B ). For each ACE2 concentration sample, we sorted cells into four bins of PE fluorescence as described above: bin1 captures 95% of unmutated SARS-CoV-2 cells incubated with 0M ACE2, bin4 captures 95% of unmutated cells at saturating ACE2 ligand, and the bin2/bin3 boundary evenly splits the log-MFI scale between the bin1 and bin4 boundaries ( Figure 2B ). We sorted each ACE2 concentration sample into these four bins for approximately 15 minutes, capturing 5-6 million cells per ACE2 concentration.

Following each sort, cells from each collection tube were spun for 5 min at 3,000 g in a tabletop centrifuge, yielding a visible pellet for any sample with at least ~500,000 collected cells. Collection supernatant was removed, and cells were resuspended in SD-CAA media supplemented with 1:100 penicillin-streptomycin. Cells were resuspended to an estimated 2e6 cells/mL in 15mL culture tubes or baffled flasks for expression post-sort samples, 5e5 cells/mL in baffled flasks for RBD+ sort samples, and 1mL (<1e6 cells) or 1.5mL (>1e6 cells) in 96-deep-well plates for titration samples. For expression FACS experiments, total cell recovery from all samples was measured via serial dilution and plating on YPD and SD-CAA plates for each sample, which showed average cellular recovery of 85% (range 79-94%), with 62% (range 52-77%) of cells retaining plasmid, with the exception of the FITC-negative bin 1 populations, which showed 20% plasmid retention. These per-sample cell recovery counts were used to calibrate downstream sequencing numbers for the actual number of cells that grew out from each sort bin. For titration sorts, we did not titer all 64 post-sort samples, but instead spot checked 6 samples to ensure normal levels of cell recovery, which showed an average 66% cell recovery and 46% plasmid retention. As we did not titer all samples, we use the FACS log cell count as the estimate of the number of cells collected in each bin. This assumes that there are no systematic differences in post-sort cell yield across bins, an assumption that is more appropriate for these titration sorts because the ACE2 binding gates are nested within an overall RBD+ selection gate that selects for even plasmid retention ( Figure S2B ).

Post-sort samples were grown overnight in liquid media at 30°C. Plasmids were purified from post-sort yeast samples of <4e7 cfu using Zymo Yeast Miniprep kits (single column or 96-well plate formats) according to kit instructions, but with the addition of >2 hours Zymolyase treatment and a -80°C freeze/thaw cycle prior to cell lysis.

Post-sort plasmid samples were PCR amplified from 10uL plasmid template input using primers flanking the N16 barcode that append remaining Illumina sequencing handles that are not already plasmid encoded, and unique NextFlex sample indices (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/tree/master/data/primers). PCRs were conducted with KOD polymerase for 20 cycles, except for titration sort samples of less than 10,000 cells, where 28 cycles were necessary to obtain sufficient PCR product due to low sample input:

1. 95°C, 2 min
2. 95°C, 20 s
3. 58°C, 10 s
4. 70°C, 10 s
5. Return to step 2, 19x (27x for low-input samples)

PCR products were Ampure purified, quantified via PicoGreen, and pooled to mirror desired sample frequencies given cell counts in each FACS sample. Pooled samples were gel purified, Ampure purified, and submitted for 2 lanes of 50bp single end Illumina HiSeq sequencing per library. Demultiplexed reads were aligned to library barcodes determined from PacBio sequencing, yielding a count of the number of times each library barcode was sequenced within each FACS partition. Read counts for each FACS sample were downweighted by the ratio of total reads from a bin compared to the number of cells that were actually sorted into that bin. For one bin in which the number of HiSeq reads was less than the number of cells sorted into a bin, we re-amplified PCR product from a newly purified plasmid aliquot, and obtained reads via a single lane of MiSeq 50bp single end sequencing. Computational notebooks providing additional details on our Illumina sequencing processing and statistics are provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/count_variants.md and https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/analyze_counts.md).
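The read-count downweighting described above amounts to dividing each barcode's reads by the reads-per-cell ratio of its bin, so that counts are expressed in units of sorted cells. The sketch below is a minimal illustration, not the pipeline's code; the function name, barcodes, and numbers are hypothetical.

```python
def scale_reads_to_cells(barcode_reads, cells_sorted):
    """Convert per-barcode read counts in one FACS bin to estimated cell counts.

    Each count is divided by (total reads in the bin / cells sorted into the
    bin), so that the scaled counts sum to the number of sorted cells.
    """
    total_reads = sum(barcode_reads.values())
    ratio = total_reads / cells_sorted  # reads sequenced per sorted cell
    return {barcode: reads / ratio for barcode, reads in barcode_reads.items()}

# Hypothetical bin: 1,000 reads from 250 sorted cells (4 reads per cell).
reads = {
    "ACGTACGTACGTACGT": 600,
    "TTGCATTGCATTGCAT": 300,
    "GGATCCGGATCCGGAT": 100,
}
cell_counts = scale_reads_to_cells(reads, cells_sorted=250)
```

Scaling in this way keeps a deeply sequenced bin from contributing more apparent cells than were physically sorted into it.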

For each library variant, we estimated mean expression based on its distribution of cell counts across FITC sort bins and the known censored fluorescence boundaries of each sort bin using a maximum likelihood approach (Peterman and Levine, 2016) , enacted in the fitdistrplus R package (Delignette-Muller and Dutang, 2015) , assuming the uncensored log-transformed fluorescence values for a genotype follow a normal distribution. Expression measurements were retained for barcodes for which at least 20 cells were sampled across the four sort bins, resulting in measured expression phenotypes for 92.9 and 90.5% of variants in libraries 1 and 2, respectively.
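The interval-censored maximum likelihood idea can be illustrated as follows. The analysis above was carried out with the fitdistrplus R package; the sketch below swaps in SciPy purely for illustration, and the function name, bin boundaries, and cell counts are hypothetical. Each cell is known only to fall between the log-fluorescence boundaries of its sort bin, so the likelihood of a normal distribution is built from the probability mass in each bin.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_censored_normal(counts, edges):
    """Fit mean and sd of a normal distribution to binned (censored) counts.

    counts: cells observed in each sort bin.
    edges: bin boundaries on the log-fluorescence scale, with -inf and +inf
           as the outermost edges (len(edges) == len(counts) + 1).
    """
    counts = np.asarray(counts, dtype=float)
    lo = np.asarray(edges[:-1], dtype=float)
    hi = np.asarray(edges[1:], dtype=float)

    def neg_loglik(params):
        mu, log_sd = params
        sd = np.exp(log_sd)  # optimize log(sd) to keep sd positive
        p = norm.cdf(hi, mu, sd) - norm.cdf(lo, mu, sd)
        return -np.sum(counts * np.log(np.clip(p, 1e-300, None)))

    # Crude starting mean: count-weighted average of approximate bin centers.
    inner = np.asarray(edges[1:-1], dtype=float)
    mids = np.concatenate(
        [[inner[0] - 0.5], (inner[:-1] + inner[1:]) / 2, [inner[-1] + 0.5]]
    )
    mu0 = np.average(mids, weights=counts)
    res = minimize(neg_loglik, x0=[mu0, 0.0], method="Nelder-Mead")
    return float(res.x[0]), float(np.exp(res.x[1]))

# Hypothetical barcode: cells counted in bins 1-4 and the bin boundaries.
edges = [-np.inf, 8.0, 9.0, 10.0, np.inf]
mean_logF, sd_logF = fit_censored_normal([5, 40, 120, 35], edges)
```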

Expression measurements were represented as the difference in log-mean fluorescence intensity (MFI) relative to wildtype (∆logMFI = logMFIvariant - logMFIwildtype), such that a positive value indicates higher RBD expression. A very small fraction of wildtype and synonymous barcodes were ascribed non-fluorescing phenotypes, likely reflecting expression-abolishing mutations that occurred outside of the PacBio sequencing window. These variants were selected out prior to titration measurements by our RBD+ pre-sort, but remain in the expression measurements. To avoid artificially depressing the wildtype SARS-CoV-2 expression measurement and therefore miscalibrating this ∆log(MFI) scale, potentially annotating slightly deleterious mutational effects as beneficial, we computed the mean wildtype expression excluding these outliers (logMFI < 10.2 or 10.1 in lib1 and lib2, respectively). We note that we are unable to do the same for any library mutants for which we observe non-fluorescence, because we are unable to a priori determine whether a lack of expression is due to the library mutation versus external, unobserved factors. This uncertainty makes our calling of expression-enhancing mutations conservative, as mutational effects, if biased by these outliers, will tend to be pulled slightly down in their measurement. The global epistasis approach we explain below can mitigate the influence of these outlier observations on our final estimates of mutational effects. A computational notebook presenting our calculation of expression phenotypes and results is included on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/compute_expression_meanF.md).

For each library barcode at each ACE2 sample concentration, we determined its simple mean bin of ACE2-binding via the equation used above in isogenic titrations. We fit titration curves as above to determine barcode-specific KD,app from the series of FACS-seq derived mean bin measurements across ACE2 concentration ( Figure S2D ). Because a barcode's mean bin might be measured with varying certainty across different bins, we used weighted least squares nonlinear regression, weighting each mean bin estimate by an empirical variability estimate based on the per-sample cell count, derived from estimates of variability in repeated wildtype/synonymous barcode measurements grouped by sampling depth ( Figure S2C , right panels). To avoid fits of errant titration curves, we constrained the baseline parameter b to be fit between 1 and 1.5, and the response parameter a to be fit between 1.5 and 3. Through initial curve fit constraints and subsequent QC filtering, our fit KD,app binding constants were constrained to be within the concentration range of our titration (10^-13 to 10^-6 M), and therefore many barcodes are censored at the upper limit with true KD,app ≥ 10^-6 M. We filtered out titration curves fit for variants with an average cell count <5 across sample concentrations, or with cell count <2 in 7 or more of the 16 ACE2 concentration samples. Finally, we filtered out the 5% of curves with the highest normalized mean square residual, where residuals are normalized from 0 to 1 by the fit response parameter a, such that titration curves that plateau at lower levels of saturated binding don't have systematically smaller mean square residuals. This process yielded KD,app estimates for 75.2 and 75.4% of variants in libraries 1 and 2, respectively.
Binding measurements were represented as the difference in log10(KD,app) relative to wildtype (∆log10(KD,app) = log10(KD,app)wildtype - log10(KD,app)variant), polarized such that a positive value indicates higher variant ACE2 affinity. A computational notebook presenting our calculation of binding phenotypes and results is included on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/compute_binding_Kd.md).
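The weighted, bounded titration fit can be sketched as below. The fits above were performed in R; this Python version is only an illustration. It assumes a standard one-site saturation curve, mean bin = a·[ACE2]/([ACE2] + KD,app) + b, which is an assumed form because the equation itself is given in an earlier section. The function names, the choice to fit log10(KD,app) directly with bounds of -13 to -6, and the example values are likewise hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def mean_bin(cells_per_bin):
    """Simple mean bin (1 to 4), weighting each bin by its cell count."""
    cells = np.asarray(cells_per_bin, dtype=float)
    bins = np.arange(1, len(cells) + 1)
    return float(np.sum(cells * bins) / np.sum(cells))

def titration_curve(conc, a, b, log10_kd):
    """Assumed one-site saturation curve: response a, baseline b."""
    kd = 10.0 ** log10_kd
    return a * conc / (conc + kd) + b

def fit_kd(conc, mean_bins, sigma):
    """Weighted least squares fit; sigma is the per-sample uncertainty."""
    popt, _ = curve_fit(
        titration_curve, conc, mean_bins,
        p0=[2.0, 1.1, -9.0],
        sigma=sigma,
        # a in [1.5, 3], b in [1, 1.5] as in the text; KD range is assumed
        bounds=([1.5, 1.0, -13.0], [3.0, 1.5, -6.0]),
    )
    return dict(zip(["a", "b", "log10_kd"], popt))

# 16 samples: 0 M plus 10^-13 to 10^-6 M ACE2 at half-log intervals.
conc = np.concatenate([[0.0], 10.0 ** np.arange(-13, -5.75, 0.5)])
example_bins = titration_curve(conc, 2.4, 1.1, -10.0)  # synthetic data
example_fit = fit_kd(conc, example_bins, sigma=np.full(conc.shape, 0.05))
```

In a real analysis, `sigma` would carry the empirical variability estimates by sampling depth, so that poorly sampled concentrations influence the fit less.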

Barcodes in our libraries contain a Poisson-distributed number of mutations ( Figure S1C ). Though most mutations are sampled in at least one barcode as a unique single mutant ( Figure S1E ), most library genotypes contain multiple amino-acid mutations, and some amino-acid mutations are only sampled on many of these multiple-mutant backgrounds. Therefore, we used global epistasis models (Otwinowski et al., 2018) to decompose single mutation effects from the set of all single- and multi-mutant backgrounds ( Figure S2E-K) . Briefly, we fit regression models that represent the phenotype of each library variant as a sum of latent-scale effects of all component amino-acid mutations, which are transformed by a flexible nonlinear curve to the observed experimental scale; the shape of the nonlinear curve and the single-mutant effect terms are fit simultaneously to all of the data. For variance estimates on each library variant, we used the standard error of the estimate on KD,app to estimate a variance for our per-variant binding measurements; for expression, we calculated empirical estimates of variance as a function of cell count, based on binning replicate wildtype/synonymous mutant barcodes present in the library across bins of sampling depth ( Figure S2C , left panels). Our analysis, implemented in the dms_variants package (see https://jbloomlab.github.io/dms_variants/dms_variants.globalepistasis.html), is as described by Otwinowski Figure S2I ), and this correlation was not further improved by the global epistasis decomposition ( Figure S2J ); therefore, we retained all directly measured single-mutant effects, and only used global epistasis decomposition to interpolate the 14% of single mutants in each library that were not directly measured on any single-mutant backgrounds (which together comprise the measurements correlated in Figure 2F ). 
It is important to note that the shape of global epistasis nonlinearity that was fit to the data disallows mutations from increasing affinity relative to wildtype ( Figures S2H,K); this prevents us from ascribing affinity-enhancing effects to any of the mutations that we did not directly measure as single mutants (only 5.7% of mutants were not sampled as single mutants in either library), which we accept as an appropriately conservative approach.
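The structure of a global epistasis model, and why a plateau near wildtype prevents inferring affinity gains, can be shown with a toy forward model. This is a conceptual sketch only and not the dms_variants implementation: the sigmoid stands in for the flexible nonlinear curve, and all names and numerical values are hypothetical.

```python
import math

def latent_phenotype(mutations, latent_effects):
    """Additive latent phenotype: sum of per-mutation effects (wildtype = 0)."""
    return sum(latent_effects[m] for m in mutations)

def observed_phenotype(latent, floor=-4.0, ceiling=0.0, midpoint=-3.0):
    """Monotonic sigmoid mapping the latent scale to the observed scale.

    Wildtype (latent = 0) sits close to the ceiling, so even a large
    positive latent effect changes the observed value only slightly.
    """
    return floor + (ceiling - floor) / (1.0 + math.exp(-(latent - midpoint)))

# Hypothetical latent effects for three mutations.
latent_effects = {"mutA": 2.0, "mutB": -2.0, "mutC": -2.0}

wildtype = observed_phenotype(latent_phenotype([], latent_effects))
single = observed_phenotype(latent_phenotype(["mutB"], latent_effects))
double = observed_phenotype(latent_phenotype(["mutB", "mutC"], latent_effects))
gain = observed_phenotype(latent_phenotype(["mutA"], latent_effects)) - wildtype
```

In this toy example the double mutant falls further below wildtype than twice the single-mutant deficit on the observed scale, even though the latent effects are strictly additive, while the "beneficial" latent effect of `mutA` is almost entirely absorbed by the plateau.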

In the case of our expression measurements, directly sampled single mutants correlated moderately well between replicates (R^2 = 0.88, Figure S2F ), but this correlation was improved between the global epistasis estimates derived from each library (R^2 = 0.93, Figure 2E ). This may be in part because the expression phenotype is a more widely distributed phenotype with smaller relative shifts in the mean caused by mutation, and because of the errant outliers that we could not account for as discussed above with regards to wildtype barcodes, such that measurements of mutational effects are improved when integrating across many different backgrounds instead of taking a single observed barcode at face value. Therefore, for expression phenotypes, we used the global epistasis estimates for all mutations. We filtered out four coefficients from library 1 and three from library 2 that had nonsensically high model estimates, likely due to partial collinearities among some low-coverage mutations. Our final binding and expression single-mutant phenotypes were determined from the average effect across the two independent library replicates. A computational notebook detailing the full derivation of our final single mutant phenotypic scores for binding and expression is on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/single_mut_effects.md#assessing-global-epistasis-models-for-binding-data).

The interactive heatmap of mutational effects shown at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/ (Data S1) was made using the altair Python package (VanderPlas et al., 2018) . For the logo plot representation of the data in Figure S3 , the experimental measurements of ∆log(MFI) and ∆log10(KD,app) were converted to letter heights as follows. For binding, we first computed a Boltzmann-like weighting factor for each amino acid $a$ at site $r$ as $w_{r,a} = \exp(\beta x_{r,a})$, where $x_{r,a}$ is the experimental measurement for the effect of the mutation of site $r$ to amino acid $a$, in other words the ∆log(MFI) or ∆log10(KD,app) value. The parameter $\beta$ is a temperature-like scaling factor which was set to 1.4 for the binding values, and chosen for the expression values so that the range of exponents for expression is the same as for binding. The letter heights were then computed by re-scaling the weighting factors at each site to sum to one, so that the letter height is $p_{r,a} = w_{r,a} / \sum_{a'} w_{r,a'}$. The logo plots themselves were rendered using Logomaker (Tareen and Kinney, 2020). The code that creates these logo plots is on GitHub at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/logoplots_of_muteffects.md.
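The letter-height calculation for one site can be sketched as follows. The function and argument names are hypothetical, `scale` denotes the temperature-like scaling factor described above, and the example effects are made up.

```python
import math

def letter_heights(effects, scale):
    """Convert mutational effects at one site into logo letter heights.

    effects: mapping of amino acid -> measured effect at the site
             (the delta-log(MFI) or delta-log10(KD,app) value).
    scale: temperature-like scaling factor (1.4 for binding in the text).
    """
    weights = {aa: math.exp(scale * x) for aa, x in effects.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}  # heights sum to 1

# Hypothetical site with three amino acids.
heights = letter_heights({"A": 0.0, "D": -2.0, "F": 0.1}, scale=1.4)
```

Because the weights are exponential in the measured effect, strongly deleterious amino acids shrink to near-zero height while well-tolerated ones share most of the column.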

The interactive structure-based visualizations at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/structures were built using dms-view (Hilton et al., 2020) . In these visualizations, the logo plot letter heights were computed as for Figure S3 (see paragraph immediately above). Number of effective amino acids was calculated as the exponentiated preferences. Mean, minimum, and maximum mutational effects per site were calculated from the set of ∆log(MFI) or ∆log10(KD,app) measurements of all missense mutations at a site.

Structural analyses of the ACE2-bound SARS-CoV-2 and SARS-CoV-1 RBDs used the crystal structures from PDB 6M0J (Lan et al., 2020) and 2AJF (Li et al., 2005a) , respectively. ACE2 contacts were annotated as residues with any non-hydrogen atom within 4 Angstroms of any ACE2 residue. Solvent accessible surface area was calculated from the 6M0J structure using dssp (Kabsch and Sander, 1983) , with and without the ACE2 ligand present. Relative solvent accessibilities were determined by normalizing to the maximum theoretical solvent accessibility of a residue (Tien et al., 2013) . Structural images were rendered in PyMol. Full analyses of our mutational measurements in context of structural and evolutionary features are provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/structure_function.md and https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/sarbecovirus_diversity.md).
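The contact definition (any non-hydrogen atom within 4 Angstroms) reduces to a pairwise distance check. The sketch below uses only the standard library and toy coordinates; parsing of the PDB files is omitted, and the function name, residue labels, and coordinates are hypothetical.

```python
import math

def contact_residues(rbd_atoms, ace2_atoms, cutoff=4.0):
    """Return RBD residues with any heavy atom within `cutoff` of ACE2.

    rbd_atoms: mapping of residue label -> list of (x, y, z) coordinates
               of that residue's non-hydrogen atoms, in Angstroms.
    ace2_atoms: list of (x, y, z) coordinates of ACE2 non-hydrogen atoms.
    """
    contacts = []
    for residue, atoms in rbd_atoms.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ace2_atoms):
            contacts.append(residue)
    return contacts

# Toy coordinates (hypothetical): one residue near ACE2, one far away.
rbd = {
    "res1": [(0.0, 0.0, 0.0)],
    "res2": [(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)],
}
ace2 = [(3.0, 0.0, 0.0)]
interface = contact_residues(rbd, ace2)
```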

Antibody epitopes were mapped from crystal structures 6W41 (Yuan et al., 2020b) , 6WAQ (Wrapp et al., 2020b) , 2DD8 (Prabakaran et al., 2006) , 3BGF (Pak et al., 2009) , 2GHW (Hwang et al., 2006) , 7BZ5 , and cryo-EM structures 6NB6 and 6NB7 (Walls et al., 2019) , and 6WPS (Pinto et al., 2020) . RBD residues were annotated as being in an antibody epitope if any non-hydrogen atom was within 4 Angstroms of an antibody residue, with the exception of the backbone-only models of 6NB6 and 6NB7, where epitopes were defined as RBD residues with Cɑ within 8 Angstroms of any antibody residue. Our full analysis of mutational constraint in antibody epitopes is provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/antibody_epitopes.md).

All 31,570 spike sequences on GISAID as of 27 May 2020 were downloaded and aligned via mafft (Katoh and Standley, 2013) . Sequences from non-human origins and sequences containing any gap characters were removed. All amino-acid mutations among GISAID sequences were enumerated. Some low-coverage spike sequences contain undetermined 'X' characters. We excluded any mutation from our curated set of GISAID mutations if it was solely observed on sequence backgrounds containing at least one undetermined X character in the RBD sequence; however, sequences with X characters were allowed to contribute to observations of mutation count for mutations that were observed on at least one other high-coverage RBD sequence. To characterize patterns of selection on amino-acid mutations observed among GISAID sequences, we conducted permutation tests as described in the Figure S7 legend. Our full analysis of mutational effects of circulating variants is provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/circulating_variants.md). We acknowledge all GISAID contributors for their sharing of sequencing data (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/alignments/Spike_GISAID/gisaid_hcov-19_acknowledgement_table.xls).

We used the curated RBD sequence set from Letko et al. (Letko et al., 2020) , adding newly described RBD sequences from sarbecovirus strains RaTG13 (Zhou et al., 2020b) , RmYN02 (Zhou et al., 2020a) , GD-Pangolin and GX-Pangolin , and the additional non-Asian bat sarbecovirus isolate BtKY72 (Tong et al., 2009) . RBD nucleotide sequences were aligned via mafft with a gap opening penalty of 4.5, and the maximum likelihood phylogeny was inferred in RAxML (Stamatakis, 2014) under the GTR model with 4 gamma-distributed discrete categories of among-site rate variation.

We selected seven single mutations from our deep mutational scanning measurements for validation of phenotypic effects in a spike-pseudotyped lentivirus assay (Crawford et al., 2020) . Mutations were selected that exhibited deleterious effects on RBD expression (C432D) or ACE2 binding (L455Y, N501D and G502), no strong phenotypic effect on either binding or expression (N439K), and affinity-enhancing effects (Q498Y and N501F). These point mutations were introduced via site-directed mutagenesis (New England Biolabs E0554S) into the HDM vector containing codon-optimized SARS-CoV-2 Spike from Wuhan-Hu-1, with an upstream Kozak sequence. The full sequence of this plasmid is available at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/plasmid_maps/2736_HDM_IDTSpike_EcoKozak.gb.

Pseudotyped lentiviral particles were generated as previously described (Crawford et al., 2020) . Viruses were rescued in triplicate (i.e. independent transfections), which should average out variation in transfection efficiency such that viral entry phenotypes are reflective of both pseudovirus production and entry efficiency. Briefly, 2.5e5 293T cells per well were seeded in 12-well plates in 1 mL D10 growth media (DMEM with 10% heat-inactivated FBS, 2 mM L-glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin). 24h later, cells were transfected using BioT transfection reagent (Bioland Scientific, Paramount, CA, USA) with 0.5 µg of the ZsGreen lentiviral backbone pHAGE2-CMV-ZsGreen-W (BEI Resources NR-52520), 0.11 µg each of the lentiviral helper plasmids HDM-Hgpm2 (BEI Resources NR-52517), pRC-CMV-Rev1b (BEI Resources NR-52519), and HDM-tat1b (BEI Resources NR-52518), and 0.17 µg wildtype or mutant SARS-CoV-2 Spike plasmids. Media was changed to fresh D10 at 24 h post-transfection. At 60 hours post transfection, the viral supernatant was collected, filtered through a 0.45 µm SFCA low protein-binding filter, and stored at -80°C. To quantify efficiency of pseudovirus production, we quantified p24 levels (in pg/mL) in viral transfection supernatants via ELISA, in technical duplicate, per kit instructions (Advanced Bioscience Laboratories Cat. # 5421).

The resulting viruses were titered as previously described (Crawford et al., 2020) . 293T cells stably expressing ACE2 (BEI NR-52511) were seeded at 1e4 cells per well in poly-L-lysine coated 96-well plates (Greiner 655930). 24 h later, 3 wells were counted and averaged to determine the number of cells per well at time of infection. Media was removed from the 293T-ACE2 cells and replaced with fresh D10 containing 50 µL of pseudovirus supernatant in a final volume of 150 µL. Polybrene (TR-1003-G, Sigma Aldrich, St. Louis, MO, USA) was added to a final concentration of 5 µg/mL. 60 h post-infection, cells were analyzed by flow cytometry. Titers were calculated using the Poisson formula. If P is the percentage of cells that are ZsGreen positive, as determined by drawing a ZsGreen+ gate from uninfected controls, then the titer per mL is: -ln(1 − P/100) × (number of cells/well)/(volume of virus per well in mL). Titers are only accurate when the percentage of ZsGreen+ cells is relatively low, i.e., ~1-10%. Titers are reported relative to the mean of the wildtype, which had similar titers as Crawford et al. of ~10^4 infectious particles per mL (Crawford et al., 2020) (Figure 4H) , and normalized by p24 levels in transfection supernatants ( Figure S4J ). The dashed horizontal line in Figure 4H showing the limit of detection was calculated as the minimum titer that would be determined in the case of a single positive event.
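The Poisson titer formula stated above translates directly into code. The function name and the example well (5% ZsGreen+ cells, 2e4 cells, 50 µL of virus) are hypothetical.

```python
import math

def poisson_titer(percent_positive, cells_per_well, virus_volume_ml):
    """Titer per mL from the percentage of ZsGreen-positive cells.

    Implements -ln(1 - P/100) * (cells per well) / (virus volume in mL).
    """
    fraction_uninfected = 1.0 - percent_positive / 100.0
    return -math.log(fraction_uninfected) * cells_per_well / virus_volume_ml

# Hypothetical well: 5% positive cells, 2e4 cells, 0.05 mL of virus.
titer = poisson_titer(5.0, 2e4, 0.05)
```

The logarithm corrects for cells infected by more than one particle, which is why the estimate is reliable only while the positive fraction stays low.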

Receptor binding domains of SARS-CoV-2 (328-531), WIV1 (316-518), RaTG13 (359-562), ZC45 (324-508), and ZXC21 (323-507) were synthesized by GenScript into vector pcDNA3.1- with a preceding mu-phosphatase signal peptide and a C-terminal octahistidine tag. SARS-CoV-1 (306-575) was subcloned from a GenArt synthesized SARS-CoV-1 Spike ectodomain. Human ACE2-Fc was synthesized and cloned by GenScript with a BM40 signal peptide and C-terminal human Fc tag. The ACE2 construct begins with 19STIEE and ends with DPLVPR615.

The RBD constructs were transfected into 150mL suspension Expi293F (Thermo Fisher Cat No. A14527) cells at 37°C in a humidified 8% CO2 incubator rotating at 130rpm and harvested 3 days later. Clarified supernatants were purified in batch over Talon resin (Takara) prior to buffer exchanging into 20mM Tris pH 8, 150mM NaCl and flash freezing.

Human ACE2-Fc was produced in Expi293F cells grown in suspension using Expi293F expression medium (Life Technologies) at 33°C, 70% humidity, 8% CO2 rotating at 150rpm. The cultures were transfected using PEI-MAX (Polyscience) with cells grown to a density of 3.0 million cells per mL and cultivated for 3 days. Supernatants were clarified by centrifugation (5 minutes at 4000 rcf), addition of PDADMAC solution to a final concentration of 0.0375% (Sigma Aldrich, #409014), and a second spin (5 minutes at 4000 rcf). Clarified cell supernatants were purified using a MabSelect PrismA 2.6x5cm column (GE Healthcare) on an AKTA Avant150 FPLC (GE Healthcare). Bound protein was washed with five column volumes of 20mM NaPO4 150mM NaCl pH 7.2, then five column volumes of 20mM NaPO4 1M NaCl pH 7.4, and eluted with three column volumes of 100mM glycine at pH 3.0. The eluate was neutralized with 2M Trizma base to 50mM final concentration. SDS-PAGE was run to assess purity. The Fc tag was removed by thrombin cleavage in a reaction mixture containing 3mg of recombinant ACE2-Fc and 10µg of thrombin in 20mM Tris-HCl pH 8.0, 150mM NaCl and 2.5mM CaCl2. The reaction mixture was incubated at 25°C overnight and re-loaded in a Protein A column to remove uncleaved protein and the Fc tag. The cleaved protein was further purified by gel filtration using a Superdex 200 column 10/300 GL (GE Life Sciences) equilibrated in a buffer containing 20mM Tris pH 8.0 and 100mM NaCl.

Binding measurements were performed on an Octet Red instrument (Forte Bio) at 30°C with shaking at 1,000 RPM. For monomeric ACE2 affinity measurements, Ni-NTA biosensors were hydrated in water for 10min and placed into 10X Kinetics Buffer (ForteBio). 10-20 µg/mL of RBD was loaded for 300s prior to baseline stabilization in 10X Kinetics Buffer. The sensors were immersed in a 1:3 serial dilution of ACE2 ranging from 1,000 to 4.11nM in 10X Kinetics Buffer. For measurements of RaTG13, ZC45, and ZXC21 binding to dimeric ACE2-Fc, ARG2 biosensors were hydrated in water then activated for 300s with an NHS-EDC solution (ForteBio) prior to amine coupling. 5-10 µg/mL of RBD in 10mM pH6 sodium acetate was loaded onto ARG2 tips (ForteBio) for 600s and then quenched into 1M ethanolamine for 300s. A baseline in 10X Kinetics Buffer was collected for 120s prior to immersing the sensors in a 1:3 serial dilution of dimeric ACE2-his (SinoBiological # 10108-H08H, residues 1-740) ranging from 1,000 to 4.11nM in 10X Kinetics Buffer. Curve fitting was performed using a 1:1 binding model and the ForteBio data analysis software when applicable. Mean kon and koff values were determined with a global fit applied to all data.

Codon-optimized RBDs of SARS-CoV-2 with its unmutated sequence or with single mutations (I358F, Y365F, Y365W, V367F or F392W) were synthesized by IDT as gBlocks with an N-terminal EGT secretion signal (MGILPSPGMPALLSLVSLLSVLLMGCVA) and C-terminal Avi- and octa-histidine tags (GLNDIFEAQKIEWHEHHHHHHHH) and cloned into the CMV/R (VRC 8400) mammalian expression vector. Plasmids were transfected into 200mL suspension Expi293F cells at 37°C in a humidified 8% CO2 incubator rotating at 130 rpm and harvested 3 days later. Clarified supernatants were purified in batch over Talon resin (Takara). After elution at 125mL in 20mM Tris (pH 8.0), 300mM NaCl, and 300mM imidazole, concentrated solutions of L-arginine (pH 8.0), CHAPS and glycerol were added to eluate to final concentrations of 100mM, 0.75%, and 5%, respectively, to prevent adhesion to concentrator membranes. To quantify yield, each sample was concentrated to a final volume of 1500uL, and 1000uL was applied to a Superdex 75 Increase 10/300 GL column (GE) pre-equilibrated with 50mM Tris (pH 8.0), 185mM NaCl, 100mM L-arginine, 0.75% CHAPS and 5% glycerol. Peak integration was quantified using UNICORN software (GE), and relative quantity from the SEC trace was corrected for unique extinction coefficients and molecular weights of each RBD mutant. Purified peaks from monomeric species were dialyzed three times into 25mM Tris (pH 8.0), 150mM NaCl and 5% glycerol at 4°C.

BLI binding assays were performed on an Octet Red instrument at 25°C with shaking at 1,000 RPM in the presence of 25mM Tris pH 8.0, 150mM NaCl and 5% glycerol. Anti-hIgG Capture (AHC) tips were loaded with human ACE2-Fc or CR3022 at 0.02mg/mL for 300s prior to a baseline for 60s, association with monomeric RBDs at 500nM for 600s, and dissociation for 300s.

Non-equilibrium measurements of melting temperatures were determined from thermal denaturation melt curves using an UNcle (UNchained Labs) based on the barycentric mean of intrinsic tryptophan fluorescence, with data collected from 20-95°C using a thermal ramp of 1°C per minute in a background of 25mM Tris pH 8.0, 150mM NaCl and 5% glycerol. Melting temperatures were defined as the maximum point of the first derivative of the melting curve, with first derivatives calculated using GraphPad Prism software after smoothing with four neighboring points using 2nd order polynomial settings.
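The melting temperature definition (maximum of the first derivative of a smoothed melt curve) can be approximated outside of GraphPad Prism with a Savitzky-Golay filter. This is a sketch under assumptions, not the procedure used above: it treats "four neighboring points" as four points on each side (a 9-point window), takes the derivative of the local 2nd order polynomial directly, and uses a synthetic melt curve. The function name and data are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

def melting_temperature(temps, signal, neighbors=4, polyorder=2):
    """Temperature at the maximum first derivative of a smoothed melt curve.

    temps: evenly spaced temperatures (degrees C).
    signal: barycentric mean of tryptophan fluorescence at each temperature.
    """
    window = 2 * neighbors + 1  # points on each side plus the center point
    step = temps[1] - temps[0]
    d1 = savgol_filter(signal, window_length=window, polyorder=polyorder,
                       deriv=1, delta=step)
    return float(temps[np.argmax(d1)])

# Synthetic two-state melt curve with a transition centered at 52 C.
temps = np.arange(20.0, 95.5, 0.5)
signal = 335.0 + 15.0 / (1.0 + np.exp(-(temps - 52.0) / 1.5))
tm = melting_temperature(temps, signal)
```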

Quantitative analyses were performed using custom code, available on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS).

For quantitative analysis of deep mutational scanning expression phenotypes (see Method Details section, "Calculating variant phenotypes for expression"), we determined per-variant expression via maximum likelihood inference using the fitdistrplus R package (Delignette-Muller and Dutang, 2015) .

For quantitative analysis of deep mutational scanning binding phenotypes (see Method Details section, "Calculating variant phenotypes for ACE2-binding affinity"), we determined per-variant titration curves via weighted least squares nonlinear regression in R.

To quantitatively decompose single-mutant effects on expression and binding (see Figure S2 legend and Method Details section, "Decomposing single-mutant effects from multiple-mutant genotypes"), we fit global epistasis regression models (Otwinowski et al. 2018 ) using the dms_variants Python package (https://jbloomlab.github.io/dms_variants/dms_variants.globalepistasis.html).

For quantification of binding via Biolayer Interferometry (see Figures 4D, S4A-F,I), global curve fitting to determine kon and koff was performed using a 1:1 binding model in the ForteBio data analysis software.

To quantify thermal stability from melting curves (see Figures 4G, S4H) , the GraphPad Prism software was used to identify the maximum point of the first derivative of the melting curve.

For the statistical analysis of mutations observed among circulating SARS-CoV-2 isolates described in Figure S7 , we used permutation tests to assess significant trends in effects of observed mutations compared to the distribution of randomly sampled mutation subsets.
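A generic permutation test of this kind can be sketched as follows. The exact test statistic and sampling scheme are specified in the Figure S7 legend, so the choices here (mean effect as the statistic, equal-size random subsets drawn without replacement, a one-sided p-value) are assumptions for illustration, as are the function name and the toy data.

```python
import random
import statistics

def permutation_p_value(all_effects, observed, n_perm=10000, seed=0):
    """One-sided permutation test on the mean effect of observed mutations.

    all_effects: mapping of mutation -> measured effect for every mutation.
    observed: mutations seen among circulating isolates.
    Returns the observed mean and the fraction of random, equal-sized
    mutation subsets whose mean effect is at least as high.
    """
    rng = random.Random(seed)
    mutations = list(all_effects)
    observed_mean = statistics.mean(all_effects[m] for m in observed)
    hits = 0
    for _ in range(n_perm):
        subset = rng.sample(mutations, len(observed))
        if statistics.mean(all_effects[m] for m in subset) >= observed_mean:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed_mean, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): 200 mutations with increasingly deleterious effects.
toy_effects = {f"m{i}": -0.01 * i for i in range(200)}
toy_observed = [f"m{i}" for i in range(10)]  # the ten least deleterious
obs_mean, p_value = permutation_p_value(toy_effects, toy_observed, n_perm=2000)
```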

• Table S1 

[Figure residue: panel footnote "*measurements from Letko et al. 2020"; legend entry "Bat SARS-related CoV (non-Asian)"; titration axis "[ACE2] (M)" spanning 0 and 10^-13 to 10^-6 M at half-log intervals, with sort bins bin1 to bin4; heatmap axis "RBD site" with ticks from 331 to 530.]

Measuring the sequence-affinity landscape of antibodies with massively parallel titration curves

The proximal origin of SARS-CoV-2

Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies

Synthetic recombinant bat SARS-like coronavirus is infectious in cultured cells and in mice

Cryptic transmission of SARS-CoV-2 in Washington State. medRxiv. Posted online 16

An experimentally determined evolutionary model dramatically improves phylogenetic fit

Permissive secondary mutations enable the evolution of influenza oseltamivir resistance

Yeast surface display for screening combinatorial polypeptide libraries

SARS-CoV and emergent coronaviruses: viral determinants of interspecies transmission

Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic

Potent neutralizing antibodies from COVID-19 patients define multiple targets of vulnerability

Complex and Dynamic Interactions between Parvovirus Capsids, Transferrin Receptors, and Antibodies Control Cell Infection and Host Range

Potent neutralizing antibodies against SARS-CoV-2 identified by high-throughput single-cell sequencing of convalescent patients' B cells

Yeast-expressed recombinant protein of the receptor-binding domain in SARS-CoV spike protein with deglycosylated forms as a SARS vaccine candidate

Optimization of the Production Process and Characterization of the Yeast-Expressed SARS-CoV Recombinant Receptor-Binding Domain (RBD219-N1), a SARS Vaccine Candidate

Yeast-Expressed SARS-CoV Recombinant Receptor-Binding Domain (RBD219-N1) Formulated with Alum Induces Protective Immunity and Reduces Immune Enhancement

The SARS-CoV-2 Vaccine Pipeline: an Overview

alignparse: A Python package for parsing complex features from high-throughput long-read sequencing

Protocol and Reagents for Pseudotyping Lentiviral Particles with SARS-CoV-2 Spike Protein for Neutralization Assays

Origin and evolution of pathogenic coronaviruses

Somatic Hypermutation-Induced Changes in the Structure and Dynamics of HIV-1 Broadly Neutralizing Antibodies

Mechanistic approaches to the study of evolution: the functional synthesis

fitdistrplus: An R Package for Fitting Distributions

Evidence for ACE2-utilizing coronaviruses (CoVs) related to severe acute respiratory syndrome CoV in bats

Comprehensive Mapping of HIV-1 Escape from a Broadly Neutralizing Antibody

An Antigenic Atlas of HIV-1 Escape from Broadly Neutralizing Antibodies Distinguishes Functional and Structural Epitopes

Emergence of genomic diversity and recurrent mutations in SARS-CoV-2

Glycan Masking Focuses Immune Responses to the HIV-1 CD4-Binding Site and Enhances Elicitation of VRC01-Class Precursor Antibodies

Guiding the immune response against influenza virus hemagglutinin toward the conserved stalk domain by hyperglycosylation of the globular head domain

Data, disease and diplomacy: GISAID's innovative contribution to global health

Identification of global suppressors for temperature-sensitive folding mutations of the P22 tailspike protein

Coast-to-Coast Spread of SARS-CoV-2 during the Early Epidemic in the United States

Deep mutational scanning: a new style of protein science

Exceptional diversity and selection pressure on SARS-CoV and SARS-CoV-2 host receptor in bats compared to other mammals

Extraepitopic compensatory substitutions partially restore fitness to simian immunodeficiency virus variants that escape from an immunodominant cytotoxic-T-lymphocyte response

Molecular determinants of severe acute respiratory syndrome coronavirus pathogenesis and virulence in young and aged mouse models of human disease

High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method

Stability-mediated epistasis constrains the evolution of an influenza protein

Production of complex human glycoproteins in yeast

Evolutionary biochemistry: revealing the historical and physical causes of protein properties

Hemagglutinin receptor binding avidity drives influenza A virus antigenic drift

Parallel, tag-directed assembly of locally derived short sequence reads

Modeling site-specific amino-acid preferences deepens phylogenetic estimates of viral sequence divergence

dms-view: Interactive visualization tool for deep mutational scanning data

SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor

Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus

Neutralization of SARS-CoV-2 by Destruction of the Prefusion Spike

Structural basis of neutralization by a human anti-severe acute respiratory syndrome spike protein antibody, 80R

HIV-1 broadly neutralizing antibody precursor B cells revealed by germline-targeting immunogen

Human neutralizing antibodies elicited by SARS-CoV-2 infection

Protein folding stability can determine the efficiency of escape from endoplasmic reticulum quality control

Secretion efficiency in Saccharomyces cerevisiae of bovine pancreatic trypsin inhibitor mutants lacking disulfide bonds is correlated with thermodynamic stability

Structure-based design of native-like HIV-1 envelope trimers to silence non-neutralizing epitopes and eliminate CD4 binding

Transmission routes of respiratory viruses among humans

Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins

Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor

Coronavirus hemagglutinin-esterase and spike proteins co-evolve for functional balance and optimal virion avidity

Origin and cross-species transmission of bat coronaviruses in China

Deep mutational scanning of hemagglutinin helps predict evolutionary fates of human H3N2 influenza variants

Mapping person-to-person variation in viral mutations that escape polyclonal serum targeting influenza hemagglutinin

Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses

Structural Analysis of Major Species Barriers between Humans and Palm Civets for Severe Acute Respiratory Syndrome Coronavirus Infections

Minimap2: pairwise alignment for nucleotide sequences

Structure of SARS coronavirus spike receptor-binding domain complexed with receptor

Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus

Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2

Animal models in virus research: their utility and limitations

HIV-1 fitness cost associated with escape from the VRC01 class of CD4 binding site neutralizing antibodies

Evidence of significant natural selection in the evolution of SARS-CoV-2 in bats, not humans. bioRxiv. Posted online

Multiplex assessment of protein variant abundance by massively parallel sequencing

A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence

SARS-like WIV1-CoV poised for human emergence

Phase 1/2 Study to Describe the Safety and Immunogenicity of a COVID-19 RNA Vaccine Candidate (BNT162b1) in Adults 18 to 55 Years of Age: Interim Report. medRxiv. Posted online

Inferring the shape of global epistasis

Role of framework mutations and antibody flexibility in the evolution of broadly neutralizing antibodies

Structural insights into immune recognition of the severe acute respiratory syndrome coronavirus S protein receptor binding domain

Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations

Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody

Amino acid coevolution induces an evolutionary Stokes shift

Alteration of T4 lysozyme structure by second-site reversion of deleterious mutations

Sequence space and the ongoing expansion of the protein universe

Structure of severe acute respiratory syndrome coronavirus receptor-binding domain complexed with neutralizing antibody

The receptor binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in SARS-CoV-2 patients

Phylogenetic evidence for deleterious mutation load in RNA viruses and its contribution to viral evolution

Identification of two critical amino acid residues of the severe acute respiratory syndrome coronavirus spike protein for its variation in zoonotic tropism transition via a double substitution strategy

The SARS-CoV-2 receptor-binding domain elicits a potent neutralizing response without antibody-dependent enhancement. bioRxiv

Antibody signature induced by SARS-CoV-2 spike protein immunogens in rabbits

Difference in Receptor Usage between Severe Acute Respiratory Syndrome (SARS) Coronavirus and SARS-Like Coronavirus of Bat Origin

Escape from human monoclonal antibody neutralization affects in vitro and in vivo fitness of severe acute respiratory syndrome coronavirus

Isolation of potent SARS-CoV-2 neutralizing antibodies and protection from disease in a small animal model

Improving pandemic influenza risk assessment

Quantifying and resolving multiple vector transformants in S. cerevisiae plasmid libraries

Preconfiguration of the antigen-binding site during affinity maturation of a broadly neutralizing influenza virus antibody

Stability effects of increasing the hydrophobicity of solvent-exposed side chains in staphylococcal nuclease

Analysis of a SARS-CoV-2-Infected Individual Reveals Development of Potent Neutralizing Antibodies with Limited Somatic Mutation

Contingency and entrenchment in protein evolution under purifying selection

Structural basis of receptor recognition by SARS-CoV-2

Mechanisms of zoonotic severe acute respiratory syndrome coronavirus host range expansion in human airway epithelium

Pathways of cross-species transmission of synthetically reconstructed zoonotic severe acute respiratory syndrome coronavirus

A human neutralizing antibody targets the receptor binding site of SARS-CoV-2

Yeast polypeptide fusion surface display levels predict thermal stability and soluble secretion efficiency

Mapping the antigenic and genetic evolution of influenza virus

Mutational effects and the evolution of new protein functions

RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies

Epistasis in protein evolution

Pervasive contingency and entrenchment in a billion years of Hsp90 evolution

Rapid Generation of Neutralizing Antibody Responses in COVID-19 Patients

Maximum allowed solvent accessibilites of residues in proteins

How protein stability and new functions trade off

Detection of novel SARS-like and other coronaviruses in bats from Kenya

Delay of HIV-1 rebound after cessation of antiretroviral therapy through passive transfer of human neutralizing antibodies

Altair: Interactive Statistical Visualizations for Python

Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion

Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein

Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus

Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs

Protect, modify, deprotect (PMD): A strategy for creating vaccines to elicit antibodies targeting a specific epitope

Multiplexed assays of variant effects contribute to a growing genotype-phenotype atlas

A novel high-throughput screen reveals yeast genes that increase secretion of heterologous proteins

Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features

Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation

Structural Basis for Potent Neutralization of Betacoronaviruses by Single-Domain Camelid Antibodies

SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects

Mechanisms of host receptor adaptation by severe acute respiratory syndrome coronavirus

Diversity of Functionally Permissive Sequences in the Receptor-Binding Site of Influenza Hemagglutinin

Different genetic barriers for resistance to HA stem antibodies in influenza H3 and H1 viruses

Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1

A noncompeting pair of human neutralizing antibodies block COVID-19 virus binding to its receptor ACE2

Linking influenza virus evolution within and between human hosts

Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2

A vaccine targeting the RBD of the S protein of SARS-CoV-2 induces protective immunity

Structural basis of a shared antibody response to SARS-CoV-2

A highly conserved cryptic epitope in the receptor-binding domains of SARS-CoV-2 and SARS-CoV

Immunization with the receptor-binding domain of SARS-CoV-2 elicits antibodies cross-neutralizing SARS-CoV-2 and SARS-CoV without antibody-dependent enhancement

Mining of epitopes on spike protein of SARS-CoV-2 from COVID-19 patients

A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein

A pneumonia outbreak associated with a new coronavirus of probable bat origin

Structure-Based Design with Tag-Based Purification and In-Process Biotinylation Enable Streamlined Development of SARS-CoV-2 Spike Molecular Probes

Potently neutralizing and protective human antibodies against SARS-CoV-2

• Measured effects on folding and ACE2 binding of all mutations to the SARS-CoV-2 RBD
• Provide open data and interactive visualization for vaccine design and surveillance
• Identify constrained surfaces as ideal targets for vaccines and antibody therapeutics
• Mutations that enhance ACE2 affinity exist but are not selected in pandemic isolates

In brief

systematically change every amino acid in the receptor binding domain (RBD) of the SARS-CoV-2 Spike protein and determine the effects of the substitutions on Spike expression, folding, and ACE2 binding. The work identifies structurally constrained regions of the Spike RBD that would be ideal targets for COVID-19 countermeasures and demonstrates that mutations in the virus which enhance ACE2 affinity can be engineered but have not

We thank Keara Malone for experimental assistance, Katherine Xue for helpful suggestions, and Frederick Matsen for intellectual support and hospitality. We thank the Flow Cytometry and Genomics core facilities at the Fred Hutchinson Cancer Research Center for experimental support, and Mike Murphy, Deleah Pettie, and the Mammalian Production Core at the Institute for Protein Design for assistance with protein purification. This work was supported by the NIAID / NIH (R01AI141707 and R01AI12893 to J.D.B., HHSN272201700059C to D.V.,