key: cord-0296267-5lld270m
authors: Hurdiss, Daniel L.; Drulyte, Ieva; Lang, Yifei; Shamorkina, Tatiana M.; Pronker, Matti F.; van Kuppeveld, Frank J.M.; Snijder, Joost; de Groot, Raoul J.
title: Cryo-EM structure of coronavirus-HKU1 haemagglutinin esterase reveals architectural changes arising from prolonged circulation in humans
date: 2020-03-26
journal: bioRxiv
DOI: 10.1101/2020.03.25.998963
sha: 66f675bdd0d9c4a0a782abd0fbdd9116461ee053
doc_id: 296267
cord_uid: 5lld270m

The human betacoronaviruses HKU1 and OC43 (subgenus Embecovirus) arose from separate zoonotic introductions, OC43 relatively recently and HKU1 apparently much longer ago. Embecovirus particles are studded with two types of surface projections called S (for spike) and HE (for haemagglutinin-esterase), with S mediating receptor-binding and membrane fusion and HE acting as a receptor-destroying enzyme. Together, they promote dynamic virion attachment to glycan-based receptors with 9-O-acetylated sialic acid as main constituent. We recently showed that adaptation of HKU1 and OC43 to replication in the human respiratory tract involved loss-of-function mutations in the lectin domain of HE. Here we present the cryo-EM structure of the ∼80 kDa, heavily glycosylated HKU1 HE at a global resolution of 3.4 Å. Comparison to existing HE structures reveals a drastically truncated lectin domain, incompatible with sialic acid binding, but with the structure and function of the HE esterase domain left intact. Our cryo-EM structure, in combination with mass spectrometry analysis, also describes the extent of glycosylation on the now redundant lectin domain, which forms a putative glycan shield. The findings further our insight into the evolution and host adaptation of human embecoviruses and also demonstrate the utility of cryo-EM for studying small, heavily glycosylated proteins which are intractable to X-ray crystallography.

densely clustered sialoglycans on mucins [15] [16] [17].
However, OC43 and HKU1, subject to

Figure 1: Structure determination of HKU1 HE by single-particle analysis. A) Representative motion-corrected electron micrograph of HKU1 HE embedded in vitreous ice. B) Representative reference-free 2D class averages. C) Orthogonal views of the HKU1 HE EM density (coloured by subunit). D) Cartoon representation of the atomic model of the dimeric HKU1 HE complex. E) EM density (blue mesh) zoned 2 Å around an α-helix comprising residues 112-132. F) EM density (blue mesh) zoned 2 Å around a β-sheet comprising residues 181-188, 219-224 and 243-248.

HKU1 HE contains eight predicted N-linked glycosylation sites which are strictly conserved between all HKU1 field strains studied so far. To better characterise the occupancy and composition of each of these N-linked glycosylation sites, we performed in-depth glycoproteomics profiling of the same recombinant HEK293T cell derived material used for semi-quantitative analysis (Table S2). As expected from HEK293T cell derived materials [26], glycosylation was predominantly of complex type and very heterogeneous, ranging from 8 to 59 unique glycoforms identified for each site (Figure 3A). For sites N145, N168 and N193, situated on the LD loops (Figure 3B), we also detected substantial signals for the unmodified asparagines, without glycosylation. Based on the combined signal intensities of all glycoforms, we found that the occupancy of those sites is approximately 81% for N145, <2% for N168 and 44% for N193. The low occupancy of N168 agrees with the lack of density observed in the cryo-EM map (Figure 3C). Furthermore, the high B-factors of the LD loops suggest flexibility in this region (Figure S3E), which explains the limited density for N110, despite its 100% occupancy. With the exception of N168, the first core N-acetylglucosamine (GlcNAc) was modelled for each of the LD loop glycans (Figure 3C).
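The occupancy estimate described above, in which the combined signal of all glycoforms at a site is compared with the signal of the unmodified asparagine, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' actual pipeline: the function name `site_occupancy`, the `"unmodified"` key and the peak areas in the example are hypothetical, and the calculation assumes that extracted peak areas of the different forms are directly comparable, which is why the resulting value is only semi-quantitative.

```python
from typing import Dict


def site_occupancy(peak_areas: Dict[str, float],
                   unmodified_key: str = "unmodified") -> float:
    """Return the fraction of total signal carried by glycosylated forms.

    peak_areas maps each form observed for one site (glycoform composition
    or the unmodified peptide) to its extracted peak area.
    The key name "unmodified" is an assumption made for this sketch.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("no signal recorded for this site")
    # Everything that is not the unmodified peptide counts as glycosylated.
    glycosylated = total - peak_areas.get(unmodified_key, 0.0)
    return glycosylated / total


# Hypothetical peak areas (arbitrary units) chosen only for illustration.
example_site = {
    "unmodified": 19.0,
    "HexNAc(4)Hex(5)Fuc(1)": 50.0,
    "HexNAc(5)Hex(6)": 31.0,
}
occupancy = site_occupancy(example_site)
print(f"occupancy = {occupancy:.0%}")
```

A site for which no unmodified peptide is detected returns an occupancy of 1.0 under this definition, corresponding to the fully occupied sites reported in the text.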
The remaining four sites are fully occupied based on our MS data, in accordance with strong densities observed in the cryo-EM map. Indeed, we were able to model the entire Man3GlcNAc2 core for N314 (Figure 3C). Apart from differences in glycan occupancy, we also observed marked differences in glycan composition. Whereas the overall pattern is dominated by complex glycosylation, sites N83 and N328 show predominant hybrid and high-mannose glycosylation, respectively. Sites N110 and N145, which contain mostly complex glycans, are also heavily (core) fucosylated and contain higher numbers of sialic acids. Although glycosylation varies substantially from site to site and is very heterogeneous, we did identify a set of glycan compositions that are highly abundant and shared at the majority of sites, as listed in Supplementary Table S3.

Bottom three panels show semi-quantitative analyses from extracted peak areas of site-specific N-glycosylation by glycan type (non-glycosylated, high-mannose, hybrid, or complex), fucosylation, and sialylation. A full overview is presented in Supplementary Table S2. B) Surface representation of the dimeric HKU1 HE atomic model, with modelled N-glycans shown as spheres and coloured according to the predominant glycan type shown in panel (A). C) EM density (blue mesh) zoned 2 Å around each of the modelled N-glycans and the analogous region for N168. The occupancy and glycan length distribution from glycoproteomics analysis for each site are shown below.

and N193, two of which localise to the remnants of the elongated β5-β6 loop and β7-β10 loop.

In an attempt to understand the evolutionary benefit of LD loop deletions and increased N-linked glycosylation, we first looked at the sequence conservation of BCoV HE. The BCoV LD exhibits modest sequence variation which localises primarily to the prominent β5-β6 loop,

cryo-EM map, the glycans on the LD are not visible beyond the first core GlcNAc.
However, mass spectrometry analysis confirms that these primarily contain complex glycans, comprising between 9 and 16 saccharide units. To understand where these less-ordered regions are situated, a difference map of the N-linked glycans was generated and a Gaussian filter was applied in order to highlight low-resolution features. Interestingly, density for the disordered portion of these glycans forms a crown of glycan density, which encircles the LD and covers much of its surface (Figure 5C). Of note, density belonging to the N110 glycan of HKU1 overlaps with the former sialic acid binding site. Side-by-side comparison shows that the LD of HKU1 is ~8 Å shorter than that of BCoV, with none of the CBS loop remnants protruding above the glycan crown (Figure 5D-F). While our mass spectrometry data reveal that the N168 site is only 1.6% occupied, we do observe difference density which extends tangentially from this site. This putative N168 density extends over the entrance to the esterase domain active site

With respect to the glycans as a shield against humoral immunity, the HKU1 HE glycan crown not only covers much of the LD, but also appears to extend outwards over the esterase active site entrance. The entrance to the HE esterase domain active site is subject to sequence

humoral immunity. In the context of a crowded viral envelope, with densely packed S and HE proteins, the membrane-distal glycan crown of HKU1 HE might help shield the regions below without any potential loss of function resulting from direct glycosylation of the ED active site entrance.

Interestingly, OC43 has independently acquired two N-linked glycans in equivalent positions to HKU1. Thus, our findings for HKU1 offer a glimpse into a possible future for OC43 HE.
During circulation of OC43 in the last 70-120 years, the HE LD underwent structural changes which were deleterious to 9-O-Ac-Sia binding, while remaining largely intact. Over a longer timespan, however, OC43 HE may well follow the fate of its HKU1 homologue through similar deletions of the surface-exposed loops in the LD. Conversely, the evolution of OC43 allows informed speculation on the early steps in HKU1's adaptation to humans.

From a general evolutionary perspective, our data offer insights into how viruses deal with a

Further sub-classification attempts did not lead to improvements in map quality or resolution. Per-particle defocus estimation improved the resolution to 3.73 Å. Relion's Bayesian polishing procedure was then performed on these particles, with all movie frames included, which

performed using Relion. An overview of the data processing pipeline is shown in Figure S2.

MS2 spectra were acquired at a resolution of 30 000 with an AGC target of 5 × 10^5, maximum

Comparative analysis estimates the relative frequencies of co-divergence and cross-species transmission within viral families
A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence
Identification of a novel coronavirus in patients with severe acute respiratory syndrome
A novel coronavirus associated with severe acute respiratory syndrome
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Genetic Recombination, and Pathogenesis of
Betacoronavirus Adaptation to Humans Involved Progressive Loss of Hemagglutinin-Esterase Lectin Activity
Human Coronavirus OC43 Associated with Fatal Encephalitis
Clinical features and molecular epidemiology of coronavirus-HKU1-associated community-acquired pneumonia
Discovery of a novel coronavirus, China Rattus coronavirus HKU24, from Norway rats supports the murine origin of Betacoronavirus 1 and has implications for the ancestor of Betacoronavirus lineage A
Origin and evolution of pathogenic coronaviruses
Complete genomic sequence of human coronavirus OC43: molecular clock analysis suggests a relatively recent zoonotic coronavirus transmission event
Human coronaviruses OC43 and HKU1 bind to 9-O-acetylated sialic acids via a conserved receptor-binding site in spike protein domain A
Structure of coronavirus hemagglutinin-esterase offers insight into corona and influenza virus evolution
The murine coronavirus hemagglutinin-esterase receptor-binding site: a major shift in ligand specificity through modest changes in architecture
Coronavirus receptor switch explained from the stereochemistry of protein-carbohydrate interactions and a single mutation
Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia
The resolution revolution
Structural basis for human coronavirus attachment to sialic acid receptors
Pre-fusion structure of a human coronavirus spike protein
Cryo-EM structure of haemoglobin at 3.2 Å determined with the Volta phase plate
High-resolution structure determination of sub-100 kDa complexes using conventional cryo-EM
Single particle cryo-EM reconstruction of 52 kDa streptavidin at 3 Angstrom resolution
Comparative analysis of 22 coronavirus HKU1 genomes reveals a novel genotype and evidence of natural recombination in coronavirus HKU1
An Atlas of Human Glycosylation Pathways Enables Display of the 46
Privateer: software for the conformational validation of carbohydrate structures
Carbohydrate anomalies in the PDB
The EMBL-EBI search and sequence analysis tools APIs in 2019
UCSF ChimeraX: Meeting modern challenges in visualization and analysis
Unambiguous phosphosite localization using electron-transfer/higher-energy collision dissociation (EThcD)
Byonic: advanced peptide and protein identification software
Skyline: an open source document editor for creating and analyzing targeted proteomics experiments