key: cord-0263030-1ogqrn2q
authors: Huang, Qiu Yu; Song, Kangkang; Xu, Chen; Bolon, Daniel N.A.; Wang, Jennifer P.; Finberg, Robert W.; Schiffer, Celia A.; Somasundaran, Mohan
title: Quantitative Structural Analysis of Influenza Virus by Cryo-electron Tomography and Convolutional Neural Networks
date: 2021-12-09
journal: bioRxiv
DOI: 10.1101/2021.12.09.472010
sha: 6914d6a01251b713135be39793d678897350160f
doc_id: 263030
cord_uid: 1ogqrn2q

Influenza viruses pose severe public health threats; they cause millions of infections and tens of thousands of deaths annually in the US. Influenza viruses are extensively pleomorphic in shape and size as well as in the organization of their viral structural proteins. Analysis of influenza morphology and ultrastructure can help elucidate viral structure-function relationships and aid in therapeutic and vaccine development. While cryo-electron tomography (cryoET) can depict the 3D organization of pleomorphic influenza, the low signal-to-noise ratio inherent to cryoET and extensive viral heterogeneity have precluded detailed characterization of influenza viruses. In this report, we developed a cryoET processing pipeline leveraging convolutional neural networks (CNNs) to characterize the morphological architecture of the A/Puerto Rico/8/34 (H1N1) influenza strain. Our pipeline improved the throughput of cryoET analysis and accurately identified viral components within tomograms. Using this approach, we characterized influenza viral morphology and glycoprotein density, and conducted subtomogram averaging of HA glycoproteins. This processing pipeline can aid in the structural characterization not only of influenza viruses but also of other pleomorphic viruses and infected cells.

… sialic acid on host cells (Gamblin and Skehel, 2010).
HA mediates cellular entry by binding sialic acids on cell-surface proteins and lipids to promote membrane fusion, while NA cleaves sialic acid from the cell surface to release newly budded viruses and allow them to spread to new cells (Gamblin and Skehel, 2010). HA is a homotrimer; each subunit contains a globular head domain, which carries the receptor-binding site, and a stem that contains the fusion peptide required for membrane fusion during cellular entry (Corti and Lanzavecchia, 2013). NA is a homotetramer anchored to the viral membrane by a thin stalk; each subunit contains an enzymatic active site that cleaves sialic acid for viral release (Gamblin and Skehel, 2010).

… (Fig 2, Fig S1). After the individual networks were trained, each tomogram was segmented, and the resulting segmentations were merged to create a multi-layer mask representing each structural component of the influenza particles (Fig 3). To differentiate between virions with an M1 layer beneath the lipid bilayer (Fig 3a) and virions without the M1 protein assembly (Fig 3b), two networks were trained to recognize a thicker and a thinner density layer, respectively. Despite …

… the coordinates of glycoproteins identified by the CNN were extracted and modelled as point clouds (Fig 2). Each point cloud represents the 3D morphology of a single virion; morphological analyses of influenza viruses were therefore based on the surface model generated from each point cloud (Fig 2). This method offers an alternative to manually measuring viral axes in tomograms. Moreover, modelling each virion from its glycoprotein coordinates uses the 3D reconstruction rather than individual tomographic slices, which reduces the bias in size measurements that arises from the differing orientations of ice-embedded virions within slices. The morphological profile of PR8 virions was then characterized (Fig 4).
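The point-cloud modelling described above, together with the outlier removal and K-Means clustering described in the Methods, can be illustrated with a minimal Python sketch. It assumes glycoprotein coordinates for one tomogram are already in nm as an (N, 3) NumPy array and that the number of virions in that tomogram is known in advance. The neighbour count `k` and `std_ratio` are hypothetical defaults, and measuring axes by principal component analysis is a swapped-in technique: the pipeline described here exported STL surfaces and measured axes in UCSF Chimera.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import KMeans


def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours exceeds the
    cloud-wide mean of that quantity by more than std_ratio standard deviations.
    k and std_ratio are hypothetical defaults, not values from the paper."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)  # column 0 is each point itself
    mean_d = dists[:, 1:].mean(axis=1)
    cutoff = mean_d.mean() + std_ratio * mean_d.std()
    return points[mean_d <= cutoff]


def split_into_virions(points, n_virions, seed=0):
    """Partition one tomogram's glycoprotein cloud into n_virions clouds with K-Means.
    Assumes n_virions is known; how it was chosen is not stated in the text."""
    labels = KMeans(n_clusters=n_virions, n_init=10, random_state=seed).fit_predict(points)
    return [points[labels == i] for i in range(n_virions)]


def long_short_axes(points):
    """Estimate long/short axis lengths as the extent of the cloud along its
    principal axes (PCA, a stand-in for surface-based measurement in Chimera)."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows = principal directions
    extents = np.ptp(centered @ vt.T, axis=0)  # max - min along each principal direction
    return extents.max(), extents.min()
```

The axial ratio of a virion is then simply `long / short` from `long_short_axes`. PCA is convenient because it needs no meshing, but for strongly curved or partial virions a surface-based measurement may be more faithful.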
In terms of size, influenza viruses were extensively pleomorphic. However, the vast majority of virions identified were spherical or oval; the median axial ratio of this sample was 1.12 (Fig 4a). The long axes of PR8 virions ranged from 63 nm to 359 nm, with a mean of approximately 130 nm (Fig 4a). The short axes were more narrowly distributed, ranging from 56 to 211 nm with a mean of 105 nm (Fig 4a). … provides further confidence in this CNN-based pipeline. Only twelve of the 311 particles had a long axis more than twice the length of the short axis. The most elongated filamentous virion observed had a long axis 5.9 times the length of its short axis (Table S1).

Additionally, the internal structural components of influenza particles were quantified (Fig 4b). Approximately a quarter of the virions (n=84) did not have an intact M1 protein layer beneath the lipid bilayer (Fig 4b). … virions (n=18) lacked vRNP complexes (Fig 4b). Our results further illustrate the role of M1 in binding vRNP complexes during viral budding: 16 of the 18 virions that lacked vRNPs also lacked M1, underscoring the importance of M1 binding for the incorporation of vRNPs into nascent virions. Furthermore, morphology quantification suggested that influenza particle size is modulated by M1. The size distribution of virions with the M1 protein is narrower than that of virions lacking M1 (Fig 4e). While the 25th percentile and median long-axis lengths of virions with and without the M1 protein are comparable (Table S1), the distribution of axis lengths for virions lacking M1 is right-skewed. The 75th percentile long axis is 160 nm for M1-lacking virions, compared with only 138 nm for virions with M1. This pattern was more prominent for the short axis.
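The summary statistics reported above (median axial ratio, the number of particles whose long axis exceeds twice the short axis, and axis quartiles split by M1 status) reduce to a few NumPy calls. This sketch assumes per-virion long and short axes (in nm) and a boolean M1 flag are already available as arrays; the function names are hypothetical, and the values in the check below are toy numbers, not data from this study.

```python
import numpy as np


def axis_summary(long_axes, short_axes, elongated_ratio=2.0):
    """Axial-ratio and axis-length summary for a set of virions (axes in nm)."""
    long_axes = np.asarray(long_axes, dtype=float)
    short_axes = np.asarray(short_axes, dtype=float)
    ratios = long_axes / short_axes  # axial ratio = long axis / short axis
    return {
        "median_axial_ratio": float(np.median(ratios)),
        # particles whose long axis is more than twice the short axis
        "n_long_gt_twice_short": int(np.count_nonzero(ratios > elongated_ratio)),
        "max_axial_ratio": float(ratios.max()),
        "long_quartiles": np.percentile(long_axes, [25, 50, 75]),
        "short_quartiles": np.percentile(short_axes, [25, 50, 75]),
    }


def quartiles_by_m1(axes, has_m1):
    """25th/50th/75th percentiles of one axis for virions with and without M1."""
    axes = np.asarray(axes, dtype=float)
    has_m1 = np.asarray(has_m1, dtype=bool)
    return (np.percentile(axes[has_m1], [25, 50, 75]),
            np.percentile(axes[~has_m1], [25, 50, 75]))
```

Comparing the 75th percentiles from `quartiles_by_m1` is one simple way to expose the right skew described above, which a comparison of medians alone would hide.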
PR8 short-axis length shows an overall increase when the particle lacks an M1 protein layer; the median short axis of M1-absent particles is 10 nm greater than that of M1-containing virions (Fig 4d). Moreover, no virion with M1 present had a short axis above 160 nm, whereas the short axes of viruses lacking M1 extended up to 210 nm. These data further support the view that M1 helps mediate viral shape and morphology, and underscore its role in regulating not only filament formation but also the shape of non-filamentous influenza viruses.

Histogram of axial ratios of IAV particles (n=311 …

… 2012), the median inter-glycoprotein distance in this sample was measured as 9.6 ± 3 nm (Fig 5). The 25th percentile inter-glycoprotein distance was 8.6 nm and the 75th percentile was 11.1 nm, suggesting that glycoprotein organization is tightly regulated on influenza virion surfaces. Outliers could indicate empty patches on influenza surfaces bereft of glycoproteins, or partial virions captured within tomograms (Fig S2).

The subtomogram average of a glycoprotein array confirmed the inter-glycoprotein spacing calculations. Subtomograms containing an array of glycoproteins were extracted for alignment and averaging (Fig 6). A cylindrical mask applied to the extracted subtomograms after alignment resulted in a clear three-by-two array of glycoproteins perpendicular to the membrane density (Fig 6a). The lack of density between the glycoprotein stem and the membrane is most likely due to the flexible linker region. The median inter-glycoprotein spacing for the glycoprotein array was 10.5 nm, with an interquartile range of 10 to 10.7 nm (Fig 6b).
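The inter-glycoprotein spacing measurement (each glycoprotein to its three closest neighbours) can be sketched with a k-d tree from SciPy. This is a minimal sketch under stated assumptions: coordinates are processed one virion at a time so that neighbours are never taken across virions, the three distances per glycoprotein are pooled into a single distribution, and the `binning` factor used to convert voxels to nm with the 2.0873 Å pixel size is a hypothetical parameter, since the binning of the tomograms used for picking is not given here.

```python
import numpy as np
from scipy.spatial import cKDTree

PIXEL_SIZE_NM = 0.20873  # calibrated pixel size from the Methods (2.0873 Å)


def voxels_to_nm(coords, binning=1):
    """Convert tomogram voxel coordinates to nm; `binning` is a hypothetical parameter."""
    return np.asarray(coords, dtype=float) * PIXEL_SIZE_NM * binning


def neighbour_spacings(coords_nm, k=3):
    """Distances (nm) from each glycoprotein of ONE virion to its k closest neighbours, pooled."""
    coords_nm = np.asarray(coords_nm, dtype=float)
    dists, _ = cKDTree(coords_nm).query(coords_nm, k=k + 1)
    return dists[:, 1:].ravel()  # drop column 0, the zero self-distance


def spacing_quartiles(spacings):
    """25th percentile, median and 75th percentile of the pooled spacings."""
    q25, median, q75 = np.percentile(spacings, [25, 50, 75])
    return q25, median, q75
```

For a whole dataset, one would call `neighbour_spacings` on each virion's point cloud, concatenate the results with `np.concatenate`, and pass them to `spacing_quartiles`. Note that these are straight-line (chord) distances in 3D, which for spacings of about 10 nm on virions of about 100 nm differ only slightly from distances measured along the curved membrane.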
As glycoprotein coordinates were extracted from hundreds of virions, these data demonstrate that glycoprotein spacing is tightly regulated and well preserved on influenza virion surfaces.

… correlates with the number of glycoproteins at that value.

Influenza glycoprotein spacing does not vary based on virion morphology

To investigate whether influenza particle morphology affects glycoprotein density, single-virion glycoprotein spacing was calculated for spherical (1.0 < axial ratio < 1.2), oval (1.2 < axial ratio < 1.4), and elongated (axial ratio > 1.4) virions (Fig 7), using glycoprotein coordinates extracted from individual virions. Differences in glycoprotein spacing between these populations were small: the median inter-glycoprotein spacing and interquartile range were comparable across all three morphologies (Table S2). While one-sample t tests between the populations reported a significant difference between the means for all groups, the mean glycoprotein spacings differ by less than 0.5 nm on average, a difference likely too small to change viral infectivity or interaction with the host. This finding further underscores the importance of regulating glycoprotein spacing on influenza viral surfaces.

… glycoprotein spacing on spherical particles between a viral glycoprotein and its three closest neighbours. c. Histogram of inter-glycoprotein spacing on oval particles between a viral glycoprotein and its three closest neighbours. d. Histogram of inter-glycoprotein spacing on elongated particles between a viral glycoprotein and its three closest neighbours.

In situ influenza HA reconstruction

From the low-resolution glycoprotein array, in situ subtomogram averaging was conducted for focused refinement on one glycoprotein.
As the extracted glycoprotein subtomograms were derived from a mixture of HA and NA particles, reference-free subtomogram averaging was performed on the glycoprotein particles without imposed symmetry to allow further classification. Because the initial alignments were performed with a large glycoprotein array, particles were re-extracted after five rounds of alignment to refine glycoprotein locations. This re-extraction eliminated the bottom 10% of particles according to alignment score. After five further rounds of reference-free subtomogram averaging, three-fold symmetry emerged in the density map, revealing a prefusion HA trimer (Fig 8, Fig S3). The resolved HA trimer was 13 nm in length and 8 nm in width. At FSC = 0.143, the resolution of the unmasked map was 13 Å, and the map-to-model resolution was 17 Å (Fig S4). The map shows clear separation between the globular head domains and the stem domains of HA, as well as between the head domains of the individual HA monomers, with a central cavity between the three head domains (Fig 8A). Most likely because of linker flexibility, the membrane-anchor portion of HA was not resolved in the map. Next, we compared the new in situ map with a previously determined crystal structure of H1 HA (Gamblin et al., 2004) (Fig 8B). Overall, the map agreed well with the structure, with a correlation score of 0.79. The separation between the head and stem domains of the HA crystal structure is accounted for by the central cavity between the globular head domains in the map.

Structural characterization of extensively pleomorphic influenza viruses has long been an arduous task. In this report, we developed a cryoET analysis pipeline incorporating …

… Our processing pipeline uses individual CNNs to annotate each of these components and dissects entire virions into distinct layers for further processing.
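Dissecting virions into distinct layers by combining the outputs of per-feature CNNs can be illustrated with a short NumPy sketch. This is not EMAN2's implementation: it assumes each trained network yields a probability volume of the same shape, assigns each voxel to the most probable feature, and treats voxels whose best probability is below a hypothetical 0.5 threshold as background. The feature names used in the check below are placeholders.

```python
import numpy as np


def merge_segmentations(prob_maps, threshold=0.5):
    """Merge per-feature probability volumes into one integer label volume.

    prob_maps: dict mapping a feature name to a volume of identical shape.
    Returns (labels, key) where labels uses 0 for background and i + 1 for
    the i-th feature, and key maps each label value to its feature name.
    The 0.5 threshold is a hypothetical default.
    """
    names = list(prob_maps)
    stack = np.stack([np.asarray(prob_maps[n], dtype=float) for n in names])
    best = stack.argmax(axis=0)               # most probable feature per voxel
    confident = stack.max(axis=0) > threshold  # voxels confidently assigned to any feature
    labels = np.zeros(best.shape, dtype=np.uint8)
    labels[confident] = best[confident] + 1
    return labels, {i + 1: n for i, n in enumerate(names)}


def layer_mask(labels, key, name):
    """Boolean mask for one annotated layer, e.g. to extract glycoprotein voxels."""
    code = {v: k for k, v in key.items()}[name]
    return labels == code
```

An argmax merge keeps each voxel in exactly one layer; separate networks for a thicker (M1 plus membrane) and a thinner (membrane only) density layer would simply appear as two entries in `prob_maps`.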
Because the CNN segmentation implemented in EMAN2 can combine several CNNs, this approach allows flexible characterization of influenza viruses. Each particle is recognized as the composite of its individual components rather than as a virion of fixed size or shape, so influenza particles are readily identified despite their morphological heterogeneity. Moreover, each viral structure is annotated independently, which allows structural characterization of influenza virions without further subtomogram extraction and averaging (Fig 3). Size and morphological analyses can be fine-tuned using these annotations, including the extraction of glycoprotein coordinates to calculate a 3D model of each influenza virion (Fig 2). … from the coordinates identified using CNNs (Fig 6). These results corroborate each other and further demonstrate the ability of CNNs to extract detailed structural data from pleomorphic viruses.

Because automated particle picking yielded enough glycoproteins for reference-free subtomogram averaging, we performed focused refinement on the glycoprotein array. To our surprise, a prefusion HA trimer emerged after reference-free subtomogram averaging. Using this approach, we obtained an in situ reconstruction of influenza A HA. …

… were removed based on how far each point's distance to its closest neighbours deviated, in standard deviations, from the average distance across the point cloud. Single-virion point clouds were generated using the K-Means clustering algorithm implemented in the Scikit-Learn library (Pedregosa et al., 2011). Point clouds were exported as STL surfaces, and each axis of the surface was calculated automatically in UCSF Chimera (Pettersen et al., 2004).

Subtomogram averaging parameters

All subtomogram averaging steps were carried out in EMAN2 (Chen et al., 2019a).
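The gold-standard resolution estimate used in these steps (independent refinement of two particle sets split by tomogram, then the Fourier shell correlation between the unmasked maps at the 0.143 threshold) can be sketched in NumPy. The resolutions reported here were computed in EMAN2; this sketch assumes cubic maps of equal size, uses one-voxel-wide shells, and reports the last shell before the curve first drops below 0.143 without interpolation. The greedy balancing of whole tomograms between the two sets is a hypothetical choice, since the text only states that the split was made by tomogram.

```python
import numpy as np


def split_by_tomogram(tomogram_ids):
    """Assign all particles from one tomogram to the same half-set (0 or 1),
    greedily balancing counts (hypothetical balancing rule)."""
    tomogram_ids = np.asarray(tomogram_ids)
    ids, counts = np.unique(tomogram_ids, return_counts=True)
    half = np.zeros(len(tomogram_ids), dtype=int)
    totals = [0, 0]
    for i in np.argsort(-counts, kind="stable"):  # largest tomograms first
        target = 0 if totals[0] <= totals[1] else 1
        half[tomogram_ids == ids[i]] = target
        totals[target] += counts[i]
    return half


def fourier_shell_correlation(map1, map2, voxel_size):
    """FSC between two cubic maps; returns (spatial frequency in 1/Å, FSC per shell)."""
    f1, f2 = np.fft.fftn(map1), np.fft.fftn(map2)
    n = map1.shape[0]
    k = np.fft.fftfreq(n) * n  # integer frequency index along one axis
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2
    fsc = np.zeros(n_shells)
    for s in range(n_shells):
        sel = shell == s
        num = np.sum(f1[sel] * np.conj(f2[sel])).real
        den = np.sqrt(np.sum(np.abs(f1[sel]) ** 2) * np.sum(np.abs(f2[sel]) ** 2))
        fsc[s] = num / den if den > 0 else 0.0
    return np.arange(n_shells) / (n * voxel_size), fsc


def resolution_at_threshold(spatial_freq, fsc, threshold=0.143):
    """Resolution (Å) of the last shell before FSC first falls below threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0] + 1  # ignore the DC shell
    if below.size == 0:
        return 1.0 / spatial_freq[-1]  # never crosses: limited by the largest shell
    last_good = below[0] - 1
    return float("inf") if last_good == 0 else 1.0 / spatial_freq[last_good]
```

Splitting by tomogram rather than by alternating particles guarantees that overlapping or neighbouring particles from the same tomogram never contribute to both half-maps, which would otherwise inflate the FSC.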
To build an initial alignment model, 1000 extracted glycoprotein arrays were aligned and averaged; after five rounds of alignment, clear density of six glycoproteins perpendicular to a membrane emerged. The initial model was low-pass filtered to 50 Å, and the full glycoprotein particle set of 85,021 particles was aligned to this model with a cylindrical mask. The glycoprotein array subtomogram average obtained after five further rounds of refinement was used for further refinement of individual glycoproteins. Glycoproteins were re-extracted based on the transformation matrix; this step also discarded particles with the lowest 10% of alignment scores as well as overlapping particles. At this stage, 76,519 particles were split into two sets according to the tomogram from which each was extracted, and independent refinement was carried out for both sets to ensure that no overlapping particles were used in the even/odd refinements. To reduce mask-related artefacts, no masks were applied at this stage. The subtomogram averages were computed from the top 80% of particles according to alignment score. After five further rounds of 3D refinement, the Fourier shell correlation (FSC) was measured between the unmasked maps from the two independent refinements. Rigid-body fitting of an H1 trimer (PDB: 1RUZ) was performed in UCSF ChimeraX (Pettersen et al., 2021). After rigid-body fitting, real-space refinement was conducted using the Phenix software package; the refined structure was used to …

10.1038/s41592-019-0591-8.
Convolutional neural networks for automated annotation of cellular cryo-electron tomograms
Characterization of influenza virus PR8 strain cultured in embryonated eggs by cryo-electron tomography
Broadly neutralizing antiviral antibodies
Filamentous influenza viruses
Influenza A Virus Hemagglutinin- … Preserving Virus Motility
The M1 matrix protein controls the filamentous phenotype of influenza A virus
Interim Estimates of 2016-17 Seasonal Influenza Vaccine Effectiveness - United States
Structural changes in Influenza virus at low pH characterized by cryo-electron tomography
The structure and receptor binding properties of the 1918 influenza hemagglutinin
Influenza hemagglutinin and neuraminidase membrane glycoproteins
Functional balance between neuraminidase and haemagglutinin in influenza viruses
Kinetic analysis of the influenza A virus HA/NA balance reveals contribution of NA to virus-receptor binding and NA-dependent rolling on receptor-containing surfaces
Influenza virus pleiomorphy characterized by cryoelectron tomography
A mutant influenza virus that uses an N1 neuraminidase as the receptor-binding protein
Influenza-virus membrane fusion by cooperative fold-back of stochastically induced hemagglutinin intermediates
Morphology of influenza B/Lee/40 determined by cryo-electron microscopy
Neuraminidase inhibition contributes to influenza A virus neutralization by anti-hemagglutinin stem antibodies
Influenza …
Computer visualization of three-dimensional image data using IMOD
Broadly neutralizing antibodies against influenza viruses
Automated electron microscope tomography using robust prediction of specimen movements
Balanced hemagglutinin and neuraminidase activities are critical for efficient replication of influenza A virus
Scikit-learn: Machine Learning in Python
UCSF Chimera--a visualization system for exploratory research and analysis
UCSF ChimeraX: Structure visualization for researchers, educators, and developers
The native structure of the assembled matrix protein 1 of influenza A virus
Mutations in Influenza A Virus Neuraminidase and Hemagglutinin Confer Resistance against a Broadly Neutralizing Hemagglutinin Stem …
Fecal microbiota transplantation in relapsing Clostridium difficile infection
Influenza virus assembly and budding
Spherical influenza viruses have a fitness advantage in embryonated eggs, while filament-producing strains are selected in vivo
… Microscopy Structures of Chimeric Hemagglutinin Displayed on a Universal Influenza Vaccine Candidate
Influenza A virus surface proteins are organized to help penetrate host mucus
Cryotomography of budding influenza A virus reveals filaments with diverse morphologies that mostly do not bear a genome at their distal end
Interdependence of hemagglutinin glycosylation and neuraminidase as regulators of influenza virus growth: a study by reverse genetics
Distribution of surface glycoproteins on influenza A virus determined by electron cryotomography
Functional balance of the hemagglutinin and neuraminidase activities accompanies the emergence of the 2009 H1N1 influenza pandemic
Open3D: A Modern Library for 3D Data Processing

… Representative output from (a) glycoproteins, (b) vRNP complexes, (c) M1 + lipid bilayer and (d) lipid bilayer alone. The images show the 2D tomographic slice, the manual annotation of the feature of interest, and the automated annotation from the CNNs. Samples without manual annotation are negative samples.

Influenza virions with empty membrane patches or cut off by the grid edge. (a) and (b) show PR8 particles with membrane segments bereft of glycoproteins; these segments are marked with red arrows. (c) is an example of a virion that was cut off. All scale bars are 50 nm.

… as of the date of publication. The GitHub repository is linked in the key resources table.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

… gold (Sigma-Aldrich) were applied to freshly washed and glow-discharged R2/2 holey carbon copper grids (Quantifoil). The grids were manually back-blotted with filter paper for 2 to 3 s and rapidly frozen in liquid ethane using an EMS-2 rapid immersion freezer. The vitrified grids were stored in liquid nitrogen (LN2). The specimens were imaged on a Thermo Fisher Scientific Titan Krios electron microscope operating at 300 kV with a K3 camera (Gatan, Pleasanton, USA) and a Gatan energy filter with a slit width of 20 eV. Tilt series were recorded using a dose-symmetric tilt scheme (Hagen et al., 2017, "Implementation of a cryo-electron tomography tilt-scheme optimized for high resolution subtomogram averaging"). Each tilt series started at 0° and used 2° or 3° increments at a magnification of 42,000 (corresponding to a calibrated pixel size of 2.0873 Å), with the tilt range limited to ±66°. Without a Volta phase plate, the defocus range was 3-4 µm; with a Volta phase plate, the defocus was 0.5 µm. The total dose was less than 120 e/Å². Data were acquired automatically in low-dose mode using SerialEM (Mastronarde, 2005). Beam-induced motion of movie frames was corrected with IMOD (Kremer et al., 1996).
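A dose-symmetric scheme starts at 0° and alternates between positive and negative tilts, so that the low-tilt images, which carry the most usable high-resolution information, are recorded before much dose has accumulated. A simplified ordering can be generated as below; `group_size` (how many consecutive tilts are taken on one side before switching) is a hypothetical parameter, because the grouping used for this dataset is not stated, and the function is an illustration rather than SerialEM's implementation.

```python
def dose_symmetric_order(max_tilt, step, group_size=2):
    """Simplified dose-symmetric tilt order (degrees): 0 first, then alternating
    groups of `group_size` tilts on the positive and negative sides, moving outward.
    group_size is a hypothetical parameter, not taken from the paper."""
    n_side = int(max_tilt // step)
    positive = [step * i for i in range(1, n_side + 1)]
    order = [0]
    for start in range(0, n_side, group_size):
        group = positive[start:start + group_size]
        order += group                         # positive side, moving outward
        order += [-angle for angle in group]   # matching negative tilts
    return order
```

For a ±66° range this gives 45 tilts at 3° steps and 67 tilts at 2° steps. If the total dose of under 120 e/Å² were spread evenly across tilts (an assumption; the per-tilt dose is not reported here), that would correspond to under about 2.7 or 1.8 e/Å² per tilt, respectively.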