key: cord-0297518-90t8k8nc authors: Giri, Rajanish; Bhardwaj, Taniya; Saumya, Kumar Udit; Gadhave, Kundlik; Kapuganti, Shivani K; Sharma, Nitin title: The aggregation potential of Zika virus proteome date: 2022-03-27 journal: bioRxiv DOI: 10.1101/2022.03.26.485915 sha: 585e55083b5320e29419794483d2a0ef96e47d26 doc_id: 297518 cord_uid: 90t8k8nc

The ability of human-encoded soluble proteins to convert into amyloid fibrils is now recognized as a generic phenomenon in several human illnesses. Typically, such disease-causing proteins and peptides contain aggregation-prone regions (APRs) that make them susceptible to misfolding and assembly into highly ordered β-sheet-rich fibrils, distinct from their native soluble state. Here, we show that Zika virus (ZIKV) contains several such aggregation-prone hotspots spread across its entire proteome. Using a combination of high-accuracy prediction tools, we identified APRs in both structural and non-structural proteins of ZIKV. Furthermore, we experimentally validated the bioinformatic results by subjecting ZIKV proteins and peptides to an artificial aggregation-inducing environment. Using a combination of dye-based assays (ThT and ANS) and microscopy techniques (HR-TEM and AFM), we further characterized the morphological features of the amyloid-like fibrils. We found the Envelope domain III (EDIII) protein, the NS1 β-roll peptide, the membrane-embedded signal peptide 2K, and the cytosolic region of the NS4B protein to be highly aggregating in the experimental setup. Our findings also pave the way for an extensive and detailed functional analysis of these predicted APRs in the future to enhance our understanding of the role played by amyloids in the pathogenesis of flaviviruses.

ZIKV is an arthropod-borne RNA virus belonging to the family Flaviviridae. The onset of ZIKV infection is associated with mild symptoms and rarely requires hospitalization. 1 However, a significant threat of its infection lies in its neurotropic behavior, as a wide range of human cells are highly permissive to its replication. Severe neurological complications such as microcephaly, 3, 4 In addition, ZIKV causes more rapid neural apoptosis in infant than in older mouse models. 5 Several neurological disorders are attributed to protein misfolding and aggregation into amyloid fibrils, contributing to an extensive array of human diseases that are becoming more prevalent.
Alzheimer's disease (AD), Parkinson's disease, and Huntington's disease are some extensively studied proteopathic disorders, primarily caused by the formation and deposition of amyloidogenic proteins. 6 Typically, such disease-causing proteins and peptides undergo misfolding and assemble into highly ordered β-sheet-rich fibrils. In this process, aggregation-nucleating regions, also described as aggregation-prone regions (APRs), within the protein sequence predispose them to oligomerize into insoluble precursor protofibrils. Several such protofibrillar units assemble and eventually mature into amyloid fibrils. 7 These amyloid fibrils accumulate in tissues and subsequently lead to cellular damage and dysfunction, specific to the kind of protein involved. Viruses have recently been shown to affect the causal proteins of many neurodegenerative disorders such as AD. Viral proteins have also been investigated for their ability to form amyloids that are functional as well as toxic to the host. 8, 9 Thus, a growing number of viral diseases are now linked with amyloidogenic disorders, but surprisingly, this phenomenon is yet to be recognized in flaviviral infections. Previously, in this context, our group showed that the ZIKV capsid anchor forms amyloid-like fibrils, adopts a β-sheet conformation, and yields aggregates that are cytotoxic to mammalian cells. 10 In that direction, it is important to investigate the aggregation propensity of as many Zika proteins as possible. In this manuscript, we have highlighted multiple hotspots with a high propensity to aggregate across the ZIKV proteome. Our analysis is further validated by the aggregation of the Envelope domain III (EDIII) protein, the NS1 β-roll peptide (residues 1-30 of the NS1 protein), the 2K peptide, and the NS4B-CR peptide (NS4B cytosolic region, residues 131-169) in an artificially prepared aggregation-inducing environment. Our findings here indicate yet another probable mechanism through which ZIKV may impart its pathogenic effect on infected cells.
Numerous proteopathic disorders are associated with protein misfolding and aggregation. [11] [12] [13] Extensive studies on these pathogenic proteins have led to the recognition of common properties the APRs in the ZIKV proteome. 14 We have also utilized CamSol to identify the hydrophobic regions in ZIKV proteins (Figures 1 and 2, and Tables 1 and 2). The three structural proteins, Capsid (C), membrane precursor (prM), and Envelope (E), are essential for virus particle formation. Aggregation-prone regions in them were predicted under physiological conditions. The ~12-kDa C protein is synthesized first, and its primary role is to facilitate the packaging of vg-RNA. The 104 amino acid long ZIKV C protein structurally comprises an intrinsically disordered N-terminal region, four α-helices (α1-α4), and short interspersed loops connecting the helices. 15 Residues 11-17 are predicted as an APR by three prediction tools in our study (Figure 2a and Table 1). Further, our computational screening also revealed several APRs containing residues 45-52. This particular region comprises helices α2 and α3, which play a crucial role in stabilizing capsid homodimers through extensive hydrogen-bonding and hydrophobic interactions with surrounding protomer residues. 15 Also, the nearby residues 42 and 43 are reported to be conserved and functionally indispensable for nuclear localization of the capsid protein. 17 At its C-terminus, the C protein is connected to an 18-residue peptide known as the capsid anchor (CA). Previously, we reported the aggregation potential of ZIKV CA and showed that the amyloid-like aggregates of CA are toxic to mammalian cells. 10 The structural proteins positioned next to the capsid towards the ER lumen are the prM and E proteins. The prM protein is composed of the pr peptide and the M protein, while the E protein consists of three domains, EDI, EDII, and EDIII, along with stem and transmembrane regions.
19 The prM protein plays a crucial role in the transition of non-infectious immature viral particles to infectious mature virions, while the E protein serves as the major membrane fusion and host surface binding protein for the virus. 20, 21 During maturation, furin protease-assisted cleavage separates pr from M, leaving the M protein membrane-embedded while the pr fragment falls off. 22 Our APR scanning in the 168 amino acid long prM protein led us to identify multiple hotspots, especially within residues 74-80, 136-145, and 154-163, which contain high-scoring common APRs predicted by all five prediction tools. The 500 amino acid long E protein, which is the largest among all structural proteins, also contains many common APRs, such as residues 197-200, 250-254, 381-386, 465-471, and 489-496 (Figure 2c and Table 1). Using the same four computational tools, the aggregation-prone regions in the seven non-structural proteins and the transmembrane peptide 2K were predicted under physiological conditions. The first non-structural protein, NS1, is considered essential in host immune surveillance and interaction with host membranes. 23 Table 2 ). Further, the succeeding proteins in the polyprotein, NS2A and NS2B, have several common APR hotspots distributed throughout the protein. Flavivirus NS2A is a membrane-associated hydrophobic protein engaged in multiple functions besides its involvement in RNA replication. 25 NS2B is another membrane-bound protein that interacts with NS3 to form an active NS2B-NS3 protease complex. 26 In our analysis, we found that residues 7-20, 29-65, 75-98, 104-160, and 197-220 contain many APRs in the NS2A protein, while residues 7-16, 32-44, 96-107, and 120-125 form common APRs in the NS2B protein (Table 2). NS3, a bifunctional enzyme, consists of the N-terminal protease domain (residues 1-167) and the C-terminal helicase domain (residues 168-618), essential for polyprotein processing and viral replication, respectively.
27 According to our observations, MetAmyl predicted the largest number of APRs in this protein, while TANGO predicted the fewest. Most of the APRs are located in the helicase domain of the NS3 protein (refer to Table 2 for all APRs in the NS3 protein). The NS4A and NS4B proteins function in rearranging the host ER membrane, contributing to the formation of virus-induced membranous vesicles. 28, 29 Recently, ZIKV NS4A has been established as a cofactor of NS3 helicase ATPase activity. 30 From our analysis, we observed that the APRs in NS4A are concentrated mainly between residues 53-73 and 104-121. In the ZIKV NS4B protein, residues 38-50, 100-120, and 230-243 contain most of the APRs. In contrast, the 2K transmembrane region of 23 amino acids preceding the NS4B protein is predicted to be highly aggregation-prone, with most of its residues having the potential to aggregate. The largest non-structural protein, NS5, is a complex protein with dual catalytic activity. Its N- Table 2 for all APRs). We have mapped the 20S proteasome cleavage sites in all the proteins of ZIKV using an SVM-based method known as Pcleavage and the proteasome site predictor NetChop. 32, 33 According to the results, all the ZIKV proteins contain 20S cleavage sites; however, NetChop predicted more cleavage sites than the Pcleavage method (Table S1). We experimentally investigated the aggregation behavior of certain proteins and peptides of ZIKV. For this purpose, we used ThT and ANS dye-based assays to observe the fibrillation process, and HR-TEM and AFM to visualize fibril morphology. The EDIII corresponds to residues 302-409 of the full-length E protein. It forms the antigenic determinant residues for host immune responses and is therefore considered a potential immunogen in subunit vaccine development.
34 According to our analysis, it contains potential amyloid-forming segments throughout the region (Figure 2c and Table 1 The flavivirus 2K peptide is a 23 amino acid fragment between the NS4A and NS4B proteins. 35 ZIKV 2K is predicted to contain an APR spanning residues 7-22 by all the servers used in this study (Figure 2i and Table 2). Figure 5a depicts the ThT assay, wherein the aggregated sample of the 2K peptide displays a ~six-fold increase in ThT fluorescence intensity compared to the monomer. With ANS, a ~2.5-fold increase in the fluorescence intensity of the aggregated sample (720 hrs) is observed in comparison with the monomer. The increase in fluorescence was also accompanied by a characteristic blue shift in the emission spectrum from 535 nm to 490 nm, further indicating the formation of amyloid fibrils by 2K (Figure 5b). Furthermore, the HR-TEM images of the sample incubated for 720 hrs reveal a typical amyloid structure with dense, thread-like, long and short interconnected fibrils (Figure 5c). The AFM images shown in Figure 5d reveal similar fibrillar structures for the amyloid aggregates of the 2K peptide. Figure 5e presents the height profile, which quantitatively measures the height and width of the aggregates. These microscopy images of the aggregated 2K peptide are analogous to previously reported amyloids produced by many cytosolic and transmembrane protein aggregates. NS4B has a cytosolic region of ~35 amino acids between transmembrane domains 3 and 4, which plays a crucial role in replicating the viral genome. 36 The bioinformatic analysis shows the presence of multiple APRs throughout the NS4B protein, one of which is a five residue APR segment predicted by the FoldAmyloid server at amino acids 150-155, within the cytosolic region (Figure 2j and Table 2 The occurrence of aggregation "hot spots" has been well-defined in the proteins and peptides leading to several neurological diseases.
There are more than 50 diseases known so far in which protein aggregation and amyloid formation have been implicated as the hallmark. 37 43 HIV has been demonstrated to increase the levels of Aβ42 in cerebrospinal fluid in HIV-associated neurologic disease, known as HAND. 41 Further, the Ljungan virus has been found in neurons, astrocytes, and amyloid plaques in the hippocampus region of Alzheimer's patients. 40 A closely-related Hepatitis C virus in extreme cases can cause cognitive impairment and dementia. 46 Moreover, aggregation of viral proteins has been demonstrated in several cases. The PB1-F2 protein of influenza itself adopts a β-sheet conformation and forms amorphous aggregates in vitro. 47 Similarly, the M45 protein of murine cytomegalovirus self-assembles into amyloid fibrils. Furthermore, it interacts with many host proteins such as RIPK1 and RIPK3 and forms hetero-oligomeric amyloid fibrils, altering their natural functions. 48 Recently, many protein regions of SARS-CoV-2 have been observed to form amyloid-like structures in vitro. 49 An important point to note here is that viruses can induce severe neurological symptoms by regulating the pathways vital for neural function and can directly or indirectly affect the major hallmark proteins such as Aβ42, APP, and Tau. They can also exert their pathogenesis through an aggregation-mediated pathway, where the viral proteins not only form insoluble aggregates but also form hetero-oligomeric fibrils with host proteins. It is well known that the deposition of insoluble amyloid-like aggregates between neural cells can cause brain tissue atrophy. Evidence shows that viruses have been evolving to be more pathogenic, with viral proteins adopting misfolded conformations while also affecting the host protein-folding pathways. This may be one of the ongoing evolutionary routes, and it requires thorough investigation.
As a neurotropic arbovirus, ZIKV has been associated with many severe neurological disorders of both the central and peripheral nervous systems. 50 Its remarkable ability to cross the placental-blood barrier, which only a few viruses are capable of, can cause irreversible damage to the fetal brain. Congenital ZIKV syndrome consists of a group of birth defects that includes microcephaly, a partly collapsed skull, and hydrocephaly-like disorders. 51 In infants, it persists even after birth, which can trigger long-lasting effects on the nervous system and thus might be related to several neurological diseases. 51 Protein misfolding leads to amyloid fibrils, which are linked with a number of human diseases. The abnormal deposition of proteins leads to amyloid plaques, causing tissue damage and organ dysfunction. The pathological mechanism of neurodegenerative diseases such as microcephaly linked with ZIKV is still a mystery. The aggregation propensity of the ZIKV proteome raises interesting, so far unexplored questions about its pathological function. The experimental validation of proteins and peptides indicates the high potential of APRs in the ZIKV proteome to form amyloid fibrils. Further studies are needed to establish a mechanistic link between ZIKV and its associated neurodegenerative pathogenesis. We believe that the present study will have implications for the discovery of novel molecules against ZIKV-originated amyloid fibrils and associated diseases. The peptides corresponding to NS1 The ZIKV proteome sequences for the strain Mr766 (UniProt ID: Q32ZE1) were derived from To dissolve any pre-formed aggregates, peptides were treated with HFIP solution (100%). It was left to evaporate overnight at room temperature in a desiccator. The 2K peptide was dissolved in 10% DMSO and 20 mM sodium phosphate buffer (pH 7.4) at a concentration of 1 mg/ml. The NS1 β-roll peptide was dissolved in 1X PBS (pH 7) at a concentration of 0.5 mg/ml.
The NS4B-CR peptide was dissolved in 20 mM sodium phosphate buffer (pH 7.4) at a concentration of 1 mg/ml. The EDIII protein was purified as described above. The peptides and EDIII protein were then subjected to constant shaking at 1000 rpm and 37 °C using an Eppendorf ThermoMixer® C. Samples were collected at different time intervals during the experiments. ThT fluorescent dye binds selectively to amyloid fibrils to give a fluorescence peak at 490 nm. 59,60 Therefore, it was employed to study the formation of aggregates and the aggregation kinetics of ZIKV protein regions. Samples were prepared by the protocol given below: β-roll and NS4B-CR was studied by plotting the change in ThT fluorescence intensity on the y-axis against time (hrs) on the x-axis. The resultant data plot was fitted using a sigmoidal curve to obtain t50 (the time at which ThT fluorescence intensity reaches 50% of its maximum value). The equation describing the kinetics of fibrillogenesis is: $y = A_2 + \frac{A_1 - A_2}{1 + e^{(x - x_0)/dx}}$, where $A_1$ indicates the initial fluorescence, $A_2$ is the final fluorescence, $x_0$ is the midpoint (t50 value), and $dx$ is a time constant. ANS, an extensively utilized fluorescent molecular probe, binds to cationic groups of proteins and is routinely used to study solvent-exposed hydrophobic clusters of proteins during amyloidogenesis. 60 ZIKV protein regions were monitored for their susceptibility to binding ANS dye during the aggregation process. Samples were prepared by the protocol given below: AFM images were captured using a tapping-mode AFM system (Dimension Icon from Bruker). Aggregated samples were drop-cast on the mica surface after thirty-fold dilution. The mica sheets were dried at room temperature overnight, and images were captured. The morphology of the aggregates was also confirmed through HR-TEM (FP 5022/22-Tecnai G2 20 S-TWIN, FEI). A twenty-fold diluted solution of the aggregated samples was mounted on 200-mesh carbon-coated copper grids (Ted Pella, Inc, USA).
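As an illustration of the sigmoidal analysis described for the ThT kinetics, the following minimal Python sketch evaluates a Boltzmann-type sigmoid and recovers t50 from a synthetic curve. It assumes the Boltzmann form implied by the parameter definitions (A1 as initial fluorescence, A2 as final fluorescence, x0 as midpoint, dx as time constant). The numerical parameter values and the function names boltzmann and estimate_t50 are hypothetical and are not taken from the study. The half-maximum interpolation is only a simple cross-check and does not stand in for the curve-fitting routine actually used.

```python
import math


def boltzmann(x, a1, a2, x0, dx):
    """Boltzmann sigmoid (assumed form): a1 is the initial plateau,
    a2 the final plateau, x0 the midpoint (t50), dx the time constant."""
    return a2 + (a1 - a2) / (1.0 + math.exp((x - x0) / dx))


def estimate_t50(times, signal):
    """Estimate t50 by linear interpolation: the first time at which the
    signal crosses halfway between its first and last recorded values."""
    half = (signal[0] + signal[-1]) / 2.0
    points = list(zip(times, signal))
    for (t_a, y_a), (t_b, y_b) in zip(points, points[1:]):
        if y_a <= half <= y_b and y_b != y_a:
            return t_a + (half - y_a) * (t_b - t_a) / (y_b - y_a)
    return None  # no crossing found (e.g. a flat or decreasing trace)


# Hypothetical parameters, chosen only to generate an example curve.
A1, A2, X0, DX = 10.0, 60.0, 120.0, 15.0

times = [float(t) for t in range(0, 301, 10)]           # time points in hrs
signal = [boltzmann(t, A1, A2, X0, DX) for t in times]  # synthetic ThT signal

midpoint_value = boltzmann(X0, A1, A2, X0, DX)  # fluorescence at x = x0
t50_estimate = estimate_t50(times, signal)      # interpolated half-max time
```

At x = x0 the exponential term equals 1, so the curve sits exactly halfway between A1 and A2, which is why x0 can be read as t50. For measured data, the four parameters would normally be obtained by nonlinear least-squares fitting rather than by interpolation.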
Samples were then negatively stained using 3% ammonium molybdate solution, following which grids were air-dried overnight, and images were captured using HR-TEM with an accelerating voltage of 200 kV. Data availability: All data are contained within the manuscript or as supporting information. Zika virus: An overview Zika virus promotes neuronal cell death in a non-cell autonomous manner by triggering the release of neurotoxic factors Zika Virus-Induced Neuronal Apoptosis via Increased Mitochondrial Fragmentation Zika virus infection during the period of maximal brain growth causes microcephaly and corticospinal neuron apoptosis in wild type mice Protein aggregation and its consequences for human disease Distinct stress conditions result in aggregation of proteins with similar properties Virus-Induced Aggregates in Infected Cells Virus-Induced Cytoplasmic Aggregates and Inclusions are Critical Cellular Regulatory and Antiviral Factors Zika virus capsid anchor forms cytotoxic amyloid-like fibrils Protein misfolding and aggregation in Alzheimer's disease and Type 2 Diabetes Mellitus Protein misfolding and aggregation in neurodegenerative diseases: a review of pathogeneses, novel detection strategies, and potential therapeutics Propagation and spread of pathogenic protein assemblies in neurodegenerative diseases Computational prediction of protein aggregation: Advances in proteomics, conformation-specific algorithms and biotechnological applications Capsid protein structure in Zika virus reveals the flavivirus assembly process Zika virus capsid anchor forms cytotoxic amyloid-like fibrils Specific interaction of capsid protein and importin-alpha/beta influences West Nile virus production Role of Capsid Anchor in the Morphogenesis of Zika Virus Structures of the Zika Virus Envelope Protein and Its Complex with a Flavivirus Broadly Protective Antibody Role of Zika Virus prM Protein in Viral Pathogenicity and Use in Vaccine Development Structures and Functions of the 
Envelope Glycoprotein in Flavivirus Infections Zika virus genome biology and molecular pathogenesis. Emerging microbes & infections 6 Flavivirus NS1: a multifaceted enigmatic viral protein Structure-guided insights on the role of NS1 in flavivirus infection Role of nonstructural protein NS2A in flavivirus assembly Crystal structure of Zika virus NS2B-NS3 protease in complex with a boronate inhibitor Crystal structure of a novel conformational state of the flavivirus NS3 protein: implications for polyprotein processing and viral replication NS4A and NS4B proteins from dengue virus: membranotropic regions Characterization of dengue virus NS4A and NS4B protein interaction Zika virus NS4A N-Terminal region (1-48) acts as a cofactor for inducing NTPase activity of NS3 helicase but not NS3 protease Crystal structure of full-length Zika virus NS5 protein reveals a conformation similar to Japanese encephalitis virus NS5 Pcleavage: an SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences Prediction of proteasome cleavage motifs by neural networks Rational Design of Zika Virus Subunit Vaccine with Enhanced Efficacy A single-amino acid substitution in West Nile virus 2K peptide between NS4A and NS4B confers resistance to lycorine, a flavivirus inhibitor Flaviviral NS4b, chameleon and jack-in-the-box roles in viral replication and pathogenesis, and a molecular target for antiviral intervention Editorial: Protein Aggregation and Solubility in Microorganisms (Archaea, Bacteria and Unicellular Eukaryotes): Implications and Applications Protein Misfolding, Amyloid Formation, and Human Disease: A Summary of Progress Over the Last Decade Prediction of 'hot spots' of aggregation in disease-linked polypeptides Picornavirus Identified in Alzheimer's Disease Brains: A Pathogenic Path? 
CSF biomarkers of Alzheimer disease in HIV-associated neurologic disease Temporal cognitive decline associated with exposure to infectious agents in a population-based, aging cohort A 3D human brain-like tissue model of herpes-induced Alzheimer's disease Highly pathogenic H5N1 influenza virus can enter the central nervous system and induce neuroinflammation and neurodegeneration Approved antiviral drugs over the past 50 years Hepatitis C viral infection and the risk of dementia PB1-F2 influenza A virus protein adopts a β -sheet conformation and forms amyloid fibers in membrane environments Viral M45 and necroptosis-associated proteins form heteromeric amyloid assemblies Amyloidogenic proteins in the SARS-CoV and SARS-CoV-2 proteomes Neurological manifestations of Zika virus infection Development of Infants With Congenital Zika Syndrome: What Do We Know and What Can We Expect? Zika Virus Infection of Human Mesenchymal Stem Cells Promotes Differential Expression of Proteins Linked to Several Neurological Diseases Amyloid precursor protein is a restriction factor that protects against Zika virus infection in mammalian brains Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins AGGRESCAN: A server for the prediction and evaluation of 'hot spots' of aggregation in polypeptides MetAmyl: A METa-predictor for AMYLoid proteins A method of prediction of amyloidogenic regions from protein sequence Protein Solubility Predictions Using the CamSol Method in the Study of Protein Homeostasis. Cold Spring Harbor perspectives in biology 11 Thioflavin T as an amyloid dye: fibril quantification, optimal concentration and effect on aggregation Effect of the fluorescent probes ThT and ANS on the mature amyloid fibrils