Evaluating the transcriptional fidelity of cancer models

Da Peng1*, Rachel Gleyzer2*, Wen-Hsin Tai2, Pavithra Kumar2, Qin Bian2, Bradley Issacs2, Edroaldo Lummertz da Rocha3, Stephanie Cai1, Kathleen DiNapoli4,5, Franklin W Huang6, Patrick Cahan1,2,7

1Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore MD 21205 USA
2Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore MD 21205 USA
3Department of Microbiology, Immunology and Parasitology, Federal University of Santa Catarina, Florianópolis SC, Brazil
4Department of Cell Biology, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
5Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore MD 21218 USA
6Division of Hematology/Oncology, Department of Medicine; Helen Diller Family Cancer Center; Bakar Computational Health Sciences Institute; Institute for Human Genetics; University of California, San Francisco, San Francisco, CA
7Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore MD 21205 USA

* These authors made equal contributions.

Correspondence to: patrick.cahan@jhmi.edu
Article type: Research
Website: http://www.cahanlab.org/resources/cancerCellNet_web
Code: https://github.com/pcahan1/cancerCellNet

bioRxiv preprint doi: https://doi.org/10.1101/2020.03.27.012757; this version posted January 9, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

ABSTRACT

Background: Cancer researchers use cell lines, patient derived xenografts, engineered mice, and tumoroids as models to investigate tumor biology and to identify therapies. The generalizability and power of a model derive from the fidelity with which it represents the tumor type under investigation; however, the extent to which this is true is often unclear. The preponderance of models and the ability to readily generate new ones have created a demand for tools that can measure the extent and ways in which cancer models resemble or diverge from native tumors.

Methods: We developed a machine learning based computational tool, CancerCellNet, that measures the similarity of cancer models to 22 naturally occurring tumor types and 36 subtypes, in a platform and species agnostic manner. We applied this tool to 657 cancer cell lines, 415 patient derived xenografts, 26 distinct genetically engineered mouse models, and 131 tumoroids. We validated CancerCellNet by application to independent data, and we tested several predictions with immunofluorescence.

Results: We have documented the cancer models with the greatest transcriptional fidelity to natural tumors, we have identified cancers underserved by adequate models, and we have found models with annotations that do not match their classification. By comparing models across modalities, we report that, on average, genetically engineered mice and tumoroids have higher transcriptional fidelity than patient derived xenografts and cell lines in four out of five tumor types. However, several patient derived xenografts and tumoroids have classification scores that are on par with native tumors, highlighting both their potential as faithful model classes and their heterogeneity.

Conclusions: CancerCellNet enables the rapid assessment of transcriptional fidelity of tumor models. We have made CancerCellNet available as freely downloadable software and as a web application that can be applied to new cancer models and that allows for direct comparison to the cancer models evaluated here.

INTRODUCTION

Models are widely used to investigate cancer biology and to identify potential therapeutics. Popular modeling modalities are cancer cell lines (CCLs)1, genetically engineered mouse models (GEMMs)2, patient derived xenografts (PDXs)3, and tumoroids4. These classes of models differ in the types of questions that they are designed to address. CCLs are often used to address cell intrinsic mechanistic questions5, GEMMs to chart progression of molecularly defined disease6, and PDXs to explore patient-specific response to therapy in a physiologically relevant context7. More recently, tumoroids have emerged as relatively inexpensive, physiological, in vitro 3D models of tumor epithelium with applications ranging from measuring drug responsiveness to exploring tumor dependence on cancer stem cells. Models also differ in the extent to which they represent specific aspects of a cancer type8. Even with this intra- and inter-class model variation, all models should represent the tumor type or subtype under investigation, and not another type of tumor, and not a non-cancerous tissue.
Therefore, cancer models should be selected not only based on the specific biological question but also based on the similarity of the model to the cancer type under investigation9,10.

Various methods have been proposed to determine the similarity of cancer models to their intended subjects. Domcke et al devised a 'suitability score' as a metric of the molecular similarity of CCLs to high grade serous ovarian carcinoma based on a heuristic weighting of copy number alterations, mutation status of several genes that distinguish ovarian cancer subtypes, and hypermutation status11. Other studies have taken analogous approaches by focusing on either transcriptomic or ensemble molecular profiles (e.g. transcriptomic and copy number alterations) to quantify the similarity of cell lines to tumors12–14. These studies were tumor-type specific, focusing on CCLs that model, for example, hepatocellular carcinoma or breast cancer. Notably, Yu et al compared the transcriptomes of CCLs to The Cancer Genome Atlas (TCGA) by correlation analysis, resulting in a panel of CCLs recommended as most representative of 22 tumor types15. Most recently, Najgebauer et al16 and Salvadores et al17 have developed methods to assess CCLs using molecular traits such as copy number alterations (CNA), somatic mutations, DNA methylation and transcriptomics. While all of these studies have provided valuable information, they leave two major challenges unmet.
The first challenge is to determine the fidelity of GEMMs, PDXs, and tumoroids, and whether there are stark differences between these classes of models and CCLs. The other major unmet challenge is to enable the rapid assessment of new, emerging cancer models. This challenge is especially relevant now as technical barriers to generating models have been substantially lowered18,19, and because new models such as PDXs and tumoroids can be derived on a patient-specific basis and therefore should each be considered a distinct entity requiring individual validation4,20.

To address these challenges, we developed CancerCellNet (CCN), a computational tool that uses transcriptomic data to quantitatively assess the similarity between cancer models and 22 naturally occurring tumor types and 36 subtypes in a platform- and species-agnostic manner. Here, we describe CCN’s performance, and the results of applying it to assess 657 CCLs, 415 PDXs, 26 GEMMs, and 131 tumoroids. This has allowed us to identify the most faithful models currently available, to document cancers underserved by adequate models, and to find models with inaccurate tumor type annotation. Moreover, because CCN is open-source and easy to use, it can be readily applied to newly generated cancer models as a means to assess their fidelity.

RESULTS

CancerCellNet classifies samples accurately across species and technologies

Previously, we had developed a computational tool using the Random Forest classification method to measure the similarity of engineered cell populations to their in vivo counterparts based on transcriptional profiles21,22. More recently, we elaborated on this approach to allow for classification of single cell RNA-seq data in a manner that allows for cross-platform and cross-species analysis23. Here, we used an analogous approach to build a
Here, we used an analogous approach to build a 137 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2020.03.27.012757doi: bioRxiv preprint https://doi.org/10.1101/2020.03.27.012757 http://creativecommons.org/licenses/by-nc-nd/4.0/ 5 platform that would allow us to quantitatively compare cancer models to naturally occurring 138 patient tumors (Fig 1A). In brief, we used TCGA RNA-seq expression data from 22 solid tumor 139 types to train a top-pair multi-class Random forest classifier (Fig 1B). We combined training 140 data from Rectal Adenocarcinoma (READ) and Colon Adenocarcinoma (COAD) into one 141 COAD_READ category because READ and COAD are considered to be virtually 142 indistinguishable at a molecular level24. We included an ‘Unknown’ category trained using 143 randomly shuffled gene-pair profiles generated from the training data of 22 tumor types to 144 identify query samples that are not reflective of any of the training data. To estimate the 145 performance of CCN and how it is impacted by parameter variation, we performed a parameter 146 sweep with a 5-fold 2/3 cross-validation strategy (i.e. 2/3 of the data sampled across each 147 cancer type was used to train, 1/3 was used to validate) (Fig 1C). The performance of CCN, as 148 measured by the mean area under the precision recall curve (AUPRC), did not fall below 0.945 149 and remained relatively stable across parameter sets (Supp Fig 1A). The optimal parameters 150 resulted in 1,979 features. The mean AUPRCs exceeded 0.95 in most tumor types with this 151 optimal parameter set (Fig 1D, Supp Fig 1B). 
The AUPRCs of CCN applied to independent RNA-seq data from 725 tumors across five tumor types from the International Cancer Genome Consortium (ICGC)25 ranged from 0.93 to 0.99, supporting the notion that the platform is able to accurately classify tumor samples from diverse sources (Fig 1E).

As one of the central aims of our study is to compare distinct cancer models, including GEMMs, our method needed to be able to classify mouse and human samples equivalently. We used the Top-Pair transform23 to achieve this and we tested the feasibility of this approach by assessing the performance of a normal (i.e. non-tumor) cell and tissue classifier trained on human data as applied to mouse samples. Consistent with prior applications23, we found that the cross-species classifier performed well, achieving a mean AUPRC of 0.97 when applied to mouse data (Supp Fig 1C).

To evaluate cancer models at a finer resolution, we also developed an approach to perform tumor subtype classifications (Supp Fig 1D). We constructed 11 different cancer subtype classifiers based on the availability of expression or histological subtype information24,26–36. We also included non-cancerous, normal tissues as categories for several subtype classifiers when sufficient data was available: breast invasive carcinoma (BRCA), COAD_READ, head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC) and uterine corpus endometrial carcinoma (UCEC).
The 11 subtype classifiers all achieved high overall average AUPRCs ranging from 0.80 to 0.99 (Supp Fig 1E).

Fidelity of cancer cell lines

Having validated the performance of CCN, we then used it to determine the fidelity of CCLs. We mined RNA-seq expression data of 657 different cell lines across 20 cancer types from the Cancer Cell Line Encyclopedia (CCLE) and applied CCN to them, finding a wide classification range for cell lines of each tumor type (Fig 2A, Supp Tab 1). To verify the classification results, we applied CCN to expression profiles from CCLE generated through microarray expression profiling37. To ensure that CCN would function on microarray data, we first tested it by applying a CCN classifier created to test microarray data to 720 expression profiles of 12 tumor types. The cross-platform CCN classifier performed well, based on the comparison to study-provided annotation, achieving a mean AUPRC of 0.91 (Supp Fig 2A). Next, we applied this cross-platform classifier to microarray expression profiles from CCLE (Supp Fig 2B). From the classification results of 571 cell lines that have both RNA-seq and microarray expression profiles, we found a strong overall positive association between the classification scores from RNA-seq and those from microarray (Supp Fig 2C). This comparison supports the notion that the classification scores for each cell line are not artifacts of profiling methodology. Moreover, this comparison shows that the scores are consistent between the times that the cell lines were first assayed by microarray expression profiling in 2012 and by
RNA-seq in 2019. We also observed a high level of correlation between our analysis and the analysis done by Yu et al15 (Supp Fig 2D), further validating the robustness of the CCN results.

Next, we assessed the extent to which CCN classifications agreed with their nominal tumor type of origin, which entailed translating quantitative CCN scores to classification labels. To achieve this, we selected a decision threshold that maximized the Macro F1 measure (the harmonic mean of precision and recall, averaged across classes) across 50 cross validations. Then, we annotated cell lines based on their CCN score profile as follows. Cell lines with CCN scores > threshold only for the tumor type of origin were annotated as 'correct'. Cell lines with CCN scores > threshold in the tumor type of origin and at least one other tumor type were annotated as 'mixed'. Cell lines with CCN scores > threshold for tumor types other than that of the cell line's origin were annotated as 'other'. Cell lines that did not receive a CCN score > threshold for any tumor type were annotated as 'none' (Fig 2B). We found that the majority of cell lines originally annotated as Breast invasive carcinoma (BRCA), Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), Skin Cutaneous Melanoma (SKCM), Colorectal Cancer (COAD_READ) and Sarcoma (SARC) fell into the 'correct' category (Fig 2B). On the other hand, no Esophageal carcinoma (ESCA), Pancreatic adenocarcinoma (PAAD) or Brain Lower Grade Glioma (LGG) cell lines were classified as 'correct', demonstrating the need for more transcriptionally faithful cell lines that model those general cancer types.

There are several possible explanations for cell lines not receiving a 'correct' classification.
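
Before turning to these explanations, the threshold selection and annotation rules defined above can be made concrete with a minimal Python sketch. This is an illustration rather than the CancerCellNet code: the function names, the dictionary-based score format, the candidate threshold grid, and the toy scores are hypothetical, a single validation set stands in for the 50 cross validations, and 'correct' is assumed to mean that the tumor type of origin is the only category above the threshold.

```python
def macro_f1(score_rows, true_labels, classes, threshold):
    """Mean of per-class F1, calling a class positive when its score > threshold."""
    f1_values = []
    for cls in classes:
        tp = fp = fn = 0
        for scores, truth in zip(score_rows, true_labels):
            predicted = scores.get(cls, 0.0) > threshold
            if predicted and truth == cls:
                tp += 1
            elif predicted:
                fp += 1
            elif truth == cls:
                fn += 1
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)


def best_threshold(score_rows, true_labels, classes, candidates):
    """Pick the candidate threshold with the highest macro F1 (first wins ties)."""
    return max(candidates, key=lambda t: macro_f1(score_rows, true_labels, classes, t))


def annotate_model(scores, origin, threshold):
    """Map a score profile to 'correct', 'mixed', 'other', or 'none'."""
    above = {tumor for tumor, score in scores.items() if score > threshold}
    if not above:
        return "none"
    if origin in above:
        return "correct" if len(above) == 1 else "mixed"
    return "other"


# Toy validation data with made-up class names and scores (not real results).
classes = ["TYPE_A", "TYPE_B"]
validation_scores = [{"TYPE_A": 0.9, "TYPE_B": 0.1}, {"TYPE_A": 0.2, "TYPE_B": 0.6}]
validation_truth = ["TYPE_A", "TYPE_B"]

threshold = best_threshold(validation_scores, validation_truth, classes, [0.05, 0.5, 0.95])

labels = [
    annotate_model({"TYPE_A": 0.8, "TYPE_B": 0.1}, "TYPE_A", threshold),
    annotate_model({"TYPE_A": 0.7, "TYPE_B": 0.6}, "TYPE_A", threshold),
    annotate_model({"TYPE_A": 0.1, "TYPE_B": 0.9}, "TYPE_A", threshold),
    annotate_model({"TYPE_A": 0.1, "TYPE_B": 0.2}, "TYPE_A", threshold),
]
```

Under these assumptions the four categories are mutually exclusive, so every model receives exactly one label.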
One possibility is that the sample was incorrectly labeled in the study from which we harvested the expression data. Consistent with this explanation, we found that the colorectal cancer line NCI-H68438,39, a cell line labelled as liver hepatocellular carcinoma (LIHC) by CCLE, was classified strongly as COAD_READ (Supp Tab 1). Another possible explanation for a low CCN score is that cell lines were derived from subtypes of tumors that are not well-represented in TCGA. To explore this hypothesis, we first performed tumor subtype classification on CCLs from 11 tumor types for which we had trained subtype classifiers (Supp Tab 2). We reasoned that if a cell line were a good model for a rarer subtype, then it would receive a poor general classification but a high classification for the subtype that it models well. Therefore, we counted the number of lines that fit this pattern. We found that of the 188 lines with no general classification, 25 (13%) were classified as a specific subtype, suggesting that derivation from rare subtypes is not the major contributor to the poor overall fidelity of CCLs.

Another potential contributor to low scoring cell lines is intra-tumor stromal and immune cell impurity in the training data. If impurity were a confounder of CCN scoring, then we would expect a strong positive correlation between mean purity and mean CCN classification scores of CCLs per general tumor type.
However, the Pearson correlation coefficient between the mean purity of general tumor type and mean CCN classification scores of CCLs in the corresponding general tumor type was low (0.14), suggesting that tumor purity is not a major contributor to the low CCN scores across CCLs (Supp Fig 2E).

Comparison of SKCM and GBM CCLs to scRNA-seq

To more directly assess the impact of intra-tumor heterogeneity in the training data on evaluating cell lines, we constructed a classifier using cell types found in human melanoma and glioblastoma scRNA-seq data40,41. Previously, we have demonstrated the feasibility of using our classification approach on scRNA-seq data23. Our scRNA-seq classifier achieved a high average AUPRC (0.95) when applied to held-out data and a high mean AUPRC (0.99) when applied to a few purified bulk testing samples (Supp Fig 3A-B). Comparing the CCN score from the bulk RNA-seq general classifier and the scRNA-seq classifier, we observed a high level of correlation (Pearson correlation of 0.89) between the SKCM CCN classification scores and scRNA-seq SKCM malignant CCN classification scores for SKCM cell lines (Fig 2C, Supp Fig 3C). Of the 41 SKCM cell lines that were classified as SKCM by the bulk classifier, 37 were also classified as SKCM malignant cells by the scRNA-seq classifier. Interestingly, we also observed a high correlation between the SARC CCN classification score and scRNA-seq cancer
associated fibroblast (CAF) CCN classification scores (Pearson correlation of 0.92). Six of the seven SKCM cell lines that had been classified as exclusively SARC by CCN were classified as CAF by the scRNA-seq classifier (Fig 2D, Supp Fig 3C), which suggests the possibility that these cell lines were derived from CAF or other mesenchymal populations, or that they have acquired a mesenchymal character through their derivation. The high level of agreement between scRNA-seq and bulk RNA-seq classification results shows that heterogeneity in the training data of the general CCN classifier has little impact on the classification of SKCM cell lines.

In contrast, we observed a weaker correlation between GBM CCN classification scores and scRNA-seq GBM neoplastic CCN classification scores (Pearson correlation of 0.72) for GBM cell lines (Fig 2E, Supp Fig 3D). Of the 31 GBM lines that were not classified as GBM with CCN, 25 were classified as GBM neoplastic cells with the scRNA-seq classifier. Among the 22 GBM lines that were classified as SARC with CCN, 15 cell lines were classified as CAF (Fig 2F), 10 of which were classified as both GBM neoplastic and CAF in the scRNA-seq classifier. Similar to the situation with SKCM lines that classify as CAF, this result is consistent with the possibility that some GBM lines classified as SARC by CCN could be derived from mesenchymal subtypes exhibiting both strong mesenchymal signatures and glioblastoma signatures or that they have acquired a mesenchymal character through their derivation.
The lower level of agreement between scRNA-seq and bulk RNA-seq classification results for GBM models suggests that the heterogeneity of glioblastomas42 can impact the classification of GBM cell lines, and that the use of a scRNA-seq classifier can resolve this deficiency.

Immunofluorescence confirmation of CCN predictions

To experimentally explore some of our computational analyses, we performed immunofluorescence on three cell lines that were not classified as their labelled categories: the ovarian cancer line SK-OV-3 had a high UCEC CCN score (0.246), the ovarian cancer line A2780 had a high Testicular Germ Cell Tumors (TGCT) CCN score (0.327), and the prostate cancer line PC-3 had a high bladder cancer (BLCA) score (0.307) (Supp Tab 1). We reasoned that if SK-OV-3, A2780 and PC-3 were classified most strongly as UCEC, TGCT and BLCA, respectively, then they would express proteins that are indicative of these cancer types.

First, we measured the expression of the uterine-associated transcription factor HOXB643,44, and the UCEC serous ovarian tumor biomarker WT145 in SK-OV-3, in the OV cell line Caov-4, and in the UCEC cell line HEC-59. We chose Caov-4 as our positive control for OV biomarker expression because it was determined by our analysis and others11,15 to be a good model of OV. Likewise, we chose HEC-59 to be a positive control for UCEC.
We found that SK-OV-3 has a small percentage (5%) of cells that expressed the uterine marker HOXB6 and a large proportion (73%) of cells that expressed WT1 (Fig 3A). In contrast, no Caov-4 cells expressed HOXB6, whereas 85% of cells expressed WT1. This suggests that SK-OV-3 exhibits biomarkers of both ovarian tumor and uterine tissue. From our computational analysis and experimental validation, SK-OV-3 is most likely an endometrioid subtype of ovarian cancer. This result is also consistent with prior classification of SK-OV-346, and with the facts that SK-OV-3 lacks p53 mutations, which are prevalent in high-grade serous ovarian cancer47, and that it harbors an endometrioid-associated mutation in ARID1A11,46,48. Next, we measured the expression of markers of OV and germ cell cancers (LIN28A49) in the OV-annotated cell line A2780, which received a high TGCT CCN score. We found that 54% of A2780 cells expressed LIN28A whereas it was not detected in Caov-4 (Fig 3B). The OV marker WT1 was also expressed in fewer A2780 cells as compared to Caov-4 (48% vs 85%), which suggests that A2780 could be a germ cell derived ovarian tumor. Taken together, our results suggest that SK-OV-3 and A2780 could represent OV subtypes that are not well represented in TCGA training data, which resulted in a low OV score and a higher CCN score in other categories.

Lastly, we examined PC-3, annotated as a PRAD cell line but classified to be most similar to BLCA. We found that 30% of the PC-3 cells expressed PPARG, a contributor to urothelial differentiation50 that is not detected in the PRAD VCaP cell line but is highly expressed
in the BLCA RT4 cell line (Fig 3C). PC-3 cells also expressed the PRAD biomarker FOLH151, suggesting that PC-3 has a PRAD origin and gained urothelial or luminal characteristics through the derivation process. In short, our limited experimental data support the CCN classification results.

Subtype classification of cancer cell lines

Next, we explored the subtype classification of CCLs from three general tumor types in more depth. We focused our subtype visualization (Fig 4A-C) on CCL models with a general CCN score above 0.1 in their nominal cancer type as this allowed us to analyze those models that fell below the general threshold but were classified as a specific subtype (Supp Tab 1-2). Focusing first on UCEC, the histologically defined subtypes of UCEC, endometrioid and serous, differ in prevalence, molecular properties, prognosis, and treatment. For instance, the endometrioid subtype, which accounts for approximately 80% of uterine cancers, retains estrogen receptor and progesterone receptor status and is responsive towards progestin therapy52,53. Serous, a more aggressive subtype, is characterized by the loss of estrogen and progesterone receptor and is not responsive to progestin therapy52,53. CCN classified the majority of the UCEC cell lines as serous except for JHUEM-1, which is classified as mixed, with similarities to both endometrioid and serous (Fig 4A). The preponderance of CCLE lines of serous versus endometrioid character may be due to properties of serous cancer cells that promote their in vitro propagation, such as upregulation of cell adhesion transcriptional programs54. Some of our subtype classification results are consistent with prior observations.
For example, HEC-1A, HEC-1B, and KLE were previously characterized as type II endometrial cancer, which includes a serous histological subtype55. On the other hand, our subtype classification results contradict prior observations in at least one case. For instance, the Ishikawa cell line was derived from type I endometrial cancer (endometrioid histological subtype)55,56; however, CCN classified a derivative of this line, Ishikawa 02 ER-, as serous. The high serous CCN score could result from a shift in phenotype of the line concomitant with its loss of estrogen receptor (ER) as this is a distinguishing feature of type II endometrial cancer (serous histological subtype)52. Taken together, these results indicate a need for more endometrioid-like CCLs.

Next, we examined the subtype classification of Lung Squamous Cell Carcinoma (LUSC) and Lung adenocarcinoma (LUAD) cell lines (Fig 4B-C). All the LUSC lines with at least one subtype classification had an underlying primitive subtype classification. This is consistent either with the ease of deriving lines from tumors with a primitive character, or with a process by which cell line derivation promotes similarity to a more primitive subtype, which is marked by increased cellular proliferation28. Some of our results are consistent with prior reports that have investigated the resemblance of some lines to LUSC subtypes.
For example, HCC-95, previously characterized as classical28,57, had a maximum CCN score in the classical subtype (0.429). Similarly, LUDLU-1 and EPLC-272H, previously reported as classical57 and basal57 respectively, had maximal tumor subtype CCN scores for these subtypes (0.323 and 0.256) (Fig 4B, Supp Tab 2) despite being classified as Unknown. Lastly, the LUAD cell lines that were classified as a subtype were either classified as proximal inflammation or proximal proliferation (Fig 4C). RERF-LC-Ad1 had the highest general classification score and the highest proximal inflammation subtype classification score. Taken together, these subtype classification results have revealed an absence of cell line models for basal and secretory LUSC, and for the Terminal respiratory unit (TRU) LUAD subtype.

Cancer cell lines’ popularity and transcriptional fidelity

Finally, we sought to measure the extent to which cell line transcriptional fidelity related to model prevalence. We used the number of papers in which a model was mentioned, normalized by the number of years since the cell line was documented, as a rough approximation of model prevalence. To explore this relationship, we plotted the normalized citation count versus general classification score, labeling the highest cited and highest classified cell lines from each general tumor type (Fig 4D).
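
The comparison described above can be outlined with a short Python sketch. The record fields, the reference year, and the example values are hypothetical placeholders, not data from this study; the sketch only shows how a citation count might be normalized by the years since a line was documented, and how the most cited and highest scoring lines could then be picked for each tumor type.

```python
def citations_per_year(n_papers, year_documented, reference_year):
    """Normalize a citation count by the years since the line was documented."""
    years = max(reference_year - year_documented, 1)  # avoid division by zero
    return n_papers / years


def top_lines_by_type(records, reference_year):
    """For each tumor type, return (most cited line, highest scoring line)."""
    by_type = {}
    for rec in records:
        by_type.setdefault(rec["tumor_type"], []).append(rec)

    result = {}
    for tumor_type, recs in by_type.items():
        most_cited = max(
            recs,
            key=lambda r: citations_per_year(
                r["n_papers"], r["year_documented"], reference_year
            ),
        )
        best_score = max(recs, key=lambda r: r["ccn_score"])
        result[tumor_type] = (most_cited["name"], best_score["name"])
    return result


# Made-up records for illustration only (not real cell lines or scores).
records = [
    {"name": "line_1", "tumor_type": "TYPE_X", "n_papers": 2000,
     "year_documented": 1980, "ccn_score": 0.20},
    {"name": "line_2", "tumor_type": "TYPE_X", "n_papers": 300,
     "year_documented": 2000, "ccn_score": 0.80},
]

summary = top_lines_by_type(records, reference_year=2020)
```

In this toy case the most cited line and the highest scoring line differ, which is the kind of mismatch the plot is meant to expose.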
For most of the general tumor types, the highest cited cell line is not the highest classified cell line, except for Hep G2, AGS and ML-1, representing liver hepatocellular carcinoma (LIHC), stomach adenocarcinoma (STAD), and thyroid carcinoma (THCA), respectively. On the other hand, the general scores of the highest cited cell lines representing BLCA (T24), BRCA (MDA-MB-231), and PRAD (PC-3) fall below the classification threshold of 0.25. Notably, each of these tumor types has other lines with scores exceeding 0.5, which should be considered as more faithful transcriptional models when selecting lines for a study (Supp Tab 1 and http://www.cahanlab.org/resources/cancerCellNet_results/).

Evaluation of patient derived xenografts

Next, we sought to evaluate a more recent class of cancer models: PDX. To do so, we subjected the RNA-seq expression profiles of 415 PDX models from 13 different cancer types generated previously20 to CCN. Similar to the results of CCLs, the PDXs exhibited a wide range of classification scores (Fig 5A, Supp Tab 3). By categorizing the CCN scores of PDX based on the proportion of samples associated with each tumor type that were correctly classified, we found that SARC, SKCM, COAD_READ and BRCA have a higher proportion of correctly classified PDX than other cancer categories (Fig 5B). In contrast to CCLs, we found a higher proportion of correctly classified PDX in STAD, PAAD and KIRC (Fig 5B). However, similar to CCLs, no ESCA PDXs were classified as such. This held true when we performed subtype classification on PDX samples: none of the PDX in ESCA were classified as any of the ESCA subtypes (Supp Tab 4). UCEC PDXs included endometrioid, serous, and mixed subtypes, which provided a broader representation than CCLs (Fig 5C).
Several LUSC PDXs that were classified as a subtype were also classified as Head and Neck squamous cell carcinoma (HNSC) or as a mix of HNSC and LUSC (Fig 5D). This could be due to the similarity in expression profiles of basal and classical subtypes of HNSC and LUSC28,58, which is consistent with the observation that these PDXs were also subtyped as classical. No LUSC PDXs were classified as the secretory subtype. In contrast to LUAD CCLs, four of the five LUAD PDXs with a discernible subtype were classified as proximal inflammatory (Fig 5E). On the other hand, similar to the CCLs, there were no TRU subtypes in the LUAD PDX cohort. In summary, we found that while individual PDXs can reach extremely high transcriptional fidelity to both general tumor types and subtypes, many PDXs were not classified as the general tumor type from which they originated.

Evaluation of GEMMs

Next, we used CCN to evaluate GEMMs of six general tumor types from nine studies for which expression data was publicly available59–67. As was true for CCLs and PDXs, GEMMs also had a wide range of CCN scores (Fig 6A, Supp Tab 5). We next categorized the CCN scores based on the proportion of samples associated with each tumor type that were correctly classified (Fig 6B). In contrast to LGG CCLs, LGG GEMMs, generated by Nf1 mutations expressed in different neural progenitors in combination with Pten deletion66, were consistently classified as LGG (Fig 6A-B).
The GEMM dataset included multiple replicates per model, which allowed us to examine intra-GEMM variability. Both at the level of CCN score and at the level of categorization, GEMMs were invariant. For example, replicates of UCEC GEMMs driven by Prg(cre/+)Pten(lox/lox) received almost identical general CCN scores (Fig 6C, Supp Tab 6). GEMMs sharing genotypes across studies, such as LUAD GEMMs driven by Kras mutation and loss of p53 (refs. 59,65,67), also received similar general and subtype classification scores (Fig 6A,B,E).

Next, we explored the extent to which genotype impacted subtype classification in UCEC, LUSC, and LUAD. Prg(cre/+)Pten(lox/lox) GEMMs had a mixed subtype classification of both serous and endometrioid, consistent with the fact that Pten loss occurs in both subtypes (albeit more frequently in endometrioid). We also analyzed Prg(cre/+)Pten(lox/lox)Csf3r-/- GEMMs. Polymorphonuclear neutrophils (PMNs), which play anti-tumor roles in endometrioid cancer progression, are depleted in these animals. Interestingly, Prg(cre/+)Pten(lox/lox)Csf3r-/- GEMMs had a serous subtype classification, which could be explained by differences in PMN involvement in endometrioid versus serous uterine tumor development that are reflected in the respective transcriptomes of the TCGA UCEC training data. We note that the tumor cells were sorted prior to RNA-seq and thus the shift in subtype classification is not due to contamination of GEMMs with non-tumor components.
In short, this analysis supports the argument that tumor-cell extrinsic factors, in this case a reduction in anti-tumor PMNs, can shift the transcriptome of a GEMM so that it more closely resembles a serous rather than endometrioid subtype.

The LUSC GEMMs that we analyzed were Lkb1fl/fl and they either overexpressed Sox2 (via two distinct mechanisms) or were also Ptenfl/fl 65. We note that the eight lenti-Sox2-Cre-infected;Lkb1fl/fl and Rosa26LSL-Sox2-IRES-GFP;Lkb1fl/fl samples that classified as 'Unknown' had LUSC CCN scores only modestly lower than the decision threshold (Fig 6D) (mean CCN score = 0.217). Thirteen of the 17 Sox2 GEMMs classified as the secretory subtype of LUSC. The consistency is not surprising given both models overexpress Sox2 and lose Lkb1. On the other hand, the Lkb1fl/fl;Ptenfl/fl GEMMs had substantially lower general LUSC CCN scores and our subtype classification indicated that this GEMM was mostly classified as 'Unknown', in contrast to prior reports suggesting that it is most similar to a basal subtype68. None of the three LUSC GEMMs have strong classical CCN scores. Most of the LUAD GEMMs, which were generated using various combinations of activating Kras mutation, loss of Trp53, and loss of Smarca4L59,65,67, were correctly classified (Fig 6E). Those that were not classified have CCN scores modestly lower than the decision threshold (mean CCN score = 0.214). There were no substantial differences in general or subtype classification across driver genotypes. Although the subtype of all LUAD GEMMs was 'Unknown', these GEMMs tended to have a mixture of high proximal proliferation, proximal inflammation and TRU CCN scores. Taken together, this analysis suggests that there is a degree of similarity, and perhaps plasticity, between the primitive and secretory (but not basal or classical) subtypes of LUSC.
On the other hand, while the LUAD GEMMs classify strongly as LUAD, they do not have a strong particular subtype classification, a result that does not vary by genotype.

Evaluation of Tumoroids

Lastly, we used CCN to assess a relatively novel cancer model: tumoroids. We downloaded and assessed 131 distinct tumoroid expression profiles spanning 13 cancer categories from The NCI Patient-Derived Models Repository (PDMR)69 and from three individual studies70–72 (Fig 7A, Supp Tab 7). We note that several categories have three or fewer samples (BRCA, CESC, KIRP, OV, LIHC, and BLCA from PDMR). Among the cancer categories represented by more than three samples, only LUSC and PAAD have fewer than 50% classified as their annotated label (Fig 7B). In contrast to GBM CCLs, all three induced pluripotent stem cell-derived GBM tumoroids72 were classified as GBM with high CCN scores (mean = 0.53). To further characterize the tumoroids, we performed subtype classification on them (Supp Tab 8). UCEC tumoroids from PDMR contain a wide range of subtypes, with two endometrioid, two serous and one mixed type (Fig 7C). On the other hand, LUSC tumoroids appear to be predominantly of the classical subtype, with one tumoroid classified as a mix between classical and primitive (Fig 7D). Lastly, similar to their CCL and PDX counterparts, LUAD tumoroids are classified as proximal inflammatory or proximal proliferation, with no tumoroids classified as the TRU subtype (Fig 7E).
Comparison of CCLs, PDXs, GEMMs and tumoroids

Finally, we sought to estimate the comparative transcriptional fidelity of the four cancer model modalities. We compared the general CCN scores of each model on a per tumor type basis (Fig 8). In the case of GEMMs, we used the mean classification score of all samples with shared genotypes. We also used the mean classification score of technical replicates found in LIHC tumoroids70. We evaluated models based on both the maximum CCN score, as this represents the potential for a model class, and the median CCN score, as this indicates the current overall transcriptional fidelity of a model class. PDXs achieved the highest CCN scores in three (UCEC, PAAD, LUAD) out of the five cancer categories in which all four modalities were available (Fig 8), despite having low median CCN scores. Notably, PDXs have a median CCN score above the 0.25 threshold in PAAD while none of the other three modalities have any samples above the threshold. In LIHC, the highest CCN score for PDX (0.9) is only slightly lower than the highest CCN score for tumoroid (0.91). This suggests that certain individual PDXs most closely mimic the transcriptional state of native patient tumors despite a portion of the PDXs having low CCN scores. Similarly, while the majority of the CCLs have low CCN scores, several lines achieve high transcriptional fidelity in LUSC, LUAD and LIHC (Fig 8).
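The two summary statistics used in this comparison (maximum score as the potential of a model class, median score as its current overall fidelity) can be illustrated with a minimal Python sketch. The scores below are hypothetical placeholders rather than values from this study, and the helper name `summarize` is an assumption made for illustration.

```python
from statistics import median

# Hypothetical CCN scores for a single tumor type, grouped by modality.
# The values are illustrative placeholders, not results from this study.
scores_by_modality = {
    "CCL": [0.05, 0.10, 0.62],
    "PDX": [0.02, 0.08, 0.15, 0.95],
    "GEMM": [0.40, 0.45, 0.50],
    "Tumoroid": [0.55, 0.91],
}


def summarize(scores):
    """Map each modality to (maximum score, median score)."""
    return {name: (max(vals), median(vals)) for name, vals in scores.items()}


summary = summarize(scores_by_modality)

# Maximum score: the potential of a model class.
best_potential = max(summary, key=lambda name: summary[name][0])
# Median score: the current overall fidelity of a model class.
best_typical = max(summary, key=lambda name: summary[name][1])
```

In this toy input a single high-scoring PDX gives that modality the best maximum even though its median is low, which mirrors the kind of heterogeneity discussed here.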
Collectively, GEMMs and tumoroids had the highest median CCN scores in four of the five cancer categories (LUSC and LUAD for GEMMs and UCEC and LIHC for tumoroids). Notably, both of the LIHC tumoroids achieved CCN scores on par with patient tumors (Fig 8). In brief, this analysis indicates that PDXs and CCLs are heterogeneous in terms of transcriptional fidelity, with a portion of the models closely mimicking native tumors and the majority of the models having low transcriptional fidelity (with the exception of PAAD for PDXs). On the other hand, GEMMs and tumoroids displayed consistently high fidelity across different models.

Because the CCN score is based on a moderate number of gene features (i.e. 1,979 gene pairs consisting of 1,689 unique genes) relative to the total number of protein-coding genes in the genome, it is possible that a cancer model with a high CCN score might not have a high global similarity to a naturally occurring tumor. Therefore, we also calculated the GRN status, a metric of the extent to which a tumor-type specific gene regulatory network is established21, for all models (Supp Fig 4). We observed a high level of correlation between the two similarity metrics, which suggests that although CCN classifies on a selected set of genes, its scores are highly correlated with a global assessment of transcriptional similarity.

We also sought to compare model modalities in terms of the diversity of subtypes that they represent (Supp Fig 5).
As a reference, we also included in this analysis the overall subtype incidence, as approximated by incidence in TCGA. Replicates in GEMMs and tumoroids were averaged into one classification profile. In models of UCEC, there is a notable difference between endometrioid incidence and the proportion of models classified as endometrioid, with only PDX and tumoroids having any representatives (Supp Fig 5). All of the CCL, GEMM, and tumoroid models of PAAD have an unknown subtype classification and no correct general classification. However, the majority of PDXs are subtyped as either a mixture of basal and classical, or classical alone. LUAD has proximal inflammation and proximal proliferation subtypes modelled by CCLs and PDXs (Supp Fig 5). Likewise, LUSC has basal, classical and primitive subtypes modelled by CCLs and PDXs, and the secretory subtype modelled by GEMMs exclusively (Supp Fig 5). Taken together, these results demonstrate the need to carefully select different model systems to more suitably model certain cancer subtypes.

DISCUSSION

A major goal in the field of cancer biology is to develop models that mimic naturally occurring tumors with enough fidelity to enable therapeutic discoveries. However, methods to measure the extent to which cancer models resemble or diverge from native tumors are lacking. This is especially problematic now because there are many existing models from which to choose, and it has become easier to generate new models. Here, we present CancerCellNet (CCN), a computational tool that measures the similarity of cancer models to 22 naturally occurring tumor types and 36 subtypes. While the similarity of CCLs to patient tumors has already been explored in previous work, our tool introduces the capability to assess the transcriptional fidelity of PDXs, GEMMs, and tumoroids.
Because CCN is platform- and species-agnostic, it provides a consistent framework to compare models across modalities including CCLs, PDXs, GEMMs and tumoroids. Here, we applied CCN to 657 cancer cell lines, 415 patient derived xenografts, 26 distinct genetically engineered mouse models and 131 tumoroids. Several insights emerged from our computational analyses that have implications for the field of cancer biology.

First, PDXs have the greatest potential to achieve transcriptional fidelity in three out of five general tumor types for which data from all modalities was available, as indicated by the high scores of individual PDXs. Notably, PDXs are the only modality with samples classified as PAAD. At the same time, the median CCN scores of PDXs were lower than those of GEMMs and tumoroids in the other four tumor types. It is unclear what causes such a wide range of CCN scores within PDXs. We suspect that some PDXs might have undergone selective pressures in the host that distort the progression of genomic alterations away from what is observed in natural tumors73. Future work to understand this heterogeneity is important so as to yield consistently high fidelity PDXs, and to identify intrinsic and host-specific factors that so powerfully shape the PDX transcriptome.

Second, in general GEMMs and tumoroids have higher median CCN scores than those of PDXs and CCLs.
This is consistent with the fact that GEMMs are typically derived by recapitulating well-defined driver mutations of natural tumors, and thus this observation corroborates the importance of genetics in the etiology of cancer74. Moreover, in contrast to most PDXs, GEMMs are typically generated in immune replete hosts. Therefore, the higher overall fidelity of GEMMs may also be a result of the influence of a native immune system on GEMM tumors75. The high median CCN scores of tumoroids can be attributed to several factors including the increased mechanical stimuli and cell-cell interactions that come from 3D self-organizing cultures76,77.

Third, we have found that none of the samples that we evaluated here are transcriptionally adequate models of ESCA. This may be due to an inherent lability of the ESCA transcriptome that is often preceded by a metaplasia that has obscured determining its cell type(s) of origin78. Therefore, this tumor type requires further attention to derive new models.

Fourth, we found that in several tumor types, GEMMs tend to reflect mixtures of subtypes rather than conforming strongly to single subtypes. The reasons for this are not clear, but it is possible that in the cases that we examined the histologically defined subtypes have a degree of plasticity that is exacerbated in the murine host environment.

Lastly, we recognize that many CCLs are not classified as their annotated labels.
While we have suggested that the lack of immune component is not a major confounder, we suspect that the CCLs could undergo genetic divergence due to high number of passages, chemotherapy before biopsy, culture condition and genetic instability79–82, which could all be factors that drive CCLs away from their labelled tumors.

Currently, there are several limitations to our CCN tool, and caveats to our analyses, which indicate areas for future work and improvement. First, CCN is based on transcriptomic data, but other molecular readouts of tumor state, such as profiles of the proteome83, epigenome84, non-coding RNA-ome84, and genome74, would be equally, if not more, important to mimic in a model system. Therefore, it is possible that some models reflect tumor behavior well, and because this behavior is not well predicted by transcriptome alone, these models have lower CCN scores. To both measure the extent that such situations exist, and to correct for them, we plan in the future to incorporate other omic data into CCN so as to make more accurate and integrated model evaluation possible. As a first step in this direction, we plan to incorporate DNA methylation and genomic sequencing data as additional features for our Random forest classifier as this data is becoming more readily available for both training and cancer models. We expect that this will allow us both to refine our tumor subtype categories and to enable more accurate predictions of how models respond to perturbations such as drug treatment.

A second limitation is that in the cross-species analysis, CCN implicitly assumes that homologs are functionally equivalent. The extent to which they are not functionally equivalent determines how confounded the CCN results will be.
This possibility seems to be of limited consequence based on the high performance of the normal tissue cross-species classifier and based on the fact that GEMMs have the highest median CCN scores (in addition to tumoroids).

A third caveat to our analysis is that there were many fewer distinct GEMMs and tumoroids than CCLs and PDXs. As more transcriptional profiles for GEMMs and tumoroids emerge, this comparative analysis should be revisited to assess the generality of our results.

Finally, the TCGA training data is made up of RNA-Seq from bulk tumor samples, which necessarily includes non-tumor cells, whereas the CCLs are by definition cell lines of tumor origin. Therefore, CCLs theoretically could have artificially low CCN scores due to the presence of non-tumor cells in the training data. This problem appears to be limited as we found no correlation between tumor purity and CCN score in the CCLE samples. However, this problem is related to the question of intra-tumor heterogeneity. We demonstrated the feasibility of using CCN and single cell RNA-seq data to refine the evaluation of cancer cell lines contingent upon availability of scRNA-seq training data. As more training single cell RNA-seq data accrues, CCN would be able to not only evaluate models on a per cell type basis, but also based on cellular composition.
We have made the results of our analyses available online so that researchers can easily explore the performance of selected models or identify the best models for any of the 22 general tumor types and the 36 subtypes presented here. To ensure that CCN is widely available we have developed a free web application, which performs CCN analysis on user-uploaded data and allows for direct comparison of their data to the cancer models evaluated here. We have also made the CCN code freely available under an Open Source license and as an easily installed R package, and we are actively supporting its further development. Included in the web application are instructions for training CCN and reproducing our analysis. The documentation describes how to analyze models and compare the results to the panel of models that we evaluated here, thereby allowing researchers to immediately compare their models to the broader field in a comprehensive and standard fashion.

Online Methods

Training General CancerCellNet Classifier

To generate training data sets, we downloaded RNA-seq expression count matrices and the corresponding sample tables for 8,991 patient tumors across 22 different tumor types from TCGA using the TCGAWorkflowData, TCGAbiolinks85 and SummarizedExperiment86 packages. We used all the patient tumor samples for training the general CCN classifier.
We limited training and analysis of RNA-seq data to the 13,142 genes in common between the TCGA dataset and all the query samples (CCLs, PDXs, GEMMs, and tumoroids). To train the top pair Random forest classifier, we used a method similar to our previous method23. CCN first normalized the training counts matrix by down-sampling the counts to 500,000 counts per sample. To significantly reduce the execution time and memory required to generate gene pairs from all possible genes, CCN then selected n up-regulated genes, n down-regulated genes and n least differentially expressed genes (CCN training parameter nTopGenes = n) for each of the 22 cancer categories using template matching87 as the genes from which to generate top scoring gene pairs. In short, for each tumor type, CCN defined a template vector that labelled the training tumor samples in the cancer type of interest as 1 and all other tumor samples as 0. CCN then calculated the Pearson correlation coefficient between the template vector and gene expression for all genes. The genes with a strong match to the template as either upregulated or downregulated had a large absolute Pearson correlation coefficient. CCN chose the upregulated, downregulated and least differentially expressed genes based on the magnitude of the Pearson correlation coefficient.

After CCN selected the genes for each cancer type, CCN generated gene pairs among those genes. Gene pair transformation was a method inspired by the top-scoring pair classifier88 to allow compatibility of the classifier with query expression profiles that were collected through different platforms (e.g. microarray query data applied to RNA-seq training data).
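As an illustration of the template matching step used for gene selection above, the following Python sketch correlates each gene with a binary template and picks the most positively correlated, most negatively correlated, and least correlated genes. It is a sketch under stated assumptions, not the cancerCellNet implementation: the function names, the toy expression values and the exact ranking rule (sorting by signed and absolute correlation) are chosen here for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def template_match(expression, labels, target, n):
    """Select n up, n down and n least differentially expressed genes (n >= 1).

    expression: {gene: [value per sample]}, labels: tumor type of each sample.
    """
    # Template vector: 1 for samples of the tumor type of interest, 0 otherwise.
    template = [1.0 if label == target else 0.0 for label in labels]
    corr = {gene: pearson(values, template) for gene, values in expression.items()}
    ranked = sorted(corr, key=corr.get)            # most negative first
    down = ranked[:n]
    up = ranked[-n:][::-1]                         # most positive first
    least = sorted(corr, key=lambda g: abs(corr[g]))[:n]
    return up, down, least, corr


# Toy data with hypothetical gene names and expression values.
labels = ["A", "A", "B", "B"]
expression = {
    "g_up": [9, 8, 1, 2],
    "g_down": [1, 2, 9, 8],
    "g_flat": [5, 6, 5, 6],
}
up, down, least, corr = template_match(expression, labels, target="A", n=1)
```

The same correlation-to-template idea is reused when ranking gene pairs, with the binary gene pair values taking the place of expression values.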
In brief, the gene pair transformation compares 2 genes within an expression sample and encodes the “gene1_gene2” gene-pair as 1 if the first gene has higher expression than the second gene. Otherwise, gene pair transformation would encode the gene-pair as 0. Using all the gene pair combinations generated through the gene sets per cancer type, CCN then selected the top m discriminative gene pairs (CCN training parameter nTopGenePairs = m) for each category using the template matching (with large absolute Pearson correlation coefficient) described above. To prevent any single gene from dominating the gene pair list, we allowed each gene to appear a maximum of three times among the gene pairs selected as features per cancer type.

After the top discriminative gene pairs were selected for each cancer category, CCN grouped all the gene pairs together and gene pair transformed the training samples into a binary matrix with all the discriminative gene pairs as row names and all the training samples as column names. Using the binary gene pair matrix, CCN randomly shuffled the binary values across rows then across columns to generate random profiles that should not resemble training data from any of the cancer categories. CCN then sampled 70 random profiles, annotated them as “Unknown” and used them as training data for the “Unknown” category.
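The gene pair transformation and the construction of the “Unknown” category can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and toy inputs are hypothetical, and the shuffling is one possible reading of "across rows then across columns" (values are permuted within each row, then within each column), which may differ from the package's implementation.

```python
import random


def gene_pair_transform(sample, gene_pairs):
    """Encode each (gene1, gene2) pair as 1 if gene1 > gene2 in the sample, else 0."""
    return [1 if sample[g1] > sample[g2] else 0 for g1, g2 in gene_pairs]


def make_unknown_profiles(binary_matrix, n_profiles, seed=0):
    """Build shuffled profiles from a matrix with gene pairs as rows, samples as columns.

    Assumes n_profiles does not exceed the number of columns.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(binary_matrix), len(binary_matrix[0])
    # Step 1: permute the values within every row (across samples).
    rows = [rng.sample(row, n_cols) for row in binary_matrix]
    # Step 2: permute the values within every column (across gene pairs).
    cols = [rng.sample([rows[r][c] for r in range(n_rows)], n_rows)
            for c in range(n_cols)]
    # Step 3: draw n_profiles shuffled columns to serve as "Unknown" samples.
    return rng.sample(cols, n_profiles)


# Toy example with hypothetical genes and expression values.
pairs = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
sample = {"GENE_A": 5.0, "GENE_B": 3.0, "GENE_C": 3.0}
encoded = gene_pair_transform(sample, pairs)  # ties are encoded as 0

toy_matrix = [[1, 0, 1], [0, 0, 1]]
unknown = make_unknown_profiles(toy_matrix, n_profiles=2)
```

Because the encoding depends only on the within-sample ordering of two genes, it is unaffected by monotone rescaling of a sample, which is what makes the features comparable across platforms.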
Using the gene pair binary training matrix, CCN constructed a multi-class Random Forest classifier of 2000 trees and used stratified sampling with a sample size of 60 to ensure balance of training data in constructing the decision trees.

To identify the best set of genes and gene-pair parameters (n and m), we used a grid-search cross-validation89 strategy with 5 cross-validations at each parameter set. The specific parameters for the final CCN classifier using the function “broadClass_train” in the package cancerCellNet are in Supp Tab 9. The gene-pairs are in Supp Tab 10.

Validating General CancerCellNet Classifier

Two thirds of patient tumor data from each cancer type were randomly sampled as training data to construct a CCN classifier. Based on the training data, CCN selected the classification genes and gene-pairs and trained a classifier. After the classifier was built, 35 held-out samples from each cancer category were sampled and 40 “Unknown” profiles were generated for validation. The process of randomly sampling a training set from 2/3 of all patient tumor data, selecting features based on the training set, training the classifier and validating was repeated 50 times to have a more comprehensive assessment of the classifier trained with the optimal parameter set. To test the performance of the final CCN classifier on independent testing data, we applied it to 725 profiles from ICGC spanning 6 projects that do not overlap with TCGA (BRCA-KR, LIRI-JP, OV-AU, PACA-AU, PACA-CA, PRAD-FR).
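The repeated hold-out sampling described in this section can be sketched in Python as below. Only the sampling logic is shown; feature selection, classifier training and the generation of “Unknown” profiles are omitted. The function name `split_once`, the category names and the cohort sizes are hypothetical, and taking fewer than 35 held-out samples when a category is small is an assumption of this sketch.

```python
import random


def split_once(samples_by_category, n_holdout=35, seed=0):
    """Per category: two thirds for training, up to n_holdout of the rest for validation."""
    rng = random.Random(seed)
    train, validation = {}, {}
    for category, samples in samples_by_category.items():
        shuffled = rng.sample(samples, len(samples))   # shuffled copy
        n_train = (2 * len(shuffled)) // 3             # two thirds, rounded down
        train[category] = shuffled[:n_train]
        held_out = shuffled[n_train:]
        # The held-out samples are already in random order, so a prefix is a random draw.
        validation[category] = held_out[:n_holdout]
    return train, validation


# Hypothetical cohort: sample identifiers grouped by tumor type.
cohort = {
    "TYPE_1": [f"TYPE_1_{i}" for i in range(120)],
    "TYPE_2": [f"TYPE_2_{i}" for i in range(60)],
}

# Repeat the random split 50 times, once per cross-validation round.
splits = [split_once(cohort, seed=i) for i in range(50)]
train, validation = splits[0]
```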
Selecting Decision Thresholds

Our strategy for selecting a decision threshold was to find the value that maximizes the average Macro F1 measure90 for each of the 50 cross-validations that were performed with the optimal parameter set, testing thresholds between 0 and 1 with a 0.01 increment. The F1 measure is defined as:

\[ \text{Macro } F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \]

We selected the most commonly occurring threshold above 0.2 that maximized the average Macro F1 measure across the 50 cross-validations as the decision threshold for the final classifier (threshold = 0.25). The same approach was applied for the subtype classifiers. The thresholds and the corresponding average precision, recall and F1 measures are recorded in Supp Tab 11.

Classifying Query Data into General Cancer Categories

We downloaded the RNA-seq cancer cell line expression profiles and sample table from https://portals.broadinstitute.org/ccle/data, and microarray cancer cell line expression profiles and sample table from Barretina et al 37. We extracted two WT control NCCIT RNA-seq expression profiles from Grow et al91. We received PDX expression estimates and sample annotations from the authors of Gao et al 20. We gathered GEMM expression profiles from nine different studies59–67. We downloaded tumoroid expression profiles from The NCI Patient-Derived Models Repository (PDMR)69 and from three individual studies70–72.
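The threshold search described under Selecting Decision Thresholds can be illustrated with a minimal Python sketch. Several details are assumptions made for illustration: precision and recall are averaged over classes before being combined into F1, a class is called when its score is strictly higher than the threshold, and ties between equally frequent thresholds are broken in favor of the lowest value. The function names and the toy validation data are hypothetical.

```python
from collections import Counter


def macro_f1(samples, classes, threshold):
    """samples: list of (true_label, {class: score}); returns F1 of macro precision/recall."""
    precisions, recalls = [], []
    for cls in classes:
        tp = sum(1 for truth, s in samples if truth == cls and s[cls] > threshold)
        fp = sum(1 for truth, s in samples if truth != cls and s[cls] > threshold)
        fn = sum(1 for truth, s in samples if truth == cls and s[cls] <= threshold)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(classes)
    recall = sum(recalls) / len(classes)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_thresholds(samples, classes):
    """All thresholds on the 0.00..1.00 grid (step 0.01) that maximize Macro F1."""
    scored = [(macro_f1(samples, classes, i / 100), i / 100) for i in range(101)]
    top = max(f1 for f1, _ in scored)
    return [t for f1, t in scored if f1 == top]


def select_threshold(runs, classes, floor=0.2):
    """Most frequently optimal threshold above `floor` across validation runs."""
    counts = Counter()
    for samples in runs:
        counts.update(t for t in best_thresholds(samples, classes) if t > floor)
    # Tie-break (an assumption of this sketch): prefer the lowest threshold.
    return max(counts, key=lambda t: (counts[t], -t))


# Hypothetical validation run with two classes.
classes = ["A", "B"]
validation_run = [
    ("A", {"A": 0.8, "B": 0.1}),
    ("A", {"A": 0.6, "B": 0.3}),
    ("B", {"A": 0.3, "B": 0.7}),
    ("B", {"A": 0.1, "B": 0.5}),
]
optimal = best_thresholds(validation_run, classes)
chosen = select_threshold([validation_run], classes)
```

In practice `runs` would hold the 50 cross-validation rounds, so the counter records how often each grid value was optimal.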
To use the CCN classifier on GEMM data, the mouse genes from GEMM expression profiles were converted into their human homologs. The query samples were classified using the final CCN classifier. Each query classification profile was labelled as one of four classification categories: "correct", "mixed", "none", and "other". If a sample had a CCN score higher than the decision threshold in the labelled cancer category and in no other cancer category, we assigned it as "correct". If a sample had CCN scores higher than the decision threshold in the labelled cancer category and in other cancer categories, we assigned it as "mixed". If a sample had no CCN score higher than the decision threshold in any cancer category, or had its highest CCN score in the "Unknown" category, we assigned it as "none". If a sample had a CCN score higher than the decision threshold in a cancer category or categories not including the labelled cancer category, we assigned it as "other". We analyzed and visualized the results using R and the R packages pheatmap [92] and ggplot2 [93].

Cross-Species Assessment
To assess the performance of cross-species classification, we downloaded 1003 labelled human tissue/cell type and 1993 labelled mouse tissue/cell type RNA-seq expression profiles from GitHub (https://github.com/pcahan1/CellNet). We first converted the mouse genes into human homologous genes. Then we found the intersecting genes between the mouse tissue/cell expression profiles and the human tissue/cell expression profiles. Limiting the input of human tissue RNA-seq profiles to the intersecting genes, we trained a CCN classifier with all the human tissue/cell expression profiles. The parameters used for the function "broadClass_train" in the package cancerCellNet are in Supp Tab 9.
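The four-way labelling of query classification profiles described under Classifying Query Data into General Cancer Categories can be sketched as a small decision function. This is an illustrative Python sketch, not the package's code: the function name `label_call` and the dictionary input are hypothetical, and checking the "Unknown" condition before the others is an assumption about precedence that the text does not state.

```python
def label_call(scores, annotated, threshold=0.25, unknown="Unknown"):
    """Assign 'correct', 'mixed', 'none' or 'other' to one query sample.

    scores: dict mapping category name -> CCN score, including the 'Unknown' category.
    annotated: the cancer category the sample is labelled with.
    """
    top = max(scores, key=scores.get)
    # Cancer categories (excluding 'Unknown') whose score exceeds the decision threshold
    above = {c for c, s in scores.items() if c != unknown and s > threshold}

    if top == unknown or not above:
        return "none"
    if annotated in above:
        return "correct" if above == {annotated} else "mixed"
    return "other"
```

For example, a sample annotated as OV with scores above 0.25 in both OV and UCEC would be labelled "mixed" under this sketch.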
We randomly sampled 75 samples from each tissue category in the mouse tissue/cell data and applied the classifier to those samples to assess performance.

Cross-Technology Assessment
To assess the performance of CCN when applied to microarray data, we gathered 6,219 patient tumor microarray profiles across 12 different cancer types from more than 100 different projects (Supp Tab 12). We found the intersecting genes between the microarray profiles and the TCGA patient RNA-seq profiles. Limiting the input of RNA-seq profiles to the intersecting genes, we created a CCN classifier with all the TCGA patient profiles using the parameters for the function "broadClass_train" listed in Supp Tab 9. After the microarray-specific classifier was trained, we randomly sampled 60 microarray patient samples from each cancer category and applied the CCN classifier to them to assess cross-technology performance (Supp Fig 2A). The same CCN classifier was used to assess microarray CCL samples (Supp Fig 2B).

Training and validating scRNA-seq Classifier
We extracted labelled human melanoma and glioblastoma scRNA-seq expression profiles [40,41] and compiled the two datasets, excluding 3 cell types (T.CD4, T.CD8, and Myeloid) because they had too few cells for training. 60 cells from each of the 11 cell types were sampled for training a scRNA-seq classifier. The parameters for training a general scRNA-seq classifier using the function "broadClass_train" are in Supp Tab 9.
25 cells from each of the 11 cell types in the held-out data were selected to assess the single-cell classifier. Using maximization of the average Macro F1 measure, we selected a decision threshold of 0.255. The gene-pairs that were selected to construct the classifier are in Supp Tab 10. To assess the cross-technology capability of applying the scRNA-seq classifier to bulk RNA-seq, we downloaded 305 expression profiles spanning 4 purified cell types (B cells, endothelial cells, monocyte/macrophage, fibroblast) from https://github.com/pcahan1/CellNet.

Training Subtype CancerCellNet
We found 11 cancer types (BRCA, COAD, ESCA, HNSC, KIRC, LGG, PAAD, UCEC, STAD, LUAD, LUSC) that have meaningful subtypes based on either histology or molecular profile and have sufficient samples to train a subtype classifier with high AUPR. We also included normal tissue samples from BRCA, COAD, HNSC, KIRC, and UCEC to create a normal tissue category in the construction of their subtype classifiers. Training samples were labelled either as a cancer subtype of the cancer of interest or as "Unknown" if they belonged to other cancer types. Similar to general classifier training, CCN performed gene-pair transformation and selected the most discriminative gene pairs for each cancer subtype.
In addition to the gene pairs selected to discriminate cancer subtypes, CCN also performed general classification of all training data and appended the classification profiles of the training data to the gene-pair binary matrix as additional features. The reason for using the general classification profile as additional features is that many general cancer types may share similar subtypes, so the general classification profile could be an important feature for discriminating the general cancer type of interest from other cancer types before performing finer subtype classification. The specific parameters used to train individual subtype classifiers using the "subClass_train" function of the CancerCellNet package can be found in Supp Tab 9, and the gene pairs are in Supp Tab 10.

Validating Subtype CancerCellNet
Similar to validating the general classifier, we randomly sampled 2/3 of all samples in each cancer subtype as training data and sampled an equal number across subtypes from the 1/3 held-out data for assessing subtype classifiers. We repeated the process 20 times for a more comprehensive assessment of the subtype classifiers.

Classifying Query Data into Subtypes
We assigned a subtype to a query sample if the query sample had a CCN score higher than the decision threshold. The decision thresholds for the subtype classifiers are in Supp Tab 11.
If no CCN score exceeded the decision threshold in any subtype, or if the highest CCN score was in the "Unknown" category, we assigned that sample as "Unknown". Analysis was performed in R and visualizations were generated with the ComplexHeatmap package [94].

Cell culture, Immunohistochemistry and histomorphometry
Caov-4 (ATCC® HTB-76™), SK-OV-3 (ATCC® HTB-77™), RT4 (ATCC® HTB-2™), and NCCIT (ATCC® CRL-2073™) cell lines were purchased from ATCC. HEC-59 (C0026001) and A2780 (93112519-1VL) were obtained from Addexbio Technologies and Sigma-Aldrich, respectively. Vcap and PC-3. SK-OV-3, Vcap, and RT4 were cultured in Dulbecco's Modified Eagle Medium (DMEM, high glucose, 11960069, Gibco) with 1% Penicillin-Streptomycin-Glutamine (10378016, Life Technologies); Caov-4, PC-3, NCCIT, and A2780 were cultured in RPMI-1640 medium (11875093, Gibco), while HEC-59 was cultured in Iscove's Modified Dulbecco's Medium (IMDM, 12440053, Gibco). Both media were supplemented with 1% Penicillin-Streptomycin (15140122, Gibco). All media included 10% Fetal Bovine Serum (FBS).
Cells cultured in 48-well plates were washed twice with PBS and fixed in 10% buffered formalin for 24 hrs at 4 °C. Immunostaining was performed using a standard protocol. Cells were incubated with the primary antibodies goat anti-HOXB6 (10 µg/mL, PA5-37867, Invitrogen), mouse anti-WT1 (10 µg/mL, MA1-46028, Invitrogen), rabbit anti-PPARG (1:50, ABN1445, Millipore), mouse anti-FOLH1 (10 µg/mL, UM570025, Origene), and rabbit anti-LIN28A (1:50, #3978, Cell Signaling) in Antibody Diluent (S080981-2, DAKO) at 4 °C overnight, followed by three 5 min washes in TBST.
The slides were then incubated with fluorescently conjugated secondary antibodies at room temperature for 1 h, protected from light, followed by three 5 min washes in TBST, and nuclei were stained with mounting medium containing DAPI. Images were captured with a Nikon Eclipse Ti-S, DS-U3, and DS-Qi2.
Histomorphometry was performed using ImageJ (Version 2.0.0-rc-69/1.52i). The percentage of positive cells was calculated as the number of positively stained cells divided by the number of DAPI-positive nuclei within three randomly chosen areas. The data were expressed as means ± SD.

Tumor Purity Analysis
We used the R package ESTIMATE [95] to calculate the ESTIMATE scores from the TCGA tumor expression profiles that we used as training data for the CCN classifier. To calculate tumor purity we used the equation described in Yoshihara et al., 2013 [95]:
$$ \text{Tumour purity} = \cos(0.6049872018 + 0.0001467884 \times \text{ESTIMATE score}) $$

Extracting Citation Counts
We used the R package RISmed [96] to extract the number of citations for each cell line through a query search of "cell line name[Text Word] AND cancer[Text Word]" on PubMed. The citation counts were normalized by dividing them by the number of years since the cell line was first documented.
$$ \text{Normalized citation counts} = \frac{\text{citation counts}}{\text{\# years since first documented}} $$

GRN construction and GRN Status
GRN construction was extended from our previous method [21].
80 samples per cancer type were randomly sampled and normalized through down-sampling as training data for the CLR GRN construction algorithm. Cancer type specific GRNs were identified by determining the differentially expressed genes of each cancer type and extracting the subnetwork using those genes.
To extend the original GRN status algorithm [21] across different platforms and species, we devised a rank-based GRN status algorithm. Like the original GRN status, rank-based GRN status is a metric for assessing the similarity of the cancer type specific GRN between the training data in the cancer type of interest and the query samples. Hence, a high GRN status represents a high level of establishment or similarity of the cancer specific GRN in the query sample compared to that of the training data. The expression profiles of the training data and query data were transformed into rank expression profiles by replacing the expression values with the rank of the expression values within a sample (the highest expressed gene has the highest rank and the lowest expressed gene has a rank of 1). The cancer type specific mean and standard deviation of every gene's rank expression were learned from the training data.
The modified Z-score values for genes within the cancer type specific GRN were calculated from the query sample's rank expression profile to quantify how dissimilar the expression values of genes in the query sample's cancer type specific GRN are compared to those of the reference training data:
$$
Z\text{score}(\text{gene } i)_{\text{mod}} =
\begin{cases}
0, & \text{if Zscore is positive and the gene is found to be upregulated} \\
0, & \text{if Zscore is negative and the gene is found to be downregulated} \\
\operatorname{abs}(Z\text{score}), & \text{otherwise}
\end{cases}
$$
If a gene in the cancer type specific GRN is found to be upregulated in the specific cancer type relative to other cancer types, then we consider the query sample's gene to be similar if the ranking of the query sample's gene is equal to or greater than the mean ranking of the gene in the training samples. Because the gene is considered similar, we assign it a Z-score of 0. The same principle applies to cases where the gene is downregulated in the cancer specific subnetwork.
GRN status for a query sample is calculated as the weighted mean of (1000 − Zscore(gene i)_mod) across the genes in the cancer type specific GRN. 1000 is an arbitrary large number; a larger dissimilarity in the query's cancer type specific GRN yields higher Z-scores for the GRN genes and a lower GRN status.
$$ RGS = \sum_{i=1}^{n} \left(1000 - Z\text{score}(\text{gene } i)_{\text{mod}}\right) \times \text{weight}_{\text{gene } i} $$
$$ \text{GRN Status} = \frac{RGS}{\sum_{i=1}^{n} \text{weight}_{\text{gene } i}} $$
The weight of each gene in the cancer specific network is determined by the importance of the gene in the Random Forest classifier.
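The rank-based GRN status described above can be sketched end to end as follows. This is a minimal Python illustration under stated assumptions, not the cancerCellNet implementation: all function names and the dictionary-based inputs are hypothetical, ties in expression are broken by input order, the sample standard deviation is used, and a gene with zero rank variance in the training data is given a Z-score of 0. The text specifies none of these details.

```python
from statistics import mean, stdev


def rank_transform(expr):
    """expr: dict gene -> expression value. The lowest expressed gene gets rank 1."""
    ordered = sorted(expr, key=expr.get)         # ties keep input order (assumption)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def learn_rank_stats(training_samples, grn_genes):
    """Mean and standard deviation of each GRN gene's rank across training samples."""
    ranked = [rank_transform(sample) for sample in training_samples]
    stats = {}
    for gene in grn_genes:
        values = [r[gene] for r in ranked]
        stats[gene] = (mean(values), stdev(values))   # sample SD (assumption)
    return stats


def modified_z(rank, mu, sd, direction):
    """Zero when the query deviates in the gene's expected direction, abs(Z) otherwise."""
    z = (rank - mu) / sd if sd > 0 else 0.0      # zero-variance guard (assumption)
    if direction == "up" and z > 0:
        return 0.0
    if direction == "down" and z < 0:
        return 0.0
    return abs(z)


def grn_status(query, stats, directions, weights, offset=1000.0):
    """Weighted mean of (offset - modified Z) over the genes of one cancer type specific GRN."""
    ranks = rank_transform(query)
    rgs = 0.0
    for gene, (mu, sd) in stats.items():
        rgs += (offset - modified_z(ranks[gene], mu, sd, directions[gene])) * weights[gene]
    return rgs / sum(weights[gene] for gene in stats)
```

Because ranks are computed within each sample, the score depends only on the ordering of genes, which is what lets the same reference statistics be reused across platforms and species.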
Finally, the GRN status is normalized with respect to the GRN status of the cancer type of interest and of the cancer type with the lowest mean GRN status.
$$ \text{Normalized GRN status} = \frac{\text{GRN status}_{\text{query}} - \operatorname{avg}(\text{GRN status}_{\text{min cancer}})}{\operatorname{avg}(\text{GRN status}_{\text{cancer type interest}})} $$
Here "min cancer" denotes the cancer type whose training data have the lowest mean GRN status in the cancer type of interest, so avg(GRN status_min cancer) is the lowest average GRN status in the cancer type of interest, and avg(GRN status_cancer type interest) is the average GRN status of the cancer type of interest in the training data.

Code availability
CancerCellNet code and documentation are available on GitHub: https://github.com/pcahan1/cancerCellNet

Acknowledgements
This work was supported by the National Institutes of Health NCI Ovarian Cancer SPORE P50CA228991 via a Development Research Program award to PC. FWH was supported by a Prostate Cancer Foundation Young Investigator Award, Department of Defense W81XWH-17-PCRP-HD (F.W.H.), and the National Institutes of Health/National Cancer Institute P20 CA233255-01 (F.W.H.) and U19 CA214253 (F.W.H.). We would like to thank John Powers, Hao Zhu, Tian-Li Wang, Charles Eberhart, and Kaloyan Tsanov for comments on the manuscript and helpful discussions. Some figures were created in part with Biorender.com.

FIGURE LEGENDS
Fig. 1 CancerCellNet (CCN) workflow, training, and performance. (A) Schematic of CCN usage.
CCN was designed to assess and compare the expression profiles of cancer models such as CCLs, PDXs, GEMMs, and tumoroids with native patient tumors. To use a trained classifier, CCN takes the query samples (e.g. expression profiles from CCLs, PDXs, GEMMs, tumoroids) as input and generates a classification profile for each query sample. The column names of the classification heatmap represent sample annotations and the row names represent different cancer types. Each grid cell is colored from black to yellow, representing the lowest classification score (e.g. 0) to the highest classification score (e.g. 1). (B) Schematic of CCN training process. CCN uses patient tumor expression profiles of 22 different cancer types from TCGA as training data. First, CCN identifies n genes that are upregulated, n that are downregulated, and n that are relatively invariant in each tumor type versus all of the others. Then, CCN performs a pair transform on these genes and subsequently selects the most discriminative set of m gene pairs for each cancer type as features (or predictors) for the Random Forest classifier. Lastly, CCN trains a multi-class Random Forest classifier using the gene-pair transformed training data. (C) Parameter optimization strategy. 5 cross-validations of each parameter set, in which 2/3 of the TCGA data was used to train and 1/3 to validate, were used to search for the values of n and m that maximized the performance of the classifier as measured by the area under the precision recall curve (AUPRC). (D) Mean and standard deviation of classifiers based on 50 cross-validations with the optimal parameter set. (E) AUPRC of the final CCN classifier when applied to independent patient tumor data from ICGC.

Fig. 2 Evaluation of cancer cell lines. (A) General classification heatmap of CCLs extracted from CCLE. Column annotations of the heatmap represent the labelled cancer category of the CCLs given by CCLE and the row names of the heatmap represent different cancer categories. CCLs' general classification profiles are categorized into 4 categories: correct (red), correct mixed (pink), no classification (light green), and other classification (dark green), based on the decision threshold of 0.25. (B) Bar plot representing the proportion of each classification category in CCLs across cancer types, ordered from the cancer type with the highest proportion of correct and correct mixed CCLs to the lowest. (C) Comparison between SKCM general CCN scores from the bulk RNA-seq classifier and SKCM malignant CCN scores from the scRNA-seq classifier for SKCM CCLs. (D) Comparison between SARC general CCN scores from the bulk RNA-seq classifier and CAF CCN scores from the scRNA-seq classifier for SKCM CCLs. (E) Comparison between GBM general CCN scores from the bulk RNA-seq classifier and GBM neoplastic CCN scores from the scRNA-seq classifier for GBM CCLs. (F) Comparison between SARC general CCN scores and CAF CCN scores from the scRNA-seq classifier for GBM CCLs. The green lines indicate the decision thresholds for the scRNA-seq classifier and the general classifier.

Fig. 3 Immunofluorescence of selected cell lines. (A) Classification profiles (left) and IF expression (middle) of Caov-4 (OV positive control), HEC-59 (UCEC positive control), and SK-OV-3 for WT1 (OV biomarker) and HOXB6 (uterine biomarker).
The bar plots quantify the average percentage of positive cells for WT1 (top-right) and HOXB6 (bottom-right). (B) Classification profiles (left) and IF expression (middle) of Caov-4, NCCIT (germ cell tumor positive control), and A2780 for WT1 and LIN28A (germ cell tumor biomarker). Classification of NCCIT was performed using RNA-seq profiles of the WT control NCCIT duplicates from Grow et al. [91]. The bar plots quantify the average percentage of positive cells for WT1 (top-right) and LIN28A (bottom-right). (C) Classification profiles (left) and IF expression (middle) of Vcap (PRAD positive control), RT4 (BLCA positive control), and PC-3 for FOLH1 (prostate biomarker) and PPARG (urothelial biomarker). The bar plots quantify the average percentage of positive cells for FOLH1 (top-right) and PPARG (bottom-right).

Fig. 4 Subtype classification of CCLs and CCL prevalence. The heatmap visualizations represent subtype classification of (A) UCEC CCLs, (B) LUSC CCLs, and (C) LUAD CCLs. Only samples with CCN scores > 0.1 in their nominal tumor type are displayed. (D) Comparison of normalized citation counts and general CCN classification scores of CCLs. Labelled cell lines have either the highest CCN classification score in their labelled cancer category or the highest normalized citation count. Each citation count was normalized by the number of years since the cell line was first documented on PubMed.

Fig. 5 Evaluation of patient derived xenografts. (A) General classification heatmap of PDXs.
Column annotations represent the annotated cancer type of the PDXs, and row names represent cancer categories. (B) Proportion of classification categories in PDXs across cancer types, visualized as a bar plot and ordered from the cancer type with the highest proportion of correct and mixed correct classified PDXs to the lowest. Subtype classification heatmaps of (C) UCEC PDXs, (D) LUSC PDXs, and (E) LUAD PDXs. Only samples with CCN scores > 0.1 in their nominal tumor type are displayed.

Fig. 6 Evaluation of genetically engineered mouse models. (A) General classification heatmap of GEMMs. Column annotations represent the annotated cancer type of the GEMMs, and row names represent cancer categories. (B) Proportion of classification categories in GEMMs across cancer types, visualized as a bar plot and ordered from the cancer type with the highest proportion of correct and mixed correct classified GEMMs to the lowest. Subtype classification heatmaps of (C) UCEC GEMMs, (D) LUSC GEMMs, and (E) LUAD GEMMs. Only samples with CCN scores > 0.1 in their nominal tumor type are displayed.

Fig. 7 Evaluation of tumoroid models. (A) General classification heatmap of tumoroids. Column annotations represent the annotated cancer type of the tumoroids, and row names represent cancer categories.
(B) Proportion of classification categories in tumoroids across cancer types, visualized as a bar plot and ordered from the cancer type with the highest proportion of correct and mixed correct classified tumoroids to the lowest. Subtype classification heatmaps of (C) UCEC tumoroids, (D) LUSC tumoroids, and (E) LUAD tumoroids. Only samples with CCN scores > 0.1 in their nominal tumor type are displayed.

Fig. 8 Comparison of CCLs, PDXs, and GEMMs. Box-and-whiskers plot comparing general CCN scores across CCLs, GEMMs, and PDXs of five general tumor types (UCEC, PAAD, LUSC, LUAD, LIHC).

Supplementary Information
Supplementary Figure 1 Assessment of CCN general classifier and subtype classifier. (A) Mean AUPRC of repeated grid-search cross-validation for each parameter grid. (B) Mean and range of the CCN classifier's PR curves from 50 cross-validations based on the optimal feature selection parameters n and m. (C) AUPRC of the CCN human tissue classifier when applied to mouse tissue data. (D) Schematic of training a subtype classifier in CCN. CCN uses patient tumor expression profiles from the cancer of interest as training data. CCN performs gene-pair transformation and selects the most discriminative gene pairs among the cancer subtypes from the training data as features. CCN then applies the general classification to the training data and uses the general classification profile as features in addition to the gene pairs for training a Random Forest classifier. The weight of the general classification profiles as features can be tuned to improve AUPRC. (E) The mean and standard deviation of AUPRC for 11 subtype classifiers based on 20 iterations of random sampling of training and held-out data, training the subtype
classifier using training data, classification of held-out data, and calculation of recall and precision.

Supplementary Figure 2 Further validation of CCN and classification results. To validate the cross-platform classification performance of CCN, a classifier for microarray data was trained using RNA-seq data from TCGA, restricted to the genes shared between the RNA-seq data and the microarray data. (A) AUPRC of the CCN classifier when applied to tumor profiles assayed on microarrays. (B) Classification heatmap of CCLs using microarray expression data. (C) Pearson correlation between CCN scores of CCLE lines generated from RNA-seq data and from microarray data. (D) Comparison between CCLs' CCN scores and the similarity metric from Yu et al. [15], the median correlation of transcriptional profiles between CCLs and TCGA tumors from the CCLs' labelled cancer category. (E) Comparison of the mean tumor purity of the training data and the mean CCN scores of CCLs for each cancer category.

Supplementary Figure 3 Single-cell classification of SKCM and GBM cell lines. (A) AUPRC of the single-cell classifier when applied to scRNA-seq held-out data. (B) AUPRC of the scRNA-seq classifier when applied to purified bulk RNA samples. (C) Single-cell classification of SKCM CCLs. The red bar-plot (top) represents general CCN scores in SARC and the blue bar-plot (bottom) represents general CCN scores in SKCM. (D) Single-cell classification of GBM CCLs. The red bar-plot (top) represents general CCN scores in SARC and the yellow bar-plot (bottom) represents general CCN scores in GBM.

Supplementary Figure 4 Correlation between cancer type specific network GRN status and general CCN scores.

Supplementary Figure 5 Proportion of cancer subtypes in different cancer models and TCGA tumor data across 11 general cancer types.

Supplementary Table 1 General classification profiles of CCLs.

Supplementary Table 2 Subtype classification profiles of CCLs.

Supplementary Table 3 General classification profiles of PDXs.

Supplementary Table 4 Subtype classification profiles of PDXs.

Supplementary Table 5 General classification profiles of GEMMs.

Supplementary Table 6 Subtype classification profiles of GEMMs.

Supplementary Table 7 General classification profiles of tumoroids.

Supplementary Table 8 Subtype classification profiles of tumoroids.

Supplementary Table 9 Specific parameters used for training of all classifiers.

Supplementary Table 10 Gene-pairs selected for final training of CCN general, subtype classifiers and single-cell classifier.

Supplementary Table 11 Decision thresholds and the corresponding precision and recall for the general classifier and subtype classifier.

Supplementary Table 12 Accessions of tumor microarray data used in validation.

REFERENCES
1. Sharma, S. V., Haber, D. A. & Settleman, J. Cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents. Nat. Rev. Cancer 10, 241–253 (2010).
2. Kersten, K., de Visser, K.
E., van Miltenburg, M. H. & Jonkers, J. Genetically engineered mouse models in oncology research and cancer medicine. EMBO Mol. Med. 9, 137–153 (2017).
3. Hidalgo, M. et al. Patient-derived xenograft models: an emerging platform for translational cancer research. Cancer Discov. 4, 998–1013 (2014).
4. Drost, J. & Clevers, H. Organoids in cancer research. Nat. Rev. Cancer 18, 407–418 (2018).
5. Klijn, C. et al. A comprehensive transcriptional portrait of human cancer cell lines. Nat. Biotechnol. 33, 306–312 (2015).
6. Koren, S. et al. PIK3CA(H1047R) induces multipotency and multi-lineage mammary tumours. Nature 525, 114–118 (2015).
7. DeRose, Y. S. et al. Tumor grafts derived from women with breast cancer authentically reflect tumor pathology, growth, metastasis and disease outcomes. Nat. Med. 17, 1514–1520 (2011).
8. Sharpless, N. E. & Depinho, R. A. The mighty mouse: genetically engineered mouse models in cancer drug development. Nat. Rev. Drug Discov. 5, 741–754 (2006).
9. Mouradov, D. et al. Colorectal cancer cell lines are representative models of the main molecular subtypes of primary cancer. Cancer Res. 74, 3238–3247 (2014).
10. Stuckelberger, S. & Drapkin, R. Precious GEMMs: emergence of faithful models for ovarian cancer research. J. Pathol. 245, 129–131 (2018).
11. Domcke, S., Sinha, R., Levine, D. A., Sander, C. & Schultz, N. Evaluating cell lines as tumour models by comparison of genomic profiles. Nat. Commun.
4, 2126 1013 (2013). 1014 12. Jiang, G. et al. Comprehensive comparison of molecular portraits between cell lines 1015 and tumors in breast cancer. BMC Genomics 17 Suppl 7, 525 (2016). 1016 13. Chen, B., Sirota, M., Fan-Minogue, H., Hadley, D. & Butte, A. J. Relating 1017 hepatocellular carcinoma tumor samples and cell lines using gene expression data 1018 in translational research. BMC Med. Genomics 8 Suppl 2, S5 (2015). 1019 14. Vincent, K. M., Findlay, S. D. & Postovit, L. M. Assessing breast cancer cell lines as 1020 tumour models by comparison of mRNA expression profiles. Breast Cancer Res. 1021 17, 114 (2015). 1022 15. Yu, K. et al. Comprehensive transcriptomic analysis of cell lines as models of 1023 primary tumors across 22 tumor types. Nat. Commun. 10, 3574 (2019). 1024 16. Najgebauer, H. et al. CELLector: Genomics-Guided Selection of Cancer In Vitro 1025 Models. Cell Syst. 10, 424–432.e6 (2020). 1026 17. Salvadores, M., Fuster-Tormo, F. & Supek, F. Matching cell lines with cancer type 1027 and subtype of origin via mutational, epigenomic, and transcriptomic patterns. Sci. 1028 Adv. 6, (2020). 1029 18. Guernet, A. & Grumolato, L. CRISPR/Cas9 editing of the genome for cancer 1030 modeling. Methods 121-122, 130–137 (2017). 1031 19. Gargiulo, G. Next-Generation in vivo Modeling of Human Cancers. Front. Oncol. 8, 1032 429 (2018). 1033 20. Gao, H. et al. High-throughput screening using patient-derived tumor xenografts to 1034 predict clinical trial drug response. Nat. Med. 21, 1318–1325 (2015). 1035 21. Cahan, P. et al. CellNet: network biology applied to stem cell engineering. Cell 158, 1036 903–915 (2014). 1037 22. Radley, A. H. et al. Assessment of engineered cells using CellNet and RNA-seq. 1038 Nat. Protoc. 12, 1089–1102 (2017). 1039 23. Tan, Y. & Cahan, P. SingleCellNet: A Computational Tool to Classify Single Cell 1040 RNA-Seq Data Across Platforms and Across Species. Cell Syst. 9, 207–213.e2 1041 (2019). 1042 24. Cancer Genome Atlas Network. 
Comprehensive molecular characterization of 1043 human colon and rectal cancer. Nature 487, 330–337 (2012). 1044 25. Zhang, J. et al. International Cancer Genome Consortium Data Portal--a one-stop 1045 shop for cancer genomics data. Database (Oxford) 2011, bar026 (2011). 1046 26. Cancer Genome Atlas Network. Comprehensive molecular portraits of human 1047 breast tumours. Nature 490, 61–70 (2012). 1048 27. Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic 1049 subtypes. J. Clin. Oncol. 27, 1160–1167 (2009). 1050 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2020.03.27.012757doi: bioRxiv preprint https://doi.org/10.1101/2020.03.27.012757 http://creativecommons.org/licenses/by-nc-nd/4.0/ 39 28. Wilkerson, M. D. et al. Lung squamous cell carcinoma mRNA expression subtypes 1051 are reproducible, clinically important, and correspond to normal cell types. Clin. 1052 Cancer Res. 16, 4864–4875 (2010). 1053 29. Cancer Genome Atlas Research Network. Electronic address: 1054 andrew_aguirre@dfci.harvard.edu & Cancer Genome Atlas Research Network. 1055 Integrated genomic characterization of pancreatic ductal adenocarcinoma. Cancer 1056 Cell 32, 185–203.e13 (2017). 1057 30. Cancer Genome Atlas Research Network et al. Integrated genomic characterization 1058 of endometrial carcinoma. Nature 497, 67–73 (2013). 1059 31. Cancer Genome Atlas Research Network et al. Integrated genomic characterization 1060 of oesophageal carcinoma. Nature 541, 169–175 (2017). 1061 32. Cancer Genome Atlas Network. Comprehensive genomic characterization of head 1062 and neck squamous cell carcinomas. Nature 517, 576–582 (2015). 1063 33. Cancer Genome Atlas Research Network. 
Comprehensive molecular 1064 characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013). 1065 34. Verhaak, R. G. W. et al. Integrated genomic analysis identifies clinically relevant 1066 subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, 1067 and NF1. Cancer Cell 17, 98–110 (2010). 1068 35. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of 1069 lung adenocarcinoma. Nature 511, 543–550 (2014). 1070 36. Hu, B. et al. Gastric cancer: Classification, histology and application of molecular 1071 pathology. J. Gastrointest. Oncol. 3, 251–261 (2012). 1072 37. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling 1073 of anticancer drug sensitivity. Nature 483, 603–607 (2012). 1074 38. Medico, E. et al. The molecular landscape of colorectal cancer cell lines unveils 1075 clinically actionable kinase targets. Nat. Commun. 6, 7002 (2015). 1076 39. Park, J.-G. et al. Characteristics of Cell Lines Established from Human Colorectal 1077 Carcinoma. Cancer Res. (1987). 1078 40. Jerby-Arnon, L. et al. A cancer cell program promotes T cell exclusion and 1079 resistance to checkpoint blockade. Cell 175, 984–997.e24 (2018). 1080 41. Darmanis, S. et al. Single-Cell RNA-Seq Analysis of Infiltrating Neoplastic Cells at 1081 the Migrating Front of Human Glioblastoma. Cell Rep. 21, 1399–1410 (2017). 1082 42. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in 1083 primary glioblastoma. Science 344, 1396–1401 (2014). 1084 43. Xu, B. et al. Regulation of endometrial receptivity by the highly expressed HOXA9, 1085 HOXA11 and HOXD10 HOX-class homeobox genes. Hum. Reprod. 29, 781–790 1086 (2014). 1087 44. Raines, A. M. et al. Recombineering-based dissection of flanking and paralogous 1088 Hox gene functions in mouse reproductive tracts. Development 140, 2942–2952 1089 (2013). 1090 45. Netinatsunthorn, W., Hanprasertpong, J., Dechsukhum, C., Leetanaporn, R. 
& 1091 Geater, A. WT1 gene expression as a prognostic marker in advanced serous 1092 epithelial ovarian carcinoma: an immunohistochemical study. BMC Cancer 6, 90 1093 (2006). 1094 46. Kelly, Z. et al. The prognostic significance of specific HOX gene expression patterns 1095 in ovarian cancer. Int. J. Cancer 139, 1608–1617 (2016). 1096 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2020.03.27.012757doi: bioRxiv preprint https://doi.org/10.1101/2020.03.27.012757 http://creativecommons.org/licenses/by-nc-nd/4.0/ 40 47. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian 1097 carcinoma. Nature 474, 609–615 (2011). 1098 48. Wiegand, K. C. et al. ARID1A mutations in endometriosis-associated ovarian 1099 carcinomas. N. Engl. J. Med. 363, 1532–1543 (2010). 1100 49. Murray, M. J. et al. LIN28 Expression in malignant germ cell tumors downregulates 1101 let-7 and increases oncogene levels. Cancer Res. 73, 4872–4884 (2013). 1102 50. Biton, A. et al. Independent component analysis uncovers the landscape of the 1103 bladder tumor transcriptome and reveals insights into luminal and basal subtypes. 1104 Cell Rep. 9, 1235–1245 (2014). 1105 51. Fair, W. R., Israeli, R. S. & Heston, W. D. Prostate-specific membrane antigen. 1106 Prostate 32, 140–148 (1997). 1107 52. Black, J. D., English, D. P., Roque, D. M. & Santin, A. D. Targeted therapy in 1108 uterine serous carcinoma: an aggressive variant of endometrial cancer. Womens 1109 Health (Lond. Engl.) 10, 45–57 (2014). 1110 53. Yang, S., Thiel, K. W. & Leslie, K. K. Progesterone: the ultimate endometrial tumor 1111 suppressor. Trends Endocrinol. Metab. 22, 145–152 (2011). 1112 54. Huszar, M. et al. 
Up-regulation of L1CAM is linked to loss of hormone receptors and 1113 E-cadherin in aggressive subtypes of endometrial carcinomas. J. Pathol. 220, 551–1114 561 (2010). 1115 55. Kozak, J., Wdowiak, P., Maciejewski, R. & Torres, A. A guide for endometrial 1116 cancer cell lines functional assays using the measurements of electronic 1117 impedance. Cytotechnology 70, 339–350 (2018). 1118 56. Korch, C. et al. DNA profiling analysis of endometrial and ovarian cell lines reveals 1119 misidentification, redundancy and contamination. Gynecol. Oncol. 127, 241–248 1120 (2012). 1121 57. Wu, D. et al. Gene-expression data integration to squamous cell lung cancer 1122 subtypes reveals drug sensitivity. Br. J. Cancer 109, 1599–1608 (2013). 1123 58. Walter, V. et al. Molecular subtypes in head and neck cancer exhibit distinct 1124 patterns of chromosomal gain and loss of canonical cancer genes. PLoS One 8, 1125 e56823 (2013). 1126 59. Adeegbe, D. O. et al. BET Bromodomain Inhibition Cooperates with PD-1 Blockade 1127 to Facilitate Antitumor Response in Kras-Mutant Non-Small Cell Lung Cancer. 1128 Cancer Immunol Res 6, 1234–1245 (2018). 1129 60. Blaisdell, A. et al. Neutrophils oppose uterine epithelial carcinogenesis via 1130 debridement of hypoxic tumor cells. Cancer Cell 28, 785–799 (2015). 1131 61. Fitamant, J. et al. YAP inhibition restores hepatocyte differentiation in advanced 1132 HCC, leading to tumor regression. Cell Rep. 10, 1692–1707 (2015). 1133 62. Jia, D. et al. Crebbp loss drives small cell lung cancer and increases sensitivity to 1134 HDAC inhibition. Cancer Discov. 8, 1422–1437 (2018). 1135 63. Kress, T. R. et al. Identification of MYC-Dependent Transcriptional Programs in 1136 Oncogene-Addicted Liver Tumors. Cancer Res. 76, 3463–3472 (2016). 1137 64. Li, L. et al. GKAP acts as a genetic modulator of NMDAR signaling to govern 1138 invasive tumor growth. Cancer Cell 33, 736–751.e5 (2018). 1139 65. Mollaoglu, G. et al. 
The Lineage-Defining Transcription Factors SOX2 and NKX2-1 1140 Determine Lung Cancer Cell Fate and Shape the Tumor Immune 1141 Microenvironment. Immunity 49, 764–779.e9 (2018). 1142 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2020.03.27.012757doi: bioRxiv preprint https://doi.org/10.1101/2020.03.27.012757 http://creativecommons.org/licenses/by-nc-nd/4.0/ 41 66. Pan, Y. et al. Whole tumor RNA-sequencing and deconvolution reveal a clinically-1143 prognostic PTEN/PI3K-regulated glioma transcriptional signature. Oncotarget 8, 1144 52474–52487 (2017). 1145 67. Lissanu Deribe, Y. et al. Mutations in the SWI/SNF complex induce a targetable 1146 dependence on oxidative phosphorylation in lung cancer. Nat. Med. 24, 1047–1057 1147 (2018). 1148 68. Xu, C. et al. Loss of Lkb1 and Pten leads to lung squamous cell carcinoma with 1149 elevated PD-L1 expression. Cancer Cell 25, 590–604 (2014). 1150 69. NCI-Frederick, Frederick, MD. National Laboratory for Cancer Research. The NCI 1151 Patient-Derived Models Repository (PDMR). (2019). at 1152 70. Broutier, L. et al. Human primary liver cancer-derived organoid cultures for disease 1153 modeling and drug screening. Nat. Med. 23, 1424–1435 (2017). 1154 71. Lee, S. H. et al. Tumor Evolution and Drug Response in Patient-Derived Organoid 1155 Models of Bladder Cancer. Cell 173, 515–528.e17 (2018). 1156 72. Ogawa, J., Pao, G. M., Shokhirev, M. N. & Verma, I. M. Glioblastoma model using 1157 human cerebral organoids. Cell Rep. 23, 1220–1229 (2018). 1158 73. Ben-David, U. et al. Patient-derived xenografts undergo mouse-specific tumor 1159 evolution. Nat. Genet. 49, 1567–1575 (2017). 1160 74. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. 
Nature 458, 1161 719–724 (2009). 1162 75. Balkwill, F. R., Capasso, M. & Hagemann, T. The tumor microenvironment at a 1163 glance. J. Cell Sci. 125, 5591–5596 (2012). 1164 76. Lancaster, M. A. & Knoblich, J. A. Organogenesis in a dish: modeling development 1165 and disease using organoid technologies. Science 345, 1247125 (2014). 1166 77. Bregenzer, M. E. et al. Integrated cancer tissue engineering models for precision 1167 medicine. PLoS One 14, e0216564 (2019). 1168 78. Wang, D. H. & Souza, R. F. Biology of Barrett’s esophagus and esophageal 1169 adenocarcinoma. Gastrointest Endosc Clin N Am 21, 25–38 (2011). 1170 79. Lee, J. et al. Tumor stem cells derived from glioblastomas cultured in bFGF and 1171 EGF more closely mirror the phenotype and genotype of primary tumors than do 1172 serum-cultured cell lines. Cancer Cell 9, 391–403 (2006). 1173 80. Wenger, S. L. et al. Comparison of established cell lines at different passages by 1174 karyotype and comparative genomic hybridization. Biosci. Rep. 24, 631–639 (2004). 1175 81. Ben-David, U. et al. Genetic and transcriptional evolution alters cancer cell line drug 1176 response. Nature 560, 325–330 (2018). 1177 82. Cooke, S. L. et al. Genomic analysis of genetic heterogeneity and evolution in high-1178 grade serous ovarian carcinoma. Oncogene 29, 4905–4913 (2010). 1179 83. Hristova, V. A. & Chan, D. W. Cancer biomarker discovery and translation: 1180 proteomics and beyond. Expert Rev Proteomics 16, 93–103 (2019). 1181 84. Dawson, M. A. & Kouzarides, T. Cancer epigenetics: from mechanism to therapy. 1182 Cell 150, 12–27 (2012). 1183 85. Silva, T. C. et al. TCGA Workflow: Analyze cancer genomics and epigenomics data 1184 using Bioconductor packages. [version 2; peer review: 1 approved, 2 approved with 1185 reservations]. F1000Res. 5, 1542 (2016). 1186 86. Morgan, M., Obenchain, V., Hester, J. & Pag`es, H. SummarizedExperiment: 1187 SummarizedExperiment container. (2018). 
1188 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2020.03.27.012757doi: bioRxiv preprint https://doi.org/10.1101/2020.03.27.012757 http://creativecommons.org/licenses/by-nc-nd/4.0/ 42 87. Pavlidis, P. & Noble, W. S. Analysis of strain and regional variation in gene 1189 expression in mouse brain. Genome Biol. 2, RESEARCH0042 (2001). 1190 88. Geman, D., d Avignon, C., Naiman, D. Q. & Winslow, R. L. Classifying gene 1191 expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol 3, 1192 Article19 (2004). 1193 89. Krstajic, D., Buturovic, L. J., Leahy, D. E. & Thomas, S. Cross-validation pitfalls 1194 when selecting and assessing regression and classification models. J. Cheminform. 1195 6, 10 (2014). 1196 90. Lipton, Z. C., Elkan, C. & Naryanaswamy, B. Optimal Thresholding of Classifiers to 1197 Maximize F1 Measure. Mach. Learn. Knowl. Discov. Databases 8725, 225–239 1198 (2014). 1199 91. Grow, E. J. et al. Intrinsic retroviral reactivation in human preimplantation embryos 1200 and pluripotent cells. Nature 522, 221–225 (2015). 1201 92. Kolde, R. pheatmap: Pretty Heatmaps. (CRAN, 2019). 1202 93. Wickham, H. ggplot2 - Elegant Graphics for Data Analysis . (Springer-Verlag New 1203 York, 2016). doi:10.1007/978-0-387-98141-3 1204 94. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations 1205 in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016). 1206 95. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture 1207 from expression data. Nat. Commun. 4, 2612 (2013). 1208 96. Kovalchik, S. RISmed: Download Content from NCBI Databases. (CRAN.R-project, 1209 2017). 
[Figure 1, panels A-E: CancerCellNet overview. Cancer models (cancer cell lines (CCL), patient derived xenografts (PDX), genetically engineered mouse models (GEMM), tumoroids) are scored against cancer types; classification score scale from low to high. Training process: labeled RNA-seq data (genes by samples), select n genes, gene pair transform, select m gene pairs, train Random Forest classifier. Parameter selection: (1) set parameters n, m; (2) randomly select 2/3 of the TCGA data and run the training process; (3) assess performance on the 1/3 held-out data; (4) repeat steps (2-3) 5 times; (5) repeat steps (1-4) for each parameter set. Select the parameter set with maximum mean AUPRC and train on all TCGA data.]

[Figure 2, panels A-F. Scale: CCN score.]

[Figure 3, panels A-C. Scale: CCN score.]

[Figure 4, panels A-D. Annotation tracks: general classification; general CCN score (UCEC, LUSC, LUAD); sub-type classification (UCEC: Endometrioid, Serous, Normal, Unknown; LUSC: basal, classical, primitive, secretory, Unknown; LUAD: prox.-inflam, prox.-prolif, TRU, Unknown).]

[Figure 5, panels A-E. Scale: CCN score. Annotation tracks as in Figure 4.]

[Figure 6, panels A-E. Scale: CCN score. Annotation tracks as in Figure 4, plus genotype.]

[Figure 7, panels A-E. Scale: CCN score. Annotation tracks as in Figure 4.]

[Figure 8]

[Supplemental Figure 1, panels A-E. Training data: TCGA RNA-Seq (genes by samples). Training process: gene pair transform, feature selection, train Random Forest classifier. CCN scores from the broad class classification are added to the gene pairs as additional features.]

[Supplemental Figure 2, panels A-E]

[Supplemental Figure 3, panels A-D]

[Supplemental Figure 4]

[Supplemental Figure 5]

10_1101-2020_04_24_059154 ---- 64477024

Biochemical, structural insights of newly isolated AA16 family of Lytic Polysaccharide Monooxygenase (LPMO) from Aspergillus fumigatus and investigation of its synergistic effect using biomass.

Musaddique Hossain, Subba Reddy Dodda, Bishwajit Singh Kapoor, Kaustav Aikat, and Sudit S.
Mukhopadhyay*

Department of Biotechnology, National Institute of Technology Durgapur-713209, West Bengal, India

Running title: Biochemical, structural insights, and investigation of the synergistic effect of newly isolated AA16 family of Lytic Polysaccharide Monooxygenase (LPMO) from Aspergillus fumigatus.

* To whom correspondence should be addressed.
E-mail: suditmukhopadhy@yahoo.com
Phone: +919434788139

bioRxiv preprint doi: https://doi.org/10.1101/2020.04.24.059154; this version posted January 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract

The efficient conversion of lignocellulosic biomass into fermentable sugar is a bottleneck for the cheap production of bioethanol. The recently identified Lytic Polysaccharide Monooxygenase (LPMO) enzyme family has brought new hope because of its capacity to boost cellulose hydrolysis. In this report, we have identified and characterized a new class of auxiliary activity (AA16) oxidative enzyme, an LPMO, from the genome of a locally isolated thermophilic fungus, Aspergillus fumigatus (NITDGPKA3), and evaluated its capacity to boost biomass hydrolysis. AfLPMO16 is an intronless gene and encodes a 29 kDa protein. Sequence-wise, it is close to the C1 type of AaAA16 and the cellulose-active AA10 family of LPMOs, but its predicted three-dimensional structure resembles that of the AA11 family of LPMOs (PDB ID: 4MAH). The gene was expressed in Pichia pastoris under an inducible promoter (AOX1) with a C-terminal His tag. The protein was purified using Ni-NTA affinity chromatography, and we studied the enzyme kinetics with 2,6-dimethoxyphenol.
We observed polysaccharide depolymerization activity with carboxymethyl cellulose (CMC) and phosphoric acid swollen cellulose (PASC). Moreover, the simultaneous use of a commercial cellulase cocktail and AfLPMO16 enhances lignocellulosic biomass hydrolysis by 2-fold, which is the highest enhancement reported so far in the LPMO family.

Importance

Auxiliary enzymes such as LPMOs have industrial importance. These enzymes are used in cellulolytic enzyme cocktails because they act synergistically with cellulases. In our study, we have biochemically and functionally characterized a new AA16 family LPMO from Aspergillus fumigatus (NITDGPKA3). The biochemical characterization provides the fundamental scientific description of the newly isolated enzyme. The functional characterization shows that combining AfLPMO16 with a commercial cellulase cocktail enhances biomass degradation activity by 2-fold. This enhancement is the highest reported so far, which gives AfLPMO16 enormous potential for industrial use.

Keywords: A. fumigatus, Auxiliary activity, Cloning, Kinetics, LPMO, Lignocelluloses, Molecular docking

Introduction

The depletion of fossil fuels and the growing concern about environmental consequences, particularly climate change, have steered our fast-growing economy toward clean and renewable energy production [1]. Among the different renewable energy sources, bioethanol is one of the promising alternatives to fossil fuel because of its low CO2 emission [2, 3] and because its manufacture relies on lignocellulosic biomass, which is bio-renewable and abundant on earth.
However, the structural complexity and recalcitrance of this renewable carbon source [4] have hindered its optimal use. The current process of saccharification of lignocellulosic biomass is time-consuming and costly. Therefore, the need for cost-effective, fast, and controlled breakdown of lignocellulose has driven the bioethanol industry to explore accessory enzymes in order to achieve a better and more efficient enzyme cocktail for the commercial production of lignocellulose-derived ethanol.

A breakthrough in this exploration came when a mono-copper redox enzyme, known as lytic polysaccharide monooxygenase (LPMO), was first reported in 2010 [5-8]. LPMO increases lignocellulosic biomass conversion efficiency [9, 10] by catalyzing the hydroxylation of the C1 and/or C4 carbon involved in the glycosidic bonds that connect glucose units in cellulose, which allows cellulase enzymes to process the destabilized complex polysaccharides [11-15]. Harris et al. used an LPMO from T. reesei along with classical cellulases and showed that the degradation of polysaccharide substrates increased by a factor of two compared with the activity of classical cellulases alone [16]. A CBM33 domain-containing enzyme identified from Serratia marcescens, which boosts chitinase activity, was later classified as an LPMO. A study by Nakagawa et al. showed that an AA10 family LPMO from Streptomyces griseus could increase the efficiency of chitinase enzymes by 30- and 20-fold on the α and β forms of chitin, respectively [17]. Along with this work, there are some recent reports of the synergistic effect of LPMOs with glycoside hydrolases on polysaccharide substrates [18-20].

LPMOs are classified as AA9, AA10, AA11, AA13, AA14, and AA15 in the CAZy database (http://www.cazy.org/), based on their amino acid sequence similarity. Recently, Filiatrault-Chastel et al.
identified the AA16, a new family of LPMO from the secretome of a fungi 86 Aspergillus aculeatus (AaAA16). The AaAA16 was initially isolated as X273 protein 87 (unnamed domain) and later identified as C1-oxidizing LPMO active on cellulose [21]. 88 AaAA16, the only AA16 family of LPMO so far, has been identified, and it lacks complete 89 (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.04.24.059154doi: bioRxiv preprint https://doi.org/10.1101/2020.04.24.059154 4 biochemical characterization. The biochemical characterization, structural characterization, 90 and the assessment of biomass conversion efficiency are required to understand better the 91 action of members of this new family on plant biomass and their possible biological roles. 92 While we were analyzing the cellulose hydrolyzing genes from the genome of A. fumigatus 93 (Aspergillus genome database), we identified five LPMOs, one belonging to AA16 family 94 because of its X273 domain. Further, we cloned the AfLPMO16 gene from the genome of our 95 locally isolated strain of A. fumigatus (NITDGPKA3) [22] (GenBank accession No. 96 JQ046374) by designing the primers based on the A. fumigatus LPMO sequence 97 (CAF32158.1)(NCBI). The cloned A. fumigatus (NITDGPKA3) LPMO (after cloning and 98 sequencing the sequence submitted to GenBank; accession No. MT462230) is expressed in 99 Pichia pastoris X33. The heterologous protein (AfLPMO16) purified and used for 100 biochemical and functional characterization. The saccharification rate assessment suggests 101 that AfLPMO16 has fast and effective glucose releasing ability from lignocellulose and 102 cellulose when used with a commercial cellulase cocktail. Enzyme kinetics using 2,6-103 dimethoxyphenol as a substrate [23] confirmed the oxidative activity. 
The conversion efficiency of lignocellulosic biomass (alkaline pre-treated raw rice straw) in combination with cellulases suggests that AfLPMO16 could be an essential member of the cellulase cocktail for industrial use.

Results

Cloning, expression, and purification of AfLPMO16

AfLPMO16 (GenBank accession No. MT462230) is an intronless gene of 870 nucleotides that encodes 290 amino acids. The theoretical molecular mass is 29 kDa (including the signal peptide). The AA16 gene sequence from our isolated strain of A. fumigatus (NITDGPKA3) shares almost 99.6% identity with the AA16 gene sequence in the A. fumigatus genome database (CAF32158.1; NCBI database).

The AfLPMO16 protein (GenBank accession No. MT462230) was produced in Pichia pastoris X33 without its C-terminal extension. After optimizing the expression procedure, we obtained approximately 0.8 mg/ml of purified protein. SDS-PAGE analysis confirmed a single band of purified protein (Fig. 1: lanes 5 and 6). We further confirmed the purified recombinant protein bearing the 6x His-tag by Western blot with an anti-His antibody (Fig. 1: lanes W1 and W2); the purified protein from lanes 5 and 6 of the SDS-PAGE was used for the Western blot. The recombinant AfLPMO16 band appeared at approximately 32 kDa in SDS-PAGE (Fig. 1), slightly higher than the expected size. This is probably due to glycosylation [24], or to the c-myc epitope and 6x His-tag at the C-terminus of the recombinant protein, which can increase the molecular mass by 2.7 kDa.
To check for N-glycosylation, we scanned the AfLPMO16 sequence for glycosylation sites with the NetNGlyc 1.0 server (DTU Bioinformatics, Technical University of Denmark, http://www.cbs.dtu.dk/services/NetNGlyc/) [36]. Two predicted N-glycosylation sites exceeded the 0.5 threshold, at amino acid positions 114 and 149, with potential values of 0.76 and 0.56, respectively.

Enzyme assay and kinetics

LPMO oxidizes 2,6-dimethoxyphenol (2,6-DMP) to 1-coerulignone (Fig. 2a), which has an extinction coefficient of 53,200 M⁻¹ cm⁻¹. 1-Coerulignone absorbs at 469 nm, so it can be quantified easily with a spectrophotometer [21]. When 2,6-dimethoxyphenol was incubated with AfLPMO16, the OD at 469 nm increased steadily with time, indicating steady conversion of 2,6-dimethoxyphenol to 1-coerulignone (Fig. 2a) and sufficient activity of AfLPMO16. Temperature and pH influence LPMO activity; for the kinetic study we therefore used the optimum temperature of 30 °C and pH 6.0, as described in [21]. Kinetics were measured at different concentrations of 2,6-dimethoxyphenol. The Michaelis-Menten constant (Km) and maximum velocity (Vmax), obtained from the Lineweaver-Burk plot (Fig. 2b), were 5.4 mM and 0.153 U/mg, respectively. The calculated catalytic constant kcat was 277.67 min⁻¹ (Table 1). These kinetic parameters support the oxidative activity of AfLPMO16.

In-silico analysis for substrate specificity

AfLPMO16 contains a 19-amino-acid N-terminal signal peptide ahead of the catalytic domain that begins at His1 (residues 1-169), followed by a C-terminal serine-rich region (residues 170-271) (Fig. 3a).
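As a quick consistency check on the domain boundaries given above, the region lengths can be added up and compared with the 290-residue precursor. The sketch below is illustrative only: it assumes the stated coordinates (1-169 and 170-271) use mature-protein numbering, in which His1 is the first residue after the 19-residue signal peptide, and the function names are hypothetical.

```python
# Domain bookkeeping for AfLPMO16, using only coordinates quoted in the text.
# Assumption: coordinates are in mature numbering (His1 = first residue after
# the signal peptide). All names here are illustrative.

SIGNAL_PEPTIDE_LENGTH = 19
CATALYTIC_DOMAIN = (1, 169)    # mature numbering, inclusive
SER_RICH_REGION = (170, 271)   # mature numbering, inclusive


def region_length(region):
    """Number of residues in an inclusive (start, end) region."""
    start, end = region
    return end - start + 1


def precursor_to_mature(position):
    """Convert precursor numbering (signal peptide included) to mature numbering."""
    return position - SIGNAL_PEPTIDE_LENGTH


def mature_to_precursor(position):
    """Convert mature numbering back to precursor numbering."""
    return position + SIGNAL_PEPTIDE_LENGTH


mature_length = region_length(CATALYTIC_DOMAIN) + region_length(SER_RICH_REGION)
precursor_length = SIGNAL_PEPTIDE_LENGTH + mature_length

print("mature residues:", mature_length)
print("precursor residues:", precursor_length)
print("precursor His20 in mature numbering:", precursor_to_mature(20))
```

Under this assumption the two regions sum to 271 mature residues, or 290 with the signal peptide, which agrees with the 290 amino acids encoded by the gene.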
This N-terminal sequence is one of the marker features of fungal LPMOs, whereas the serine-rich C-terminal region, or linker, is a feature of the AA16 family. Like other AA16 LPMOs, AfLPMO16 lacks a CBM1 module or glycosylphosphatidylinositol (GPI) anchor [19]. AfLPMO16 also has conserved histidines at positions 1 and 109, which are mainly involved in copper binding, the signature characteristic of LPMOs. Other conserved residues include Gly, Pro, Asn, Cys, Try, Tyr, Leu, and Asp, together with the GNV(I)QGELQ motif (Fig. 3b). The fully conserved residues (highlighted with a red background) are marker amino acids of LPMOs, and the partially conserved residues (within the blue boxes) are markers of the different auxiliary activity families (Fig. 3b). Sequence alignment of the AA16 family (including AfLPMO16) with other LPMO families (AA9, AA10, and AA11) suggested a correlation between the AA10 and AA16 LPMOs (Fig. S1). The substrate-binding motif in the L2 loop of cellulose-active LPMO10 shows some similarity to the AA16 L2 loop motif (marked with a black box) and to the cellulose-active motif (Fig. 3b). The conserved L2 loop motif GNI(V)QGEL of AA16 LPMOs is replaced by YNWFG(A)NL in C1-oxidizing AA10 LPMOs, which are also cellulose active. A previous study suggests that the amino acids Y79, N80, F82, Y111, and W141 in loop L2 contribute to substrate specificity in LPMO10, and that mutations (Y79, N80D, F82A, Y111F, W141Q) alter the substrate specificity from chitin to cellulose [37]. In AfLPMO16, the corresponding amino acids GNQYR (Fig.
3b) (marked with black arrows) include N and Y, which also occur at these positions in cellulose-active AA10 LPMOs. The polar or charged amino acids (Q and R) may interact with chitin through electrostatic interactions. Alternatively, a few mutations at these positions might enable AfLPMO16 to interact with chitin. Further, in chitin-active LPMOs, more than 70% of the residues of the motif (Y(W)EPQSVE) are polar, including two negatively charged Glu (E). In cellulose-active LPMOs, 70% of the residues of the motif (Y(W)NWFGVL) are hydrophobic [38]. In contrast, in AfLPMO16, 70% of the residues are polar, including one negatively charged Glu (E) and one hydrophobic Tyr (Y), and the others are neutral. The presence of polar residues and a negatively charged Glu (E) suggests that AfLPMO16 may bind to chitin. Electrostatic interaction between the substrate and the enzyme active site plays a pivotal role in substrate binding. The electrostatic potential surface at the catalytic site of AfLPMO16 was found to be uncharged or slightly positively charged at pH 6.0 (Fig. 3c, marked in the figure). The electrostatic interaction study suggests that AfLPMO16 may also bind to cellulose [52].

Regioselectivity of AfLPMO16

Amino acids on the substrate-binding surface determine the oxidative regioselectivity of LPMOs [29]. Sequence comparison and mutation studies revealed that the conserved amino acids near the catalytic center in C1 and C1/C4 oxidizing AA10 and AA9 LPMOs are responsible for regioselectivity.
In C1/C4-oxidizing AA10 LPMOs, the amino acid Asn85 near the catalytic center is responsible for the C4-oxidizing activity. Altering this residue (N85F) diminished the C4 activity and yielded only the C1-oxidized product [39]. In C1-oxidizing AA9 LPMOs, the hydrophobic amino acids Phe and Tyr are conserved in addition to Asn, whereas in C1-oxidizing AA10 LPMOs, Phe replaces the corresponding Asn (Fig. 3b, marked with a red arrow). This Phe is also parallel to the substrate-binding surface [47]. In AA16, the corresponding Gln (Q) may be parallel to the substrate-binding region (Fig. 3b). The function of the conserved Gln is not clear; however, its polar side chain is similar to that of Asn (N). The axial distance between the conserved amino acid and the copper catalytic center is another crucial factor for regioselectivity. C1/C4-oxidizing AA10 LPMOs have more open, or wider, axial gaps than C1-oxidizing AA10 LPMOs [39]. Here, the distance between Gln56 and His20 is 7.7 Å, and the distance between Gln56 and the Cu catalytic center is 11.1 Å. In the absence of an AA16 structure (crystal or model), we cannot compare these distances; nevertheless, this distance may play a key role in regioselectivity.

Phylogenetic tree construction and analysis

The sequence and functional relationship between AA10 and AA16 LPMOs has been discussed above, but phylogenetic analysis based on sequence similarity indicates their evolutionary origin. Based on sequence comparison, AfLPMO16 is evolutionarily closest to the LPMO of Aspergillus fischeri (91% sequence homology). The constructed phylogenetic tree contains two main clades and two subclades (Fig. 4). The first clade contains all AA10 LPMOs from bacterial species such as Bacillus thuringiensis, Bacillus amyloliquefaciens, Streptomyces lividans, and Enterococcus faecalis.
The second clade includes all fungal AA10 and AA16 LPMOs, which belong mainly to Aspergillus and Penicillium species; the AA16 LPMOs are mostly from A. niger, A. fumigatus, A. fischeri, and Aspergillus kawachii (Fig. 4).

Model structure prediction and molecular docking analysis

I-TASSER was used to predict the three-dimensional structure of AfLPMO16. Most LPMOs have an immunoglobulin-like distorted β-sandwich fold in which loops connect seven antiparallel β-strands, with a variable number of α-helix insertions (Fig. 5a). The final model has a β-sandwich structure connected by loops, with two α-helices. Superimposition of AfLPMO16 on LPMOs of other families (AA9, AA10, AA11, and AA13) showed that they share common antiparallel β-strands and helices, with more loops, which indicates higher flexibility. Moreover, AfLPMO16 showed an RMSD of 1.2 Å with an AA11 LPMO (PDB ID: 4MAH), lower than with the other LPMOs, so the 3D structure of AfLPMO16 most closely resembles that of the AA11 LPMO. We also found one disulfide bond in AfLPMO16, between Cys78 and Cys186, a signature of thermostability (Fig. S2). The histidine brace residues His20 and His109 participate in coordination bonds with the Cu ion. The surface of AfLPMO16 has an active site (Fig. 5b).
The interaction studies with cellohexose suggest that Gln48, Gln181, Ser178, His109, His20, Asn54, Asp50, Tyr52, and Glu58 are in the active site and are involved in the interaction with the substrate (Fig. 5c). (The active enzyme starts with His1, so His20 becomes His1, and the other amino acids can be renumbered accordingly.) Molecular docking suggests that AfLPMO16 has a cellulose-binding surface (Fig. 5b & 5c). It also suggests that the binding energy between AfLPMO16 and cellulose is -7.0 kcal/mol, the strongest binding compared with chitin (-5.5 kcal/mol) and other polysaccharides.

Polysaccharides depolymerization by AfLPMO16

AfLPMO16 showed efficient depolymerization activity on both CMC and PASC (Fig. 6a & 6b). We quantified the reducing sugar released by enzymatic degradation. When CMC was incubated with increasing concentrations of the enzyme, the amount of product (reducing sugar) increased with the AfLPMO16 concentration (Fig. 6a): 50 µg of enzyme released nearly 0.05 mg/ml of reducing sugar, 100 µg released nearly 0.136 mg/ml, and 200 µg released approximately 0.356 mg/ml. This result indicates the polysaccharide (CMC) depolymerization activity of AfLPMO16.

We then used insoluble PASC as a substrate, incubated it with increasing concentrations of AfLPMO16, and determined the relative absorbance of PASC as a function of the amount of enzyme. The enzyme degrades the polysaccharide substrate into smaller, soluble units (monosaccharides, disaccharides, etc.), which makes the reaction mixture clearer. This lowers the absorbance, which is recorded as an increase in relative absorbance [40]. A plot of relative absorbance against AfLPMO16 concentration is therefore expected to rise.
In this experiment, the relative absorbance with respect to the untreated substrate rose as the concentration of AfLPMO16 increased (Fig. 6b). The absorbance difference relative to the untreated substrate was 0.17 with 50 µl of enzyme (0.8 µg/µl) and increased steadily with the enzyme dose, reaching nearly 0.36 with 200 µl of enzyme at 0.8 µg/µl. These experiments confirm that AfLPMO16, like other LPMOs, has intrinsic polysaccharide-degrading activity. Heat-inactivated AfLPMO16 and an ascorbic acid-deficient reaction set were used to verify these results (data not shown).

Pre-treated lignocellulosic biomass and cellulose hydrolysis with simultaneous treatment of AfLPMO16 and commercial cellulase

The synergy, or boosting effect, of an LPMO used with cellulase can be demonstrated in two ways: a sequential assay and a simultaneous assay. In the sequential assay, the LPMO is added to the substrate for a set time before the cellulase. In the simultaneous assay, the LPMO and the cellulase are added to the substrate together. We chose the simultaneous assay for two reasons. First, the simultaneous assay shows better synergy, or boosting, on crystalline cellulose than the sequential one [41]. Second, we aimed to test whether AfLPMO16 stimulates the activity of a commercial cellulase, so that it could be included in the cocktail for better depolymerizing action.
Here, the boosting effect of AfLPMO16 was studied with a commercial cellulase cocktail on both cellulose (Avicel) and lignocellulosic biomass (alkaline pre-treated rice straw). Alkaline pre-treatment has an advantage over acid pre-treatment in terms of hydrolysis yield [48], because it removes lignin sufficiently [42] while preserving hemicelluloses [43]. When Avicel was incubated with AfLPMO16 and cellulase, the amount of reducing sugar released was almost double that released from Avicel incubated with cellulase alone or with cellulase plus heat-inactivated AfLPMO16 (Fig. 7b). We observed a similar boosting effect at every time point from 5 h to 72 h. AfLPMO16 also had a synergistic effect on the conversion of lignocellulosic biomass to fermentable sugar (Fig. 7a). When alkaline pre-treated rice straw was incubated with 100 µg or 200 µg of AfLPMO16 together with cellulase, almost 1.7-fold and slightly more than 2-fold reducing sugar was released, respectively, compared with lignocellulose incubated with cellulase alone or with cellulase plus heat-inactivated AfLPMO16 (Fig. 7a), which suggests that the enhancement depends on the amount of the auxiliary enzyme AfLPMO16. To examine the synergistic effect further, another set of reactions was prepared in which the biomass was treated with increasing concentrations of AfLPMO16 alone. Only minimal hydrolysis was observed: nearly 0.04 mg/ml to 0.06 mg/ml of reducing sugar was quantified for AfLPMO16-treated biomass (Fig. 7c).
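To make the comparison between these reaction sets concrete, the boost can be expressed as a fold enhancement over cellulase alone and, as is common in the cellulase literature, as a degree of synergy (combined yield divided by the sum of the individual yields). The following sketch is not the calculation used in this study. The reducing-sugar values are hypothetical and chosen only for illustration, with the LPMO-only value picked inside the 0.04 to 0.06 mg/ml range reported above, and the function names are assumptions.

```python
def fold_enhancement(combined, cellulase_alone):
    """Reducing sugar with both enzymes relative to cellulase alone."""
    if cellulase_alone <= 0:
        raise ValueError("cellulase-only yield must be positive")
    return combined / cellulase_alone


def degree_of_synergy(combined, cellulase_alone, lpmo_alone):
    """Combined yield divided by the sum of the two single-enzyme yields.

    Values above 1 indicate that the enzymes together release more sugar
    than expected from adding their separate contributions.
    """
    additive = cellulase_alone + lpmo_alone
    if additive <= 0:
        raise ValueError("sum of single-enzyme yields must be positive")
    return combined / additive


# Hypothetical reducing-sugar yields in mg/ml (illustration only).
cellulase_only = 0.50
lpmo_only = 0.05      # within the reported 0.04-0.06 mg/ml range
both_enzymes = 1.00

print("fold enhancement:", fold_enhancement(both_enzymes, cellulase_only))
print("degree of synergy:", round(degree_of_synergy(both_enzymes, cellulase_only, lpmo_only), 2))
```

Because the LPMO-only yield is small, the two measures stay close to each other in this illustration; they diverge only when the auxiliary enzyme releases a substantial amount of sugar by itself.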
This hydrolysis activity of AfLPMO16 alone is negligible compared with that of biomass treated with cellulase alone.

Nevertheless, the simultaneous use of AfLPMO16 and cellulase enhances the hydrolysis activity two-fold compared with biomass treated with cellulase alone (Fig. 7c). This result strongly indicates a synergistic effect of AfLPMO16 with cellulase. Together, these results confirm the boosting, or synergistic, effect of AfLPMO16 on the hydrolytic activity of cellulase in the degradation of both cellulosic and lignocellulosic biomass. The highest synergistic effect reported so far is for AA9 (Table 2) and is less than two-fold [44, 45].

Discussion

The gene was cloned into the pPICZαA vector under the control of the AOX1 promoter, following the strategy developed for AaAA16 and PMO9A_MLACI [19, 26]. The nucleotide sequence of AfLPMO16 was codon-optimized for Pichia pastoris. The recombinant protein, containing a C-terminal polyhistidine tag, was produced in flasks in the presence of trace metals, including copper, and purified from the culture supernatant by immobilized metal ion affinity chromatography (IMAC; Ni-NTA affinity chromatography), following the protocol used for AaAA16 [19]. We successfully produced active AfLPMO16 in P. pastoris X33 (Fig. 1) in shake flasks. Despite the chance of N-terminal modification in shake-flask culture compared with bioreactor culture [19], the amount of active enzyme obtained in shake flasks was sufficient for characterization. Enzyme activity was determined with 2,6-dimethoxyphenol, using heat-inactivated enzyme and reactions without ascorbic acid as negative controls (data not shown). The enzyme activity indicates successful production of active protein (Fig. 2a); interestingly, the initial reaction rate is faster than the rate at later time points.
Lytic polysaccharide monooxygenases (LPMOs) release a spectrum of cleavage products from their polymeric substrates cellulose, hemicellulose, or chitin. Correct identification and quantitation of these products is the basis of MS/HPLC-based methods for detecting LPMO activity, which are time-consuming and require specialized laboratories, making them impractical for day-to-day measurement of LPMO activity. A spectrophotometric assay based on 2,6-dimethoxyphenol can accurately measure the enzymatic action, can be used for enzyme screening, production, and purification, and can also be applied to study enzyme kinetics [21]. It is thus fast and robust for biochemical characterization, and it accurately determines the active enzyme.

Sequence analyses indicate that AfLPMO16 has some signature characteristics of both cellulose and chitin binding and of both C1 and C1/C4 oxidizing activity. However, experimental confirmation is required to establish the presence or absence of chitin binding and of C1/C4 oxidizing capability in AfLPMO16. The constructed phylogenetic tree (Fig. 4) suggests that the fungal AA10 and AA16 LPMOs are likely to derive from a common ancestor. The molecular docking study suggests that, based on binding energy, AfLPMO16 has the highest affinity for cellulose among the known substrates. The binding energy between cellulose and AfLPMO16 is -7.0 kcal/mol, which indicates thermodynamically stronger binding between enzyme and substrate (Fig. 5b & 5c) than for the other substrates.

LPMOs are important for their auxiliary activity and their polysaccharide-degrading property.
We observed polysaccharide-depolymerizing activity on carboxymethyl cellulose (CMC) and phosphoric acid swollen cellulose (PASC) (Fig. 6a & 6b). Because of its auxiliary activity, the LPMO enhances the action of cellulase in the degradation of cellulose and lignocelluloses [49]. AaAA16, the only previously identified member of the AA16 family and a recent addition to the CAZy database, showed a sequential boosting effect with T. reesei CBHI in the degradation of cellulose (nano-fibrillated cellulose (NFC) and PASC) [19]. However, the AaAA16 study did not address the boosting effect of the AA16 family on biomass hydrolysis. From a technical standpoint, this boosting effect matters most for enhancing the activity of the cellulase cocktail. LPMOs have attracted much research interest because of their synergistic, or boosting, effect on cellulases [45]. AfLPMO16 showed a boosting effect on cellulose and lignocellulose hydrolysis (Fig. 7a & 7b). The synergism of AfLPMO16 is shown in Fig. 7c, where biomass hydrolysis by AfLPMO16 alone or by cellulase alone is low compared with the combined effect of the two enzymes. The simultaneous use of AfLPMO16 and cellulase enhances biomass hydrolysis nearly two-fold compared with cellulase alone. This two-fold enhancement is higher than that reported for other LPMO families [50]. However, the synergy, or boosting effect, depends on many factors, such as pre-treatment [51], the lignin content of the lignocellulose, and the acting cellulase [46]. Still, an enhancement of over 50% makes a strong case for including AfLPMO16 in cellulase cocktails. The mechanism of synergism with the cellulase enzyme complex, however, is poorly understood.
A probable explanation for this boosting effect is that the LPMO partially depolymerizes the cellulosic biomass, which gives the cellulase enzymes further access.

Conclusion

AfLPMO16 is the second reported member of the AA16 family of LPMOs, and this is the first biochemical and structural characterization of the AA16 family. In-silico sequence analysis, structure analysis, and molecular docking suggest some distinctive characteristics of AfLPMO16, such as cellulose-binding ability, possible chitin binding, and C1 and C4 oxidizing activity. Further studies, including engineering approaches, are required to confirm these characteristics. Nevertheless, the most important aspect of AfLPMO16 is its significant boosting effect on a commercial cellulase cocktail in lignocellulosic biomass conversion, which suggests its importance for the bioethanol industry.

Materials and Methods

Sequence analysis and Phylogenetic analysis:

The AfLPMO16 sequence (CAF32158.1) was obtained from NCBI and confirmed against the Aspergillus genome database (http://www.aspgd.org/). To avoid interference from the presence or absence of additional residues or domains, signal peptides and C-terminal extensions were removed before alignment. Homology sequence alignment was performed with BLAST [22]. Clustal Omega [23] was used for multiple sequence alignment, and the alignment was edited with ESPript for better visualization. PyMOL [24] and MEGA7 [25] were used to construct a phylogenetic tree after sequence alignment.
To build the phylogenetic tree, the sequences of twenty-seven (27) LPMO genes (edited to remove the N-terminal signal sequence, C-terminal extension or GPI anchor, and CBM1 module) were taken from different species belonging to the AA10 and AA16 families of LPMOs. The neighbor-joining tree was constructed with 1000 bootstrap replications.

Cloning of AfLPMO16

Aspergillus fumigatus NITDGPKA3 was grown on CMC agar medium containing 2% CMC, 0.2% peptone, and 2% agar in basal medium (0.2% NaNO3, 0.05% KCl, 0.05% MgSO4, 0.001% FeSO4, 0.1% K2HPO4). The fungal biomass was ground with a mortar and pestle and then vortexed rapidly in an appropriate lysis buffer to lyse the cells. Genomic DNA was isolated from the fungal biomass using DNA extraction buffer (400 mM Tris-HCl, 150 mM NaCl, 0.5 M EDTA, 1% SDS), followed by phenol:chloroform:isoamyl alcohol (25:24:1) extraction. The final pellet was washed with 70% alcohol, air-dried, and dissolved in sterile water. The AfLPMO16 gene was amplified by polymerase chain reaction (PCR). The gene, codon-optimized for Pichia pastoris, was inserted into the pPICZαA vector (Invitrogen, Carlsbad, California, USA). The gene was cloned with the native signal sequence and a 6x His-tag at the C-terminus [26], following the same protocol as for AaAA16 and PMO9A_MLACI [19, 26]. The pPICZαA vector containing the AfLPMO16 gene was linearized with PmeI (New England BioLabs) and transformed into Pichia pastoris X33 competent cells. Zeocin-resistant transformants were picked and screened for protein production.
The cloned gene was further confirmed by sequencing, and the sequence was submitted to GenBank (accession No. MT462230).

Expression and purification of AfLPMO16

Positive colonies were selected on YPDS plates (Zeocin: 100 μg/ml) and further screened by colony PCR and expression studies. Protein expression was carried out initially in BMGY medium containing 1 ml/L Pichia trace minerals 4 (PTM4) salts (2 g/L CuSO4·5H2O, 3 g/L MnSO4·H2O, 0.2 g/L Na2MoO4·2H2O, 0.02 g/L H3BO3, 0.5 g/L CaSO4·2H2O, 0.5 g/L CoCl2, 12.5 g/L ZnSO4·7H2O, 22 g/L FeSO4·7H2O, 0.08 g/L NaI, 1 mL/L H2SO4) and 0.1 g/L of biotin. After 16 hours, the Pichia cells were transferred into BMMY medium (with PTM4 salts) and induced continuously by adding 1% methanol (optimized) every 24 hours for three days. After three days, the culture was centrifuged (8,000 rpm for 10 min) at 4 °C. The pellet was discarded, and the medium was collected. The protein was precipitated from the medium with ammonium sulfate (90% saturation), and the pellet was redissolved in Tris buffer (50 mM Tris-HCl pH 7.8, 400 mM NaCl, 10 mM imidazole). The recombinant protein was purified by immobilized metal ion affinity chromatography (Ni-NTA affinity chromatography) [27], followed by dialysis against 50 mM phosphate buffer, pH 6.0. The expression and purification procedure was the same as for AaAA16 [19]. The yield of purified protein was almost 0.8 mg/ml. The concentration was measured by Bradford assay with BSA as the standard.
The protein was separated by SDS-PAGE using 12% acrylamide in the resolving gel (for 10 ml: dH2O, 3.6 ml; acrylamide + bisacrylamide, 4.0 ml; 1.5 M Tris, 2.6 ml; 10% SDS, 0.1 ml; 10% APS, 0.1 ml; TEMED, 0.01 ml) and stained with Coomassie blue. The purified protein band was also confirmed by Western blot analysis using an anti-His antibody (Abcam).

Biochemical assays of AfLPMO16

Biochemical characterization of AfLPMO16

2,6-Dimethoxyphenol (2,6-DMP) was used as the substrate for AfLPMO16 in this study. The reaction was carried out in phosphate buffer (100 mM, pH 6.0) containing 10 mM 2,6-dimethoxyphenol, 5 μM hydrogen peroxide, and 50 μg of purified AfLPMO16 at 30 °C. The amount of the product, 1-coerulignone, was measured with a spectrophotometer using the standard extinction coefficient (53,200 M⁻¹ cm⁻¹) and the Lambert-Beer law. For the kinetic assay, different 2,6-dimethoxyphenol concentrations (1, 5, 10, 20, 25, 30, 40, 50, 70, and 100 mM) were used. The kinetic parameters were calculated from the Lineweaver-Burk plot (LB plot). One unit of enzyme activity is defined as the amount of enzyme that releases 1 μM of 1-coerulignone (product) per minute under standard reaction conditions.

Polysaccharides depolymerization by AfLPMO16

Different cellulosic compounds, namely PASC, Avicel® PH-101 (SIGMA), and carboxymethyl cellulose (CMC), were used. We used 1% Avicel® PH-101 (SIGMA) (crystalline cellulose) and 1% CMC (carboxymethyl cellulose sodium salt) with different concentrations of purified AfLPMO16 and different incubation times. Reducing sugar was determined by the dinitrosalicylic acid (DNS) assay.
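The two calculations named in the 2,6-DMP assay above, the Lambert-Beer conversion and the Lineweaver-Burk fit, can be sketched as follows. This is a minimal illustration rather than the analysis script used in this study: a 1 cm path length is assumed, the rates are synthetic values generated from the Michaelis-Menten equation with the reported Km and Vmax (they are not experimental data), and all function names are hypothetical.

```python
# Extinction coefficient of 1-coerulignone at 469 nm, as quoted in the text.
EXTINCTION_COEFFICIENT = 53200.0  # M^-1 cm^-1


def coerulignone_molar(absorbance_469, path_length_cm=1.0):
    """Lambert-Beer law, c = A / (epsilon * l); returns mol/L.

    The 1 cm default path length is an assumption.
    """
    return absorbance_469 / (EXTINCTION_COEFFICIENT * path_length_cm)


def lineweaver_burk_fit(substrate_mM, rates):
    """Ordinary least-squares line through (1/[S], 1/v); returns (Km, Vmax)."""
    x = [1.0 / s for s in substrate_mM]
    y = [1.0 / v for v in rates]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # equals Km / Vmax
    intercept = mean_y - slope * mean_x  # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


# Substrate concentrations (mM) listed for the kinetic assay.
substrate_mM = [1, 5, 10, 20, 25, 30, 40, 50, 70, 100]

# Synthetic, noise-free rates (U/mg) from the reported parameters; used only
# to show that the fit recovers the values it was given.
KM_REPORTED_MM = 5.4
VMAX_REPORTED = 0.153
rates = [VMAX_REPORTED * s / (KM_REPORTED_MM + s) for s in substrate_mM]

km, vmax = lineweaver_burk_fit(substrate_mM, rates)
print("Km (mM):", round(km, 3), "Vmax (U/mg):", round(vmax, 3))
print("A469 = 0.532 corresponds to", coerulignone_molar(0.532) * 1e6, "uM product")
```

With noise-free input the fit returns the input Km and Vmax up to floating-point error. With real measurements the double-reciprocal transform weights the low-substrate points heavily, so a direct nonlinear fit of the Michaelis-Menten equation is a common cross-check.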
For the PASC assay, 0.25% PASC was incubated with increasing concentrations of AfLPMO16 for 6 hours; the OD was then measured, and the relative absorbance ([OD of AfLPMO16-treated PASC] - [OD of untreated substrate]) was plotted against enzyme concentration [28].

Biomass and cellulose hydrolysis by cellulase and AfLPMO16

Cellulose and lignocellulose (alkaline pre-treated raw rice straw) [29] were used to determine the cellulose hydrolysis-enhancing capacity. Rice straw was pre-treated with 5% NaOH (1:10 w/v ratio) at 120 °C and 15 psi for 1 hour, and 10 μl of sodium azide (20%) per 10 ml was added to the reaction mixture to prevent microbial contamination. The reaction was performed at 50 °C, and reducing sugar was quantified after 5, 24, 48, and 72 hours by the dinitrosalicylic acid (DNS) assay. 20 μl of commercial cellulase (MP Biomedicals LLC) (5 mg/ml) was used together with two different amounts of AfLPMO16, 125 μl (100 μg) and 250 μl (200 μg) [concentration 0.8 mg/ml]. Reaction sets were prepared with cellulase only, AfLPMO16 only at different concentrations, AfLPMO16 combined with cellulase, and cellulase with inactivated AfLPMO16. AfLPMO16 was heat-inactivated by holding it at 100 °C for 30 minutes. Reducing sugar was quantified from triplicate sets. For cellulose degradation, 400 μl of Avicel (1%) (SIGMA) was incubated with 10 μl of commercial cellulase (MP Biomedicals LLC) (5 mg/ml). Reducing sugar was quantified after 5 hours of incubation.
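The AfLPMO16 amounts quoted above follow directly from the stock concentration, since 1 mg/ml equals 1 μg/μl. A small helper (hypothetical name, shown only to make the arithmetic explicit) reproduces them:

```python
def enzyme_mass_ug(volume_ul, stock_mg_per_ml):
    """Mass of enzyme (ug) in a given volume (ul); 1 mg/ml equals 1 ug/ul."""
    return volume_ul * stock_mg_per_ml


STOCK_MG_PER_ML = 0.8  # AfLPMO16 stock concentration stated in the text

for volume_ul in (125, 250):
    mass = enzyme_mass_ug(volume_ul, STOCK_MG_PER_ML)
    print(volume_ul, "ul of stock contains", round(mass, 1), "ug of enzyme")
```

The same conversion can be applied to any other volume of the 0.8 mg/ml stock used in the assays.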
For these biochemical assays, we used 100 mM phosphate buffer (pH 6.0), and heat-inactivated AfLPMO16 was taken as a negative control.

Molecular modeling and Molecular docking

The I-TASSER server [30] was used to model AfLPMO16. The final model was energy-minimized with GROMACS [31] and evaluated using the Ramachandran plot [32] and PROCHECK [33]. The Metal Ion-Binding site prediction and docking server (MIB; http://bioinfo.cmu.edu.tw/MIB/) was used to identify the position of the copper (Cu) ion. Molecular docking was performed with AutoDock Vina [34] using MGL Tools (Molecular Graphics Laboratory). The optimized substrate structures were prepared with AutoDock Vina and saved in PDBQT format. The grid size parameters used for docking were 44, 46, and 46, and the grid center parameters were 49, 45, and 55. The genetic algorithm was also used for docking. Molecular interactions between enzyme and substrate were analyzed with MGL Tools [35]. The electrostatic potential surface of AfLPMO16 was calculated at pH 6.0 with the APBS plugin available in PyMOL.

Acknowledgments

MH is thankful to DBT, and SRD is grateful to DST Inspire for their fellowship. The authors are also thankful to the DST-FIST grant of the Department of Biotechnology, NIT Durgapur.

Funding

This study is financially supported by the DBT, Govt. of India (Grant No. BT/PR13127/PBD/26/447/2015).

Authors' contribution

MH and SM designed the research work. MH, BSK, and SM wrote the manuscript. MH performed biochemical assays. SRD performed in silico analysis.
MH and KA analyzed the results. All authors read and approved the manuscript.

Conflict of interest

Authors have no competing interests. The manuscript has been spell-checked, grammar-checked and plagiarism-checked by "Grammarly."

Ethical approval

No human participants or animals were used in this study.

References

1. Dias De Oliveira ME, Vaughan BE, Rykiel EJ (2005) Ethanol as Fuel: Energy, Carbon Dioxide Balances, and Ecological Footprint. Bioscience. https://doi.org/10.1641/0006-3568(2005)055[0593:eafecd]2.0.co;2
2. Saricks C, Santini D, Wang M (1999) Effects of Fuel Ethanol Use on Fuel-Cycle Energy and Greenhouse Gas Emissions
3. Lang X, MacDonald DG, Hill GA (2002) Recycle Bioreactor for Bioethanol Production from Wheat Starch II. Fermentation and Economics. Energy Sources. https://doi.org/10.1080/009083101300058426
4. Somerville C, Bauer S, Brininstool G, et al (2004) Toward a systems approach to understanding plant cell walls. Science.
5. Forsberg Z, Vaaje-Kolstad G, Westereng B, et al (2011) Cleavage of cellulose by a CBM33 protein. Protein Sci. https://doi.org/10.1002/pro.689
6. Phillips CM, Beeson WT, Cate JH, Marletta MA (2011) Cellobiose dehydrogenase and a copper-dependent polysaccharide monooxygenase potentiate cellulose degradation by Neurospora crassa. ACS Chem Biol. https://doi.org/10.1021/cb200351
7. Quinlan RJ, Sweeney MD, Lo Leggio L, et al (2011) Insights into the oxidative degradation of cellulose by a copper metalloenzyme that exploits biomass components. Proc Natl Acad Sci. https://doi.org/10.1073/pnas.1105776108
8. Johansen KS (2016) Discovery and industrial applications of lytic polysaccharide monooxygenases. Biochem Soc Trans. https://doi.org/10.1042/bst20150204
9.
10. Beeson WT, Vu VV, Span EA, et al (2015) Cellulose Degradation by Polysaccharide Monooxygenases. Annu Rev Biochem. https://doi.org/10.1146/annurev-biochem-060614-034439
11. Vermaas JV, Crowley MF, Beckham GT, Payne CM (2015) Effects of lytic polysaccharide monooxygenase oxidation on cellulose structure and binding of oxidized cellulose oligomers to cellulases. J Phys Chem B. https://doi.org/10.1021/acs.jpcb.5b00778
12. Forsberg Z, Vaaje-Kolstad G, Westereng B, et al (2011) Cleavage of cellulose by a CBM33 protein. Protein Sci. https://doi.org/10.1002/pro.689
13. Vermaas JV, Crowley MF, Beckham GT, Payne CM (2015) Effects of lytic polysaccharide monooxygenase oxidation on cellulose structure and binding of oxidized cellulose oligomers to cellulases. J Phys Chem B. https://doi.org/10.1021/acs.jpcb.5b00778
14. Harris PV, Welner D, McFarland KC, et al (2010) Stimulation of lignocellulosic biomass hydrolysis by proteins of glycoside hydrolase family 61: Structure and function of a large, enigmatic family. Biochemistry. https://doi.org/10.1021/bi100009p
15. Nakagawa YS, Kudo M, Loose JSM, et al (2015) A small lytic polysaccharide monooxygenase from Streptomyces griseus targeting α- and β-chitin. FEBS J. https://doi.org/10.1111/febs.13203
16. Crouch LI, Labourel A, Walton PH, et al (2016) The contribution of non-catalytic carbohydrate-binding modules to the activity of lytic polysaccharide monooxygenases. J Biol Chem. https://doi.org/10.1074/jbc.M115.702365
17. Chabbert B, Habrant A, Herbaut M, et al (2017) Action of lytic polysaccharide monooxygenase on plant tissue is governed by cellular type. Sci Rep. https://doi.org/10.1038/s41598-017-17938-2
18. Liu B, Krishnaswamyreddy S, Muraleedharan MN, et al (2018b) Side-by-side biochemical comparison of two lytic polysaccharide monooxygenases from the white-rot fungus Heterobasidion irregulare on their activity against crystalline cellulose and glucomannan. PLoS One. https://doi.org/10.1371/journal.pone.0203430
19. Filiatrault-Chastel C, Navarro D, Haon M, et al (2019) AA16, a new lytic polysaccharide monooxygenase family identified in fungal secretomes. Biotechnol Biofuels. https://doi.org/10.1186/s13068-019-1394-y
20. Sarkar N, Aikat K (2014) Aspergillus fumigatus NITDGPKA3 provides for increased cellulase production. Int J Chem Eng 2014. https://doi.org/10.1155/2014/959845
21. Breslmayr E, Hanžek M, Hanrahan A, et al (2018) A fast and sensitive activity assay for lytic polysaccharide monooxygenase. Biotechnol Biofuels. https://doi.org/10.1186/s13068-018-1063-6
22. Altschul SF, Gish W, Miller W, et al (1990) Basic local alignment search tool. J Mol Biol. https://doi.org/10.1016/S0022-2836(05)80360-2
23. Sievers F, Higgins DG (2014) Clustal Omega, accurate alignment of very large numbers of sequences. Methods Mol Biol. https://doi.org/10.1007/978-1-62703-646-7_6
24. DeLano WL (2002) Pymol: An open-source molecular graphics tool. CCP4 Newsl Protein Crystallogr
25. Kumar S, Stecher G, Tamura K (2016) MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol. https://doi.org/10.1093/molbev/msw054
26. Basotra N, Dhiman SS, Agrawal D, et al (2019) Characterization of a novel Lytic Polysaccharide Monooxygenase from Malbranchea cinnamomea exhibiting dual catalytic behavior. Carbohydr Res. https://doi.org/10.1016/j.carres.2019.04.006
27. Bennati-Granier C, Garajova S, Champion C, et al (2015) Substrate specificity and regioselectivity of fungal AA9 lytic polysaccharide monooxygenases secreted by Podospora anserina. Biotechnol Biofuels. https://doi.org/10.1186/s13068-015-0274-3
28. Hansson H, Karkehabadi S, Mikkelsen N, et al (2017) High-resolution structure of a lytic polysaccharide monooxygenase from Hypocrea jecorina reveals a predicted linker as an integral part of the catalytic domain. J Biol Chem 292:19099–19109. https://doi.org/10.1074/jbc.M117.799767
29. Yoswathana (2010) Bioethanol Production from Rice Straw. Energy Res J 1:26–31. https://doi.org/10.3844/erjsp.2010.26.31
30. Zhang R, Liu Y, Zhang Y, et al (2019) Identification of a thermostable fungal lytic polysaccharide monooxygenase and evaluation of its effect on lignocellulosic degradation. Appl Microbiol Biotechnol 103:5739–5750. https://doi.org/10.1007/s00253-019-09928-3
31. Pronk S, Páll S, Schulz R, et al (2013) GROMACS 4.5: A high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics. https://doi.org/10.1093/bioinformatics/btt055
32. Gopalakrishnan K, Sowmiya G, Sheik SS, Sekar K (2007) Ramachandran plot on the web (2.0). Protein Pept Lett
33. Laskowski RA, MacArthur MW, Moss DS, Thornton JM (2002) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr. https://doi.org/10.1107/s0021889892009944
34. Trott O, Olson AJ (2010) Software news and update AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. https://doi.org/10.1002/jcc.21334
35. Morris GM, Huey R, Lindstrom W, et al (2009) AutoDock4 and AutoDockTools4: Automated Docking with Selective Receptor Flexibility. J Comp Chem. https://doi.org/10.1002/jcc.21256
36. Agrawal D, Kaur B, Kaur Brar K, Chadha BS (2020) An innovative approach of priming lignocellulosics with lytic polysaccharide monooxygenases prior to saccharification with glycosyl hydrolases can economize the second-generation ethanol process. Bioresour Technol 308:123257. https://doi.org/10.1016/j.biortech.2020.123257
37. Jensen MS, Klinkenberg G, Bissaro B, et al (2019) Engineering chitinolytic activity into a cellulose-active lytic polysaccharide monooxygenase provides insights into substrate specificity. J Biol Chem. https://doi.org/10.1074/jbc.RA119.010056
38. Zhou X, Zhu H (2020) Current understanding of substrate specificity and regioselectivity of LPMOs. Bioresour Bioprocess 7. https://doi.org/10.1186/s40643-020-0300-6
39. Forsberg Z, Mackenzie AK, Sørlie M, et al (2014) Structural and functional characterization of a conserved pair of bacterial cellulose-oxidizing lytic polysaccharide monooxygenases. Proc Natl Acad Sci U S A. https://doi.org/10.1073/pnas.1402771111
40. Hansson H, Karkehabadi S, Mikkelsen N, et al (2017) High-resolution structure of a lytic polysaccharide monooxygenase from Hypocrea jecorina reveals a predicted linker as an integral part of the catalytic domain. J Biol Chem 292:19099–19109. https://doi.org/10.1074/jbc.M117.799767
41. Eibinger M, Ganner T, Bubner P, Rošker S, Kracher D, Haltrich D, Ludwig R, Plank H, Nidetzky B (2014) Cellulose surface degradation by a lytic polysaccharide monooxygenase and its effect on cellulase hydrolytic efficiency. J Biol Chem. https://doi.org/10.1074/jbc.M114.602227
42. Kim IJ, Youn HJ, Kim KH (2016) Synergism of an auxiliary activity 9 (AA9) from Chaetomium globosum with xylanase on the hydrolysis of xylan and lignocellulose. Process Biochem 51:1445–1451. https://doi.org/10.1016/j.procbio.2016.06.017
43. Kim IJ, Jung JY, Lee HJ, Park HS, Jung YH, Park K, Kim KH (2015) Customized optimization of cellulase mixtures for differently pre-treated rice straw. Bioprocess Biosyst Eng 38:929–937. https://doi.org/10.1007/s00449-014-1338-7
44. Zhang R, Liu Y, Zhang Y, et al (2019) Identification of a thermostable fungal lytic polysaccharide monooxygenase and evaluation of its effect on lignocellulosic degradation. Appl Microbiol Biotechnol 103:5739–5750. https://doi.org/10.1007/s00253-019-09928-3
45. Hemsworth GR, Johnston EM, Davies GJ, Walton PH (2015) Lytic Polysaccharide Monooxygenases in Biomass Conversion. Trends Biotechnol xx:1–15. https://doi.org/10.1016/j.tibtech.2015.09.006
46. Dimarogona M, Topakas E, Olsson L, Christakopoulos P (2012) Lignin boosts the cellulase performance of a GH-61 enzyme from Sporotrichum thermophile. Bioresour Technol 110:480–487. https://doi.org/10.1016/j.biortech.2012.01.116
47. Liu B, Kognole AA, Wu M, et al (2018a) Structural and molecular dynamics studies of a C1-oxidizing lytic polysaccharide monooxygenase from Heterobasidion irregulare reveal amino acids important for substrate recognition. FEBS J 285:2225–2242. https://doi.org/10.1111/febs.14472
48. Kim IJ, Youn HJ, Kim KH (2016) Synergism of an auxiliary activity 9 (AA9) from Chaetomium globosum with xylanase on the hydrolysis of xylan and lignocellulose. Process Biochem 51:1445–1451. https://doi.org/10.1016/j.procbio.2016.06.017
49. Corrêa TLR, Júnior AT, Wolf LD, et al (2019) An actinobacteria lytic polysaccharide monooxygenase acts on both cellulose and xylan to boost biomass saccharification. Biotechnol Biofuels 12:1–14. https://doi.org/10.1186/s13068-019-1449-0
50. Hu J, Tian D, Renneckar S, Saddler JN (2018) Enzyme mediated nanofibrillation of cellulose by the synergistic actions of an endoglucanase, lytic polysaccharide monooxygenase (LPMO) and xylanase. Sci Rep 8:4–11. https://doi.org/10.1038/s41598-018-21016-6
51. Müller G, Várnai A, Johansen KS, et al (2015) Harnessing the potential of LPMO-containing cellulase cocktails poses new demands on processing conditions. Biotechnol Biofuels. https://doi.org/10.1186/s13068-015-0376-y
52. Pantoom S, Songsiriritthigul C, Suginta W (2008) The effects of the surface-exposed residues on the binding and hydrolytic activities of Vibrio carchariae chitinase A. BMC Biochem 9:1–11. https://doi.org/10.1186/1471-2091-9-2

Figure legends

Figure 1 Expression and purification of AfLPMO16 (marked with red arrow).
SDS-PAGE analysis: lane 1, flow-through; lanes 2, 3 and 4, wash; lanes 5 and 6, purified AfLPMO16. Western blot analysis of the purified protein from lanes 5 and 6 of the SDS-PAGE gel is marked as lanes W1 and W2.

Figure 2 Enzyme kinetics studies of AfLPMO16 with 2,6-DMP (mean values are plotted). (a) Chemical reaction converting 2,6-DMP to 1-coerulignone; OD at 469 nm vs. time plot. (b) LB plot (1/v vs. 1/[S]).

Figure 3 In silico analysis of AfLPMO16. (a) Schematic diagram of AfLPMO16; signal peptide: 19 amino acids, catalytic domain: 1-169 amino acids, and a serine-rich domain: 169-271 amino acids. (b) Multiple sequence alignment of AA16 LPMOs, C1-oxidizing, and C1/C4-oxidizing AA10 LPMOs. Conserved sequences are highlighted. The red arrow indicates the amino acid responsible for regioselectivity, the black arrow indicates the amino acid responsible for substrate specificity, and the black box marks the AA16 conserved motif. (c) The electrostatic surface potential of the AfLPMO16 model structure at pH 6.0; blue and red represent positive and negative potential surfaces, respectively. The area surrounded by the ring represents the catalytic site.

Figure 4 Phylogenetic relationship of AfLPMO16 with AA10 LPMOs. A neighbor-joining tree from MEGA showing the C1 (bacterial) and C2 (fungal) clades, with the C2 clade further divided into the C2.1 (Penicillium and others) and C2.2 (Aspergillus) subclades.

Figure 5 Model structure and molecular docking of AfLPMO16. (a) Predicted three-dimensional model of AfLPMO16 showing the functional loops LS (orange), L2 (blue), L3 (green), and LC (magenta) surrounding the copper active site. (b) Histidine brace (His20, His109) of AfLPMO16 surrounding the copper metal. (c) Amino acids involved in substrate binding: Gln48, Gln181, Ser178, His109, His20, Asn54, Asp50, Tyr52, Glu58.

Figure 6 Polysaccharide degradation activity of AfLPMO16. (a) CMC depolymerization: estimation of reducing sugar with increasing amounts of AfLPMO16. (b) PASC hydrolysis: relative absorbance at 405 nm vs. AfLPMO16 quantity. Results are the mean of at least three experiments. The bars represent the standard deviation (SD).

Figure 7 Boosting effect of AfLPMO16. (a) Hydrolysis of alkali pre-treated rice straw: the light-grey bar indicates cellulase only, the deep-grey bar indicates heat-inactivated AfLPMO16 with cellulase, and the dark-grey and black bars indicate cellulase with two different quantities of AfLPMO16. (b) Avicel hydrolysis, reducing sugar estimation: the light-grey bar indicates cellulase only, the deep-grey bar indicates heat-inactivated AfLPMO16 with cellulase, and the dark-grey and black bars indicate cellulase with two different quantities of AfLPMO16. (c) Synergistic effect: light-grey bars indicate biomass hydrolysis by two different concentrations of AfLPMO16, the dark-grey bar indicates biomass treated with cellulase only, and the black bar indicates biomass treated with AfLPMO16 and cellulase combined. Error bars represent the standard deviation of experiments run in triplicate. Different numbers of asterisks (*) indicate a significant difference in glucose release in the presence of AfLPMO16 by one-way ANOVA followed by Student's t-test (P<<0.05).

Table 1: Enzyme kinetics of AfAA16 with 2,6-DMP as a substrate.

Enzyme kinetics parameter | Value
Vmax (U/mg) | 0.153
Km (mM) | 5.4
kcat (min⁻¹) | 277.67
Table 2: Lignocellulosic biomass hydrolysis enhancement by LPMOs

Substrate (biomass) | Cellulase | LPMO | Fold increase | % increase | Reference
Wheat straw | Celluclast (Novozymes) | StCel61a (AA9) | - | 20% | [46]
Corn stover | Celluclast (Novozymes) | TaAA9 | | 25% | [50]
Raw rice straw | Celluclast (Novozymes) | CgAA9 | 1.1-1.2 | - | [48]
Raw rice straw | Cellulase (MP Biomedicals) | AfLPMO16 | 2 | ~100% | -

10_1101-2020_05_10_087288 ---- Heparan sulfate proteoglycans as attachment factor for SARS-CoV-2

Lin Liu,1,5 Pradeep Chopra,1,5 Xiuru Li,1,5 Kim M. Bouwman,2 S. Mark Tompkins,3 Margreet A. Wolfert,1,2 Robert P. de Vries,2 and Geert-Jan Boons1,2,4,*

1Complex Carbohydrate Research Center, University of Georgia, 315 Riverbend Road, Athens, GA 30602, USA
2Department of Chemical Biology and Drug Discovery, Utrecht Institute for Pharmaceutical Sciences, and Bijvoet Center for Biomolecular Research, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
3Center for Vaccines and Immunology, University of Georgia, Athens, GA 30602, USA
4Department of Chemistry, University of Georgia, Athens, GA 30602, USA
5These authors contributed equally to this work
*Corresponding author.
E-mail: gjboons@ccrc.uga.edu or g.j.p.h.boons@uu.nl

bioRxiv preprint doi: https://doi.org/10.1101/2020.05.10.087288; this version posted January 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

ABSTRACT

Severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) is causing an unprecedented global pandemic demanding the urgent development of therapeutic strategies. Microarray binding experiments using an extensive heparan sulfate (HS) oligosaccharide library showed that the receptor binding domain (RBD) of the spike of SARS-CoV-2 can bind HS in a length- and sequence-dependent manner. Hexa- and octasaccharides composed of IdoA2S-GlcNS6S repeating units were identified as optimal ligands. Surface plasmon resonance (SPR) showed that the SARS-CoV-2 spike protein binds heparin with much higher affinity (KD = 55 nM) than the RBD alone (KD = 1 µM). We also found that heparin does not interfere with angiotensin-converting enzyme 2 (ACE2) binding or proteolytic processing of the spike. Our data support a model in which HS functions as the point of initial attachment for SARS-CoV-2 infection. Tissue staining studies using biologically relevant tissues indicate that heparan sulfate proteoglycan (HSPG) is a critical attachment factor for the virus. Collectively, our results highlight the potential of using HS oligosaccharides as a therapeutic agent by inhibiting SARS-CoV-2 binding to target cells.

KEYWORDS

SARS-CoV-2, coronavirus, heparan sulfate, heparin, spike glycoprotein, microarray, surface plasmon resonance

INTRODUCTION

The SARS-CoV-2 pandemic demands urgent development of therapeutic strategies. An attractive approach is to interfere with the attachment of the virus to the host cell.1 The entry of SARS-CoV-2 into cells is initiated by binding of the transmembrane spike (S) glycoprotein of the virus to angiotensin-converting enzyme 2 (ACE2) of the host.2 SARS-CoV is closely related to SARS-CoV-2 and employs the same receptor.3 The spike protein of SARS-CoV-2 comprises two subunits: S1 is responsible for binding to the host receptor, whereas S2 promotes membrane fusion. The C-terminal domain (CTD) of S1 harbors the receptor binding domain (RBD).4 It is known that the spike protein of a number of human coronaviruses can bind to a secondary receptor, or co-receptor, to facilitate cell entry. For example, MERS-CoV employs sialic acid as a co-receptor along with its main receptor DPP4.5 Human CoV-NL63, which also utilizes ACE2 as the receptor, uses heparan sulfate (HS) proteoglycans as a co-receptor.6 It has also been shown that entry of SARS-CoV pseudo-typed virus into Vero E6 and Caco-2 cells can be substantially inhibited by heparin or by treatment with heparin lyases, indicating the importance of HS for infectivity.7

There are indications that the SARS-CoV-2 spike also interacts with HS.
One early report showed that heparin can induce a conformational change in the RBD of SARS-CoV-2.8 A combined SPR and computational study indicated that glycosaminoglycans can bind to the proteolytic cleavage site of the S1 and S2 protein.9-10 Several reports have indicated that heparin or related structures can inhibit the infection process of SARS-CoV-2 in different cell lines.11-14

HS are highly complex O- and N-sulfated polysaccharides that reside as major components on the cell surface and in the extracellular matrix of all eukaryotic cells.15 Various proteins interact with HS, thereby regulating many biological and disease processes, including cell adhesion, proliferation, differentiation, and inflammation. They are also used by many viruses, including herpes simplex virus (HSV), Dengue virus, HIV, and various coronaviruses, as receptor or co-receptor.16-18

The biosynthesis of HS is highly regulated, and the length, degree, and pattern of sulfation of HS can differ substantially between different cell types. The so-called "HS sulfate code hypothesis" is based on the notion that the expression of specific HS epitopes by cells makes it possible to recruit specific HS-binding proteins, thereby controlling a multitude of biological processes.19-20 In support of this hypothesis, several studies have shown that HS-binding proteins exhibit preferences for specific HS oligosaccharide motifs.21-22 Therefore, we were compelled to investigate whether the spike of SARS-CoV-2 recognizes specific HS motifs. Such insight is expected to pave the way to develop inhibitors of viral cell binding and entry.
Previously, we prepared an unprecedented library of structurally well-defined heparan sulfate oligosaccharides that differ in chain length, backbone composition and sulfation pattern.23-24 This collection of HS oligosaccharides was used to develop a glycan microarray for the systematic analysis of selectivity of HS-binding proteins. Using this microarray platform in conjugation with detailed binding studies, we found that the RBD domain of SARS-CoV-2-spike can bind HS in a length- and sequence-dependent manner, and the observations support a model in which the RBD confers sequence selectivity, and the affinity of binding is enhanced by additional interactions with other HS binding sites in for example the S1/S2 proteolytic cleavage site.9 In addition, it was found that heparin does not interfere in ACE binding or proteolytic processing of the spike. Tissue staining studies using biologically relevant tissues indicate that heparan sulfate proteoglycans (HSPG) is a critical attachment factor for the virus. RESULTS AND DISCUSSION Surface plasma resonances (SPR) experiments were performed to probe whether the RBD domain of SARS-CoV-2 spike protein can bind with heparin. Biotinylated heparin was immobilized on a streptavidin-coated sensor chip and binding experiments were carried out by employing as analytes different concentrations of RBD, monomeric spike protein and trimeric spike protein of SARS-CoV-2. The spike glycoprotein of SARS-CoV- 2 (S1+S2, extra cellular domain, amino acid residue 1-1213) was expressed in insect cells having a C-terminal His-tag.25-26 Recombinant SARS-CoV-2-RBD, containing amino acid residue 319-541, was expressed in HEK293 cells also with a C-terminal His-tag.25-26 The spike protein trimer, having the furin cleavage site deleted and bearing with two stabilizing mutations, was expressed in HEK293 cells with a C-terminal His-tag. Representative (which was not certified by peer review) is the author/funder. All rights reserved. 
sensorgrams are shown in Fig. 1. KD values were determined using a 1:1 Langmuir binding model.

Figure 1. SPR sensorgrams representing the concentration-dependent kinetic analysis of the binding of immobilized heparin with SARS-CoV-2 related proteins (A) RBD, (B) spike monomer, and (C) spike trimer. [Panels plot response (RU) against time (s) for 2-fold analyte dilution series: RBD, 1100 to 17 nM, KD = 1000 nM; spike monomer, 446 to 6.97 nM, KD = 55 nM; spike trimer, 446 to 6.97 nM, KD = 64 nM.]

The RBD binds to heparin with a moderate affinity, with a KD value of ~1 µM. The full-length monomeric spike protein showed a much higher binding affinity, with a KD value of 55 nM. Previously reported computational studies have indicated that the RBD may harbor an additional HS binding domain located either within or adjacent to the receptor binding motif.14, 27 It has also been suggested that another HS-binding site resides in the S1/S2 proteolytic cleavage site of the spike.9 Thus, the high affinity of the monomeric spike protein is probably due to the presence of an additional binding site in the spike protein, which greatly enhances its binding to heparin. The trimeric spike protein displayed a binding affinity (KD = 64 nM) similar to that of the monomer. One of the putative heparin binding sites in the trimeric spike protein, the S1/S2 proteolytic cleavage site, was mutated.25 Thus, a possible increase in avidity due to multivalency may have been offset by the lack of a secondary binding site.

Intrigued by these results, we examined whether the SARS-CoV-2 proteins bind to heparan sulfate in a sequence-selective manner. We have developed an HS microarray having well over 100 unique di-, tetra-, hexa-, and octa-saccharides differing in backbone composition and sulfation pattern23-24 (Fig. 2C). The synthetic HS oligosaccharides contain an anomeric aminopentyl linker, allowing printing on N-hydroxysuccinimide (NHS)-active glass slides. The HS oligosaccharides were printed at 100 µM concentration in replicates of 6 by non-contact piezoelectric printing. The quality of the HS microarray was validated using various well-characterized HS-binding proteins. Sub-arrays were incubated with different concentrations of SARS-CoV-2 RBD and spike protein in a binding buffer (pH 7.4, 20 mM Tris, 150 mM NaCl, 2 mM CaCl2, 2 mM MgCl2 with 1% BSA and 0.05% Tween-20) at room temperature for 1 h. After washing and drying, the sub-arrays were exposed to an anti-His antibody labeled with AlexaFluor® 647 for another hour, washed and dried, and binding was detected by fluorescence scanning. To analyze the data, the compounds were arranged according to increasing backbone length, and within each group by increasing numbers of sulfates.
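The ordering rule described above (backbone length first, then number of sulfates) can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names are hypothetical, it assumes the residue notation of Fig. 2C in which residues are separated by dashes and every capital S marks one sulfate group (2S, 3S, 4S, 6S, NS), and it is not the procedure the authors used to lay out the figure.

```python
from typing import Dict, List


def backbone_length(structure: str) -> int:
    """Number of monosaccharide residues in a dash-separated structure."""
    return len(structure.split("-"))


def sulfate_count(structure: str) -> int:
    """Count sulfates, assuming each capital 'S' (2S, 3S, 4S, 6S, NS) is one group."""
    return structure.count("S")


def order_compounds(compounds: Dict[int, str]) -> List[int]:
    """Sort compound numbers by backbone length, then by number of sulfates."""
    return sorted(
        compounds,
        key=lambda n: (backbone_length(compounds[n]), sulfate_count(compounds[n]), n),
    )


# A handful of structures copied from the Fig. 2C compound list.
compounds = {
    80: "IdoA2S-GlcNS6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS6S",
    78: "IdoA2S-GlcNS6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS6S",
    76: "GlcA-GlcNS6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS6S",
    61: "GlcA-GlcNS6S-GlcA-GlcNS6S-GlcA-GlcNS6S",
    56: "IdoA2S-GlcNS6S-IdoA2S-GlcNS6S",
    4: "IdoA2S-GlcNS6S",
}

ordered = order_compounds(compounds)
for number in ordered:
    s = compounds[number]
    print(number, backbone_length(s), sulfate_count(s))
```

Sorting on a (length, sulfates) tuple keeps compounds of the same chain length adjacent, which is what makes length- and sulfation-dependent trends visible when the fluorescence values are plotted in that order.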
Intriguingly, the proteins showed a strong preference for specific HS oligosaccharides (Fig. 2A, B). Furthermore, it was found that the RBD, monomeric spike protein, and trimeric spike protein exhibit similar binding patterns (Fig. S1). Compounds showing strong responsiveness (76, 77, 78, and 80) are composed of tri-sulfated repeating units (IdoA2S-GlcNS6S). The binding is length-dependent: HS oligosaccharides 80, (IdoA2S-GlcNS6S)4, and 78, (IdoA2S-GlcNS6S)3, which have four and three repeating units, respectively, showed the strongest binding. On the other hand, tetrasaccharide 56, (IdoA2S-GlcNS6S)2, which has the same repeating unit structure, gave very low responsiveness. A similar observation was made for disaccharide 4 (IdoA2S-GlcNS6S).

Figure 2. Binding of synthetic heparan sulfate oligosaccharides to SARS-CoV-2 spike and RBD by microarray. (A) Binding of SARS-CoV-2 spike (10 µg/mL) to the heparan sulfate microarray. The strongest binding structures are shown as inserts. (B) Binding of SARS-CoV-2 RBD (30 µg/mL) to the heparan sulfate microarray. (C) Compound numbering and structures of the heparan sulfate library.
[Figure 2A, B: bar charts of fluorescence (AU) for compounds 1-80, grouped by backbone length (di-, tetra-, hexa-, octa-saccharide) and by number of sulfates; compounds 4, 56, 76, 77, 78 and 80 are highlighted.]

[Figure 2C: compound numbering and structures]
1 IdoA-GlcNAc6S
2 GlcA-GlcNAc6S
3 IdoA2S-GlcNAc6S
4 IdoA2S-GlcNS6S
5 GlcA-GlcNAc-GlcA-GlcNAc
6 GlcA-GlcNAc-IdoA-GlcNAc
7 GlcA-GlcNAc-IdoA2S-GlcNAc
8 GlcA-GlcNAc-GlcA2S-GlcNAc
9 GlcA-GlcNAc-IdoA-GlcNAc6S
10 IdoA-GlcNAc6S-GlcA-GlcNAc
11 IdoA2S-GlcNAc-GlcA-GlcNAc
12 IdoA-GlcNAc-IdoA-GlcNAc6S
13 IdoA-GlcNAc6S-IdoA-GlcNAc
14 GlcA-GalNAc-GlcA-GalNAc4S
15 IdoA-GlcNS-IdoA-GlcNAc
16 IdoA-GlcNAc-IdoA-GlcNS
17 IdoA-GlcNAc6S-IdoA-GlcNAc6S
18 IdoA-GlcNAc6S-GlcA-GlcNAc6S
19 GlcA-GlcNAc6S-IdoA-GlcNAc6S
20 GlcA-GlcNAc6S-GlcA-GlcNAc6S
21 GlcA-GlcNAc-GlcA2S-GlcNAc6S
22 GlcA-GlcNAc-IdoA-GlcNS6S
23 GlcA-GlcNAc-IdoA2S-GlcNAc6S
24 IdoA2S-GlcNAc6S-GlcA-GlcNAc
25 GlcA-GalNAc4S-GlcA-GalNAc4S
26 GlcA-GlcNS6S-GlcA-GlcNAc
27 IdoA-GlcNS-IdoA-GlcNS
28 GlcA-GlcNAc6S-IdoA2S-GlcNAc6S
29 IdoA-GlcNAc6S-IdoA2S-GlcNAc6S
30 GlcA-GlcNS-IdoA2S-GlcNS
31 GlcA-GlcNS-GlcA2S-GlcNS
32 GlcA-GlcNAc-IdoA2S-GlcNS6S
33 GlcA-GlcNS-IdoA-GlcNS6S
34 IdoA-GlcNS6S-GlcA-GlcNS
35 IdoA2S-GlcNAc6S-GlcA-GlcNAc6S
36 IdoA2S-GlcNS-GlcA-GlcNS
37 GlcA-GlcNS6S-IdoA-GlcNS
38 IdoA-GlcNS-IdoA-GlcNS6S
39 GlcA-GlcNAc6S-GlcA2S-GlcNAc6S
40 IdoA-GlcNS6S-GlcA-GlcNS6S
41 GlcA-GlcNS6S-IdoA-GlcNS6S
42 GlcA-GlcNS6S-GlcA-GlcNS6S
43 IdoA-GlcNS6S-IdoA-GlcNS6S
44 GlcA-GlcNS-GlcA2S-GlcNS6S
45 GlcA-GlcNS-IdoA2S-GlcNS6S
46 IdoA2S-GlcNS6S-GlcA-GlcNS
47 IdoA2S-GlcNAc6S-IdoA2S-GlcNAc6S
48 IdoA-GlcNS-IdoA2S-GlcNS6S
49 IdoA2S-GlcNS6S-IdoA-GlcNAc6S
50 GlcA-GlcNS6S-IdoA2S-GlcNS6S
51 IdoA2S-GlcNS6S-GlcA-GlcNS6S
52 IdoA-GlcNS6S-IdoA2S-GlcNS6S
53 GlcA-GlcNS3S-IdoA2S-GlcNS6S
54 IdoA2S-GlcNS6S-IdoA2S-GlcNS
55 GlcA-GlcNS6S-GlcA2S-GlcNS6S
56 IdoA2S-GlcNS6S-IdoA2S-GlcNS6S
57 GlcA-GlcNS3S6S-IdoA2S-GlcNS6S
58 GlcA-GlcNAc-IdoA2S-GlcNAc6S-GlcA-GlcNAc
59 GlcA-GlcNS-IdoA2S-GlcNS6S-GlcA-GlcNS
60 GlcA-GlcNAc-IdoA2S-GlcNS6S-IdoA2S-GlcNS
61 GlcA-GlcNS6S-GlcA-GlcNS6S-GlcA-GlcNS6S
62 GlcA-GlcNS6S-IdoA-GlcNS6S-GlcA-GlcNS6S
63 GlcA-GlcNS6S-GlcA-GlcNS6S-IdoA-GlcNS6S
64 GlcA-GlcNS6S-IdoA-GlcNS6S-IdoA-GlcNS6S
65 GlcA-GlcNS-IdoA2S-GlcNS6S-IdoA2S-GlcNS
66 GlcA-GlcNAc6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS
67 GlcA-GlcNS6S-IdoA2S-GlcNS6S-GlcA-GlcNS6S
68 GlcA-GlcNS6S-IdoA2S-GlcNS6S-IdoA-GlcNS6S
69 GlcA-GlcNS6S-GlcA-GlcNS6S-IdoA2S-GlcNS6S
70 GlcA-GlcNS6S-IdoA-GlcNS6S-IdoA2S-GlcNS6S
71 GlcA-GlcNS6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS
72 GlcA-GlcNS6S-IdoA2S-GlcNS3S6S-GlcA-GlcNS6S
73 GlcA-GlcNS6S-IdoA2S-GlcNS3S6S-IdoA-GlcNS6S
74 GlcA-GlcNS6S-GlcA-GlcNS3S6S-IdoA2S-GlcNS6S
75 GlcA-GlcNS6S-IdoA-GlcNS3S6S-IdoA2S-GlcNS6S
76 GlcA-GlcNS6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS6S
77 GlcA-GlcNS6S-IdoA2S-GlcNS3S6S-IdoA2S-GlcNS6S
78 IdoA2S-GlcNS6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS6S
79 GlcA-GlcNS6S-IdoA-GlcNS-IdoA2S-GlcNS6S-IdoA-GlcNAc6S
80 IdoA2S-GlcNS6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS6S-IdoA2S-GlcNS6S
The structure-binding data show that perturbations in the backbone or sulfation pattern lead to substantial reductions in binding. The importance of the IdoA2S residue is highlighted by comparing hexasaccharide 78 with 76, in which a single IdoA2S in the distal disaccharide repeating unit is replaced with GlcA. This modification leads to a substantial reduction in responsiveness. Further replacements of IdoA2S with GlcA in compound 76 completely abolish binding, as is evident for compounds 69, 67, and 61. The structure-activity data also show that the 2-O-sulfates are crucial, and binding was lost when such functionalities were not present (76 vs. 70, 68, and 64). Lack of one or more 6-O-sulfates also resulted in substantial reductions in binding (76 vs. 71 and 65). Although the SARS-CoV-2 spike and RBD showed similar selectivities, the binding of the spike appeared stronger, and much higher fluorescence readings were observed at the same protein concentration.

Next, we examined whether HS oligosaccharide 80 can interfere with the interaction of the spike or RBD with immobilized heparin. Thus, the spike protein (150 nM) or RBD (2.4 µM) was pre-mixed with different concentrations of compound 80 and then used as analyte. The IC50 values were determined by non-linear fitting of log(inhibitor) vs. response using a variable slope (Fig. S2). The IC50 values for the spike protein and RBD are 38 nM and 264 nM, respectively.

To further determine the possible role of HS in the infection process, we examined the binding affinities of the spike proteins for ACE2 and compared these with the binding affinities for heparin. Biotinylated ACE2 was immobilized on a streptavidin-coated sensor chip and binding experiments were performed with different concentrations of the SARS-CoV-2 derived proteins.
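The variable-slope fit of log(inhibitor) versus response used above for the compound 80 competition experiment can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (generated from the model itself, not taken from Fig. S2), the function names are hypothetical, and a plain grid search is swapped in for the iterative non-linear regression that a curve-fitting package would normally perform. The sign convention uses a positive Hill slope for a falling inhibition curve.

```python
import math


def remaining_fraction(log_conc, log_ic50, hill):
    """Fraction of uninhibited signal left at a given log10 inhibitor concentration."""
    return 1.0 / (1.0 + 10 ** ((log_conc - log_ic50) * hill))


def variable_slope(log_conc, bottom, top, log_ic50, hill):
    """Four-parameter logistic: response falls from top to bottom as inhibitor rises."""
    return bottom + (top - bottom) * remaining_fraction(log_conc, log_ic50, hill)


def fit_variable_slope(log_concs, responses):
    """Grid-search log IC50 and Hill slope; solve top and bottom by linear least squares."""
    best = None
    for i in range(0, 401):              # log10(IC50) from 0 to 4 in steps of 0.01
        log_ic50 = i / 100.0
        for j in range(5, 31):           # Hill slope from 0.5 to 3.0 in steps of 0.1
            hill = j / 10.0
            f = [remaining_fraction(x, log_ic50, hill) for x in log_concs]
            # For fixed (log_ic50, hill) the model is linear: y = bottom*(1-f) + top*f.
            a11 = sum((1 - fi) ** 2 for fi in f)
            a12 = sum((1 - fi) * fi for fi in f)
            a22 = sum(fi ** 2 for fi in f)
            b1 = sum((1 - fi) * y for fi, y in zip(f, responses))
            b2 = sum(fi * y for fi, y in zip(f, responses))
            det = a11 * a22 - a12 ** 2
            if abs(det) < 1e-12:
                continue
            bottom = (b1 * a22 - b2 * a12) / det
            top = (a11 * b2 - a12 * b1) / det
            sse = sum(
                (bottom * (1 - fi) + top * fi - y) ** 2 for fi, y in zip(f, responses)
            )
            if best is None or sse < best[0]:
                best = (sse, bottom, top, log_ic50, hill)
    _, bottom, top, log_ic50, hill = best
    return {"bottom": bottom, "top": top, "ic50": 10 ** log_ic50, "hill": hill}


# Synthetic, noise-free dose-response data (concentrations in nM); not measured values.
concs_nM = [1, 3, 10, 30, 100, 300, 1000, 3000]
log_concs = [math.log10(c) for c in concs_nM]
true_params = dict(bottom=5.0, top=100.0, log_ic50=1.6, hill=1.0)
responses = [variable_slope(x, **true_params) for x in log_concs]

fit = fit_variable_slope(log_concs, responses)
print(fit)
```

Fitting on the log-concentration axis, as the text describes, spaces the dilution series evenly and makes the IC50 the midpoint of the sigmoid between the fitted top and bottom plateaus.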
Representative sensorgrams for the RBD, monomeric spike protein, and trimeric spike protein are shown in Fig. 3. KD values of 3.6 nM, 24.5 nM and 0.7 nM, respectively, were determined using a 1:1 Langmuir binding model, which are in agreement with reported data.28 This shows convincingly that the RBD has a much higher affinity for ACE2 than for heparin.

Figure 3. Sensorgrams representing the concentration-dependent kinetic analysis of the binding of immobilized ACE2 with SARS-CoV-2 derived proteins (A) RBD, (B) spike monomer, and (C) spike trimer. (D) Comparison of the KD values of heparin binding and ACE2 binding to SARS-CoV-2 related proteins. [Panels A-C plot response (RU) against time (s) for 2-fold analyte dilution series: RBD, 100 to 3.125 nM; spike monomer, 200 to 6.25 nM; spike trimer, 200 to 3.125 nM.]

(D)
Protein          Heparin binding KD (nM)   ACE2 binding KD (nM)
RBD              ~1000                     3.6
Spike monomer    55                        24.5
Spike trimer     64                        0.7

A number of reports have indicated that heparin and related compounds can block infection of cells by SARS-CoV-2.
Therefore, we were compelled to investigate the molecular mechanisms by which heparin blocks viral entry.2, 10, 13 It is possible that the anti-viral properties of heparin are due to binding to the RBD, thereby blocking the interaction with ACE2. Alternatively, heparin may interfere with the proteolytic processing of the spike protein, thereby preventing membrane fusion. In this respect, the spike of SARS-CoV-2 contains a unique furin cleavage site, which is not present in other CoVs and has been proposed to contribute to high infectivity,29 because cleavage of the spike protein is a prerequisite for membrane fusion. Modeling studies have indicated that the furin cleavage site may harbor a binding site for HS.27 Finally, HS may function as an attachment factor and the addition of exogenous heparin may interfere with this process.

To examine whether heparin can interfere with binding of the spike to ACE2, we performed microarray experiments in which biotinylated Fc-tagged ACE2 (50 µg/mL) was printed onto streptavidin-coated microarray slides. The printing quality was confirmed by using a goat anti-human Fc antibody conjugated with AlexaFluor® 647 (Fig. S3A). Next, His-tagged RBD and monomeric spike protein were premixed with different concentrations of heparin, and binding of the proteins to immobilized ACE2 was detected with an anti-His antibody. Soluble human ACE2 was used as a positive control. Although ACE2 efficiently inhibited RBD and spike binding (Fig. S3B, C), no substantial changes in binding were observed in the presence of 10 µg/mL and 100 µg/mL of heparin (Fig. 4A, B). Furthermore, we immobilized the RBD and monomeric spike proteins on ELISA plates and assayed the binding of ACE2 to the spike proteins in the presence or absence of heparin (Fig. 4C, D). Soluble human ACE2 was used as a positive control, which as expected exhibited potent inhibition. At 100 µg/mL of heparin, no inhibition of binding was observed for either RBD or monomeric spike protein.
These results indicate that heparin does not substantially interfere with the interaction of the spike with ACE2.

To investigate whether the binding of heparin can hinder cleavage of the spike protein by furin, we exposed the monomeric spike protein to furin in the presence of different concentrations of heparin and examined protein cleavage by SDS-PAGE. The spike protein was readily cleaved by furin even in the presence of a high concentration of heparin (400 µg/mL), while 50 µg/mL of a known furin inhibitor completely abolished cleavage.

Figure 4. (A) Influence of heparin on the binding of His-tagged RBD or (B) His-tagged spike monomer to biotinylated human ACE2 immobilized on streptavidin-coated microarray slides. Detection of RBD and spike was accomplished using an anti-His antibody labeled with AlexaFluor 647. (C) Influence of heparin on the binding of biotinylated human ACE2 to RBD and (D) to spike monomer immobilized on high surface microtiter plates. Binding was detected by treatment with streptavidin-HRP followed by addition of a colorimetric HRP substrate. (E) Western blot analysis of furin-mediated cleavage of spike monomer in the presence and absence of heparin or a known furin inhibitor (hexa-D-arginine).

It is also possible that heparin interferes with the initial attachment of the virus to the glycocalyx, thereby preventing infection.
Therefore, we examined the importance of HS for binding of trimeric RBD to relevant tissues.30 Ferrets are a susceptible animal model for SARS-CoV-2,31-32 and closely related minks are easily infected on farms.33 Formalin-fixed, paraffin-embedded lung tissue slides resemble the complex membrane structures to which the spike protein needs to bind before it can engage with ACE2 for cell entry. Expression of ACE2 was assessed using an ACE2 antibody, allowing us to compare its binding with that of the SARS-CoV-2 RBD protein in terms of localization and dependency on HS. The ACE2 antibody (Fig. 5A) and the RBD trimer (Fig. 5B) bound efficiently to the ferret lung tissues. We also examined a commonly used heparan sulfate antibody, which bound efficiently to ferret lung tissue, indicating the omnipresence of HS. After overnight exposure to heparanase (HPSE), the ACE2 antibody staining was mostly unaffected, indicating HSPG-independent binding. On the other hand, the SARS-CoV-2 RBD trimer was not able to engage with the ferret lung tissue slide after HPSE treatment. No staining was observed with the heparan sulfate antibody (10E4) after HPSE treatment, indicating that all HS had been removed. Thus, these results indicate that HS is required for initial cell attachment before the spike can engage with ACE2.
Figure 5. Binding of ACE2 antibody, SARS-CoV-2 RBD, and heparan sulfate antibody to ferret lung serial tissue slides. (A) ACE2 antibody staining without and after HPSE treatment. (B) SARS-CoV-2 RBD staining without and after HPSE treatment. (C) Heparan sulfate antibody (10E4) staining without and after HPSE treatment. HPSE treatment was achieved by overnight incubation of the tissues with HPSE (0.2 µg/mL) at 37 °C.

DISCUSSION AND CONCLUSIONS

The glycan microarray and SPR results indicate that the spike of SARS-CoV-2 can bind HS in a length- and sequence-dependent manner, and hexa- and octa-saccharides composed of IdoA2S-GlcNS6S repeating units have been defined as optimal ligands. The data support a model in which the RBD of the spike confers sequence specificity and an additional HS binding site in the S1/S2 proteolytic cleavage site9 enhances the avidity of binding, probably by non-specific interactions. In a bioRxiv preprint, we presented, for the first time, experimental support for such a model, and subsequent papers have confirmed that the RBD harbors an HS binding site.
Although IdoA2S-GlcNS6S sequons are abundantly present in heparin, they are a minor component of HS.34 Interestingly, it has been reported that the expression of the (GlcNS6S-IdoA2S)3 motif is highly regulated and plays a crucial role in cell behavior and disease, including endothelial cell activation.35 Severe thrombosis in COVID-19 patients is associated with endothelial dysfunction,36 and a connection may exist between the ability of SARS-CoV-2 to bind HS and thrombotic disorders. It is also possible that HS is a determinant of cell and tissue tropism.

A number of reports have shown that heparin and related products can block infection by pseudotyped virus or authentic SARS-CoV-2 virus.12-14, 27 We explored the possibility that binding of heparin blocks the RBD from interacting with ACE2. However, in two experimental formats such properties were not observed. We found that the affinity of the RBD for heparin is much lower than that for ACE2, providing a rationale for the inability of heparin to inhibit the binding of RBD or spike to ACE2. One computational study has indicated that ACE2 and HS bind to the same region of the RBD.27 Another docking study located the HS binding site adjacent to the ACE2-binding site and inferred a model in which a ternary complex is formed between RBD, HS and ACE2.14 Further studies are required to determine the exact location of the HS binding site, which in turn may provide a better understanding of the interplay between binding of the spike to ACE2 and to heparin.

We employed physiologically relevant tissues to explore the importance of HS for SARS-CoV-2 adhesion and demonstrated that HPSE treatment greatly reduces binding of the RBD but not that of the ACE2 antibody. The data support a model in which HS functions as a host attachment factor that facilitates SARS-CoV-2 infection.
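The affinity argument above can be made concrete with a small worked calculation. The sketch below takes the KD values reported in Fig. 3D and applies the equilibrium form of the 1:1 Langmuir model, theta = C / (C + KD), to compare how much of each immobilized ligand would be occupied at the same analyte concentration. The function names and the 100 nM concentration are illustrative assumptions, the RBD-heparin KD is only approximate (~1 µM), and the calculation ignores avidity and any competition between the two ligands.

```python
def fractional_occupancy(conc_nM, kd_nM):
    """Equilibrium 1:1 Langmuir occupancy: theta = C / (C + KD)."""
    return conc_nM / (conc_nM + kd_nM)


# KD values (nM) as reported in Fig. 3D; the RBD-heparin value is approximate.
KD_NM = {
    "RBD": {"heparin": 1000.0, "ACE2": 3.6},
    "spike monomer": {"heparin": 55.0, "ACE2": 24.5},
    "spike trimer": {"heparin": 64.0, "ACE2": 0.7},
}


def affinity_ratio(protein):
    """How many times tighter the protein binds ACE2 than heparin (KD ratio)."""
    kd = KD_NM[protein]
    return kd["heparin"] / kd["ACE2"]


ANALYTE_NM = 100.0  # arbitrary illustrative concentration, not from the paper

for protein, kd in KD_NM.items():
    print(
        protein,
        round(affinity_ratio(protein), 1),
        round(fractional_occupancy(ANALYTE_NM, kd["heparin"]), 3),
        round(fractional_occupancy(ANALYTE_NM, kd["ACE2"]), 3),
    )
```

Under these assumptions the RBD shows by far the largest gap between ACE2 and heparin binding, which is consistent with the observation that heparin does not displace the RBD from ACE2.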
The current clinical guidelines call for the use of unfractionated heparin or low molecular weight heparin (LMWH) for the treatment of systemic clotting in all COVID-19 patients in the absence of contraindications.37-38 Heparin treatment may have additional benefits and may compete with the binding of the spike protein to cell surface HS, thereby preventing infectivity. Our data suggest that non-anticoagulant heparin or HS preparations can be developed that reduce cell binding and infectivity without a risk of causing bleeding. In this respect, administration of heparin requires great care because its anticoagulant activity can result in excessive bleeding. Antithrombin III (AT-III), which confers anticoagulant activity, binds a specific pentasaccharide, GlcNAc(6S)-GlcA-GlcNS(3S)(6S)-IdoA2S-GlcNS(6S), embedded in HS or heparin. Removal of the sulfate at C-3 of N-sulfoglucosamine (GlcNS3S) of the pentasaccharide results in a 10^5-fold reduction in binding affinity.39 Importantly, such a functionality is not present in the identified HS ligand of SARS-CoV-2 spike, and therefore compounds can be developed that inhibit cell binding but do not interact with AT-III. As a result, such preparations can be used at higher doses without causing adverse side effects. Our data also show that multivalent interactions of the spike with HS result in high avidity of binding. This observation provides opportunities to develop glycopolymers modified by HS oligosaccharides as inhibitors of SARS-CoV-2 cell binding to prevent or treat COVID-19.

ACKNOWLEDGMENTS

This research was supported by the National Institutes of Health (P41GM103390 and R01HL151617 to G.-J.B.).
R.P.dV is a recipient of an ERC Starting Grant from the European Commission (802780) and a Beijerinck Premium of the Royal Dutch Academy of Sciences. We thank Sander Herfst (Department of Viroscience, Erasmus Medical Center) for the ferret tissues and Gavin Wright (Addgene) for providing HPSE-bio-His (Plasmid #53407). Plasmids for expression of SARS-CoV-2 spike and RBD proteins were provided by Dr. Florian Krammer (Icahn School of Medicine at Mount Sinai, produced under NIAID CEIRS contract HHSN272201400008C). Production of recombinant proteins was supported by NIAID Centers of Excellence for Influenza Research and Surveillance (CEIRS) contract HHSN272201400004C to S.M.T.

REFERENCES

1. Dimitrov, D. S., Virus entry: molecular mechanisms and biomedical applications. Nat. Rev. Microbiol. 2004, 2 (2), 109-122.
2. Walls, A. C.; Park, Y.-J.; Tortorici, M. A.; Wall, A.; McGuire, A. T.; Veesler, D., Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 2020, 181 (2), 281-292.e6.
3. Li, F.; Li, W.; Farzan, M.; Harrison, S. C., Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science 2005, 309 (5742), 1864-1868.
4. Monteil, V.; Kwon, H.; Prado, P.; Hagelkrüys, A.; Wimmer, R. A.; Stahl, M.; Leopoldi, A.; Garreta, E.; Hurtado del Pozo, C.; Prosper, F.; Romero, J. P.; Wirnsberger, G.; Zhang, H.; Slutsky, A. S.; Conder, R.; Montserrat, N.; Mirazimi, A.; Penninger, J. M., Inhibition of SARS-CoV-2 infections in engineered human tissues using clinical-grade soluble human ACE2. Cell 2020, 181 (1), 1-9.
5. Li, W.; Hulswit, R. J. G.; Widjaja, I.; Raj, V. S.; McBride, R.; Peng, W.; Widagdo, W.; Tortorici, M.
A.; van Dieren, B.; Lang, Y.; van Lent, J. W. M.; Paulson, J. C.; de Haan, C. A. M.; de Groot, R. J.; van Kuppeveld, F. J. M.; Haagmans, B. L.; Bosch, B.-J., Identification of sialic acid-binding function for the Middle East respiratory syndrome coronavirus spike glycoprotein. Proc. Natl. Acad. Sci. 2017, 114 (40), E8508-E8517.
6. Milewska, A.; Zarebski, M.; Nowak, P.; Stozek, K.; Potempa, J.; Pyrc, K., Human coronavirus NL63 utilizes heparan sulfate proteoglycans for attachment to target cells. J. Virol. 2014, 88 (22), 13221-13230.
7. Lang, J.; Yang, N.; Deng, J.; Liu, K.; Yang, P.; Zhang, G.; Jiang, C., Inhibition of SARS pseudovirus cell entry by lactoferrin binding to heparan sulfate proteoglycans. PLoS One 2011, 6 (8), e23710.
8. Mycroft-West, C.; Su, D.; Elli, S.; Li, Y.; Guimond, S.; Miller, G.; Turnbull, J.; Yates, E.; Guerrini, M.; Fernig, D.; Lima, M.; Skidmore, M., The 2019 coronavirus (SARS-CoV-2) surface protein (Spike) S1 receptor binding domain undergoes conformational change upon heparin binding. bioRxiv 2020, 2020.02.29.971093.
9. Kim, S. Y.; Jin, W.; Sood, A.; Montgomery, D. W.; Grant, O. C.; Fuster, M. M.; Fu, L.; Dordick, J. S.; Woods, R. J.; Zhang, F.; Linhardt, R. J., Glycosaminoglycan binding motif at S1/S2 proteolytic cleavage site on spike glycoprotein may facilitate novel coronavirus (SARS-CoV-2) host cell entry. bioRxiv 2020, 2020.04.14.041459.
10. Tang, T.; Bidon, M.; Jaimes, J. A.; Whittaker, G. R.; Daniel, S., Coronavirus membrane fusion mechanism offers a potential target for antiviral development. Antiviral Res. 2020, 178, 104792.
11. Partridge, L. J.; Urwin, L.; Nicklin, M. J. H.; James, D. C.; Green, L. R.; Monk, P.
N., ACE2-independent interaction of SARS-CoV-2 spike protein to human epithelial cells can be inhibited by unfractionated heparin. bioRxiv 2020, 2020.05.21.107870.
12. Guimond, S. E.; Mycroft-West, C. J.; Gandhi, N. S.; Tree, J. A.; Buttigieg, K. R.; Coombes, N.; Nystrom, K.; Said, J.; Setoh, Y. X.; Amarilla, A.; Modhiran, N.; Julian Sng, D. J.; Chhabra, M.; Watterson, D.; Young, P. R.; Khromykh, A. A.; Lima, M. A.; Fernig, D. G.; Su, D.; Yates, E. A.; Hammond, E.; Dredge, K.; Carroll, M. W.; Trybala, E.; Bergstrom, T.; Ferro, V.; Skidmore, M. A.; Turnbull, J. E., Pixatimod (PG545), a clinical-stage heparan sulfate mimetic, is a potent inhibitor of the SARS-CoV-2 virus. bioRxiv 2020, 2020.06.24.169334.
13. Mycroft-West, C. J.; Su, D.; Pagani, I.; Rudd, T. R.; Elli, S.; Guimond, S. E.; Miller, G.; Meneghetti, M. C. Z.; Nader, H. B.; Li, Y.; Nunes, Q. M.; Procter, P.; Mancini, N.; Clementi, M.; Bisio, A.; Forsyth, N. R.; Turnbull, J. E.; Guerrini, M.; Fernig, D. G.; Vicenzi, E.; Yates, E. A.; Lima, M. A.; Skidmore, M. A., Heparin inhibits cellular invasion by SARS-CoV-2: structural dependence of the interaction of the surface protein (spike) S1 receptor binding domain with heparin. bioRxiv 2020, 2020.04.28.066761.
14. Clausen, T. M.; Sandoval, D. R.; Spliid, C. B.; Pihl, J.; Perrett, H. R.; Painter, C. D.; Narayanan, A.; Majowicz, S. A.; Kwong, E. M.; McVicar, R. N.; Thacker, B. E.; Glass, C. A.; Yang, Z.; Torres, J. L.; Golden, G. J.; Bartels, P. L.; Porell, R. N.; Garretson, A. F.; Laubach, L.; Feldman, J.; Yin, X.; Pu, Y.; Hauser, B. M.; Caradonna, T. M.; Kellman, B. P.; Martino, C.; Gordts, P. L. S. M.; Chanda, S. K.; Schmidt, A. G.; Godula, K.; Leibel, S. L.; Jose, J.; Corbett, K. D.; Ward, A. B.; Carlin, A. F.; Esko, J. D., SARS-CoV-2 Infection Depends on Cellular Heparan Sulfate and ACE2. Cell 2020, 183 (4), 1043-1057.e15.
15. Bishop, J. R.; Schuksz, M.; Esko, J. D., Heparan sulphate proteoglycans fine-tune mammalian physiology.
Nature 2007, 446 (7139), 1030-1037.
16. Cagno, V.; Tseligka, E. D.; Jones, S. T.; Tapparel, C., Heparan sulfate proteoglycans and viral attachment: true receptors or adaptation bias? Viruses 2019, 11 (7), 596.
17. de Haan, C. A. M.; Haijema, B. J.; Schellen, P.; Wichgers Schreur, P.; te Lintelo, E.; Vennema, H.; Rottier, P. J. M., Cleavage of group 1 coronavirus spike proteins: how furin cleavage is traded off against heparan sulfate binding upon cell culture adaptation. J. Virol. 2008, 82 (12), 6078-6083.
18. de Haan, C. A. M.; Li, Z.; te Lintelo, E.; Bosch, B. J.; Haijema, B. J.; Rottier, P. J. M., Murine coronavirus with an extended host range uses heparan sulfate as an entry receptor. J. Virol. 2005, 79 (22), 14451-14456.
19. Sarrazin, S.; Lamanna, W. C.; Esko, J. D., Heparan sulfate proteoglycans. Cold Spring Harb. Perspect. Biol. 2011, 3 (7), a004952.
20. Xu, D.; Esko, J. D., Demystifying heparan sulfate–protein interactions. Annu. Rev. Biochem 2014, 83 (1), 129-157.
21. Kamhi, E.; Joo, E. J.; Dordick, J. S.; Linhardt, R. J., Glycosaminoglycans in infectious disease. Biol. Rev. 2013, 88 (4), 928-943.
22. García, B.; Merayo-Lloves, J.; Martin, C.; Alcalde, I.; Quirós, L. M.; Vazquez, F., Surface proteoglycans as mediators in bacterial pathogens infections. Front. Microbiol. 2016, 7, 220.
23. Zong, C.; Venot, A.; Li, X.; Lu, W.; Xiao, W.; Wilkes, J.-S. L.; Salanga, C. L.; Handel, T. M.; Wang, L.; Wolfert, M. A.; Boons, G.-J., Heparan sulfate microarray reveals that heparan sulfate–protein binding exhibits different ligand requirements. J. Am. Chem. Soc. 2017, 139 (28), 9534-9543.
24. Arungundram, S.; Al-Mafraji, K.; Asong, J.; Leach, F. E.; Amster, I.
J.; Venot, A.; Turnbull, J. E.; Boons, G.-J., Modular Synthesis of Heparan Sulfate Oligosaccharides for Structure−Activity Relationship Studies. J. Am. Chem. Soc. 2009, 131 (47), 17394-17405.
25. Stadlbauer, D.; Amanat, F.; Chromikova, V.; Jiang, K.; Strohmeier, S.; Arunkumar, G. A.; Tan, J.; Bhavsar, D.; Capuano, C.; Kirkpatrick, E.; Meade, P.; Brito, R. N.; Teo, C.; McMahon, M.; Simon, V.; Krammer, F., SARS-CoV-2 seroconversion in humans: A detailed protocol for a serological assay, antigen production, and test setup. Curr. Protoc. Microbiol. 2020, 57 (1), e100.
26. Amanat, F.; Stadlbauer, D.; Strohmeier, S.; Nguyen, T. H. O.; Chromikova, V.; McMahon, M.; Jiang, K.; Arunkumar, G. A.; Jurczyszak, D.; Polanco, J.; Bermudez-Gonzalez, M.; Kleiner, G.; Aydillo, T.; Miorin, L.; Fierer, D. S.; Lugo, L. A.; Kojic, E. M.; Stoever, J.; Liu, S. T. H.; Cunningham-Rundles, C.; Felgner, P. L.; Moran, T.; Garcia-Sastre, A.; Caplivski, D.; Cheng, A. C.; Kedzierska, K.; Vapalahti, O.; Hepojoki, J. M.; Simon, V.; Krammer, F., A serological assay to detect SARS-CoV-2 seroconversion in humans. Nat. Med. 2020, 26 (7), 1033-1036.
27. Kim, S. Y.; Jin, W.; Sood, A.; Montgomery, D. W.; Grant, O. C.; Fuster, M. M.; Fu, L.; Dordick, J. S.; Woods, R. J.; Zhang, F.; Linhardt, R. J., Characterization of heparin and severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) spike glycoprotein binding interactions. Antiviral Res. 2020, 181, 104873.
28. Shang, J.; Ye, G.; Shi, K.; Wan, Y.; Luo, C.; Aihara, H.; Geng, Q.; Auerbach, A.; Li, F., Structural basis of receptor recognition by SARS-CoV-2. Nature 2020, 581 (7807), 221-224.
29.
Xia, S.; Lan, Q.; Su, S.; Wang, X.; Xu, W.; Liu, Z.; Zhu, Y.; Wang, Q.; Lu, L.; Jiang, S., The role of furin cleavage site in SARS-CoV-2 spike protein-mediated membrane fusion in the presence or absence of trypsin. Signal. Transduct. Target. Ther. 2020, 5 (1), 92. 30. Bouwman, K. M.; Tomris, I.; Turner, H. L.; van der Woude, R.; Bosman, G. P.; Rockx, B.; Herfst, S.; Haagmans, B. L.; Ward, A. B.; Boons, G.-J.; de Vries, R. P., Multimerization- and glycosylation-dependent receptor binding of SARS-CoV-2 spike proteins. bioRxiv 2020, 2020.09.04.282558. 31. Kim, Y.-I.; Kim, S.-G.; Kim, S.-M.; Kim, E.-H.; Park, S.-J.; Yu, K.-M.; Chang, J.- H.; Kim, E. J.; Lee, S.; Casel, M. A. B.; Um, J.; Song, M.-S.; Jeong, H. W.; Lai, V. D.; Kim, Y.; Chin, B. S.; Park, J.-S.; Chung, K.-H.; Foo, S.-S.; Poo, H.; Mo, I.-P.; Lee, O.-J.; Webby, R. J.; Jung, J. U.; Choi, Y. K., Infection and rapid transmission of SARS-CoV-2 in ferrets. Cell Host Microbe 2020, 27 (5), 704-709.e2. 32. Richard, M.; Kok, A.; de Meulder, D.; Bestebroer, T. M.; Lamers, M. M.; Okba, N. M. A.; Fentener van Vlissingen, M.; Rockx, B.; Haagmans, B. L.; Koopmans, M. P. G.; Fouchier, R. A. M.; Herfst, S., SARS-CoV-2 is transmitted via contact and via the air between ferrets. Nat. Comm. 2020, 11 (1), 3496. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.05.10.087288doi: bioRxiv preprint https://doi.org/10.1101/2020.05.10.087288 20 33. Oreshkova, N.; Molenaar, R. J.; Vreman, S.; Harders, F.; Oude Munnink, B. B.; Hakze-van der Honing, R. W.; Gerhards, N.; Tolsma, P.; Bouwstra, R.; Sikkema, R. S.; Tacken, M. G.; de Rooij, M. M.; Weesendorp, E.; Engelsma, M. Y.; Bruschke, C. J.; Smit, L. A.; Koopmans, M.; van der Poel, W. H.; Stegeman, A., SARS-CoV-2 infection in farmed minks, the Netherlands, April and May 2020. 
Eurosurveillance 2020, 25 (23), 2001005. 34. Rabenstein, D. L., Heparin and heparan sulfate: structure and function. Nat. Prod. Rep. 2002, 19 (3), 312-331. 35. Smits, N. C.; Kurup, S.; Rops, A. L.; ten Dam, G. B.; Massuger, L. F.; Hafmans, T.; Turnbull, J. E.; Spillmann, D.; Li, J.-p.; Kennel, S. J.; Wall, J. S.; Shworak, N. W.; Dekhuijzen, P. N. R.; van der Vlag, J.; van Kuppevelt, T. H., The heparan sulfate motif (GlcNS6S-IdoA2S)3, common in heparin, has a strict topography and is involved in cell behavior and disease. J. Biol. Chem. 2010, 285 (52), 41143-41151. 36. Sardu, C. G., J.; Morelli, M.B.; Wang, X.; Marfella, R.; Santulli, G. , Is COVID-19 an endothelial disease? Clinical and basic evidence. Preprints 2020, 2020040204. 37. Tang, N.; Bai, H.; Chen, X.; Gong, J.; Li, D.; Sun, Z., Anticoagulant treatment is associated with decreased mortality in severe coronavirus disease 2019 patients with coagulopathy. J. Thromb. Haemost. 2020, 18 (5), 1094-1099. 38. Thachil, J.; Tang, N.; Gando, S.; Falanga, A.; Cattaneo, M.; Levi, M.; Clark, C.; Iba, T., ISTH interim guidance on recognition and management of coagulopathy in COVID-19. J. Thromb. Haemost. 2020, 18 (5), 1023-1026. 39. Thacker, B. E.; Xu, D.; Lawrence, R.; Esko, J. D., Heparan sulfate 3-O-sulfation: a rare modification in search of a function. Matrix Biol. 2014, 35, 60-72. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.05.10.087288doi: bioRxiv preprint https://doi.org/10.1101/2020.05.10.087288 10_1101-2020_06_17_156679 ---- 87208811 1 A COVID Moonshot: assessment of ligand binding to the SARS-CoV-2 main protease by saturation 1 transfer difference NMR spectroscopy 2 3 Anastassia L. Kantsadi1, Emma Cattermole1, Minos-Timotheos Matsoukas2, Georgios A. 
Spyroulias2 and Ioannis Vakonakis1*

1Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, United Kingdom
2Department of Pharmacy, University of Patras, Panepistimioupoli Campus, GR-26504, Greece
*To whom correspondence should be addressed, e-mail: ioannis.vakonakis@bioch.ox.ac.uk, Tel.: +44 1865 275725, Fax: +44 1865 613201

Short title: Assessment of ligand binding to SARS-CoV-2 Mpro by STD-NMR
Keywords: SARS-CoV-2, COVID-19, Moonshot, Mpro, NMR, STD, screening, fragments, molecular dynamics, MD, competition

bioRxiv preprint doi: https://doi.org/10.1101/2020.06.17.156679; this version posted January 6, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license (http://creativecommons.org/licenses/by/4.0/).

Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the etiological cause of the coronavirus disease 2019, for which no effective therapeutics are available. The SARS-CoV-2 main protease (Mpro) is essential for viral replication and constitutes a promising therapeutic target. Many efforts aimed at deriving effective Mpro inhibitors are currently underway, including an international open-science discovery project, codenamed COVID Moonshot. As part of COVID Moonshot, we used saturation transfer difference nuclear magnetic resonance (STD-NMR) spectroscopy to assess the binding of putative Mpro ligands to the viral protease, including molecules identified by crystallographic fragment screening and novel compounds designed as Mpro inhibitors. In this manner, we aimed to complement enzymatic activity assays of Mpro performed by other groups with information on ligand affinity. We have made the Mpro STD-NMR data publicly available.
Here, we provide detailed information on the NMR protocols used and challenges faced, thereby placing these data into context. Our goal is to assist the interpretation of Mpro STD-NMR data, thereby accelerating ongoing drug design efforts.

Introduction
Infections by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) resulted in approximately 1.8 million deaths in 2020 (1) and led to the coronavirus disease 2019 (COVID-19) pandemic (2-4). SARS-CoV-2 is a zoonotic betacoronavirus highly similar to SARS-CoV and MERS-CoV, which caused outbreaks in 2002 and 2012, respectively (5-7). SARS-CoV-2 encodes its proteome in a single, positive-sense, linear RNA molecule of ~30 kb length, the majority of which (~21.5 kb) is translated into two polypeptides, pp1a and pp1ab, via ribosomal frame-shifting (8, 9). Key viral enzymes and factors, including most proteins of the replicase-transcriptase machinery, inhibitors of host translation and molecules signalling for host cell survival, are released from pp1a and pp1ab via post-translational cleavage by two viral cysteine proteases (10). These proteases, a papain-like enzyme cleaving pp1ab at three sites, and a 3C-like protease cleaving the polypeptide at 11 sites, are primary targets for the development of antiviral drugs.

The 3C-like protease of SARS-CoV-2, also known as the viral main protease (Mpro), has been the target of intense study owing to its centrality in viral replication.
Mpro studies have benefited from previous structural analyses of the SARS-CoV 3C-like protease and the earlier development of putative inhibitors (11-14). The active sites of these proteases are highly conserved, and peptidomimetic inhibitors active against Mpro are also potent against the SARS-CoV 3C-like protease (15, 16). However, to date no Mpro-targeting inhibitors have been validated in clinical trials. In order to accelerate Mpro inhibitor development, an international, crowd-funded, open-science project was formed under the banner of COVID Moonshot (17), combining high-throughput crystallographic screening (18), computational chemistry, enzymatic activity assays and mass spectroscopy (19) among the many methodologies contributed by collaborating groups.

As part of COVID Moonshot, we utilised saturation transfer difference nuclear magnetic resonance (STD-NMR) spectroscopy (20-22) to investigate the Mpro binding of ligands initially identified by crystallographic screening, as well as molecules designed specifically as non-covalent inhibitors of this protease. Our goal was to provide orthogonal information on ligand binding to that which could be gained by enzymatic activity assays conducted in parallel by other groups. STD-NMR is a proven method for characterising the binding of small molecules to biological macromolecules, able to provide both quantitative affinity information and structural data on the proximity of ligand chemical groups to the protein. Here, we provide detailed documentation on the NMR protocols used to record these data and highlight the advantages, limitations and assumptions underpinning our approach. Our aim is to assist the comparison of Mpro STD-NMR data with other quantitative measurements, and facilitate the consideration of these data when designing future Mpro inhibitors.
Materials and Methods
Protein production and purification
We created a SARS-CoV-2 Mpro genetic construct in pFLOAT vector (23), encoding the viral protease and an N-terminal His6-tag separated by a modified human rhinovirus (HRV) 3C protease recognition site, designed to reconstitute a native Mpro N-terminus upon HRV 3C cleavage. The Mpro construct was transformed into Escherichia coli strain Rosetta(DE3) (Novagen) and transformed clones were pre-cultured at 37 °C for 5 h in lysogeny broth supplemented with appropriate antibiotics. Starter cultures were used to inoculate 1 L of Terrific Broth Autoinduction Media (Formedium) supplemented with 10% v/v glycerol and appropriate antibiotics. Cell cultures were grown at 37 °C for 5 h and then cooled to 18 °C for 12 h. Bacterial cells were harvested by centrifugation at 5,000 x g for 15 min.

Cell pellets were resuspended in 50 mM trisaminomethane (Tris)-Cl pH 8, 300 mM NaCl, 10 mM imidazole buffer, incubated with 0.05 mg/ml benzonase nuclease (Sigma Aldrich) and lysed by sonication on ice. Lysates were clarified by centrifugation at 50,000 x g at 4 °C for 1 h. Lysate supernatants were loaded onto a HiTrap Talon metal affinity column (GE Healthcare) pre-equilibrated with lysis buffer. Column wash was performed with 50 mM Tris-Cl pH 8, 300 mM NaCl and 25 mM imidazole, followed by protein elution using the same buffer and an imidazole gradient from 25 to 500 mM concentration. The His6-tag was cleaved using home-made HRV 3C protease.
The HRV 3C protease, His6-tag and further impurities were removed by a reverse HiTrap Talon column. Flow-through fractions were concentrated and applied to a Superdex75 26/600 size exclusion column (GE Healthcare) equilibrated in NMR buffer (150 mM NaCl, 20 mM Na2HPO4 pH 7.4).

Nuclear magnetic resonance (NMR) spectroscopy
All NMR experiments were performed using a 950 MHz solution-state instrument comprising an Oxford Instruments superconducting magnet, Bruker Avance III console and TCI probehead. A Bruker SampleJet sample changer was used for sample manipulation. Experiments were performed and data processed using TopSpin (Bruker). For direct STD-NMR measurements, samples comprised 10 μM Mpro and variable concentrations (20 μM – 4 mM) of ligand compounds formulated in NMR buffer supplemented with 10% v/v D2O and deuterated dimethyl sulfoxide (D6-DMSO, 99.96% D, Sigma Aldrich) to 5% v/v final D6-DMSO concentration. In competition experiments, samples comprised 2 μM Mpro, 0.8 mM of ligand x0434 and variable concentrations (0 – 20 μM) of competing compound in NMR buffer supplemented with D2O and D6-DMSO as above. Sample volume was 140 μL and samples were loaded in 3 mm outer diameter SampleJet NMR tubes (Bruker) placed in 96-tube racks. NMR tubes were sealed with POM balls.

STD-NMR experiments were performed at 10 °C using a pulse sequence described previously (20) and an excitation sculpting water-suppression scheme (24).
Protein signals were suppressed in STD-NMR by the application of a 30 msec spin-lock pulse. We collected time-domain data of 16,384 complex points and 41.6 μsec dwell time (12.02 kHz sweepwidth). Data were collected in an interleaved pattern, with on- and off-resonance irradiation data separated into 16 blocks of 16 transients each (256 total transients per irradiation frequency). Transient recycle delay was 4 sec and on- or off-resonance irradiation was performed using 0.1 mW of power for 3.5 sec at 0.5 ppm or 26 ppm, respectively, for a total experiment time of approximately 50 minutes. Reconstructed time-domain data from the difference of on- and off-resonance irradiation (STD spectra) or only the off-resonance irradiation (reference spectra) were processed by applying a 2 Hz exponential line broadening function and 2-fold zero-filling prior to Fourier transformation. Phasing parameters were derived for each sample from the reference spectra and copied to the STD spectra. 1H peak intensities were integrated in TopSpin using a local-baseline adjustment function. Data fitting to extract Kd values was performed in OriginPro (OriginLab). The folded state of Mpro in the presence of each ligand was verified by collecting 1H NMR spectra similar to Fig. 1A from all samples ahead of STD-NMR experiments.

Ligand handling
Compounds for the initial STD-NMR assessment of crystallographic fragment binding to Mpro were provided by the XChem group at Diamond Light Source in the form of a 384-well plated library (DSI-poised, Enamine), with compounds dissolved in D6-DMSO at 500 mM nominal concentration. 1 μL of dissolved compounds was aspirated from this library and immediately mixed with 9 μL of D6-DMSO for a final fragment concentration of 50 mM, from which NMR samples were formulated.
For titrations of the same crystallographic fragments, compounds were procured directly from Enamine in the form of lyophilized powder, which was dissolved in D6-DMSO to derive compound stocks at 10 mM and 100 mM concentrations for NMR sample formulation.

STD-NMR assays of bespoke Mpro ligands used compounds commercially synthesised for COVID Moonshot. These ligands were provided to us by the XChem group in 96-well plates, containing 0.7 μL of 20 mM D6-DMSO-dissolved compound per well. Plates were created using an Echo liquid handling robot (Labcyte) and immediately sealed and frozen at -20 °C. For use, ligand plates were thoroughly defrosted at room temperature and spun at 3,500 x g for 5 minutes. In single-concentration STD-NMR experiments, 140 μL of a pre-formulated mixture of Mpro and NMR buffer with D2O and D6-DMSO were added to each well to create the final NMR sample. For STD-NMR competition experiments, 0.5 μL of ligands were aspirated from the plates and immediately mixed with 19.5 μL of D6-DMSO for a final ligand concentration of 0.5 mM, from which NMR samples were formulated.

Molecular dynamics (MD) simulations
The monomeric complexes of Mpro bound to chemical fragments were obtained from the RCSB Protein Data Bank entries 5R81 (ligand x0195), 5REB (x0387), 5RGI (x0397), 5RGK (x0426), 5R83 (x0434) and 5REH (x0540) for MD simulations with GROMACS version 2018 (25) and the AMBER99SB-ILDN force field (26).
All complexes were inserted in a pre-equilibrated box containing water implemented using the TIP3P water model (26). Force field parameters for the six ligands were generated using the general Amber force field and HF/6-31G*-derived RESP atomic charges (27). The reference system consisted of the protein, the ligand, ~31,400 water molecules, 95 Na and 95 Cl ions in a 100 x 100 x 100 Å simulation box, resulting in a total number of ~98,000 atoms. Each system was energy-minimized and subsequently subjected to a 20 ns MD equilibration, with an isothermal-isobaric ensemble using isotropic pressure control (28), and positional restraints on protein and ligand coordinates. The resulting equilibrated systems were replicated 4 times and independent 200 ns MD trajectories were produced with a time step of 2 fs, in constant temperature of 300 K, using separate v-rescale thermostats (28) for the protein, ligand and solvent molecules. Lennard-Jones interactions were computed using a cut-off of 10 Å and electrostatic interactions were treated using particle mesh Ewald (29) with the same real-space cut-off. Analysis on the resulting trajectories was performed using MDAnalysis (30, 31). Structures were visualised using PyMOL (32).

Notes
The enzymatic inhibition potential of Mpro ligands, measured by RapidFire mass spectroscopy (17), was retrieved from the Collaborative Drug Discovery database (33).
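The dilution and acquisition figures quoted in the Methods can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the relation between dwell time and spectral width (SW = 1 / (2 x dwell)) is assumed from the usual TopSpin convention rather than stated in the text.

```python
def diluted_concentration(stock_mM, stock_uL, diluent_uL):
    """Concentration (mM) after mixing a stock aliquot with pure diluent."""
    return stock_mM * stock_uL / (stock_uL + diluent_uL)


def sweep_width_khz(dwell_us):
    """Spectral width (kHz), assuming SW = 1 / (2 * dwell time)."""
    return 1.0e3 / (2.0 * dwell_us)


# Library fragments: 1 uL of 500 mM stock mixed with 9 uL of D6-DMSO
fragment_stock = diluted_concentration(500.0, 1.0, 9.0)

# Bespoke ligands for competition: 0.5 uL of 20 mM mixed with 19.5 uL of D6-DMSO
competition_stock = diluted_concentration(20.0, 0.5, 19.5)

# Interleaved acquisition: 16 blocks of 16 transients per irradiation frequency
transients = 16 * 16

# Spectral width implied by the 41.6 usec dwell time
sw = sweep_width_khz(41.6)

print(f"fragment stock: {fragment_stock:.1f} mM")
print(f"competition stock: {competition_stock:.2f} mM")
print(f"transients per frequency: {transients}")
print(f"sweep width: {sw:.2f} kHz")
```

Under these assumptions the computed values agree with the 50 mM, 0.5 mM, 256 transients and 12.02 kHz quoted in the text.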
Results
STD-NMR assays of Mpro ligand binding
Mpro forms dimers in crystals via an extensive interaction interface involving two domains (15). Mpro dimers likely have a sub-μM solution dissociation constant (Kd) by analogy to previously studied 3C-like coronavirus proteases (34). At the 10 μM protein concentration of our NMR assays Mpro is, thus, expected to be dimeric with an estimated molecular weight of nearly 70 kDa. Despite the relatively large size of Mpro for solution NMR, 1H spectra of the protease readily showed the presence of multiple up-field shifted (<0.5 ppm) peaks corresponding to protein methyl groups (Fig. 1A). In addition to demonstrating that Mpro is folded under the conditions tested, these spectra allowed us to identify the chemical shifts of Mpro methyl groups that may be suitable for on-resonance irradiation in STD-NMR experiments. Trials with on-resonance irradiation applied to different methyl group peaks showed that irradiating at 0.5 ppm (Fig. 1A) produced the strongest STD signal from ligands in the presence of Mpro, while simultaneously avoiding ligand excitation that would yield false-positive signals in the absence of Mpro (Fig. 1B). Further, we noted that small molecules abundant in the samples but not binding specifically to Mpro, such as DMSO, produced pseudo-dispersive residual signal lineshapes in STD spectra, while true Mpro ligands produced peaks in STD with absorptive 1H lineshapes. We surmised that STD-NMR is suitable for screening ligand binding to Mpro, requiring relatively small amounts (10-50 μg) of protein and time (under 1 hour) per sample studied.
The strength of STD signal is quantified by calculating the ratio of integrated signal intensity of peaks in the STD spectrum over that of the reference spectrum (STDratio). The STDratio factor is inversely proportional to ligand Kd, as $\mathrm{STD_{ratio}} \propto 1/(K_d + [L])$, where [L] is ligand concentration. Measuring STDratio values over a range of ligand concentrations allows fitting of the proportionality constant and calculation of ligand Kd. However, time and sample-amount considerations, including the limited availability of bespoke compounds synthesized for the COVID Moonshot project, made recording full STD-NMR titrations impractical for screening hundreds of ligands. Thus, we evaluated whether measuring the STDratio value at a single ligand concentration may be an informative alternative to Kd, provided restraints could be placed, for example, on the proportionality constant.

Theoretical and practical considerations suggested that three parameters influence how single-concentration STDratio values can be interpreted in terms of affinity. Firstly, the STDratio factor is affected by the efficiency of NOE magnetisation transfer between protein and ligand, which in turn depends on the proximity of ligand and protein groups, and the chemical nature of these groups (20-22). To minimize the influence of these factors across diverse ligands, we sought to quantify the STDratio of only aromatic ligand groups, and only consider those showing the strongest STD signal; that is, those in closest proximity to the protein.
Second, STD-NMR assays require ligand exchange between protein-bound and -free states in the timeframe of the experiment; strongly bound compounds that dissociate very slowly from the protein would yield reduced STDratio values compared to weaker ligands that dissociate more readily. Structures of Mpro with many different ligands show that the protein conformation does not change upon complex formation and that the active site is fully solvent-exposed (18), which suggests that ligand association can proceed with high rate (10^7 – 10^8 M^-1 s^-1). Under this assumption, the ligand dissociation rate is the primary determinant of interaction strength. Given the duration of the STD-NMR experiment in our assays, and the ratios of ligand:protein used, we estimated that significant protein-ligand exchange will take place even for interactions as strong as low-μM Kd. Finally, uncertainties or errors in nominal ligand concentration skew the correlation of STDratio to compound affinities; as shown in Fig. S1, STDratio values increase strongly when very small amounts of ligands are assessed. Thus, overly large STDratio values may be measured if ligand concentrations are significantly lower than anticipated.

Quantitating Mpro binding of ligands identified by crystallographic screening
Mindful of the limitations inherent to measuring single-concentration STDratio values, and prior to using STD-NMR to evaluate bespoke Mpro ligands, we used this method to assess binding to the protease of small chemical fragments identified in crystallographic screening experiments (18). In crystallographic screening campaigns of other target proteins such fragments were seen to have very weak affinities (> 1 mM Kd, e.g. (35)), thereby satisfying the exchange criterion set out above.
39 non-covalent Mpro interactors are part of the DSI-poised fragment library to which we were given access, comprising 17 active site binders, two compounds targeting the Mpro dimerisation interface and 20 molecules binding elsewhere on the protein surface (18). We initially recorded STD-NMR spectra from these compounds in the absence of Mpro to confirm that we obtained no or minimal STD signal when protease is omitted, and to verify ligand identity from reference 1H spectra. Five ligands gave no solution NMR signal or produced reference 1H spectra inconsistent with the compound chemical structure; these ligands were not evaluated further. Samples of 10 μM Mpro and 0.8 mM nominal ligand concentration were then formulated from the remaining 34 compounds (Table S1), and STD-NMR spectra were recorded, from which only aromatic ligand STD signals were considered for further analysis.

We observed large variations in STD signal intensity and STDratio values in the presence of Mpro across compounds (Fig. 2A,B; Table S1), with many ligands producing little or no STD signal, suggesting substantial differences in compound affinity for the protease. However, we also noted that ligand reference spectra differed substantially in intensity (Fig. 2C), despite compounds being at the same nominal concentration.
Integrating ligand peaks in these reference spectra revealed differences in per-1H intensity of up to ~15-fold, indicating significant variation of ligand concentrations in solution (Table S1). Such concentration differences could arise from errors in sample formulation or from concentration inconsistencies in the compound library. To evaluate the former we also integrated the residual 1H signal of D6-DMSO in our reference spectra, and found it to vary by less than 35% across any pair of samples (11% average deviation). As DMSO was added alongside ligands in our samples, we concluded that sample formulation may have contributed errors in compound concentration of up to ~1/3, but did not account for the ~15-fold differences in concentration observed.

Given that differences in compound concentration can skew the relative STDratio values of ligands (Fig. S1), and that such concentration differences were also observed among newly designed Mpro inhibitors (see below), we questioned whether recording STDratio values under these conditions can provide useful information. To address this question we attempted to quantify the affinity of crystallographic fragments to Mpro, selecting ligands that showed clear differences in STDratio values in the assays above and focusing on compounds binding at the Mpro active site, which are of potential interest to inhibitor development. We performed Mpro binding titrations monitored by STD-NMR of compounds x0195, x0354, x0426 and x0434 in 50 μM – 4 mM concentrations (Fig. S2), and noted that only compounds x0434 and x0195, which show the highest STDratio (Fig. 2A), bound strongly enough for an affinity constant to be estimated (Kd of 1.6 ± 0.2 mM and 1.7 ± 0.2 mM, respectively). In contrast, the titrations of x0354 and x0426, which yielded lower STDratio values, could not be fit to extract a Kd, indicating weaker binding to Mpro.
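The titration analysis described above can be sketched numerically. The example below assumes the proportionality STDratio = c / (Kd + [L]), with c an unknown constant, and linearises it as 1/STDratio = Kd/c + [L]/c, so that an ordinary least-squares line yields Kd as intercept divided by slope. The fits in this work were performed in OriginPro and the exact procedure is not described, so this is an illustration only; the function name and the synthetic data are hypothetical.

```python
def fit_kd(ligand_mM, std_ratio):
    """Estimate Kd (mM) and c from a titration, assuming STDratio = c / (Kd + [L])."""
    # Reciprocal transform: 1/STDratio is linear in [L]
    y = [1.0 / r for r in std_ratio]
    n = len(ligand_mM)
    mean_x = sum(ligand_mM) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in ligand_mM)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(ligand_mM, y))
    slope = sxy / sxx                      # equals 1 / c
    intercept = mean_y - slope * mean_x    # equals Kd / c
    return intercept / slope, 1.0 / slope


# Synthetic, noise-free titration generated with Kd = 1.6 mM and c = 0.2
concentrations = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 4.0]
ratios = [0.2 / (1.6 + conc) for conc in concentrations]

kd, c = fit_kd(concentrations, ratios)
print(f"Kd = {kd:.2f} mM, c = {c:.2f}")
```

With real data the reciprocal transform amplifies noise in weak STD signals, so a weighted or non-linear fit of the untransformed model would usually be preferable; the linear form is used here only because it needs no external libraries.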
To further this analysis, we assessed the binding of fragments x0195, x0387, x0397, x0426, x0434 and x0540 to the Mpro active site using quadruplicate atomistic molecular dynamics (MD) simulations of 200 nsec duration. As shown in Fig. S3A,B, and Movies S1 and S2, fragments with high STDratio values (x0434 and x0195) were always located in the Mpro active site despite exchanging between different binding conformations (Fig. S4), with average ligand root-mean-square-deviation (RMSD) of 3.2 Å and 5.1 Å respectively after the first 100 nsec of simulation. Medium STDratio value fragments (x0426 and x0540, Fig. S3C,D, and Movies S3 and S4) show average RMSDs of approximately 9 Å in the same simulation timeframe, frequently exchanging to alternative binding poses and with x0540 occasionally exiting the Mpro active site. In contrast, fragments showing very little STD-NMR signal (x0397 and x0387, Fig. S3E,F, and Movies S5 and S6) regularly exit the Mpro active site and show average RMSDs in excess of 15 Å with very limited stability. Combining the quantitative Kd and MD information above, we surmised that, despite limitations inherent in this type of analysis and uncertainties in ligand amounts, STDratio values recorded at single compound concentration can act as proxy measurements of Mpro affinity for ligands.
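The ligand RMSD values quoted above measure how far a fragment has moved from its starting pose during a trajectory. A minimal sketch of the calculation follows, assuming coordinates in Å that are already expressed in a common, protein-aligned frame. The trajectories in this work were analysed with MDAnalysis; the plain-Python function and toy coordinates below are hypothetical and only illustrate the formula.

```python
import math


def ligand_rmsd(reference, frame):
    """RMSD (Å) between two equally ordered lists of (x, y, z) atom positions."""
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must contain the same atoms")
    # Sum of squared displacements over all atoms and all three axes
    squared = sum(
        (a - b) ** 2
        for ref_atom, cur_atom in zip(reference, frame)
        for a, b in zip(ref_atom, cur_atom)
    )
    return math.sqrt(squared / len(reference))


# Toy three-atom "ligand" and a copy translated by (3, 4, 0) Å
start_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted_pose = [(x + 3.0, y + 4.0, z) for x, y, z in start_pose]

rmsd_same = ligand_rmsd(start_pose, start_pose)
rmsd_shifted = ligand_rmsd(start_pose, shifted_pose)
print(rmsd_same, rmsd_shifted)
```

No superposition of the ligand onto itself is performed here, so rigid-body drift out of the binding site contributes to the value, which is the behaviour the comparison between fragments relies on.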
Assessment of Mpro binding by COVID Moonshot ligands
We proceeded to characterise by STD-NMR the Mpro binding of bespoke ligands created as part of the COVID Moonshot project and designed to act as non-covalent inhibitors of the protease (17). Similar to the assays of crystallographic fragments above, we focused our analysis of STD signals on aromatic moieties of ligands binding to the Mpro active site and extracted STDratio values only from the strongest STD peaks. Once again, we noted substantial differences in apparent compound concentrations, judging from reference 1H spectral intensities (Fig. 3A), which could not be attributed to errors in sample preparation as the standard deviation of residual 1H intensity in the D6-DMSO peak did not exceed 5% in any of the ligand batches tested. Crucially, out of 650 different molecules tested, samples of 35 compounds (5.4%) contained no ligand and 86 (13.2%) very little ligand (Fig. 3A). In these cases, NMR assays were repeated using a separate batch of compound; however, 96.2% of repeat experiments yielded the same outcome of no or very little ligand in the NMR samples.

We measured STDratio values from samples where ligands produced sufficiently strong reference 1H NMR spectra to be readily visible, and deposited these values and associated raw NMR data to the Collaborative Drug Discovery database (33). Some of these ligands were assessed independently for enzymatic inhibition of Mpro using a mass spectroscopy method as part of the COVID Moonshot collaboration (17). Where both parameters are available, we compared the STDratio values and 50% inhibition concentrations (IC50) of these ligands. As shown in Fig.
3B, STDratio and IC50 values show weak correlation (R2=30%) for most ligands tested; however, a subset of ligands displayed conspicuously low or even no STD signals considering their effect on Mpro activity, and presented themselves as outliers in the correlation graph. As these outlier ligands had IC50 values below 10 μM, suggesting that their affinities to the protease may be in the μM Kd region, we considered whether our approach gives rise to false-negative STD results, for example through slow ligand dissociation from Mpro.
To address this question, we devised an assay whereby the bespoke, high-affinity Mpro inhibitor would outcompete a lower-affinity ligand known to provide strong STD signal from the protease active site. In these experiments the lower-affinity ligand would act as a ‘spy’ molecule whose STD signal reduces as a function of inhibitor concentration. We used fragment x0434, which yields substantial STD signal with Mpro (Fig. 1B and 2A), as the ‘spy’, and tested protease inhibitors EDJ-MED-a364e151-1, LON-WEI-ff7b210a-5, CHO-MSK-6e55470f-14 and LOR-NOR-30067bb9-11 as x0434 competitors. Of these inhibitors, EDJ-MED-a364e151-1 gave rise to substantial STD signal in earlier assays, whereas the remaining three produced little or no STD signal; yet, all four inhibitors were reported to have low-μM or sub-μM IC50 values based on Mpro enzymatic assays.
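The logic of the ‘spy’ competition assay can be sketched with the textbook single-site competitive binding model. This is an illustration only, not the fitting procedure used for the data: it assumes that both ligands are in large excess over protein (so free and total concentrations are treated as equal) and that the spy STD signal is proportional to the fraction of protein occupied by the spy. The function names and every numerical value below are hypothetical.

```python
def spy_fraction_bound(spy_conc, spy_kd, inhibitor_conc, inhibitor_kd):
    """Fraction of protein sites occupied by the spy ligand.

    Single-site competitive binding: the inhibitor raises the apparent Kd
    of the spy by the factor (1 + [I] / Kd_inhibitor).
    All concentrations and Kd values must share the same unit (e.g. uM).
    """
    if spy_conc < 0 or inhibitor_conc < 0:
        raise ValueError("concentrations must be non-negative")
    if spy_kd <= 0 or inhibitor_kd <= 0:
        raise ValueError("Kd values must be positive")
    apparent_kd = spy_kd * (1.0 + inhibitor_conc / inhibitor_kd)
    return spy_conc / (spy_conc + apparent_kd)


def relative_spy_signal(spy_conc, spy_kd, inhibitor_conc, inhibitor_kd):
    """Spy occupancy with inhibitor, relative to occupancy without inhibitor.

    Assumption: the spy STD signal scales linearly with spy occupancy.
    """
    without = spy_fraction_bound(spy_conc, spy_kd, 0.0, inhibitor_kd)
    if without == 0.0:
        raise ValueError("spy concentration must be positive")
    return spy_fraction_bound(spy_conc, spy_kd, inhibitor_conc, inhibitor_kd) / without


if __name__ == "__main__":
    # Hypothetical values in uM, chosen only to show the trend.
    for inhibitor in (0.0, 1.0, 10.0, 100.0):
        print(inhibitor, round(relative_spy_signal(1000.0, 1000.0, inhibitor, 10.0), 3))
```

Under these assumptions, an inhibitor that binds the active site lowers the relative spy signal as its concentration rises, whereas a compound that does not bind (a very large `inhibitor_kd`) leaves the signal essentially unchanged.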
In these competition experiments, both EDJ-MED-a364e151-1 and LON-WEI-ff7b210a-5 yielded Kd parameters comparable to the reported IC50 values (Fig. S5A,B), showing that at least in the case of LON-WEI-ff7b210a-5 the absence of STD signal in the single-concentration NMR assays above represented a false-negative result. In contrast, CHO-MSK-6e55470f-14 and LOR-NOR-30067bb9-11 were unable to displace x0434 from the protease active site (Fig. S5C,D), suggesting that in these two cases the reported IC50 values do not reflect inhibitor binding to the protease, and that the weak STD signal of the initial assays was a better proxy of affinity. We surmised that although some low STDratio values of Mpro inhibitors may not accurately reflect compound affinity to the protease, such values cannot be discounted as a whole as they may correspond to non-binding ligands.

Discussion
Fragment-based screening is a tried and tested method for reducing the number of compounds that need to be assessed for binding against a specific target in order to sample chemical space (36). Combined with X-ray crystallography, which provides information on the target site and binding pose of ligands, initial fragments can quickly be iterated into potent and specifically-interacting compounds.
The COVID Moonshot collaboration (17) took advantage of crystallographic fragment-based screening (18) to initiate the design of novel inhibitors targeting the essential main protease of the SARS-CoV-2 coronavirus; however, crystallographic structures do not report on ligand affinity, and inhibitory potency in enzymatic assays does not always correlate with ligand binding. Thus, supplementing these methods with solution NMR tools highly sensitive to ligand binding can provide a powerful combination of orthogonal information and assurance against false starts.
We showed that STD-NMR is a suitable method for characterising ligand binding to Mpro, allowing us to assess ligand interactions using relatively small amounts of protein and in under one hour of experiment time per ligand (Fig. 1B). However, screening compounds in a high-throughput manner is not compatible with the time- and ligand-amount requirements of full STD-NMR titrations. Thus, we resorted to using an unconventional metric, the single-concentration STDratio value, as a proxy for ligand affinity. Although this metric has limitations due to its dependency on magnetisation transfer between protein and ligand, and on relatively rapid exchange between the ligand-free and -bound states, we demonstrated that it can nevertheless be informative. Specifically, the relative STDratio values of chemical fragments bound to the Mpro active site provided insight into fragment affinity (Fig. 2A), as crosschecked by quantitative titrations (Fig. S2) and MD simulations (Fig. S3). Furthermore, STDratio values of COVID Moonshot compounds held a weak correlation to enzymatic IC50 parameters (Fig. 3B), although false-negative and -positive results from both methods contribute to multiple outliers.
Thus, in our view the biggest limitation of using the single-concentration STDratio value as a metric relates to its supra-linear sensitivity to ligand concentration (Fig. S1), which, as demonstrated here, can vary substantially across ligands in a large project (Fig. 3A).
How then should the STD data recorded as part of COVID Moonshot be used? First, we showed that at least for some bespoke Mpro ligands the STDratio value obtained is a better proxy for compound affinity than IC50 parameters from enzymatic assays (Fig. S5). This, inherently, is the value of employing orthogonal methods, thereby minimizing the number of potential false results. Thus, when one is considering existing Mpro ligands on which to base the design of future inhibitors, a high STDratio value and a low IC50 parameter are both desirable. Second, due to the aforementioned limitations of the single-concentration STDratio value as a proxy of affinity, and the influence of uncertainties in ligand concentrations, we believe that comparisons of compounds and derivatives differing by less than ~50% in STDratio are not meaningful. Rather, we propose that the STDratio values of Mpro ligands measured and available at the CDD database should be treated as a qualitative metric of compound affinity.
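The ~50% guideline above can be turned into a simple screening rule, sketched below in Python. The text does not specify how the percentage difference is computed, so this sketch assumes it is taken relative to the smaller of the two STDratio values; the function names and the default threshold of 0.5 are illustrative assumptions.

```python
def relative_difference(stdratio_a, stdratio_b):
    """Difference between two STDratio values relative to the smaller one.

    Assumption: "differing by X%" is measured against the smaller value.
    """
    low, high = sorted((stdratio_a, stdratio_b))
    if low < 0:
        raise ValueError("STDratio values must be non-negative")
    if high == 0:
        return 0.0  # both values are zero, so there is no difference
    if low == 0:
        return float("inf")  # any signal versus none counts as unbounded difference
    return (high - low) / low


def is_meaningful_difference(stdratio_a, stdratio_b, threshold=0.5):
    """True when the two values differ by at least the threshold (default ~50%)."""
    return relative_difference(stdratio_a, stdratio_b) >= threshold
```

With this rule, two hypothetical compounds with STDratio values of 10 and 14 (x 10-3) would be treated as indistinguishable, whereas values of 10 and 16 would be ranked. Given the qualitative nature of the metric, such a rule is best used for coarse triage rather than fine ordering.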
In conclusion, we presented here protocols for the assessment of SARS-CoV-2 Mpro ligands using STD-NMR spectroscopy, and evaluated the relative qualitative affinities of chemical fragments and compounds designed as part of COVID Moonshot. Although development of novel antivirals to combat COVID-19 is still at an early stage, we hope that this information will prove valuable to groups working towards such treatments.

References
1. WHO. Coronavirus disease 2019 [Available from: https://www.who.int/emergencies/diseases/novel-coronavirus-2019].
2. Kucharski AJ, Russell TW, Diamond C, Liu Y, Edmunds J, Funk S, et al. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. Lancet Infect Dis. 2020;20(5):553-8.
3. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, et al. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579(7798):265-9.
4. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. 2020;382(8):727-33.
5. Bermingham A, Chand MA, Brown CS, Aarons E, Tong C, Langrish C, et al. Severe respiratory illness caused by a novel coronavirus, in a patient transferred to the United Kingdom from the Middle East, September 2012. Euro Surveill. 2012;17(40):20290.
6. Kuiken T, Fouchier RA, Schutten M, Rimmelzwaan GF, van Amerongen G, van Riel D, et al. Newly discovered coronavirus as the primary cause of severe acute respiratory syndrome.
Lancet. 2003;362(9380):263-70.
7. Zaki AM, van Boheemen S, Bestebroer TM, Osterhaus AD, Fouchier RA. Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia. N Engl J Med. 2012;367(19):1814-20.
8. Thiel V, Ivanov KA, Putics A, Hertzig T, Schelle B, Bayer S, et al. Mechanisms and enzymes involved in SARS coronavirus genome expression. J Gen Virol. 2003;84(Pt 9):2305-15.
9. Bredenbeek PJ, Pachuk CJ, Noten AF, Charite J, Luytjes W, Weiss SR, et al. The primary structure and expression of the second open reading frame of the polymerase gene of the coronavirus MHV-A59; a highly conserved polymerase is expressed by an efficient ribosomal frameshifting mechanism. Nucleic Acids Res. 1990;18(7):1825-32.
10. Hilgenfeld R. From SARS to MERS: crystallographic studies on coronaviral proteases enable antiviral drug design. FEBS J. 2014;281(18):4085-96.
11. Ghosh AK, Xi K, Grum-Tokars V, Xu X, Ratia K, Fu W, et al. Structure-based design, synthesis, and biological evaluation of peptidomimetic SARS-CoV 3CLpro inhibitors. Bioorg Med Chem Lett. 2007;17(21):5876-80.
12. Verschueren KH, Pumpor K, Anemuller S, Chen S, Mesters JR, Hilgenfeld R. A structural view of the inactivation of the SARS coronavirus main proteinase by benzotriazole esters. Chem Biol. 2008;15(6):597-606.
13. Yang H, Yang M, Ding Y, Liu Y, Lou Z, Zhou Z, et al. The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor. Proc Natl Acad Sci U S A. 2003;100(23):13190-5.
14. Yang H, Xie W, Xue X, Yang K, Ma J, Liang W, et al. Design of wide-spectrum inhibitors targeting coronavirus main proteases. PLoS Biol. 2005;3(10):e324.
15. Zhang L, Lin D, Sun X, Curth U, Drosten C, Sauerhering L, et al. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved alpha-ketoamide inhibitors. Science. 2020;368(6489):409-12.
16.
Rut W, Groborz K, Zhang L, Sun X, Zmudzinski M, Pawlik B, et al. SARS-CoV-2 M(pro) inhibitors and activity-based probes for patient-sample imaging. Nat Chem Biol. 2020.
17. Achdout H, Aimon A, Bar-David E, Barr H, Ben-Shmuel A, et al. COVID Moonshot: Open Science Discovery of SARS-CoV-2 Main Protease Inhibitors by Combining Crowdsourcing, High-Throughput Experiments, Computational Simulations, and Machine Learning. bioRxiv. 2020.
18. Douangamath A, Fearon D, Gehrtz P, Krojer T, Lukacik P, Owen CD, et al. Crystallographic and electrophilic fragment screening of the SARS-CoV-2 main protease. Nat Commun. 2020;11(1):5047.
19. El-Baba TJ, Lutomski CA, Kantsadi AL, Malla TR, John T, Mikhailov V, et al. Allosteric Inhibition of the SARS-CoV-2 Main Protease: Insights from Mass Spectrometry Based Assays. Angew Chem Int Edit. 2020.
20. Mayer M, Meyer B. Characterization of Ligand Binding by Saturation Transfer Difference NMR Spectroscopy. Angew Chem Int Ed Engl. 1999;38(12):1784-8.
21. Becker W, Bhattiprolu KC, Gubensak N, Zangger K. Investigating Protein-Ligand Interactions by Solution Nuclear Magnetic Resonance Spectroscopy. Chemphyschem. 2018;19(8):895-906.
22. Walpole S, Monaco S, Nepravishta R, Angulo J. STD NMR as a technique for ligand screening and structural studies. Methods in Enzymology. 615: Elsevier; 2019. p. 423-51.
23. Rogala KB, Dynes NJ, Hatzopoulos GN, Yan J, Pong SK, Robinson CV, et al.
The Caenorhabditis elegans protein SAS-5 forms large oligomeric assemblies critical for centriole formation. Elife. 2015;4:e07410.
24. Hwang TL, Shaka AJ. Water Suppression That Works - Excitation Sculpting Using Arbitrary Wave-Forms and Pulsed-Field Gradients. Journal of Magnetic Resonance Series A. 1995;112(2):275-9.
25. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1:19-25.
26. Lindorff-Larsen K, Piana S, Palmo K, Maragakis P, Klepeis JL, Dror RO, et al. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins. 2010;78(8):1950-8.
27. Bayly CI, Cieplak P, Cornell W, Kollman PA. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J Phys Chem. 1993;97(40):10269-80.
28. Bussi G, Zykova-Timan T, Parrinello M. Isothermal-isobaric molecular dynamics using stochastic velocity rescaling. J Chem Phys. 2009;130(7):074101.
29. Darden T, York D, Pedersen L. Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems. J Chem Phys. 1993;98(12):10089-92.
30. Michaud-Agrawal N, Denning EJ, Woolf TB, Beckstein O. MDAnalysis: a toolkit for the analysis of molecular dynamics simulations. J Comput Chem. 2011;32(10):2319-27.
31. Gowers RJ, Linke M, Barnoud J, Reddy TJE, Melo MN, Seyler SL, et al., editors. MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations. 15th Python in Science Conference; 2016; Austin, TX.
32. DeLano WL. The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA, USA. http://www.pymol.org. 2002.
33. Collaborative Drug Discovery database public access 2020 [Available from: https://www.collaborativedrug.com/public-access/].
34.
Grum-Tokars V, Ratia K, Begaye A, Baker SC, Mesecar AD. Evaluating the 3C-like protease activity of SARS-Coronavirus: recommendations for standardized assays for drug discovery. Virus Res. 2008;133(1):63-73.
35. Davies TG, Wixted WE, Coyle JE, Griffiths-Jones C, Hearn K, McMenamin R, et al. Monoacidic Inhibitors of the Kelch-like ECH-Associated Protein 1: Nuclear Factor Erythroid 2-Related Factor 2 (KEAP1:NRF2) Protein-Protein Interaction with High Cell Potency Identified by Fragment-Based Discovery. J Med Chem. 2016;59(8):3991-4006.
36. Erlanson DA, Fesik SW, Hubbard RE, Jahnke W, Jhoti H. Twenty years on: the impact of fragments on drug discovery. Nat Rev Drug Discov. 2016;15(9):605-19.

Acknowledgements
We are grateful to Nick Soffe for maintenance of the Oxford Biochemistry solution NMR facility, to Claire Strain-Damerell, Petra Lukacik and Martin A. Walsh for advice on Mpro production, to Anthony Aimon and Frank von Delft for providing the DSI-poised fragment library, to Adrián García, Nil Casajuana and Clàudia Llinàs del Torrent for advice with MD analysis tools, and to Leonardo Pardo for providing access to high-performance computing facilities. This work was supported by philanthropic donations to the University of Oxford COVID-19 Research Response Fund and the Oxford Glycobiology Institute Endowment.
The Oxford Biochemistry NMR facility was supported by the Wellcome Trust (094872/Z/10/Z), the Engineering and Physical Sciences Research Council (EP/R029849/1), the Wellcome Institutional Strategic Support Fund, the EPA Cephalosporin Fund and the John Fell OUP Research Fund. This work was also supported by the “Reinforcement of Postdoctoral Researchers - 2nd Cycle” (MIS-5033021), implemented by the Greek State Scholarships Foundation (ΙΚΥ).

Figure 1: 1D and STD-NMR spectra of SARS-CoV-2 Mpro. A) Methyl regions from 1H NMR spectra of recombinant SARS-CoV-2 Mpro. The spectrum on the left was recorded from a 10 μM protein concentration sample in a 5 mm NMR tube at 25 °C using an excitation sculpting water-suppression method (24). 512 acquisitions with recycle delay of 1.25 sec were averaged, for a total experiment time of just over 10 min. The spectrum on the right was recorded from a 10 μM Mpro sample in a 3 mm NMR tube at 10 °C, using the same pulse sequence and acquisition parameters. For both spectra, data were processed with a quadratic sine function prior to Fourier transformation. Protein resonances are weaker in the 10 °C spectrum due to lower temperature and the reduced amount of sample used for acquisition in the smaller NMR tube. The position where on-resonance irradiation was applied for STD spectra is indicated. B) Vertically offset 1H STD-NMR spectra from ligand x0434 binding to Mpro.
The reference spectrum is in black with the x0434, H2O and DMSO 1H resonances indicated. The STD spectrum of x0434 in the presence of Mpro is shown in red while that in the absence of Mpro is in green. STD spectra are scaled up 64x compared to the reference spectrum. Bottom panels correspond to magnified views of the indicated spectral regions, with x0434 resonances assigned to chemical groups of that ligand as shown.

Figure 2: Assessment of fragment binding to Mpro. A) STDratio values for chemical fragments identified by crystallographic screening as binding to Mpro (18). Ligands binding to the Mpro active site are coloured orange, at the Mpro dimer interface in red, and elsewhere on the protein surface in blue. B) Overlay of STD-NMR spectra from fragments x0305, x0387 and x0434, which bind the Mpro active site, showing the ligand aromatic region in the presence of Mpro. Spectra are colour coded per ligand as indicated. As seen, the three fragments yield significantly different STD signal intensities captured in the STDratio values shown in (A). C) Overlay of reference spectra from fragments x0305, x0376 and x0540, showing the ligand aromatic region. Peak intensities vary substantially, suggesting significant differences in ligand concentration.

Figure 3: STD-NMR of COVID Moonshot ligands binding to Mpro. A) Overlay of reference spectra from the indicated COVID Moonshot ligands, showing the ligand aromatic region in each case in the presence of Mpro. Spectra are colour coded per ligand as indicated. As seen, peak intensities vary substantially, suggesting significant differences in ligand concentration. Peaks of ligand EDJ-MED-c8e7a002-1 (green) are indicated by arrows; ligand EDJ-MED-e4b030d8-12 (red) produced no peaks in the NMR spectrum.
B) Plot of STDratio values from COVID Moonshot ligands assessed by STD-NMR against their IC50 value estimated by RapidFire mass spectroscopy enzymatic assays (17). Ligands in blue show weak correlation between the two methods (red line, corresponding to an exponential function along the IC50 dimension). Ligands in grey represent outliers of the STD-NMR or enzymatic method as discussed in the text.

[Figure 1 image. Panel A: 1H NMR spectra at 25 °C and 10 °C with the STD irradiation position marked. Panel B: reference, STD (+Mpro) and STD (-Mpro) spectra of x0434 with H2O and DMSO resonances labelled. Horizontal axes: δ 1H (ppm).]
[Figure 2 image. Panel A: bar chart of STDratio (x 10-3) per ligand fragment. Panel B: STD-NMR spectra of x0305, x0387 and x0434. Panel C: reference spectra of x0305, x0376 and x0540. Horizontal axes of panels B and C: δ 1H (ppm).]

[Figure 3 image. Panel A: reference spectra of RAL-THA-6b94ceba-1, LOR-NOR-c954e7ad-2, EDJ-MED-c8e7a002-1 and EDJ-MED-e4b030d8-12; horizontal axis: δ 1H (ppm). Panel B: RapidFire IC50 (µM) against STDratio (x 10-3), R2=30%.]

10_1101-2020_07_08_188672 ---- 64876765

Intramolecular quality control: HIV-1 Envelope gp160 signal-peptide cleavage as a functional folding checkpoint

Nicholas McCaul1,2,5, Matthias Quandte1,2,6, Ilja Bontjer3, Guus van Zadelhoff2, Aafke Land2,7, Rogier W.
Sanders3,4, Ineke Braakman2*

1 These authors contributed equally
2 Cellular Protein Chemistry, Bijvoet Center for Biomolecular Research, Science4Life, Faculty of Science, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
3 Department of Medical Microbiology, Laboratory of Experimental Virology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands
4 Department of Microbiology and Immunology, Weill Medical College of Cornell University, New York, NY, USA
5 Present address: Program in Cellular and Molecular Medicine, Boston Children’s Hospital, Boston, MA, USA
6 Present address: dr heinekamp Benelux B.V., Leidse Rijn 51, 3454 PZ, De Meern, The Netherlands
7 Present address: Hogeschool Utrecht, Institute of Life Sciences, FC Dondersstraat 65, 3572 JE, Utrecht, The Netherlands

*Lead Contact: i.braakman@uu.nl

bioRxiv preprint, this version posted January 5, 2021; doi: https://doi.org/10.1101/2020.07.08.188672. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Summary
Removal of the membrane-tethering signal peptides that target secretory proteins to the endoplasmic reticulum is a prerequisite for proper folding. While signal peptides are generally thought to be removed well before translation termination, we here report two novel post-targeting functions for the HIV-1 gp120 signal peptide, which remains attached until gp120 folding triggers its removal.
First, the signal peptide improves fidelity of folding: it enhances the conformational plasticity of gp120 by driving disulfide isomerization through a redox-active cysteine, and at the same time delays folding by tethering to the membrane the N-terminus, which needs to assemble with the C-terminus. Second, its carefully timed cleavage represents intramolecular quality control and ensures release and stabilization of (only) natively folded gp120. Postponed cleavage and the redox-active cysteine are both highly conserved and important for viral fitness. Considering the ~15% secretory proteins in our genome and the frequency of N-to-C contacts in protein structures, these regulatory roles of the signal peptide are bound to be more common in secretory-protein biosynthesis.

Keywords: endoplasmic reticulum, gp120, disulfide bond, redox-active cysteine, protein folding, signal peptide, membrane tethering

Introduction
The endoplasmic reticulum (ER) is home to a wealth of resident chaperones and folding enzymes that cater to approximately a third of all mammalian proteins during their biosynthesis (Ellgaard et al., 2016; Kanapin et al., 2003). It is the site of N-linked glycan addition and disulfide-bond formation, both of which contribute to protein folding, solubility, stability, and function.
Targeting to the mammalian ER in general is mediated by N-terminal signal peptides, which direct the ribosome-nascent chain complex to the membrane and initiate co-translational translocation (Blobel and Dobberstein, 1975; Gorlich et al., 1992; Görlich et al., 1992; Jackson and Blobel, 1977; Lingappa et al., 1977; Walter, 1981). For soluble and type-I transmembrane proteins, the N-terminal signal peptide is 15-50 amino acids long and contains a cleavage site recognized by the signal peptidase complex (von Heijne, 1985). While a great deal of sequence variation occurs between signal sequences, conserved features do exist. These include a positively charged, N-terminal n-region, a hydrophobic h-region and an ER-lumenal c-region (von Heijne, 1983, 1984, 1985).
Classic paradigm-establishing studies showed that cleavable signal peptides are removed co-translationally, immediately upon exposure of the cleavage site in the ER lumen (Blobel and Dobberstein, 1975; Jackson and Blobel, 1977). This would imply that signal peptides function only as cellular postal codes and that signal-peptide cleavage and folding are independent events. Evidence is emerging, however, that increased nascent-chain lengths are required for cleavage (Daniels et al., 2003; Hegde and Bernstein, 2006; Rutkowski et al., 2003), indicating that the signal peptidase does not cleave each consensus site immediately upon translocation into the ER lumen. Examples are the influenza-virus hemagglutinin, in which signal-peptide cleavage occurs on the longer nascent chain, well after glycosylation (Daniels et al., 2003), EDEM1 (Tamura et al., 2011), human cytomegalovirus (HCMV) protein US11 (Rehm et
al., 2001), and HIV-1 envelope glycoprotein gp160 (Li et al., 1994, 2000), suggesting that signal peptides can function as more than mere postal codes. Late signal-peptide cleavage is easily overlooked because Western blots lack temporal resolution and may mask small mass differences.
Gp160 is the sole antigenic protein on the surface of the HIV-1 virion and mediates HIV-1 entry into target cells (Wyatt and Sodroski, 1998). It folds and trimerizes in the ER, leaves upon release by chaperones and packaging into COPII-coated vesicles, and is cleaved by Golgi furin proteases into two non-covalently associated subunits: the soluble subunit gp120 (Figure 1A, in colors), which binds host-cell receptors, and the transmembrane subunit gp41 (Figure 1A, uncolored), which contains the fusion peptide (Decroly et al., 1994; Earl et al., 1990; Earl et al., 1991; Hallenberger et al., 1992; Wyatt and Sodroski, 1998). The so-called outer-domain residues [according to (Pancera et al., 2010)] are colored in pink (Figure 1A), the inner domain, which folds from more peripheral parts of the gp120 sequence, in grey, and the variable loops in green.
Correct function of gp160 requires proper folding, including oxidation of the correct cysteine pairs into disulfide bonds (Bontjer et al., 2009; Land and Braakman, 2001; Land et al., 2003; Sanders et al., 2008; Snapp et al., 2017). Disulfide-bond formation and isomerization in gp160 begin co-translationally, on the ribosome-attached nascent chain, and continue long after translation, until the correct set of ten conserved disulfide bonds has been formed (Land and Braakman, 2001; Land et al., 2003). The soluble subunit gp120 can be expressed independently of gp41 and folds with kinetics highly similar to those of gp160 (Land et al., 2003).
Signal-peptide cleavage only occurs once gp120 attains a near-native conformation and requires both N-glycosylation and disulfide-bond formation, but is gp41 independent (Land et al., 2003; Li et al., 1996). Mutation-induced co-translational signal-peptide cleavage changes the folding pathway of gp120 and is disadvantageous for viral function (Pfeiffer et al., 2006; Snapp et al., 2017).
Given the interplay between signal-peptide cleavage and gp120 folding, we set out to investigate the mechanism that drives post-translational cleavage and its relevance for gp120 folding and viral fitness. We used various kinetic oxidative-folding assays on gp120 combined with functional studies on recombinant HIV strains encoding gp160 mutants, and discovered a novel role for the ER-targeting signal peptide as quality-control checkpoint and folding mediator. A conserved cysteine in the membrane-tethered signal peptide drives disulfide isomerization in the gp120 ectodomain until gp120 folding triggers signal-peptide cleavage and release of the N-terminus. We uncovered this functional, mutual regulation as an intramolecular quality control that ensures native folding of a multidomain glycoprotein.

Results
Signal-peptide cleavage requires the gp120 C-terminus
Of the nine disulfide bonds in gp120, five are critical for proper folding and signal-peptide cleavage, three in the constant regions of gp120 in the (grey) inner domain, and two in the outer (pink) domain at the base of variable loops (green) V3 and V4 [Figure 1A, (van Anken et al., 2008)].
Gp120 undergoes extensive disulfide isomerization during its folding process, as seen from the smear of gp120 folding intermediates (IT) in the non-reducing gel upon 35S-radiolabeling, from reduced gp120 down to beyond native gp120 [(Land et al., 2003); Figure 1B, NR, 0' chase]. This smear gradually disappears into a native band with discrete mobility, at around the time the signal peptide is removed (Figure 1B R, from 15' chase). Yet, it is far from obvious which aspect of gp120 folding triggers signal-peptide cleavage. We therefore took a linear approach: we prepared C-terminal truncations of gp120, from 110-aa length (111X, a gp120 molecule truncated after position 110) to full length, and analyzed in which of them the signal peptide was cleaved (Figure S1). Radioactive pulse-chase experiments showed that only full-length gp120 (511 residues long) and 494X lost their signal peptides, but that in all shorter forms, including 485X, the signal peptide remained uncleaved and hence attached to the protein (Figure S1).

We continued with a time course for 485X and 494X to examine and compare their folding pathways (Figure 1B). Both truncations encompass the entire gp120 sequence except for the last 26 and 17 amino acids, respectively (Figure 1A). Like wild-type gp120, immediately after pulse labeling (synthesis) the two C-terminally truncated mutants ran close to the position of reduced gp120 in non-reducing SDS-PAGE.
The 485X truncation formed disulfide bonds towards a compact structure, as the folding intermediates IT ran lower in the gel than reduced protein. It failed, however, to form native gp120 (NT, Figure 1B) or another stable intermediate. Instead, it acquired compactness far beyond the mobility of NT and remained highly heterogeneous, suggesting the formation of long-distance disulfide bonds that increased compactness and hence electrophoretic mobility (Figure 1B, Cells NR, 15'-2h). Only a fraction, if any, of the 485X mutant lost its signal peptide or acquired competence to leave the ER and be secreted (Figure 1B, Medium), even though all cysteines were present in the 485X gp120 protein. In contrast, addition of only 9 residues made 494X behave like wild-type gp120: cleavage rate and secretion were indistinguishable (Figure 1B). Oxidative folding progressed similarly as well, except for a transient non-native disulfide-linked population that ran more compact than NT (Figure 1B, Cells NR, 0-1 h) and disappeared over time (Cells NR, 2-4 h).

We concluded that signal-peptide cleavage required synthesis and folding of more than 484 out of the 511 amino acids of gp120. The downstream amino acids in the C5 region in the inner (grey) domain (Figure 1A, teal) triggered the switch from non-cleavable to cleavable.
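
The bracketing logic of the truncation series can be written out as a short sketch. The Python below is illustrative only: the function names `construct_length` and `cleavage_bracket` are hypothetical, the data are limited to the four constructs named in the text, and the naming rule (a construct "NX" is truncated after position N-1) is an assumption generalized from the definition given for 111X.

```python
FULL_LENGTH = 511  # residues in full-length gp120, as stated in the text


def construct_length(name: str) -> int:
    """Return the chain length for a construct name such as '485X'.

    Assumption: 'NX' means truncated after position N-1 (as defined for 111X).
    The label 'full-length' maps to FULL_LENGTH.
    """
    if name == "full-length":
        return FULL_LENGTH
    if not name.endswith("X"):
        raise ValueError(f"unrecognized construct name: {name!r}")
    return int(name[:-1]) - 1


# Cleavage outcomes reported in the text (True = signal peptide cleaved).
observed = {
    "111X": False,
    "485X": False,
    "494X": True,
    "full-length": True,
}


def cleavage_bracket(outcomes: dict) -> tuple:
    """Return (longest uncleaved length, shortest cleaved length)."""
    uncleaved = [construct_length(n) for n, cut in outcomes.items() if not cut]
    cleaved = [construct_length(n) for n, cut in outcomes.items() if cut]
    return max(uncleaved), min(cleaved)


low, high = cleavage_bracket(observed)
print(f"cleavage needs more than {low} residues; {high} residues suffice")
```

Under these assumptions the switch from non-cleavable to cleavable lies between the two returned lengths, which is the interval that the 485X and 494X constructs were designed to bracket.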
A pseudo salt bridge in the inner-domain β-sandwich controls signal-peptide cleavage and gp160 function

Amino acids 485-494 form a β-strand (Figures 2A and 2B, teal, β31), which is part of the β-sandwich in the inner (grey) domain of gp120 (Figures 1A, 2A, and 2B) [coding of strands and helices from (Garces et al., 2015)]. This β-sandwich is formed by interactions of seven β-strands in constant domains C1, C2, and C5 (Figures 1A, 2A, and 2B). Six of the strands are close to the N-terminus and the seventh strand, β31 (teal), is contributed by the C-terminal C5 region. As the addition of β31 triggered signal-peptide removal, we hypothesized that the complete and properly folded β-sandwich was the minimal requirement for cleavage.

To address this, we designed charge mutants aimed to prevent assembly of β31 with the N-terminal part of the β-sandwich (Figure 2B). In the high-resolution crystal structure (Garces et al., 2015) K487 (in β31) forms hydrogen bonds with E47 (in β2), E91 (in β5), and the main-chain oxygen of N92 (in β5) (Figure 2B). We created charge-reversed mutants of the N-terminal glutamates (E47K and E91K), the C-terminal lysine (K487E) and combinations thereof (Figures 2C and S2). We did not include N92 in our mutagenesis study since its interaction involves the main-chain oxygen, which cannot be removed; we considered this inappropriate for our question.
As gp120 is the dominant subunit in gp160 folding and signal-peptide cleavage, and allows more detailed analysis because it is smaller than gp160, we subjected wild-type gp120 and all mutants to pulse-chase analysis of their oxidative folding, signal-peptide cleavage, and secretion (Figures 2C-E and S2A-D). Like the C-terminal truncations (485X, Figure 1B), all charge mutants in the β-sandwich formed gp120 molecules with higher electrophoretic mobility than native gp120 (NT), implying appreciable non-native long-range disulfide bonding, persisting at all time points or disappearing into aggregates (Figure S2B). K487E showed the strongest phenotype: minimal formation of NT and a much-delayed signal-peptide cleavage (Figures 2C, E and S2B, Cells NR and R, band Rc). This folding step was crucial for function as K487E mutant virus was non-infectious (Figure 2F). A striking rescue of the strong folding defect of K487E was effected by the charge reversal at the N-terminus: the double mutant E47K K487E displayed improved gp120 oxidation (Cells NR), signal-peptide cleavage (Cells R), and secretion (Medium) (Figures 2C-E and S2C), and rescued infectivity to ~30% of wild type (Figure 2F). All N-terminal E-to-K mutants (E47K, E91K) oxidized and accumulated in a native-like position in gel but failed to form much of the sharp NT band seen in wild type (Figures 2C and S2A).
The start of signal-peptide cleavage of both single mutants was delayed to 30-60 min after synthesis and total cleavage was 25-50% lower than wild type by 4 h (Cells R and Figures 2E and S2C). As a result, the secretion of all N-terminal mutants was decreased by ≥60% in 4 h compared to wild type (Medium, Figures 2C, D and S2A, C). The redundancy of two negative charges likely contributes to the intermediate folding phenotype of the N-terminal mutants and the ~10% residual viral infectivity of the E47K mutant (Figure 2F).

We concluded that the C-terminal β-strand was essential for proper folding of the β-sandwich in the inner domain, which completes folding of gp120 and triggers signal-peptide cleavage. Timing of cleavage hence represents a checkpoint for proper folding of gp120.

Retention of the signal peptide causes hypercompacting of gp160

During folding, gp120 undergoes extensive disulfide formation and isomerization before reaching its native state. These intermediates appear as “waves” on SDS-PAGE representing varying degrees of compactness of folding intermediates (Land et al., 2003). Because mutants of gp120 that exhibited delayed or absent cleavage all formed hypercompact forms that ran below NT (Figures 1B and 2C), we asked whether this heterogeneous electrophoretic mobility represented continued isomerization of gp120. We replaced the alanine in the -1 position relative to the cleavage site with a valine (A30V), to prevent cleavage by the signal-peptidase complex (Figure 2G, Cells R).
Initial oxidative folding of A30V was similar to that of wild type (Figure 2G, NR). The A30V mutant, however, did not form a native band but populated all forms, from reduced to hypercompact oxidized, probably isomerizing continuously. The monomeric forms gradually disappeared into disulfide-linked, SDS-insoluble aggregates that increased in size and eventually became too large to enter the gel (Figure 2G, Agg). In both wild-type and A30V gp120, an endoglycosidase H-resistant band appeared over time (Figure 2G, EHr). For wild-type gp120 this represents molecules that have transited through the Golgi complex and acquired an N-acetylglucosamine residue on their N-glycans but have yet to be secreted. For A30V gp120 this population may be due to the inaccessibility of some sugars for removal due to formation of SDS-insoluble aggregates.

We concluded that retention of the signal peptide either promotes formation of these hypercompact forms or prevents recovery from them. Because all signal-peptide-retaining mutants showed a high propensity for aggregation, it is likely that these SDS-insoluble aggregates are composed of hypercompact forms of gp120. Tethering the N-terminus appears beneficial for folding, but release of gp120 from both tether and isomerization-driving cysteine is vital for stabilization of the acquired native fold and release from the ER.

The cysteine in the signal peptide interacts with cysteines in gp160

As the signal peptide stays attached to gp160 for at least 15 min after chain termination, it influences both co- and post-translational folding, through the tethering of the N-terminus to the membrane, as well as through interactions with the mature gp120 sequence (Snapp et al., 2017). The hypercompacting in non-cleaved mutants by continued disulfide isomerization (Figure 2G) implies that an unpaired cysteine must be available to keep attacking formed disulfide bonds. Opening once-formed disulfides may improve folding yield as the folding protein regains conformational freedom, and at the same time has a chance to recover from non-native disulfide bonding. Existing disulfides may be attacked by a cysteine from an oxidoreductase in the ER, or by an intramolecular cysteine in gp120 [as shown for BPTI (Weissman and Kim, 1992, 1993)]. The unpaired cysteine in position 28 within the signal peptide is a likely interaction candidate, because it is part of the consensus sequence for the signal peptidase and as such (partially) exposed to the ER lumen. During translocation, C28 may interact with gp160 cysteines while they pass through the translocon.

Folding analysis as in Figures 1 and 2, however, showed that mutating C28 had no detectable effect on oxidative folding (Figure 3A, C28A): folding intermediates disappeared, folded NT appeared, and the signal peptide was cleaved similarly and at similar times as wild-type gp120. Either C28A was identical to wild type or differences were missed due to asynchrony of the folding gp120 population. To amplify mobility differences, we alkylated with iodoacetic acid, which adds a charge to each free cysteine it binds to.
To better synchronize the folding cohort, we modified the pulse-chase protocol with a preincubation with puromycin to release unlabeled nascent chains before labeling and added cycloheximide in the chase media to block elongation of radiolabeled nascent chains. At each chase time, gp120 C28A ran higher on a reducing gel (Figure S3), indicating that it had more free cysteines, as it bound more iodoacetic acid than wild-type gp120. This could be due to either slower disulfide-bond formation, faster disulfide-bond reduction, or a combination of both, which suggests a role for C28 in the net gp120 disulfide formation or isomerization during folding.

The importance of C28 became clear when we removed disulfide bond 54-74 in C1. Deletion of disulfide 54-74 prevents signal-peptide cleavage, but allows gp160 to reach a compact position just above natively folded protein NT [Figure 3B, (van Anken et al., 2008)]. When C28A was introduced into the 54-74 deletion, folding intermediates were blocked at a much earlier phase and remained significantly less compact (Figure 3B). The phenotype was the same when we combined C28A with the individual deletions of C54 or C74 (Figure S4). C28A not only prevented formation of compact folding intermediates, it also increased their heterogeneity. As C28 deletion aggravated folding defects of 54-74 disulfide bond mutants, C28 must have partially compensated for the 54-74 folding defect by participating in oxidative folding.
We concluded that C28 in the signal peptide was important for oxidative folding of incompletely folded gp120, most likely for sustaining isomerization of non-native disulfide bonds, and is partially redundant with the 54-74 cysteines for this process.

To analyze whether C28, in addition to redundancy, interacted directly with the 54-74 disulfide bond, we used a 110 amino-acid truncation (111X, Figure 3C) for simplicity, as it retained the signal peptide and contains a single native cysteine pair. Because formation of this disulfide bond was not detectable by comparing reduced and non-reduced samples, we made use of an alkylation-switch assay [Figure 3D, (Appenzeller-Herzog and Ellgaard, 2008)]. In short, we radiolabeled cells expressing 111X and blocked free cysteines with NEM. Cells then were homogenized and denatured in 2% SDS and incubated again with NEM to block any free cysteines previously shielded by structure. After immunoprecipitation and reduction of disulfide bonds with TCEP, we alkylated resulting free cysteines with mPEG-maleimide 5,000, which adds ~5 kDa of mass for each cysteine alkylated. Samples were immunoprecipitated again to remove mPEG and were analyzed by non-reducing 4-15% SDS-PAGE (Figure 3E). The 111X construct only showed weak disulfide-bond formation with only ~13% of molecules forming a disulfide bond (Figure 3E, Wt). Upon removal of the signal-peptide cysteine C28, however, the population that contained a disulfide bond increased significantly to ~22% (Figure 3F).
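
The read-out of the alkylation-switch assay reduces to simple arithmetic, sketched below in Python. This is an illustration under stated assumptions, not the authors' quantification pipeline: the function names and the example band intensities are hypothetical. The only values taken from the text are the ~5 kDa added per alkylated cysteine and the assay logic, in which cysteines held in a disulfide escape the NEM block and are labeled after TCEP reduction.

```python
PEG_SHIFT_KDA = 5.0  # ~5 kDa per mPEG-maleimide-labeled cysteine (from the text)


def expected_shift_kda(n_disulfides: int) -> float:
    """Expected mass gain for a molecule carrying n_disulfides disulfide bonds.

    Each disulfide protects two cysteines from NEM; after reduction both
    cysteines become available for PEG labeling.
    """
    if n_disulfides < 0:
        raise ValueError("number of disulfide bonds cannot be negative")
    return 2 * n_disulfides * PEG_SHIFT_KDA


def fraction_disulfide_bonded(shifted: float, unshifted: float) -> float:
    """Fraction of molecules in the PEG-shifted band.

    Arguments are band intensities in arbitrary units (hypothetical inputs).
    """
    total = shifted + unshifted
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return shifted / total


# Hypothetical intensities, chosen only to illustrate the calculation.
print(expected_shift_kda(1))
print(round(fraction_disulfide_bonded(shifted=13.0, unshifted=87.0), 2))
```

In this sketch a molecule with one disulfide bond would be expected to run roughly 10 kDa above the unshifted band, and the reported percentages correspond to the share of signal in that shifted band.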
The presence of C28 thus further destabilized the already unstable 54-74 disulfide bond. The non-native disulfide bond 28-74 in the C54A mutant barely formed, whereas the 28-54 disulfide bond in the C74A mutant was highly variable (Figure 3F). This suggests that disulfide bonds involving the signal-peptide cysteine 28 are unstable and may only occur transiently, a feature consistent with a transient role in disulfide isomerization.

The N-terminal cysteines form long-range disulfides during early gp160 folding

Gp120 undergoes constant disulfide isomerization during folding (Land et al., 2003) and prolonged association of the signal peptide appears to intrinsically sustain isomerization and may further destabilize the already unstable 54-74 disulfide bond. Moreover, we have shown redundancy of C28 with disulfide bond 54-74, which is why we asked whether the three N-terminal cysteines in gp120 were taking part in long-range disulfide bonds during folding. We removed the V1V2 variable loops, which are not essential for folding and function (Bontjer et al., 2009), and inserted a cleavage site for the protease thrombin through mutagenesis (L125R). This removed all three disulfide bonds in V1V2: 126-196 and 131-157 by the loop deletion, and 119-205 by mutation (C119-205A). We named the resulting construct gp120Th. Reduction after cleavage produces an N-terminal fragment of ~15 kDa containing the signal peptide and the three N-terminal cysteines C28, C54, and C74, plus a ~75 kDa fragment containing the rest of gp120 (Figure 4A).
If long-range disulfides between the N- and C-terminal fragments indeed exist, the cleaved, non-reduced molecule should run in the same position as the uncleaved molecule under non-reducing conditions, and should dissociate into the two fragments under reducing conditions. Radioactive pulse-chase experiments as described above were modified: instead of deglycosylation with EndoH we denatured the protein with 0.2% SDS and cleaved gp120 with 0.75 U thrombin. The two fragments were separated by 15-20% discontinuous SDS-PAGE. As expected, gp120 that lacked all N-terminal cysteines did not form any long-distance disulfide bonds (Figure 4B and C, C28A C54-74A). We confirmed that wild-type gp120 contained long-distance disulfide bonds between N-terminal cysteines and the rest of the molecule during early folding (Figure 4B and C, Wt). Removal of C28 significantly reduced the number of molecules with a long-range disulfide bond (Figure 4D), likely due to increased stability of the 54-74 disulfide bond in the absence of C28. Strikingly, all cysteine mutants that retained a single cysteine yielded some long-distance disulfides, suggesting that all three N-terminal cysteines could form a non-native pair with downstream cysteines in gp120 (Figure 4B-D).

Removal of V1/V2 disulfides causes more rapid gp120 folding

Perhaps counterintuitively, the thrombin-cleavage construct (gp120Th) folded faster than full-length gp120 (Figure 5A).
Directly after the pulse, gp120Th ran as a more diffuse band whereas full-length gp120 (gp120 Wt) remained close to the reduced-gp120 mobility (Figure 5A NR). This increased compactness shows that gp120Th had already formed more or larger-loop-forming disulfide bonds (Snapp et al., 2017). As a result, signal-peptide cleavage of gp120Th was faster: almost complete for gp120Th after a 1-hour chase, compared to ~50% cleaved of gp120 Wt (Figure 5A, R).

The 54-74 disulfide-bond mutants in gp120-Wt background fold to a stable intermediate just above the native position [Figures 3B and S4, (van Anken et al., 2008)] whereas the same mutants lacking V1V2 (in gp120Th) failed to accumulate in a single band (Figure 5B), reminiscent of the folding of gp120 C28A C54-74A (Figure 3B). This indicates that V1V2 deletion phenocopied C28 deletion in the 54-74 mutants. Redundancy of V1V2 with C28 was confirmed by the lack of additional effect of C28 removal in the gp120Th 54-74 mutants (Figure 5B).

C28A results in decreased HIV-1 production and pseudovirus infectivity

As the biochemical data suggested a role for C28 in gp120 folding, we examined the effect of C28A mutation on viral production and infectivity. For this we transfected cells with a molecular clone containing the full HIV genome (LAI strain), containing either wild-type gp160 or the C28A mutant.
As the reading frames of Env and HIV-1 Vpu overlap and mutations in the signal peptide of gp120 can cause changes in the C-terminus of Vpu, we produced the viruses in HEK 293T cells, which are deficient in CD4 and tetherin and therefore do not require Vpu to enhance virus production (Van Damme et al., 2008). We consistently detected significantly less C28A HIV virus than wild-type HIV produced (Figure 6A). Strikingly, the C28A virus was significantly more infectious than wild-type HIV but displayed strong heterogeneity in infectivities, indicative of heterogeneity in C28A gp160 incorporated into the virions (Figure 6B). Due to the severe deficit in virus production, despite increased infectivity, C28A-gp160-containing HIV is not likely to be competitive in nature. Indeed, alignments of >4,300 gp160 sequences from across all subtypes show that C28 is ~87% conserved (www.hiv.lanl.gov).

To uncouple differences in virus production from infectivity we moved to a pseudovirus system, which allows analysis of the effect of C28A gp160 on infectivity alone (Figure 6C and D). As expected, we found very little difference in virus production between wild-type and C28A gp160 (Figure 6C). Infectivity of the C28A gp160 pseudovirus was roughly 60% less than the infectivity of wild type (Figure 6D). We concluded therefore that gp160 conformation, its function, and as a result HIV, suffered from the removal of the signal-peptide cysteine.

Discussion

During and for 15-30 min after synthesis, the N-terminus of HIV-1 gp160 remains tethered to the ER membrane by its transient signal anchor. We here show that conformational plasticity is enhanced through the cysteine in the signal peptide driving disulfide isomerization, in part via the 54-74 disulfide bond, until the gp120 C-terminus has assembled with the N-terminus, completing the inner-domain β-sandwich and gp120 folding (Figure 7). This triggers signal-peptide cleavage, removing C28 from the protein, halting further isomerization and stabilizing the native gp120 form. This intramolecular quality-control process is essential for viral fitness of HIV and can be impaired and restored by single charge reversals in the gp120 inner domain.

Hierarchy of gp120 folding

The inner domain with the β-sandwich and the outer domain, which contains a stacked double β-barrel, together constitute the minimal folding-capable “core” of gp120 [(Figure 2A), (Garces et al., 2015; Kwong et al., 1998)]. Gp120 is completed with the surrounding variable loops V1V2, V3, and V4 (green in Figures 1A, 2A and B). The core contains six of the nine disulfide bonds in gp120, including the five that are essential for correct folding and signal-peptide cleavage (van Anken et al., 2008). The inner-domain β-sandwich consists of seven strands, six of which are N-terminal. We here show that proper folding of the sandwich requires assembly with the C-terminal β31 strand and formation of the five essential disulfide bonds, which then leads to signal-peptide cleavage [Figure 2A, (van Anken et al., 2008)].

Some folding of the gp120 'hairpin' begins during translation and translocation into the ER (Land et al., 2003), but the bulk occurs post-translationally (Figure 7). The (pink) outer domain is the first complete domain to emerge from the translocon and has low contact order, meaning that folding does not require integration of distal residues (Figure 7). It is the first domain to fold, which requires formation of the two native disulfide bonds (296-331 and 385-418) in the β-barrel: gp120 lacking either disulfide bond barely folds past the reduced position in SDS-PAGE (Sanders et al., 2008; van Anken et al., 2008).

The (grey) inner domain of gp120 folds next: mutants with individual deletions of its three essential disulfide bonds (54-74, 218-247, and 228-239) fold into more compact structures than the outer-domain deletions (van Anken et al., 2008). The most intriguing is the 54-74 disulfide: the C54-74A mutant accumulates in a sharp band just above the native position, retaining its signal peptide. In contrast, C218-247A and C228-239A fail to form defined intermediates (van Anken et al., 2008), suggesting that these β-sandwich-embracing disulfides (Figure 2A) stabilize the inner domain.

Folding of the inner domain leads to cleavage of the signal peptide. Until that time, the signal peptide acts as signal anchor because it adopts an α-helical conformation that extends past the cleavage site and prevents proteolytic cleavage (Snapp et al., 2017).
Folding and integration of the inner domain must break the helix and allow cleavage to occur, as in crystal structures of gp120 this early helical region is a β-strand (Garces et al., 2015).

As the disulfide bonds of V1/V2 and gp41 are dispensable for signal-peptide cleavage and their deletion shows no aberration in the folding pathway (van Anken et al., 2008), those domains likely fold after or independent of the outer and inner domains (Figure 7). This is underscored by gp120 folding and signal-peptide cleavage being largely independent of gp41 (Land et al., 2003), and by N- and C-terminal sequences in the gp120 inner domain forming the binding site of gp41 (Garces et al., 2015; Julien et al., 2013; Lyumkis et al., 2013). Gp41 binding may explain apparent inconsistencies between folding and function of some inner-domain mutants (Garces et al., 2015; Yang et al., 2003). The conserved gp41 binding site on the gp120 inner-domain β-barrel also may explain the conservation (and hence value) of the intramolecular quality-control system: it ensures proper folding of this binding site, with high fidelity and well timed, before gp41 folding.

The regulation of signal-peptide cleavage by folding of gp120 implies that the formation of the β-sandwich generates sufficient force to break the attached α-helix and expose the cleavage site.
Alpha-helical proteins have lower mechanical strength than β-sheet proteins, which often need to resist dissociation and unfolding; lower mechanical strength facilitates conformational changes to expose transient binding sites or allow signaling (Chen et al., 2015). Only ~5 pN indeed suffices for exposure of a protease-cleavage site in an α-helix: for proteolytic activation of Notch, cleavage of the NRR domain by ADAM17 (Gordon et al., 2015), and of the talin R3 domain, a 4-helix bundle (del Rio et al., 2009; Yao et al., 2014); the von Willebrand factor A2 domain requires ~8 pN (Zhang et al., 2009). Pulling apart a β-sheet protein such as Ig domains, OspA, or ubiquitin by shearing needs >160 pN (Brockwell et al., 2003; Carrion-Vazquez et al., 2003; Hertadi et al., 2003), with less force required for peeling (Brockwell et al., 2003). Physiological forces measured so far do not exceed ~40 pN, a force level at which proteins in general may be destabilized already (Chen et al., 2015). The α-helical region around the cleavage site in gp120 thus would lose the stability competition from the β-sheet in the inner domain, if their structures are incompatible; indeed, in the gp160 structure this α-helical region is a β-strand (Figures 2A and 7). First-time folding, i.e. the completion of the inner-domain β-sandwich by assembly of β31, is likely to generate sufficient force as well, as 7-12 pN allows constant binding of a filamin β-strand to a β-sheet (Rognoni et al., 2012).
Formation of the inner-domain disulfide bonds may further raise the stabilizing force (Eckels et al., 2019). The completion of gp120 therefore likely generates the ~5-pN force needed to break the α-helix and allow the signal peptidase to cleave off the gp120 signal peptide.

Effects of the attached signal peptide

The postponed cleavage of the signal peptide makes it a transient signal anchor, which acts as membrane tether. This limits conformational freedom and benefits folding, i.e. the integration of the C-terminal β31-strand into the folded inner-domain β-sandwich, as in knotted proteins (Soler and Faisca, 2012).

The prolonged proximity of the free signal-peptide cysteine (28) supports disulfide isomerization and increases conformational plasticity during gp120 folding. Gp120 requires a native set of disulfide bonds to attain its functional 3D structure, but already during synthesis non-native disulfides are formed, which reshuffle over time (Land et al., 2003). The isomerization is detected as “waves” in SDS-PAGE, where the heterogeneous population of folding intermediates oscillates between higher and lower compactness over time [(Figure 3A), (Land et al., 2003)]. Despite extensive isomerization during folding, wild-type gp120 only transiently occupied forms more compact than native.
In contrast, the various β-sandwich and uncleavable mutants extensively populated hypercompact states with non-native long-range disulfide bonds, indicating that without stable assembly of the N- and C-termini and resulting retention of the signal peptide, isomerization continues unabated and drives the formation of these hypercompact structures. The constant disulfide isomerization is sustained by the redox-active cysteine 28 in the signal peptide, as its sulfhydryl group is free to attack existing disulfide bonds. Once gp120 folding has reached a state where isomerization is no longer preferred (N- and C-termini in the inner domain assembled), the signal peptide is cleaved, removing an important, conserved driving force behind isomerization. Cleavage of the signal peptide then acts as a sink because it removes the disulfide-attacking cysteine and pulls the folding equilibrium to the native structure.

Mode of action of C28

In a short construct, C28 favored a disulfide bond with C54 (Figure 3F). Despite a limited ability to form disulfide bonds with cysteines downstream of the 54-74 bond (Figure 4D), deletion of C28 did not aggravate the C54-74A defective phenotype in the absence of V1V2. C28 hence likely sustains isomerization primarily by constant attack and destabilization of the 54-74 disulfide bond. This propagates a free sulfhydryl group downstream through the folding protein, as the result of an intramolecular electron-transport chain from the more C-terminal cysteines via C54 and C74 to C28 (Figure 7).
In the 110-residue gp120 chain (111X), essentially a mimic of a released gp120 nascent chain, the 54-74 bond had formed in at most 20% of molecules, demonstrating its inherent instability as well as the likelihood that C28 already acts on C54 and C74 during translation. Only in the presence of V1V2 did C28 show redundancy with the 54-74 disulfide, implying that C28 can fulfill roles otherwise played by C54 and C74 (and vice versa). This suggests that C28 is involved in folding (and isomerization of V1V2) and may play this role by direct interaction with V1V2 cysteines, in the absence of C54 and C74, distinct from its 54-74-mediated role in downstream disulfide-bond formation. We cannot exclude that the attack of C28 on the V1V2 disulfides leads to an alternative electron-transport chain from C-terminal cysteines via V1V2 to C28. Either way, the redox-active C28 needs to be removed at the end of gp120 folding to ensure stability of the gp120 conformation.

Intramolecular oxidoreductase and quality control for proper folding

Built-in isomerase activity may seem redundant considering that gp120 folds in the ER, a compartment that contains >20 protein-disulfide-isomerase family members (Jansen et al., 2012). These oxidoreductases are large, bulky proteins, however, which cannot catalyze disulfide-bond formation once areas have attained significant tertiary structure. Isomerase activity built into the folding protein allows free-thiol propagation (in essence electron transport) in areas otherwise unreachable by folding enzymes.
This perhaps should not be too surprising given that in folded proteins the majority of cysteines are solvent inaccessible (Srinivasan et al., 1990) and during folding disulfide bonds become resistant to reduction with DTT (Tatu et al., 1993; Tatu et al., 1995). An example of intramolecular disulfide isomerization is the cysteine in the pro-peptide of bovine pancreatic trypsin inhibitor (BPTI), which increased both the rate and yield of BPTI folding (Weissman and Kim, 1992). The majority of disulfide formation during in vitro folding of BPTI results from intramolecular disulfide rearrangements (Creighton et al., 1993; Darby et al., 1995; Weissman and Kim, 1995). Transfer of free thiols between lumenal and transmembrane domains in the ER has been demonstrated for vitamin-K-epoxide reductase (Liu et al., 2014; Schulman et al., 2010), indicating that such exchanges are possible. While C28 is located in the transmembrane α-helix (Snapp et al., 2017), suggesting immersion in the membrane, sliding of transmembrane domains up and down in the membrane is possible (Borochov and Shinitzky, 1976; Danielson et al., 1994; Mowbray and Koshland, 1987). As C28 is part of the consensus sequence for signal-peptide cleavage, it likely is exposed to the ER lumen at least part of the time.

Intramolecular oxidoreductase activity is not the only advantage; intramolecular quality control is another. Release of a protein from the ER requires its folding to the extent that chaperones do not bind anymore, for instance due to shielding of hydrophobic residues
from Hsp70 chaperones. The intramolecular quality control we describe here ensures a much more subtle regulation of conformational quality. Gp160's function as HIV-1 fusion protein requires the native interaction between gp41 and the gp120 inner domain. Only proper exposure of the gp41 binding site in gp120 will lead to a functional protein. Intramolecular quality control ensures precision to the level of single residues as well as precision of timing.

Conserved and multiple roles for signal peptides

Post-translational signal-peptide cleavage of gp120 is conserved across different subtypes of HIV-1, as are the biochemical properties of the signal peptide, even if sequences are not strictly conserved (Snapp et al., 2017). This appears to be more general, as in other organisms signal peptides mutate at a lower rate than the surrounding mature peptide (Morrison et al., 2003; Williams et al., 2000), or they mutate at the same rate, but with an increased proportion of null (Veitia and Caburet, 2009) or function-preserving mutants (Garcia-Maroto et al., 1991). Function-altering mutants often have deleterious effects (Bonfanti et al., 2009; Piersma et al., 2006).

Detailed kinetic analysis of signal-peptide cleavage has not been reported for a great number of proteins, and Western blotting often does not offer the necessary resolution, but gp160 is not alone in its biosynthesis-dependent and biosynthesis-regulated signal-peptide cleavage (Anjos et al., 2002; Daniels et al., 2003; Matczuk et al., 2013; Rutkowski et al., 2003; Zschenker et al., 2001).
Whereas structure regulates cleavage in HCMV US11 (Rehm et al., 2001; Tamura et al., 2011), function regulates cleavage in ERAD-associated protein EDEM1 (Rehm et al., 2001; Tamura et al., 2011). These studies demonstrate that a variety of conditions including nascent-chain length and N-glycan addition can play a role in signal-peptide cleavage. Signal peptides are more than address labels, and folding and signal-peptide cleavage are more interdependent than originally thought (Li et al., 1994; Rehm et al., 2001; Tamura et al., 2011).

We argue that late signal-peptide cleavage may be much more common than biochemical experiments have uncovered. Cleavage may occur at any time from co-translationally until late post-translationally. The low rate of protein synthesis, ~3 to 6 amino acids per second (Braakman et al., 1991; Horwitz et al., 1969; Ingolia et al., 2011; Knopf and Lamfrom, 1965), leads to long average synthesis times (~1.5–3 min for gp120 and ~2–5 min for gp160). In fact, translation rates are much more heterogeneous (Ingolia et al., 2019): nascent chains of influenza virus HA may take >15 min to complete, corresponding to a rate of less than one residue per second [(Braakman et al., 1991); unpublished observations]. For large proteins, the difference between early and late co-translational cleavage leaves a window of several minutes, during which the signal peptide functions as an anchor tethering the protein to the ER membrane.
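The synthesis-time estimates quoted in this paragraph follow from simple division, and a minimal sketch of that arithmetic is given here. The chain lengths (511 residues for gp120 and 856 for gp160, signal peptide included, HXB2 numbering) are assumptions taken from standard HXB2 annotation rather than from this text, and the function names are purely illustrative.

```python
# Back-of-the-envelope synthesis times from translation rates.
# ASSUMPTION: chain lengths are approximate HXB2-based values (signal peptide
# included); they are not stated in the surrounding text.
GP120_RESIDUES = 511
GP160_RESIDUES = 856


def synthesis_time_min(residues: int, rate_aa_per_s: float) -> float:
    """Minutes needed to translate `residues` amino acids at a constant rate."""
    if rate_aa_per_s <= 0:
        raise ValueError("rate must be positive")
    return residues / rate_aa_per_s / 60.0


def synthesis_window_min(residues: int, slow: float = 3.0, fast: float = 6.0):
    """(shortest, longest) synthesis time in minutes for the quoted rate range."""
    return synthesis_time_min(residues, fast), synthesis_time_min(residues, slow)


def implied_rate_aa_per_s(residues: int, minutes: float) -> float:
    """Average translation rate implied by an observed synthesis time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return residues / (minutes * 60.0)


if __name__ == "__main__":
    for name, n in (("gp120", GP120_RESIDUES), ("gp160", GP160_RESIDUES)):
        shortest, longest = synthesis_window_min(n)
        print(f"{name}: {shortest:.1f}-{longest:.1f} min")
```

Under these assumed lengths the sketch gives roughly 1.4 to 2.8 min for gp120 and 2.4 to 4.8 min for gp160, in line with the ~1.5–3 min and ~2–5 min quoted in the text. The same division shows that any chain shorter than 900 residues that needs 15 min to complete is translated at less than one residue per second on average.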
Sequence features in the signal peptide, such as an exposed cysteine, are given the opportunity to interact with the folding protein. The membrane tether limits conformational freedom of the protein and reduces overall conformational entropy, which is predicted to increase fidelity of protein folding and stability (Dill and Alonso, 1988; Zhou, 2008; Zhou and Dill, 2001). This may well benefit the formation of N- and C-terminal contacts in proteins, which are present in ~50% of soluble PDB structures (Krishna and Englander, 2005) and in multiple multimeric viral glycoproteins (Chen et al., 1998; Garces et al., 2015; Gogala et al., 2014; Sauter et al., 1992; Sun et al., 2014).

Here we have presented compelling evidence for the direct functional contribution of the signal peptide to HIV-1 gp160 folding. The signal peptide drives disulfide isomerization of gp120 during folding, increasing conformational plasticity while tethering the N-terminus, and functions as a quality-control organizer, leaving only after near-native conformation has been attained. As evidence grows, it becomes clear that signal peptides demonstrate functions far beyond their originally assigned roles as cellular postal codes.

Acknowledgements: We would like to thank members of the Braakman-Van der Sluijs and Sanders labs for their fruitful discussions and insights. In particular, we thank Peter van der Sluijs for critical reading of the manuscript and Joseline Houwman for critical reading of the manuscript and design of figure 7.
This work was supported by grants from the Dutch Research Council (NWO)-Chemical Sciences (I.Br, N.M, A.L, M.Q), the European Union 7th framework program, ITN “Virus Entry” (I.Br, N.M, M.Q), and the European Union’s Horizon 2020 research and innovation program under grant agreement No. 681137 (R.W.S. and I.Bo). R.W.S. is a recipient of a Vici grant from the Dutch Research Council (NWO).

Author Contributions: Conceptualization: N.M, M.Q and I.B. Methodology: N.M. Investigation: N.M, M.Q, I.Bo and A.L. Writing – Original Draft: N.M, M.Q, and I.B. Writing – Review & Editing: N.M, M.Q, I.Bo, R.W.S, A.L and I.B. Funding Acquisition: R.W.S and I.B.

Declaration of Interests: The authors declare no competing interests.

Figure Legends

Figure 1. Signal-peptide cleavage requires the gp120 C-terminus
A) Schematic representation of gp160 amino-acid sequence with its signal peptide (orange) still attached [adapted from (Leonard et al., 1990)]. Gp120 inner domain (grey) and outer domain (pink) [according to (Pancera et al., 2010)], cysteines (red) are numbered and disulfide bonds represented by red bars. Thickness of disulfide bonds is representative of their importance for folding and (or) infectivity [thickest essential for folding, middle dispensable for folding, essential for infectivity, thinnest dispensable for both folding and infectivity (van Anken et al., 2008)]. Gp120 contains five constant regions (C1-C5) and five variable regions (green, V1-V5).
Oligomannose and complex glycans are represented as three- or two-pronged forked symbols respectively [adapted from (Leonard et al., 1990)]. Amino-acid stretch 485-494 is marked in teal. B) HeLa cells transiently expressing gp120 Wt and C-terminal truncations were radiolabeled for 5 minutes and chased for the indicated times. After detergent lysis, samples were immunoprecipitated with polyclonal antibody 40336. After immunoprecipitation, samples were analyzed by non-reducing (Cells NR) and reducing (Cells R) 7.5% SDS-PAGE. Gp120 was immunoprecipitated from medium samples with antibody 40336 and directly analyzed by reducing 7.5% SDS-PAGE (Medium). Gels were dried and exposed to Kodak-MR films or Fujifilm phosphor screens for quantification. Ru: reduced, signal-peptide-uncleaved gp120, Rc: reduced, signal-peptide-cleaved gp120, IT: intermediates, NT: native.

Figure 2. Integration of gp120 N- and C-terminus regulates signal-peptide cleavage
A) Gp120 crystal structure, 5CEZ (Garces et al., 2015), domains are colored as in Figure 1A. N and C termini are indicated; disulfide bonds are shown as red lines. Inner-domain β-sandwich is boxed. B) Zoom in of inner-domain β-sandwich. C-terminal β-strand in teal with K487 forming hydrogen bonds (dashed lines) with E47, E91 and main-chain oxygen of N92. Beta strands are numbered, and disulfide bonds indicated as red lines. Amino acids are named and numbered according to HXB2 sequence. C) Experiments as in Figure 1C with HeLa cells expressing Wt gp120 or indicated β-sandwich mutants.
D) Quantifications of experiments performed as in C, intracellular levels at 15 min were used to correct for differences in expression between mutants and corrected values compared to wild-type secretion at 4 h. Error bars: SD. E) As in D except % signal peptide cleaved at 4 h was measured from reducing gels. F) Luciferase-based infectivity assay on TZM-bl cells. Cells were infected with 100 pg of HIV-1 LAI virus containing Wt or mutant gp160. Error bars: SD. G) Pulse-chase performed as in Figure 1B. **: p<0.01, ****: p<0.0001. Complete statistical values are listed in Table 1.

Figure 3. Signal-peptide cysteine is involved in gp120 oxidative folding
A and B) Pulse-chase experiments performed as in Figure 1B. Ru: reduced, signal-peptide-uncleaved gp120, Rc: reduced, signal-peptide-cleaved gp120, IT: intermediates, NT: native. C) Schematic representation of gp160 111X truncation construct with its signal anchor (orange), ectodomain (grey), numbered cysteines (red) and disulfide bond indicated by red bar; C-terminal HA tag in yellow; N-glycan depicted as forked structure. D) Schematic representation of mPEG alkylation-switch assay. In short, free cysteines are alkylated by NEM, which is excluded from disulfide bonds. Disulfide bonds are then reduced and resulting free cysteines are alkylated with mPEG-maleimide, which provides a 5 kDa shift per cysteine alkylated when analyzed by SDS-PAGE. E) HEK 293T cells expressing the indicated 111X truncations were pulse labeled for 30 minutes in the presence (+) or absence (–) of 5 mM DTT.
At the end of the pulse, cells were scraped from dishes, homogenized and subjected to the double-alkylation mPEG-maleimide protocol depicted in D (Appenzeller-Herzog and Ellgaard, 2008). After alkylation, samples were immunoprecipitated with a polyclonal antibody recognizing the HA-tag and analyzed by non-reducing 4-15% gradient SDS-PAGE. *: background band. F) Autoradiographs from experiments performed as described in E were quantified. Error bars: SD. **: p<0.01; complete statistical values are listed in Table 1.

Figure 4. Gp120 exhibits long-distance, non-native disulfide bonds during early folding
A) Schematic representation of gp120 thrombin-cleavable construct. Inner domain (grey), outer domain (pink) and variable loops (green) from Figure 1A. Black bar indicates cleavage site for thrombin. B) Pulse-chase experiments conducted as in Figure 1B with a 5 min pulse labeling, except that detergent lysates were immunoprecipitated with polyclonal serum HT3. After immunoprecipitation, samples were cleaved with thrombin or mock treated for 15 min at RT. All samples then were analyzed by 15-20% discontinuous SDS-PAGE. NC: non-cleaved, full-length protein, C’: C-terminal fragment; N’: N-terminal fragment. C) Zoom in of gels from B showing full-length and C-terminal fragments, lane profiles were generated from autoradiographs in ImageQuant TL. D) Quantifications of autoradiographs from B.
Values were calculated by dividing the signal in the N-terminal fragment by the full-length uncleaved protein and subtracting the value for reducing conditions from non-reducing conditions to determine percent of molecules with a long-distance disulfide bond. Resulting values then were normalized to wild type. Error bars: SD. *: p<0.05, **: p<0.01, ***: p<0.001. Complete statistical values are listed in Table 1.

Figure 5. Folding of thrombin-cleavable gp120 construct
A) Pulse-chase experiments conducted as in Figure 1B except that HeLa cells were transfected with wild-type full-length gp120 (gp120 Wt) or thrombin-cleavable gp120 (gp120Th) and pulsed for 10 minutes. B) Pulse-chase experiments conducted as in Figure 1B except that HeLa cells were transfected with cysteine mutants of gp120Th. Polyclonal serum HT3 was used for immunoprecipitation from detergent lysates and samples were analyzed by 7.5% non-reducing SDS-PAGE. IT: folding intermediates, NT: native gp120, Ru: reduced signal-peptide-uncleaved gp120, Rc: reduced signal-peptide-cleaved gp120. Red text in (A) refers to gp120Th running positions.

Figure 6. C28A gp160 is detrimental to HIV-1 production and pseudovirus infectivity
A) HEK 293T cells were transfected with wild-type or mutant pLAI constructs and virus production was measured by CA-p24 ELISA. B) Infection assays were performed as in Figure 2F except with wild-type or C28A gp160 containing HIV-1, as produced in A. bg: background.
C) Virus produced as in A, except cells expressed wild-type or mutant JR-FL constructs along with packaging plasmids. D) Infectivity assays were performed as in Figure 2E. Error bars: SD, pSG3ΔEnv: virus produced without gp160 plasmid, bg: background, *: p<0.05, **: p<0.01, ****: p<0.0001. Complete statistical values are listed in Table 1.

Figure 7. Model for gp120 folding, signal-peptide cleavage and intramolecular disulfide shuffling.
A) Post-translational domain folding and signal-peptide cleavage of gp120. Grey: inner domain, bright pink: outer domain, green: variable loops, orange: signal peptide, light pink: ribosome, blue: Sec61 translocon. B) Conformational changes in the signal peptide and proximal areas during gp120 folding that lead to cleavage. Colors as in (A). C) C28 sustains intramolecular disulfide isomerization by interacting with downstream cysteine residues. Solid lines: interactions found experimentally, dashed lines: predicted interactions. Colors as in (A).

Table 1. Complete statistical reporting for experiments in Figures 2, 4 and 6.
Figure 2: ANOVA summary and multiple comparisons (Sidak’s multiple comparison test)

Figure 2D: F = 21.97; P value <0.0001; R squared = 0.8888
  Pair                    Adjusted P value
  Wt vs K487E             <0.0001
  Wt vs E47K              0.0002
  Wt vs E91K              0.0004
  K487E vs E47K K487E     0.0147

Figure 2E: F = 36.33; P value <0.0001; R squared = 0.9296
  Pair                    Adjusted P value
  Wt vs K487E             <0.0001
  Wt vs E47K              0.0301
  Wt vs E47K              0.0695
  Wt vs E91K              <0.0001
  K487E vs E47K K492E     <0.0001

Figure 2F: F = 52.75; P value <0.0001; R squared = 0.9361
  Pair                    Adjusted P value
  Wt vs K487E             <0.0001
  Wt vs E47K              <0.0001
  Wt vs E47K              <0.0001
  Wt vs E91K              <0.0001
  K487E vs E47K K492E     0.0004

Figure 4D
  Comparison              P value    Method
  Wt vs C28A              0.0175     Paired t test
  Wt vs C54A              0.0103     Paired t test
  Wt vs C74A              0.6213     Paired t test
  Wt vs C27A C54A         0.0123     Paired t test
  Wt vs C28A V74A         0.0006     Paired t test
  Wt vs C54-74A           0.0222     Paired t test
  Wt vs C28A C54-74A      0.0045     Paired t test

Figure 6
  Panel                   P value    Method
  6A                      0.012      Unpaired t test
  6B                      <0.0001    Unpaired t test
  6C                      0.0485     Unpaired t test
  6D                      <0.0001    Unpaired t test

Materials and methods

Plasmids, antibodies, reagents and viruses

The full-length molecular clone of HIV-1LAI (pLAI) was the source of wild-type and mutant viruses (Peden et al., 1991). The QuikChange Site-Directed Mutagenesis kit (Stratagene) was used to introduce mutations into Env in plasmid pRS1 as described before; the entire Env gene was verified by DNA sequencing (Sanders et al., 2004). Mutant Env genes from pRS1 were cloned back into pLAI as SalI-BamHI fragments. For transient transfection of gp120/160 we used the previously described pMQ plasmid (Snapp et al., 2017).
C-terminal truncations were generated by PCR of wt gp120 and Gibson assembled back into XbaI/XhoI digested pMQ. The thrombin-cleavable construct was designed based on stable V1V2 loop deletion number 2 (Bontjer et al., 2009) and generated from gp120 C119-205A using Gibson assembly (Gibson et al., 2009). All point mutations were introduced using QuikChange Site-Directed mutagenesis as above.

For immunoprecipitation we used the previously described polyclonal rabbit anti-gp160 antibody 40336, which recognizes all forms of gp120 (Land et al., 2003), polyclonal antibody HT3 (NIH531), which was obtained from the NIH AIDS reagent program, and α-HA tag antibody “MrBrown” produced by us (Schildknegt et al., 2019).

Although we studied gp160 of the LAI isolate, we followed the canonical HXB2 residue numbering (GenBank: K03455.1), which relates to the LAI numbering as follows: because of an insertion of five residues in the V1 loop of LAI gp160, all cysteine residues beyond this loop have a number that is 5 residues lower in HXB2 than in LAI: until Cys131, numbering is identical, but Cys162 in LAI becomes 157 in HXB2, etc.

Thrombin was purchased as a lyophilized powder from Sigma Aldrich (T-6634) and stored in thrombin-storage buffer [50 mM Sodium Citrate pH 6.5, 200 mM NaCl, 0.1% BSA (w/v), 50% glycerol (w/v)].

Cells and transfections

The SupT1 cell line was cultured in Advanced RPMI 1640 medium (Gibco), supplemented with 1% fetal calf serum (v/v, FCS), 2 mM L-glutamine (Gibco), 15 units/ml penicillin and 15 µg/ml streptomycin.
The TZM-bl reporter cell line, obtained from NIH AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, NIH (John C. Kappes, Xiaoyun Wu, and Tranzyme, Inc., Durham, NC), the HEK293T cell line, and the C33A cell line were cultured in Dulbecco’s modified Eagle medium (Gibco) containing 10% FCS, 100 units/ml penicillin and 100 µg/ml streptomycin. HeLa cells (ATCC) were maintained in MEM containing 10% FCS, nonessential amino acids, glutamax and penicillin/streptomycin (100 U/ml). Twenty-four hours before pulse labeling, HeLa cells were transfected with pMQ gp120/gp160 or HA constructs using polyethylenimine (Polysciences) as described before (Hoelen et al., 2010).

Virus production

Virus stocks were produced by transfecting HEK293T cells with wild-type or mutant pLAI constructs using the Lipofectamine 2000 Transfection Reagent (Invitrogen) per manufacturer’s protocol. Virus stocks were produced on C33A cells by calcium-phosphate precipitation. The virus-containing culture supernatants were harvested 2 days post-transfection and stored at -80°C, and the virus concentrations were quantitated by CA-p24 ELISA as described before (Moore and Jarrett, 1988). These values were used to normalize the amount of virus used in subsequent infection experiments.

Single Cycle Infection

The TZM-bl reporter cell line stably expresses high levels of CD4 and HIV-1 coreceptors CCR5 and CXCR4 and contains the luciferase and β-galactosidase genes under the control of the HIV-1 long-terminal-repeat (LTR) promoter (Wei et al., 2002).
Single-cycle infectivity assays were performed as described before (Bontjer et al., 2009; Bontjer et al., 2010). In brief, one day prior to infection, 17 × 10^6 TZM-bl cells per well were plated on a 96-well plate in DMEM containing 10% FCS, 100 units/ml penicillin and 100 µg/ml streptomycin and incubated at 37°C with 5% CO2. A fixed amount of LAI virus (500 pg of CA-p24) or a fixed amount of JR-FL or LAI pseudo-virus (1,000 pg of CA-p24) was added to the cells (70-80% confluency) in the presence of 400 nM saquinavir (Roche) to block secondary rounds of infection and 40 µg/ml DEAE in a total volume of 200 µl. Two days post-infection, medium was removed, cells were washed with phosphate-buffered saline (50 mM sodium phosphate buffer, pH 7.0, 150 mM NaCl) and lysed in Reporter Lysis buffer (Promega). Luciferase activity was measured using a Luciferase Assay kit (Promega) and a Glomax luminometer (Turner BioSystems) per manufacturer’s instructions. Uninfected cells were used to correct for background luciferase activity. All infections were performed in quadruplicate.

Folding assay

HeLa cells transfected with wild-type or mutant gp160/gp120 constructs were subjected to pulse-chase analysis as described before (McCaul et al., 2019; Snapp et al., 2017). In short, cells were starved for cysteine and methionine for 15-30 min and pulse labeled for 5 min with 55 µCi per 35-mm dish of Easytag express 35S protein labeling mix (Perkin Elmer). Where indicated (+DTT), cells were incubated with 5 mM DTT for 5 min before and during the pulse.
The pulse was stopped, and the chase started, by the first of 2 washes with chase medium containing an excess of unlabeled cysteine and methionine. At the end of each chase, medium was collected, cells were cooled on ice, and further disulfide-bond formation and isomerization were blocked with 20 mM iodoacetamide. Cells were lysed and detergent lysates and medium samples were subjected to overnight immunoprecipitation at 4°C with polyclonal antibody 40336 against gp160.

Deglycosylation, SDS-PAGE, and autoradiography

Where appropriate, to identify gp160 folding intermediates, glycans were removed from lysate-derived gp120 or gp160 with Endoglycosidase H (Roche) treatment of the immunoprecipitates as described before (Land et al., 2003). Samples were subjected to non-reducing and reducing (25 mM DTT) SDS-PAGE. Gels were dried and exposed to super-resolution phosphor screens (FujiFilm) or Kodak Biomax MR films (Carestream). Phosphor screens were scanned with a Typhoon FLA-7000 scanner (GE Healthcare Life Sciences). Quantifications were performed with ImageQuantTL software (GE Healthcare Life Sciences).

mPEG treatment

HEK 293T cells transfected with wild-type or mutant 111X were subjected to radioactive labeling as described above. At the end of the labeling, cells were transferred to ice and incubated in Dulbecco’s PBS without Ca2+ and Mg2+ containing 20 mM N-ethylmaleimide (NEM) and 5 mM EDTA.
Cells then were subjected to a modified "double-alkylation variant" mPEG treatment as described by Appenzeller-Herzog and Ellgaard (Appenzeller-Herzog and Ellgaard, 2008). In short, cells were homogenized by passage through a 25-G needle and proteins were denatured with 2% SDS for 1 h at 95°C. Samples then were alkylated again with 20 mM NEM before immunoprecipitation with anti-HA tag antibody MrBrown for 2 h at 4°C. After immunoprecipitation, samples were denatured and reduced with 25 mM TCEP, followed by incubation with 15 mM mPEG-mal 5000 for 1 h at room temperature. Samples were immunoprecipitated again via the HA-tag, analyzed by 4-15% non-reducing gradient SDS-PAGE (BioRad), and processed as before.

Thrombin cleavage

After HeLa cells transfected with various thrombin-cleavable constructs were pulse-labeled as described above, detergent lysates were immunoprecipitated with antibody HT3 for 1 h at 4°C with rotation. Immunoprecipitates were washed, resuspended in 15 µl thrombin cleavage buffer (20 mM Tris-HCl, pH 8.4, 150 mM NaCl, 2.5 mM CaCl₂) + 0.2% SDS, and denatured for 5 minutes at 95°C. SDS was quenched by addition of 10 µl cleavage buffer + 2% Tx100. Thrombin (0.75 U) in 5 µl cleavage buffer then was added to samples, which were incubated for exactly 15 minutes. For mock-digested samples, an equivalent volume of thrombin storage buffer was added instead. Digestion was stopped by the addition of hot (95°C) 5X sample buffer and immediate transfer to a 95°C heat block for 5 minutes.
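The volumes in the thrombin digestion above imply the final detergent and enzyme concentrations during cleavage. The arithmetic can be sketched as below, assuming the three additions (15 µl, 10 µl and 5 µl) simply sum to the reaction volume and that the bead volume of the immunoprecipitate is negligible. Both are assumptions, and the function name is hypothetical.

```python
def final_percent(stock_percent, added_ul, total_ul):
    """Final concentration (%) of a component after dilution into total_ul."""
    return stock_percent * added_ul / total_ul


# Volumes taken from the protocol text.
denature_ul = 15.0  # cleavage buffer + 0.2% SDS
quench_ul = 10.0    # cleavage buffer + 2% Tx100
enzyme_ul = 5.0     # thrombin (0.75 U) in cleavage buffer

# Assumption: the reaction volume is the plain sum of the three additions.
total_ul = denature_ul + quench_ul + enzyme_ul

sds_final = final_percent(0.2, denature_ul, total_ul)
tx100_final = final_percent(2.0, quench_ul, total_ul)
thrombin_u_per_ul = 0.75 / total_ul

print(f"reaction volume: {total_ul:.0f} ul")
print(f"SDS: {sds_final:.2f}%  Tx100: {tx100_final:.2f}%")
print(f"thrombin: {thrombin_u_per_ul:.3f} U/ul")
```

Under these assumptions the Tx100 ends up in several-fold excess over SDS in the final reaction, which is consistent with its stated role of quenching the SDS before thrombin is added.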
Samples then were subjected to non-reducing or reducing (25 mM DTT) 15-20% discontinuous-gradient SDS-PAGE and processed as before.

Statistical Reporting

Statistics for each experiment were calculated using Prism 7 (GraphPad). For experiments in Figures 2D-F, differences were assessed using a one-way ANOVA with follow-up testing to analyze differences between specific pairs, with p values corrected for multiple comparisons. For experiments in Figures 3F and 4D, differences were assessed using paired t-tests between wild-type and mutants. For experiments in Figure 6A-D, differences were assessed using unpaired t-tests. A complete list of all pairs examined, statistical methods and resulting p values can be found in Table 1.

Figure S1. N-terminal truncations of gp120 retain their signal peptides. Pulse-chase experiments were performed as in Figure 1B except that HeLa cells were transfected with the indicated gp120 truncations. Detergent lysates were immunoprecipitated either with polyclonal serum 40336 (a) or a polyclonal serum that recognizes the signal peptide (b).

Figure S2. Inner domain β-sandwich mutants affect gp120 folding and HIV infectivity. A) Pulse-chase experiments were performed as in Figure 2C except that HeLa cells were transfected with the indicated mutants. B) Uncropped gels from Figure 2C. IT: folding intermediates, NT: native gp120, Ru: reduced signal-peptide-uncleaved gp120, Rc: reduced signal-peptide-cleaved gp120. C) Quantifications performed as in Figure 2D. D) Quantifications performed as in Figure 2E.
E) Infection assays were performed as in Figure 2F. Error bars: SD.

Figure S3. Synchronized folding of gp120 Wt and C28A. A) Pulse-chase experiment was performed as in Figure 1B except that cells expressing Wt or C28A gp120 were treated from 5 minutes before the pulse with 2 mM puromycin and chased in the presence of 500 mM cycloheximide. Samples were analyzed by reducing 7.5% SDS-PAGE after immunoprecipitation. B) Lane profiles from A. Ru: reduced signal-peptide-uncleaved gp120, Rc: reduced signal-peptide-cleaved gp120.

Figure S4. Removal of signal-peptide cysteine C28 aggravates folding phenotype of 54-74 disulfide-bond mutants. Pulse-chase experiments were performed as in Figure 1B except that HeLa cells expressed C28 and disulfide-bond 54-74 mutants.

References

Anjos, S., Nguyen, A., Ounissi-Benkalha, H., Tessier, M.C., and Polychronakos, C. (2002). A common autoimmunity predisposing signal peptide variant of the cytotoxic T-lymphocyte antigen 4 results in inefficient glycosylation of the susceptibility allele. J Biol Chem 277, 46478-46486.
Appenzeller-Herzog, C., and Ellgaard, L. (2008). In vivo reduction-oxidation state of protein disulfide isomerase: the two active sites independently occur in the reduced and oxidized forms. Antioxid Redox Signal 10, 55-64.
Blobel, G., and Dobberstein, B. (1975). Transfer of proteins across membranes. I. Presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. J Cell Biol 67, 835-851.
Bonfanti, R., Colombo, C., Nocerino, V., Massa, O., Lampasona, V., Iafusco, D., Viscardi, M., Chiumello, G., Meschi, F., and Barbetti, F. (2009). Insulin gene mutations as cause of diabetes in children negative for five type 1 diabetes autoantibodies. Diabetes Care 32, 123-125.
Bontjer, I., Land, A., Eggink, D., Verkade, E., Tuin, K., Baldwin, C., Pollakis, G., Paxton, W.A., Braakman, I., Berkhout, B., et al. (2009). Optimization of human immunodeficiency virus type 1 envelope glycoproteins with V1/V2 deleted, using virus evolution. J Virol 83, 368-383.
Borochov, H., and Shinitzky, M. (1976). Vertical displacement of membrane proteins mediated by changes in microviscosity. Proc Natl Acad Sci U S A 73, 4526-4530.
Braakman, I., Hoover-Litty, H., Wagner, K.R., and Helenius, A. (1991). Folding of influenza hemagglutinin in the endoplasmic reticulum. J Cell Biol 114, 401-411.
Brockwell, D.J., Paci, E., Zinober, R.C., Beddard, G.S., Olmsted, P.D., Smith, D.A., Perham, R.N., and Radford, S.E. (2003). Pulling geometry defines the mechanical resistance of a beta-sheet protein. Nat Struct Biol 10, 731-737.
Carrion-Vazquez, M., Li, H., Lu, H., Marszalek, P.E., Oberhauser, A.F., and Fernandez, J.M. (2003). The mechanical stability of ubiquitin is linkage dependent. Nat Struct Biol 10, 738-743.
Chen, J., Lee, K.H., Steinhauer, D.A., Stevens, D.J., Skehel, J.J., and Wiley, D.C. (1998). Structure of the hemagglutinin precursor cleavage site, a determinant of influenza pathogenicity and the origin of the labile conformation. Cell 95, 409-417.
Chen, Y., Radford, S.E., and Brockwell, D.J. (2015). Force-induced remodelling of proteins and their complexes. Curr Opin Struct Biol 30, 89-99.
Creighton, T.E., Bagley, C.J., Cooper, L., Darby, N.J., Freedman, R.B., Kemmink, J., and Sheikh, A. (1993). On the biosynthesis of bovine pancreatic trypsin inhibitor (BPTI). Structure, processing, folding and disulphide bond formation of the precursor in vitro and in microsomes. J Mol Biol 232, 1176-1196.
Daniels, R., Kurowski, B., Johnson, A.E., and Hebert, D.N. (2003). N-linked glycans direct the cotranslational folding pathway of influenza hemagglutinin. Mol Cell 11, 79-90.
Danielson, M.A., Biemann, H.P., Koshland, D.E., Jr., and Falke, J.J. (1994). Attractant- and disulfide-induced conformational changes in the ligand binding domain of the chemotaxis aspartate receptor: a 19F NMR study. Biochemistry 33, 6100-6109.
Darby, N.J., Morin, P.E., Talbo, G., and Creighton, T.E. (1995). Refolding of bovine pancreatic trypsin inhibitor via non-native disulphide intermediates. J Mol Biol 249, 463-477.
Decroly, E., Vandenbranden, M., Ruysschaert, J.M., Cogniaux, J., Jacob, G.S., Howard, S.C., Marshall, G., Kompelli, A., Basak, A., Jean, F., et al. (1994). The convertases furin and PC1 can both cleave the human immunodeficiency virus (HIV)-1 envelope glycoprotein gp160 into gp120 (HIV-1 SU) and gp41 (HIV-I TM). J Biol Chem 269, 12240-12247.
del Rio, A., Perez-Jimenez, R., Liu, R., Roca-Cusachs, P., Fernandez, J.M., and Sheetz, M.P. (2009).
Stretching single talin rod molecules activates vinculin binding. Science 323, 638-641.
Dill, K.A., and Alonso, D.O.V. (1988). Conformational Entropy and Protein Stability (Berlin, Heidelberg: Springer Berlin Heidelberg).
Earl, P.L., Doms, R.W., and Moss, B. (1990). Oligomeric structure of the human immunodeficiency virus type 1 envelope glycoprotein. Proc Natl Acad Sci U S A 87, 648-652.
Earl, P.L., Moss, B., and Doms, R.W. (1991). Folding, interaction with GRP78-BiP, assembly, and transport of the human immunodeficiency virus type 1 envelope protein. J Virol 65, 2047-2055.
Eckels, E.C., Haldar, S., Tapia-Rojo, R., Rivas-Pardo, J.A., and Fernandez, J.M. (2019). The Mechanical Power of Titin Folding. Cell Rep 27, 1836-1847 e1834.
Ellgaard, L., McCaul, N., Chatsisvili, A., and Braakman, I. (2016). Co- and Post-Translational Protein Folding in the ER. Traffic 17, 615-638.
Garces, F., Lee, J.H., de Val, N., de la Pena, A.T., Kong, L., Puchades, C., Hua, Y., Stanfield, R.L., Burton, D.R., Moore, J.P., et al. (2015). Affinity Maturation of a Potent Family of HIV Antibodies Is Primarily Focused on Accommodating or Avoiding Glycans. Immunity 43, 1053-1063.
Garcia-Maroto, F., Castagnaro, A., Sanchez de la Hoz, P., Marana, C., Carbonero, P., and Garcia-Olmedo, F. (1991). Extreme variations in the ratios of non-synonymous to synonymous nucleotide substitution rates in signal peptide evolution. FEBS Lett 287, 67-70.
Gibson, D.G., Young, L., Chuang, R.Y., Venter, J.C., Hutchison, C.A., 3rd, and Smith, H.O. (2009).
Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6, 343-345.
Gogala, M., Becker, T., Beatrix, B., Armache, J.P., Barrio-Garcia, C., Berninghausen, O., and Beckmann, R. (2014). Structures of the Sec61 complex engaged in nascent peptide translocation or membrane insertion. Nature 506, 107-110.
Gordon, W.R., Zimmerman, B., He, L., Miles, L.J., Huang, J., Tiyanont, K., McArthur, D.G., Aster, J.C., Perrimon, N., Loparo, J.J., et al. (2015). Mechanical Allostery: Evidence for a Force Requirement in the Proteolytic Activation of Notch. Dev Cell 33, 729-736.
Gorlich, D., Hartmann, E., Prehn, S., and Rapoport, T.A. (1992). A protein of the endoplasmic reticulum involved early in polypeptide translocation. Nature 357, 47-52.
Görlich, D., Prehn, S., Hartmann, E., Kalies, K.-U., and Rapoport, T.A. (1992). A mammalian homolog of SEC61p and SECYp is associated with ribosomes and nascent polypeptides during translocation. Cell 71, 489-503.
Hallenberger, S., Bosch, V., Angliker, H., Shaw, E., Klenk, H.D., and Garten, W. (1992). Inhibition of furin-mediated cleavage activation of HIV-1 glycoprotein gp160. Nature 360, 358-361.
Hegde, R.S., and Bernstein, H.D. (2006). The surprising complexity of signal sequences. Trends Biochem Sci 31, 563-571.
Hertadi, R., Gruswitz, F., Silver, L., Koide, A., Koide, S., Arakawa, H., and Ikai, A. (2003). Unfolding mechanics of multiple OspA substructures investigated with single molecule force spectroscopy. J Mol Biol 333, 993-1002.
Hoelen, H., Kleizen, B., Schmidt, A., Richardson, J., Charitou, P., Thomas, P.J., and Braakman, I. (2010). The primary folding defect and rescue of DeltaF508 CFTR emerge during translation of the mutant domain. PLoS One 5, e15458.
Horwitz, M.S., Scharff, M.D., and Maizel, J.V., Jr. (1969). Synthesis and assembly of adenovirus 2. I. Polypeptide synthesis, assembly of capsomeres, and morphogenesis of the virion. Virology 39, 682-694.
Ingolia, N.T., Hussmann, J.A., and Weissman, J.S. (2019). Ribosome Profiling: Global Views of Translation. Cold Spring Harb Perspect Biol 11.
Ingolia, N.T., Lareau, L.F., and Weissman, J.S. (2011). Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789-802.
Jackson, R.C., and Blobel, G. (1977). Post-translational cleavage of presecretory proteins with an extract of rough microsomes from dog pancreas containing signal peptidase activity. Proc Natl Acad Sci U S A 74, 5598-5602.
Jansen, G., Maattanen, P., Denisov, A.Y., Scarffe, L., Schade, B., Balghi, H., Dejgaard, K., Chen, L.Y., Muller, W.J., Gehring, K., et al. (2012). An interaction map of endoplasmic reticulum chaperones and foldases. Mol Cell Proteomics 11, 710-723.
Julien, J.P., Cupo, A., Sok, D., Stanfield, R.L., Lyumkis, D., Deller, M.C., Klasse, P.J., Burton, D.R., Sanders, R.W., Moore, J.P., et al. (2013). Crystal structure of a soluble cleaved HIV-1 envelope trimer. Science 342, 1477-1483.
Kanapin, A., Batalov, S., Davis, M.J., Gough, J., Grimmond, S., Kawaji, H., Magrane, M., Matsuda, H., Schonbach, C., Teasdale, R.D., et al. (2003). Mouse proteome analysis. Genome Res 13, 1335-1344.
Knopf, P.M., and Lamfrom, H. (1965). Changes in the Ribosome Distribution during Incubation of Rabbit Reticulocytes in Vitro. Biochim Biophys Acta 95, 398-407.
Krishna, M.M., and Englander, S.W. (2005). The N-terminal to C-terminal motif in protein folding and function. Proc Natl Acad Sci U S A 102, 1053-1058.
Kwong, P.D., Wyatt, R., Robinson, J., Sweet, R.W., Sodroski, J., and Hendrickson, W.A. (1998). Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody. Nature 393, 648-659.
Land, A., and Braakman, I. (2001). Folding of the human immunodeficiency virus type 1 envelope glycoprotein in the endoplasmic reticulum. Biochimie 83, 783-790.
Land, A., Zonneveld, D., and Braakman, I. (2003). Folding of HIV-1 envelope glycoprotein involves extensive isomerization of disulfide bonds and conformation-dependent leader peptide cleavage. FASEB J 17, 1058-1067.
Li, Y., Bergeron, J.J., Luo, L., Ou, W.J., Thomas, D.Y., and Kang, C.Y. (1996). Effects of inefficient cleavage of the signal sequence of HIV-1 gp 120 on its association with calnexin, folding, and intracellular transport. Proc Natl Acad Sci U S A 93, 9606-9611.
Li, Y., Luo, L., Thomas, D.Y., and Kang, C.Y. (1994). Control of expression, glycosylation, and secretion of HIV-1 gp120 by homologous and heterologous signal sequences.
Virology 204, 266-278.
Li, Y., Luo, L., Thomas, D.Y., and Kang, C.Y. (2000). The HIV-1 Env protein signal sequence retards its cleavage and down-regulates the glycoprotein folding. Virology 272, 417-428.
Lingappa, V.R., Devillers-Thiery, A., and Blobel, G. (1977). Nascent prehormones are intermediates in the biosynthesis of authentic bovine pituitary growth hormone and prolactin. Proc Natl Acad Sci U S A 74, 2432-2436.
Liu, S., Cheng, W., Fowle Grider, R., Shen, G., and Li, W. (2014). Structures of an intramembrane vitamin K epoxide reductase homolog reveal control mechanisms for electron transfer. Nat Commun 5, 3110.
Lyumkis, D., Julien, J.P., de Val, N., Cupo, A., Potter, C.S., Klasse, P.J., Burton, D.R., Sanders, R.W., Moore, J.P., Carragher, B., et al. (2013). Cryo-EM structure of a fully glycosylated soluble cleaved HIV-1 envelope trimer. Science 342, 1484-1490.
Matczuk, A.K., Kunec, D., and Veit, M. (2013). Co-translational processing of glycoprotein 3 from equine arteritis virus: N-glycosylation adjacent to the signal peptide prevents cleavage. J Biol Chem 288, 35396-35405.
McCaul, N., Yeoh, H.Y., van Zadelhoff, G., Lodder, N., Kleizen, B., and Braakman, I. (2019). Analysis of Protein Folding, Transport, and Degradation in Living Cells by Radioactive Pulse Chase. J Vis Exp.
Moore, J.P., and Jarrett, R.F. (1988). Sensitive ELISA for the gp120 and gp160 surface glycoproteins of HIV-1. AIDS Res Hum Retroviruses 4, 369-379.
Morrison, G.M., Semple, C.A., Kilanowski, F.M., Hill, R.E., and Dorin, J.R. (2003).
Signal sequence conservation and mature peptide divergence within subgroups of the murine beta-defensin gene family. Mol Biol Evol 20, 460-470.
Mowbray, S.L., and Koshland, D.E., Jr. (1987). Additive and independent responses in a single receptor: aspartate and maltose stimuli on the tar protein. Cell 50, 171-180.
Peden, K., Emerman, M., and Montagnier, L. (1991). Changes in growth properties on passage in tissue culture of viruses derived from infectious molecular clones of HIV-1LAI, HIV-1MAL, and HIV-1ELI. Virology 185, 661-672.
Pfeiffer, T., Pisch, T., Devitt, G., Holtkotte, D., and Bosch, V. (2006). Effects of signal peptide exchange on HIV-1 glycoprotein expression and viral infectivity in mammalian cells. FEBS Lett 580, 3775-3778.
Piersma, D., Berns, E.M., Verhoef-Post, M., Uitterlinden, A.G., Braakman, I., Pols, H.A., and Themmen, A.P. (2006). A common polymorphism renders the luteinizing hormone receptor protein more active by improving signal peptide function and predicts adverse outcome in breast cancer patients. J Clin Endocrinol Metab 91, 1470-1476.
Rehm, A., Stern, P., Ploegh, H.L., and Tortorella, D. (2001). Signal peptide cleavage of a type I membrane protein, HCMV US11, is dependent on its membrane anchor. EMBO J 20, 1573-1582.
Rognoni, L., Stigler, J., Pelz, B., Ylanne, J., and Rief, M. (2012). Dynamic force sensing of filamin revealed in single-molecule experiments. Proc Natl Acad Sci U S A 109, 19679-19684.
Rutkowski, D.T., Ott, C.M., Polansky, J.R., and Lingappa, V.R. (2003).
Signal sequences initiate the pathway of maturation in the endoplasmic reticulum lumen. J Biol Chem 278, 30365-30372.
Sanders, R.W., Dankers, M.M., Busser, E., Caffrey, M., Moore, J.P., and Berkhout, B. (2004). Evolution of the HIV-1 envelope glycoproteins with a disulfide bond between gp120 and gp41. Retrovirology 1, 3.
Sanders, R.W., Hsu, S.T., van Anken, E., Liscaljet, I.M., Dankers, M., Bontjer, I., Land, A., Braakman, I., Bonvin, A.M., and Berkhout, B. (2008). Evolution rescues folding of human immunodeficiency virus-1 envelope glycoprotein GP120 lacking a conserved disulfide bond. Mol Biol Cell 19, 4707-4716.
Sauter, N.K., Hanson, J.E., Glick, G.D., Brown, J.H., Crowther, R.L., Park, S.J., Skehel, J.J., and Wiley, D.C. (1992). Binding of influenza virus hemagglutinin to analogs of its cell-surface receptor, sialic acid: analysis by proton nuclear magnetic resonance spectroscopy and X-ray crystallography. Biochemistry 31, 9609-9621.
Schildknegt, D., Lodder, N., Pandey, A., Egmond, M., Pena, F., Braakman, I., and van der Sluijs, P. (2019). Characterization of CNPY5 and its family members. Protein Sci 28, 1276-1289.
Schulman, S., Wang, B., Li, W., and Rapoport, T.A. (2010). Vitamin K epoxide reductase prefers ER membrane-anchored thioredoxin-like redox partners. Proc Natl Acad Sci U S A 107, 15027-15032.
Snapp, E.L., McCaul, N., Quandte, M., Cabartova, Z., Bontjer, I., Källgren, C., Nilsson, I., Land, A., von Heijne, G., Sanders, R.W., et al. (2017).
Structure and topology around the cleavage site regulate post-translational cleavage of the HIV-1 gp160 signal peptide. eLife 6, e26067.
Soler, M.A., and Faisca, P.F. (2012). How difficult is it to fold a knotted protein? In silico insights from surface-tethered folding experiments. PLoS One 7, e52343.
Srinivasan, N., Sowdhamini, R., Ramakrishnan, C., and Balaram, P. (1990). Conformations of disulfide bridges in proteins. Int J Pept Protein Res 36, 147-155.
Sun, X., Li, Q., Wu, Y., Wang, M., Liu, Y., Qi, J., Vavricka, C.J., and Gao, G.F. (2014). Structure of influenza virus N7: the last piece of the neuraminidase "jigsaw" puzzle. J Virol 88, 9197-9207.
Tamura, T., Cormier, J.H., and Hebert, D.N. (2011). Characterization of early EDEM1 protein maturation events and their functional implications. J Biol Chem 286, 24906-24915.
Tatu, U., Braakman, I., and Helenius, A. (1993). Membrane glycoprotein folding, oligomerization and intracellular transport: effects of dithiothreitol in living cells. EMBO J 12, 2151-2157.
Tatu, U., Hammond, C., and Helenius, A. (1995). Folding and oligomerization of influenza hemagglutinin in the ER and the intermediate compartment. EMBO J 14, 1340-1348.
van Anken, E., Sanders, R.W., Liscaljet, I.M., Land, A., Bontjer, I., Tillemans, S., Nabatov, A.A., Paxton, W.A., Berkhout, B., and Braakman, I. (2008). Only five of 10 strictly conserved disulfide bonds are essential for folding and eight for function of the HIV-1 envelope glycoprotein. Mol Biol Cell 19, 4298-4309.
Van Damme, N., Goff, D., Katsura, C., Jorgenson, R.L., Mitchell, R., Johnson, M.C., Stephens, E.B., and Guatelli, J. (2008). The interferon-induced protein BST-2 restricts HIV-1 release and is downregulated from the cell surface by the viral Vpu protein. Cell Host Microbe 3, 245-252.
Veitia, R.A., and Caburet, S. (2009). Extensive sequence turnover of the signal peptides of members of the GDF/BMP family: exploring their evolutionary landscape. Biol Direct 4, 22.
von Heijne, G. (1983). Patterns of amino acids near signal-sequence cleavage sites. Eur J Biochem 133, 17-21.
von Heijne, G. (1984). Analysis of the distribution of charged residues in the N-terminal region of signal sequences: implications for protein export in prokaryotic and eukaryotic cells. EMBO J 3, 2315-2318.
von Heijne, G. (1985). Signal sequences. The limits of variation. J Mol Biol 184, 99-105.
Walter, P. (1981). Translocation of proteins across the endoplasmic reticulum III. Signal recognition protein (SRP) causes signal sequence-dependent and site-specific arrest of chain elongation that is released by microsomal membranes. The Journal of Cell Biology 91, 557-561.
Weissman, J.S., and Kim, P.S. (1992). The pro region of BPTI facilitates folding. Cell 71, 841-851.
Weissman, J.S., and Kim, P.S. (1993). Efficient catalysis of disulphide bond rearrangements by protein disulphide isomerase. Nature 365, 185-188.
Weissman, J.S., and Kim, P.S. (1995). A kinetic explanation for the rearrangement pathway of BPTI folding. Nat Struct Biol 2, 1123-1130.
Williams, E.J., Pal, C., and Hurst, L.D. (2000). The molecular evolution of signal peptides. Gene 253, 313-322.
Wyatt, R., and Sodroski, J. (1998). The HIV-1 envelope glycoproteins: fusogens, antigens, and immunogens. Science 280, 1884-1888.
Yang, X., Mahony, E., Holm, G.H., Kassa, A., and Sodroski, J. (2003). Role of the gp120 inner domain β-sandwich in the interaction between the human immunodeficiency virus envelope glycoprotein subunits. Virology 313, 117-125.
Yao, M., Goult, B.T., Chen, H., Cong, P., Sheetz, M.P., and Yan, J. (2014). Mechanical activation of vinculin binding to talin locks talin in an unfolded conformation. Sci Rep 4, 4610.
Zhang, X., Halvorsen, K., Zhang, C.Z., Wong, W.P., and Springer, T.A. (2009). Mechanoenzymatic cleavage of the ultralarge vascular protein von Willebrand factor. Science 324, 1330-1334.
Zhou, H.X. (2008). Protein folding in confined and crowded environments. Arch Biochem Biophys 469, 76-82.
Zhou, H.X., and Dill, K.A. (2001). Stabilization of proteins in confined spaces. Biochemistry 40, 11289-11293.
Zschenker, O., Jung, N., Rethmeier, J., Trautwein, S., Hertel, S., Zeigler, M., and Ameis, D. (2001). Characterization of lysosomal acid lipase mutations in the signal peptide and mature polypeptide region causing Wolman disease. J Lipid Res 42, 1033-1040.
[Figure 1. Image not reproduced. Recoverable labels: schematic of gp120 and gp41 with the signal peptide, ER membrane, ER lumen and cytosol; constant regions C1-C5 and variable regions V1-V5; numbered cysteine positions; truncation sites 485 and 494. Panel B: pulse-chase of Wt, 494X and 485X at chase times 0', 15', 30', 1 h, 2 h and 4 h; cells non-reducing (NR), cells reducing (R) and medium; bands IT, NT, Ru, Rc.]

[Figure 2. Image not reproduced. Recoverable labels: panels A-G; disulfide-bond positions; residues E47, E91, N92 and K487; pulse-chase (0' to 4 h; cells NR, cells R, medium) of Wt, E47K, K487E, E91K, E47K K487E and A30V; bands Agg, IT, NT, Ru, Rc; quantification panels Secretion, Signal-peptide cleavage and Infectivity.]

[Figure 3. Image not reproduced. Recoverable labels: panels A-F; chase (0' to 120'; NR and R) of Wt and C28A; chase (0' to 60') of C28A, C54-74A and C28A C54-74A; schematic of an HA-tagged construct with signal anchor (ER membrane, ER lumen, cytosol; C1, positions 54 and 74); alkylation scheme (1. NEM, TCEP, 2. mPEG, SDS-PAGE); pulse with or without DTT for Wt, C28A, C54A, C74A and C28A C54-74A.]

[Figure 4. Image not reproduced. Recoverable labels: panels A-D; thrombin cleavage schematic (N-C ~95 kDa, N ~15 kDa, C ~75 kDa); samples with or without thrombin, NR and R, for Wt, C28A, C54A, C74A, C28A C54A, C28A C74A, C54-74A and C28A C54-74A; bands NC, C', N'; size markers 100, 70 and 15.]

[Figure 5. Image not reproduced. Recoverable labels: panel A: chase (0', 15', 30', 1 h; NR and R) of gp120 Wt and gp120Th, bands IT, NT, Ru, Rc; panel B: Wt, C28A, C54A, C74A, C28A C54A, C28A C74A, C54-74A and C28A C54-74A at 0', 15' and 30', bands IT and NT.]

[Figure 6. Image not reproduced. Panels A-D; no further labels recoverable.]

[Figure 7. Image not reproduced. Recoverable labels: model with stages I-IV across cytosol and ER lumen; "Signal peptide sustains disulfide isomerization and restrains N-terminus"; cysteines 28, 54 and 74; inner domain (ID), outer domain (OD) and variable regions (V).]
10_1101-2020_11_24_390039 ---- O'Keefe et al. JOCES 2020 257758 resubmission

Ipomoeassin-F inhibits the in vitro biogenesis of the SARS-CoV-2 spike protein and its host cell membrane receptor

Sarah O’Keefe1,4, Peristera Roboti1, Kwabena B. Duah2, Guanghui Zong3, Hayden Schneider2, Wei Q. Shi2 and Stephen High1,4

1School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, M13 9PT, United Kingdom
2Department of Chemistry, Ball State University, Muncie, Indiana 47306, USA
3Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, USA
4Lead Contacts for correspondence: sarah.okeefe@manchester.ac.uk; stephen.high@manchester.ac.uk

Running Title: Ipom-F as a potential antiviral agent

Keywords: Cell-free translation, Endoplasmic reticulum (ER), ER membrane complex (EMC), Sec61 translocon, viral protein biogenesis.

bioRxiv preprint doi: https://doi.org/10.1101/2020.11.24.390039; this version posted January 5, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license (http://creativecommons.org/licenses/by-nd/4.0/).

Abstract

In order to produce proteins essential for their propagation, many pathogenic human viruses, including SARS-CoV-2, the causative agent of COVID-19 respiratory disease, commandeer host biosynthetic machineries and mechanisms.
Three major structural proteins, the spike, envelope and membrane proteins, are amongst several SARS-CoV-2 components synthesised at the endoplasmic reticulum (ER) of infected human cells prior to the assembly of new viral particles. Hence, the inhibition of membrane protein synthesis at the ER is an attractive strategy for reducing the pathogenicity of SARS-CoV-2 and other obligate viral pathogens. Using an in vitro system, we demonstrate that the small molecule inhibitor ipomoeassin F (Ipom-F) potently blocks the Sec61-mediated ER membrane translocation/insertion of three therapeutic protein targets for SARS-CoV-2 infection: the viral spike and ORF8 proteins together with angiotensin-converting enzyme 2, the host cell plasma membrane receptor. Our findings highlight the potential for using ER protein translocation inhibitors such as Ipom-F as host-targeting, broad-spectrum, antiviral agents.

Introduction

Many viruses, including SARS-CoV-2 (Zhou et al., 2020; Zhu et al., 2020) (Fig. 1A), hijack the host cell secretory pathway to correctly synthesise, fold and assemble important viral proteins (Bojkova et al., 2020; Gordon et al., 2020; Sicari et al., 2020). Hence, small molecule inhibitors of Sec61-mediated co-translational protein entry into the endoplasmic reticulum (ER) (Luesch and Paavilainen, 2020) have potential as broad-spectrum antivirals (Heaton et al., 2016; Shah et al., 2018).
Such inhibitors offer a dual approach: first, by directly inhibiting production of key viral proteins and, second, by reducing levels of host proteins co-opted during viral infection. For example, human angiotensin-converting enzyme 2 (ACE2) is an important host cell receptor for SARS-CoV-2 viral entry (Cantuti-Castelvetri et al., 2020; Daly et al., 2020; Walls et al., 2020) that is synthesised at the ER prior to its trafficking to the plasma membrane (Warner et al., 2005). Our recent studies show that ipomoeassin-F (Ipom-F) (Fig. 1B) is a potent and selective inhibitor of Sec61-mediated protein translocation at the ER membrane (Zong et al., 2019; O’Keefe et al., 2020 submitted). Given that SARS-CoV-2 membrane proteins likely co-opt host mechanisms of ER entry (cf. Gordon et al., 2020; Sicari et al., 2020), we concluded that their sensitivity to Ipom-F would likely be comparable to that of endogenous Sec61 clients (Fig. 1C; see also Zong et al., 2019; O’Keefe et al., 2020 submitted). We therefore evaluated the effects of Ipom-F on SARS-CoV-2 proteins containing hydrophobic ER targeting signals (Fig. 1D). The in vitro membrane insertion of the viral spike (S) protein and membrane translocation of the ORF8 protein are both strongly inhibited by Ipom-F, whilst several other viral membrane proteins are unaffected (Fig. 2). Likewise, the ER integration of ACE2, an important host receptor for SARS-CoV-2 (Walls et al., 2020), is highly sensitive to Ipom-F (Fig. 2). We show that the principal molecular basis for the Ipom-F sensitivity of SARS-CoV-2 proteins is their dependence on Sec61, as dictated by their individual structural features and membrane topologies (Fig. 3). Taken together, our in vitro study of SARS-CoV-2 protein synthesis at the ER highlights Ipom-F as a promising candidate for the development of a broad-spectrum, host-targeting, antiviral agent.
Results and Discussion

Ipom-F selectively inhibits ER translocation of the viral ORF8 and S proteins

To explore the ability of Ipom-F to inhibit the ER translocation of a small, yet structurally diverse, panel of SARS-CoV-2 membrane and secretory-like proteins, we first used a well-established in vitro translation system supplemented with canine pancreatic microsomes (Fig. 2A). To facilitate the detection of ER translocation, we modified the viral ORF8, S, E, M and ORF6 proteins by adding an OPG2-tag, an epitope that supports efficient ER lumenal N-glycosylation and enables product recovery via immunoprecipitation, without affecting Ipom-F sensitivity (Fig. S1A) (O’Keefe et al., 2020 submitted). For viral proteins that lack endogenous sites for N-glycosylation, such as the E protein, the ER lumenal OPG2-tag acts as a reporter for ER translocation and enables their recovery by immunoprecipitation. Where viral proteins already contain suitable sites for N-glycosylation (S and M proteins), the cytosolic OPG2-tag is used solely for immunoprecipitation. The identity of the resulting N-glycosylated species for each of these OPG2-tagged viral proteins was confirmed by endoglycosidase H (Endo H) treatment of the radiolabelled products associated with the membrane fraction prior to SDS-PAGE (Fig. 2B, cf. lanes 1 and 2 in each panel).
Using ER lumenal modification of either endogenous N-glycosylation sites (viral S and M proteins) or the appended OPG2-tag (viral E and ORF8 proteins) as a reporter for ER membrane translocation, we found that 1 µM Ipom-F strongly inhibited both the translocation of the soluble, secretory-like protein ORF8-OPG2 and the integration of the type I transmembrane protein (TMP) S-OPG2 and truncated derivatives thereof (Fig. 2B, Fig. 2C, Fig. S1C). Furthermore, membrane insertion of the human type I TMP, ACE2, was inhibited to a similar extent (Fig. 2B, Fig. 2C, ~70 to ~90% inhibition for these three proteins). These results mirror previous findings showing that precursor proteins bearing N-terminal signal peptides, and which are therefore obligate clients for the Sec61 translocon, are typically very sensitive to Ipom-F-mediated inhibition (Zong et al., 2019; O’Keefe et al., 2020 submitted). In the context of SARS-CoV-2 infection, wherein ACE2 acts as an important host cell receptor for the SARS-CoV-2 virus via its interaction with the viral S protein (Walls et al., 2020), these data suggest that an Ipom-F-induced antiviral effect might be achieved via selective reductions in the biogenesis of both host and viral proteins (cf. Fig. 1A). In contrast to the viral S and ORF8 proteins, insertion of the viral E protein was unaffected by Ipom-F (Fig. 2B-C), consistent with its recent classification as a type III TMP (Duart et al., 2020).
Type III TMP integration is highly resistant to Ipom-F (Zong et al., 2019), most likely because these proteins exploit a novel pathway for ER insertion (cf. Fig. 3; O’Keefe et al., 2020 submitted). We therefore conclude that the known substrate-selective inhibitory action of Ipom-F at the Sec61 translocon is directly applicable to viral membrane proteins, whereby the ER translocation of secretory proteins and type I TMPs, but not type III TMPs, is efficiently blocked by Ipom-F. The viral M protein is a multi-pass TMP with its first TMD oriented so the N-terminus is exoplasmic (Nexo) and hence can be considered “type III-like”. Although human multi-pass TMPs of this type typically require both the ER membrane complex (EMC) and Sec61 translocon for their authentic ER insertion (Chitwood et al., 2018), Ipom-F had no significant effect on the ER translocation/insertion of the M protein in vitro, as judged by the efficiency of N-glycosylation of its N-terminal domain (Fig. 2C). We conclude that the integration of its first TMD is unaffected by Ipom-F, consistent with its use of the EMC (Chitwood et al., 2018; O’Keefe et al., 2020 submitted). There is, however, a qualitative reduction in the intensity of both the non- and N-glycosylated forms of the M protein when compared to the control (see Fig. 2B and Fig. S1A). We speculate that this decrease may reflect an Ipom-F-induced effect on the Sec61-dependent integration of the second and/or third TM-spans of the M protein (cf. Chitwood et al., 2018) and our future studies will aim to resolve this question. Nevertheless, like similar host cell multi-pass TMPs that are resistant to a similar Sec61 inhibitor, mycolactone (Morel et al., 2018), the M protein appears more resistant to Ipom-F than either the S or ORF8 proteins (Fig. 2, Fig. S1A).
In practice, the potential resistance of this highly abundant and functionally diverse class of endogenous multi-spanning membrane proteins (von Heijne, 2007) may limit any Ipom-F-induced cytotoxicity towards host cells.

ORF6 assumes a lumenal-facing hairpin topology in ER-derived microsomes

Cell-based studies of the ORF6 protein from SARS-CoV-1 suggest it has an unusual hairpin topology with both its N- and C-termini located on the exoplasmic side of the host cell membrane, to which it binds via an N-terminal amphipathic helix (Netland et al., 2007). To independently determine the membrane topology of SARS-CoV-2 ORF6, we prepared versions with OPG2 tags at both its N- and C-termini, or single tagged equivalents (see Fig. 2B, schematics, OPG2-ORF6-OPG2, OPG2-ORF6 and ORF6-OPG2). Following membrane insertion, doubly tagged OPG2-ORF6-OPG2 shows significant amounts of species with 3 and 4 N-linked glycans (Fig. 2B). This pattern confirms that the SARS-CoV-2 ORF6 protein assumes a ‘hairpin’ conformation in the ER membrane with both its N- and C-termini in the lumen (Fig. 2B, OPG2-ORF6-OPG2). These 3- and 4-N-glycan-bearing OPG2-ORF6-OPG2 species are also resistant to extraction with alkaline sodium carbonate buffer (Fig. S1D) and protected from added protease (Fig. S1E), further indicating that the majority of the ORF6 protein is stably associated with the ER membrane in a ‘hairpin’ (Nexo/Cexo) topology.
Consistent with this unusual membrane topology, we find no indication that the membrane insertion of any of our OPG2-tagged ORF6 variants is reduced by Ipom-F, strongly suggesting that its association with the inner leaflet of the ER membrane does not require protein translocation via the central channel of the Sec61 translocon (Gérard et al., 2020; O’Keefe et al., 2020 submitted). We noted that a sub-set of OPG2-ORF6-OPG2 species bearing only a single N-glycan was also clearly present in the membrane-associated fraction with or without Ipom-F treatment (Fig. 2B, OPG2-ORF6-OPG2, see 1Gly). Based on comparison to singly OPG2-tagged variants (Fig. 2B), we conclude that OPG2-ORF6-OPG2-1Gly has its N-terminus in the ER lumen, where only one of its two consensus sites is efficiently N-glycosylated (cf. Nilsson and von Heijne, 1993), whilst its C-terminus is either ER lumenal but non-glycosylated or remains on the cytosolic side of the membrane. In the latter case, it may be that, in addition to its hairpin topology, some fraction of ORF6 is integrated into ER-derived microsomes as a type III TMP (cf. Fig. S2E; see also Netland et al., 2007) that is resistant to Ipom-F inhibition (this study; Zong et al., 2019; O’Keefe et al., 2020 submitted).

The molecular basis for SARS-CoV-2 protein sensitivity to Ipom-F

Having ascertained that Ipom-F inhibits the ER membrane translocation/insertion of the viral ORF8 and S proteins, but not of the ORF6, E or M proteins (Fig.
2C), we next investigated the molecular basis for this selectivity. For these studies we employed semi-permeabilised (SP) mammalian cells, depleted of specific membrane components via siRNA-mediated knockdown, as our source of ER membrane (Fig. 3A; Wilson et al., 2007). Consistent with our recent work (O’Keefe et al., 2020 submitted), and based on the quantitative immunoblotting of target and control gene products (Fig. S2A-C), we selectively depleted HeLa cells for core components of the Sec61 translocon (Sec61α-kd, ~65% reduction), the EMC (EMC5-kd, ~73% reduction) and both together (Sec61α+EMC5-kd, ~68% and ~78% reduction) prior to semi-permeabilisation with digitonin and use for in vitro ER translocation assays. Following the analysis of total OPG2-tagged translation products recovered by immunoprecipitation, we found that: i) the S protein and a truncated derivative were both more strongly affected by the depletion of Sec61α than of EMC5 (Fig. 3B, Fig. S2D); ii) the ORF8 protein was likewise strongly affected by Sec61α depletion but also sensitive to EMC5 depletion (Fig. 3C); iii) the E protein showed diminished insertion efficiency after knockdown of Sec61α and EMC5, although the latter had a more pronounced effect (Fig. 3D). In each case, the combined knockdown of Sec61α and EMC5 resulted in a reduction of membrane insertion that was either comparable to, or greater than, that achieved following the knockdown of Sec61α alone (Figs. 3B to 3D). For the ORF6 protein, the total level of N-glycosylated OPG2-ORF6-OPG2 species was unaffected by any knockdown condition tested (Fig. 3E). However, we note a marked increase in the proportion of potentially mis-inserted OPG2-ORF6-OPG2-1Gly species, particularly after co-depletion of EMC5 and Sec61α (see Fig. 3E; Fig. S2E).
We speculate that the unusual hairpin topology of the ORF6 protein may be attributed to the EMC and Sec61 complex acting in concert to provide an Ipom-F-insensitive pathway for protein translocation across the ER membrane (O’Keefe et al., 2020 submitted). Perturbation of this pathway seemingly increases the potential for ORF6 to mis-insert (cf. Chitwood et al., 2018), perhaps as a consequence of disruption to the translocation of its C-terminus (Fig. S2E). Taken together, our data establish that, analogous to human membrane and secretory proteins, the principal molecular basis for the Ipom-F-sensitivity of the SARS-CoV-2 ORF8 and S proteins is their dependence on Sec61-mediated protein translocation into and across the ER membrane. In contrast, the E, M, and ORF6 proteins appear capable of exploiting one or more alternative membrane insertion/translocation pathways that can bypass the translocase activity of the Sec61 complex. These alternatives most likely include a recently described route for type III TMP insertion that requires the insertase function of the EMC (O’Keefe et al., 2020 submitted), which our data suggest is also sufficient to confer Ipom-F-resistance to the viral E protein and at least the first TM-span of the viral M protein.
Concluding Remarks

We conclude that Sec61-selective protein translocation inhibitors like Ipom-F hold promise as broad-spectrum antivirals that may exert a therapeutic effect by selectively inhibiting the ER translocation of viral and/or host proteins which are crucial to viral infection and propagation (Mast et al., 2020). In the context of SARS-CoV-2, integration of the viral S protein and its host cell receptor, ACE2, into the ER membrane is significantly reduced by Ipom-F (Fig. 2C, Fig. 3B). Likewise, translocation of the viral ORF8 protein across the ER membrane and into its lumen is substantially diminished (Fig. 2C, Fig. 3C). The binding of the viral S protein to cell surface ACE2 is a key step in host cell infection (Drew and Janes, 2020), whilst ORF8 may protect SARS-CoV-2 infected cells against host cytotoxic T lymphocytes (Zhang et al., 2020), making all three of these proteins viable therapeutic targets (Drew and Janes, 2020; Li et al., 2020; Young et al., 2020). Like other small molecule inhibitors that target fundamental cellular pathways (Bojkova et al., 2020), the broad-ranging effects of Sec61 inhibitors on host cell membrane and secretory protein synthesis (Morel et al., 2018; Zong et al., 2019), including the strong in vitro effect of Ipom-F on ACE2 biogenesis (cf. Grob et al., 2020), present an obvious hurdle to their future use.
Nevertheless, given that Ipom-F is a potent inhibitor of Sec61-mediated protein translocation in cell culture models (Zong et al., 2019), and appears well tolerated in mice (Zong et al., 2020), we propose that future studies investigating its effect on SARS-CoV-2 infection and propagation in cellular models are clearly warranted (cf. Bojkova et al., 2020).

Materials and Methods

Ipom-F and Antibodies

Ipom-F was synthesised as previously described (Zong et al., in press). Antibodies used to validate Sec61 and/or EMC subunit depletions in SP cells (Fig. S2) were purchased from Santa Cruz Biotechnology (goat polyclonal anti-LMNB1 (clone M-20, sc-6217)), Bethyl Laboratories (rabbit polyclonal anti-EMC5 (A305-832-A)) and Abcam (rabbit polyclonal anti-EMC6 (ab84902)), gifted by Sven Lang and Richard Zimmermann (University of Saarland, Homburg, Germany; rabbit anti-Sec61α) or as previously described (mouse monoclonal anti-OPG2 tag (McKenna et al., 2016) and rabbit polyclonal anti-OST48 (Wilson et al., 2007)).

DNA constructs

The cDNA for human ACE2 (Uniprot: Q9BYF1) was purchased from Sino Biological (HG10108-M). cDNAs encoding the SARS-CoV-2 genes for ORF6, ORF8 and the E, M and S proteins (Uniprot: P0DTC6, P0DTC8, P0DTC4, P0DTC5, P0DTC2 respectively) were kindly provided by Nevan Krogan (UCSF, US) (Gordon et al., 2020), amplified by PCR and subcloned into the pcDNA5 vector, and constructs were validated by DNA sequencing (GATC, Eurofins Genomics).
ORF6-OPG2, ORF8-OPG2, M-OPG2 and S-OPG2 were generated by inserting the respective cDNAs in frame between the NheI and AflII sites of a pcDNA5/FRT/V5-His vector (Invitrogen) containing a C-terminal OPG2 tag (MNGTEGPNFYVPFSNKTG). OPG2-E was generated by cloning the cDNA encoding the E protein into the same pcDNA5-OPG2 vector using the KpnI and BamHI sites and deleting the stop codon after the OPG2 tag by site-directed mutagenesis (Stratagene QuikChange, Agilent Technologies). The N-terminal OPG2-tag of OPG2-ORF6-OPG2 was inserted by site-directed mutagenesis of ORF6-OPG2 using the relevant forward and reverse primers (Integrated DNA Technologies). Linear DNA templates were generated by PCR and mRNA was transcribed using T7 polymerase.

siRNA-mediated knockdown and SP cell preparation

HeLa cells (human epithelial cervix carcinoma cells) were cultured in DMEM supplemented with 10% (v/v) FBS and maintained in a 5% CO2 humidified incubator at 37°C. Knockdown of target genes was performed as previously described (O’Keefe et al., 2020 submitted) using 20 nM (final concentration) of either control siRNA (ON-TARGETplus Non-targeting control pool; Dharmacon), SEC61A1 siRNA (Sec61α-kd, GE Healthcare, sequence AACACUGAAAUGUCUACGUUUUU) or MMGT1 siRNA (EMC5-kd, ThermoFisher Scientific, s41129), and INTERFERin (Polyplus, 409-10) as described by the manufacturer.
96 h post-initial transfection, cells were semi-permeabilised using 80 μg/mL high purity digitonin (Calbiochem) and treated with 0.2 U Nuclease S7 Micrococcal nuclease from Staphylococcus aureus (Sigma-Aldrich, 10107921001) as previously described (O’Keefe et al., 2020 submitted; Wilson et al., 2007). SP cells lacking endogenous mRNA were resuspended (3 × 10^6 SP cells/mL as determined by trypan blue (Sigma-Aldrich, T8154) staining) in KHM buffer (110 mM KOAc, 2 mM Mg(OAc)2, 20 mM HEPES-KOH pH 7.2) prior to analysis by western blot, or inclusion in translation master mixes such that each translation reaction contained 2 × 10^5 cells/mL.

In vitro ER import assays

Standard translation and membrane translocation/insertion assays, supplemented with nuclease-treated canine pancreatic microsomes (from stock with OD280 = 44/mL) or siRNA-treated SP HeLa cells, were performed in nuclease-treated rabbit reticulocyte lysate (Promega) as previously described (Zong et al., 2019; O’Keefe et al., 2020 submitted): namely in the presence of EasyTag EXPRESS 35S Protein Labelling Mix containing [35S] methionine (Perkin Elmer) (0.533 MBq; 30.15 TBq/mmol), 25 μM amino acids minus methionine (Promega), 1 µM Ipom-F or an equivalent volume of DMSO, 6.5% (v/v) ER-derived microsomes or SP cells and ~10% (v/v) of in vitro transcribed mRNA (~500 ng/μL) encoding the relevant precursor protein. Microsomal translation reactions (20 μL) were performed for 30 min at 30°C whereas those using SP HeLa cells were performed on a 1.5X scale (30 μL translation reactions) for 1 h at 30°C. As the S protein was most efficiently synthesised using the TNT® Coupled system (Fig.
S1B), import assays of the comparatively higher molecular weight ACE2 and S proteins (50 μL reactions) were both performed using the TNT® Coupled Transcription/Translation system (Promega) for 90 min at 30°C as described by the manufacturer (~50 ng/μL cDNA, 1 µM Ipom-F or an equivalent volume of DMSO, 12% (v/v) ER-derived microsomes or SP cells). All translation reactions were finished by incubating with 0.1 mM puromycin for 10 min at 30°C to ensure translation termination and ribosome release of newly synthesised proteins prior to analysis.

Recovery and analysis of radiolabelled products

Following puromycin treatment, microsomal membrane-associated fractions were recovered by centrifugation through an 80 μL high-salt cushion (0.75 M sucrose, 0.5 M KOAc, 5 mM Mg(OAc)2, 50 mM HEPES-KOH, pH 7.9) at 100,000 g for 10 min at 4°C and the pellet was suspended directly in SDS sample buffer. To confirm the topology of ORF6 (Fig. S2), the membrane-associated fraction of the doubly-OPG2-tagged form (OPG2-ORF6-OPG2) was resuspended in KHM buffer (20 μL) and subjected to either carbonate extraction (0.1 M Na2CO3, pH 11.3) (McKenna et al., 2016) or a protease protection assay using trypsin (1 μg/mL) with or without 0.1% Triton X-100 (Ray-Sinha et al., 2009) prior to suspension in SDS sample buffer.
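As a rough bookkeeping aid for the reaction compositions quoted under "In vitro ER import assays", the v/v percentages can be converted into pipetting volumes. The sketch below is illustrative only: the `Reaction` class and function names are hypothetical, and the assumption that the stated ~2 × 10^5 cells/mL follows from adding the 3 × 10^6 cells/mL SP stock at 6.5% (v/v) is an inference from the numbers in the text, not a stated procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One reaction format from the methods text (names are illustrative)."""
    name: str
    total_ul: float           # final reaction volume in microlitres
    membrane_fraction: float  # v/v fraction of microsomes or SP cells

    def membrane_ul(self) -> float:
        """Volume of microsomes or SP cells to add, in microlitres."""
        return self.total_ul * self.membrane_fraction


# Volumes and fractions as quoted in the methods text.
REACTIONS = [
    Reaction("microsomes, reticulocyte lysate", 20.0, 0.065),
    Reaction("SP HeLa cells, reticulocyte lysate", 30.0, 0.065),
    Reaction("TNT coupled system (ACE2, S)", 50.0, 0.12),
]

SP_STOCK_CELLS_PER_ML = 3e6  # SP cell stock density quoted in the text


def final_cell_density(stock_per_ml: float, fraction: float) -> float:
    """Cell density after diluting a stock to the given v/v fraction."""
    return stock_per_ml * fraction


if __name__ == "__main__":
    for r in REACTIONS:
        print(f"{r.name}: {r.membrane_ul():.2f} uL in {r.total_ul:.0f} uL")
    density = final_cell_density(SP_STOCK_CELLS_PER_ML, 0.065)
    print(f"SP cells at 6.5% (v/v): {density:.3g} cells/mL")
```

Under these assumptions the SP cell density works out to about 1.95 × 10^5 cells/mL, which is consistent with the ~2 × 10^5 cells/mL stated in the text.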
For translation reactions using SP cells, the total reaction material was diluted with nine volumes of Triton immunoprecipitation buffer (10 mM Tris-HCl, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 5 mM PMSF, 1 mM methionine (to prevent background from the radiolabelled methionine), pH 7.5). Samples were incubated under constant agitation with an antibody recognising the OPG2 epitope (1:200 dilution) for 16 h at 4°C to recover both the membrane-associated and non-targeted nascent chains. Samples were next incubated under constant agitation with 10% (v/v) Protein-A-Sepharose beads (Genscript) for a further 2 h at 4°C before recovery by centrifugation at 13,000 g for 1 min. Protein-A-Sepharose beads were washed twice with Triton immunoprecipitation buffer prior to suspension in SDS sample buffer. Where indicated, samples were treated with 1000 U of a form of Endoglycosidase H that does not co-migrate with and hence potentially distort the radiolabelled products when resolved: Endoglycosidase Hf (translation products of ~10-50 kDa; New England Biolabs, P0703S) or Endoglycosidase H (translation products of ~50-150 kDa protein substrates; New England Biolabs, P0702S). All samples were solubilised for 12 h at 37°C and then sonicated prior to resolution by SDS-PAGE (10% or 16% PAGE, 120V, 120-180 min). Gels were .CC-BY-ND 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 5, 2021. 
fixed for 5 min (20% MeOH, 10% AcOH), dried for 2 h at 65°C and radiolabelled products visualised using a Typhoon FLA-700 (GE Healthcare) following exposure to a phosphorimaging plate for 24-72 h.

Western Blotting

Following semi-permeabilisation, aliquots of siRNA-treated HeLa cells were suspended in SDS sample buffer, denatured for 12 h at 37°C and sonicated prior to resolution by SDS-PAGE (16% or 10% PAGE, 120 V, 120-150 min). Following transfer to a PVDF membrane in transfer buffer (0.06 M Tris, 0.60 M glycine, 20% MeOH) at 300 mA for 2.5 h, PVDF membranes were incubated in 1X Casein blocking buffer (10X stock from Sigma-Aldrich, B6429) made up in TBS, incubated with appropriate primary antibodies (1:500 or 1:1000 dilution) and processed for fluorescence-based detection as described by LI-COR Biosciences using appropriate secondary antibodies (IRDye 680RD Donkey anti-Goat, IRDye 680RD Donkey anti-Rabbit, IRDye 800CW Donkey anti-Mouse) at 1:10,000 dilution. Signals were visualised using an Odyssey CLx Imaging System (LI-COR Biosciences).

Quantitation and Statistical Analysis

Bar graphs depict either the efficiency of membrane translocation/insertion calculated as the ratio of N-glycosylated protein relative to the amount of non-N-glycosylated protein (Fig. 2-3), or the efficiencies of siRNA-mediated knockdown in SP cells calculated as a proportion of the protein content when compared to the NT control (Fig. S2), with all control samples set to 100%. Normalised values were used for statistical comparison (one-way or two-way ANOVA; DF and F values are shown in each figure as appropriate and the multiple comparisons tests used are indicated in the appropriate figure legend).
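The normalisation described above can be illustrated with a minimal sketch. It assumes that each treated replicate is paired with its own control replicate and that band intensities are available as plain numbers; the function names and the intensity values are hypothetical and are not taken from the study. The mean and s.e.m. follow the way quantitations are reported in the figure legends.

```python
from math import sqrt
from statistics import mean, stdev


def glycosylation_ratio(glycosylated, non_glycosylated):
    """Ratio of N-glycosylated to non-N-glycosylated band intensity."""
    if non_glycosylated <= 0:
        raise ValueError("non-glycosylated intensity must be positive")
    return glycosylated / non_glycosylated


def relative_efficiency(treated, control):
    """Efficiency of a treated sample as a percentage of its control (control = 100%)."""
    return 100.0 * glycosylation_ratio(*treated) / glycosylation_ratio(*control)


def mean_sem(values):
    """Mean and standard error of the mean for a list of replicate values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical (glycosylated, non-glycosylated) intensities, arbitrary units, n=3.
control = [(800.0, 200.0), (820.0, 205.0), (780.0, 195.0)]
treated = [(200.0, 400.0), (300.0, 500.0), (100.0, 100.0)]

# Assumption: replicate i of the treated sample is normalised to replicate i of the control.
efficiencies = [relative_efficiency(t, c) for t, c in zip(treated, control)]
avg, sem = mean_sem(efficiencies)
print(f"efficiency = {avg:.1f} ± {sem:.1f} % of control")
```

The normalised values, rather than the raw ratios, would then be passed to the ANOVA and multiple comparisons test named in the relevant figure legend.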
Statistical significance is given as n.s., non-significant, P > 0.1; *, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001.

Acknowledgements

We thank Quentin Roebuck for technical assistance, Nevan Krogan (UCSF) for SARS-CoV-2 plasmids, Sven Lang (University of Saarland) for Sec61α antisera, Belinda Hall and Rachel Simmonds (University of Surrey) for useful discussions. We are indebted to Richard Zimmermann (University of Saarland) for catalyzing SARS-CoV-2 related discussions amongst the ER research community.

Competing interests

The authors declare no competing interests.

Author Contributions

K.B.D., G.Z. and H.S. participated in synthesis of Ipom-F and W.Q.S. supervised the synthesis; P.R. generated SARS-CoV-2 plasmids; S.O’K. performed site-directed mutagenesis and experiments; S.O’K. and S.H. designed the study, analysed the data and wrote the manuscript.

Funding

This work was supported by a Wellcome Trust Investigator Award in Science 204957/Z/16/Z (S.H.), an AREA grant 2R15GM116032-02A1 from the National Institute of General Medical Sciences of the National Institutes of Health (NIH) and a Ball State University (BSU) Provost Startup Award (W.Q.S.).

Supplementary Information

Supplementary information Fig. S1 and Fig. S2 accompany this report.
References

Adalja, A., and Inglesby, T. (2019). Broad-spectrum antiviral agents: A crucial pandemic tool. Exp. Rev. Anti-Infect. Ther. 17, 467-470.
Bojkova, D., Klann, K., Koch, B., Widera, M., Krause, D., Ciesek, S., Cinatl, J., and Münch, C. (2020). Proteomics of SARS-CoV-2 infected host cells reveals therapy targets. Nature. 583, 469-472.
Cantuti-Castelvetri, L., Ojha, R., Pedro, L. D., Djannatian, M., Franz, J., Kuivanen, S., van der Meer, F., Kallio, K., Kaya, T., Anastasina, M., et al. (2020). Neuropilin-1 facilitates SARS-CoV-2 cell entry and infectivity. Science. 370, 856-860.
Chitwood, P. J., Juszkiewicz, S., Guna, A., Shao, S., and Hegde, R. S. (2018). EMC is required to initiate accurate membrane protein topogenesis. Cell. 175, 1-13.
Daly, J. L., Simonetti, B., Klein, K., Chen, K.-E., Kavanagh Williamson, M., Antón-Plágaro, C., Shoemark, D. K., Simón-Gracia, L., Bauer, M. et al. (2020). Neuropilin-1 is a host factor for SARS-CoV-2 infection. Science. 370, 861-865.
Drew, E. D., and Janes, R. W. (2020). Identification of a druggable binding pocket in the spike protein reveals a key site for existing drugs potentially capable of combating Covid-19 infectivity. BMC Mol. Cell Biol. 21, 49.
Duart, G., García-Murria, M. J., Grau, B., Acosta-Cáceres, J. M., Martínez-Gil, L., and Mingarro, I. (2020). SARS-CoV-2 envelope protein topology in eukaryotic membranes. Open Biol. 10, 200209.
Firth, A. E. (2020). A putative new SARS-CoV protein, 3c, encoded in an ORF overlapping ORF3a. J. Gen. Virol. 101, 1085-1089.
Gérard, S. F., Hall, B. S., Zaki, A. M., Corfield, K. A., Mayerhofer, P. U., Costa, C., Whelligan, D. K., Biggin, P. C., Simmonds, R. E., and Higgins, M. K. (2020). Structure of the inhibited state of the Sec translocon. Mol. Cell. 79, 406-415.e7.
Gordon, D. E., Jang, G. M., Bouhaddou, M., Xu, J., Obernier, K., White, K. M., O’Meara, M. J., Rezelj, V. V., Guo, J. Z., Swaney, D. L., Tummino, T. A. et al. (2020). A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature. 583, 459-468.
Grob, S., Jahn, C., Cushman, S., Bär, C., and Thum, T. (2020). SARS-CoV-2 receptor ACE2-dependent implications on the cardiovascular system: from basic science to clinical implications. J. Mol. Cell. Cardiol. 144, 47-53.
Heaton, N. S., Moshkina, N., Fenouil, R., Gardner, T. J., Aguirre, S., Shah, P. S., Zhao, N., Manganaro, L., Hultquist, J. F., Noel, J. et al. (2016). Targeting viral proteostasis limits influenza virus, HIV, and Dengue virus infection. Immunity. 44, 46-58.
Li, J.-Y., Liao, C.-H., Wang, Q., Tan, Y.-., Luo, R., Qiu, Y., and Ge, X.-Y. (2020). The ORF6, ORF8 and nucleocapsid proteins of SARS-CoV-2 inhibit type I interferon signalling pathway. Virus Res. 286, 198074.
Luesch, H., and Paavilainen, V. O. (2020). Natural products as modulators of eukaryotic protein secretion. Nat. Prod. Rep. 37, 717-736.
Mast, F. D., Navare, A. T., van der Sloot, A. M., Coulombe-Huntington, J., Rout, M. P., Baliga, N. S., Kaushansky, A., Chait, B. T., Aderem, A., Rice, C. M. et al. (2020).
Crippling life support for SARS-CoV-2 and other viruses through synthetic lethality. J. Cell. Biol. 219, e202006159.
McKenna, M., Simmonds, R. E., and High, S. (2016). Mechanistic insights into the inhibition of Sec61-dependent co- and post-translational translocation by mycolactone. J. Cell Sci. 129, 1404-1415.
Morel, J. D., Paatero, A. O., Wei, J., Yewdell, J. W., Guenin-Macé, L., Van Haver, D., Impens, F., Pietrosemoli, N., Paavilainen, V. O., and Demangel, C. (2018). Proteomics reveals scope of mycolactone-mediated Sec61 blockade and distinctive stress signature. Mol. Cell Prot. 17, 1750-1765.
Naqvi, A. A. T., Fatima, K., Mohammad, T., Fatima, U., Singh, I. K., Singh, A., Atif, A. M., Hariprasad, G., Hasan, G. M., and Hassan, M. I. (2020). Insights into SARS-CoV-2 genome, structure, evolution, pathogenesis and therapies: structural genomics approach. Biochim. Biophys. Acta. Mol. Basis Dis. 1866, 165878.
Netland, J., Ferraro, D., Pewe, L., Olivares, H., Gallagher, T., and Perlman, S. (2007). Enhancement of murine coronavirus replication by severe acute respiratory syndrome coronavirus protein 6 requires the N-terminal hydrophobic region but not C-terminal sorting motifs. J. Virol. 81, 11520-11525.
Nilsson, I. M., and von Heijne, G. (1993). Determination of the distance between the oligosaccharyltransferase active site and the endoplasmic reticulum membrane. J. Biol. Chem. 268, 5798-5801.
Ray-Sinha, A., Cross, B. C. S., Mironov, A., Wiertz, E., and High, S. (2009).
Endoplasmic reticulum-associated degradation of a degron-containing polytopic membrane protein. Mol. Membr. Biol. 26, 448-464.
O’Keefe, S., Zong, G., Duah, K. B., Andrews, L. E., Shi, W. Q., and High, S. (2020). Type III transmembrane protein integration requires both the EMC and Sec61 complex. Submitted.
Sicari, D., Chatziioannou, A., Koutsandreas, T., Sitia, R., and Chevet, E. (2020). Role of the early secretory pathway in SARS-CoV-2 infection. J. Cell. Biol. 219, e202006005.
Shah, P. S., Link, N., Jang, G. M., Sharp, P. P., Zhu, T., Swaney, D. L., Johnson, J. R., Von Dollen, J., Ramage, H. R., Satkamp, L. et al. (2018). Comparative flavivirus-host protein interaction mapping reveals mechanisms of Dengue and Zika virus pathogenesis. Cell. 175, 1931-1945.e18.
Von Heijne, G. (2007). The membrane protein universe: what’s out there and why bother? J. Intern. Med. 261, 543-547.
Walls, A. C., Park, Y.-J., Tortorici, M. A., Wall, A., McGuire, A. T., and Veesler, D. (2020). Structure, function and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell. 181, P281-292.E6.
Warner, F. J., Lew, R. A., Smith, A. I., Lambert, D. W., Hooper, N. M., and Turner, A. T. (2005). Angiotensin-converting enzyme 2 (ACE2), but not ACE, is preferentially localised to the apical surface of polarised kidney cells. J. Biol. Chem. 280, 39353-39362.
Wilson, C. M., and High, S. (2007). Ribophorin I acts as a substrate-specific facilitator of N-glycosylation. J. Cell. Sci. 120, 648-657.
Young, B. E., Fong, S.-W., Chan, Y.
H., Mak, T.-M., Ang, L. W., Anderson, D. E., Yi-Pin Lee, C., Naqiah Amrun, S., Lee, B., Shan Goh, Y. et al. (2020). Effects of a major deletion in the SARS-CoV-2 genome on the severity of infection and inflammatory response: an observational cohort study. Lancet. 396, 603-611.
Zhang, Y., Zhang, J., Chen, Y., Luo, B., Yuan, Y., Huang, F., Yang, T., Yu, F., Liu, J., Song, Z. et al. (2020). The ORF8 protein of SARS-CoV-2 mediates immune evasion through potently downregulating MHC-1. bioRxiv. doi: 10.1101/2020.05.24.111823
Zhou, P., Yang, X.-L., Wang, X.-G., Hu, B., Zhang, L., Zhang, W., Si, H.-R., Zhu, Y., Li, B., Huang, C.-L. et al. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 579, 270-273.
Zhu, N., Zhang, D., Wang, W., Li, X., Yang, B., Song, J., Zhao, X., Huang, B., Shi, W., Lu, R., et al. (2020). A novel coronavirus from patients with pneumonia in China, 2019. N. Eng. J. Med. 382, 727-733.
Zong, G., Hu, Z., O’Keefe, S., Tranter, D., Iannotti, M. J., Baron, L., Hall, B., Corfield, K., Paatero, A., Henderson, M. et al. (2019). Ipomoeassin F binds Sec61α to inhibit protein translocation. J. Am. Chem. Soc. 141, 8450-8461.
Zong, G., Hu, Z., Duah, K. B., Andrews, L. E., Zhou, J., O’Keefe, S., Whisenhunt, L., Shim, J. S., Du, Y., High, S., et al. (2020). Ring-expansion leads to a more potent analogue of ipomoeassin F. J. Org. Chem.
doi: 10.1021/acs.joc.0c01659

Figure Legends

Fig. 1. Ipom-F as a potential inhibitor of SARS-CoV-2 viral protein synthesis. (A) Schematic of (+) ssRNA genome architecture of SARS-CoV-2 (29903 nt) containing 5’ capped mRNA with a leader sequence (LS), 3’ end poly-A tail, 5’ and 3’ UTRs and open reading frames (ORFs): ORF1a, ORF1b, spike (S), ORF3a, envelope (E), membrane (M), ORF6, ORF7, ORF8, nucleoprotein (N) and ORF10 (Firth, 2020; Naqvi et al., 2020). An important mode of SARS-CoV-2 host entry proceeds via interaction of the viral S protein with human angiotensin-converting enzyme 2 (ACE2) (Walls et al., 2020). (B) Structure of Ipomoeassin-F (Ipom-F), a small molecule inhibitor of Sec61-mediated protein translocation. (C) Ipom-F efficiently blocks membrane translocation of secretory proteins and insertion of single-pass type I and type II TMPs, but not insertion of type III TMPs or tail-anchored (TA) proteins. SA denotes a signal anchor. (D) Based on known/predicted membrane topology of SARS-CoV-2 proteins, and sensitivity of comparable host cell proteins (Zong et al., 2019; O’Keefe et al., 2020 submitted), likely sensitivity to Ipom-F was anticipated.

Fig. 2. Ipom-F selectively inhibits the ER membrane translocation of SARS-CoV-2 proteins. (A) Schematic of in vitro ER import assay using pancreatic microsomes. Following translation, fully translocated/membrane inserted radiolabelled precursor proteins are recovered and analysed by SDS-PAGE and phosphorimaging.
N-glycosylated species were confirmed by treatment with endoglycosidase H (Endo H). (B) Protein precursors of the human angiotensin-converting enzyme 2 (ACE2) and OPG2-tagged versions of the SARS-CoV-2 ORF8 (ORF8-OPG2), spike (S-OPG2), envelope (OPG2-E), membrane (M-OPG2) and ORF6 (a doubly-OPG2 tagged version, OPG2-ORF6-OPG2, and two singly-OPG2 tagged forms, OPG2-ORF6 and ORF6-OPG2, with predominant N-glycosylated species in bold) were synthesised in rabbit reticulocyte lysate supplemented with ER microsomes without or with Ipom-F (lanes 1 and 3). Phosphorimages of membrane-associated products resolved by SDS-PAGE with representative substrate outlines are shown. N-glycosylation was used to measure the efficiency of membrane translocation/insertion and N-glycosylated (X-Gly) versus non-N-glycosylated (0Gly) species identified using Endo H (see lane 2). (C) The relative efficiency of membrane translocation/insertion in the presence of Ipom-F was calculated using the ratio of N-glycosylated protein to non-glycosylated protein, relative to the DMSO treated control (set to 100% efficiency). Quantitations are given as mean±s.e.m for independent translation reactions performed in triplicate (n=3) and statistical significance (one-way ANOVA, DF and F values shown in the figure) was determined using Dunnett’s multiple comparisons test. Statistical significance: n.s., non-significant >0.1; ****, P < 0.0001.

Fig. 3.
SARS-CoV-2 proteins are variably dependent on the Sec61 complex and/or the EMC for ER membrane translocation/insertion. (A) Schematic of in vitro ER import assay using control SP cells, or those depleted of a subunit of the Sec61 complex and/or the EMC via siRNA. Following translation, OPG2-tagged translation products (i.e. membrane-associated and non-targeted nascent chains) were immunoprecipitated, resolved by SDS-PAGE and analysed by phosphorimaging. OPG2-tagged variants of the SARS-CoV-2 (B) spike (S-OPG2), (C) ORF8 (ORF8-OPG2), (D) envelope (OPG2-E) and (E) ORF6 (OPG2-ORF6-OPG2) species (labelled as for Fig. 2) were synthesised in rabbit reticulocyte lysate supplemented with control SP cells (lanes 1-2) or those with impaired Sec61 and/or EMC function (lanes 3-6). Radiolabelled products were recovered and analysed as in (A). Membrane translocation/insertion efficiency was determined using the ratio of the N-glycosylation of lumenal domains, identified using Endo H (EH, lane 1), relative to the NT control (set to 100% translocation/insertion efficiency). Quantitations (n=3) and statistical significance (two-way ANOVA, DF and F values shown in the figure) were determined as for Figure 2. Statistical significance: n.s., non-significant >0.1; *, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001.
Figure 1

Figure 2

Figure 3

Supplementary information for

Ipomoeassin-F inhibits the in vitro biogenesis of the SARS-CoV-2 spike protein and its host cell membrane receptor

Sarah O’Keefe, Peristera Roboti, Kwabena B. Duah, Guanghui Zong, Hayden Scheider, Wei Q. Shi and Stephen High

Fig. S1. Additional studies using ER microsomes, Related to Figures 2 and 3.
(A) Non-tagged (lanes 1-3) and OPG2-tagged (lanes 4-6) versions of the SARS-CoV-2 spike protein (S, S-OPG2), ORF8 (ORF8, ORF8-OPG2) and membrane protein (M, M-OPG2) were synthesised in rabbit reticulocyte lysate supplemented with ER-derived canine pancreatic microsomes in the absence and presence of Ipom-F (lanes 1 and 3). Phosphorimages of membrane-associated products resolved by SDS-PAGE together with representative substrate outlines are shown. N-glycosylated (X-Gly) versus non-N-glycosylated (0Gly) species were identified by treatment with endoglycosidase H (Endo H, lanes 2 and 5). (B) The S protein was synthesised in a Flexi® rabbit reticulocyte system with varying concentrations of magnesium acetate (lanes 1-5) and a TNT® Coupled system (lane 6) in the absence of ER-derived microsomes. 5% of the total reaction material was resolved by SDS-PAGE and visualised by phosphorimaging. (C) The ER import of truncated variants of the S protein (S-short, S-s.s.-TMD, S-half-OPG2) was analysed as described for (A). (D) The membrane-associated products of the doubly tagged form of ORF6 (OPG2-ORF6-OPG2) were synthesised as in (A) and, following treatment with sodium carbonate buffer and centrifugation, the pellet, enriched for membrane-integrated material, and supernatant, largely containing peripherally membrane-associated material, were analysed for OPG2-ORF6-OPG2. (E) The membrane-associated products of OPG2-ORF6-OPG2 were treated with trypsin in the absence or presence of Triton X-100 (TX-100, lanes 2-3).
Fig. S2. Validation of Sec61 and/or EMC subunit depletions in SP cells, Related to Figure 3. (A) The effects of transfecting HeLa cells with non-targeting (NT; lane 1), Sec61α-targeting (lane 2), EMC5-targeting (lane 3) and Sec61α+EMC5-targeting (lane 4) siRNAs were determined after semi-permeabilisation by immunoblotting for target genes (Sec61α, EMC5). Controls to assess destabilisation of the wider EMC complex (EMC2 and EMC6), any effect on the N-glycosylation machinery (the ER-resident 48 kDa subunit of the oligosaccharyl-transferase complex (OST48)) and the quantity of SP cells used in each experiment (the nuclear protein Lamin-B1 (LMNB1)), are also shown. (B) The efficiencies of siRNA-mediated knockdown (bold) were calculated as a proportion of the signal intensity obtained with the NT control (set as 100%). Quantitations are given as mean±s.e.m for three separate siRNA treatments (n=3) with statistical significance of siRNA-mediated knockdowns (two-way ANOVA, DF and F values shown in the figure) determined using Dunnett’s multiple comparisons test. Statistical significance is given as n.s., non-significant >0.1; *, P < 0.05; ****, P < 0.0001.
(C) Knockdown efficiencies (mean±s.e.m) for each of the target genes. (D) A truncated variant of the S protein (S-half-OPG2) was synthesised in rabbit reticulocyte lysate supplemented with SP cells with impaired Sec61 complex and/or EMC function and recovered by immunoprecipitation via the OPG2 tag. Radiolabelled products were resolved by SDS-PAGE and analysed by phosphorimaging. N-glycosylated (14-Gly) versus non-N-glycosylated (0Gly) species were identified by treatment with endoglycosidase H (Endo H, lane 1). (E) Further analysis of the data presented in Fig. 3E of the main text. Here, the ratio of 3Gly and 4Gly bearing OPG2-ORF6-OPG2 N-glycosylated species relative to the 1Gly species present in the same sample was used as a proxy to estimate potential mis-insertion of the ORF6 protein in SP cells with impaired Sec61 complex and/or EMC function relative to the NT control (set to 100% efficiency). Quantitations are given as mean±s.e.m for independent translation reactions from separate siRNA treatments performed in triplicate (n=3) and statistical significance (two-way ANOVA, DF and F values shown in the figure) was determined using Dunnett’s multiple comparisons test. Statistical significance is given as n.s., non-significant >0.1; *, P < 0.05; ***, P < 0.001.

10_1101-2020_12_29_424482 ---- Structural basis for broad coronavirus neutralization

Maximilian M. Sauer1, M. Alexandra Tortorici1,2, Young-Jun Park1, Alexandra C.
Walls1, Leah Homad3, Oliver Acton1, John Bowen1, Chunyan Wang4, Xiaoli Xiong1$, Willem de van der Schueren5†, Joel Quispe1, Benjamin G. Hoffstrom6, Berend-Jan Bosch4, Andrew T. McGuire3,7,8*, David Veesler1*

1Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA.
2Institut Pasteur, Unité de Virologie Structurale, Paris, France; CNRS UMR 3569, Unité de Virologie Structurale, Paris, France.
3Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
4Virology Division, Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands.
5Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
6Antibody Technology Resource, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
7Department of Global Health, University of Washington, Seattle, WA 98195, USA.
8Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA.

$Present address: Guangzhou Regenerative Medicine and Health - Guangdong Laboratory, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
†Present address: Bluebird Bio, Seattle, WA, USA

*Correspondence: dveesler@uw.edu, amcguire@fredhutch.org

bioRxiv preprint doi: https://doi.org/10.1101/2020.12.29.424482; this version posted January 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Three highly pathogenic β-coronaviruses crossed the animal-to-human species barrier in the past two decades: SARS-CoV, MERS-CoV and SARS-CoV-2.
SARS-CoV-2 has infected more than 64 million people worldwide, claimed over 1.4 million lives and is responsible for the ongoing COVID-19 pandemic. We isolated a monoclonal antibody, termed B6, cross-reacting with eight β-coronavirus spike glycoproteins, including all five human-infecting β-coronaviruses, and broadly inhibiting entry of pseudotyped viruses from two coronavirus lineages. Cryo-electron microscopy and X-ray crystallography characterization reveal that B6 binds to a conserved cryptic epitope located in the fusion machinery and indicate that antibody binding sterically interferes with spike conformational changes leading to membrane fusion. Our data provide a structural framework explaining B6 cross-reactivity with β-coronaviruses from three lineages along with proof-of-concept for antibody-mediated broad coronavirus neutralization elicited through vaccination. This study unveils an unexpected target for next-generation structure-guided design of a pan-coronavirus vaccine.

Introduction

Four coronaviruses mainly associated with common cold-like symptoms are endemic in humans, namely OC43, HKU1, NL63 and 229E, whereas three highly pathogenic zoonotic coronaviruses emerged in the past two decades leading to epidemics and a pandemic. Severe acute respiratory syndrome coronavirus (SARS-CoV) was discovered in the Guangdong province of China in 2002 and spread to five continents through air travel routes, infecting 8,098 people and causing 774 deaths, with no cases reported after 2004 (Drosten et al., 2003; Ksiazek et al., 2003).
In 2012, Middle-East respiratory syndrome coronavirus (MERS-CoV) emerged in the Arabian Peninsula, where it still circulates, and was exported to 27 countries, infecting a total of ~2,494 individuals and claiming 858 lives as of January 2020 according to the World Health Organization (Zaki et al., 2012). A recent study further suggested that undetected zoonotic MERS-CoV transmissions are currently occurring in Africa (Mok et al., 2020). A novel coronavirus, named SARS-CoV-2, was associated with an outbreak of severe pneumonia in the Hubei Province of China at the end of 2019 and has since infected over 64 million people and claimed more than 1.4 million lives worldwide during the ongoing COVID-19 pandemic (Zhou et al., 2020; Zhu et al., 2020b). SARS-CoV and SARS-CoV-2 likely originated in bats (Ge et al., 2013; Hu et al., 2017; Li et al., 2005; Yang et al., 2015; Zhou et al., 2020) with masked palm civets and racoon dogs acting as intermediate amplifying and transmitting hosts for SARS-CoV (Guan et al., 2003; Kan et al., 2005; Wang et al., 2005). Although MERS-CoV was also suggested to have originated in bats, repeated zoonotic transmissions occurred from dromedary camels (Haagmans et al., 2014; Memish et al., 2013).
The identification of numerous coronaviruses in bats, including viruses related to SARS-CoV-2, SARS-CoV and MERS-CoV, along with evidence of spillovers of SARS-CoV-like viruses to humans, strongly indicates that future coronavirus emergence events will continue to occur (Anthony et al., 2017; Ge et al., 2013; Hu et al., 2017; Li et al., 2019; Li et al., 2005; Menachery et al., 2015; Menachery et al., 2016; Wang et al., 2018; Yang et al., 2015; Zhou et al., 2020).

The coronavirus spike (S) glycoprotein mediates entry into host cells and comprises two functional subunits mediating attachment to host receptors (S1 subunit) and membrane fusion (S2 subunit) (Ke et al., 2020; Kirchdoerfer et al., 2016; Turoňová et al., 2020; Walls et al., 2020b; Walls et al., 2016a; Walls et al., 2017; Wrapp et al., 2020). As the S homotrimer is prominently exposed at the viral surface and is the main target of neutralizing antibodies (Abs), it is a focus of therapeutic and vaccine design efforts (Tortorici and Veesler, 2019). We previously showed that the SARS-CoV-2 receptor-binding domain (RBD, part of the S1 subunit) is immunodominant, comprises multiple distinct antigenic sites, and is the target of 90% of the neutralizing activity present in COVID-19 convalescent plasma (Piccoli et al., 2020). Accordingly, monoclonal Abs (mAbs) with potent neutralizing activity were identified against the SARS-CoV-2, SARS-CoV and MERS-CoV RBDs and shown to protect against viral challenge in vivo (Alsoussi et al., 2020; Barnes et al., 2020a; Barnes et al., 2020b; Brouwer et al., 2020; Corti et al., 2015; Hansen et al., 2020; Hassan et al., 2020a; Liu et al., 2020; Piccoli et al., 2020; Pinto et al., 2020; Rockx et al., 2008; Rockx et al., 2010; Rogers et al., 2020; Seydoux et al., 2020; Tortorici et al., 2020; Walls et al., 2019; Wang et al., 2020a; Zost et al., 2020). The isolation of S309 from an individual who recovered from SARS-CoV infection, a mAb that neutralizes SARS-CoV-2 and SARS-CoV through recognition of a conserved RBD epitope, demonstrated that potent neutralizing mAbs could inhibit β-coronaviruses belonging to different lineage B (sarbecovirus) clades (Pinto et al., 2020). An optimized version of S309 is currently under evaluation in phase 3 clinical trials in the US. Whereas a few other SARS-CoV-2 cross-reactive mAbs have been identified from either SARS-CoV convalescent sera (Huo et al., 2020; ter Meulen et al., 2006; Wec et al., 2020; Yuan et al., 2020) or immunization of transgenic mice (Wang et al., 2020a), the vast majority of SARS-CoV-2 S-specific mAbs isolated exhibit narrow binding specificity and neutralization breadth.

Although the COVID-19 pandemic has accelerated the development of SARS-CoV-2 vaccines at an unprecedented pace (Case et al., 2020; Corbett et al., 2020; Folegatti et al., 2020; Hassan et al., 2020b; Jackson et al., 2020; Mulligan et al., 2020; Sahin et al., 2020; Walls et al., 2020a; Yu et al., 2020; Zhu et al., 2020a), worldwide deployment to achieve community protection is expected to take many more months.
Based on available data, it appears unlikely that infection or vaccination will provide durable pan-coronavirus protection due to the immunodominance of the RBD and waning of Ab responses, leaving the human population vulnerable to the emergence of genetically distinct coronaviruses (Edridge et al., 2020; Piccoli et al., 2020). The availability of mAbs and other reagents cross-reacting with and broadly neutralizing distantly related coronaviruses is key for pandemic preparedness to enable detection, prophylaxis and therapy against zoonotic pathogens that might emerge in the future.

We report the isolation of a mAb cross-reacting with the S glycoprotein of at least eight β-coronaviruses from lineages A, B and C, including all five human-infecting β-coronaviruses. This mAb, designated B6, broadly inhibits entry of viral particles pseudotyped with the S glycoprotein of lineage C (MERS-CoV and HKU4) and lineage A (OC43) coronaviruses, providing proof-of-concept of mAb-mediated broad β-coronavirus neutralization. A cryoEM structure of MERS-CoV S bound to B6 reveals that the mAb recognizes a linear epitope in the stem helix within a highly dynamic region of the S2 fusion machinery. Crystal structures of B6 in complex with MERS-CoV S, SARS-CoV/SARS-CoV-2 S, OC43 S and HKU4 S stem helix peptides combined with binding assays reveal an unexpected binding mode to a cryptic epitope, delineate the molecular basis of cross-reactivity and rationalize observed binding affinities for distinct coronaviruses.
Collectively, our data indicate that B6 sterically interferes with S conformational changes leading to membrane fusion and identify a key target for next-generation structure-guided design of a pan-coronavirus vaccine.

Results

Isolation of a broadly neutralizing coronavirus mAb

To elicit cross-reactive Abs targeting conserved coronavirus S epitopes, we immunized mice twice with the prefusion-stabilized MERS-CoV S ectodomain trimer and once with the prefusion-stabilized SARS-CoV S ectodomain trimer (Figure 1A). We subsequently generated hybridomas from immunized animals and implemented a selection strategy to identify those secreting Abs recognizing both MERS-CoV S and SARS-CoV S but not their respective S1 subunits (which are much less conserved than the S2 subunit (Walls et al., 2020b; Walls et al., 2016a)), the shared foldon trimerization domain or the his-tag. We identified and sequenced a mAb, designated B6, that bound prefusion MERS-CoV S (lineage C) and SARS-CoV S (lineage B) trimers, the two immunogens used, as well as SARS-CoV-2 S (lineage B) and OC43 S (lineage A) trimers with nanomolar to picomolar avidities. Specifically, B6 bound most tightly to MERS-CoV S (Figure 1B), followed by OC43 S (with one order of magnitude lower apparent affinity, Figure 1C) and SARS-CoV/SARS-CoV-2 S (with three orders of magnitude reduced apparent affinity, Figure 1D-E). These results show that B6 is a broadly reactive mAb recognizing at least four distinct S glycoproteins distributed across three lineages of the β-coronavirus genus.

To evaluate the neutralization potency and breadth of B6, we assessed S-mediated entry into cells of either vesicular stomatitis virus (VSV) (Kaname et al., 2010) or murine leukemia virus (MLV) (Millet and Whittaker, 2016; Walls et al., 2020b) pseudotyped with MERS-CoV S, OC43 S, SARS-CoV S, SARS-CoV-2 S and HKU4 S in the presence of varying concentrations of mAb. We determined half-maximal inhibitory concentrations of 1.7 ± 0.9 µg/mL, 4.0 ± 0.9 µg/mL and 2.4 ± 0.9 µg/mL for MERS-CoV S, OC43 S and HKU4 S pseudotyped viruses, respectively (Figure 1F-H), whereas no neutralization was observed for SARS-CoV S and SARS-CoV-2 S (Figure S1). B6 therefore broadly neutralizes S-mediated entry of pseudotyped viruses harboring β-coronavirus S glycoproteins from lineages A and C, but not from lineage B, putatively due to lower-affinity binding (Figure 1B-E).

Figure 1. Identification and characterization of a cross-reactive and broadly neutralizing coronavirus mAb. (A) Mouse immunization and B6 mAb selection scheme. MERS-CoV and SARS-CoV S1 subunits fused to human Fc and the respiratory syncytial virus fusion glycoprotein (RSV F) ectodomain trimer fused to a foldon and a his-tag were used as decoys during selection. (B-E) Binding of MERS-CoV S (B), OC43 S (C), SARS-CoV S (D) and SARS-CoV-2 S (E) ectodomain trimers to the B6 mAb immobilized at the surface of biolayer interferometry biosensors.
Data were analyzed with the ForteBio software, and global fits are shown as dashed lines. The vertical dotted lines correspond to the transition between the association and dissociation phases. Approximate apparent equilibrium dissociation constants (KD, app) are reported due to the binding avidity resulting from the trimeric nature of S glycoproteins. (F-H) B6-mediated neutralization of VSV particles pseudotyped with MERS-CoV S (F), OC43 S (G) and HKU4 S (H). Data were evaluated using a non-linear sigmoidal regression model with variable Hill slope. Fits are shown as dashed lines and experiments were performed in triplicate with at least two independent mAb and pseudotyped virus preparations.

B6 targets a linear epitope in the fusion machinery

To identify the epitope recognized by B6, we determined a cryo-EM structure of the MERS-CoV S glycoprotein in complex with the B6 Fab fragment at 2.5 Å overall resolution (Figure 2A-B, Figure S2 and Table 1). 3D classification of the cryoEM data revealed incomplete Fab saturation, with one to three B6 Fabs bound to the MERS-CoV S trimer, and marked conformational dynamics of bound B6 Fabs, yielding a continuum of conformations. Although these two factors limited the local resolution of the S/B6 interface, we identified that the B6 epitope resides in the stem helix (i.e. downstream from the connector domain and before the heptad-repeat 2 region) within the S2 subunit (so-called fusion machinery) (Figure 2A-B). Our 3D reconstructions further suggest that B6 binding disrupts the stem helix quaternary structure, which is presumed to form a 3-helix bundle (observed in the NL63 S (Walls et al., 2016b) and SARS-CoV/SARS-CoV-2 S structures (Gui et al., 2017; Kirchdoerfer et al., 2018; Walls et al., 2020b; Walls et al., 2019; Wrapp et al., 2020; Yuan et al., 2017)) but is not maintained in the B6-bound MERS-CoV S structure (Figure 2A).
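The Figure 1 and Figure S1 legends state that neutralization data were evaluated with a non-linear sigmoidal regression model with variable Hill slope. As a minimal sketch of what such a model looks like, and not a description of the analysis software actually used in this study, the Python below defines a variable-slope dose-response curve and recovers an IC50 by a coarse least-squares grid search. The function names, the fixed 0-100% plateaus, the grid of candidate slopes and the synthetic data points are all illustrative assumptions; the synthetic points are generated from the model itself using the reported MERS-CoV S IC50 of 1.7 µg/mL and an assumed Hill slope of 1.

```python
import math


def hill_response(conc, ic50, hill_slope, top=100.0, bottom=0.0):
    """Percent viral entry remaining at mAb concentration `conc`.

    `conc` and `ic50` must share the same units (e.g. µg/mL).
    The fixed plateaus (top=100, bottom=0) are an assumption of this sketch.
    """
    if conc <= 0:
        return top
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill_slope)


def fit_ic50(concs, responses, slopes=(0.5, 0.75, 1.0, 1.5, 2.0, 3.0)):
    """Coarse least-squares grid search over IC50 and Hill slope.

    IC50 candidates are log-spaced between the lowest and highest tested
    concentrations. This stands in for a proper non-linear regression.
    """
    lo, hi = math.log10(min(concs)), math.log10(max(concs))
    best = None
    for i in range(201):
        ic50 = 10 ** (lo + (hi - lo) * i / 200)
        for slope in slopes:
            sse = sum(
                (hill_response(c, ic50, slope) - r) ** 2
                for c, r in zip(concs, responses)
            )
            if best is None or sse < best[0]:
                best = (sse, ic50, slope)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic, noise-free points (hypothetical, not measured data).
    concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]  # µg/mL
    responses = [hill_response(c, 1.7, 1.0) for c in concs]
    ic50, slope = fit_ic50(concs, responses)
    print(f"estimated IC50 = {ic50:.2f} µg/mL, Hill slope = {slope}")
```

By construction the curve passes through the midpoint between the two plateaus at the IC50 whatever the Hill slope, which is why the IC50 can be compared across viruses even when the steepness of the curves differs.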
Based on our cryoEM structure, we identified a conserved 15 residue sequence at the C-terminus of the last residue resolved in previously reported MERS-CoV S structures (Pallesen et al., 2017; Park et al., 2019; Walls et al., 2019; Yuan et al., 2017) and confirmed by biolayer interferometry that it encompasses the B6 epitope using synthetic MERS-CoV S biotinylated peptides (Figure 2C-E and Figure S3). We further found that B6 bound to the corresponding stem helix peptides from all known human-infecting β-coronaviruses: SARS-CoV-2 and SARS-CoV (whose sequence is strictly conserved between the two viruses), OC43 and HKU1, as well as mouse hepatitis virus and two MERS-CoV-related bat viruses (HKU4 and HKU5), in mAb and Fab formats (Figure 2D-E). B6 interacted most efficiently with the MERS-CoV S peptide, likely due to its major role in elicitation of this mAb, followed by all other coronavirus peptides tested, which bound with comparable affinities, except for HKU1 which interacted more weakly than other stem helix peptides.

Figure 2. B6 targets a linear epitope in the coronavirus S2 fusion machinery. (A-B) Molecular surface representation of a composite model of the B6-bound MERS-CoV S cryoEM structure and of the B6-bound MERS-CoV S stem helix peptide crystal structure shown from the side (A) and viewed from the viral membrane (B). MERS-CoV S protomers are colored pink, cyan and gold and the B6 Fab heavy and light chains are colored purple and magenta, respectively.
The composite model was generated by docking the crystal structure of B6 bound to the MERS-CoV stem helix in the cryoEM map. (C) Identification of a conserved 15 residue sequence spanning the stem helix. Residue numbering for MERS-CoV S and SARS-CoV-2 S is indicated on top and bottom of the alignment, respectively. (D) Binding of 0.1 µM B6 mAb or (E) 1 µM B6 Fab to biotinylated coronavirus S stem helix peptides immobilized at the surface of biolayer interferometry biosensors.

B6 recognizes a conserved epitope in the stem helix

To obtain an atomic-level understanding of the broad B6 cross-reactivity, we determined five crystal structures of the B6 Fab in complex with peptide epitopes derived from MERS-CoV S (residues 1230-1240 or 1230-1244), SARS-CoV S (residues 1129-1143), SARS-CoV-2 S (residues 1147-1161), OC43 S (residues 1232-1246) and HKU4 S (residues 1231-1245), at resolutions ranging from 1.4 to 1.8 Å (Figure 3A-F, Figure S4 and Table 2). In all five structures, the stem helix epitope folds as an amphipathic α-helix resolved for residues 1230-1240 (MERS-CoV S numbering) irrespective of the peptide length used for co-crystallization. B6 interacts with the helical epitope through shape complementarity, hydrogen-bonding and salt bridges using complementarity determining regions CDRH1-H3, framework region 3, CDRL1 and CDRL3 to bury ~600 Å² at the paratope/epitope interface.
The stem helix docks its hydrophobic face, lined by residues F1231(MERS-CoV), L1235(MERS-CoV), F1238(MERS-CoV) and F1239(MERS-CoV), into a hydrophobic groove formed by B6 heavy chain residues Y35, W49, V52 and L61 as well as light chain Y103 (Figure 2C and 3A, B and D). Moreover, B6 binding leads to the formation of a salt bridge triad, involving residue D1236(MERS-CoV), CDRH3 residue R104 and CDRL1 residue H33. Comparison of the B6-bound structures of MERS-CoV, HKU4, SARS-CoV/SARS-CoV-2 and OC43 S stem helix peptides explains the broad mAb cross-reactivity with β-coronavirus S glycoproteins as shape complementarity is maintained through strict conservation of 3 out of 4 hydrophobic residues whereas F1238(MERS-CoV) is conservatively substituted with Y1137(SARS-CoV)/Y1155(SARS-CoV-2) or W1240(OC43)/W1237(HKU1) (our structures demonstrate that all three aromatic side chains are accommodated by B6). Furthermore, the D1236(MERS-CoV)-mediated salt bridge triad is preserved, including with a non-optimal E1237(HKU4) side chain, with the exception of S1235(HKU1) which abrogates these interactions and explains the dampened B6 binding to the HKU1 peptide (Figure 2C-E and 3B-F). B6 heavy chain residue L61 and CDRL1 residue H33 are mutated from germline and make major contributions to epitope recognition, highlighting the key contribution of affinity maturation to the cross-reactivity of this mAb.

Figure 3. Molecular basis for the broad B6 cross-reactivity with a conserved coronavirus stem helix peptide.
(A) Crystal structure of the B6 Fab (surface rendering) in complex with the MERS-CoV S stem helix peptide. (B-C) Crystal structures of the B6 Fab bound to the MERS-CoV S (B) or HKU4 S (C) stem helix reveal a conserved network of interactions except for the substitution of D1236(MERS-CoV) with E1237(HKU4) which preserves the salt bridge triad formed with CDRH3 residue R104 and CDRL1 residue H33. (D-F) Crystal structures of the B6 Fab bound to the MERS-CoV S (D), OC43 S (E) or SARS-CoV/SARS-CoV-2 S (F) stem helix showcasing the conservation of the paratope/epitope interface except for the conservative substitution of F1238(MERS-CoV) with W1240(OC43) or Y1137(SARS-CoV)/Y1155(SARS-CoV-2). The B6 heavy and light chains are colored purple and magenta, respectively, and only selected regions are shown in panels (B-F) for clarity. The coronavirus S stem helix peptides are rendered in ribbon representation and colored gold with interacting side chains shown in stick representation.

Mechanism of B6-mediated neutralization

We set out to elucidate the molecular basis of the B6-mediated broad neutralization of multiple coronaviruses from lineages A and C and lack of inhibition of lineage B coronaviruses. Our biolayer interferometry data indicate that although the B6 mAb efficiently interacted with the stem helix peptide of all but one of the coronaviruses evaluated (HKU1, Figure 2D-E), the SARS-CoV-2 S and SARS-CoV S ectodomain trimers bound to B6 with three orders of magnitude reduced avidities compared to MERS-CoV S (Figure 1B-E).
Whereas the B6 epitope is not resolved in any prefusion coronavirus S structures determined to date, the stem helix region directly upstream is resolved to a much greater extent for SARS-CoV-2 S and SARS-CoV S, indicating a rigid structure (Gui et al., 2017; Kirchdoerfer et al., 2018; Walls et al., 2020b; Walls et al., 2019; Yuan et al., 2017), compared to MERS-CoV S (Pallesen et al., 2017; Park et al., 2019; Walls et al., 2019; Yuan et al., 2017), OC43 S (Tortorici et al., 2019), HKU1 S (Kirchdoerfer et al., 2016) or MHV S (Walls et al., 2016a) (Figure 4A-C). Furthermore, we determined B6 Fab binding affinities of 0.3 µM and 1.5 µM for MERS-CoV S and OC43 S, respectively, whereas SARS-CoV S recognition was too weak to accurately quantitate (Figure S5). These findings along with the largely hydrophobic nature of the B6 epitope, which is expected to be occluded in the center of a 3-helix bundle (Figure 4D-E) (as is the case for the region directly N-terminal to it), suggest that B6 recognizes a cryptic epitope and that binding to S trimers is modulated (at least in part) by the quaternary structure of the stem. The reduced conformational dynamics of the SARS-CoV-2 S and SARS-CoV S stem helix quaternary structure is expected to limit B6 accessibility to its cryptic epitope relative to other coronavirus S glycoproteins (Figure 4A-E). This hypothesis is supported by the correlation between neutralization potency and binding affinity which likely explains the lack of neutralization of lineage B β-coronaviruses.

Analysis of the postfusion mouse hepatitis virus S (Walls et al., 2017), SARS-CoV-2 S (Cai et al., 2020) and SARS-CoV S (Fan et al., 2020) structures shows that the B6 epitope is buried at the interface with the other two protomers of the rod-shaped trimer. As a result, B6 binding appears to be incompatible with adoption of the postfusion S conformation (Figure 4F).
Collectively, the data presented here suggest that B6 binding sterically interferes with S fusogenic conformational changes and likely blocks viral entry through inhibition of membrane fusion (Figure 4C-F), as proposed for fusion machinery-directed mAbs against influenza virus (Corti et al., 2011), ebolavirus (King et al., 2019) or HIV (Kong et al., 2016).

Figure 4. B6 binding disrupts the stem helix bundle and sterically inhibits membrane fusion. (A) CryoEM map of prefusion SARS-CoV-2 S (EMD-21452) filtered at 6 Å resolution to emphasize the intact trimeric stem helix bundle. (B) CryoEM map of the MERS-CoV S–B6 complex showing a disrupted stem helix bundle. (C) Model of B6-induced S stem movement obtained through comparison of the apo SARS-CoV-2 S and B6-bound MERS-CoV S structures. (D-F) Proposed mechanism of inhibition mediated by the B6 mAb. B6 binds to the hydrophobic core (red) of the stem helix bundle and disrupts its quaternary structure (D-E). The B6 disrupted state likely prevents S2 subunit refolding from the pre- to the post-fusion state and blocks viral entry (F).
Discussion

The high sequence variability of viral glycoproteins was long considered an insurmountable obstacle to the development of mAb therapies or vaccines conferring broad protection (Corti and Lanzavecchia, 2013). The identification of broadly neutralizing mAbs targeting conserved HIV-1 envelope epitopes from infected individuals brought about a paradigm shift for this virus undergoing extreme antigenic drift (Huang et al., 2012; Kong et al., 2016; Scheid et al., 2009; Walker et al., 2011; Walker et al., 2009; Wu et al., 2010; Zhou et al., 2010). Heterotypic influenza virus neutralization was also described for human cross-reactive mAbs recognizing the hemagglutinin receptor-binding site or the fusion machinery (Corti et al., 2011; Dreyfus et al., 2012; Ekiert et al., 2011; Ekiert et al., 2012; Kallewaard et al., 2016; Whittle et al., 2011). These findings were paralleled by efforts to identify broadly neutralizing Abs against respiroviruses (Corti et al., 2013), henipaviruses (Dang et al., 2019; Mire et al., 2019; Zhu et al., 2006), Dengue and Zika viruses (Barba-Spaeth et al., 2016; Dejnirattisai et al., 2015; Rouvinski et al., 2015) or ebolaviruses (Bornholdt et al., 2016; Flyak et al., 2018; King et al., 2019; West et al., 2018). The genetic diversity of coronaviruses circulating in chiropteran and avian reservoirs along with the recent emergence of multiple highly pathogenic coronaviruses showcase the need for vaccines and therapeutics that protect humans against a broad range of viruses.
As the S2 fusion machinery contains several important antigenic sites and is more conserved than the S1 subunit, it is an attractive target for broad-coronavirus neutralization (Tortorici and Veesler, 2019; Walls et al., 2016a). Previous studies described conserved epitopes targeted by neutralizing Abs, such as the fusion peptide or heptad-repeats, as well as a variable loop in the MERS-CoV S connector domain (Daniel et al., 1993; Elshabrawy et al., 2012; Pallesen et al., 2017; Poh et al., 2020; Walls et al., 2016a; Wec et al., 2020; Zhang et al., 2004; Zheng et al., 2020). The discovery of the B6 mAb provides proof-of-concept of mAb-mediated broad β-coronavirus neutralization and uncovers a previously unknown conserved cryptic epitope that is predicted to be located in the hydrophobic core of the stem helix. B6 cross-reacts with at least eight distinct S glycoproteins, from β-coronaviruses belonging to lineages A, B and C, and broadly neutralizes two human and one bat pseudotyped viruses from lineages A and C. B6 could be used for detection or diagnosis of coronavirus infection, and humanized versions of this mAb are promising candidate therapeutics against emerging and re-emerging β-coronaviruses from lineages A and C. Our data further suggest that affinity maturation of B6 using SARS-CoV-2 S and SARS-CoV S might enhance recognition of and extend neutralization breadth towards β-coronaviruses from lineage B.
Finally, the identification of the conserved B6 epitope paves the way for epitope-focused vaccine design (Azoitei et al., 2011; Correia et al., 2014; Sesterhenn et al., 2020) that could elicit pan-coronavirus immunity, as supported by the elicitation of the B6 mAb through vaccination and the recent findings that humans and camels infected with MERS-CoV, humans infected with SARS-CoV-2 and humanized mice immunized with a cocktail of coronavirus S glycoproteins produce antibodies targeting an epitope similar to the one targeted by B6 (Song et al., 2020; Wang et al., 2020b).

Acknowledgments

We thank Hideki Tani (University of Toyama) for providing the reagents necessary for preparing VSV pseudotyped viruses and Brooke Fiala for assisting with protein production. This study was supported by the National Institute of General Medical Sciences (R01GM120553 to D.V.), the National Institute of Allergy and Infectious Diseases (DP1AI158186 and HHSN272201700059C to D.V.), a Pew Biomedical Scholars Award (D.V.), an Investigators in the Pathogenesis of Infectious Disease Award from the Burroughs Wellcome Fund (D.V.), Fast Grants (D.V.), the University of Washington Arnold and Mabel Beckman cryoEM center, the Swiss National Science Foundation (P400PB_183942 to M.M.S.), the Pasteur Institute (M.A.T.), the M.J. Murdock Charitable Trust (A.T.M. and B.H.), and beamlines 8.2.1 and 5.0.1 at the Advanced Light Source at Lawrence Berkeley National Laboratory.

Declaration of interests

M.M.S., M.A.T., Y.J.P., A.C.W., A.T.M. and D.V.
are named as inventors on patent applications filed by the University of Washington based on the studies presented in this paper. D.V. is a consultant for Vir Biotechnology Inc. The Veesler laboratory has received an unrelated sponsored research agreement from Vir Biotechnology Inc. The other authors declare no competing interests. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 17 Supplementary Information Table 1. CryoEM data collection and refinement statistics. B6/MERS-CoV-S (C3 map, post polishing) B6/MERS-CoV-S (C1 map, before polishing) Data collection Magnification 130,000 130,000 Voltage (kV) 300 300 Total exposure (e-/Å2) 70 70 Defocus range (µm) -0.5 to -3.0 -0.5 to -3.0 Pixel size (Å) 1.05 1.05 Initial particle stack 317,017 317,017 Final particle stack 144,792 32,687 Map resolution (0.143 FSC threshold) (Å) 2.5 4.7 Map B-factor -67.7 -153.2 Symmetry C3 C1 Model Refinement Model resolution (0.5 FSC threshold) (Å) 2.6 Model composition Nonhydrogen atoms 56,166 Protein residues 3477 Ligand 135 Mean B-factors (Å2) Protein 11.99 Ligand 18.93 R.M.S. deviations Bond lengths (Å) 0.017 Bond angles (°) 1.328 Validation Molprobity Score 0.69 Clash score 0.57 Rotamer outliers (%) 0.00 Ramachandran Favored (%) 98.00 Allowed (%) 99.81 Disallowed (%) 0.00 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. 
; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 18 Table 2. X-ray crystallography data collection and refinement statistics. Complex B6/MERS-CoV 11aa B6/MERS-CoV 15aa B6/HKU4 B6/OC43 B6/SARS- CoV/SARS-CoV-2 Data collection Space group C 1 2 1 C 1 2 1 C 1 2 1 C 1 2 1 C 1 2 1 Cell constants a,b,c (Å) 95.06, 60.99, 80.26 93.591, 60.444, 79.71 93.59, 60.6, 79.77 92.99, 60.49, 79.39 93.18, 60.36, 79.70 a,b,g (˚) 90, 93.62, 90 90, 93.748, 90 90, 93.80, 90 90, 94.75, 90 90, 93.63, 90 Wavelength (Å) 1.000030 0.977410 0.977410 0.977410 0.977410 Resolution (Å) 43.89 -1.55 (1.61-1.55) 43.49-1.35 (1.4-1.35) 43.56-1.5 (1.55-1.5) 43.57-1.8 (1.86-1.8) 46.5-1.4 (1.45-1.4) Rmerge (%) 5.336 (49.25) 3.514 (62.63) 3.013 (49.45) 4.765 (44.28) 1.843 (39.76) I/s(I) 6.78 (1.34) 18.12 (1.44) 9.13 (1.08) 8.25 (1.20) 12.70 (1.28) CC(1/2) 0.994 (0.64) 1 (0.547) 0.999 (0.607) 0.998 (0.624) 1 (0.816) Completeness (%) 99.91 (99.97) 98.90 (93.96) 96.77 (94.19) 99.07 (97.35) 98.61 (95.06) Redundancy 2.0 (2.0) 3.2 (2.5) 1.9 (1.9) 1.9 (1.9) 1.9 (1.9) Refinement Resolution (Å) 43.89 -1.55 43.49-1.35 43.56-1.5 43.57-1.8 46.5-1.4 Unique reflections 66,514 95,828 69,063 40,478 85,691 Rwork/Rfree (%) 14.49/19.28 17.73/20.60 16.25/19.34 17.35/22.74 14.05/17.36 Number of protein atoms 3580 3574 3601 3562 3631 Number of waters 514 554 513 555 534 R.m.s.d. bond lengths (Å) 0.015 0.013 0.016 0.006 0.013 R.m.s.d. bond angles (˚) 1.29 1.31 1.48 0.88 1.35 Ramachandran favored (%) 97.75 98.64 98.65 97.5 98.19 Ramachandran allowed (%) 2.25 1.36 1.35 2.5 1.81 Ramachandran outliers (%) 0 0 0 0 0 aNumbers in parentheses refer to outer resolution shell .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. 
Figure S1. MERS-CoV S, SARS-CoV S and SARS-CoV-2 S pseudotyped virus neutralization. Neutralization assays of MLV (A-C) or VSV (D-F) particles pseudotyped with (A,D) MERS-CoV S, (B,E) SARS-CoV S and (C,F) SARS-CoV-2 S were performed in the presence of the indicated concentrations of B6 mAb. Data were evaluated using a non-linear sigmoidal regression model with variable Hill slope. Experiments were performed in triplicate with at least two independent mAb and pseudotyped virus preparations.

Figure S2. CryoEM characterization of the B6-bound MERS-CoV S complex. (A) Representative cryoEM micrograph of the MERS-CoV S prefusion trimer bound to B6 embedded in vitreous ice. Scale bar: 20 nm. (B) Selected reference-free 2D class averages. Scale bar: 20 nm. (C) Fourier shell correlation curves for the reconstructions shown in panels D and E. (D) Reconstruction obtained with all selected particles and applying C3 symmetry, colored by local resolution. (E) Reconstruction obtained with a subset of particles selected through focused classification to improve B6 resolvability, colored by local resolution.
Figure S3. Protein sequence alignment of the stem region for selected β-coronavirus S glycoproteins. The sequence alignment was performed based on MERS-CoV S using the following S protein sequences: MERS-CoV EMC/2012 (GenBank: AFS88936.1), HKU4 (UniProtKB: A3EX94.1), HKU5 (UniProtKB: A3EXD0.1), HKU1 isolate N5 (UniProtKB: Q0ZME7.1), MHV A59 (UniProtKB: P11224.2), OC43 (UniProtKB: Q696P8), SARS-CoV Urbani (GenBank: AAP13441.1), SARS-CoV-2 (NCBI Reference Sequence: YP_009724390.1). Sequence alignment was performed using Multalin (Corpet, 1988) and visualized using ESPript 3.0 (Robert and Gouet, 2014). The conserved stem helix recognized by B6 is indicated.

Figure S4. Crystal structures of B6 bound to coronavirus S stem helix peptides. Stem peptides of (A) MERS-CoV S, (B) OC43 S, (C) SARS-CoV/SARS-CoV-2 S and (D) HKU4 S are shown in stick representation with carbon atoms colored yellow. B6 is shown in ribbon representation with interacting residues rendered as sticks in gray. Oxygen and nitrogen atoms are colored red and blue, respectively. The 2Fo-Fc maps for the different peptides are shown as a blue mesh at a contour level of 1 σ.

Figure S5. B6 binding kinetics to different coronavirus S ectodomain trimers.
(A-C) Binding of B6 to immobilized (A) MERS-CoV S, (B) OC43 S and (C) SARS-CoV S measured by biolayer interferometry. The vertical dotted lines correspond to the transition between the association and dissociation phases. Data are shown for one representative measurement and were analyzed with the OctetBio software. Global fits are shown as dashed lines. We determined dissociation constant (KD) values of 0.28 (0.2) ± 0.001 and 1.50 (1.47) ± 0.01 µM for two independent batches of S protein for MERS-CoV S and OC43 S, respectively. The dissociation constant for SARS-CoV S could not be evaluated reliably; however, the predicted affinity is significantly lower than for the other two S proteins.

Methods

Identification of the B6 broadly neutralizing mAb

Ten-week-old CD-1 mice were injected twice with 50 µg of MERS-CoV S formulated with Adjuplex at weeks 0 and 2 and once with 50 µg of SARS-CoV S formulated with Adjuplex at week 8 at the Fred Hutchinson Cancer Research Center Antibody Technology Resource. Three days after the final injection, splenocytes were isolated from high titer mice and electrofused with the P3x63-Ag8 myeloma cell line (BTX, Harvard Apparatus). Hybridoma supernatants were tested for binding to prefusion SARS-CoV S, MERS-CoV S, SARS-CoV S1 subunit, MERS-CoV S1 subunit and respiratory syncytial virus F (which harbors a foldon motif and a his tag similar to the SARS-CoV S and MERS-CoV S ectodomain trimer constructs) using a high throughput bead-based binding array.
Hybridomas from wells containing supernatants that were positive for binding to prefusion SARS-CoV S and MERS-CoV S but negative for SARS-CoV S1, MERS-CoV S1, and respiratory syncytial virus F were sub-cloned by limiting dilution and re-screened for binding as above. The VH and VL sequences of B6 were recovered using the mouse Ig primer set (Millipore) following the protocol outlined in (Siegel, 2009), and Sanger sequenced (Genewiz). The VH/VL sequences were codon-optimized and cloned into full-length pTT3 derived IgG1 and IgL kappa expression vectors containing human constant regions using Gibson assembly (Snijder et al., 2018).

Protein expression and purification

MERS-CoV 2P S, OC43 S, SARS-CoV 2P S and SARS-CoV-2 2P S were produced as previously described (Tortorici et al., 2019; Walls et al., 2020b; Walls et al., 2019). Briefly, all ectodomains were produced in HEK293F cells grown in suspension using FreeStyle 293 expression medium (Life Technologies) at 37 °C in a humidified 8% (v/v) CO2 incubator rotating at 130 r.p.m. The cultures were transfected using 293fectin (ThermoFisher Scientific) with cells grown to a density of 10^6 cells/ml and cultivated for three days. The supernatants were harvested and cells resuspended for another three days, yielding two harvests. For MERS-CoV 2P S, SARS-CoV 2P S and SARS-CoV-2 2P S, clarified supernatants were purified using a 5 ml Cobalt affinity column (Takara). HCoV-OC43 S was purified using a StrepTrap HP column (GE Healthcare). Purified proteins
were concentrated, flash-frozen in Tris-saline (50 mM Tris, pH 8.0 (25°C), 150 mM NaCl) and stored at -80°C. The MERS-CoV S1-Fc and SARS-CoV S1-Fc were previously described (Raj et al., 2013), produced as described above for the prefusion S trimers and purified using protein A affinity chromatography.

For mAb B6 production, 250 µg of B6 heavy and 250 µg of B6 light chain encoding plasmids were co-transfected per liter of suspended HEK293F culture using 293Free transfection reagent (Millipore Sigma) according to the manufacturer’s instructions. Cells were transfected at a density of 10^6 cells/ml. Expression was carried out for 6 days, after which cells and cellular debris were removed by centrifugation at 4,000 × g followed by filtration through a 0.22 µm filter. Clarified cell supernatant containing recombinant mAb was passed over Protein A Agarose resin (Thermo Fisher Scientific). Protein A resin was extensively washed with 25 mM phosphate pH 7.4, 150 mM NaCl (PBS) and eluted with IgG elution buffer (Thermo Scientific). Purified B6 was extensively dialyzed against PBS, concentrated, flash-frozen and stored at -80°C.

DS-Cav1-foldon-SpyTag (McLellan et al., 2013) was produced by lentiviral transduction of HEK293F cells using the Daedalus system (Bandaranayake et al., 2011). Lentivirus was produced by transient transfection of HEK293T cells (ATCC) using linear 25 kDa polyethyleneimine (PEI; Polysciences). Briefly, 4×10^6 cells were plated onto 10 cm tissue culture plates. After 24 h, 3 µg of psPAX2, 1.5 µg of pMD2G (Addgene plasmids #12260 and #12259, respectively), and 6 µg of lentiviral vector plasmid were mixed in 500 µL diluent (5 mM HEPES, 150 mM NaCl, pH 7.5) and 42 µL of PEI (1 mg/mL) and incubated for 15 minutes. The DNA/PEI complex was then added to the plate dropwise.
Lentivirus was harvested 48 h post-transfection and concentrated 100× by centrifugation at 8000×g for 18 h. Transduction of the target cell line was carried out in 125 mL shake flasks containing 10×10^6 cells in 10 mL of growth media. 100 μL of 100× lentivirus was added to the flask and the cells were incubated with 225 rpm oscillation at 37°C in 8% CO2 for 4–6 hours, after which 20 mL of growth media was added to the shake flask. Transduced cells were expanded every other day to a density of 1×10^6 cells/mL until a final culture size of 4 L was reached. The media was harvested after 17 days of total incubation after measuring final cell concentration (~5×10^6 cells/mL) and viability (~90% viable). Culture supernatant was harvested by low-speed centrifugation to remove cells from the supernatant. NaCl and NaN3 were added to final concentrations of 250 mM and 0.02%, respectively. The supernatant was loaded over one 5 mL HisTrap FF Crude column (GE Healthcare) at 5 mL/min by an AKTA Pure (GE Healthcare). The 5 mL HisTrap column was washed with 10 column volumes of wash buffer (2× GIBCO 14200-075 PBS, 5 mM Imidazole, pH 7.5) followed by 6 column volumes of elution buffer (2× GIBCO 14200-075 PBS, 150 mM Imidazole, pH 7.5). The nickel elution was applied to a HiLoad 16/600 Superdex 200 pg column (GE Healthcare) and run in dPBS (GIBCO 14190-144) with 5% glycerol (Thermo BP229-1) to further purify the target protein by size-exclusion chromatography. The purified protein was snap frozen in liquid nitrogen and stored at -80°C.

Kinetics of B6 mAb binding to coronavirus S proteins

The avidities of complex formation between B6 mAb and selected coronavirus S proteins were determined in PBS supplemented with 0.005% Tween20 and 0.1% BSA (PBSTB) at 30 °C and 1,000 RPM shaking on an Octet Red instrument (ForteBio). Curve fitting was performed using a 1:1 binding model and the ForteBio data analysis software. KD ranges were determined with a global fit. AHC biosensors (ForteBio) were hydrated in water and subsequently equilibrated in PBSTB buffer. 10 μg/mL B6 mAb was loaded onto the biosensors to a shift of approximately 1 nm. Then, the system was equilibrated in PBSTB buffer for 300 s prior to immersing the sensors in the respective coronavirus S protein (0-218 nM) for up to 600 s prior to dissociation in buffer for an additional 600 s.

Binding of B6 to different synthetic coronavirus S stem peptides

B6 binding analysis to selected biotinylated coronavirus S stem helix peptides was performed in PBS supplemented with 0.005% Tween20 (PBST) at 30 °C and 1,000 RPM shaking on an Octet Red instrument (ForteBio). 1 µg/ml biotinylated stem peptide (15- or 16-residue long stem peptide-PEG6-Lys-Biotin synthesized by Genscript) was loaded on SA biosensors to a threshold of 0.5 nm. Then, the system was equilibrated in PBST for 300 s prior to immersing the sensors in 0.1 µM B6 mAb or 1 µM B6 Fab, respectively, for 300 s prior to dissociation in buffer for 300 s.
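
The 1:1 binding model named in the kinetics experiments above can be illustrated with a minimal sketch. It is not the ForteBio analysis software, and the rate constants and maximal response used here are illustrative placeholders rather than values measured in this study; only the 218 nM top concentration and the 600 s association and dissociation times are taken from the text. The function names are hypothetical.

```python
import math


def equilibrium_response(conc, kon, koff, rmax):
    """Plateau response for analyte concentration `conc` (M) in a 1:1 model."""
    kd = koff / kon  # equilibrium dissociation constant, KD = koff / kon
    return rmax * conc / (conc + kd)


def association(t, conc, kon, koff, rmax):
    """Response (nm shift) after t seconds of association."""
    kobs = kon * conc + koff  # observed rate constant during association
    return equilibrium_response(conc, kon, koff, rmax) * (1.0 - math.exp(-kobs * t))


def dissociation(t, r0, koff):
    """Response t seconds after transfer to buffer, starting from response r0."""
    return r0 * math.exp(-koff * t)


def sensorgram(conc, kon, koff, rmax, t_assoc, t_dissoc, step=1.0):
    """Return (time, response) pairs for one association/dissociation cycle."""
    points = []
    for i in range(int(t_assoc / step) + 1):
        t = i * step
        points.append((t, association(t, conc, kon, koff, rmax)))
    r0 = points[-1][1]  # response at the end of the association phase
    for i in range(1, int(t_dissoc / step) + 1):
        t = i * step
        points.append((t_assoc + t, dissociation(t, r0, koff)))
    return points


# Illustrative (not measured) parameters: kon in 1/(M*s), koff in 1/s, rmax in nm.
KON, KOFF, RMAX = 1.0e5, 1.0e-3, 1.0
curve = sensorgram(conc=218e-9, kon=KON, koff=KOFF, rmax=RMAX,
                   t_assoc=600.0, t_dissoc=600.0)
peak_time, peak_response = max(curve, key=lambda p: p[1])
print(f"KD = {KOFF / KON:.2e} M, peak response {peak_response:.3f} nm at {peak_time:.0f} s")
```

In a global fit, a single kon and koff pair is shared across the curves recorded at all analyte concentrations, and KD follows as koff/kon.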

Kinetics of B6 Fab binding to different coronavirus S proteins

The rate constants of binding (kon) and dissociation (koff) for the complex between the B6 Fab and selected coronavirus S proteins were determined in PBST at 30 °C and 1,000 RPM shaking on an Octet Red instrument (ForteBio). Global curve fitting was performed using a 1:1 binding model and the ForteBio data analysis software. For MERS-CoV S and SARS-CoV S, HIS1K or Ni-NTA biosensors (ForteBio) were hydrated in water and subsequently equilibrated in PBST buffer. 20 μg/mL SARS-CoV S or 10 μg/mL MERS-CoV S, respectively, were loaded onto the biosensors for up to 1800 s (1-4 nm shift). The system was equilibrated in PBST for 300 s prior to immersing the sensors in B6 Fab (0-16 µM) for up to 1800 s prior to dissociation in buffer for 1800 s. For OC43 S, AR2G biosensors were hydrated in water then activated for 300 s with an NHS-EDC solution (ForteBio) prior to amine coupling. 20 μg/mL OC43 S was amine coupled to AR2G (ForteBio) sensors in 10 mM acetate pH 6.0 (ForteBio) for 300 s and then quenched with 1 M ethanolamine (ForteBio) for 300 s. The system was equilibrated in PBST for 300 s prior to immersing the sensors in B6 Fab (0-4 µM) for 75 s prior to dissociation in buffer for 75 s.

Pseudovirus entry assays

Production of OC43 S pseudotyped VSV virus and the neutralization assay were performed as described previously (Hulswit et al., 2019; Tortorici et al., 2019). Briefly, HEK-293T cells at 70-80% confluency were transfected with the pCAGGS expression vectors encoding full-length OC43 S with a truncation of the 17 C-terminal residues (to increase cell surface expression levels) along with fusion to a flag tag and the Fc-tagged bovine coronavirus hemagglutinin esterase protein at a molar ratio of 8:1.
48 h after transfection, cells were transduced with VSV∆G/Fluc (bearing the Photinus pyralis firefly luciferase) (Kaname et al., 2010) at a multiplicity of infection of 1. Twenty-four hours later, supernatant was harvested and filtered through a 0.45 μm membrane. Pseudotyped VSV virus was titrated on HRT-18 cell monolayers. In the virus neutralization assay, serially diluted mAbs were pre-incubated with an equal volume of virus at room temperature for 1 h, and then inoculated on HRT-18 cells, and further incubated at 37°C. After 20 h, cells were washed once with PBS and lysed with cell lysis buffer (Promega), and firefly luciferase expression was measured on a Berthold Centro LB 960 plate luminometer using D-luciferin as a substrate (Promega). Percentage of infectivity was calculated as the ratio of luciferase readout in the presence of mAbs normalized to luciferase readout in the absence of mAb, and half maximal inhibitory concentrations (IC50) were determined using 4-parameter logistic regression (GraphPad Prism v8.0).

MERS-CoV S, SARS-CoV S and SARS-CoV-2 S pseudotyped VSV were prepared using 293T cells seeded in 10-cm dishes in DMEM supplemented with 10% FBS, 1% PenStrep and transfected with plasmids encoding the corresponding S glycoprotein (24 µg/dish) using lipofectamine 2000 (Life Technologies) according to the manufacturer’s instructions. One day post-transfection, cells were infected with VSV(G*ΔG-luciferase).
After 2 h, infected cells were washed four times with DMEM before medium supplemented with anti-VSV-G antibody (I1-mouse hybridoma supernatant diluted 1 to 50, from CRL-2700, ATCC) was added. Particles were harvested 18 h post-inoculation, clarified from cellular debris by centrifugation at 2,000 x g for 5 min and used for neutralization experiments. MERS-CoV S, SARS-CoV S, and SARS-CoV-2 S pseudotyped MLV were prepared as previously described (Walls et al., 2020b). For viral neutralization, Huh7 cells (for MERS-CoV S pseudotyped virus) or stable 293T cells expressing ACE2 (Crawford et al., 2020) (for SARS-CoV S and SARS-CoV-2 S pseudotyped viruses) in DMEM supplemented with 10% FBS, 1% PenStrep were seeded at 40,000 cells/well into clear bottom white walled 96-well plates and cultured overnight at 37°C. Twelve-point 3-fold serial dilutions of B6 mAb were prepared in DMEM and pseudotyped VSV were added 1:1 to each B6 dilution in the presence of anti-VSV-G mAb from I1-mouse hybridoma supernatant diluted 50 times. After 45 min incubation at 37°C, 40 µl of the mixture was added to the cells and 2 h post-infection, 40 μL DMEM was added to the cells for 17-20 h. Following infection, medium was removed and 80 μL One-Glo-EX substrate (Promega) was added to the cells and incubated in the dark for 10 min prior to reading on a Varioskan LUX plate reader (ThermoFisher).

Western blots

SDS-PAGE (4x) loading buffer was added to all concentrated pseudovirus samples.
The samples were run on a 4–20% (wt/vol) gradient Tris-glycine gel (BioRad) and transferred to PVDF membranes. B6 was used as the primary Ab (1:500 dilution) and an Alexa Fluor 680-conjugated goat anti-human Ab (1:50,000 dilution, Jackson Laboratory) as the secondary Ab for western blotting. A LI-COR processor was used to develop images.

CryoEM sample preparation and data collection

Lacey carbon copper grids (400 mesh) were coated with a thin layer of continuous carbon using a carbon evaporator. 1 mg/ml MERS-CoV S was incubated with 100 mM neuraminic acid (to promote the closed trimer conformation), 150 mM Tris pH 8 (25°C), 150 mM NaCl for 16 h at 4°C. Then a 2-fold molar excess of B6 Fab over MERS-CoV S protomer was added to the solution and incubated for 1 h at 37°C. The sample was diluted to 0.2 mg/ml S protein with 100 mM neuraminic acid, 150 mM Tris pH 8 (25°C), 150 mM NaCl before 3 µl of sample was applied onto a freshly glow discharged grid. Plunge freezing was performed using a TFS Vitrobot Mark IV (blot force: -1, blot time: 2.5 s, humidity: 100%, temperature: 25 °C). Data were acquired using an FEI Titan Krios transmission electron microscope operated at 300 kV and equipped with a Gatan K2 Summit direct detector and Gatan Quantum GIF energy filter, operated in zero-loss mode with a slit width of 20 eV. Automated data collection was carried out using Leginon (Suloway et al., 2005) at a nominal magnification of 130,000x with a pixel size of 0.525 Å. The dose rate was adjusted to 8 counts/pixel/s, and each movie was acquired in super-resolution mode fractionated in 50 frames of 200 ms. 2,180 micrographs were collected in a single session with a defocus range between -0.5 and -3.0 μm.

CryoEM data processing

Movie frame alignment, estimation of the microscope contrast-transfer function parameters, particle picking and extraction were carried out using Warp (Tegunov and Cramer, 2019).
Particle images were extracted with a box size of 800 pixels² binned to 400 pixels², yielding a pixel size of 1.05 Å. Two rounds of reference-free 2D classification were performed using Relion 3.0 (Zivanov et al., 2018) to select well-defined particle images. Subsequently, two rounds of 3D classification with 50 iterations each (angular sampling 7.5° for 25 iterations and 1.8° with local search for 25 iterations), using the previously reported closed MERS-CoV S structure without the G4 Fab (PDB 5W9J) as the initial model, were carried out using Relion without imposing symmetry. For the high resolution map, particle images were subjected to Bayesian polishing (Zivanov et al., 2019) before performing non-uniform refinement, defocus refinement and non-uniform refinement again in cryoSPARC (Punjani et al., 2017). Finally, two rounds of global CTF refinement of beam-tilt, trefoil and tetrafoil parameters were performed before a final round of non-uniform refinement to produce the 2.5 Å resolution map. For the lower resolution map, one additional round of focused classification in Relion with 50 iterations using a broad mask covering the region of interest (B6/stem) was carried out to further separate distinct B6 Fab conformations. 3D refinements of the best subclasses were carried out using homogeneous refinement in cryoSPARC (Punjani et al., 2017).
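
As a consistency check on the acquisition and extraction parameters given above, the arithmetic can be written out as a short sketch. Two assumptions are not stated in the text and are labeled here: the 8 counts/pixel/s dose rate is taken to refer to a physical detector pixel (1.05 Å, twice the 0.525 Å super-resolution pixel), and one count is treated as one electron. The function names are hypothetical.

```python
def binned_pixel_size(pixel_size, box_in, box_out):
    """Pixel size (Å) after Fourier-cropping a box from box_in to box_out pixels."""
    return pixel_size * box_in / box_out


def total_exposure(dose_rate, n_frames, frame_time, physical_pixel_size):
    """Total exposure in e-/Å^2.

    Assumes dose_rate is in counts per physical pixel per second and that
    one count corresponds to one electron (both are assumptions).
    """
    exposure_time = n_frames * frame_time        # seconds per movie
    counts_per_pixel = dose_rate * exposure_time
    return counts_per_pixel / physical_pixel_size ** 2


SUPER_RES_PIXEL = 0.525  # Å, super-resolution pixel size from the text

pixel = binned_pixel_size(SUPER_RES_PIXEL, box_in=800, box_out=400)
box_width = 400 * pixel  # physical width of the extracted box in Å
exposure = total_exposure(dose_rate=8.0, n_frames=50, frame_time=0.2,
                          physical_pixel_size=pixel)

print(f"binned pixel size: {pixel:.3f} Å")
print(f"box width: {box_width:.0f} Å")
print(f"estimated total exposure: {exposure:.1f} e-/Å^2")
```

Under these assumptions the binned pixel size comes out at 1.05 Å, as stated, and the estimated exposure is about 73 e-/Å², close to the 70 e-/Å² listed in Table 1.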
Reported resolutions are based on the gold-standard Fourier shell correlation (FSC) criterion of 0.143, and FSC curves were corrected for the effects of soft masking by high-resolution noise substitution (Chen et al., 2013).

CryoEM model building and analysis

UCSF Chimera (Pettersen et al., 2004) and Coot (Emsley et al., 2010) were used to fit atomic models into the cryoEM maps. The MERS-CoV S EM structure in complex with 5-N-acetyl neuraminic acid (PDB 6Q04, residues 18-1224) and the B6/MERS-CoV₁₁ (residues 1230-1240) crystal structure were fit into the cryoEM map. Subsequently, the linker connecting the stem helix to the rest of the MERS-CoV S ectodomain (residues 1225-1229) was manually built using Coot. N-linked glycans were hand-built into the density where visible, and the models were refined and relaxed with Rosetta using both sharpened and unsharpened maps (Frenz et al., 2019; Wang et al., 2016). Models were analyzed using MolProbity (Chen et al., 2010), EMRinger (Barad et al., 2015), Phenix (Liebschner et al., 2019) and Privateer (Agirre et al., 2015) to validate the stereochemistry of both the protein and glycan components. Figures were generated using UCSF Chimera.

Crystallization and structure determination

All crystallization experiments were performed at 23 °C in hanging drop vapor diffusion experiments with initial concentrations of 20 mg/ml and 1.5-fold molar excess of peptide ligand.
Crystal trays were set up with a Mosquito using 100 nL mother liquor solution and 100 or 150 nL B6/peptide complex solution, respectively. Crystals of B6/MERS-CoV₁₁ and B6/OC43₁₅ appeared after several weeks in 0.2 M Potassium Thiocyanate and 20% (w/v) PEG3350, B6/MERS-CoV₁₅ in 0.2 M Magnesium Chloride and 20% (w/v) PEG3350, B6/HKU4₁₅ in 0.6 M Sodium Chloride, 0.1 M MES-NaOH, pH 6.5 and 20% (w/v) PEG 4000, and B6/SARS-CoV/SARS-CoV-2₁₆ in 0.2 M Potassium Chloride and 20% (w/v) PEG3350. Crystals were cryoprotected by addition of glycerol to a final concentration of 25% (v/v) and flash cooled in liquid nitrogen. Diffraction data were collected at beamlines 8.2.1 and 5.0.1 (Advanced Light Source, Berkeley, USA). All data were integrated, indexed and scaled using mosflm (Battye et al., 2011) and Aimless (Evans and Murshudov, 2013) or XDS (Kabsch, 2010). The structures were solved by molecular replacement using Phaser (McCoy et al., 2007) and the S230 Fab (PDB 6NB8) or B6 Fab without ligand as a search model. Model building was performed with Coot (Emsley et al., 2010) and structure refinement with Buster (Blanc et al., 2004) and Phenix (Liebschner et al., 2019). Validation used MolProbity (Chen et al., 2010) and Phenix (Liebschner et al., 2019).

Data availability

The atomic coordinates and cryoEM maps will be deposited to the Protein Data Bank and Electron Microscopy Data Bank.

References

Agirre, J., Iglesias-Fernández, J., Rovira, C., Davies, G.J., Wilson, K.S., and Cowtan, K.D. (2015). Privateer: software for the conformational validation of carbohydrate structures. Nat Struct Mol Biol 22, 833-834.
Alsoussi, W.B., Turner, J.S., Case, J.B., Zhao, H., Schmitz, A.J., Zhou, J.Q., Chen, R.E., Lei, T., Rizk, A.A., McIntire, K.M., et al. (2020). A Potently Neutralizing Antibody Protects Mice against SARS-CoV-2 Infection. J Immunol.
Anthony, S.J., Gilardi, K., Menachery, V.D., Goldstein, T., Ssebide, B., Mbabazi, R., Navarrete-Macias, I., Liang, E., Wells, H., Hicks, A., et al. (2017). Further Evidence for Bats as the Evolutionary Source of Middle East Respiratory Syndrome Coronavirus. MBio 8.
Azoitei, M.L., Correia, B.E., Ban, Y.E., Carrico, C., Kalyuzhniy, O., Chen, L., Schroeter, A., Huang, P.S., McLellan, J.S., Kwong, P.D., et al. (2011). Computation-guided backbone grafting of a discontinuous motif onto a protein scaffold. Science 334, 373-376.
Bandaranayake, A.D., Correnti, C., Ryu, B.Y., Brault, M., Strong, R.K., and Rawlings, D.J. (2011). Daedalus: a robust, turnkey platform for rapid production of decigram quantities of active recombinant proteins in human cell lines using novel lentiviral vectors. Nucleic Acids Res 39, e143.
Barad, B.A., Echols, N., Wang, R.Y., Cheng, Y., DiMaio, F., Adams, P.D., and Fraser, J.S. (2015). EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy. Nat Methods 12, 943-946.
Barba-Spaeth, G., Dejnirattisai, W., Rouvinski, A., Vaney, M.C., Medits, I., Sharma, A., Simon-Lorière, E., Sakuntabhai, A., Cao-Lormeau, V.M., Haouz, A., et al. (2016). Structural basis of potent Zika-dengue virus antibody cross-neutralization. Nature 536, 48-53.
Barnes, C.O., Jette, C.A., Abernathy, M.E., Dam, K.-M.A., Esswein, S.R., Gristick, H.B., Malyutin, A.G., Sharaf, N.G., Huey-Tubman, K.E., Lee, Y.E., et al. (2020a).
Structural classification of neutralizing antibodies against the SARS-CoV-2 spike receptor-binding domain suggests vaccine and therapeutic strategies. bioRxiv, 2020.08.30.273920.
Barnes, C.O., West, A.P., Huey-Tubman, K.E., Hoffmann, M.A.G., Sharaf, N.G., Hoffman, P.R., Koranda, N., Gristick, H.B., Gaebler, C., Muecksch, F., et al. (2020b). Structures of Human Antibodies Bound to SARS-CoV-2 Spike Reveal Common Epitopes and Recurrent Features of Antibodies. Cell.
Battye, T.G., Kontogiannis, L., Johnson, O., Powell, H.R., and Leslie, A.G. (2011). iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM. Acta Crystallogr D Biol Crystallogr 67, 271-281.
Blanc, E., Roversi, P., Vonrhein, C., Flensburg, C., Lea, S.M., and Bricogne, G. (2004). Refinement of severely incomplete structures with maximum likelihood in BUSTER-TNT. Acta Crystallogr D Biol Crystallogr 60, 2210-2221.
Bornholdt, Z.A., Turner, H.L., Murin, C.D., Li, W., Sok, D., Souders, C.A., Piper, A.E., Goff, A., Shamblin, J.D., Wollen, S.E., et al. (2016). Isolation of potent neutralizing antibodies from a survivor of the 2014 Ebola virus outbreak. Science 351, 1078-1083.
Brouwer, P.J.M., Caniels, T.G., van der Straten, K., Snitselaar, J.L., Aldon, Y., Bangaru, S., Torres, J.L., Okba, N.M.A., Claireaux, M., Kerster, G., et al. (2020). Potent neutralizing antibodies from COVID-19 patients define multiple targets of vulnerability. Science.
Cai, Y., Zhang, J., Xiao, T., Peng, H., Sterling, S.M., Walsh, R.M., Rawson, S., Rits-Volloch, S., and Chen, B. (2020). Distinct conformational states of SARS-CoV-2 spike protein. Science 369, 1586-1592.
Case, J.B., Rothlauf, P.W., Chen, R.E., Kafai, N.M., Fox, J.M., Smith, B.K., Shrihari, S., McCune, B.T., Harvey, I.B., Keeler, S.P., et al. (2020). Replication-Competent Vesicular Stomatitis Virus Vaccine Vector Protects against SARS-CoV-2-Mediated Pathogenesis in Mice. Cell Host Microbe 28, 465-474.e464.
Chen, S., McMullan, G., Faruqi, A.R., Murshudov, G.N., Short, J.M., Scheres, S.H., and Henderson, R. (2013). High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy 135, 24-35.
Chen, V.B., Arendall, W.B., Headd, J.J., Keedy, D.A., Immormino, R.M., Kapral, G.J., Murray, L.W., Richardson, J.S., and Richardson, D.C. (2010). MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 66, 12-21.
Corbett, K.S., Edwards, D.K., Leist, S.R., Abiona, O.M., Boyoglu-Barnum, S., Gillespie, R.A., Himansu, S., Schäfer, A., Ziwawo, C.T., DiPiazza, A.T., et al. (2020). SARS-CoV-2 mRNA vaccine design enabled by prototype pathogen preparedness. Nature 586, 567-571.
Corpet, F. (1988). Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res 16, 10881-10890.
Correia, B.E., Bates, J.T., Loomis, R.J., Baneyx, G., Carrico, C., Jardine, J.G., Rupert, P., Correnti, C., Kalyuzhniy, O., Vittal, V., et al. (2014). Proof of principle for epitope-focused vaccine design. Nature 507, 201-206.
Corti, D., Bianchi, S., Vanzetta, F., Minola, A., Perez, L., Agatic, G., Guarino, B., Silacci, C., Marcandalli, J., Marsland, B.J., et al. (2013). Cross-neutralization of four paramyxoviruses by a human monoclonal antibody. Nature 501, 439-443.
Corti, D., and Lanzavecchia, A. (2013). Broadly neutralizing antiviral antibodies. Annu Rev Immunol 31, 705-742.
Corti, D., Voss, J., Gamblin, S.J., Codoni, G., Macagno, A., Jarrossay, D., Vachieri, S.G., Pinna, D., Minola, A., Vanzetta, F., et al. (2011). A neutralizing antibody selected from plasma cells that binds to group 1 and group 2 influenza A hemagglutinins. Science 333, 850-856. Corti, D., Zhao, J., Pedotti, M., Simonelli, L., Agnihothram, S., Fett, C., Fernandez-Rodriguez, B., Foglierini, M., Agatic, G., Vanzetta, F., et al. (2015). Prophylactic and postexposure efficacy of a potent human monoclonal antibody against MERS coronavirus. Proc Natl Acad Sci U S A 112, 10473-10478. Crawford, K.H.D., Eguia, R., Dingens, A.S., Loes, A.N., Malone, K.D., Wolf, C.R., Chu, H.Y., Tortorici, M.A., Veesler, D., Murphy, M., et al. (2020). Protocol and Reagents for Pseudotyping Lentiviral Particles with SARS-CoV-2 Spike Protein for Neutralization Assays. Viruses 12. Dang, H.V., Chan, Y.P., Park, Y.J., Snijder, J., Da Silva, S.C., Vu, B., Yan, L., Feng, Y.R., Rockx, B., Geisbert, T.W., et al. (2019). An antibody against the F glycoprotein inhibits Nipah and Hendra virus infections. Nat Struct Mol Biol. Daniel, C., Anderson, R., Buchmeier, M.J., Fleming, J.O., Spaan, W.J., Wege, H., and Talbot, P.J. (1993). Identification of an immunodominant linear neutralization domain on the S2 portion of the murine coronavirus spike glycoprotein and evidence that it forms part of complex tridimensional structure. J Virol 67, 1185-1194. Dejnirattisai, W., Wongwiwat, W., Supasa, S., Zhang, X., Dai, X., Rouvinski, A., Jumnainsong, A., Edwards, C., Quyen, N.T.H., Duangchinda, T., et al. (2015). A new class of highly potent, broadly neutralizing antibodies isolated from viremic patients infected with dengue virus. Nat Immunol 16, 170-177. Dreyfus, C., Laursen, N.S., Kwaks, T., Zuijdgeest, D., Khayat, R., Ekiert, D.C., Lee, J.H., Metlagel, Z., Bujny, M.V., Jongeneelen, M., et al. (2012). Highly conserved protective epitopes on influenza B viruses. Science 337, 1343-1348. 
Drosten, C., Gunther, S., Preiser, W., van der Werf, S., Brodt, H.R., Becker, S., Rabenau, H., Panning, M., Kolesnikova, L., Fouchier, R.A., et al. (2003). Identification of a novel coronavirus in patients with severe acute respiratory syndrome. N Engl J Med 348, 1967-1976. Edridge, A.W.D., Kaczorowska, J., Hoste, A.C.R., Bakker, M., Klein, M., Loens, K., Jebbink, M.F., Matser, A., Kinsella, C.M., Rueda, P., et al. (2020). Seasonal coronavirus protective immunity is short-lasting. Nat Med. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 33 Ekiert, D.C., Friesen, R.H., Bhabha, G., Kwaks, T., Jongeneelen, M., Yu, W., Ophorst, C., Cox, F., Korse, H.J., Brandenburg, B., et al. (2011). A highly conserved neutralizing epitope on group 2 influenza A viruses. Science 333, 843-850. Ekiert, D.C., Kashyap, A.K., Steel, J., Rubrum, A., Bhabha, G., Khayat, R., Lee, J.H., Dillon, M.A., O'Neil, R.E., Faynboym, A.M., et al. (2012). Cross-neutralization of influenza A viruses mediated by a single antibody loop. Nature 489, 526-532. Elshabrawy, H.A., Coughlin, M.M., Baker, S.C., and Prabhakar, B.S. (2012). Human monoclonal antibodies against highly conserved HR1 and HR2 domains of the SARS-CoV spike protein are more broadly neutralizing. PLoS One 7, e50366. Emsley, P., Lohkamp, B., Scott, W.G., and Cowtan, K. (2010). Features and development of Coot. Acta Crystallographica Section D 66, 486-501. Evans, P.R., and Murshudov, G.N. (2013). How good are my data and what is the resolution? Acta Crystallogr D Biol Crystallogr 69, 1204-1214. Fan, X., Cao, D., Kong, L., and Zhang, X. (2020). 
Cryo-EM analysis of the post-fusion structure of the SARS-CoV spike glycoprotein. Nat Commun 11, 3618. Flyak, A.I., Kuzmina, N., Murin, C.D., Bryan, C., Davidson, E., Gilchuk, P., Gulka, C.P., Ilinykh, P.A., Shen, X., Huang, K., et al. (2018). Broadly neutralizing antibodies from human survivors target a conserved site in the Ebola virus glycoprotein HR2-MPER region. Nat Microbiol 3, 670-677. Folegatti, P.M., Ewer, K.J., Aley, P.K., Angus, B., Becker, S., Belij-Rammerstorfer, S., Bellamy, D., Bibi, S., Bittaye, M., Clutterbuck, E.A., et al. (2020). Safety and immunogenicity of the ChAdOx1 nCoV-19 vaccine against SARS-CoV-2: a preliminary report of a phase 1/2, single- blind, randomised controlled trial. Lancet. Frenz, B., Rämisch, S., Borst, A.J., Walls, A.C., Adolf-Bryfogle, J., Schief, W.R., Veesler, D., and DiMaio, F. (2019). Automatically Fixing Errors in Glycoprotein Structures with Rosetta. Structure 27, 134-139.e133. Ge, X.Y., Li, J.L., Yang, X.L., Chmura, A.A., Zhu, G., Epstein, J.H., Mazet, J.K., Hu, B., Zhang, W., Peng, C., et al. (2013). Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535-538. Guan, Y., Zheng, B.J., He, Y.Q., Liu, X.L., Zhuang, Z.X., Cheung, C.L., Luo, S.W., Li, P.H., Zhang, L.J., Guan, Y.J., et al. (2003). Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China. Science 302, 276-278. Gui, M., Song, W., Zhou, H., Xu, J., Chen, S., Xiang, Y., and Wang, X. (2017). Cryo-electron microscopy structures of the SARS-CoV spike glycoprotein reveal a prerequisite conformational state for receptor binding. Cell Res 27, 119-129. Haagmans, B.L., Al Dhahiry, S.H., Reusken, C.B., Raj, V.S., Galiano, M., Myers, R., Godeke, G.J., Jonges, M., Farag, E., Diab, A., et al. (2014). Middle East respiratory syndrome coronavirus in dromedary camels: an outbreak investigation. Lancet Infect Dis 14, 140-145. 
Hansen, J., Baum, A., Pascal, K.E., Russo, V., Giordano, S., Wloga, E., Fulton, B.O., Yan, Y., Koon, K., Patel, K., et al. (2020). Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail. Science. Hassan, A.O., Case, J.B., Winkler, E.S., Thackray, L.B., Kafai, N.M., Bailey, A.L., McCune, B.T., Fox, J.M., Chen, R.E., Alsoussi, W.B., et al. (2020a). A SARS-CoV-2 Infection Model in Mice Demonstrates Protection by Neutralizing Antibodies. Cell 182, 744-753.e744. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 34 Hassan, A.O., Kafai, N.M., Dmitriev, I.P., Fox, J.M., Smith, B.K., Harvey, I.B., Chen, R.E., Winkler, E.S., Wessel, A.W., Case, J.B., et al. (2020b). A Single-Dose Intranasal ChAd Vaccine Protects Upper and Lower Respiratory Tracts against SARS-CoV-2. Cell 183, 169-184.e113. Hu, B., Zeng, L.P., Yang, X.L., Ge, X.Y., Zhang, W., Li, B., Xie, J.Z., Shen, X.R., Zhang, Y.Z., Wang, N., et al. (2017). Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLoS Pathog 13, e1006698. Huang, J., Ofek, G., Laub, L., Louder, M.K., Doria-Rose, N.A., Longo, N.S., Imamichi, H., Bailer, R.T., Chakrabarti, B., Sharma, S.K., et al. (2012). Broad and potent neutralization of HIV-1 by a gp41-specific human antibody. Nature 491, 406-412. Hulswit, R.J.G., Lang, Y., Bakkers, M.J.G., Li, W., Li, Z., Schouten, A., Ophorst, B., van Kuppeveld, F.J.M., Boons, G.J., Bosch, B.J., et al. (2019). 
Human coronaviruses OC43 and HKU1 bind to 9-O-acetylated sialic acids via a conserved receptor-binding site in spike protein domain A. Proc Natl Acad Sci U S A. Huo, J., Zhao, Y., Ren, J., Zhou, D., Duyvesteyn, H.M.E., Ginn, H.M., Carrique, L., Malinauskas, T., Ruza, R.R., Shah, P.N.M., et al. (2020). Neutralisation of SARS-CoV-2 by destruction of the prefusion Spike. Cell Host & Microbe. Jackson, L.A., Anderson, E.J., Rouphael, N.G., Roberts, P.C., Makhene, M., Coler, R.N., McCullough, M.P., Chappell, J.D., Denison, M.R., Stevens, L.J., et al. (2020). An mRNA Vaccine against SARS-CoV-2 - Preliminary Report. N Engl J Med. Kabsch, W. (2010). XDS. Acta Crystallogr D Biol Crystallogr 66, 125-132. Kallewaard, N.L., Corti, D., Collins, P.J., Neu, U., McAuliffe, J.M., Benjamin, E., Wachter- Rosati, L., Palmer-Hill, F.J., Yuan, A.Q., Walker, P.A., et al. (2016). Structure and Function Analysis of an Antibody Recognizing All Influenza A Subtypes. Cell 166, 596-608. Kan, B., Wang, M., Jing, H., Xu, H., Jiang, X., Yan, M., Liang, W., Zheng, H., Wan, K., Liu, Q., et al. (2005). Molecular evolution analysis and geographic investigation of severe acute respiratory syndrome coronavirus-like virus in palm civets at an animal market and on farms. J Virol 79, 11892-11900. Kaname, Y., Tani, H., Kataoka, C., Shiokawa, M., Taguwa, S., Abe, T., Moriishi, K., Kinoshita, T., and Matsuura, Y. (2010). Acquisition of complement resistance through incorporation of CD55/decay-accelerating factor into viral particles bearing baculovirus GP64. J Virol 84, 3210- 3219. Ke, Z., Oton, J., Qu, K., Cortese, M., Zila, V., McKeane, L., Nakane, T., Zivanov, J., Neufeldt, C.J., Cerikan, B., et al. (2020). Structures and distributions of SARS-CoV-2 spike proteins on intact virions. Nature. King, L.B., West, B.R., Moyer, C.L., Gilchuk, P., Flyak, A., Ilinykh, P.A., Bombardi, R., Hui, S., Huang, K., Bukreyev, A., et al. (2019). 
Cross-reactive neutralizing human survivor monoclonal antibody BDBV223 targets the ebolavirus stalk. Nat Commun 10, 1788. Kirchdoerfer, R.N., Cottrell, C.A., Wang, N., Pallesen, J., Yassine, H.M., Turner, H.L., Corbett, K.S., Graham, B.S., McLellan, J.S., and Ward, A.B. (2016). Pre-fusion structure of a human coronavirus spike protein. Nature 531, 118-121. Kirchdoerfer, R.N., Wang, N., Pallesen, J., Wrapp, D., Turner, H.L., Cottrell, C.A., Corbett, K.S., Graham, B.S., McLellan, J.S., and Ward, A.B. (2018). Stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis. Sci Rep 8, 15701. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 35 Kong, R., Xu, K., Zhou, T., Acharya, P., Lemmin, T., Liu, K., Ozorowski, G., Soto, C., Taft, J.D., Bailer, R.T., et al. (2016). Fusion peptide of HIV-1 as a site of vulnerability to neutralizing antibody. Science 352, 828-833. Ksiazek, T.G., Erdman, D., Goldsmith, C.S., Zaki, S.R., Peret, T., Emery, S., Tong, S., Urbani, C., Comer, J.A., Lim, W., et al. (2003). A novel coronavirus associated with severe acute respiratory syndrome. N Engl J Med 348, 1953-1966. Li, H., Mendelsohn, E., Zong, C., Zhang, W., Hagan, E., Wang, N., Li, S., Yan, H., Huang, H., Zhu, G., et al. (2019). Human-animal interactions and bat coronavirus spillover potential among rural residents in Southern China. Biosaf Health 1, 84-90. Li, W., Shi, Z., Yu, M., Ren, W., Smith, C., Epstein, J.H., Wang, H., Crameri, G., Hu, Z., Zhang, H., et al. (2005). Bats are natural reservoirs of SARS-like coronaviruses. Science 310, 676-679. 
Liebschner, D., Afonine, P.V., Baker, M.L., Bunkóczi, G., Chen, V.B., Croll, T.I., Hintze, B., Hung, L.W., Jain, S., McCoy, A.J., et al. (2019). Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr D Struct Biol 75, 861-877. Liu, L., Wang, P., Nair, M.S., Yu, J., Rapp, M., Wang, Q., Luo, Y., Chan, J.F., Sahi, V., Figueroa, A., et al. (2020). Potent neutralizing antibodies against multiple epitopes on SARS- CoV-2 spike. Nature 584, 450-456. McCoy, A.J., Grosse-Kunstleve, R.W., Adams, P.D., Winn, M.D., Storoni, L.C., and Read, R.J. (2007). Phaser crystallographic software. J Appl Crystallogr 40, 658-674. McLellan, J.S., Chen, M., Joyce, M.G., Sastry, M., Stewart-Jones, G.B., Yang, Y., Zhang, B., Chen, L., Srivatsan, S., Zheng, A., et al. (2013). Structure-based design of a fusion glycoprotein vaccine for respiratory syncytial virus. Science 342, 592-598. Memish, Z.A., Mishra, N., Olival, K.J., Fagbo, S.F., Kapoor, V., Epstein, J.H., Alhakeem, R., Durosinloun, A., Al Asmari, M., Islam, A., et al. (2013). Middle East respiratory syndrome coronavirus in bats, Saudi Arabia. Emerg Infect Dis 19, 1819-1823. Menachery, V.D., Yount, B.L., Jr., Debbink, K., Agnihothram, S., Gralinski, L.E., Plante, J.A., Graham, R.L., Scobey, T., Ge, X.Y., Donaldson, E.F., et al. (2015). A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat Med 21, 1508-1513. Menachery, V.D., Yount, B.L., Jr., Sims, A.C., Debbink, K., Agnihothram, S.S., Gralinski, L.E., Graham, R.L., Scobey, T., Plante, J.A., Royal, S.R., et al. (2016). SARS-like WIV1-CoV poised for human emergence. Proc Natl Acad Sci U S A 113, 3048-3053. Millet, J.K., and Whittaker, G.R. (2016). Murine Leukemia Virus (MLV)-based Coronavirus Spike-pseudotyped Particle Production and Infection. Bio Protoc 6. 
Mire, C.E., Chan, Y.P., Borisevich, V., Cross, R.W., Yan, L., Agans, K.N., Dang, H.V., Veesler, D., Fenton, K.A., Geisbert, T.W., et al. (2019). A Cross-Reactive Humanized Monoclonal Antibody Targeting Fusion Glycoprotein Function Protects Ferrets Against Lethal Nipah Virus and Hendra Virus Infection. J Infect Dis. Mok, C.K.P., Zhu, A., Zhao, J., Lau, E.H.Y., Wang, J., Chen, Z., Zhuang, Z., Wang, Y., Alshukairi, A.N., Baharoon, S.A., et al. (2020). T-cell responses to MERS coronavirus infection in people with occupational exposure to dromedary camels in Nigeria: an observational cohort study. Lancet Infect Dis. Mulligan, M.J., Lyke, K.E., Kitchin, N., Absalon, J., Gurtman, A., Lockhart, S.P., Neuzil, K., Raabe, V., Bailey, R., Swanson, K.A., et al. (2020). Phase 1/2 Study to Describe the Safety and Immunogenicity of a COVID-19 RNA Vaccine Candidate (BNT162b1) in Adults 18 to 55 Years of Age: Interim Report. medRxiv, 2020.2006.2030.20142570. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 36 Pallesen, J., Wang, N., Corbett, K.S., Wrapp, D., Kirchdoerfer, R.N., Turner, H.L., Cottrell, C.A., Becker, M.M., Wang, L., Shi, W., et al. (2017). Immunogenicity and structures of a rationally designed prefusion MERS-CoV spike antigen. Proc Natl Acad Sci U S A 114, E7348- E7357. Park, Y.J., Walls, A.C., Wang, Z., Sauer, M.M., Li, W., Tortorici, M.A., Bosch, B.J., DiMaio, F., and Veesler, D. (2019). Structures of MERS-CoV spike glycoprotein in complex with sialoside attachment receptors. Nat Struct Mol Biol 26, 1151-1157. 
Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E. (2004). UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 25, 1605-1612. Piccoli, L., Park, Y.J., Tortorici, M.A., Czudnochowski, N., Walls, A.C., Beltramello, M., Silacci-Fregni, C., Pinto, D., Rosen, L.E., Bowen, J.E., et al. (2020). Mapping Neutralizing and Immunodominant Sites on the SARS-CoV-2 Spike Receptor-Binding Domain by Structure- Guided High-Resolution Serology. Cell 183, 1024-1042.e1021. Pinto, D., Park, Y.J., Beltramello, M., Walls, A.C., Tortorici, M.A., Bianchi, S., Jaconi, S., Culap, K., Zatta, F., De Marco, A., et al. (2020). Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody. Nature 583, 290-295. Poh, C.M., Carissimo, G., Wang, B., Amrun, S.N., Lee, C.Y., Chee, R.S., Fong, S.W., Yeo, N.K., Lee, W.H., Torres-Ruesta, A., et al. (2020). Two linear epitopes on the SARS-CoV-2 spike protein that elicit neutralising antibodies in COVID-19 patients. Nat Commun 11, 2806. Punjani, A., Rubinstein, J.L., Fleet, D.J., and Brubaker, M.A. (2017). cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat Methods 14, 290-296. Raj, V.S., Mou, H., Smits, S.L., Dekkers, D.H., Muller, M.A., Dijkman, R., Muth, D., Demmers, J.A., Zaki, A., Fouchier, R.A., et al. (2013). Dipeptidyl peptidase 4 is a functional receptor for the emerging human coronavirus-EMC. Nature 495, 251-254. Robert, X., and Gouet, P. (2014). Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res 42, W320-324. Rockx, B., Corti, D., Donaldson, E., Sheahan, T., Stadler, K., Lanzavecchia, A., and Baric, R. (2008). Structural basis for potent cross-neutralizing human monoclonal antibody protection against lethal human and zoonotic severe acute respiratory syndrome coronavirus challenge. J Virol 82, 3220-3235. 
Rockx, B., Donaldson, E., Frieman, M., Sheahan, T., Corti, D., Lanzavecchia, A., and Baric, R.S. (2010). Escape from human monoclonal antibody neutralization affects in vitro and in vivo fitness of severe acute respiratory syndrome coronavirus. J Infect Dis 201, 946-955. Rogers, T.F., Zhao, F., Huang, D., Beutler, N., Burns, A., He, W.T., Limbo, O., Smith, C., Song, G., Woehl, J., et al. (2020). Isolation of potent SARS-CoV-2 neutralizing antibodies and protection from disease in a small animal model. Science. Rouvinski, A., Guardado-Calvo, P., Barba-Spaeth, G., Duquerroy, S., Vaney, M.C., Kikuti, C.M., Navarro Sanchez, M.E., Dejnirattisai, W., Wongwiwat, W., Haouz, A., et al. (2015). Recognition determinants of broadly neutralizing human antibodies against dengue viruses. Nature 520, 109-113. Sahin, U., Muik, A., Derhovanessian, E., Vogler, I., Kranz, L.M., Vormehr, M., Baum, A., Pascal, K., Quandt, J., Maurus, D., et al. (2020). Concurrent human antibody and T_H1 type T-cell responses elicited by a COVID-19 RNA vaccine. medRxiv, 2020.2007.2017.20140533. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 37 Scheid, J.F., Mouquet, H., Feldhahn, N., Seaman, M.S., Velinzon, K., Pietzsch, J., Ott, R.G., Anthony, R.M., Zebroski, H., Hurley, A., et al. (2009). Broad diversity of neutralizing antibodies isolated from memory B cells in HIV-infected individuals. Nature 458, 636-640. Sesterhenn, F., Yang, C., Bonet, J., Cramer, J.T., Wen, X., Wang, Y., Chiang, C.I., Abriata, L.A., Kucharska, I., Castoro, G., et al. (2020). 
De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science 368. Seydoux, E., Homad, L.J., MacCamy, A.J., Parks, K.R., Hurlburt, N.K., Jennewein, M.F., Akins, N.R., Stuart, A.B., Wan, Y.-H., Feng, J., et al. (2020). Characterization of neutralizing antibodies from a SARS-CoV-2 infected individual. bioRxiv, 2020.2005.2012.091298. Siegel, R.W. (2009). Antibody affinity optimization using yeast cell surface display. Methods Mol Biol 504, 351-383. Snijder, J., Ortego, M.S., Weidle, C., Stuart, A.B., Gray, M.D., McElrath, M.J., Pancera, M., Veesler, D., and McGuire, A.T. (2018). An Antibody Targeting the Fusion Machinery Neutralizes Dual-Tropic Infection and Defines a Site of Vulnerability on Epstein-Barr Virus. Immunity 48, 799-811.e799. Song, G., He, W.-t., Callaghan, S., Anzanello, F., Huang, D., Ricketts, J., Torres, J.L., Beutler, N., Peng, L., Vargas, S., et al. (2020). Cross-reactive serum and memory B cell responses to spike protein in SARS-CoV-2 and endemic coronavirus infection. bioRxiv, 2020.2009.2022.308965. Suloway, C., Pulokas, J., Fellmann, D., Cheng, A., Guerra, F., Quispe, J., Stagg, S., Potter, C.S., and Carragher, B. (2005). Automated molecular microscopy: the new Leginon system. J Struct Biol 151, 41-60. Tegunov, D., and Cramer, P. (2019). Real-time cryo-electron microscopy data preprocessing with Warp. Nat Methods 16, 1146-1152. ter Meulen, J., van den Brink, E.N., Poon, L.L., Marissen, W.E., Leung, C.S., Cox, F., Cheung, C.Y., Bakker, A.Q., Bogaards, J.A., van Deventer, E., et al. (2006). Human monoclonal antibody combination against SARS coronavirus: synergy and coverage of escape mutants. PLoS Med 3, e237. Tortorici, M.A., Beltramello, M., Lempp, F.A., Pinto, D., Dang, H.V., Rosen, L.E., McCallum, M., Bowen, J., Minola, A., Jaconi, S., et al. (2020). Ultrapotent human antibodies protect against SARS-CoV-2 challenge via multiple mechanisms. Science 370, 950-957. Tortorici, M.A., and Veesler, D. (2019). 
Structural insights into coronavirus entry. Adv Virus Res 105, 93-116. Tortorici, M.A., Walls, A.C., Lang, Y., Wang, C., Li, Z., Koerhuis, D., Boons, G.J., Bosch, B.J., Rey, F.A., de Groot, R.J., et al. (2019). Structural basis for human coronavirus attachment to sialic acid receptors. Nat Struct Mol Biol 26, 481-489. Turoňová, B., Sikora, M., Schürmann, C., Hagen, W.J.H., Welsch, S., Blanc, F.E.C., von Bülow, S., Gecht, M., Bagola, K., Hörner, C., et al. (2020). In situ structural analysis of SARS-CoV-2 spike reveals flexibility mediated by three hinges. Science. Walker, L.M., Huber, M., Doores, K.J., Falkowska, E., Pejchal, R., Julien, J.P., Wang, S.K., Ramos, A., Chan-Hui, P.Y., Moyle, M., et al. (2011). Broad neutralization coverage of HIV by multiple highly potent antibodies. Nature 477, 466-470. Walker, L.M., Phogat, S.K., Chan-Hui, P.Y., Wagner, D., Phung, P., Goss, J.L., Wrin, T., Simek, M.D., Fling, S., Mitcham, J.L., et al. (2009). Broad and potent neutralizing antibodies from an African donor reveal a new HIV-1 vaccine target. Science 326, 285-289. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 38 Walls, A.C., Fiala, B., Schäfer, A., Wrenn, S., Pham, M.N., Murphy, M., Tse, L.V., Shehata, L., O'Connor, M.A., Chen, C., et al. (2020a). Elicitation of Potent Neutralizing Antibody Responses by Designed Protein Nanoparticle Vaccines for SARS-CoV-2. Cell 183, 1367-1382.e1317. Walls, A.C., Park, Y.J., Tortorici, M.A., Wall, A., McGuire, A.T., and Veesler, D. (2020b). Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell 181, 281- 292.e286. 
Walls, A.C., Tortorici, M.A., Bosch, B.J., Frenz, B., Rottier, P.J.M., DiMaio, F., Rey, F.A., and Veesler, D. (2016a). Cryo-electron microscopy structure of a coronavirus spike glycoprotein trimer. Nature 531, 114-117. Walls, A.C., Tortorici, M.A., Frenz, B., Snijder, J., Li, W., Rey, F.A., DiMaio, F., Bosch, B.J., and Veesler, D. (2016b). Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy. Nat Struct Mol Biol 23, 899-905. Walls, A.C., Tortorici, M.A., Snijder, J., Xiong, X., Bosch, B.J., Rey, F.A., and Veesler, D. (2017). Tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion. Proc Natl Acad Sci U S A 114, 11157-11162. Walls, A.C., Xiong, X., Park, Y.J., Tortorici, M.A., Snijder, J., Quispe, J., Cameroni, E., Gopal, R., Dai, M., Lanzavecchia, A., et al. (2019). Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion. Cell 176, 1026-1039.e1015. Wang, C., Li, W., Drabek, D., Okba, N.M.A., van Haperen, R., Osterhaus, A.D.M.E., van Kuppeveld, F.J.M., Haagmans, B.L., Grosveld, F., and Bosch, B.J. (2020a). A human monoclonal antibody blocking SARS-CoV-2 infection. Nat Commun 11, 2251. Wang, C., van Haperen, R., Gutiérrez-Álvarez, J., Li, W., Okba, N.M.A., Albulescu, I., Widjaja, I., van Dieren, B., Fernandez-Delgado, R., Sola, I., et al. (2020b). Isolation of cross-reactive monoclonal antibodies against divergent human coronaviruses that delineate a conserved and vulnerable site on the spike protein. bioRxiv, 2020.2010.2020.346916. Wang, M., Yan, M., Xu, H., Liang, W., Kan, B., Zheng, B., Chen, H., Zheng, H., Xu, Y., Zhang, E., et al. (2005). SARS-CoV infection in a restaurant from palm civet. Emerg Infect Dis 11, 1860-1865. Wang, N., Li, S.Y., Yang, X.L., Huang, H.M., Zhang, Y.J., Guo, H., Luo, C.M., Miller, M., Zhu, G., Chmura, A.A., et al. (2018). Serological Evidence of Bat SARS-Related Coronavirus Infection in Humans, China. Virol Sin 33, 104-107. 
Wang, R.Y., Song, Y., Barad, B.A., Cheng, Y., Fraser, J.S., and DiMaio, F. (2016). Automated structure refinement of macromolecular assemblies from cryo-EM maps using Rosetta. Elife 5. Wec, A.Z., Wrapp, D., Herbert, A.S., Maurer, D.P., Haslwanter, D., Sakharkar, M., Jangra, R.K., Dieterle, M.E., Lilov, A., Huang, D., et al. (2020). Broad neutralization of SARS-related viruses by human monoclonal antibodies. Science. West, B.R., Moyer, C.L., King, L.B., Fusco, M.L., Milligan, J.C., Hui, S., and Saphire, E.O. (2018). Structural Basis of Pan-Ebolavirus Neutralization by a Human Antibody against a Conserved, yet Cryptic Epitope. mBio 9. Whittle, J.R., Zhang, R., Khurana, S., King, L.R., Manischewitz, J., Golding, H., Dormitzer, P.R., Haynes, B.F., Walter, E.B., Moody, M.A., et al. (2011). Broadly neutralizing human antibody that recognizes the receptor-binding pocket of influenza virus hemagglutinin. Proc Natl Acad Sci U S A 108, 14216-14221. Wrapp, D., Wang, N., Corbett, K.S., Goldsmith, J.A., Hsieh, C.L., Abiona, O., Graham, B.S., and McLellan, J.S. (2020). Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260-1263. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 39 Wu, X., Yang, Z.Y., Li, Y., Hogerkorp, C.M., Schief, W.R., Seaman, M.S., Zhou, T., Schmidt, S.D., Wu, L., Xu, L., et al. (2010). Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1. Science 329, 856-861. Yang, X.L., Hu, B., Wang, B., Wang, M.N., Zhang, Q., Zhang, W., Wu, L.J., Ge, X.Y., Zhang, Y.Z., Daszak, P., et al. (2015). 
Isolation and Characterization of a Novel Bat Coronavirus Closely Related to the Direct Progenitor of Severe Acute Respiratory Syndrome Coronavirus. J Virol 90, 3253-3256. Yu, J., Tostanoski, L.H., Peter, L., Mercado, N.B., McMahan, K., Mahrokhian, S.H., Nkolola, J.P., Liu, J., Li, Z., Chandrashekar, A., et al. (2020). DNA vaccine protection against SARS- CoV-2 in rhesus macaques. Science. Yuan, M., Wu, N.C., Zhu, X., Lee, C.D., So, R.T.Y., Lv, H., Mok, C.K.P., and Wilson, I.A. (2020). A highly conserved cryptic epitope in the receptor-binding domains of SARS-CoV-2 and SARS-CoV. Science. Yuan, Y., Cao, D., Zhang, Y., Ma, J., Qi, J., Wang, Q., Lu, G., Wu, Y., Yan, J., Shi, Y., et al. (2017). Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains. Nat Commun 8, 15092. Zaki, A.M., van Boheemen, S., Bestebroer, T.M., Osterhaus, A.D., and Fouchier, R.A. (2012). Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia. N Engl J Med 367, 1814-1820. Zhang, H., Wang, G., Li, J., Nie, Y., Shi, X., Lian, G., Wang, W., Yin, X., Zhao, Y., Qu, X., et al. (2004). Identification of an antigenic determinant on the S2 domain of the severe acute respiratory syndrome coronavirus spike glycoprotein capable of inducing neutralizing antibodies. J Virol 78, 6938-6945. Zheng, Z., Monteil, V.M., Maurer-Stroh, S., Yew, C.W., Leong, C., Mohd-Ismail, N.K., Cheyyatraivendran Arularasu, S., Chow, V.T.K., Lin, R.T.P., Mirazimi, A., et al. (2020). Monoclonal antibodies for the S2 subunit of spike of SARS-CoV-1 cross-react with the newly- emerged SARS-CoV-2. Euro Surveill 25. Zhou, P., Yang, X.L., Wang, X.G., Hu, B., Zhang, L., Zhang, W., Si, H.R., Zhu, Y., Li, B., Huang, C.L., et al. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. Zhou, T., Georgiev, I., Wu, X., Yang, Z.Y., Dai, K., Finzi, A., Kwon, Y.D., Scheid, J.F., Shi, W., Xu, L., et al. (2010). 
Structural basis for broad and potent neutralization of HIV-1 by antibody VRC01. Science 329, 811-817. Zhu, F.C., Li, Y.H., Guan, X.H., Hou, L.H., Wang, W.J., Li, J.X., Wu, S.P., Wang, B.S., Wang, Z., Wang, L., et al. (2020a). Safety, tolerability, and immunogenicity of a recombinant adenovirus type-5 vectored COVID-19 vaccine: a dose-escalation, open-label, non-randomised, first-in-human trial. Lancet 395, 1845-1854. Zhu, N., Zhang, D., Wang, W., Li, X., Yang, B., Song, J., Zhao, X., Huang, B., Shi, W., Lu, R., et al. (2020b). A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. Zhu, Z., Dimitrov, A.S., Bossart, K.N., Crameri, G., Bishop, K.A., Choudhry, V., Mungall, B.A., Feng, Y.R., Choudhary, A., Zhang, M.Y., et al. (2006). Potent neutralization of Hendra and Nipah viruses by human monoclonal antibodies. J Virol 80, 891-899. Zivanov, J., Nakane, T., Forsberg, B.O., Kimanius, D., Hagen, W.J., Lindahl, E., and Scheres, S.H. (2018). New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife 7. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 40 Zivanov, J., Nakane, T., and Scheres, S.H.W. (2019). A Bayesian approach to beam-induced motion correction in cryo-EM single-particle analysis. IUCrJ 6, 5-17. Zost, S.J., Gilchuk, P., Case, J.B., Binshtein, E., Chen, R.E., Nkolola, J.P., Schäfer, A., Reidy, J.X., Trivette, A., Nargi, R.S., et al. (2020). Potently neutralizing and protective human antibodies against SARS-CoV-2. Nature 584, 443-449. 
.CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.12.29.424482doi: bioRxiv preprint https://doi.org/10.1101/2020.12.29.424482 http://creativecommons.org/licenses/by-nc-nd/4.0/ 10_1101-2020_09_18_291195 ---- Dimerization mechanism and structural features of human LI-cadherin 1 Dimerization mechanism and structural features of human LI-cadherin Anna Yui1, Jose M. M. Caaveiro1,2*, Daisuke Kuroda1,3, Makoto Nakakido1, Satoru Nagatoishi4, Shuichiro Goda5, Takahiro Maruno6, Susumu Uchiyama6 and Kouhei Tsumoto1,4,7* 1Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan 2Department of Global Healthcare, Graduate School of Pharmaceutical Sciences, Kyushu University, 3-1- 1, Maidashi, Higashi-ku, Fukuoka-shi, Fukuoka 812-8582, Japan 3Medical Device Development and Regulation Research Center, School of Engineering, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan 4Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokanedai, Minato-ku, Tokyo 108-8639, Japan 5Graduate School of Science and Engineering, Soka University, 1-236, Tangi-cho, Hachioji-shi, Tokyo 192-8577 Japan 6Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita-shi, Osaka 565-0871, Japan 7Department of Chemistry and Biotechnology, School of Engineering, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan *Corresponding author: Jose M. M. 
Caaveiro and Kouhei Tsumoto
E-mails: jose@phar.kyushu-u.ac.jp; tsumoto@bioeng.t.u-tokyo.ac.jp

Running title: Dimerization mechanism of LI-cadherin

Keywords: Cadherin, dimerization, cell adhesion, protein chemistry, crystal structure, small-angle X-ray scattering (SAXS), analytical ultracentrifugation, molecular dynamics

bioRxiv preprint, doi: https://doi.org/10.1101/2020.09.18.291195; this version posted January 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

LI-cadherin is a member of the cadherin superfamily of Ca2+-dependent cell adhesion proteins. It is expressed on various types of cells in the human body, such as normal small intestine and colon cells, and on gastric cancer cells. Because it is not expressed on normal gastric cells, LI-cadherin is a promising target for gastric cancer imaging. However, because the cell adhesion mechanism of LI-cadherin has remained unknown, rational design of therapeutic molecules targeting this cadherin has been difficult. Here, we have studied the homodimerization mechanism of LI-cadherin. We report the crystal structure of the LI-cadherin EC1-4 homodimer. The EC1-4 homodimer exhibited a unique architecture different from that of other cadherins reported so far. The crystal structure also revealed that LI-cadherin possesses a noncanonical calcium ion-free linker between EC2 and EC3. Various biochemical techniques and molecular dynamics (MD) simulations were employed to elucidate the mechanism of homodimerization.
Using cell aggregation assays, we also showed that formation of the homodimer observed in the crystal structure is necessary for LI-cadherin-dependent cell adhesion.

Introduction

Cadherins are a family of glycoproteins responsible for calcium ion-dependent cell adhesion (1). There are more than 100 types of cadherins in humans, and many of them are not only responsible for cell adhesion but also involved in tumorigenesis (2). Human liver intestine-cadherin (LI-cadherin) is a nonclassical cadherin composed of an extracellular region that includes seven extracellular cadherin (EC) repeats, a single transmembrane domain, and a short cytoplasmic domain (3). Previous studies have reported the expression of LI-cadherin on various types of cells, such as normal intestine cells, intestinal metaplasia, colorectal cancer cells and lymph node metastatic gastric cancer cells (4, 5). Because human LI-cadherin is expressed on gastric cancer cells but not on normal stomach tissues, LI-cadherin has been proposed as a target for imaging of metastatic gastric cancer (6). Previous studies have reported that LI-cadherin works as a calcium ion-dependent cell adhesion molecule, as other cadherins do (7), and that trans-dimerization of LI-cadherin is necessary for water transport in normal intestinal cells (8). Sequence analysis of mouse LI-, E-, N-, and P-cadherins has revealed sequence homology between EC1-2 of LI-cadherin and EC1-2 of E-, N-, and P-cadherins, as well as between EC3-7 of LI-cadherin and EC1-5 of classical cadherins (9). From the sequence similarity and the proposed absence of calcium ion-binding motifs (10, 11) between domains EC2 and EC3, there is speculation that LI-cadherin has evolved from the same five-domain cadherin precursor as that of classical cadherins (9).
However, LI-cadherin differs from classical cadherins in several respects, such as the number of extracellular cadherin repeats and the length and sequence of the cytoplasmic domain. Classical cadherins possess five cadherin repeats whereas LI-cadherin possesses seven (2). Classical cadherins possess a conserved cytoplasmic domain comprising more than 100 amino acids, whereas LI-cadherin possesses a short cytoplasmic domain consisting of 20 residues with little or no sequence homology (7, 12). The characteristics of LI-cadherin at the molecular level, including the homodimerization mechanism, remain unknown. Homodimerization is the fundamental event in cadherin-mediated cell adhesion, as has been shown previously (13, 14). For example, classical cadherins form a homodimer mediated by the interaction between their two N-terminal cadherin repeats (EC1-2) (10, 15). In this study, we aimed to characterize LI-cadherin at the molecular level, as the molecular characteristics of the target protein may be significant for the rational design of therapeutic approaches. We have extensively characterized LI-cadherin to identify its homodimer architecture. Here, we report the crystal structure of the human LI-cadherin EC1-4 homodimer. The crystal structure revealed a dimerization architecture different from that of any other cadherin reported so far. It also showed canonical calcium-binding motifs between EC1 and EC2, and between EC3 and EC4, but not between EC2 and EC3.
By performing various biochemical and computational analyses based on this crystal structure, we interpreted the characteristics of the LI-cadherin molecule. Additionally, we showed through cell aggregation assays that the EC1-4 homodimer is necessary for LI-cadherin-dependent cell adhesion. Our study revealed possible architectures of LI-cadherin homodimers at the cell surface and suggested the differential role of the two additional domains at the N-terminus compared with classical cadherins.

Results

Investigation of the domains responsible for the homodimerization of LI-cadherin

To predict which extracellular cadherin (EC) repeats are responsible for the homodimerization of LI-cadherin, we compared the sequences of human LI-cadherin and human classical cadherins (E-, N- and P-cadherins) using ClustalW. As pointed out in a previous study (9), EC3-7 of human LI-cadherin has sequence homology with EC1-5 of human classical cadherins, and EC1-2 of human LI-cadherin has sequence homology with EC1-2 of classical cadherins (Fig. 1). Notably, Trp239 is located at the N-terminus of LI-cadherin EC3, and it has been suggested that this Trp residue might function as the conserved Trp2 of classical cadherin EC1, which plays a crucial role in the formation of the strand-swap dimer (ss-dimer) (9, 10, 16–18). Considering that EC1-2 of classical cadherins is responsible for homodimerization, we predicted that EC1-2 and EC3-4 of LI-cadherin, which have sequence homology with EC1-2 of classical cadherins, are responsible for its dimerization. Therefore, we first analyzed the homodimerization propensity of EC1-4, EC1-2, and EC3-4 (Table S1) of human LI-cadherin. The dissociation constants (KD) of the EC1-4 homodimer and the EC1-2 homodimer were determined by sedimentation velocity analytical ultracentrifugation (SV-AUC), obtaining values of 39.8 µM and 75.0 µM, respectively (Fig. 2A). We did not observe a dimer fraction when employing EC3-4, despite the sequence similarity with EC1-2 of classical cadherins and the presence of Trp239 in EC3 at the position analogous to that of Trp2 in EC1 of classical cadherins (Fig. 2A). The solution structure of EC3-4 was monomeric as determined by small-angle X-ray scattering (SAXS), supporting the results of SV-AUC (Fig. 2B, Fig. S1 and Table S2).

Crystal structure analysis of EC1-4 homodimer

We obtained the X-ray crystal structure of EC1-4 at 2.7 Å resolution (Fig. 3 and Table 1). Each EC domain was composed of the typical seven β-strands seen in classical cadherins, and three calcium ions were bound to each of the linkers connecting EC1 and EC2, and EC3 and EC4 (Fig. 3). We also observed four N-glycans and two N-glycans bound to chains A and B, respectively, as predicted from the amino acid sequence. We could not resolve the entire length of these N-glycans because of their high flexibility. From the portion resolved, all N-glycans seem to face away from the dimer interface. Two unique characteristics were observed in the crystal structure of LI-cadherin: (i) the existence of a calcium-free linker between EC2 and EC3, and (ii) the architecture of the homodimer. A previous study had suggested that LI-cadherin lacks a calcium-binding motif between EC2 and EC3 (9), and our crystal structure has confirmed that hypothesis experimentally.
Crystal structures of cadherins that possess a calcium-free linker have been reported previously, and the biological significance of the calcium-free linker has been discussed (19, 20). The EC1-4 region of LI-cadherin assembled as an antiparallel homodimer in a conformation different from that of other cadherins, such as classical cadherins, which exhibit a two-step binding mode (15), and protocadherin B3, which forms an antiparallel homodimer (14) with characteristics distinct from those of LI-cadherin EC1-4. We performed SV-AUC using LI-cadherin EC1-5 and obtained a KD value of 22.8 µM (Fig. S2). The slight increase of the affinity suggested some contact between EC1 and EC5, as can be predicted from the arrangement of EC1 of one chain and EC4 of the other chain in the crystal structure, although this interaction does not seem strong.

Calcium-free linker

We first investigated the calcium-free linker between EC2 and EC3. Classical cadherins generally adopt a crescent-like shape (17, 21). However, in LI-cadherin, the arch shape was disrupted at the calcium-free linker region, and as a result EC1-4 exhibited a unique alternating positioning of EC1-2 with respect to EC3-4. Generally, three calcium ions bound to the linker between each EC domain confer rigidity to the structure (11). In fact, a previous study on a calcium-free linker of a cadherin has shown that the linker had some flexibility (20).
To compare the rigidity of the canonical linker with three calcium ions and the calcium-free linker in LI-cadherin, we performed MD simulations. In addition to the monomeric states, we also used the structure of the EC1-4 homodimer as the initial structure of the simulations. After confirming the convergence of the simulations by calculating RMSD values of Cα atoms (Fig. S3, see Experimental Procedures for the details), we compared the rigidity of the linkers by calculating the RMSD values of the Cα atoms of EC1 and EC3, respectively, after superposing those of the EC2 domain alone (Fig. 4A, B). The EC3 domain in the monomer conformation exhibited the largest RMSD. The RMSD values of EC3 in the homodimer were significantly smaller than those of EC3 in the monomer form. Dihedral angles defined by Cα atoms of residues at the edge of each domain also indicated that the EC1-4 monomer bends largely at the Ca2+-free linker (Fig. S4A-C). These results showed that the calcium-free linker between EC2 and EC3 is more flexible than the canonical linker (Movie 1, 2). Another unique characteristic in the region surrounding the calcium-free linker was the existence of an α-helix at the bottom of EC2. To the best of our knowledge, this element at the bottom of the EC2 domain is not found in classical cadherins. The sequence alignment of the EC1-2 domains of human LI-, E-, N- and P-cadherin by ClustalW indicated that the insertion of the α-helix-forming residues corresponded to the position immediately preceding the canonical calcium-binding motif DXE in classical cadherins (10) (Fig. S5). The Asp and Glu residues of the DXE motif in EC1 and EC3 of the LI-cadherin dimer coordinate calcium ions (Fig. S6A, B), and this coordination was maintained throughout the simulation (Fig. S6C-J). The α-helix in EC2 might compensate for the absence of calcium by conferring some rigidity to the molecule.
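The linker-rigidity measurement described above (superpose on EC2, then measure RMSD on EC1 or EC3) can be illustrated with a minimal sketch. It assumes NumPy and plain coordinate arrays; the function names `kabsch` and `rmsd_after_fit` and the index arrays are hypothetical and are not the GROMACS tools used in the study. It only shows the principle that the fit subset and the evaluation subset are different groups of atoms.

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t that best superpose points P onto Q
    (least squares), so that P @ R.T + t approximates Q."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Qc - R @ Pc
    return R, t

def rmsd_after_fit(frame, ref, fit_idx, eval_idx):
    """Superpose `frame` onto `ref` using only the atoms in fit_idx
    (e.g. one domain), then report the RMSD of the atoms in eval_idx
    (e.g. a neighbouring domain)."""
    R, t = kabsch(frame[fit_idx], ref[fit_idx])
    moved = frame @ R.T + t
    diff = moved[eval_idx] - ref[eval_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Toy coordinates (hypothetical, not from the structure): 30 atoms,
# atoms 10-19 play the role of the fitted domain, 20-29 the evaluated one.
rng = np.random.default_rng(0)
ref = rng.normal(size=(30, 3))
fit_idx, eval_idx = np.arange(10, 20), np.arange(20, 30)

# A rigid-body motion of the whole molecule gives an RMSD of ~0 ...
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
rigid = ref @ Rz.T + np.array([1.0, -2.0, 0.5])
rigid_rmsd = rmsd_after_fit(rigid, ref, fit_idx, eval_idx)

# ... whereas displacing only the evaluated domain is reported in full.
hinged = ref.copy()
hinged[eval_idx] += np.array([0.0, 0.0, 2.0])
hinged_rmsd = rmsd_after_fit(hinged, ref, fit_idx, eval_idx)
```

Because the fit uses only one domain, overall tumbling of the molecule is removed and the reported value reflects motion of the neighbouring domain relative to the fitted one, which is the quantity of interest when comparing linkers.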
Interaction analysis of EC1-4 homodimer

To validate whether LI-cadherin-dependent cell adhesion is mediated by the formation of the homodimer observed in the crystal structure, it was necessary to find a mutant that exhibits a decreased dimerization tendency. First, we analyzed the interaction between two EC1-4 molecules in the crystal structure using the PISA server (Table S3) (22). The interaction was mostly mediated by EC2 of one chain of LI-cadherin and EC4 of the other chain, engaging in hydrogen bonds and hydrophobic contacts (Fig. 5). The dimerization interface area was 1,253.3 Å² and the number of hydrogen bonds (distance between heavy atoms < 3.5 Å) was seven. Based on the analysis of these interactions, we conducted site-directed mutagenesis to assess the contribution of each residue to the dimerization of LI-cadherin. Eleven residues showing a percentage of buried area greater than 50%, or one or more intermolecular hydrogen bonds (distance between heavy atoms < 3.5 Å), were individually mutated to Ala. To quickly find the mutant with the weakest homodimerization propensity, SEC-MALS was employed. We injected EC1-4 WT or each mutant at 100 µM into the chromatographic column. Analysis of the molecular weight (MW) showed that the MW of F224A was the smallest among all the samples evaluated, including WT (Fig. 6A and Table 2). The same observation was made when the samples were injected at 50 µM (Fig. S7A). Among the 12 samples analyzed, the elution volume of F224A was also the largest (Fig. 6B and Fig. S7B).
In summary, the mutational study using SEC-MALS suggested that, of the mutations tested, F224A inhibited homodimerization of EC1-4 most strongly. We must note that the samples eluted as a single peak, corresponding to a fast equilibrium between monomers and dimers, as reported in a previous study employing other cadherins (23). Although the samples were injected at 100 µM, they eluted at ~4 µM because SEC dilutes the samples as they advance through the column. Considering that the KD of dimerization of EC1-4 WT determined by AUC was 39.8 µM, at a protein concentration of 4 µM the largest fraction of the eluted sample should be monomer. This explains why the MW of the WT sample was smaller than the MW of the homodimer (99.6 kDa), and why the differences in MW among the constructs were small. However, we assume that the decrease of MW and the increase of elution volume indicate a decrease in the proportion of homodimer in the eluted sample, and thus a smaller dimerization tendency caused by the mutations introduced in the protein.

Contribution of Phe224 to dimerization

Although F224 does not seem to form extensive specific interactions with the partner molecule of LI-cadherin in our crystal structure (Fig. S8), its buried area upon dimerization was calculated to be 94% by the PISA server, engaging in van der Waals interactions with other residues of EC1-4. To understand the role of Phe224 in dimerization of LI-cadherin, we conducted MD simulations of EC1-4 WT and EC1-4 F224A in the monomeric state. We first calculated the intramolecular distance between the Cα atoms of residues 224 and 122. The simulations revealed that Ala224 moves away from the strand that contains Asn122, whereas Phe224 stayed closer to Asn122 (Fig. 7A, Fig. S9 and Movie 3, 4). This movement suggests that the side chain of Phe224 forms an intramolecular interaction and is stabilized inside the pocket.
Superposition of EC2 (chain A) in the crystal structure of EC1-4 and EC2 during the simulation of the EC1-4 F224A monomer suggests that the large movement of the loop including Ala224 would cause steric hindrance and would inhibit dimerization (Fig. 7B). Thermal stability analysis using differential scanning calorimetry (DSC) revealed that EC1-4 F224A had two unfolding peaks, whereas EC1-4 WT had a single peak (Fig. 7C). These results suggested that a part of the EC1-4 F224A molecule was destabilized by the mutation. In combination with the data from MD simulations, we propose that Phe224 contributes to dimerization of LI-cadherin by restricting the movement of the residues around Phe224 and thus preventing the steric hindrance caused by the large movement observed in the MD simulations. DSC measurements showed that some of the other mutants have lower thermal stability than wild type (Table 2 and Fig. S10). However, because TM1 of F224A is the lowest among the mutants evaluated, and because other mutants displaying lower TM1 than wild type did not exhibit a drastic decrease in homodimer affinity like F224A, we conclude that among the residues evaluated by Ala scanning, F224 was the most critical for the maintenance of homodimer structure and thermal stability.
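The dilution argument made for the SEC-MALS data (samples eluting at about 4 µM with a KD of 39.8 µM) can be checked with a short calculation. The sketch below is a simplified illustration, not the authors' analysis. It assumes an ideal monomer-dimer equilibrium with KD = [M]²/[D], concentrations expressed in monomer units, and a monomer mass taken as half of the 99.6 kDa homodimer; the function names are hypothetical.

```python
import math

def monomer_mass_fraction(c_total_uM, kd_uM):
    """Fraction of protein (by mass) present as monomer.

    Mass balance: c_total = [M] + 2[D], with [D] = [M]^2 / KD.
    Solving the quadratic for [M] gives the closed form below.
    """
    m = kd_uM / 4.0 * (math.sqrt(1.0 + 8.0 * c_total_uM / kd_uM) - 1.0)
    return m / c_total_uM

def weight_average_mw(c_total_uM, kd_uM, mw_monomer_kDa):
    """Weight-average molar mass of the monomer/dimer mixture."""
    f = monomer_mass_fraction(c_total_uM, kd_uM)
    return mw_monomer_kDa * (f + 2.0 * (1.0 - f))

KD_UM = 39.8               # EC1-4 WT, from SV-AUC (source)
MW_MONOMER_KDA = 99.6 / 2  # assumption: half of the homodimer mass

for c in (4.0, 100.0):     # elution and injection concentrations (source)
    f = monomer_mass_fraction(c, KD_UM)
    mw = weight_average_mw(c, KD_UM, MW_MONOMER_KDA)
    print(f"{c:6.1f} uM: monomer fraction {f:.2f}, Mw {mw:.1f} kDa")
```

Under these assumptions, roughly 85% of the protein mass is monomeric at 4 µM, giving a weight-average mass of about 57 kDa, well below the 99.6 kDa of the homodimer. This is consistent with the reasoning that mutants can differ only modestly in measured MW at the eluted concentration.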
Functional analysis of LI-cadherin on cells

To investigate whether disruption of the EC1-4 homodimer influences cell adhesion, we established CHO cell lines expressing full-length LI-cadherin WT or the mutant F224A (including the transmembrane and cytoplasmic domains fused to GFP), which we termed EC1-7GFP and EC1-7F224AGFP (Table S1 and Fig. S11). We conducted cell aggregation assays and compared the cell adhesion ability of cells expressing each construct and mock cells (non-transfected Flp-In CHO) in the presence of calcium or in the presence of EDTA. The size distribution of cell aggregates was quantified using a micro-flow imaging (MFI) apparatus. EC1-7GFP showed cell aggregation ability in the presence of CaCl2. In contrast, EC1-7F224AGFP and mock cells did not show obvious cell aggregates in the presence of CaCl2 (Fig. 8A-C). This result revealed that F224 is crucial for LI-cadherin-dependent cell adhesion and indicated that the EC1-4 homodimer forms in the cellular environment.

Difference of LI-cadherin and classical cadherin

We next performed cell aggregation assays using CHO cells expressing various constructs of LI-cadherin in which domains were deleted, to elucidate the mechanism of cell adhesion induced by LI-cadherin. LI-cadherin EC1-5 and EC3-7 expressing cells were separately established (EC1-5GFP and EC3-7GFP) (Table S1 and Fig. S11). Importantly, neither EC1-5 nor EC3-7 expressing cells showed cell aggregation ability in the presence of CaCl2 (Fig. 9). EC1-5 expressing cells did not aggregate, suggesting that effective dimerization requires the full-length protein. The EC1-4 homodimer observed by X-ray crystallography and detected by AUC cannot be replicated by the EC1-5 construct in a cellular environment, suggesting that the overhanging EC1 domain in the dimer belonging to one cell collides with the membrane of the opposing cell (steric hindrance) (Fig. S12A). It is also possible that inappropriate orientation of the approaching LI-cadherin molecules contributes to the inability of EC1-5 to dimerize (Fig. S12B). An alternative possibility is that the weaker dimerization of EC1-2 (detected by AUC) cannot maintain cell adhesion due to the mobility of the Ca2+-free linker between EC2 and EC3. In contrast to a canonical Ca2+-bound linker, such as the linker between EC1 and EC2, the linker between EC2 and EC3 in LI-cadherin does not possess Ca2+. The lack of Ca2+ resulted in greater mobility when the EC1-4 homodimer observed in the crystal structure (Fig. 3) was not formed. The combination of low dimerization affinity and high mobility likely explains the absence of EC1-2-driven cell adhesion (Fig. S12C, D). Expression of EC3-7 on the surface of the cells did not result in cell aggregation, an observation agreeing with the results of AUC and SAXS, which show that EC3-4 does not form a dimer. The truncation of EC1-2 from LI-cadherin generates a cadherin similar to classical cadherins in that it has five extracellular domains and a Trp residue at the N-terminus. Together with the crystal structure of the EC1-4 homodimer, which showed that Trp239 was buried in its own hydrophobic pocket and was not participating in homodimerization (Fig. S13), the fact that LI-cadherin EC3-7 did not aggregate points to a unique dimerization mechanism in LI-cadherin. EC1-5 and EC3-7 expressing cells did not show aggregation ability even when they were mixed in equal amounts (Fig. S14).
This result excluded the possibility of nonsymmetrical interaction of the domains (e.g. EC1-2 and EC3-4, EC1-2 and EC6-7, etc.).

Discussion

Here, we show for the first time the homodimer architecture of LI-cadherin EC1-4 and the flexibility of the Ca2+-free linker in the LI-cadherin monomer. The X-dimer or the strand-swap dimer formed by classical cadherins does not seem effective to drive LI-cadherin-dependent cell adhesion, as these dimers would allow large movements at the Ca2+-free linker even if the dimer was formed. We assume that the unique architecture of the LI-cadherin EC1-4 homodimer is necessary to restrict the movement of the Ca2+-free linker to maintain LI-cadherin-dependent cell adhesion (Fig. S15). Several differences between LI-cadherin and E-cadherin might explain the existence of the non-canonical Ca2+-free linker. Both LI-cadherin and E-cadherin are expressed on normal intestine cells; however, their expression sites are different. LI-cadherin is expressed at the intercellular cleft and is excluded from the adherens junction (7), where E-cadherin is expressed (24). Even though LI-cadherin is excluded from the adherens junction, trans-interaction of LI-cadherin is necessary to maintain water transport through the intercellular cleft of intestine cells (8). Clustering on the cell membrane might also be different. Classical cadherins, including E-cadherin, are considered to form clusters on the cell membrane to achieve cell adhesion (17). The lateral interaction interface of these cadherins was estimated from the crystal lattices.
In contrast, we did not observe any crystal packing that would suggest lateral interaction in our crystal structure. Indeed, our crystal structure shows that the N-linked sugar chains extend away from the homodimer interface, which suggests that each homodimer does not participate in cis-interaction. Considering that the interface areas of the X-dimer and the strand-swap dimer are much smaller than that of the LI-cadherin EC1-4 dimer, we speculate that LI-cadherin forms homodimers with a broader interface to be able to maintain trans-interaction without formation of clusters on the cell membrane. Expression of LI-cadherin is also observed on various cancer cells such as gastric adenocarcinoma, colorectal cancer cells and pancreatic cancer cells (4, 25, 26). The roles of LI-cadherin on cancer cells have been discussed previously. For example, it was shown that inoculation of LI-cadherin gene (CDH17)-silenced cells in nude mice inhibited the progression of colorectal cancer (27). In the case of gastric cancer, LI-cadherin-positive tumors were significantly larger than LI-cadherin-negative tumors (28). Considering that loss of cell adhesion ability through the downregulation of E-cadherin during epithelial-mesenchymal transition (EMT) is often observed in cancer cells (29, 30), the fact that LI-cadherin is upregulated in various types of cancer cells suggests that LI-cadherin acts differently from E-cadherin on cancer cells. The unique architecture of the LI-cadherin homodimer and the absence of interactions with intracellular (cytoplasmic) proteins (31) suggest a distinctive role of LI-cadherin in cancer cells with respect to that of classical cadherins. In summary, our study shows the novel characteristics of LI-cadherin at the molecular level. Our results suggest that molecules targeting the interface of the LI-cadherin homodimer would abrogate LI-cadherin-dependent cell adhesion.
On the other hand, we expect that molecules which restrict the movement of the Ca2+-free linker might strengthen LI-cadherin-dependent cell adhesion by stabilizing the LI-cadherin homodimer.

Experimental procedures

Protein sequence

The amino acid sequences of the recombinant proteins and of the constructs expressed in CHO cells are summarized in Table S1.

Expression and purification of recombinant LI-cadherin

All LI-cadherin constructs were expressed and purified using the same method. All constructs were cloned in the pcDNA 3.4 vector (ThermoFisher Scientific). Recombinant protein was expressed using Expi293F™ cells (ThermoFisher Scientific) following the manufacturer's protocol. Cells were cultured for three days after transfection at 37 °C and 8% CO2. The supernatant was collected and filtered, followed by dialysis against a solution composed of 20 mM Tris-HCl at pH 8.0, 300 mM NaCl, and 3 mM CaCl2. Immobilized metal affinity chromatography was performed using Ni-NTA Agarose (Qiagen). Protein was eluted with 20 mM Tris-HCl at pH 8.0, 300 mM NaCl, 3 mM CaCl2, and 300 mM imidazole. Final purification was performed by size exclusion chromatography (SEC) using a HiLoad 26/600 Superdex 200 pg column (Cytiva) at 4 °C equilibrated in buffer A (10 mM HEPES-NaOH at pH 7.5, 150 mM NaCl, and 3 mM CaCl2). Unless otherwise specified, samples were dialyzed against buffer A before analysis and filtered dialysis buffer was used for assays.
Sedimentation velocity analytical ultracentrifugation (SV-AUC)

SV-AUC experiments were conducted using the Optima AUC (Beckman Coulter) equipped with an 8-hole An50 Ti rotor at 20 °C with 1, 2.5, 5, 10, 20, 40, and 60 µM of EC1-2, EC3-4, EC1-4 and EC1-5, dissolved in buffer A. Protein sample (390 µL) was loaded into the sample sector of a cell equipped with sapphire windows and a 12 mm double-sector charcoal-filled Epon centerpiece. A volume of 400 µL of buffer was loaded into the reference sector of each cell. Data were collected at 42,000 rpm with a radial increment of 10 µm using a UV detection system. The collected data were analyzed using the continuous c(s) distribution model implemented in the program SEDFIT (version 16.2b) (32), fitting for the frictional ratio, meniscus, time-invariant noise, and radial-invariant noise using a regularization level of 0.68. The sedimentation coefficient range of 0-15 S was evaluated with a resolution of 150. The partial specific volumes of EC1-2, EC3-4, EC1-4 and EC1-5 were calculated based on the amino acid composition of each sample using the program SEDNTERP 1.09 (33) and were 0.730 cm3/g, 0.733 cm3/g, 0.732 cm3/g, and 0.734 cm3/g, respectively. The buffer density and viscosity were calculated using SEDNTERP 1.09 as 1.0055 g/cm3 and 1.025 cP, respectively. Figures of the c(s20,w) distribution were generated using the program GUSSI (version 1.3.2) (34). The weight-average sedimentation coefficient of each sample was calculated by integrating the range of sedimentation coefficients where peaks with obvious concentration dependence were observed. For the determination of the dissociation constant of the monomer-dimer equilibrium, KD, the concentration dependence of the weight-average sedimentation coefficient was fitted to the monomer-dimer self-association model implemented in the program SEDPHAT (version 15.2b) (35).
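The isotherm fit described above can be sketched in a few lines. This is a simplified illustration, not the SEDPHAT implementation. It assumes an ideal monomer-dimer equilibrium with KD = [M]²/[D], mass-fraction weighting of the two species, fixed species sedimentation coefficients, and a plain grid search in place of a proper non-linear fit. The values of `S_MONO` and `S_DIMER` are hypothetical, and the "observed" data are synthetic, generated from the model itself.

```python
import math

def monomer_fraction(c_total, kd):
    """Mass fraction of monomer for c_total = [M] + 2[M]^2/KD (same units)."""
    m = kd / 4.0 * (math.sqrt(1.0 + 8.0 * c_total / kd) - 1.0)
    return m / c_total

def s_weight_average(c_total, kd, s_mono, s_dimer):
    """Weight-average sedimentation coefficient of the mixture."""
    f = monomer_fraction(c_total, kd)
    return f * s_mono + (1.0 - f) * s_dimer

def fit_kd(concs, s_obs, s_mono, s_dimer, kd_grid):
    """Return the KD on the grid that minimises the sum of squared residuals."""
    def sse(kd):
        return sum((s_weight_average(c, kd, s_mono, s_dimer) - s) ** 2
                   for c, s in zip(concs, s_obs))
    return min(kd_grid, key=sse)

# Loading concentrations in uM, as listed in the text.
concs = [1, 2.5, 5, 10, 20, 40, 60]

# Hypothetical species sedimentation coefficients (S), for illustration only.
S_MONO, S_DIMER = 3.0, 4.5

# Synthetic, noise-free isotherm generated with KD = 39.8 uM.
synthetic = [s_weight_average(c, 39.8, S_MONO, S_DIMER) for c in concs]

kd_grid = [k / 10.0 for k in range(10, 2001)]   # 1.0 to 200.0 uM, 0.1 steps
kd_fit = fit_kd(concs, synthetic, S_MONO, S_DIMER, kd_grid)
print(f"recovered KD = {kd_fit:.1f} uM")
```

In a real analysis the species sedimentation coefficients are usually floated or constrained from independent data, and noise in the isotherm sets the confidence interval on KD; the grid search here only makes the shape of the problem explicit.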
Solution structure analysis using SAXS

All measurements were performed at beamline BL-10C (36) of the Photon Factory (Tsukuba, Japan). The experimental procedure has been described previously (18). The concentration of EC3-4 was 157 µM. Data were collected using a PILATUS3 2M detector (Dectris). The wavelength was 1.488 Å and the camera distance was 101 cm. The exposure time was 60 seconds and raw data between s values of 0.010 and 0.84 Å⁻¹ were measured. The background scattering intensity of the buffer was subtracted from each measurement. The scattering intensities of four measurements were averaged to produce the scattering curve of EC3-4. Data were placed on an absolute intensity scale. The conversion factor was calculated based on the scattering intensity of water. The theoretical SAXS curves and χ² values were calculated using the FoXS server (37, 38).

MD simulation

Molecular dynamics simulations of LI-cadherin were performed using GROMACS 2016.3 (39) with the CHARMM36m force field (40). The whole crystal structure of the EC1-4 homodimer, the EC1-4 monomer, the EC1-4 F224A monomer and the EC3-4 monomer were each used as the initial structure of a simulation. EC1-4 and EC3-4 of chain A were extracted from the EC1-4 homodimer crystal structure to generate the EC1-4 monomer and the EC3-4 monomer, respectively. Sugar chains were removed from the original crystal structure. Missing residues were modelled by MODELLER 9.18 (41).
Solvation of the structures was performed with TIP3P water (42) in a rectangular box such that the minimum distance to the edge of the box was 15 Å under periodic boundary conditions, through the CHARMM-GUI (43). The addition of N-linked sugar chains (G0F) and the mutation of Phe224 in the EC1-4 monomer to Ala224 were also performed through the CHARMM-GUI (43, 44). The protein charge was neutralized with added Na⁺ or Cl⁻, and additional ions were added to imitate a salt solution of 0.15 M. Each system was energy-minimized for 5000 steps and equilibrated in the NVT ensemble (298 K) for 1 ns. Further simulations were performed in the NPT ensemble at 298 K. The time step was set to 2 fs throughout the simulations. A cutoff distance of 12 Å was used for Coulomb and van der Waals interactions. Long-range electrostatic interactions were evaluated by the particle mesh Ewald method (45). Covalent bonds involving hydrogen atoms were constrained by the LINCS algorithm (46). A snapshot was saved every 10 ps.

All trajectories were analyzed using GROMACS tools. RMSD, dihedral angles, distances between two atoms, and clustering were computed with the rms, gangle, distance, and cluster modules, respectively. The convergence of the trajectories was confirmed by calculating the RMSD values of Cα atoms (Fig. S4A, B and S7A, B). Because the molecule showed high flexibility at the Ca²⁺-free linker, the RMSD of each domain was calculated individually for the EC1-4 WT monomer, the EC1-4 F224A monomer, and the EC1-4 dimer.
Five Cα atoms at the N-terminus were excluded from the calculation of the RMSD of EC1 because they were disordered. As the RMSD values were stable after 20 ns of simulation, the first 20 ns were not considered when the trajectories were analyzed.

Generation of EC3-4_plus

MD simulation of the EC3-4 monomer was performed for 220 ns. The trajectories from 20 ns to 220 ns were clustered using the ‘cluster’ tool of GROMACS. The structure that exhibited the smallest average RMSD from all other structures of the largest cluster was termed EC3-4_plus and used for comparison with the solution data (SAXS).

Crystallization of LI-cadherin EC1-4

Purified LI-cadherin EC1-4 was dialyzed against 10 mM HEPES-NaOH at pH 7.5, 30 mM NaCl, and 3 mM CaCl2. After the dialysis, the protein was concentrated to 314 µM. Optimal crystallization conditions were screened using an Oryx8 instrument (Douglas Instruments) and commercial screening kits (Hampton Research). The crystal used for data collection was obtained in a crystallization solution containing 200 mM sodium sulfate decahydrate and 20% w/v polyethylene glycol 3,350 at 20 °C. Suitable crystals were harvested, briefly incubated in mother liquor supplemented with 20% glycerol, and transferred to liquid nitrogen for storage until data collection.

Data collection and refinement

Diffraction data from a single crystal of EC1-4 were collected at beamline BL-5A of the Photon Factory (Tsukuba, Japan) under cryogenic conditions (100 K). Diffraction images were processed with the program MOSFLM and merged and scaled with the program SCALA (47) of the CCP4 suite (48). The structure of the WT protein was determined by the molecular replacement method using the coordinates of P-cadherin (PDB entry code 4ZMY) (49) with the program PHASER (50). The models were refined with the program REFMAC5 (51) and built manually with COOT (52). Validation was carried out with PROCHECK (53).
Data collection and structure refinement statistics are given in Table 1. UCSF Chimera was used to render all of the molecular graphics (54).

Site-directed mutagenesis

Mutations were introduced into the plasmid as described previously (55).

Size exclusion chromatography with multi-angle light scattering (SEC-MALS)

The molecular weight of LI-cadherin was determined using a Superose 12 10/300 GL column (Cytiva) with an inline DAWN8+ multi-angle light scattering (MALS) detector (Wyatt Technology), a UV detector (Shimadzu), and a refractive index (RI) detector (Shodex). Protein samples (45 µL) were injected at 100 µM or 50 µM. Analysis was performed using ASTRA software (Wyatt Technology). The concentration at the end of the chromatographic column was measured from the UV absorbance. The protein conjugate method was employed for the analysis because sugar chains were bound to LI-cadherin. All detectors were calibrated using bovine serum albumin (BSA) (Sigma-Aldrich).

Comparison of thermal stability by DSC

DSC measurements were performed using a MicroCal VP-Capillary DSC (Malvern). The measurement was performed from 10 °C to 100 °C at a scan rate of 1 °C min⁻¹. Data were analyzed using Origin7 software.

Establishment of CHO cells expressing LI-cadherin

The DNA sequence of monomeric GFP was fused to the C-terminus of all human LI-cadherin constructs for which stable cell lines were established, and the fusions were cloned into the pcDNA™5/FRT vector (ThermoFisher Scientific).
CHO cells stably expressing LI-cadherin-GFP were established using the Flp-In™-CHO Cell Line following the manufacturer’s protocol (ThermoFisher Scientific). Cloning was performed by the limiting dilution-culture method. Cells expressing GFP were selected and cultivated. The cells were observed with an In Cell Analyzer 2000 (Cytiva). The cells were cultivated in Ham’s F-12 Nutrient Mixture (ThermoFisher Scientific) supplemented with 10% fetal bovine serum (FBS), 1% L-glutamine or 1% GlutaMAX™-I (ThermoFisher Scientific), 1% penicillin-streptomycin, and 0.5 mg mL⁻¹ Hygromycin B at 37 °C and 5.0% CO2.

Cell Imaging

Cells (100 µL) were added to a 96-well plate (Greiner) at 1 × 10⁵ cells mL⁻¹ and cultured overnight. After the cells were washed with wash medium (Ham’s F-12 Nutrient Mixture (ThermoFisher Scientific) supplemented with 10% fetal bovine serum (FBS), 1% GlutaMAX™-I, and 1% penicillin-streptomycin), Hoechst 33342 (ThermoFisher Scientific) (100 µL) was added to each well at 0.25 µg mL⁻¹. The plate was incubated at room temperature for 30 minutes. Cells were washed twice with wash medium and twice with 1× HMF (10 mM HEPES-NaOH at pH 7.5, 137 mM NaCl, 5.4 mM KCl, 0.34 mM Na2HPO4, 1 mM CaCl2, and 5.5 mM glucose). After that, 1× HMF (200 µL) was loaded into each well and images were taken with an In Cell Analyzer 2000 instrument (Cytiva) using the FITC filter (490/20 excitation, 525/36 emission) and the DAPI filter (350/50 excitation, 455/50 emission) with a 60×, 0.70 NA objective lens (Nikon).
Cell aggregation assay

The cell aggregation assay was performed by modifying methods described previously (56, 57). Cells were detached from the cell culture plate by adding 1× HMF supplemented with 0.01% trypsin and placing the plate on a shaker at 80 rpm for 15 minutes at 37 °C. FBS was added to a final concentration of 20% to stop the trypsinization. Cells were washed once with 1× HMF supplemented with 20% FBS and twice with 1× HMF to remove trypsin. Cells were suspended in 1× HMF at 1 × 10⁵ cells mL⁻¹. A 500 µL aliquot of the cell suspension was loaded into a 24-well plate coated with 1% w/v BSA. EDTA was added if necessary. After the plate was incubated at room temperature for 5 minutes, it was placed on a shaker at 80 rpm for 60 minutes at 37 °C.

Micro-Flow Imaging (MFI)

Micro-Flow Imaging (Brightwell Technologies) was used to count the number of particles and to visualize the cell aggregates after the cell aggregation assay. After the cell aggregation assay described above, the plate was incubated at room temperature for 10 minutes and 500 µL of 4% Paraformaldehyde Phosphate Buffer Solution (Nacalai Tesque) was loaded into each well. The plate was incubated on ice for more than 20 minutes. Images of the cells were taken using an EVOS® XL Core Imaging System (ThermoFisher Scientific) if necessary. After that, the cells were injected into the MFI instrument.

Data availability

The coordinates and structure factors of LI-cadherin EC1-4 have been deposited in the Protein Data Bank with entry code 7CYM. All remaining data are contained within the article.
Acknowledgements

We thank Dr. S. Kudo and Dr. H. Akiba for expert advice. We thank Dr. O. Kusano-Arai, Dr. H. Iwanari and Dr. T. Hamakubo for providing us with the gene sequence of LI-cadherin.

Funding and additional information

The supercomputing resources in this study were provided by the Human Genome Center at the Institute of Medical Science, The University of Tokyo, Japan. This work was funded by a Grant-in-Aid for Scientific Research (A) 16H02420 (K.T.) and a Grant-in-Aid for Scientific Research (B) 20H02531 (K.T.) from the Japan Society for the Promotion of Science, a Grant-in-Aid for Scientific Research on Innovative Areas 19H05760 and 19H05766 (K.T.) from the Ministry of Education, Culture, Sports, Science and Technology, and a Grant-in-Aid for JSPS fellows 18J22330 (A.Y.) from the Japan Society for the Promotion of Science. We are grateful to the staff of the Photon Factory (Tsukuba, Japan) for excellent technical support. Access to beamlines BL-5A and BL-10C was granted by the Photon Factory Advisory Committee (Proposal Numbers 2018G116 and 2017G661).

Conflict of Interest

The authors declare that they have no conflicts of interest with the contents of this article.

References

1. Takeichi, M. (1988) The cadherins: Cell-cell adhesion molecules controlling animal morphogenesis. Development. 102, 639–655
2. Van Roy, F. (2014) Beyond E-cadherin: Roles of other cadherin superfamily members in cancer. Nat. Rev. Cancer. 14, 121–134
3. Wendeler, M. W., Praus, M., Jung, R., Hecking, M., Metzig, C., and Geßner, R. (2004) Ksp-cadherin is a functional cell-cell adhesion molecule related to LI-cadherin. Exp. Cell Res. 294, 345–355
4. Hinoi, T., Lucas, P. C., Kuick, R., Hanash, S., Cho, K. R., and Fearon, E. R. (2002) CDX2 regulates liver intestine-cadherin expression in normal and malignant colon epithelium and intestinal metaplasia. Gastroenterology. 123, 1565–1577
5. Ko, S., Chu, K. M., Luk, J. M., Wong, B. W., Yuen, S. T., Leung, S. Y., and Wong, J. (2004) Overexpression of LI-cadherin in gastric cancer is associated with lymph node metastasis. Biochem. Biophys. Res. Commun. 319, 562–568
6. Matsusaka, K., Ushiku, T., Urabe, M., Fukuyo, M., Abe, H., Ishikawa, S., Seto, Y., Aburatani, H., Hamakubo, T., Kaneda, A., and Fukayama, M. (2016) Coupling CDH17 and CLDN18 markers for comprehensive membrane-targeted detection of human gastric cancer. Oncotarget. 7, 64168–64181
7. Berndorff, D., Gessner, R., Kreft, B., Schnoy, N., Lajous-Petter, A. M., Loch, N., Reutter, W., Hortsch, M., and Tauber, R. (1994) Liver-intestine cadherin: Molecular cloning and characterization of a novel Ca2+-dependent cell adhesion molecule expressed in liver and intestine. J. Cell Biol. 125, 1353–1369
8. Weth, A., Dippl, C., Striedner, Y., Tiemann-Boege, I., Vereshchaga, Y., Golenhofen, N., Bartelt-Kirbach, B., and Baumgartner, W. (2017) Water transport through the intestinal epithelial barrier under different osmotic conditions is dependent on LI-cadherin trans-interaction. Tissue Barriers. 10.1080/21688370.2017.1285390
9. Jung, R., Wendeler, M. W., Danevad, M., Himmelbauer, H., and Geßner, R. (2004) Phylogenetic origin of LI-cadherin revealed by protein and gene structure analysis. Cell. Mol. Life Sci. 61, 1157–1166
10. Shapiro, L., Fannon, A. M., Kwong, P. D., Thompson, A., Lehmann, M. S., Gerhard, G., Als-Nielsen, J., Als-Nielsen, J., Colman, D. R., and Hendrickson, W. A. (1995) Structural basis of cell-cell adhesion by cadherins. Nature. 374, 327–337
11. Nagar, B., Overduin, M., Ikura, M., and Rini, J. M. (1996) Structural basis of calcium-induced E-cadherin rigidification and dimerization. Nature. 380, 360–364
12. Kreft, B., Berndorff, D., Böttinger, A., Finnemann, S., Wedlich, D., Hortsch, M., Tauber, R., and Geßner, R. (1997) LI-cadherin-mediated cell-cell adhesion does not require cytoplasmic interactions. J. Cell Biol. 136, 1109–1121
13. Brasch, J., Harrison, O. J., Honig, B., and Shapiro, L. (2012) Thinking outside the cell: How cadherins drive adhesion. Trends Cell Biol. 22, 299–310
14. Nicoludis, J. M., Vogt, B. E., Green, A. G., Schärfe, C. P. I., Marks, D. S., and Gaudet, R. (2016) Antiparallel protocadherin homodimers use distinct affinity- and specificity-mediating regions in cadherin repeats 1-4. Elife. 5, 1–14
15. Harrison, O. J., Bahna, F., Katsamba, P. S., Jin, X., Brasch, J., Vendome, J., Ahlsen, G., Carroll, K. J., Price, S. R., Honig, B., and Shapiro, L. (2010) Two-step adhesive binding by classical cadherins. Nat. Struct. Mol. Biol. 17, 348–357
16. Parisini, E., Higgins, J. M. G., Liu, J.-H., Brenner, M. B., and Wang, J.-H. (2007) The Crystal Structure of Human E-cadherin Domains 1 and 2, and Comparison with other Cadherins in the Context of Adhesion Mechanism. J. Mol. Biol. 373, 401–411
17. Boggon, T. J., Murray, J., Chappuis-Flament, S., Wong, E., Gumbiner, B. M., and Shapiro, L. (2002) C-cadherin ectodomain structure and implications for cell adhesion mechanisms. Science. 296, 1308–1313
18. Kudo, S., Caaveiro, J. M. M., Miyafusa, T., Goda, S., Ishii, K., Matsuura, T., Sudou, Y., Kodama, T., Hamakubo, T., and Tsumoto, K. (2012) Structural and thermodynamic characterization of the self-adhesive properties of human P-cadherin. Mol. Biosyst. 8, 2050–2053
19. Jin, X., Walker, M. A., Felsövályi, K., Vendome, J., Bahna, F., Mannepalli, S., Cosmanescu, F., Ahlsen, G., Honig, B., and Shapiro, L. (2012) Crystal structures of Drosophila N-cadherin ectodomain regions reveal a widely used class of Ca2+-free interdomain linkers. Proc. Natl. Acad. Sci. U. S. A. 109, E127–E134
20. Araya-Secchi, R., Neel, B. L., and Sotomayor, M. (2016) An elastic element in the protocadherin-15 tip link of the inner ear. Nat. Commun. 10.1038/ncomms13458
21. Harrison, O. J., Jin, X., Hong, S., Bahna, F., Ahlsen, G., Brasch, J., Wu, Y., Vendome, J., Felsovalyi, K., Hampton, C. M., Troyanovsky, R. B., Ben-Shaul, A., Frank, J., Troyanovsky, S. M., Shapiro, L., and Honig, B. (2011) The extracellular architecture of adherens junctions revealed by crystal structures of type I cadherins. Structure. 19, 244–256
22. Krissinel, E., and Henrick, K. (2007) Inference of Macromolecular Assemblies from Crystalline State. J. Mol. Biol. 372, 774–797
23. Harrison, O. J., Bahna, F., Katsamba, P. S., Jin, X., Brasch, J., Vendome, J., Ahlsen, G., Carroll, K. J., Price, S. R., Honig, B., and Shapiro, L. (2010) Two-step adhesive binding by classical cadherins. Nat. Struct. Mol. Biol. 17, 348–357
24. Boller, K., Vestweber, D., and Kemler, R. (1985) Cell-adhesion molecule uvomorulin is localized in the intermediate junctions of adult intestinal epithelial cells. J. Cell Biol. 100, 327–332
25. Grötzinger, C., Kneifel, J., Patschan, D., Schnoy, N., Anagnostopoulos, I., Faiss, S., Tauber, R., Wiedenmann, B., and Geßner, R. (2001) LI-cadherin: A marker of gastric metaplasia and neoplasia. Gut. 49, 73–81
26. Liu, X., Huang, Y., Yuan, H., Qi, X., Manjunath, Y., Avella, D., Kaifi, J. T., Miao, Y., Li, M., Jiang, K., and Li, G. (2019) Disruption of oncogenic liver-intestine cadherin (CDH17) drives apoptotic pancreatic cancer death. Cancer Lett. 454, 204–214
27. Bartolomé, R. A., Barderas, R., Torres, S., Fernandez-Aceñero, M. J., Mendes, M., García-Foncillas, J., Lopez-Lucendo, M., and Casal, J. I. (2014) Cadherin-17 interacts with α2β1 integrin to regulate cell proliferation and adhesion in colorectal cancer cells causing liver metastasis. Oncogene. 33, 1658–1669
28. Wang, J., Yu, J. C., Kang, W. M., Wang, W. Z., Liu, Y. Q., and Gu, P. (2012) The predictive effect of cadherin-17 on lymph node micrometastasis in pN0 gastric cancer. Ann. Surg. Oncol. 19, 1529–1534
29. Huang, R. Y. J., Guilford, P., and Thiery, J. P. (2012) Early events in cell adhesion and polarity during epithelial–mesenchymal transition. J. Cell Sci. 125, 4417–4422
30. Lamouille, S., Xu, J., and Derynck, R. (2014) Molecular mechanisms of epithelial–mesenchymal transition. Nat. Rev. Mol. Cell Biol. 15, 178–196
31. Kreft, B., Berndorff, D., Böttinger, A., Finnemann, S., Wedlich, D., Hortsch, M., Tauber, R., and Geßner, R. (1997) LI-cadherin-mediated cell-cell adhesion does not require cytoplasmic interactions. J. Cell Biol. 136, 1109–1121
32. Schuck, P. (2000) Size-distribution analysis of macromolecules by sedimentation velocity ultracentrifugation and Lamm equation modeling. Biophys. J. 78, 1606–1619
33. Laue, T. M., Shah, B., Ridgeway, T. M., and Pelletier, S. L. (1992) Computer-aided interpretation of analytical sedimentation data for proteins. Anal. ultracentrifugation Biochem. Polym. Sci.
34. Brautigam, C. A. (2015) Chapter Five - Calculations and Publication-Quality Illustrations for Analytical Ultracentrifugation Data. in Methods in Enzymology (Cole, J. L. ed), pp. 109–133, Academic Press, 562
35. Schuck, P. (2003) On the analysis of protein self-association by sedimentation velocity analytical ultracentrifugation. Anal. Biochem. 320, 104–124
36. Shimizu, N., Mori, T., Nagatani, Y., Ohta, H., Saijo, S., Takagi, H., Takahashi, M., Yatabe, K., Kosuge, T., and Igarashi, N. (2019) BL-10C, the small-angle x-ray scattering beamline at the Photon Factory. AIP Conf. Proc. 10.1063/1.5084672
37. Schneidman-Duhovny, D., Hammel, M., Tainer, J. A., and Sali, A. (2013) Accurate SAXS profile computation and its assessment by contrast variation experiments. Biophys. J. 105, 962–974
38. Schneidman-Duhovny, D., Hammel, M., Tainer, J. A., and Sali, A. (2016) FoXS, FoXSDock and MultiFoXS: Single-state and multi-state structural modeling of proteins and their complexes based on SAXS profiles. Nucleic Acids Res. 44, W424–W429
39. Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess, B., and Lindahl, E. (2015) GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 1–2, 19–25
40. Huang, J., Rauscher, S., Nawrocki, G., Ran, T., Feig, M., De Groot, B. L., Grubmüller, H., and MacKerell, A. D. (2016) CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods. 14, 71–73
41. Eswar, N., Webb, B., Marti-Renom, M. A., Madhusudhan, M. S., Eramian, D., Shen, M., Pieper, U., and Sali, A. (2006) Comparative Protein Structure Modeling Using Modeller. Curr. Protoc. Bioinforma. 15, 5.6.1-5.6.37
42. Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., and Klein, M. L. (1983) Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935
43. Jo, S., Kim, T., Iyer, V. G., and Im, W. (2008) CHARMM-GUI: A web-based graphical user interface for CHARMM. J. Comput. Chem. 29, 1859–1865
44. Jo, S., Song, K. C., Desaire, H., MacKerell, A. D., and Im, W. (2011) Glycan reader: Automated sugar identification and simulation preparation for carbohydrates and glycoproteins. J. Comput. Chem. 32, 3135–3141
45. Darden, T., York, D., and Pedersen, L. (1993) Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J. Chem. Phys. 98, 10089–10092
46. Hess, B., Bekker, H., Berendsen, H. J. C., and Fraaije, J. G. E. M. (1997) LINCS: A Linear Constraint Solver for molecular simulations. J. Comput. Chem. 18, 1463–1472
47. Evans, P. (2006) Scaling and assessment of data quality. Acta Crystallogr. Sect. D Biol. Crystallogr. 62, 72–82
48. Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R., Keegan, R. M., Krissinel, E. B., Leslie, A. G. W., McCoy, A., McNicholas, S. J., Murshudov, G. N., Pannu, N. S., Potterton, E. A., Powell, H. R., Read, R. J., Vagin, A., and Wilson, K. S. (2011) Overview of the CCP4 suite and current developments. Acta Crystallogr. Sect. D Biol. Crystallogr. 67, 235–242
49. Kudo, S., Caaveiro, J. M. M., and Tsumoto, K. (2016) Adhesive Dimerization of Human P-Cadherin Catalyzed by a Chaperone-like Mechanism. Structure. 24, 1523–1536
50. McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C., and Read, R. J. (2007) Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674
51. Murshudov, G. N. (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. 53, 240–255
52. Emsley, P., Lohkamp, B., Scott, W. G., and Cowtan, K. (2010) Features and development of Coot. Acta Crystallogr. Sect. D Biol. Crystallogr. 66, 486–501
53. Laskowski, R. A. (1993) PROCHECK - a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26, 283–291
54. Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., and Ferrin, T. E. (2004) UCSF Chimera - A visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612
55. Yui, A., Akiba, H., Kudo, S., Nakakido, M., Nagatoishi, S., and Tsumoto, K. (2017) Thermodynamic analyses of amino acid residues at the interface of an antibody B2212A and its antigen roundabout homolog 1. J. Biochem. 10.1093/jb/mvx050
56. Urushihara, H., Takeichi, M., Hakura, A., and Okada, T. S. (1976) Different cation requirements for aggregation of BHK cells and their transformed derivatives. J. Cell Sci. 22, 685–695
57. Urushihara, H., Ozaki, H. S., and Takeichi, M. (1979) Immunological detection of cell surface components related with aggregation of Chinese hamster and chick embryonic cells. Dev. Biol.
70, 206–216

Table 1: Data collection and refinement statistics. Statistical values given in parentheses refer to the highest resolution bin.

Data Collection             LI-cadherin (EC1-4)
Space group                 P 1 21 1
Unit cell
  a, b, c (Å)               80.36, 70.84, 134.22
  α, β, γ (°)               90.0, 98.7, 90.0
Resolution (Å)              55.19 - 2.70 (2.85 - 2.70)
Wavelength                  1.0000
Observations                252,491 (37,071)
Unique reflections          41,272 (5,955)
Rmerge                      0.095 (0.858)
Rp.i.m.                     0.041 (0.371)
CC1/2                       0.998 (0.907)
I / σ(I)                    11.6 (1.8)
Multiplicity                6.1 (6.2)
Completeness (%)            99.9 (100.0)

Refinement Statistics
Resolution (Å)              55.19 - 2.70
Rwork / Rfree (%)           22.2 / 27.4
No. protein chains          2
No. atoms
  Protein                   6,834
  Ca²⁺                      12
  Water                     33
B-factor (Å²)
  Protein                   78.2
  Ca²⁺                      69.3
  Water                     62.2
Ramachandran Plot
  Preferred (%)             85.7
  Allowed (%)               14.3
  Outliers (%)              0
RMSD Bond (Å)               0.013
RMSD Angle (°)              1.83
PDB entry code              7CYM

Table 2: Results of Ala scanning

          SEC-MALS                          DSC
Sample    Concentration (µM)   MW¹ (kDa)    Concentration (µM)   Tm1 (°C)       Tm2 (°C)
WT        100                  53.7         7.9                  60.1 ± 0.0²    N.D.³
          50                   51.9
I169A     100                  50.5         7.9                  61.7 ± 0.0     N.D.³
          50                   50.1
L171A     100                  51.1         7.9                  60.3 ± 0.0     N.D.³
          50                   49.7
N176A     100                  53.1         7.5                  57.7 ± 0.0     63.7 ± 0.0
          50                   50.7
V210A     100                  51.5         7.9                  56.5 ± 0.0     63.0 ± 0.0
          50                   50.3
N222A     100                  52.2         7.4                  58.7 ± 0.0     63.9 ± 0.0
          50                   51.6
F224A     100                  49.5         8.1                  54.2 ± 0.0     62.5 ± 0.0
          50                   49.4
L355A     100                  54.8         7.8                  59.9 ± 0.0     65.3 ± 0.1
          50                   52.5
N371A     100                  52.4         7.1                  59.5 ± 0.0     N.D.³
          50                   N.D.³
F376A     100                  55.1         7.8                  60.4 ± 0.0     N.D.³
          50                   50.6
Y399A     100                  53.6         7.5                  59.9 ± 0.0     N.D.³
          50                   51.7
Q404A     100                  52.7         7.8                  59.6 ± 0.0     N.D.³
          50                   N.D.³

1. The molecular weight of the protein does not include the glycan moiety. The theoretical molecular weight of EC1-4 wild type without glycan is 49.8 kDa.
2. Tm ± error is shown.
3. Not determined.

Figure 1. Schematic view of extracellular cadherin (EC) domains of classical cadherin and LI-cadherin. Domains connected by dotted lines have sequence homology. EC1-4, EC1-2 and EC3-4, which were used for the experiments, are indicated by parentheses.

Figure 2. Analysis of the dimerization state of LI-cadherin EC1-4, EC1-2 and EC3-4. A. Sedimentation plot of SV-AUC. Dimerization of EC1-4 and EC1-2 was confirmed. The KD of the EC1-4 and EC1-2 homodimers was 39.8 µM and 75.0 µM, respectively. B. Scattering curve of SAXS and theoretical curve of EC3-4 calculated from the modified crystal structure. The method used to produce the modified crystal structure is explained in the experimental procedures and supplementary information.

Figure 3. Crystal structure of the EC1-4 homodimer. Calcium ions are depicted in magenta. No calcium ions were observed between domains EC2 and EC3 in either molecule. Four partial N-glycans were modeled in chain A (light green) and two in chain B (cyan) (the amino acid sequence of EC1-4 is given in Table S1).

Figure 4. Computational analysis of the flexibility of the calcium-free linker. A. Schematic view of how RMSD values were calculated. B. RMSD values of EC1 Cα or EC3 Cα against EC2 Cα.
Chain A of the EC1-4 dimer structure was employed as the initial structure. C. RMSD values of EC1 Cα or EC3 Cα in chain A of the dimer structure against EC2 Cα in chain A. RMSD values and standard deviations are shown in parentheses, in ångströms.

Figure 5. Residues involved in the intermolecular interaction in the EC1-4 homodimer crystal structure. Residues involved in non-polar interactions are shown in the black and purple rectangles (top panels). Residues involved in hydrogen bonds (black solid lines) are shown within the red and blue rectangles (bottom panel). Residues indicated with an asterisk were individually mutated to Ala to evaluate their contribution to dimerization.

Figure 6. Mutagenesis analysis by SEC-MALS. A. Molecular weight measured by MALS. F224A exhibited the smallest molecular weight among all constructs tested. The samples were injected at 100 µM. Error bars indicate experimental uncertainties. B. SEC chromatograms obtained using SEC-MALS. Protein was injected at 100 µM. Chromatograms of WT and F224A are indicated in black (bold line) and green, respectively.
Elution volume of the peak top of F224A was the largest among all constructs. C. SEC chromatogram and MW plots of EC1-4 WT and F224A. Graphs of other mutants are shown in Fig. S7C. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.09.18.291195doi: bioRxiv preprint https://doi.org/10.1101/2020.09.18.291195 http://creativecommons.org/licenses/by-nc-nd/4.0/ 29 Figure 7. Mechanism of dimerization facilitated by Phe224. A. The distance between Phe224 (orange) C or Ala224 (purple) C and Asn122 (grey) C was evaluated by MD simulations. The C atoms are indicated by black circles. The distance calculated by the simulations is indicated with red line. Time course is shown on the portion of the panel at the upper part of each structure. Each MD simulation run is shown in red, black and blue. Averages and standard deviations from 20 to 220 ns of each simulation are shown in parentheses. B. Structure alignment of EC2 (chain A) in EC1-4 homodimer crystal structure and EC2 during the MD simulation of EC1-4F224A monomer. A snapshot of 103.61 ns in Run 1 was chosen as it showed the largest distance between Asn122 and Ala224. Ala224 is indicated in purple. The loop indicated with the black arrow would cause steric hindrance towards the formation of the homodimer. C. Thermal stability of EC1-4 WT and F224A determined by differential scanning calorimetry. Two transitions appeared in the sample of F224A. The first transition at lower temperature seems to have appeared due to the loss of intramolecular interaction around the residue at position 224. 
.CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.09.18.291195doi: bioRxiv preprint https://doi.org/10.1101/2020.09.18.291195 http://creativecommons.org/licenses/by-nc-nd/4.0/ 30 Figure 8. Cell aggregation assay. A. Size distribution of cell aggregates determined by MFI. Particles that were 25 µm or larger were regarded as cell aggregates. Only EC1-7 WT expressing cells in the existence of 1 mM CaCl2 showed significant number of cell aggregates that were 40 µm or larger. Data show the mean ± SEM of four measurements. B. Microscopic images of cell aggregates taken after adding 4% PFA and incubating the plate on ice for 20 minutes. C. Images of cell aggregates taken by MFI. Cell aggregates belonging to the largest size population of each construct obtained in the presence of 1 mM CaCl2 (70~100 m for EC1-7GFP, 50~70 m for EC1-7F224AGFP and 40~70 m for Flp-In CHO) are shown. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.09.18.291195doi: bioRxiv preprint https://doi.org/10.1101/2020.09.18.291195 http://creativecommons.org/licenses/by-nc-nd/4.0/ 31 Figure 9. Cell adhesion mediated by short constructs. Cell aggregation assay using EC1-5GFP and EC3- 7GFP. EC1-7GFP and Flp-In CHO (mock cells) were used as positive and negative control, respectively. Particles that were 25 µm or larger were considered as cell aggregates. The number of cell aggregates of both EC1-5GFP and EC3-7GFP in the presence or absence of Ca2+ were determined. 
Data show mean ± SEM of four measurements. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2020.09.18.291195doi: bioRxiv preprint https://doi.org/10.1101/2020.09.18.291195 http://creativecommons.org/licenses/by-nc-nd/4.0/ 10_1101-2020_12_30_424894 ---- Boosting detection of low abundance proteins in thermal proteome profiling experiments by addition of an isobaric trigger channel to TMT multiplexes Boosting detection of low abundance proteins in thermal proteome profiling experiments by addition of an isobaric trigger channel to TMT multiplexes Sarah A. Peck Justice†, Neil A. McCracken, José F. Victorino‡, Aruna B. Wijeratne, Amber L. Mosley* Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States ABSTRACT: The study of low abundance proteins is a challenge to discovery-based proteomics. Mass-spectrometry (MS) applica- tions, such as thermal proteome profiling (TPP) face specific challenges in detection of the whole proteome as a consequence of the use of nondenaturing extraction buffers. TPP is a powerful method for the study of protein thermal stability, but quantitative accuracy is highly dependent on consistent detection. Therefore, TPP can be limited in its amenability to study low abundance proteins that tend to have stochastic or poor detection by MS. To address this challenge, we incorporated an affinity purified protein complex sample at submolar concentrations as an isobaric trigger channel into a mutant TPP (mTPP) workflow to provide reproducible detec- tion and quantitation of the low abundance subunits of the Cleavage and Polyadenylation Factor (CPF) complex. 
The inclusion of an isobaric protein complex trigger channel increased detection an average of 40x for previously detected subunits and facilitated detection of CPF subunits that were previously below the limit of detection. Importantly, these gains in CPF detection did not cause large changes in melt temperature (Tm) calculations for other unrelated proteins in the samples, with a high positive correlation between Tm estimates in samples with and without isobaric trigger channel addition. Overall, the incorporation of affinity purified protein complex as an isobaric trigger channel within a TMT multiplex for mTPP experiments is an effective and reproducible way to gather thermal profiling data on proteins that are not readily detected using the original TPP or mTPP protocols.

Proteins are the functional units of a cell, carrying out and controlling processes at specific times and locations to maintain homeostasis and respond to external stimuli. As a result of functional changes, proteins can exist in a variety of biophysical states within cells, arising from variants in their primary sequence, post-translational modification (PTM) state, and/or subcellular localization. In many cases a protein’s biophysical state is impacted by associations with other proteins, including both transient and stable protein-protein interactions. The characterization of protein-protein interactions (PPIs) is fundamental to gaining a full understanding of biological mechanisms. In fact, PPIs are so critical to proper protein function that disruptions in these interactions often lead to disease and/or cell death 1. Advances in mass spectrometry (MS)-based proteomics workflows continue to increase our ability to study protein complex dynamics and PPIs2-7. MS-based approaches for protein interaction analysis rely on discovery-based proteomics performed using data-dependent acquisition (DDA).
Generally in DDA, peptides with the most intense ions from MS1 are selected for fragmentation and MS2 analysis 8. This approach maximizes signal to noise levels and thereby increases confidence in the selection and subsequent identification of the peptide ions. Challenges with the use of DDA include selection of peptide ions from protein(s) of interest that are present at low relative abundance levels, or cases in which peptides of interest (such as PTM containing peptides) are present at low levels relative to their unmodified counterparts. Low abundance peptides may be present at MS1 signal intensity levels that are insufficient to trigger fragmentation and MS2 analysis under the instrument settings in use. While fractionation and an extended HPLC gradient help to spread out the elution of peptides into the mass spectrometer, many peptides may still co-elute such that highly abundant ion species will outcompete those that are less abundant 9. A number of strategies have recently emerged to improve MS detection of low abundance proteins and post-translational modifications (PTMs) for a variety of applications including single cell proteomics10-16. Although we will not discuss all of the recently established strategies here, one such strategy, Boosting to Amplify Signal with Isobaric Labeling (BASIL), has similarities that have informed the current work. Specifically, BASIL has been shown to successfully increase detection of low abundance phosphopeptides through addition of a boosting sample to a tandem mass tag (TMT)-based multiplex17. TMTPro labeling allows for the multiplexing and relative quantitation of up to 16 samples18-20. As each TMT label is isobaric, labeled peptides from the multiplexed samples elute into the mass spectrometer together and are analyzed simultaneously as one ion peak during MS1 scans, with the individual channels distinguished in fragment ion scans during MSn (typically MS2 or MS3) analysis.
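The selection logic described above can be made concrete with a toy model. The sketch below is only an illustration, not the instrument's actual scheduling algorithm: it assumes that the MS1 intensity of an isobarically labeled precursor is the sum of its per-channel contributions and that a precursor is fragmented only if it exceeds an intensity threshold and ranks among the top N. All function names, intensities, and thresholds are hypothetical.

```python
"""Toy model of DDA precursor selection with an added boosting channel.

Every number here is invented for illustration; none comes from the study.
"""


def summed_ms1_intensity(channel_intensities):
    """Isobaric copies of a peptide co-elute as one MS1 peak, so the
    precursor intensity is modeled as the sum over all channels."""
    return sum(channel_intensities.values())


def select_precursors(precursors, top_n, min_intensity):
    """Return up to top_n precursor names above min_intensity, most intense first."""
    eligible = [(name, inten) for name, inten in precursors.items()
                if inten >= min_intensity]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in eligible[:top_n]]


# Hypothetical low-abundance peptide spread over nine experimental channels.
low_abundance = {f"temp_channel_{i}": 1.0e3 for i in range(9)}
# Hypothetical abundant co-eluting peptides.
abundant = {"abundant_peptide_1": 5.0e6, "abundant_peptide_2": 2.0e6}

# Without a boosting channel, the summed signal stays below the threshold.
without_boost = dict(abundant,
                     low_abundance_peptide=summed_ms1_intensity(low_abundance))

# Adding one enriched channel raises the summed MS1 signal for the same peptide.
boosted_channels = dict(low_abundance, boost_channel=5.0e5)
with_boost = dict(abundant,
                  low_abundance_peptide=summed_ms1_intensity(boosted_channels))

print(select_precursors(without_boost, top_n=3, min_intensity=5.0e4))
print(select_precursors(with_boost, top_n=3, min_intensity=5.0e4))
```

In this toy setting the low-abundance peptide is skipped in the first case and selected in the second, which is the qualitative behavior that boosting approaches aim for.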
By incorporating a phospho-enriched sample into a single channel in the TMT multiplex, Yi et al increased ion abundance of phosphopeptides in the MS1 scan to the extent that MS2 was triggered for phosphopeptides that were typically below the level of detection in standard DDA approaches 17. BASIL allowed for the identification and quantification of phosphopeptides in other TMT channels where enrichment had not been performed17. The BASIL method has since been optimized for detection of phosphopeptides in single cells21, and similar approaches have been applied to phosphotyrosine-containing peptides22, SILAC-labeled peptides23, and, using synthetic peptides, to particular peptides of interest24. BASIL and other similar methods that take advantage of isobaric carrier channels could have numerous applications in DDA-based quantitative workflows.

bioRxiv preprint doi: https://doi.org/10.1101/2020.12.30.424894; this version posted December 31, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

The challenges to studying low abundance proteins in DDA proteomics experiments extend in particular to the mass spectrometry-based thermal proteome profiling (TPP) methods and are the focus of this study. TPP analysis takes advantage of TMT labeling technology to produce protein melt curves that can then be compared across conditions to measure alterations in protein thermal stability25, 26. Although TPP was originally developed to study drug and ligand binding, it has been shown to also be a robust approach to probe PPIs in a number of different applications (recently reviewed by Mateus et al 27). We recently developed a new application of TPP, referred to as mutant TPP (mTPP), that is used to study the effects of protein missense mutations on the proteome at large, with the ability to focus in on specific protein complexes and their PPIs28. mTPP analysis is advantageous over other methods for the study of PPIs in that it does not require antibodies, addition of reagents such as crosslinkers, or the genetic manipulations (such as the production of fusion proteins) typically necessary for many other PPI analyses. Additionally, mTPP can be performed with significantly less starting material than traditional affinity purification or enrichment approaches, making it applicable to a wider variety of sample types. Despite these advantages, we quickly encountered challenges associated with quantitative analysis of specific target proteins and their interaction partners. Therefore, a strategy for increasing the ion intensity of proteins of interest in mTPP experiments would have a significant impact on our ability to study PPI dynamics of low abundance protein complexes while still retaining the context of changes within the overall proteome. One advantage of TMT- and iTRAQ-based multiplexed workflows for global proteomics studies is that the pooling of multiple samples generates increased protein starting material that can then be subjected to extensive biochemical fractionation to facilitate deep proteome coverage 29-33. This advantage can be coupled with protein extraction methods using denaturants such as urea or SDS to isolate the full proteome of many cells and tissues 34.
The workflow for TPP cannot exploit these advantages since: 1) Temperature treatment of lysates for TPP results in unequal levels of protein mixture across the multiplex that, in our hands, vary on average at least 10-fold from the lowest to the highest temperature treatment 28; and 2) Non-denaturing protein extraction buffers must be used to maintain protein structure, PPIs, and protein interactions with other molecules (including but not limited to lipids, metabolites, small molecules, and drugs) 25-27. As a consequence, TPP workflows typically result in decreased proteome coverage relative to denaturant extracted proteomes even when equivalent amounts of starting material are used 28.

To expand proteome coverage for our mTPP workflow, we have developed a BASIL-like approach to increase the signal of low abundance protein complexes and their representative peptides in mTPP experiments, using a protein complex affinity purification trigger channel in place of the phosphopeptide isobaric boosting channel used in BASIL17. As a proof-of-concept, we investigated the ability of this approach to enhance detection of the relatively low abundance cleavage and polyadenylation factor (CPF) complex in an mTPP workflow. Affinity purified CPF that we have previously characterized 35-39 was incorporated as an isobaric trigger channel into our mTPP workflow at ratios of ~1:8 and ~1:50 relative to the lowest heat-treated mTPP sample. Using this approach, a significant increase in the abundance of CPF complex members was observed, including those that were not readily identified without the isobaric trigger channel. Importantly, addition of an isobaric trigger channel into our mTPP workflow does not appear to have a significant impact on the melt temperature (Tm) calculation of proteins detected both with and without the trigger.
Overall, the use of an isobaric trigger channel is a robust approach to prioritize DDA selection of proteins or peptides of interest, such as missense mutant containing proteins and their interaction partners, which are of particular focus within mTPP experiments.

EXPERIMENTAL SECTION

Yeast strains and growth

All experiments were performed in Saccharomyces cerevisiae. The parental strain SMY732, described previously,40 was obtained from the Mirkin lab and used in the trigger experiments comparing technical replicates. For the biological replicate experiments, the wildtype strain used was the commercially available BY4741 strain (Open Biosystems). The ssu72-2 temperature sensitive mutant (first described by the Hampsey lab 41) was purchased from Euroscarf. The Pta1-FLAG strain was made via homologous recombination. The 3xFLAG tag DNA sequence was amplified from plasmids obtained from Funakoshi and Hochstrasser 42 to insert the FLAG epitope tag into the genome at the 3’-end of the PTA1 gene in WT (BY4741). Successful incorporation of the FLAG tag was confirmed via Western blot.

For mTPP experiments, cells were inoculated at an OD600 = 0.3 and grown to an OD600 = 0.8 in yeast extract, peptone, dextrose (YPD) medium at permissive temperature (30°C or 25°C). YPD was removed by filtration through a nitrocellulose membrane (Millipore, Burlington, MA). Cells were flash frozen with liquid nitrogen and stored at -80°C to be used in subsequent sample preparation steps. For affinity purification of CPF via Pta1-FLAG, cells were grown overnight at 30°C in YPD to an OD600 ~3. Cells were pelleted, washed, and transferred to 50 mL conical tubes for storage at -80°C until subsequent sample preparation steps.

Sample preparation

BY4741 and ssu72-2 samples for mTPP were prepared as described in Peck Justice et al28 with the exception of an extended temperature range for the heat treatment.
For the no trigger mTPP experiments, lysate was treated at the following ten temperatures: untreated, 25°, 35°, 46.2°, 48.8°, 51.2°, 53.2°, 55.2°, 56.5°, and 74.9°C. A TMT 10plex kit (Thermo Scientific, Waltham, MA) with channels TMT126; TMT127N; TMT127C; TMT128N; TMT128C; TMT129N; TMT129C; TMT130N; TMT130C and TMT131 were respectively used to label peptide solutions derived from untreated, 25°, 35°, 46.2°, 48.8°, 51.2°, 53.2°, 55.2°, 56.5°, and 74.9°C temperature treatments in WT. In ssu72-2, channels TMT126; TMT127N; TMT127C; TMT128N; TMT128C; TMT129N; TMT129C; TMT130N; TMT130C and TMT131 were respectively used to label peptide solutions derived from untreated, 25°, 35°, 48.8°, 46.2°, 51.2°, 74.9°, 53.2°, 55.2°, and 56.5°C temperature treatments. TMT labeling steps were performed according to manufacturer provided instructions.

To boost detection of the native CPF subunits, subsequent mTPP replicates of WT and ssu72-2 included the addition of a trigger channel consisting of affinity-purified CPF complexes. Affinity purification of CPF via Pta1-FLAG was performed as described previously for Ssu72-FLAG purifications 35. The Pta1-FLAG affinity purified sample was added at a ratio of 6.25 µg trigger to 50 µg of the lowest heat-treated sample (1:8 ratio) for the initial study. The untreated sample used in the no trigger multiplexes was omitted so that TMT126 could be used to label the isobaric trigger channel. The remainder of the channels, TMT127N; TMT127C; TMT128N; TMT128C; TMT129N; TMT129C; TMT130N; TMT130C and TMT131, were used to label peptide solutions derived from 25°, 35°, 46.2°, 48.8°, 51.2°, 53.2°, 55.2°, 56.5°, and 74.9°C temperature treatments. Subsequent sample preparation steps were performed as described in Peck Justice et al28.

SMY732 samples for independent replicate experiments were prepared as described in Peck Justice et al28.
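Because the channel assignments above differ between WT and ssu72-2, it can help to encode them explicitly before any downstream analysis. The following is a minimal sketch that transcribes the 10plex layouts and the 6.25 µg to 50 µg trigger ratio as listed in the text; the helper names (`build_layout`, `temperature_order`, `trigger_ratio`) are hypothetical and not part of any published pipeline.

```python
from fractions import Fraction

TMT10_CHANNELS = ["TMT126", "TMT127N", "TMT127C", "TMT128N", "TMT128C",
                  "TMT129N", "TMT129C", "TMT130N", "TMT130C", "TMT131"]


def build_layout(labels):
    """Pair each TMT 10plex channel with its sample label, in channel order."""
    if len(labels) != len(TMT10_CHANNELS):
        raise ValueError("expected one label per TMT 10plex channel")
    return dict(zip(TMT10_CHANNELS, labels))


# Layouts transcribed from the text (temperatures in degrees C, as strings).
wt_no_trigger = build_layout(["untreated", "25", "35", "46.2", "48.8",
                              "51.2", "53.2", "55.2", "56.5", "74.9"])
ssu72_no_trigger = build_layout(["untreated", "25", "35", "48.8", "46.2",
                                 "51.2", "74.9", "53.2", "55.2", "56.5"])
with_trigger = build_layout(["CPF trigger", "25", "35", "46.2", "48.8",
                             "51.2", "53.2", "55.2", "56.5", "74.9"])


def temperature_order(layout):
    """Return (channel, temperature) pairs sorted by temperature.

    Non-numeric labels such as "untreated" or "CPF trigger" are skipped,
    since they are not points on the melt curve.
    """
    numeric = []
    for channel, label in layout.items():
        try:
            numeric.append((channel, float(label)))
        except ValueError:
            continue
    return sorted(numeric, key=lambda item: item[1])


def trigger_ratio(trigger_ug, sample_ug):
    """Exact ratio of trigger protein to the lowest heat-treated sample."""
    return Fraction(str(trigger_ug)) / Fraction(str(sample_ug))


print(trigger_ratio(6.25, 50))              # the 1:8 ratio stated in the text
print(temperature_order(ssu72_no_trigger))  # channels re-sorted by temperature
```

Sorting by temperature rather than by channel name matters for the ssu72-2 no trigger layout, where the listed channel order does not follow increasing temperature.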
Lysate was treated at the following eight temperatures: 25°, 35°, 48.8°, 51.2°, 53.2°, 55.2°, 56.5°, and 74.9°C. A TMT 16plex kit (Thermo Scientific, Waltham, MA) with channels TMT127N; TMT127C; TMT128N; TMT128C; TMT129N; TMT129C; TMT130N; TMT130C were respectively used to label peptide solutions derived from 25°, 35°, 48.8°, 51.2°, 53.2°, 55.2°, 56.5°, and 74.9°C temperature treatments in parental culture samples. Note that some channels in the 16plex were used for other samples not described in this report. These heat-treated lysates were analyzed twice, as separate LC-MS experiments, for comparison of technical replicate reproducibility. In one experiment, the set of combined labeled samples was analyzed with a ninth trigger channel (TMT126), which contained the Pta1-FLAG affinity purified material (described previously), at a ratio of 1 µg total isobaric trigger channel protein to 50 µg of the lowest heat-treated sample (1:50 ratio); in the second experiment, the trigger was not added.

LC-MS/MS analysis

Following multiplex preparation as described above, samples were subjected to high-pH reversed phase fractionation as previously described 28. NanoLC-MS/MS analyses were performed on an Orbitrap Fusion Lumos mass spectrometer (Thermo Scientific, Waltham, MA) coupled to an EASY-nLC HPLC (Thermo Scientific, Waltham, MA). One-third of the resuspended fractions were loaded onto an in-house prepared reversed phase column using 600 bar as the applied maximum pressure to an Easy-Nano 25 cm column with 2 µm reversed phase resin.
The peptides were eluted using a 180-minute gradient increasing from 95% buffer A (0.1% formic acid in water) and 5% buffer B (0.1% formic acid in acetonitrile) to 25% buffer B at a flow rate of 400 nL/min. Nano-LC mobile phase was introduced into the mass spectrometer using a Nanospray Source (Thermo Scientific, Waltham, MA). During peptide elution, the heated capillary temperature was kept at 275°C and the ion spray voltage was kept at 2.6 kV. The mass spectrometer method was operated in positive ion mode for 180 minutes with a cycle time of 4 seconds for MS/MS acquisition. MS data were acquired by data-dependent acquisition using a top speed method following the first survey MS scan. During MS1, using a wide quadrupole isolation, survey scans were obtained with an Orbitrap resolution of 120 k with vendor defined parameters: m/z scan range, 375-1500; maximum injection time, 50; AGC target, 4E5; micro scans, 1; RF Lens (%), 30; “DataType”, profile; Polarity, Positive with no source fragmentation and to include charge states 2 to 7 for fragmentation. Dynamic exclusion for fragmentation was kept at 60 seconds. During MS2, the following vendor defined parameters were assigned to isolate and fragment the selected precursor ions: Isolation mode = Quadrupole; Isolation Offset = Off; Isolation Window = 0.7; Multi-notch Isolation = False; Scan Range Mode = Auto Normal; FirstMass = 120; Activation Type = CID; Collision Energy (%) = 35; Activation Time = 10 ms; Activation Q = 0.25; Multistage Activation = False; Detector Type = IonTrap; Ion Trap Scan Rate = Turbo; Maximum Injection Time = 50 ms; AGC Target = 1E4; Microscans = 1; DataType = Centroid. During MS3, daughter ions selected from neutral losses (e.g.
H2O or NH3) of precursor ion CID during MS2 were subjected to further fragmentation using higher-energy C-trap dissociation (HCD) to obtain TMT reporter ions and peptide specific fragment ions using the following vendor defined parameters: Isolation Mode = Quadrupole; Isolation Window = 2; Multi-notch Isolation = True; MS2 Isolation Window (m/z) = 2; Number of notches = 3; Collision Energy (%) = 65; Orbitrap Resolution = 50k; Scan Range (m/z) = 100-500; Maximum Injection Time = 105 ms; AGC Target = 1E5; DataType = Centroid. The data were recorded using Thermo Scientific Xcalibur (4.1.31.9) software (Copyright 2017 Thermo Fisher Scientific Inc.).

Protein Identification and Quantification

Resulting RAW files were analyzed using Proteome Discoverer™ 2.4 (Thermo Scientific, Waltham, MA). The SEQUEST HT search engine was used to search against a yeast protein database from the UniProt sequence database (December 2015) containing 6,279 yeast protein and common contaminant sequences (FASTA file used available on ProteomeXchange under accession PXD020689). Specific search parameters used were: trypsin as the proteolytic enzyme, peptides with a maximum of two missed cleavages, precursor mass tolerance of 10 ppm, and a fragment mass tolerance of 0.02 Da. Static modifications used for the search were: 1) carbamidomethylation on cysteine residues; 2) TMTsixplex label on lysine (K) residues and the N-termini of peptides. Dynamic modifications used for the search were oxidation of methionine and acetylation of N-termini. Percolator False Discovery Rate was set to a strict setting of 0.01. Values from both unique and razor peptides were used for quantification. No normalization setting was used for protein quantification since the different temperature treatments are expected to have different protein amounts.
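The two tolerances in the search parameters are of different kinds: the precursor tolerance is relative (ppm), so its width in m/z scales with the precursor, whereas the fragment tolerance is absolute (Da). The short sketch below works through that arithmetic. The function names and the example m/z values are hypothetical, and the code is not a reimplementation of how SEQUEST HT applies these settings.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the observed value lies within tol_ppm of the theoretical value."""
    return abs(observed_mz - theoretical_mz) <= theoretical_mz * tol_ppm / 1e6


def within_da(observed_mz, theoretical_mz, tol_da=0.02):
    """True if the observed value lies within an absolute tolerance in Da."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# At m/z 1000, a 10 ppm tolerance corresponds to about +/- 0.01 in m/z;
# at m/z 500 the same setting corresponds to about +/- 0.005.
for example_mz in (500.0, 1000.0):
    low, high = ppm_window(example_mz, 10.0)
    print(f"m/z {example_mz}: {low:.4f} to {high:.4f}")
```

A relative tolerance suits high-resolution Orbitrap precursor measurements, where mass error scales roughly with m/z, while an absolute tolerance is a common choice for fragment ions.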
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE43 partner repository with the dataset identifier PXD020689 and doi: 10.6019/PXD020689.

Data analysis

Venn Diagrams were created using Venny 2.144. Dot plots, scatter plots, and waterfall plots were created using ggplot245 in RStudio (RStudio for Mac, Version 1.2.5001). Bar graphs were created in Excel (Microsoft Excel for Mac, Version 16.38). The TPP package (v3.12.0)46 in RStudio was used to generate normalized melt curves and to determine protein melt temperatures as described previously26. Subsequent data processing and analysis was also performed in RStudio. Change in Tm (ΔTm) values were calculated as WT Tm − ssu72-2 Tm, thereby limiting calculations to proteins detected in both WT and mutant. The data were further restricted to melt curves with r² values > 0.9 and then to proteins that were detected in at least two of the three replicates. Proteins were ranked according to median change in Tm and ordered from the largest change (proteins that were destabilized in the mutant) to the smallest change (proteins that were stabilized in the mutant). Changes in Tm that were outside of ±2σ (σ being the standard deviation) were considered statistically significant, and the corresponding proteins were identified as destabilized or stabilized due to the mutations in SSU72.

RESULTS AND DISCUSSION

Addition of an affinity purified isobaric trigger channel to mTPP multiplexes does not cause large changes in peptide coverage or quantitation

Figure 1. Workflow overview for mTPP with isobaric trigger channel addition. Equal amounts of protein from each lysate for every biological replicate sample were subjected to different temperature treatments (25°, 35°, 46.2°, 48.8°, 51.2°, 53.2°, 55.2°, 56.5°, and 74.9°C) to induce protein denaturation. The soluble fractions from each treatment as well as a Pta1-FLAG affinity purification sample were digested in-solution with Trypsin/Lys-C. Resulting peptides were labeled with isobaric mass tags (TMT 10plex) as shown and mixed prior to mass spectrometry (MS) analysis. Resulting MS/MS data were analyzed using Proteome Discoverer™ 2.4 to identify and quantify abundance levels of peptides for each temperature treatment and each biological replicate across genotypes.

We hypothesized that incorporation of a well-characterized affinity purified sample isolated from our system of interest as an isobaric trigger channel would increase MS1 ion intensity of peptides of interest within the TMT multiplex. As a consequence, the identification of peptides from the affinity purified protein complex would boost their identification in the remaining experimental mTPP channels used for melt curve production and subsequent Tm calculation when comparing different experimental samples. Similar to the approach used in BASIL17, the incorporation of an affinity purified CPF complex isolated from our system of interest has numerous potential advantages, including native levels of CPF post-translational modifications and protein interaction partners. Similar to mTPP, the affinity purifications for the CPF complex were performed using non-denaturing buffers to preserve PPIs.
Qualitatively, the MS/MS fragment data for CPF complexes will be improved by inclusion of the isobaric trigger channel, which increases the ion abundance of the fragments and therefore the probability of CPF identification at the peptide spectrum match (PSM) level. From a quantitative perspective, TMT126 information will be obtained during data processing but will be excluded from interpretation of the mTPP melt curves for each protein.

Pta1-3xFLAG affinity purifications were digested with LysC/Trypsin and labeled with TMT126 for inclusion within the mTPP multiplex. mTPP quantitative analysis and curve generation were performed using the remaining channels as described in the methods (Fig. 1). The mTPP samples were subjected to eight or nine different temperatures (25°, 35°, 46.2°, 48.8°, 51.2°, 53.2°, 55.2°, 56.5°, and 74.9°C) and then centrifuged to separate soluble and insoluble material as previously described 28. For samples with eight temperature points, no 46.2° treatment sample was included. Samples were then processed and subjected to LC-MS/MS analysis using an MS2-based fragmentation and TMT quantitation workflow (Fig. 1). Using SEQUEST HT and Proteome Discoverer 2.4 for qualitative and quantitative analysis, between 1,750 and 3,150 proteins were detected and quantified depending on the replicate (Supp. Tab. 1). Replicates are designated as preparation 1, 2, 3 (hence p1, p2, p3). The p1 replicate had fewer IDs overall, but p2 and p3 had very similar peptide detection levels (Supp. Tab. 1). To gain insights into general trends within the quantitative data, dot plots were generated to show the abundance value for each quantified protein (Fig. 2). Consistent with previous mTPP experiments28, there was an overall decrease in protein abundance as the temperature at which the sample was treated increased.
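To illustrate how a decrease in soluble abundance with temperature translates into a melt temperature, the sketch below scales hypothetical reporter abundances to the lowest temperature and estimates Tm as the temperature at which the soluble fraction drops to 0.5. This is a deliberate simplification: the study determines Tm from curves fitted with the TPP R package, whereas linear interpolation is swapped in here for brevity. The abundance values, the mutant Tm, and the function names are invented for illustration.

```python
def fraction_soluble(abundances):
    """Scale reporter abundances to the lowest-temperature channel."""
    reference = abundances[0]
    return [value / reference for value in abundances]


def interpolate_tm(temperatures, fractions, level=0.5):
    """Temperature at which the curve first drops to `level`.

    Uses linear interpolation between neighboring points; returns None
    if the curve never crosses the level.
    """
    points = list(zip(temperatures, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= level > f1:
            return t0 + (f0 - level) * (t1 - t0) / (f0 - f1)
    return None


def delta_tm(wt_tm, mutant_tm):
    """Change in Tm as WT minus mutant; positive means destabilized in the mutant."""
    return wt_tm - mutant_tm


# Temperatures used in the nine-point design (degrees C).
temperatures = [25.0, 35.0, 46.2, 48.8, 51.2, 53.2, 55.2, 56.5, 74.9]
# Hypothetical reporter abundances for one protein in WT.
wt_abundance = [100.0, 98.0, 90.0, 80.0, 60.0, 40.0, 20.0, 10.0, 2.0]

wt_tm = interpolate_tm(temperatures, fraction_soluble(wt_abundance))
print(round(wt_tm, 1))                    # approximately 52.2 for these made-up values
print(round(delta_tm(wt_tm, 49.0), 1))    # hypothetical mutant Tm of 49.0
```

Because the 50% crossing is taken relative to the lowest-temperature channel, any channel used only as a trigger would be left out of the input, consistent with TMT126 being excluded from melt curve interpretation.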
Importantly, incorporation of a protein complex isobaric trigger channel into the multiplex did not alter the overall trend of decreasing protein abundance with increased temperature (Figure 2B&D) or have a significant effect on the number of proteins detected. The average ion abundance at each temperature treatment also remained consistent between samples with or without the isobaric trigger channel (compare Figure 2A to B and C to D). Finally, the average quantitative ratio of the isobaric trigger channel to the mTPP experimental sample processed at 25°C remained consistent at 1:50 (Figure 2B) or 1:8 (Figure 2D), mirroring the ratios used for mixing of the multiplex.

The impact of the trigger on mTPP analysis was investigated using both technical replicates and biological replicates so that we could evaluate differences in our workflow and their impact on qualitative and quantitative parameters. For the technical replicates, the same labeled samples were split into two TMT multiplexes: one multiplex without an isobaric CPF trigger (no trigger) and one multiplex with an isobaric CPF trigger labeled with TMT126 (trigger), with a quantitative ratio (based on protein assays) to the lowest temperature treatment of ~1:50. For the biological replicates, four biological replicate samples were grown and prepared independently of one another. One replicate contained a non-heat treated (untreated) sample that was labeled with TMT126 (no trigger sample), and the remaining three replicates were multiplexes with a CPF trigger labeled with TMT126 (trigger) with a trigger to lowest temperature treatment ratio of ~1:8.

While there was not an obvious effect on the overall abundance of proteins in the samples, it is possible that the trigger could affect the detection and identification of proteins by biasing the mass spectrometer towards proteins present in the affinity purification.
Comparisons of MS-based measurements across the technical replicates showed that the trigger channel incorporation did not have a significant impact on protein identification and quantification (Fig. 3A). Technical replicate analyses showed very similar numbers of detected PSMs, peptides, and proteins, suggesting that the addition of the trigger channel at a ratio of 1:50 has little impact on overall LC-MS/MS detection (Fig. 3A, yellow). The biological replicates showed more variation across samples, which is attributed to their separate processing for TPP in addition to variation that could occur from trypsin digestion and other processing steps 47, 48. Trigger p1 in the biological replicate study did have overall lower levels of proteins detected, but this was not likely a consequence of trigger channel addition considering that the Trigger p2 and Trigger p3 samples had similar detection levels to the No trigger sample (Fig. 3A, green). Direct comparison of proteins quantified in the No trigger vs. Trigger samples showed an 80% overlap in quantified proteins with unique proteins present in all individual datasets (Figure 3B&C). Overall, these data suggest that the addition of an isobaric trigger channel has little to no impact on overall proteome detection outside of the inherent variability seen in independent sample processing (for the biological replicates) and LC-MS/MS runs.
Figure 2. The use of an isobaric trigger channel does not alter mTPP experimental channel abundance values. Dot plots of protein abundance values for each protein detected in WT cells in technical replicates without (A) and with (B) the isobaric trigger channel (trigger) addition and representative biological replicates without (C) and with (D) the isobaric trigger addition. The same general decrease of protein abundances with increase in temperature treatment is seen across all replicates. Dot plots for additional replicates are provided in Supp. Fig. 1.
A critical feature of mTPP analysis is the ability to accurately calculate melt temperature (Tm) from the resulting melt curves. To ensure that incorporation of the trigger did not have major impacts on Tm calculation of proteins outside of the CPF complex, we performed Pearson correlation analysis of the Tms of proteins detected in both the no trigger and trigger samples (Figure 3D, Tm data from the TPP package in Supp. Tab. 2). This analysis showed a high correlation of 0.82 between the no trigger and trigger samples for proteins which met the criteria for quantitation in our mTPP data analysis workflow (including the number of proteins with melt curves having an r² greater than or equal to 0.9). Additionally, even across biological replicates, there is a strong positive correlation of 0.72 between Tm calculations in the no trigger vs. trigger samples (Figure 3E, Tm data from the TPP package in Supp. Tab. 2). The ability to make comparisons using biological replicate data would be beneficial in settings with limited sample amounts where technical replicates may not be feasible, in addition to their importance for rigorous statistical analysis.
An isobaric trigger channel facilitates mTPP analysis of the Cleavage and Polyadenylation Factor Complex
CPF and its accessory factors cleavage factor IA and IB play major roles in RNA processing.
CPF is responsible for efficient and specific cleavage and polyadenylation of messenger RNAs 49, 50 and has been shown to have important roles in termination of RNA Polymerase II transcription51, 52. The CPF complex is currently described as having 14 subunits (Figure 4A), which provide the complex with numerous activities including endonuclease, polyadenylation, and phosphatase functions53. Ssu72, which is mutated in the ssu72-2 yeast strain, is an integral subunit of CPF (Fig. 4A, indicated with a star). Performing mTPP according to the established protocol28 resulted in limited detection of CPF (Figure 4C-F, no trigger samples shown in dark/light gray). One notable exception to the low detection of CPF was the subunit Glc7. Along with its presence in CPF, Glc7 is also the catalytic subunit of PP1 54 and thereby functions in many other protein complexes in eukaryotic cells (reviewed in 55, 56) where it plays roles in cell cycle regulation and nutrient regulation54, 57, 58. Due to these many roles, Glc7 has a higher global abundance than other CPF subunits and is thereby more readily detected.
Previously performed experiments found that the entire CPF complex copurifies with FLAG-tagged Pta1 35. In theory, addition of an affinity purified CPF sample to one channel of the TMT multiplex would increase the MS1 ion intensity of CPF subunits and would “trigger” the mass spectrometer to pick peptides from CPF complex subunits more often in a DDA analysis than in samples that lack an isobaric trigger. We have previously shown that PSM level detection of affinity purified protein complexes results in highly reproducible quantitation of protein complexes in label-free quantitation workflows 38, 39. This prior work found that RNA Polymerase II complex digestions result in the generation of a number of highly detectable peptides and it is likely that this would also be the case for CPF affinity purifications 39.
If these findings hold true, there should be a significant overlap in unique peptide identifications across the independent LC-MS/MS runs for biological replicates. As shown in Fig. 4B, a significant overlap of unique peptides from CPF complex subunits was identified across the three biological replicates containing the isobaric CPF trigger (peptide data provided in Supp. Tab. 4). Due to the lower overall protein levels in the Trigger p1 sample, a higher level of unique peptide overlap was observed between Trigger p2 and p3 than between p1/p3 or p1/p2 (Fig. 4B). From an individual subunit perspective, incorporation of the isobaric Pta1-FLAG trigger channel substantially increased identification of most CPF subunits (Figure 4C-F, colored samples). While similar levels of Glc7 were detected across all samples, detection of other complex members was improved significantly in the presence of the isobaric CPF trigger channel. In fact, some CPF subunits that were not detected in no trigger samples (such as Cft1, Cft2, and Pfs2) were detected by hundreds of PSMs when the isobaric CPF trigger channel was used (Fig. 4C & D). The increased level of PSM detection was accompanied by increased normalized ion abundance (Fig. 4E & F). Overall, these data support that we can specifically increase reproducible detection and quantitation of proteins of interest for thermal profiling experiments using an isobaric affinity purified trigger channel.
Mutations in ssu72-2 do not impact the thermal stability of the CPF protein complex
The CPF complex contains two protein phosphatases, Glc7 and Ssu72. Ssu72 is an integral component of CPF and its function is required for proper termination and 3’-end processing of RNAs 59-63. Additionally, its interactions with TFIIB have been shown to be critical for the formation of gene loops, which regulate gene expression by linking transcription termination and initiation factors 64-67.
Much of the characterization of Ssu72 has been accomplished through studies using the ssu72-2 mutant yeast strain 41, 59, 65, 68. The ssu72-2 TS mutant contains a single mutation, R129A, that confers temperature sensitivity at 37°C. This mutation impairs the catalytic activity of Ssu72, leading to a decrease in transcription elongation efficiency 41, 68 and defects in gene looping 65, 67. Whether the disrupted phosphatase function in the ssu72-2 mutant affects the thermal stability of Ssu72 or the CPF complex had not been previously examined.
Figure 3. Dataset comparisons from isobaric trigger channel addition. A) Summary of LC-MS/MS data in technical and biological replicates with and without isobaric trigger channel addition. Venn diagrams comparing quantified proteins in no trigger (gray) vs. trigger (yellow/green) in B) technical replicates and C) biological replicate using trigger p2. Correlation plot of the calculated Tms in no trigger vs. trigger in D) technical replicates and E) biological replicates. The blue line represents the linear fit of the data.
Detection of CPF with and without the trigger channel resulted in similar numbers of CPF subunit PSMs in ssu72-2 as in WT, which facilitates mTPP analysis of CPF complex thermal stability from a quantitative perspective (Fig. 4C&D). Protein melt curve analysis using the TPP R package (Fig. 5A, mTPP result data in Supp. Tab. 3) showed no obvious changes in any of the 14 CPF subunits in ssu72-2 relative to WT.
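Comparing melt temperatures between strains reduces to a per-protein difference, ΔTm, and a rule for deciding which differences are notable. The sketch below illustrates a two-standard-deviation cut-off of the kind applied in the whole-proteome analysis. It is an assumption-laden illustration, not the workflow used here: the function name, the use of the sample standard deviation, and the example values are all hypothetical, and the sign convention of ΔTm is left to the caller.

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_tm_shifts(
    delta_tm: Dict[str, float], n_sd: float = 2.0
) -> Dict[str, List[str]]:
    """Flag proteins whose delta-Tm lies at least `n_sd` standard deviations
    above or below the mean delta-Tm of all supplied proteins.

    Assumption: the sample standard deviation is used. Whether a positive
    delta-Tm means stabilization or destabilization depends on how the
    caller computed the difference.
    """
    values = list(delta_tm.values())
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        # No variation at all, so nothing can stand out.
        return {"above": [], "below": []}
    upper = center + n_sd * spread
    lower = center - n_sd * spread
    return {
        "above": sorted(p for p, d in delta_tm.items() if d >= upper),
        "below": sorted(p for p, d in delta_tm.items() if d <= lower),
    }


# Hypothetical delta-Tm values (°C): twenty unchanged proteins and two outliers.
example = {f"P{i}": 0.0 for i in range(20)}
example.update({"UP": 10.0, "DOWN": -10.0})
print(flag_tm_shifts(example))
```

Because the cut-off is derived from the distribution of ΔTm values itself, a handful of very large shifts will widen the threshold. That is one reason to inspect the full distribution, for example as a waterfall plot, alongside any list of flagged proteins.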
Using all biological replicate data, we can define statistically significant changes in protein thermal stability as any ΔTm which falls at least two standard deviations above or below the average ΔTm across the three ssu72-2 replicates relative to WT. Whole proteome analysis of ΔTm using mTPP found statistically significant decreases in the thermal stability of 59 proteins and increases in the thermal stability of 69 proteins in ssu72-2 cells (Fig. 5B, Supp. Tab. 5). GO term analysis 69 of proteins that had a significant change in thermal stability in ssu72-2 showed a 2.40-fold enrichment in proteins involved in nucleobase-containing compound biosynthetic process with a p-value of 4.14e-5. These results suggest that the defects in transcription caused by disrupted catalytic activity of Ssu72 in this mutant strain are not due to impacts on the stability of Ssu72 or CPF. However, secondary effects of ssu72-2 functional disruption have been associated with changes in the Nrd1-Nab3-Sen1 complex activity which impact a variety of processes including GTP production 63, 70, 71. The temperature sensitivity of this strain is instead likely to be a result of a need for efficient transcription at higher temperatures in order to respond to heat stress72, 73. A deeper investigation into the proteins with changes in thermal stability will help to further elucidate the impacts of this catalytic mutant on gene expression.
CONCLUSIONS
The integration of an isobaric affinity-purified protein complex trigger channel increased our ability to analyze the low abundance protein complex CPF via mTPP. Our analysis did not observe major effects on the Tm estimates of unrelated proteins present in the cell. Protocols for affinity purification would need to be optimized for purity and specificity for optimal use as an isobaric trigger channel.
However, since protein complex digestion results in detection of a highly reproducible peptide population, a reasonable alternative approach could include use of a population of purified synthetic peptides or digested recombinant proteins. The use of natively expressed purifications from the system of interest, however, has distinct advantages such as native protein processing, post-translational modifications, and protein interaction partners.
Use of isobaric purified protein complex trigger channels in TPP studies, and potentially other global proteomics applications, will improve the ability to perform proteomic analysis of low abundance protein complexes and measure systems-level perturbations due to genetic variation(s). The potential for this method to be used across different organisms, even those from which it is difficult to obtain large amounts of protein, is further supported by the adaptation of BASIL for single-cell phosphoproteomics21. As many biologically relevant, as well as disease relevant, protein complexes are of relatively low abundance in the cell74, improvements in the reproducible detection of such proteins in proteomics experiments would be beneficial to increasing our understanding of the critical cellular mechanisms in normal and disease states.
Figure 4. Peptide detection and quantitation for subunits of the Cleavage and Polyadenylation Factor Complex present in the Pta1-FLAG isobaric trigger channel. A) Model of CPF adapted from Casañal et al 2017. The red star denotes the mutant protein used in these studies, ssu72-2; the white square denotes the FLAG-tagged subunit used for the trigger channel affinity purification, Pta1. B) Venn diagram showing the unique peptides detected for CPF subunits across each WT biological replicate. Number of PSMs for CPF subunits in each C) WT and D) ssu72-2 replicate experiment. Ion abundance for CPF subunits normalized to abundance of Pgk1 (x1000) in each E) WT and F) ssu72-2 replicate experiment.
Figure 5. Effects of ssu72-2 on CPF complex stability and the global proteome. A) mTPP normalized CPF subunit melt curves. Plots for each of the CPF subunits normalized by the TPP package for a representative replicate, Trigger p2. Curves shown in gray are WT and turquoise are ssu72-2. Each line represents one of the 14 CPF subunits. Replicates for A are provided in Supp. Fig. 4. B) Waterfall plots visualizing whole proteome changes in melt temperature (Tm), WT - ssu72-2. A total of 2,180 proteins were ordered according to change in Tm and plotted. Shown are median values for proteins that were quantified in at least two replicates. Dotted lines signify a confidence interval of 95%. There were significant decreases in thermal stability of 59 proteins and significant increases in thermal stability of 69 proteins. Change in Tm and median values provided in Supp. Tab. 5.
Supplementary Material
The supplementary material is available as a PDF and associated XLS tables.
AUTHOR INFORMATION
Corresponding Author
*E-mail: almosley@iu.edu Telephone: (317) 278-2350; Fax: (317) 274-4686
ORCID
Sarah A. Peck Justice: 0000-0002-0658-732X
Neil A. McCracken: 0000-0003-4897-929X
José F. Victorino: 0000-0002-5922-9526
Aruna B. Wijeratne: 0000-0001-8366-2074
Amber L. Mosley: 0000-0001-5822-2894
Present Addresses
†Department of Biology, Taylor University, Upland, Indiana, 46989, United States
‡Translational Genomics Research Institute, Phoenix, Arizona, 85005, United States
Author Contributions
S.A.P.J. designed and performed mTPP experiments on biological replicates, analyzed data, prepared the figures, and wrote the manuscript. N.A.M. performed technical replicate mTPP experiments and contributed to the manuscript. J.F.V. affinity purified CPF and confirmed purification via AP-MS (data shown elsewhere). A.B.W. contributed to the design of experiments. A.L.M. oversaw various aspects of the project, provided funding for the project, provided direction on data analysis and figure preparation, and wrote the manuscript. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
Notes
The authors declare no competing financial interests.
ACKNOWLEDGMENTS
We would like to thank the current members of the Mosley lab: Whitney Smith-Kinnaman, Katlyn Hughes Burriss, Lynn Bedard, Dominique Baldwin, H.R. Sagara Wijeratne, Gitanjali Roy, and the IUSM proteomics core: Emma Doud and Guihong Qi. A portion of the funding for this project was provided by National Institutes of Health T32 HL007910 (SAPJ) and by the Showalter Research Trust (ALM). NAM was supported in part by the Indiana University Diabetes and Obesity Research Training Program, DeVault Fellowship. This project was supported, in part, by the Indiana Clinical and Translational Sciences Institute, which is funded by Award Number UL1TR002529 from the National Institutes of Health, National Center for Advancing Translational Sciences, Clinical and Translational Sciences Award. Acquisition of the IUSM Proteomics core instrumentation used for this project was provided by the Indiana University Precision Health Initiative.
Some of the TMT reagents were graciously provided via the Thermo Scientific TMT Research Award (SAPJ). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
REFERENCES
1. Sahni, N.; Yi, S.; Taipale, M.; Fuxman Bass, J. I.; Coulombe-Huntington, J.; Yang, F.; Peng, J.; Weile, J.; Karras, G. I.; Wang, Y.; Kovacs, I. A.; Kamburov, A.; Krykbaeva, I.; Lam, M. H.; Tucker, G.; Khurana, V.; Sharma, A.; Liu, Y. Y.; Yachie, N.; Zhong, Q.; Shen, Y.; Palagi, A.; San-Miguel, A.; Fan, C.; Balcha, D.; Dricot, A.; Jordan, D. M.; Walsh, J. M.; Shah, A. A.; Yang, X.; Stoyanova, A. K.; Leighton, A.; Calderwood, M. A.; Jacob, Y.; Cusick, M. E.; Salehi-Ashtiani, K.; Whitesell, L. J.; Sunyaev, S.; Berger, B.; Barabasi, A. L.; Charloteaux, B.; Hill, D. E.; Hao, T.; Roth, F. P.; Xia, Y.; Walhout, A. J. M.; Lindquist, S.; Vidal, M., Widespread macromolecular interaction perturbations in human genetic disorders. Cell 2015, 161 (3), 647-660. 2. Huttlin, E. L.; Bruckner, R. J.; Paulo, J. A.; Cannon, J. R.; Ting, L.; Baltier, K.; Colby, G.; Gebreab, F.; Gygi, M. P.; Parzen, H.; Szpyt, J.; Tam, S.; Zarraga, G.; Pontano-Vaites, L.; Swarup, S.; White, A. E.; Schweppe, D. K.; Rad, R.; Erickson, B. K.; Obar, R. A.; Guruharsha, K. G.; Li, K.; Artavanis-Tsakonas, S.; Gygi, S. P.; Harper, J. W., Architecture of the human interactome defines protein communities and disease networks. Nature 2017, 545 (7655), 505-509. 3. Chick, J. M.; Munger, S. C.; Simecek, P.; Huttlin, E. L.; Choi, K.; Gatti, D. M.; Raghupathy, N.; Svenson, K. L.; Churchill, G. A.; Gygi, S. P., Defining the consequences of genetic variation on a proteome-wide scale. Nature 2016, 534 (7608), 500-5. 4. Gavin, A. C.; Bosche, M.; Krause, R.; Grandi, P.; Marzioch, M.; Bauer, A.; Schultz, J.; Rick, J. M.; Michon, A. M.; Cruciat, C.
M.; Remor, M.; Hofert, C.; Schelder, M.; Brajenovic, M.; Ruffner, H.; Merino, A.; Klein, K.; Hudak, M.; Dickson, D.; Rudi, T.; Gnau, V.; Bauch, A.; Bastuck, S.; Huhse, B.; Leutwein, C.; Heurtier, M. A.; Copley, R. R.; Edelmann, A.; Querfurth, E.; Rybin, V.; Drewes, G.; Raida, M.; Bouwmeester, T.; Bork, P.; Seraphin, B.; Kuster, B.; Neubauer, G.; Superti-Furga, G., Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415 (6868), 141-7. 5. Lambert, J. P.; Ivosev, G.; Couzens, A. L.; Larsen, B.; Taipale, M.; Lin, Z. Y.; Zhong, Q.; Lindquist, S.; Vidal, M.; Aebersold, R.; Pawson, T.; Bonner, R.; Tate, S.; Gingras, A. C., Mapping differential interactomes by affinity purification coupled with data-independent mass spectrometry acquisition. Nat Methods 2013, 10 (12), 1239-45. 6. Go, C. D.; Knight, J. D. R.; Rajasekharan, A.; Rathod, B.; Hesketh, G. G.; Abe, K. T.; Youn, J.-Y.; Samavarchi-Tehrani, P.; Zhang, H.; Zhu, L. Y.; Popiel, E.; Lambert, J.-P.; Coyaud, É.; Cheung, S. W. T.; Rajendran, D.; Wong, C. J.; Antonicka, H.; Pelletier, L.; Raught, B.; Palazzo, A. F.; Shoubridge, E. A.; Gingras, A.-C., A proximity biotinylation map of a human cell. bioRxiv 2019. 7. Rolland, T.; Tasan, M.; Charloteaux, B.; Pevzner, S. J.; Zhong, Q.; Sahni, N.; Yi, S.; Lemmens, I.; Fontanillo, C.; Mosca, R.; Kamburov, A.; Ghiassian, S. D.; Yang, X.; Ghamsari, L.; Balcha, D.; Begg, B. E.; Braun, P.; Brehme, M.; Broly, M. P.; Carvunis, A. R.; Convery-Zupan, D.; Corominas, R.; Coulombe-Huntington, J.; Dann, E.; Dreze, M.; Dricot, A.; Fan, C.; Franzosa, E.; Gebreab, F.; Gutierrez, B. J.; Hardy, M. F.; Jin, M.; Kang, S.; Kiros, R.; Lin, G. N.; Luck, K.; MacWilliams, A.; Menche, J.; Murray, R. R.; Palagi, A.; Poulin, M. M.; Rambout, X.; Rasla, J.; Reichert, P.; Romero, V.; Ruyssinck, E.; Sahalie, J. M.; Scholz, A.; Shah, A. A.; Sharma, A.; Shen, Y.; Spirohn, K.; Tam, S.; Tejeda, A. O.; Trigg, S. A.; Twizere, J. C.; Vega, K.; Walsh, J.; Cusick, M. 
E.; Xia, Y.; Barabasi, A. L.; Iakoucheva, L. M.; Aloy, P.; De Las Rivas, J.; Tavernier, J.; Calderwood, M. A.; Hill, D. E.; Hao, T.; Roth, F. P.; Vidal, M., A proteome-scale map of the human interactome network. Cell 2014, 159 (5), 1212-1226. 8. Aebersold, R.; Mann, M., Mass-spectrometric exploration of proteome structure and function. Nature 2016, 537 (7620), 347-55. 9. Altelaar, A. F.; Munoz, J.; Heck, A. J., Next-generation proteomics: towards an integrative view of proteome dynamics. Nat Rev Genet 2013, 14 (1), 35-48. 10. Meier, F.; Geyer, P. E.; Virreira Winter, S.; Cox, J.; Mann, M., BoxCar acquisition method enables single-shot proteomics at a depth of 10,000 proteins in 100 minutes. Nature Methods 2018, 15 (6), 440-448. 11. Potel, C. M.; Lin, M.-H.; Heck, A. J. R.; Lemeer, S., Defeating Major Contaminants in Fe3+-Immobilized Metal Ion Affinity Chromatography (IMAC) Phosphopeptide Enrichment. Molecular & Cellular Proteomics 2018, 17 (5), 1028-1034. 12. Humphrey, S. J.; Azimifar, S. B.; Mann, M., High-throughput phosphoproteomics reveals in vivo insulin signaling dynamics. Nature Biotechnology 2015, 33 (9), 990-995. 13. Specht, H.; Slavov, N., Optimizing Accuracy and Depth of Protein Quantification in Experiments Using Isobaric Carriers. J Proteome Res 2020. 14. Slavov, N., Single-cell protein analysis by mass spectrometry. Curr Opin Chem Biol 2020, 60, 1-9. 15. Zhu, Y.; Scheibinger, M.; Ellwanger, D. C.; Krey, J. F.; Choi, D.; Kelly, R. T.; Heller, S.; Barr-Gillespie, P.
G., Single-cell proteomics reveals changes in expression during hair-cell development. Elife 2019, 8. 16. Budnik, B.; Levy, E.; Harmange, G.; Slavov, N., SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol 2018, 19 (1), 161. 17. Yi, L.; Tsai, C. F.; Dirice, E.; Swensen, A. C.; Chen, J.; Shi, T.; Gritsenko, M. A.; Chu, R. K.; Piehowski, P. D.; Smith, R. D.; Rodland, K. D.; Atkinson, M. A.; Mathews, C. E.; Kulkarni, R. N.; Liu, T.; Qian, W. J., Boosting to Amplify Signal with Isobaric Labeling (BASIL) Strategy for Comprehensive Quantitative Phosphoproteomic Characterization of Small Populations of Cells. Anal Chem 2019, 91 (9), 5794-5801. 18. McAlister, G. C.; Huttlin, E. L.; Haas, W.; Ting, L.; Jedrychowski, M. P.; Rogers, J. C.; Kuhn, K.; Pike, I.; Grothe, R. A.; Blethrow, J. D.; Gygi, S. P., Increasing the multiplexing capacity of TMTs using reporter ion isotopologues with isobaric masses. Anal Chem 2012, 84 (17), 7469-78. 19. Thompson, A.; Schafer, J.; Kuhn, K.; Kienle, S.; Schwarz, J.; Schmidt, G.; Neumann, T.; Johnstone, R.; Mohammed, A. K.; Hamon, C., Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem 2003, 75 (8), 1895-904. 20. Thompson, A.; Wolmer, N.; Koncarevic, S.; Selzer, S.; Bohm, G.; Legner, H.; Schmid, P.; Kienle, S.; Penning, P.; Hohle, C.; Berfelde, A.; Martinez-Pinna, R.; Farztdinov, V.; Jung, S.; Kuhn, K.; Pike, I., TMTpro: Design, Synthesis, and Initial Evaluation of a Proline-Based Isobaric 16-Plex Tandem Mass Tag Reagent Set. Anal Chem 2019, 91 (24), 15941-15950. 21. Tsai, C. F.; Zhao, R.; Williams, S. M.; Moore, R. J.; Schultz, K.; Chrisler, W. B.; Pasa-Tolic, L.; Rodland, K. D.; Smith, R. D.; Shi, T.; Zhu, Y.; Liu, T., An Improved Boosting to Amplify Signal with Isobaric Labeling (iBASIL) Strategy for Precise Quantitative Single-cell Proteomics. Mol Cell Proteomics 2020, 19 (5), 828-838. 22. 
Chua, X. Y.; Mensah, T.; Aballo, T. J.; Mackintosh, S. G.; Edmondson, R. D.; Salomon, A. R., Tandem Mass Tag approach utilizing pervanadate BOOST channels delivers deeper quantitative characterization of the tyrosine phosphoproteome. Mol Cell Proteomics 2020, mcp.TIR119.0018. 23. Klann, K.; Tascher, G.; Munch, C., Functional Translatome Proteomics Reveal Converging and Dose-Dependent Regulation by mTORC1 and eIF2alpha. Mol Cell 2020, 77 (4), 913-925 e4. 24. Yamamoto, W. R.; Bone, R. N.; Sohn, P.; Syed, F.; Reissaus, C. A.; Mosley, A. L.; Wijeratne, A. B.; True, J. D.; Tong, X.; Kono, T.; Evans-Molina, C., Endoplasmic reticulum stress alters ryanodine receptor function in the murine pancreatic beta cell. J Biol Chem 2019, 294 (1), 168-181. 25. Savitski, M. M.; Reinhard, F. B.; Franken, H.; Werner, T.; Savitski, M. F.; Eberhard, D.; Martinez Molina, D.; Jafari, R.; Dovega, R. B.; Klaeger, S.; Kuster, B.; Nordlund, P.; Bantscheff, M.; Drewes, G., Tracking cancer drugs in living cells by thermal profiling of the proteome. Science 2014, 346 (6205), 1255784. 26. Franken, H.; Mathieson, T.; Childs, D.; Sweetman, G. M.; Werner, T.; Togel, I.; Doce, C.; Gade, S.; Bantscheff, M.; Drewes, G.; Reinhard, F. B.; Huber, W.; Savitski, M. M., Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry. Nat Protoc 2015, 10 (10), 1567-93. 27. Mateus, A.; Kurzawa, N.; Becher, I.; Sridharan, S.; Helm, D.; Stein, F.; Typas, A.; Savitski, M. M., Thermal proteome profiling for interrogating protein interactions. Mol Syst Biol 2020, 16 (3), e9232. 28. Peck Justice, S. A.; Barron, M. P.; Qi, G. D.; Wijeratne, H. R. S.; Victorino, J. F.; Simpson, E. R.; Vilseck, J. Z.; Wijeratne, A. B.; Mosley, A. L., Mutant thermal proteome profiling for characterization of missense protein variants and their associated phenotypes within the proteome. J Biol Chem 2020. 29. Batth, T. S.; Francavilla, C.; Olsen, J.
V., Off-line high-pH reversed-phase fractionation for in-depth phosphoproteomics. J Proteome Res 2014, 13 (12), 6176-86. 30. Wang, Y.; Yang, F.; Gritsenko, M. A.; Wang, Y.; Clauss, T.; Liu, T.; Shen, Y.; Monroe, M. E.; Lopez-Ferrer, D.; Reno, T.; Moore, R. J.; Klemke, R. L.; Camp, D. G., 2nd; Smith, R. D., Reversed-phase chromatography with multiple fraction concatenation strategy for proteome profiling of human MCF10A cells. Proteomics 2011, 11 (10), 2019-26. 31. Mertins, P.; Tang, L. C.; Krug, K.; Clark, D. J.; Gritsenko, M. A.; Chen, L.; Clauser, K. R.; Clauss, T. R.; Shah, P.; Gillette, M. A.; Petyuk, V. A.; Thomas, S. N.; Mani, D. R.; Mundt, F.; Moore, R. J.; Hu, Y.; Zhao, R.; Schnaubelt, M.; Keshishian, H.; Monroe, M. E.; Zhang, Z.; Udeshi, N. D.; Mani, D.; Davies, S. R.; Townsend, R. R.; Chan, D. W.; Smith, R. D.; Zhang, H.; Liu, T.; Carr, S. A., Reproducible workflow for multiplexed deep-scale proteome and phosphoproteome analysis of tumor tissues by liquid chromatography-mass spectrometry. Nat Protoc 2018, 13 (7), 1632-1661. 32. Hogrebe, A.; von Stechow, L.; Bekker-Jensen, D. B.; Weinert, B. T.; Kelstrup, C. D.; Olsen, J. V., Benchmarking common quantification strategies for large-scale phosphoproteomics. Nat Commun 2018, 9 (1), 1045. 33. Gilar, M.; Olivova, P.; Daly, A. E.; Gebler, J. C., Orthogonality of separation in two-dimensional liquid chromatography. Anal Chem 2005, 77 (19), 6426-34. 34. Ludwig, K. R.; Schroll, M. M.; Hummon, A. B., Comparison of In-Solution, FASP, and S-Trap Based Digestion Methods for Bottom-Up Proteomic Studies. J Proteome Res 2018, 17 (7), 2480-2490. 35. Victorino, J. F.; Fox, M. J.; Smith-Kinnaman, W. R.; Peck Justice, S. A.; Burriss, K. H.; Boyd, A. K.; Zimmerly, M. A.; Chan, R. R.; Hunter, G. O.; Liu, Y.; Mosley, A. L., RNA Polymerase II CTD phosphatase Rtr1 fine-tunes transcription termination. PLoS Genet 2020, 16 (3), e1008317. 36. Bedard, L. G.; Dronamraju, R.; Kerschner, J. L.; Hunter, G. O.; Axley, E. D.; Boyd, A. 
K.; Strahl, B. D.; Mosley, A. L., Quantitative Analysis of Dynamic Protein Interactions during Transcription Reveals a Role for Casein Kinase II in Polymerase-associated Factor (PAF) Complex Phosphorylation and Regulation of Histone H2B Monoubiquitylation. J Biol Chem 2016, 291 (26), 13410-20. 37. Smith-Kinnaman, W. R.; Berna, M. J.; Hunter, G. O.; True, J. D.; Hsu, P.; Cabello, G. I.; Fox, M. J.; Varani, G.; Mosley, A. L., The interactome of the atypical phosphatase Rtr1 in Saccharomyces cerevisiae. Mol Biosyst 2014, 10 (7), 1730-41. 38. Mosley, A. L.; Hunter, G. O.; Sardiu, M. E.; Smolle, M.; Workman, J. L.; Florens, L.; Washburn, M. P., Quantitative proteomics demonstrates that the RNA polymerase II subunits Rpb4 and Rpb7 dissociate during transcriptional elongation. Mol Cell Proteomics 2013, 12 (6), 1530-8. 39. Mosley, A. L.; Sardiu, M. E.; Pattenden, S. G.; Workman, J. L.; Florens, L.; Washburn, M. P., Highly reproducible label free quantitative proteomic analysis of RNA polymerase complexes. Mol Cell Proteomics 2011, 10 (2), M110 000687. 40. McGinty, R. J.; Puleo, F.; Aksenova, A. Y.; Hisey, J. A.; Shishkin, A. A.; Pearson, E. L.; Wang, E. T.; Housman, D. E.; Moore, C.; Mirkin, S. M., A Defective mRNA Cleavage and Polyadenylation Complex Facilitates Expansions of Transcribed (GAA)n Repeats Associated with Friedreich's Ataxia. Cell Rep 2017, 20 (10), 2490-2500. 41. Pappas, D. L.; Hampsey, M., Functional Interaction between Ssu72 and the Rpb2 Subunit of RNA Polymerase II in Saccharomyces cerevisiae. 2000, 20 (22), 8343-8351. 42. Funakoshi, M.; Hochstrasser, M., Small epitope-linker modules for PCR-based C-terminal tagging in Saccharomyces cerevisiae. Yeast 2009, 26 (3), 185-192. 43. Perez-Riverol, Y.; Csordas, A.; Bai, J.; Bernal-Llinares, M.; Hewapathirana, S.; Kundu, D.
J.; Inuganti, A.; Griss, J.; Mayer, G.; Eisenacher, M.; Pérez, E.; Uszkoreit, J.; Pfeuffer, J.; Sachsenberg, T.; Yılmaz, Ş.; Tiwary, S.; Cox, J.; Audain, E.; Walzer, M.; Jarnuczak, A. F.; Ternent, T.; Brazma, A.; Vizcaíno, J. A., The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Research 2019, 47 (D1), D442-D450. 44. Oliveros, J. C., Venny. An interactive tool for comparing lists with Venn's diagrams. 2007-2015. 45. Wickham, H. ggplot2: Elegant Graphics for Data Analysis, Springer-Verlag New York: 2016. 46. Childs D, K. N., Franken H, Doce C, Savitski M, Huber W TPP: Analyze thermal proteome profiling (TPP) experiments, 3.10.0; 2018. 47. Walmsley, S. J.; Rudnick, P. A.; Liang, Y.; Dong, Q.; Stein, S. E.; Nesvizhskii, A. I., Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics. J Proteome Res 2013, 12 (12), 5666-80. 48. Burkhart, J. M.; Schumbrutzki, C.; Wortelkamp, S.; Sickmann, A.; Zahedi, R. P., Systematic and quantitative comparison of digest efficiency and specificity reveals the impact of trypsin quality on MS-based proteomics. J Proteomics 2012, 75 (4), 1454-62. 49. Chen, J.; Moore, C., Separation of factors required for cleavage and polyadenylation of yeast pre-mRNA. 1992, 12 (8), 3470-3481. 50. Kessler, M. M.; Zhao, J.; Moore, C. L., Purification of the Saccharomyces cerevisiae cleavage/polyadenylation factor I.
Separation into two components that are required for both cleavage and polyadenylation of mRNA 3' ends. J Biol Chem 1996, 271 (43), 27167-75. 51. Proudfoot, N. J., Transcriptional termination in mammals: Stopping the RNA polymerase II juggernaut. Science 2016, 352 (6291), aad9926. 52. Eaton, J. D.; Davidson, L.; Bauer, D. L. V.; Natsume, T.; Kanemaki, M. T.; West, S., Xrn2 accelerates termination by RNA polymerase II, which is underpinned by CPSF73 activity. Genes Dev 2018, 32 (2), 127-139. 53. Casanal, A.; Kumar, A.; Hill, C. H.; Easter, A. D.; Emsley, P.; Degliesposti, G.; Gordiyenko, Y.; Santhanam, B.; Wolf, J.; Wiederhold, K.; Dornan, G. L.; Skehel, M.; Robinson, C. V.; Passmore, L. A., Architecture of eukaryotic mRNA 3'-end processing machinery. Science 2017, 358 (6366), 1056-1059. 54. Feng, Z. H.; Wilson, S. E.; Peng, Z. Y.; Schlender, K. K.; Reimann, E. M.; Trumbly, R. J., The Yeast Glc7-Gene Required for Glycogen Accumulation Encodes a Type-1 Protein Phosphatase. Journal of Biological Chemistry 1991, 266 (35), 23796-23801. 55. Martín, R.; Stonyte, V.; Lopez-Aviles, S., Protein Phosphatases in G1 Regulation. International Journal of Molecular Sciences 2020, 21 (2), 395. 56. Moura, M.; Conde, C., Phosphatases in Mitosis: Roles and Regulation. Biomolecules 2019, 9 (2), 55. 57. Tu, J.; Carlson, M., The GLC7 type 1 protein phosphatase is required for glucose repression in Saccharomyces cerevisiae. Mol Cell Biol 1994, 14 (10), 6789-96. 58. Ramaswamy, N. T.; Li, L.; Khalil, M.; Cannon, J. F., Regulation of yeast glycogen metabolism and sporulation by Glc7p protein phosphatase. Genetics 1998, 149 (1), 57-72. 59. Dichtl, B.; Blank, D.; Ohnacker, M.; Friedlein, A.; Roeder, D.; Langen, H.; Keller, W., A Role for SSU72 in Balancing RNA Polymerase II Transcription Elongation and Termination. Molecular Cell 2002, 10 (5), 1139-1150. 60. Nedea, E.; He, X.; Kim, M.; Pootoolal, J.; Zhong, G.; Canadien, V.; Hughes, T.; Buratowski, S.; Moore, C. 
L.; Greenblatt, J., Organization and Function of APT, a Subcomplex of the Yeast Cleavage and Polyadenylation Factor Involved in the Formation of mRNA and Small Nucleolar RNA 3'-Ends. 2003, 278 (35), 33000-33010. 61. He, X.; Khan, A. U.; Cheng, H.; Pappas, D. L., Jr.; Hampsey, M.; Moore, C. L., Functional interactions between the transcription and mRNA 3' end processing machineries mediated by Ssu72 and Sub1. Genes Dev 2003, 17 (8), 1030-42. 62. Steinmetz, E. J.; Brow, D. A., Ssu72 Protein Mediates Both Poly(A)-Coupled and Poly(A)-Independent Termination of RNA Polymerase II Transcription. 2003, 23 (18), 6339-6349. 63. Zhang, D. W.; Mosley, A. L.; Ramisetty, S. R.; Rodriguez- Molina, J. B.; Washburn, M. P.; Ansari, A. Z., Ssu72 phosphatase- dependent erasure of phospho-Ser7 marks on the RNA polymerase II C- terminal domain is essential for viability and transcription termination. J Biol Chem 2012, 287 (11), 8541-51. 64. Ansari, A.; Hampsey, M., A role for the CPF 3'-end processing machinery in RNAP II-dependent gene looping. Genes Dev 2005, 19 (24), 2969-78. 65. Allepuz-Fuster, P.; O'Brien, M. J.; Gonzalez-Polo, N.; Pereira, B.; Dhoondia, Z.; Ansari, A.; Calvo, O., RNA polymerase II plays an active role in the formation of gene loops through the Rpb4 subunit. Nucleic Acids Res 2019, 47 (17), 8975-8987. 66. Singh, B. N.; Hampsey, M., A transcription-independent role for TFIIB in gene looping. Mol Cell 2007, 27 (5), 806-16. 67. Tan-Wong, S. M.; Zaugg, J. B.; Camblong, J.; Xu, Z.; Zhang, D. W.; Mischo, H. E.; Ansari, A. Z.; Luscombe, N. M.; Steinmetz, L. M.; Proudfoot, N. J., Gene loops enhance transcriptional directionality. Science 2012, 338 (6107), 671-5. 68. Reyes-Reyes, M.; Hampsey, M., Role for the Ssu72 C-terminal domain phosphatase in RNA polymerase II transcription elongation. Mol Cell Biol 2007, 27 (3), 926-36. 69. Mi, H.; Huang, X.; Muruganujan, A.; Tang, H.; Mills, C.; Kang, D.; Thomas, P. 
D., PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res 2017, 45 (D1), D183-D189. 70. Ganem, C.; Devaux, F.; Torchet, C.; Jacq, C.; Quevillon- Cheruel, S.; Labesse, G.; Facca, C.; Faye, G., Ssu72 is a phosphatase essential for transcription termination of snoRNAs and specific mRNAs in yeast. EMBO J 2003, 22 (7), 1588-98. 71. Loya, T. J.; O'Rourke, T. W.; Reines, D., A genetic screen for terminator function in yeast identifies a role for a new functional domain in termination factor Nab3. Nucleic Acids Res 2012, 40 (15), 7476-91. 72. Mahat, D. B.; Salamanca, H. H.; Duarte, F. M.; Danko, C. G.; Lis, J. T., Mammalian Heat Shock Response and Mechanisms Underlying Its Genome-wide Transcriptional Regulation. Mol Cell 2016, 62 (1), 63-78. 73. Duarte, F. M.; Fuda, N. J.; Mahat, D. B.; Core, L. J.; Guertin, M. J.; Lis, J. T., Transcription factors GAF and HSF act at distinct regulatory steps to modulate stress-induced gene activation. Genes Dev 2016, 30 (15), 1731-46. 74. Ho, B.; Baryshnikova, A.; Brown, G. W., Unification of Protein Abundance Datasets Yields a Quantitative Saccharomyces cerevisiae Proteome. Cell Syst 2018, 6 (2), 192-205 e3. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2020. 
10_1101-2020_12_31_424931 ---- Structural insights into Cullin4-RING ubiquitin ligase remodelling by Vpr from simian immunodeficiency viruses

Sofia Banchenko1¶, Ferdinand Krupp1¶, Christine Gotthold1, Jörg Bürger1,2, Andrea Graziadei3, Francis O’Reilly3, Ludwig Sinn3, Olga Ruda1, Juri Rappsilber3,4, Christian M. T. Spahn1, Thorsten Mielke2, Ian A. Taylor5, David Schwefel1*

1 Institute of Medical Physics and Biophysics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
2 Microscopy and Cryo-Electron Microscopy Service Group, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
3 Bioanalytics Unit, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
4 Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
5 Macromolecular Structure Laboratory, The Francis Crick Institute, London, United Kingdom

*Corresponding author
E-mail: david.schwefel@charite.de (DS)

¶These authors contributed equally to this work

bioRxiv preprint doi: https://doi.org/10.1101/2020.12.31.424931; this version posted January 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract
Viruses have evolved means to manipulate the host’s ubiquitin-proteasome system, in order to down-regulate antiviral host factors.
The Vpx/Vpr family of lentiviral accessory proteins usurps the substrate receptor DCAF1 of host Cullin4-RING ligases (CRL4), a family of modular ubiquitin ligases involved in DNA replication, DNA repair and cell cycle regulation. CRL4DCAF1 specificity modulation by Vpx and Vpr from certain simian immunodeficiency viruses (SIV) leads to recruitment, poly-ubiquitylation and subsequent proteasomal degradation of the host restriction factor SAMHD1, resulting in enhanced virus replication in differentiated cells. To unravel the mechanism of SIV Vpr-induced SAMHD1 ubiquitylation, we conducted integrative biochemical and structural analyses of the Vpr protein from SIVs infecting Cercopithecus cephus (SIVmus). X-ray crystallography reveals commonalities between SIVmus Vpr and other members of the Vpx/Vpr family with regard to DCAF1 interaction, while cryo-electron microscopy and cross-linking mass spectrometry highlight a divergent molecular mechanism of SAMHD1 recruitment. In addition, these studies demonstrate how SIVmus Vpr exploits the dynamic architecture of the multi-subunit CRL4DCAF1 assembly to optimise SAMHD1 ubiquitylation. Together, the present work provides detailed molecular insight into variability and species-specificity of the evolutionary arms race between host SAMHD1 restriction and lentiviral counteraction through Vpx/Vpr proteins.

Author summary
Due to the limited size of virus genomes, virus replication critically relies on host cell components.
In addition to the host cell’s energy metabolism and its DNA replication and protein synthesis apparatus, the protein degradation machinery is an attractive target for viral re-appropriation. Certain viral factors divert the specificity of host ubiquitin ligases to antiviral host factors, in order to mark them for destruction by the proteasome and thereby lift intracellular barriers to virus replication. Here, we present molecular details of how the simian immunodeficiency virus accessory protein Vpr interacts with a substrate receptor of host Cullin4-RING ubiquitin ligases, and how this interaction redirects the specificity of Cullin4-RING to the antiviral host factor SAMHD1. The studies uncover the mechanism of Vpr-induced SAMHD1 recruitment and subsequent ubiquitylation. Moreover, by comparison to related accessory proteins from other immunodeficiency virus species, we illustrate the surprising variability in the molecular strategies of SAMHD1 counteraction, which these viruses adopted during evolutionary adaptation to their hosts. Lastly, our work also provides deeper insight into the inner workings of the host’s Cullin4-RING ubiquitylation machinery.

Introduction
A large proportion of viruses have evolved means to co-opt their host’s ubiquitylation machinery, in order to improve replication conditions, either by introducing viral ubiquitin ligases and deubiquitinases, or by modification of host proteins involved in ubiquitylation [1-3]. In particular, host ubiquitin ligases are a prominent target for viral usurpation, to redirect specificity towards antiviral host restriction factors.
This results in recruitment of restriction factors as non-endogenous neo-substrates, inducing their poly-ubiquitylation and subsequent proteasomal degradation [4-8]. This counteraction of the host’s antiviral repertoire is essential for virus infectivity and spread [9-12], and mechanistic insights into these specificity changes extend our understanding of viral pathogenesis and might pave the way for novel treatments.

Frequently, virally encoded modifying proteins associate with and adapt the Cullin4-RING ubiquitin ligases (CRL4) [5]. CRL4 consists of a Cullin4 (CUL4) scaffold that bridges the catalytic RING-domain subunit ROC1 to the adaptor protein DDB1, which in turn binds to exchangeable substrate receptors (DCAFs, DDB1- and CUL4-associated factors) [13-17]. In some instances, the DDB1 adaptor serves as an anchor for virus proteins, which then act as “viral DCAFs” to recruit the antiviral substrate. Examples are the simian virus 5 V protein and mouse cytomegalovirus M27, which bind to DDB1 and recruit STAT1/2 proteins for ubiquitylation, in order to interfere with the host’s interferon response [18-20]. Similarly, CUL4-dependent downregulation of STAT signalling is important for West Nile virus replication [21]. In addition, the hepatitis B virus X protein hijacks DDB1 to induce proteasomal destruction of the structural maintenance of chromosomes (SMC) complex to promote virus replication [22, 23].

Viral factors also bind to and modify DCAF receptors in order to redirect them to antiviral substrates. Prime examples are the lentiviral accessory proteins Vpr and Vpx. All contemporary human and simian immunodeficiency viruses (HIV/SIV) encode Vpr, while only two lineages, represented by HIV-2 and SIV infecting mandrills, carry Vpx [24]. Vpr and Vpx proteins are packaged into progeny virions and released into the host cell upon infection, where they bind to DCAF1 in the nucleus [25].
In this work, corresponding simian immunodeficiency virus Vpx/Vpr proteins will be indicated with their host species as subscript, with the following abbreviations used: mus – moustached monkey (Cercopithecus cephus), mnd – mandrill (Mandrillus sphinx), rcm – red-capped mangabey (Cercocebus torquatus), sm – sooty mangabey (Cercocebus atys), deb – De Brazza’s monkey (Cercopithecus neglectus), syk – Sykes’ monkey (Cercopithecus albogularis), agm – African green monkey (Chlorocebus spec.).

VprHIV-1 is important for virus replication in vivo and in macrophage infection models [26]. Recent proteomic analyses revealed that DCAF1 specificity modulation by VprHIV-1 proteins results in down-regulation of hundreds of host proteins in a DCAF1- and proteasome-dependent manner [27], including the previously reported VprHIV-1 degradation targets UNG2 [28], HLTF [29], MUS81 [30, 31], MCM10 [32] and TET2 [33]. This surprising promiscuity in degradation targets is also partially conserved in more distant clades exemplified by Vpragm and Vprmus [27]. However, Vpr pleiotropy, and the lack of easily accessible experimental models, have prevented a characterisation of how these degradation events precisely promote replication [26].

By contrast, Vpx exhibits a much narrower substrate range. It has recently been reported to target stimulator of interferon genes (STING) and components of the human silencing hub (HUSH) complex for degradation, leading to inhibition of antiviral cGAS-STING-mediated signalling and reactivation of latent proviruses, respectively [34-36].
Importantly, Vpx also recruits the SAMHD1 restriction factor to DCAF1, in order to mark it for proteasomal destruction [37, 38]. SAMHD1 is a deoxynucleotide triphosphate (dNTP) triphosphohydrolase that restricts retroviral replication in non-dividing cells by lowering the dNTP pool to levels that cannot sustain viral reverse transcription [39-46]. Retroviruses that express Vpx are able to alleviate SAMHD1 restriction and allow replication in differentiated myeloid lineage cells, resting T cells and memory T cells [38, 47, 48]. As a result of the constant evolutionary arms race between the host’s SAMHD1 restriction and its viral antagonist Vpx, the mechanism of Vpx-mediated SAMHD1 recruitment is highly virus species- and strain-specific: The Vpx clade represented by VpxHIV-2 recognises the SAMHD1 C-terminal domain (CtD), while Vpxmnd2/rcm binds the SAMHD1 N-terminal domain (NtD) in a fundamentally different way [24, 49-52].

In the course of evolutionary adaptation to their primate hosts, and due to selective pressure to evade SAMHD1 restriction, two groups of SIVs that do not have Vpx, SIVagm, and SIVdeb/mus/syk, neo-functionalised their Vpr to bind SAMHD1 and induce its degradation [24, 49, 53]. Consequently, these species evolved “hybrid” Vpr proteins that retain targeting of some host factors depleted by HIV-1-type Vpr [27], and additionally induce SAMHD1 degradation.

To uncover the molecular mechanisms of DCAF1- and SAMHD1-interaction of such a “hybrid” Vpr, we initiated integrative biochemical and structural analyses of the Vpr protein from an SIV infecting Cercopithecus cephus, Vprmus.
These studies reveal similarities and differences to Vpx and Vpr proteins from other lentivirus species and pinpoint the divergent molecular mechanism of Vprmus-dependent SAMHD1 recruitment to CRL4DCAF1. Furthermore, cryo-electron microscopic (cryo-EM) reconstructions of a Vprmus-modified CRL4DCAF1 protein complex allow for insights into the structural plasticity of the entire CRL4 ubiquitin ligase assembly, with implications for the ubiquitin transfer mechanism.

Results
SAMHD1-CtD is necessary and sufficient for Vprmus-binding and ubiquitylation in vitro
To investigate the molecular interactions between Vprmus, the neo-substrate SAMHD1 from rhesus macaque and CRL4 subunits DDB1/DCAF1 C-terminal domain (DCAF1-CtD), protein complexes were reconstituted in vitro from purified components and analysed by gel filtration (GF) chromatography. The different protein constructs that were employed are shown schematically in S1A Fig. Vprmus is insoluble after removal of the GST affinity purification tag (S1B Fig) and accordingly could not be applied to the GF column. No interaction of SAMHD1 with DDB1/DCAF1-CtD could be detected in the absence of Vprmus (S1C Fig). Analysis of binary protein combinations (Vprmus and DDB1/DCAF1-CtD; Vprmus and SAMHD1) shows that Vprmus elutes in a single peak together with DDB1/DCAF1-CtD (S1D Fig) or with SAMHD1 (S1E Fig). Incubation of Vprmus with DDB1/DCAF1-CtD and SAMHD1 followed by GF resulted in elution of all three components in a single peak (Fig 1A, B, red trace). Together, these results show that Vprmus forms stable binary and ternary protein complexes with DDB1/DCAF1-CtD and/or SAMHD1 in vitro. Furthermore, incubation with any of these interaction partners apparently stabilises Vprmus by alleviating its tendency for aggregation/insolubility.
Previous cell-based assays indicated that residues 583-626 of rhesus macaque SAMHD1 (SAMHD1-CtD) are necessary for Vprmus-induced proteasomal degradation [49]. To test this finding in our in vitro system, constructs containing SAMHD1-CtD fused to T4 lysozyme (T4L-SAMHD1-CtD), or lacking SAMHD1-CtD (SAMHD1-ΔCtD, Fig 1A), were incubated with Vprmus and DDB1/DCAF1-CtD, and complex formation was assessed by GF chromatography. Analysis of the resulting chromatograms by SDS-PAGE shows that SAMHD1-ΔCtD did not co-elute with DDB1/DCAF1-CtD/Vprmus (Fig 1A, B, green trace). By contrast, T4L-SAMHD1-CtD accumulated in a single peak, which also contained DDB1/DCAF1-CtD and Vprmus (Fig 1A, B, cyan trace). These results confirm that SAMHD1-CtD is necessary for stable association with DDB1/DCAF1-CtD/Vprmus in vitro, and demonstrate that SAMHD1-CtD is sufficient for Vprmus-mediated recruitment of the T4L-SAMHD1-CtD fusion construct to DDB1/DCAF1-CtD.

To correlate these data with enzymatic activity, in vitro ubiquitylation assays were conducted by incubating SAMHD1, SAMHD1-ΔCtD or T4L-SAMHD1-CtD with purified CRL4DCAF1-CtD, E1 (UBA1), E2 (UBCH5C), ubiquitin and ATP. Input proteins are shown in S2A Fig, and control reactions in S2B, C Fig. In the absence of Vprmus, no SAMHD1 ubiquitylation was observed (Figs 1C and S2D), while addition of Vprmus resulted in robust SAMHD1 ubiquitylation (Figs 1D and S2E).
In agreement with the analytical GF data, SAMHD1-ΔCtD was not ubiquitylated in the presence of Vprmus (Figs 1E and S2F), while T4L-SAMHD1-CtD was ubiquitylated with kinetics similar to those of the full-length protein (Figs 1F and S2F). Again, these data substantiate the functional importance of SAMHD1-CtD for Vprmus-mediated recruitment to the CRL4DCAF1 ubiquitin ligase.

Crystal structure analysis of apo- and Vprmus-bound DDB1/DCAF1-CtD protein complexes
To obtain structural information regarding Vprmus and its mode of binding to the CRL4 substrate receptor DCAF1, the X-ray crystal structures of a DDB1/DCAF1-CtD complex, and of a DDB1/DCAF1-CtD/T4L-Vprmus (residues 1-92) fusion protein ternary complex, were determined. The structures were solved using molecular replacement and refined to resolutions of 3.1 Å and 2.5 Å, respectively (S1 Table). Vprmus adopts a three-helix bundle fold, stabilised by coordination of a zinc ion by His and Cys residues on Helix-1 and at the C-terminus (Fig 2A). Superposition of Vprmus with previously determined Vpxsm [50], Vpxmnd2 [51, 52], and VprHIV-1 [54] structures reveals a conserved three-helix bundle fold, and similar position of the helix bundles on DCAF1-CtD (S3A Fig). In addition, the majority of side chains involved in DCAF1-interaction are type-conserved in all Vpx and Vpr proteins (Figs S3B-G and S6A), strongly suggesting a common molecular mechanism of host CRL4-DCAF1 hijacking by the Vpx/Vpr family of accessory proteins.
However, there are also significant differences in helix length and register as well as conformational variation in the loop region N-terminal of Helix-1, at the start of Helix-1 and in the loop between Helices-2 and -3 (S3A Fig).

Vprmus binds to the side and on top of the disk-shaped 7-bladed β-propeller (BP) DCAF1-CtD domain with a total contact surface area of ~1600 Å² comprising three major regions of interaction. The extended Vprmus N-terminus attaches to the cleft between DCAF1 BP blades 1 and 2 through several hydrogen bonds, electrostatic and hydrophobic interactions (S3B-D Fig). A second, smaller contact area is formed by hydrophobic interaction between Vprmus residues L31 and E34 from Helix-1, and DCAF1 W1156, located in a loop on top of BP blade 2 (S3E Fig). The third interaction surface comprises the C-terminal half of Vprmus Helix-3, which inserts into a ridge on top of DCAF1 (S3F, G Fig).

Superposition of the apo-DDB1/DCAF1-CtD and Vprmus-bound crystal structures reveals conformational changes in DCAF1 upon Vprmus association. Binding of the N-terminal arm of Vprmus induces only a minor rearrangement of a loop in BP blade 2 (S3C Fig). By contrast, significant structural changes occur on the upper surface of the BP domain: polar and hydrophobic interactions of DCAF1 residues P1329, F1330, F1355, N1371, L1378, M1380 and T1382 with Vprmus side chains of T79, R83, R86 and E87 in Helix-3 result in the stabilisation of the sequence stretch that connects BP blades 6 and 7 (“C-terminal loop”, Figs 2B and S3F). Moreover, side chain electrostatic interactions of Vprmus residues R15, R75 and R76 with DCAF1 E1088, E1091 and E1093 lock the conformation of an “acidic loop” upstream of BP blade 1, which is also unstructured and flexible in the absence of Vprmus (Figs 2B, C and S3D, F).
Notably, in previously determined structures of Vpx/DCAF1/SAMHD1 complexes the “acidic loop” is a central point of ternary contact, providing a binding platform for positively charged amino acid side chains in either the SAMHD1 N- or C-terminus [50-52]. For example, Vpxsm positions SAMHD1-CtD in such a way that SAMHD1 K622 engages in electrostatic interaction with the DCAF1 “acidic loop” residue D1092 (Fig 2C, left panel). However, in the Vprmus crystal structure the bound Vprmus now blocks access to the corresponding SAMHD1-CtD binding pocket, in particular by the positioning of an extended N-terminal loop that precedes Helix-1. Additionally, Vprmus side chains R15, R75 and R76 neutralise the DCAF1 “acidic loop”, precluding the formation of further salt bridges to basic residues in SAMHD1-CtD (Fig 2C, right panel).

To validate the importance of Vprmus residues R15 and R75 for DCAF1-CtD- and SAMHD1-binding, charge reversal mutations to glutamates were generated by site-directed mutagenesis. The effect of the Vprmus R15E R75E double mutant on complex assembly was then analysed by GF chromatography. SDS-PAGE analysis of the resulting chromatographic profile shows an almost complete loss of the DDB1/DCAF1-CtD/Vprmus/SAMHD1 complex peak (Fig 2D, fraction 6), when compared to the wild type, concomitant with enrichment of (i) Vprmus R15E R75E-bound DDB1/DCAF1-CtD (Fig 2D, fractions 7-8), and of (ii) Vprmus R15E R75E/SAMHD1 binary complex (Fig 2D, fractions 8-9).
This suggests that charge reversal of Vprmus side chains R15 and R75 weakens the strong association with DCAF1 observed in wild type Vprmus, due to loss of electrostatic interaction with the “acidic loop”, in accordance with the crystal structure. Consequently, some proportion of Vpr-bound SAMHD1 dissociates, further indicating that Vprmus side chains R15 and R75 are not central to SAMHD1 interaction.

Molecular mechanism of SAMHD1-targeting
To obtain mechanistic insight into Vprmus-recruitment of SAMHD1-CtD, we initiated cryo-EM analyses of the CRL4DCAF1-CtD/Vprmus/SAMHD1 assembly. In these studies, the small ubiquitin-like protein NEDD8 was enzymatically attached to the CUL4 subunit, in order to obtain its active form (S4A Fig) [55]. A CRL4-NEDD8DCAF1-CtD/Vprmus/SAMHD1 complex was reconstituted in vitro and purified by GF chromatography (S4B Fig). Extensive 2D and 3D classification of the resulting particle images revealed considerable conformational heterogeneity, especially regarding the position of the CUL4-NEDD8/ROC1 subcomplex (stalk) relative to DDB1/DCAF1/Vprmus (core) (S4 Fig).

Nevertheless, a homogeneous particle population could be separated, which yielded a 3D reconstruction at a nominal resolution of 7.3 Å that contained electron density corresponding to the core (S4C-F Fig). Molecular models of DDB1 BP domains A and C (BPA, BPC), DCAF1-CtD and Vprmus, derived from our crystal structure (Fig 2), could be fitted as rigid bodies into this cryo-EM volume (Fig 3A). No obvious electron density was visible for the bulk of SAMHD1.
However, close inspection revealed an additional tubular, slightly arcing density feature, approx. 35 Å in length, located on the upper surface of the Vprmus helix bundle, approximately 17 Å away from and opposite the Vprmus/DCAF1-CtD binding interface (Fig 3A, red arrows). One end of the tubular volume contacts the middle of Vprmus Helix-1, and the other end forms additional contacts to the C-terminus of Helix-2 and the N-terminus of Helix-3. A local resolution of 7.5-8 Å precluded the fitting of an atomic model. Considering the biochemical data, showing that SAMHD1-CtD is sufficient for recruitment to DDB1/DCAF1/Vprmus, we hypothesise that this observed electron density feature corresponds to the region of SAMHD1-CtD which physically interacts with Vprmus. Given its dimensions, the putative SAMHD1-CtD density could accommodate approx. 10 amino acid residues in a fully extended conformation or up to 23 residues in a kinked helical arrangement. All previous crystal structure analyses [46], as well as secondary structure predictions, indicate that SAMHD1 residues C-terminal to the catalytic HD domain and C-terminal lobe (amino acids 599-626) are disordered in the absence of additional binding partners. Accordingly, the globular domains of the SAMHD1 molecule might be flexibly linked to the C-terminal tether identified here. In that case, the bulk of SAMHD1 samples a multitude of positions relative to the DDB1/DCAF1-CtD/Vprmus core, and consequently is averaged out in the process of cryo-EM reconstruction.

The topology of CRL4DCAF1-CtD/Vprmus/SAMHD1 and the binding region of SAMHD1-CtD were further assessed by cross-linking mass spectrometry (CLMS) using the photo-reactive cross-linker sulfo-SDA [56].
A large number of cross-links between SAMHD1 and the C-terminal half of CUL4, the side and top of DCAF1-CtD, and BP blades 6-7 of DDB1 were found, consistent with highly variable positioning of the SAM and HD domains of SAMHD1 relative to the CRL4 core (Fig 3B). Moreover, multiple cross-links between SAMHD1-CtD and Vprmus were observed, more specifically locating to a sequence stretch comprising the C-terminal half of Vprmus Helix-1 (residues A27-E36), and to a portion of the disordered Vprmus C-terminus (residues Y90, Y100). These data are in accordance with the presence of SAMHD1-CtD in the unassigned cryo-EM density and its role as Vprmus tether. The remaining SAMHD1-CtD cross-links were with the C-terminus of CUL4 and the “acidic loop” of DCAF1 (Fig 3B). Distance restraints from these SAMHD1-CtD cross-links, together with our structural models of CRL4DCAF1-CtD/Vprmus (see below), were employed to visualise the interaction space accessible to the centre of mass of SAMHD1-CtD. This analysis is compatible with recruitment of SAMHD1-CtD on top of the Vprmus helix bundle as indicated by cryo-EM (Fig 3C). Interestingly, cross-links to Vprmus were restricted to the C-terminal end of SAMHD1-CtD (residues K622, K626), while cross-links to CUL4 and DCAF1 were found in the N-terminal portion (residues K595, K596, T602-S606). These observations are consistent with a model where the very C-terminus of SAMHD1 is immobilised on Vprmus, and SAMHD1-CtD residues further upstream are exposed to the catalytic machinery surrounding the CUL4 C-terminal domain.
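
The residue estimate quoted for the putative SAMHD1-CtD density (approx. 10 residues if fully extended, up to 23 if helical) can be reproduced with simple arithmetic. The Python sketch below divides the approx. 35 Å length of the density by an assumed axial rise per residue. The rise values (about 3.4 Å for a fully extended chain and 1.5 Å for an α-helix) are textbook approximations rather than values stated in the manuscript, and the function name is purely illustrative.

```python
# Rough estimate of how many residues fit along a tubular density of a given length.
# Assumed axial rise per residue (textbook approximations, not from the manuscript):
RISE_EXTENDED_A = 3.4  # fully extended (beta-strand-like) chain, in Å
RISE_HELIX_A = 1.5     # alpha-helix, in Å


def residues_spanning(length_a: float, rise_a: float) -> int:
    """Return the whole number of residues whose combined rise fits in length_a."""
    if length_a < 0 or rise_a <= 0:
        raise ValueError("length must be non-negative and rise must be positive")
    return int(length_a // rise_a)


density_length_a = 35.0  # approximate length of the density feature given in the text
n_extended = residues_spanning(density_length_a, RISE_EXTENDED_A)
n_helix = residues_spanning(density_length_a, RISE_HELIX_A)
print(f"extended: ~{n_extended} residues, helical: up to ~{n_helix} residues")
```

Under these assumptions the two limits bracket the range given in the text; a kinked or partially helical segment would fall somewhere in between.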
254 To further probe the interaction, Vprmus amino acid residues in close proximity to the putative SAMHD1-255 CtD density were substituted by site-directed mutagenesis. Specifically, Vprmus W29 was changed to 256 alanine to block a hydrophobic contact with SAMHD1-CtD involving the aromatic side chain, and 257 Vprmus A66 was changed to a bulky tryptophan, in order to introduce a steric clash with SAMHD1-CtD 258 (Fig 3D). This Vprmus W29A A66W double mutant was then assessed for complex formation with 259 DDB1/DCAF1-CtD and SAMHD1 by analytical GF. In comparison to wild type Vprmus, the W29A 260 A66W mutant showed a reduction of DDB1/DCAF1-CtD/Vprmus/SAMHD1 complex peak intensity (Fig 261 3E, fraction 6), concomitant with (i) enrichment of DDB1/DCAF1-CtD/Vprmus ternary complex, sub-262 stoichiometrically bound to SAMHD1 (Fig 3E, fraction 7), (ii) excess DDB1/DCAF1-CtD binary 263 complex (Fig 3E, fraction 8), and (iii) monomeric SAMHD1 species (Fig 3E, fractions 9-10). In 264 conclusion, this biochemical analysis, together with cryo-EM reconstruction at intermediate resolution 265 and CLMS analysis, locate the SAMHD1-CtD binding site on the upper surface of the Vprmus helix 266 bundle. 267 These data allow for structural comparison with neo-substrate binding modes of Vpx and Vpr proteins 268 from different retrovirus lineages (Fig 4A-D). VpxHIV-2 and Vpxsm position SAMHD1-CtD at the side of 269 the DCAF1 BP domain through interactions with the N-termini of Vpx Helices-1 and -3 (Fig 4B) [50]. 270 Vpxmnd2 and Vpxrcm bind SAMHD1-NtD using a bipartite interface comprising the side of the DCAF1 271 BP and the upper surface of the Vpx helix bundle (Fig 4C) [51, 52]. VprHIV-1 engages its ubiquitylation 272 substrate UNG2 using both the top and the upper edge of the VprHIV-1 helix bundle (Fig 4D) [54]. Of 273 note, these upper-surface interaction interfaces only partially overlap with the Vprmus/SAMHD1-CtD 274 (which was not certified by peer review) is the author/funder. 
binding interface identified here and employ fundamentally different sets of interacting amino acid residues. Thus, it appears that the molecular interaction interfaces driving Vpx/Vpr-mediated neo-substrate recognition and degradation are not conserved between related SIV and HIV Vpx/Vpr accessory proteins, even in cases where identical SAMHD1-CtD regions are targeted for recruitment.

Cryo-EM analysis of Vprmus-modified CRL4-NEDD8DCAF1-CtD conformational states and dynamics

A reanalysis of the cryo-EM data using strict selection of high-quality 2D classes, followed by focussed 3D classification yielded three additional particle populations, resulting in 3D reconstructions at 8-10 Å resolution, which contained both the Vprmus-bound CRL4 core and the stalk (conformational states-1, -2 and -3, Figs 5A and S4G-J). The quality of the 3D volumes was sufficient to fit crystallographic models of the core (Fig 2) and the stalk (PDB 2hye) [15] as rigid bodies (Figs 5B and S5A). For the catalytic RING-domain subunit ROC1, only fragmented electron density was present near the position it occupies in the crystallographic model (S5A Fig). In all three states, electron density was selectively absent for the C-terminal CUL4 winged helix B (WHB) domain (residues 674-759), which contains the NEDD8 modification site (K705), and for the preceding α-helix, which connects the CUL4 N-terminal domain to the WHB domain (S5A Fig). In accordance with this observation, the positions of CRL5-attached NEDD8 and of the CRL4 ROC1 RING domain are sterically incompatible upon superposition of their respective crystal structures (S5B Fig) [57].
Alignment of 3D volumes from states-1, -2 and -3 shows that core densities representing DDB1 BPA, BPC, DCAF1-CtD and Vprmus superimpose well, indicating that these components do not undergo major conformational fluctuations and thus form a rigid platform for substrate binding and attachment of the CRL4 stalk (Fig 5). However, rotation of DDB1 BPB around a hinge connecting it to BPC results in three different orientations of state-1, -2 and -3 stalk regions relative to the core. BPB rotation angles were measured as 69° between state-1 and -2, and 50° between state-2 and -3. Furthermore, the cross-links between DDB1 and CUL4 identified by CLMS are satisfied by the state-1 model, but increasingly violated in states-2 and -3, validating in solution the conformational variability observed by cryo-EM (S5C Fig). Taken together, this places the CRL4 catalytic machinery, sited at the distal end of the stalk, appropriately to approach the Vprmus-tethered bulk of SAMHD1 for ubiquitylation at a wide range of angles (Fig 5B).

These data are in line with previous predictions based on extensive comparative crystal structure analyses, which postulated an approx. 150° rotation of the CRL4 stalk around the core [13, 15, 16, 19, 58]. However, the left- and rightmost CUL4 orientations observed here, states-1 and -3 from our cryo-EM analysis, indicate a slightly narrower stalk rotation range (119°), when compared to the outermost stalk conformations modelled from previously determined crystal structures (143°) (S5D Fig).
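The relationship between the two stepwise BPB rotations (69° and 50°) and the overall 119° range can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that both rotations occur about one fixed hinge axis (here the z axis); if the axes differed, the combined angle would generally not equal the sum. All function names are hypothetical.

```python
import math


def rot_z(angle_deg):
    """3x3 rotation matrix for a rotation of angle_deg about the z axis."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_angle_deg(r):
    """Rotation angle of a rotation matrix, from trace(R) = 1 + 2*cos(theta)."""
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


# Stepwise rotations reported in the text, applied about the assumed shared axis.
state1_to_state2 = rot_z(69.0)
state2_to_state3 = rot_z(50.0)
state1_to_state3 = matmul(state2_to_state3, state1_to_state2)

cryo_em_range = rotation_angle_deg(state1_to_state3)
crystal_range = 143.0  # range modelled from crystal structures, as quoted above

print(f"state-1 to state-3: {cryo_em_range:.0f} degrees")
print(f"narrower than the crystal-derived range by {crystal_range - cryo_em_range:.0f} degrees")
```

Under this single-axis assumption the composed rotation reproduces the 119° range, 24° less than the 143° derived from crystal structures.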
An explanation for this discrepancy comes from inspection of the cryo-EM densities and fitted models, revealing that along with the main interaction interface on DDB1 BPB there are additional molecular contacts between CUL4 and DDB1. Specifically, in state-1, there is a contact between the loop connecting helices D and E of CUL4 cullin repeat (CR)1 (residues 161-169) and a loop protruding from BP blade 3 of DDB1 BPC (residues 795-801, S5E Fig). In state-3, the loop between CUL4 CR2 helices D and E (residues 275-282) abuts a region in the C-terminal helical domain of DDB1 (residues 1110-1127, S5F Fig). These auxiliary interactions might be required to lock the outermost stalk positions observed here in order to confine the rotation range of CUL4.

Discussion

Our X-ray crystallographic studies of the DDB1/DCAF1-CtD/Vprmus assembly provide the first structural insight into a class of “hybrid” SIV Vpr proteins. These are present in the SIVagm and SIVmus/deb/syk lineages of lentiviruses and combine characteristics of related VprHIV-1 and SIV Vpx accessory proteins.

Like SIV Vpx, “hybrid” Vpr proteins down-regulate the host restriction factor SAMHD1 by recruiting it to CRL4DCAF1 for ubiquitylation and subsequent proteasomal degradation. However, using a combination of X-ray, cryo-EM and CLMS analyses, we show that the molecular strategy that Vprmus evolved to target SAMHD1 is strikingly different from that of Vpx-containing SIV strains. In the two clades of Vpx proteins, divergent amino acid sequence stretches just upstream of Helix-1 (variable region (VR)1, S6A Fig), together with polymorphisms in the SAMHD1-N-terminus of the respective host species, determine if HIV-2-type or SIVmnd-type Vpx recognise SAMHD1-CtD or SAMHD1-NtD,
respectively. These recognition mechanisms result in positioning of SAMHD1-CtD or -NtD on the side of the DCAF1 BP domain in a way that allows for additional contacts between SAMHD1 and DCAF1, thus forming ternary Vpx/SAMHD1/DCAF1 assemblies with very low dissociation rates [50-52, 59].

In Vprmus, different principles determine the specificity for SAMHD1-CtD. Here, VR1 is not involved in SAMHD1-CtD-binding at all, but forms additional interactions with DCAF1, which are not observed in Vpx/DCAF1 protein complexes (S6A Fig). Molecular contacts between Vprmus and SAMHD1 are dispersed on Helices-1 and -3, facing away from the DCAF1 interaction site and immobilising SAMHD1-CtD on the top side of the Vprmus helix bundle (S6A Fig). Placement of SAMHD1-CtD in such a position precludes a stabilising ternary interaction with DCAF1-CtD, but still results in robust SAMHD1 ubiquitylation in vitro and SAMHD1 degradation in cell-based assays [24].

Predictions regarding the molecular mechanism of SAMHD1-binding by other “hybrid” Vpr orthologues are difficult due to sequence divergence. Even in Vprdeb, the closest relative to Vprmus, only approximately 50% of amino acid side chains lining the putative SAMHD1-CtD binding pocket are conserved (S6A Fig). Previous in vitro ubiquitylation and cell-based degradation experiments did not show a clear preference of Vprdeb for recruitment of either SAMHD1-NtD or -CtD [24, 49]. Furthermore, it is disputed whether Vprdeb actually binds DCAF1 [60], which might be explained by amino acid variations in the very N-terminus and/or in Helix-3 (S6A Fig).
Vprsyk is specific for SAMHD1-CtD [49], but the majority of residues forming the binding platform for SAMHD1-CtD observed in the present study are not conserved. The SIVagm lineage of Vpr proteins is even more divergent, with significant differences not only in possible SAMHD1-contacting residues, but also in the sequence stretches preceding Helix-1, and connecting Helices-2 and -3, as well as in the N-terminal half of Helix-3 (S6A Fig). Furthermore, there are indications that recruitment of SAMHD1 by the Vpragm.GRI sub-type involves molecular recognition of both SAMHD1-NtD and -CtD [49, 53]. In conclusion, recurring rounds of evolutionary lentiviral adaptation to the host SAMHD1 restriction factor, followed by host re-adaptation, resulted in highly species-specific, diverse molecular modes of Vpr-SAMHD1 interaction. In addition to the example presented here, further structural characterisation of SAMHD1-Vpr complexes will be necessary to illustrate the manifold outcomes of this particular virus-host molecular “arms race”.

Previous structural investigation of DDB1/DCAF1/VprHIV-1 in complex with the neo-substrate UNG2 demonstrated that VprHIV-1 engages UNG2 by mimicking the DNA phosphate backbone. More precisely, UNG2 residues, which project into the major groove of its endogenous DNA substrate, insert into a hydrophobic cleft formed by VprHIV-1 Helices-1, -2 and the N-terminal half of Helix-3 [54]. This mechanism might rationalise VprHIV-1’s extraordinary binding promiscuity, since the list of potential VprHIV-1 degradation substrates is significantly enriched in DNA- and RNA-binding proteins [27].
Moreover, promiscuous VprHIV-1-induced degradation of host factors with DNA- or RNA-binding activity has been proposed to induce cell cycle arrest at the G2/M phase border, which is the most thoroughly described phenotype of Vpr proteins so far [26, 27, 61]. In Vprmus, the N-terminal half of Helix-1 as well as the bulky amino acid residue W48, which is also conserved in Vpragm and Vpx, constrict the hydrophobic cleft (S6A, B Fig). Furthermore, the extended N-terminus of Vprmus Helix-3 is not compatible with UNG2-binding due to steric exclusion (S6C Fig). In accordance with these observations, Vprmus does not down-regulate UNG2 in a human T cell line [27]. However, Vprmus, Vprsyk and Vpragm also cause G2/M cell cycle arrest in their respective host cells [60, 62, 63]. This strongly hints at the existence of further structural determinants in Vprmus, Vprsyk, Vpragm and potentially VprHIV-1, which regulate recruitment and ubiquitylation of DNA/RNA-binding host factors, in addition to the hydrophobic, DNA-mimicking cleft on top of the three-helix bundle. Future efforts to structurally characterise these determinants will further extend our understanding of how the Vpx/Vpr helical scaffold binds, and in this way adapts to, a multitude of neo-substrate epitopes. In addition, such efforts might inform approaches to design novel CRL4DCAF1-based synthetic degraders, in the form of proteolysis-targeting chimera (PROTAC)-type compounds [64, 65].

Our cryo-EM reconstructions of CRL4DCAF1-CtD/Vprmus/SAMHD1, complemented by CLMS, also provide insights into the structural dynamics of CRL4 assemblies prior to ubiquitin transfer. The data confirm previously described rotational movement of the CRL4 stalk, in the absence of constraints imposed by a crystal lattice, creating a ubiquitylation zone around the Vprmus-modified substrate receptor (Figs 5 and 6A) [13, 15, 16, 19, 58].
Missing density for the neddylated CUL4 WHB domain and for the catalytic ROC1 RING domain indicates that these distal stalk elements are highly mobile and likely sample a multitude of orientations relative to the CUL4 scaffold (Fig 6B). These observations are in line with structure analyses of CRL1 and CRL5, where CUL1/5 neddylation leads to re-orientation of the cullin WHB domain, and to release of the ROC1 RING domain from the cullin scaffold, concomitant with stimulation of ubiquitylation activity [57]. Moreover, recent cryo-EM structure analysis of CRL1β-TRCP/IκBα demonstrated substantial mobility of pre-catalytic NEDD8-CUL1 WHB and ROC1 RING domains [66]. Such flexibility seems necessary to structurally organise multiple CRL1-dependent processes, in particular the nucleation of a catalytic assembly, involving intricate protein-protein interactions between NEDD8, CUL1, ubiquitin-charged E2 and substrate receptor. This synergistic assembly then steers the ubiquitin C-terminus towards a substrate lysine for priming with ubiquitin [66]. Accordingly, our cryo-EM studies might indicate that similar principles apply for CRL4-catalysed ubiquitylation. However, to unravel the catalytic architecture of CRL4, sophisticated cross-linking procedures as in reference [65] will have to be pursued.

Intrinsic mobility of CRL4 stalk elements might assist the accommodation of a variety of sizes and shapes of substrates in the CRL4 ubiquitylation zone and might rationalise the wide substrate range accessible to CRL4 ubiquitylation through multiple DCAF receptors.
Owing to selective pressure to counteract the host’s SAMHD1 restriction, HIV-2 and certain SIVs, amongst other viruses, have taken advantage of this dynamic CRL4 architecture by modification of the DCAF1 substrate receptor with Vpx/Vpr-family accessory proteins. Tethering either SAMHD1-CtD or -NtD to DCAF1, and in this way flexibly recruiting the bulk of SAMHD1, might further improve the accessibility of lysine side chains, both tether-proximal and on the SAMHD1 globular domains, to the CRL4 catalytic assembly (Fig 6C, D). This ensures efficient Vpx/Vpr-mediated SAMHD1 priming, poly-ubiquitylation and proteasomal degradation to stimulate virus replication.

Methods

Protein expression and purification

Constructs were PCR-amplified from cDNA templates and inserted into the indicated expression plasmids using standard restriction enzyme methods (S2 Table). pAcGHLT-B-DDB1 (plasmid #48638) and pET28-UBA1 (plasmid #32534) were obtained from Addgene. The pOPC-UBA3-GST-APPBP1 co-expression plasmid and the pGex6P2-UBC12 plasmid were obtained from MRC-PPU Reagents and Services (clones 32498, 3879). Bovine erythrocyte ubiquitin and recombinant hsNEDD8 were purchased from Sigma-Aldrich (U6253) and BostonBiochem (UL-812), respectively. Point mutations were introduced by site-directed mutagenesis using KOD polymerase (Novagen). All constructs and variants are summarised in S3 Table.

Proteins expressed from vectors pAcGHLT-B, pGex6P1/2, pOPC and pET49b contained an N-terminal GST-His-tag; those from pHisSUMO, an N-terminal His-SUMO-tag; those from pET28 and pRSF-Duet-1, an N-terminal His-tag; and those from pTri-Ex-6, a C-terminal His-tag.
Constructs in vectors pAcGHLT-B and pTri-Ex-6 were expressed in Sf9 cells, and constructs in vectors pET28, pET49b, pGex6P1/2, pRSF-Duet-1, and pHisSUMO in E. coli Rosetta 2(DE3).

Recombinant baculoviruses (Autographa californica nucleopolyhedrovirus clone C6) were generated as described previously [67]. Sf9 cells were cultured in Insect-XPRESS medium (Lonza) at 28°C in an Innova 42R incubator shaker (New Brunswick) at a shaking speed of 180 rpm. In a typical preparation, 1 L of Sf9 cells at 3×10⁶ cells/mL was co-infected with 4 mL of high titre DDB1 virus and 4 mL of high titre DCAF1-CtD virus for 72 h.

For a typical E. coli Rosetta 2(DE3) expression, 2 L of LB medium was inoculated with 20 mL of an overnight culture and grown in a Multitron HT incubator shaker (Infors) at 37°C, 150 rpm until OD600 reached 0.7. At that point, the temperature was reduced to 18°C, protein expression was induced by addition of 0.2 mM IPTG, and cultures were grown for a further 20 h. During co-expression of CUL4 and ROC1 from pRSF-Duet, 50 µM zinc sulfate was added to the growth medium before induction.

Sf9 cells were pelleted by centrifugation at 1000 rpm, 4°C for 30 min using a JLA 9.1000 centrifuge rotor (Beckman). E. coli cells were pelleted by centrifugation at 4000 rpm, 4°C for 15 min using the same rotor. Cell pellets were resuspended in buffer containing 50 mM Tris, pH 7.8, 500 mM NaCl, 4 mM MgCl2, 0.5 mM tris-(2-carboxyethyl)-phosphine (TCEP), mini-complete protease inhibitors (1 tablet per 50 mL) and 20 mM imidazole (for His-tagged proteins only). 100 mL of lysis buffer was used for resuspension of a pellet from 1 L Sf9 culture, and 35 mL lysis buffer per pellet from 1 L E. coli culture. Before resuspension of CUL4/ROC1 co-expression pellets, the buffer pH was adjusted to 8.5.
5 µL Benzonase (Merck) was added and the cells were lysed by passing the suspension at least twice through a Microfluidiser (Microfluidics). Lysates were clarified by centrifugation at 48,000 × g for 45 min at 4°C.

Protein purification was performed at 4°C on an Äkta pure FPLC (GE) using XK 16/20 chromatography columns (GE) containing 10 mL of the appropriate affinity resin. GST-tagged proteins were captured on glutathione-Sepharose (GSH-Sepharose FF, GE), washed with 250 mL of wash buffer (50 mM Tris-HCl pH 7.8, 500 mM NaCl, 4 mM MgCl2, 0.5 mM TCEP), and eluted with the same buffer supplemented with 20 mM reduced glutathione. His-tagged proteins were immobilised on Ni-Sepharose HP (GE), washed with 250 mL of wash buffer supplemented with 20 mM imidazole, and eluted with wash buffer containing 0.3 M imidazole. Eluate fractions were analysed by SDS-PAGE, and appropriate fractions were pooled and concentrated to 5 mL using centrifugal filter devices (Vivaspin). If applicable, 100 µg GST-3C protease, or 50 µg thrombin, per mg total protein, was added and the sample was incubated for 12 h on ice to cleave off affinity tags. As a second purification step, gel filtration chromatography (GF) was performed on an Äkta prime plus FPLC (GE), with Superdex 200 16/600 columns (GE), equilibrated in 10 mM Tris-HCl pH 7.8, 150 mM NaCl, 4 mM MgCl2, 0.5 mM TCEP buffer, at a flow rate of 1 mL/min. For purification of the CUL4/ROC1 complex, the pH of all purification buffers was adjusted to 8.5. Peak fractions were analysed by SDS-PAGE, appropriate fractions were pooled and concentrated to approx.
20 mg/mL, flash-frozen in liquid nitrogen in small aliquots and stored at -80°C. Protein concentrations were determined with a NanoDrop spectrophotometer (ND 1000, Peqlab), using theoretical absorption coefficients calculated based upon the amino acid sequence by ProtParam on the ExPASy webserver [68].

Analytical gel filtration analysis

Prior to gel filtration analysis, affinity tags were removed by adding 30 µg GST-3C protease to 6 µM of each protein component in a volume of 120 µL wash buffer, followed by incubation on ice for 12 h. In order to remove the cleaved GST-tag and GST-3C protease, 20 µL GSH-Sepharose FF beads (GE) were added and the sample was rotated at 4°C for one hour. GSH-Sepharose beads were removed by centrifugation at 4°C, 3500 rpm for 5 min, and 120 µL of the supernatant was loaded on an analytical GF column (Superdex 200 10/300 GL, GE), equilibrated in 10 mM Tris-HCl pH 7.8, 150 mM NaCl, 4 mM MgCl2, 0.5 mM TCEP, at a flow rate of 0.5 mL/min. 1 mL fractions were collected and analysed by SDS-PAGE.

In vitro ubiquitylation assays

160 µL reactions were prepared, containing 0.5 µM substrate (indicated SAMHD1 constructs, S2 Fig), 0.125 µM DDB1/DCAF1-CtD, 0.125 µM CUL4/ROC1, 0.125 µM HisSUMO-T4L-Vprmus (residues 1-92), 0.25 µM UBCH5C, 15 µM ubiquitin in 20 mM Tris-HCl pH 7.8, 150 mM NaCl, 2.5 mM MgCl2, 2.5 mM ATP. In control reactions, certain components were left out as indicated in S2 Fig. A 30 µL sample for SDS-PAGE analysis was taken (t=0).
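The composition of the 160 µL ubiquitylation reaction described above can be turned into absolute amounts with a short calculation. The final concentrations are those quoted in the text; the 1 mM ubiquitin stock and the function names are assumptions made for illustration, since stock concentrations are not stated.

```python
REACTION_VOLUME_UL = 160.0

# Final concentrations in micromolar, as quoted in the text.
FINAL_UM = {
    "SAMHD1 substrate": 0.5,
    "DDB1/DCAF1-CtD": 0.125,
    "CUL4/ROC1": 0.125,
    "HisSUMO-T4L-Vprmus": 0.125,
    "UBCH5C": 0.25,
    "ubiquitin": 15.0,
}


def picomoles(final_um, volume_ul):
    """Amount in pmol; 1 uM equals 1 pmol per uL."""
    return final_um * volume_ul


def stock_volume_ul(final_um, stock_um, volume_ul):
    """Volume of stock needed, from C1*V1 = C2*V2."""
    return final_um * volume_ul / stock_um


amounts_pmol = {name: picomoles(conc, REACTION_VOLUME_UL)
                for name, conc in FINAL_UM.items()}

# Molar excess of substrate over each E3 ligase component.
substrate_excess = FINAL_UM["SAMHD1 substrate"] / FINAL_UM["CUL4/ROC1"]

# Hypothetical 1 mM (1000 uM) ubiquitin stock; not a value from the text.
ubiquitin_stock_ul = stock_volume_ul(FINAL_UM["ubiquitin"], 1000.0, REACTION_VOLUME_UL)

for name, pmol in amounts_pmol.items():
    print(f"{name}: {pmol:g} pmol")
print(f"substrate is in {substrate_excess:g}-fold molar excess over CUL4/ROC1")
print(f"ubiquitin stock to add: {ubiquitin_stock_ul:g} uL")
```

The calculation makes the stoichiometry explicit: the substrate is present at a four-fold molar excess over each ligase component, so substrate modification requires multiple turnovers.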
Reactions were initiated by addition of 0.05 µM UBA1, incubated at 37°C, and 30 µL SDS-PAGE samples were taken after 1 min, 2 min, 5 min and 15 min, immediately mixed with 10 µL 4x SDS sample buffer and boiled at 95°C for 5 min. Samples were analysed by SDS-PAGE.

In vitro neddylation of CUL4/ROC1

For initial neddylation tests, a 200 µL reaction was prepared, containing 8 µM CUL4/ROC1, 1.8 µM UBC12, 30 µM NEDD8 in 50 mM Tris-HCl pH 7.8, 150 mM NaCl, 2.5 mM MgCl2, 2.5 mM ATP. Two 30 µL samples were taken for SDS-PAGE; one was immediately mixed with 10 µL 4x SDS sample buffer, the other was incubated for 60 min at 25°C. The reaction was initiated by addition of 0.7 µM APPBP1/UBA3, incubated at 25°C, and 30 µL SDS-PAGE samples were taken after 1 min, 5 min, 10 min, 30 min and 60 min, immediately mixed with 10 µL 4x SDS sample buffer and boiled at 95°C for 5 min. Samples were analysed by SDS-PAGE. Based on this test, the reaction was scaled up to 1 mL and incubated for 5 min at 25°C. The reaction was quenched by addition of 5 mM TCEP and immediately loaded onto a Superdex 200 16/600 GF column (GE), equilibrated in 10 mM Tris-HCl pH 7.8, 150 mM NaCl, 4 mM MgCl2, 0.5 mM TCEP at a flow rate of 1 mL/min. Peak fractions were analysed by SDS-PAGE, appropriate fractions were pooled and concentrated to ~20 mg/mL, flash-frozen in liquid nitrogen in small aliquots and stored at -80°C.

X-ray crystallography sample preparation, crystallisation, data collection and structure solution

DDB1/DCAF1-CtD complex.
DDB1/DCAF1-CtD crystals were grown by the hanging drop vapour diffusion method, by mixing equal volumes (1 µL) of DDB1/DCAF1-CtD solution at 10 mg/mL with reservoir solution containing 100 mM Tri-Na citrate pH 5.5, 18% PEG 1000 and suspending over a 500 µL reservoir. Crystals grew overnight at 18°C. Crystals were cryo-protected in reservoir solution supplemented with 20% glycerol and cryo-cooled in liquid nitrogen. A data set from a single crystal was collected at Diamond Light Source (Didcot, UK) at a wavelength of 0.92819 Å. Data were processed using XDS [69] (S1 Table), and the structure was solved using molecular replacement with the program MOLREP [70] and available structures of DDB1 (PDB 3e0c) and DCAF1-CtD (PDB 4cc9) [50] as search models. Iterative cycles of model adjustment with the program Coot [71], followed by refinement using the program PHENIX [72] yielded final R/Rfree factors of 22.0%/27.9% (S1 Table). In the model, 94.5% of residues have backbone dihedral angles in the favoured region of the Ramachandran plot, the remainder fall in the allowed regions, and none are outliers. Details of data collection and refinement statistics are presented in S1 Table. Coordinates and structure factors have been deposited in the PDB, accession number 6zue.

DDB1/DCAF1-CtD/T4L-Vprmus (1-92) complex. The DDB1/DCAF1-CtD/Vprmus complex was assembled by incubation of purified DDB1/DCAF1-CtD and HisSUMO-T4L-Vprmus (residues 1-92), at a 1:1 molar ratio, in a buffer containing 50 mM Bis-tris propane pH 8.5, 0.5 M NaCl, 4 mM MgCl2, 0.5 mM TCEP, supplemented with 1 mg of HRV-3C protease for HisSUMO-tag removal. After incubation on ice for 12 h, the sample was loaded onto a Superdex 200 16/600 GF column (GE), with a 1 mL GSH-Sepharose FF column (GE) connected in line. The column was equilibrated with 10 mM Bis-tris propane pH 8.5, 150 mM NaCl, 4 mM MgCl2, and 0.5 mM TCEP.
The column flow rate was 1 mL/min. GF fractions were analysed by SDS-PAGE, appropriate fractions were pooled and concentrated to 4.5 mg/mL.

Crystals were prepared by the sitting drop vapour diffusion method, by mixing equal volumes (200 nL) of the protein complex at 4.5 mg/mL and reservoir solution containing 8-10% PEG 4000 (w/v), 200 mM MgCl2, 100 mM HEPES-NaOH, pH 7.0-8.2. The reservoir volume was 75 µL. Crystals grew after at least 4 weeks of incubation at 4°C. Crystals were cryo-protected in reservoir solution supplemented with 20% glycerol and cryo-cooled in liquid nitrogen. Data sets from two single crystals were collected, initially at BESSY II (Helmholtz-Zentrum Berlin, HZB) at a wavelength of 0.91841 Å, and later at ESRF (Grenoble) at a wavelength of 1 Å. Data sets were processed separately using XDS [69] and XDSAPP [73]. The structure was solved by molecular replacement, using the initial BESSY data set, with the program PHASER [74], and the following structures as search models: DDB1/DCAF1-CtD (this work) and T4L variant E11H (PDB 1qt6) [75]. After optimisation of the initial model and refinement against the higher-resolution ESRF data set, Vprmus was placed manually into the density, using an NMR model of VprHIV-1 (PDB 1m8l) [76] as guidance. Iterative cycles of model adjustment with the program Coot [71], followed by refinement using the program PHENIX [72] yielded final R/Rfree factors of 21.61%/26.05%. In the model, 95.1% of residues have backbone dihedral angles in the favoured region of the Ramachandran plot, the remainder fall in the allowed regions, and none are outliers.
Details of data collection and refinement statistics are presented in S1 Table. Coordinates and structure factors have been deposited in the PDB, accession number 6zx9.

Cryo-EM sample preparation and data collection

Complex assembly. Purified CUL4-NEDD8/ROC1, DDB1/DCAF1-CtD, GST-Vprmus and rhesus macaque SAMHD1, 1 µM each, were incubated in a final volume of 1 mL of 10 mM Tris-HCl pH 7.8, 150 mM NaCl, 4 mM MgCl2, 0.5 mM TCEP, supplemented with 1 mg of GST-3C protease. After incubation on ice for 12 h, the sample was loaded onto a Superdex 200 16/600 GF column (GE), equilibrated with the same buffer at 1 mL/min, with a 1 mL GSH-Sepharose FF column (GE) connected in line. GF fractions were analysed by SDS-PAGE, appropriate fractions were pooled and concentrated to 2.8 mg/mL.

Grid preparation. 3.5 µL protein solution containing 0.05 µM CUL4-NEDD8/ROC1/DDB1/DCAF1-CtD/Vprmus/SAMHD1 complex and 0.25 µM UBCH5C-ubiquitin conjugate (S4A, B Fig) was applied to a 300 mesh Quantifoil R2/4 Cu/Rh holey carbon grid (Quantifoil Micro Tools GmbH) coated with an additional thin carbon film as sample support and stained with 2% uranyl acetate for initial characterisation. For cryo-EM, a fresh 400 mesh Quantifoil R1.2/1.3 Cu holey carbon grid (Quantifoil Micro Tools GmbH) was glow-discharged for 30 s using a Harrick plasma cleaner with technical air at 0.3 mbar and 7 W. 3.5 µL protein solution containing 0.4 µM CUL4-NEDD8/ROC1/DDB1/DCAF1-CtD/Vprmus/SAMHD1 complex and 2 µM UBCH5C-ubiquitin conjugate was applied to the grid, incubated for 45 s, blotted with a Vitrobot Mark II device (FEI, Thermo Fisher Scientific) for 1-2 s at 8°C and 80% humidity, and plunged in liquid ethane. Grids were stored in liquid nitrogen until imaging.

Cryo-EM data collection. Initial negative stain and cryo-EM datasets were collected automatically for sample quality control and low-resolution reconstructions on a 120 kV Tecnai Spirit cryo-EM (FEI, Thermo Fisher Scientific) equipped with a F416 CMOS camera (TVIPS) using Leginon [77, 78]. Particle images were then analysed by 2D classification and initial model reconstruction using SPHIRE [79], cisTEM [80] and Relion 3.07 [81]. These data revealed the presence of complexes containing both DDB1/DCAF1-CtD/Vprmus (core) and CUL4/ROC1 (stalk). High-resolution data were collected on a 300 kV Tecnai Polara cryo-EM (FEI, Thermo Fisher Scientific) equipped with a K2 Summit direct electron detector (Gatan) at a nominal magnification of 31000x, with a pixel size of 0.625 Å/px on the object scale. In total, 3644 movie stacks were collected in super-resolution mode using Leginon [77, 78] with the following parameters: defocus range of 0.5-3.0 µm, 40 frames per movie, 10 s exposure time, electron dose of 1.25 e/Å²/s and a cumulative dose of 50 e/Å² per movie.

Cryo-EM computational analysis

Movies were aligned and dose-weighted using MotionCor2 [82] and initial estimation of the contrast transfer function (CTF) was performed with the CTFFind4 package [83]. Resulting micrographs were manually inspected to exclude images with substantial contaminants (typically large protein aggregates or ice contaminations) or grid artefacts.
Power spectra were manually inspected to exclude images with astigmatic, weak, or poorly defined spectra. After these quality control steps the dataset included 2322 micrographs (63% of total). At this stage, the data set was picked twice and processed separately, to yield reconstructions of the core (analysis 1) and states-1, -2 and -3 (analysis 2).

For analysis 1, particle positions were determined using template matching with a filtered map comprising core and stalk using the software Gautomatch (https://www2.mrc-lmb.cam.ac.uk/research/locally-developed-software/zhang-software/). 712,485 particle images were found, extracted with Relion 3.07 and subsequently 2D-classified using cryoSPARC [84], resulting in 505,342 particle images after selection (S4C, D Fig). These particle images were separated into two equally sized subsets and Tier 1 3D-classification was performed using Relion 3.07 on both of them to reduce computational burden (S4D Fig). The following parameters were used: initial model=“core”, number of classes K=4, T=10, global step search=7.5°, number of iterations=25, pixel size 3.75 Å/px. From the resulting classes, those possessing both core and stalk were selected. Classes depicting a similar stalk orientation relative to the core were pooled and directed into Tier 2 as three different subpopulations containing 143,172, 193,059 and 167,666 particle images, respectively (S4D Fig).

For Tier 2, each subpopulation was classified separately into 4 classes.
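The micrograph and particle counts quoted above for analysis 1 can be cross-checked with simple bookkeeping. The sketch below uses only numbers stated in the text; the variable names are illustrative and the derived figures are plain arithmetic, not additional results.

```python
# Micrograph selection (numbers from the text).
movies_collected = 3644
micrographs_kept = 2322
kept_percent = 100.0 * micrographs_kept / movies_collected

# Analysis 1 particle counts (numbers from the text).
particles_picked = 712_485
particles_after_2d = 505_342
tier2_subpopulations = [143_172, 193_059, 167_666]

retained_2d_percent = 100.0 * particles_after_2d / particles_picked
particles_into_tier2 = sum(tier2_subpopulations)
not_carried_forward = particles_after_2d - particles_into_tier2

print(f"micrographs kept: {kept_percent:.1f}%")
print(f"particles retained after 2D classification: {retained_2d_percent:.1f}%")
print(f"particles entering Tier 2: {particles_into_tier2:,}")
print(f"particles not carried forward from Tier 1: {not_carried_forward:,}")
```

The three Tier 2 subpopulations sum to 503,897 particle images, so nearly all of the 505,342 images selected by 2D classification were carried forward.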
From these 12 classes, all particle images exhibiting well-defined densities for core and stalk were pooled and labelled “core+stalk”, resulting in 310,801 particle images in total. 193,096 particle images representing classes containing only the core were pooled and labelled “core” (S4D Fig).
For Tier 3, the “core” particle subset was separated into 4 classes, which yielded uninterpretable reconstructions lacking medium- or high-resolution features. The “core+stalk” subset was separated into 6 classes, with 5 classes containing both stalk and core (S4D Fig) and one class consisting only of the core with Vprmus bound. The 5 classes with stalk showed stalk orientations similar to those obtained from analysis 2 (see below, S5 Fig), but refined individually to lower resolution than in analysis 2 and were discarded. However, individual refinement of the core-only Tier 3 class yielded a 7.3 Å reconstruction (S4E, F Fig).
For analysis 2, particle positions were determined using cisTEM's Gaussian picking routine, yielding 959,155 particle images in total. After two rounds of 2D-classification, 227,529 particle images were selected for further processing (S4G, H Fig). Using these data, an initial model was created using Relion 3.07. The resulting map yielded strong signal for the core but only fragmented stalk density, indicating large heterogeneity in the stalk region within the data set. This large degree of compositional (+/- stalk) and conformational heterogeneity (movement of the stalk relative to the core) made the classification challenging. Accordingly, alignment and classification were carried out simultaneously. The first objective was to separate the data set into three categories: “junk”, “core” and “core+stalk”. Therefore, the stalk was deleted from the initial model using the “Eraser”-tool in Chimera [85].
This core-map was used as an initial model for the Tier 1 3D-classification with Relion 3.07 at a decimated pixel size of 2.5 Å/px. The following parameters were used: number of classes K=6, T=10, global step search=7.5°, number of iterations=25. The classification yielded two classes containing the stalk (classes 3 and 5, containing 23% and 22% of the particle images, respectively) (S4H Fig). These particles were pooled and directed into Tier 2 3D-classification using the following parameters: number of classes K=6, T=10, global step search=7.5°, number of iterations=25. Three of these classes yielded medium-resolution maps with interpretable features (states-1, -2 and -3, S4H Fig). These three classes were refined individually in 3D using Relion 3.07, resulting in maps with resolutions ranging from 7.8 Å to 8.9 Å (S4H-J Fig).

Molecular visualisation, rigid body fitting, 3D structural alignments, rotation and interface analysis
Density maps and atomic models were visualised using Coot [71], PyMOL (Schrödinger) and UCSF Chimera [85]. Rigid body fits and structural alignments were performed using the program UCSF Chimera [85]. Rotation angles between extreme DDB1 BPB domain positions were measured using the DynDom server [86] (http://dyndom.cmp.uea.ac.uk/dyndom/runDyndom.jsp). Molecular interfaces were analysed using the EBI PDBePISA server [87] (https://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver).
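The particle counts and pixel sizes reported for the cryo-EM classification tiers above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the authors' processing pipeline: all variable and function names are hypothetical, the numbers are copied from the text, and the decimation factors assume that the stated 0.625 Å/px is the pixel size against which the 2.5 Å/px and 3.75 Å/px values were binned.

```python
# Illustrative bookkeeping for the classification tiers described in the text.
# All names are hypothetical; the counts are the values quoted in the methods.

# Analysis 1
PICKED_ANALYSIS_1 = 712_485            # particles found by template matching
AFTER_2D_ANALYSIS_1 = 505_342          # particles kept after 2D classification
TIER2_SUBPOPULATIONS = (143_172, 193_059, 167_666)  # Tier 2 inputs
CORE_PLUS_STALK = 310_801              # pooled "core+stalk" particles
CORE_ONLY = 193_096                    # pooled "core" particles

# Analysis 2
PICKED_ANALYSIS_2 = 959_155            # particles found by Gaussian picking
AFTER_2D_ANALYSIS_2 = 227_529          # particles kept after 2D classification

# Assumption: decimation is expressed relative to this pixel size (Å/px).
ACQUISITION_PIXEL_SIZE = 0.625


def retained_fraction(kept: int, total: int) -> float:
    """Fraction of particle images kept after a selection step."""
    if total <= 0:
        raise ValueError("total must be positive")
    return kept / total


def decimation_factor(decimated_pixel_size: float,
                      native_pixel_size: float = ACQUISITION_PIXEL_SIZE) -> float:
    """Binning factor implied by a decimated pixel size (both in Å/px)."""
    return decimated_pixel_size / native_pixel_size


def tier2_is_consistent() -> bool:
    """Tier 2 inputs should equal the sum of the two pooled Tier 2 outputs."""
    return sum(TIER2_SUBPOPULATIONS) == CORE_PLUS_STALK + CORE_ONLY


if __name__ == "__main__":
    print("Tier 2 bookkeeping consistent:", tier2_is_consistent())
    print("Analysis 1 kept after 2D:",
          round(retained_fraction(AFTER_2D_ANALYSIS_1, PICKED_ANALYSIS_1), 3))
    print("Analysis 2 kept after 2D:",
          round(retained_fraction(AFTER_2D_ANALYSIS_2, PICKED_ANALYSIS_2), 3))
    print("Decimation factors:", decimation_factor(3.75), decimation_factor(2.5))
```

Under these assumptions, the three Tier 2 subpopulations of analysis 1 add up to exactly the pooled “core+stalk” and “core” sets, and the two decimated pixel sizes correspond to integer binning factors.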

Multiple sequence alignment
A multiple sequence alignment was calculated using the EBI ClustalOmega server [88] (https://www.ebi.ac.uk/Tools/msa/clustalo/), and adjusted manually using the program GeneDoc [89].

Cross-linking mass spectrometry (CLMS)
Complex assembly. Purified CUL4/ROC1, DDB1/DCAF1-CtD, GST-Vprmus and rhesus macaque SAMHD1, 1 µM each, were incubated in a volume of 3 mL buffer containing 10 mM HEPES pH 7.8, 150 mM NaCl, 4 mM MgCl₂, 0.5 mM TCEP, supplemented with 1 mg GST-3C protease. After incubation on ice for 12 h, the sample was loaded onto a Superdex 200 16/600 GF column (GE), equilibrated with the same buffer, at a flow rate of 1 mL/min with a 1 mL GSH-Sepharose FF column (GE) connected in line. GF fractions were analysed by SDS-PAGE, and appropriate fractions were pooled and concentrated to 6 mg/mL.
Photo-Crosslinking. The cross-linker sulfo-SDA (sulfosuccinimidyl 4,4′-azipentanoate) (Thermo Scientific) was dissolved in cross-linking buffer (10 mM HEPES pH 7.8, 150 mM NaCl, 4 mM MgCl₂, 0.5 mM TCEP) to 100 mM before use. The labelling step was performed by incubating 18 μg aliquots of the complex at 1 mg/mL with sulfo-SDA added to 2, 1, 0.5, 0.25 or 0.125 mM, respectively, for 1 h. The samples were then irradiated with UV light at 365 nm for 20 min to form cross-links and quenched with 50 mM NH₄HCO₃ for 20 min. All steps were performed on ice. Reaction products were separated on a Novex Bis-Tris 4–12% SDS-PAGE gel (Life Technologies).
The gel band corresponding to the cross-linked complex was excised and digested with trypsin (Thermo Scientific Pierce) [90], and the resulting tryptic peptides were extracted and desalted using C18 StageTips [91]. Eluted peptides were fractionated on a Superdex Peptide 3.2/300 increase column (GE Healthcare) at a flow rate of 10 µL/min using 30% (v/v) acetonitrile and 0.1% (v/v) trifluoroacetic acid as mobile phase. 50 μL fractions were collected and vacuum-dried.
CLMS acquisition. Samples for analysis were resuspended in 0.1% (v/v) formic acid, 3.2% (v/v) acetonitrile. LC-MS/MS analysis was performed on an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher) coupled on-line with an Ultimate 3000 RSLCnano HPLC system (Dionex, Thermo Fisher). Samples were separated on a 50 cm EASY-Spray column (Thermo Fisher). Mobile phase A consisted of 0.1% (v/v) formic acid and mobile phase B of 80% (v/v) acetonitrile with 0.1% (v/v) formic acid. The flow rate was 0.3 μL/min, using gradients optimized for each chromatographic fraction from offline fractionation, ranging from 2% mobile phase B to 55% mobile phase B over 90 min. MS data were acquired in data-dependent mode using the top-speed setting with a 3 s cycle time. For every cycle, the full scan mass spectrum was recorded using the Orbitrap at a resolution of 120,000 in the range of 400 to 1,500 m/z. Ions with a precursor charge state between 3+ and 7+ were isolated and fragmented. Analyte fragmentation was achieved by Higher-Energy Collisional Dissociation (HCD) [92] and fragmentation spectra were then recorded in the Orbitrap with a resolution of 50,000. Dynamic exclusion was enabled with single repeat count and 60 s exclusion duration.
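The acquisition settings above (full scan range of 400 to 1,500 m/z, precursor charge states 3+ to 7+) imply a window of neutral precursor masses that can be selected for fragmentation. The following minimal Python sketch illustrates the standard conversion from m/z to neutral mass; it assumes simple protonated ions of the form [M+zH]z+, and the function names are hypothetical rather than taken from any software used in the study.

```python
# Convert observed m/z values to neutral (uncharged) masses for protonated ions.
# Assumption: ions are [M + zH]^z+, so each charge contributes one proton mass.

PROTON_MASS_DA = 1.007276466  # mass of a proton in daltons


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M (Da) of an ion observed at `mz` with positive `charge`."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS_DA)


def selectable_mass_window(mz_min: float = 400.0, mz_max: float = 1500.0,
                           charge_min: int = 3, charge_max: int = 7) -> tuple:
    """Lowest and highest neutral masses compatible with the scan range
    and the accepted precursor charge states."""
    lowest = neutral_mass(mz_min, charge_min)
    highest = neutral_mass(mz_max, charge_max)
    return lowest, highest


if __name__ == "__main__":
    low, high = selectable_mass_window()
    print(f"Selectable neutral mass window: {low:.1f} to {high:.1f} Da")
```

With the stated scan range and charge states, this gives a window of roughly 1.2 to 10.5 kDa, which is in line with the larger masses and higher charge states expected for cross-linked peptide pairs.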
CLMS processing. The precursor m/z was recalibrated based on high-confidence (<1% false discovery rate (FDR)) linear peptide identifications. The re-calibrated peak lists were searched against the sequences and the reversed sequences (as decoys) of cross-linked peptides using the Xi software suite (v.1.7.5.1) for identification [93]. Final crosslink lists were compiled using the identified candidates filtered to <1% FDR on link level with xiFDR v.2.0 [94], imposing a minimum of 20% sequence coverage and 4 observed fragments per peptide.
CLMS analysis. In order to sample the accessible interaction volume of the SAMHD1-CtD consistent with CLMS data, a model for SAMHD1 was generated using I-TASSER [95]. The SAMHD1-CtD, which adopted a random coil configuration, was extracted from the model. In order to map all crosslinks, missing loops in the complex structure were generated using MODELLER [96]. An interaction volume search was then submitted to the DisVis webserver [97] with an allowed distance between 1.5 Å and 22 Å for each restraint using the "complete scanning" option. The rotational sampling interval was set to 9.72° and the grid voxel spacing to 1 Å. The accessible interaction volume was visualised using UCSF Chimera [85].

Acknowledgments
We thank the MPI-MG for granting access to the TEM instruments of the microscopy and cryo-EM service group.
We thank Manfred Weiss and the scientific staff of the BESSY-MX (Macromolecular X-ray Crystallography)/Helmholtz Zentrum Berlin für Materialien und Energie at beamlines BL14.1, BL14.2, and BL14.3 operated by the Joint Berlin MX-Laboratory at the BESSY II electron storage ring (Berlin-Adlershof, Germany) as well as the scientific staff of the ESRF (Grenoble, France) at beamlines ID30A-3, ID30B, ID23-1, ID23-2, and ID29 for continuous support. We acknowledge Diamond Light Source (Didcot, UK) for access and support of the synchrotron beamline I04 and cryo-EM facilities at the UK's national Electron Bio-imaging Centre (eBIC). Furthermore, the authors acknowledge the North-German Supercomputing Alliance (HLRN) and the HPC for Research cluster of the Berlin Institute of Health for providing HPC resources. The pHisSUMO plasmid was a generous gift from Dr. Evangelos Christodoulou (The Francis Crick Institute, UK). The rhesus macaque SAMHD1 cDNA template was a generous gift from Prof. Michael Emerman (Fred Hutchinson Cancer Research Center, Seattle, USA). Recombinant BAC10:1629KO bacmid was a generous gift from Prof. Ian Jones (University of Reading, UK). pAcGHLT-B-DDB1 was a gift from Ning Zheng (Addgene plasmid 48638). pET28-mE1 was a gift from Jorge Eduardo Azevedo (Addgene plasmid 32534).

Data availability
The coordinates and structure factors for the crystal structures have been deposited at the Protein Data Bank (PDB) with the accession codes 6ZUE (DDB1/DCAF1-CtD) and 6ZX9 (DDB1/DCAF1-CtD/T4L-Vprmus 1-92).
Cryo-EM reconstructions have been deposited at the Electron Microscopy Data Bank (EMDB) with the accession codes EMD-10611 (core), EMD-10612 (conformational state-1), EMD-10613 (state-2) and EMD-10614 (state-3). CLMS data have been deposited at the PRIDE database [98] with the accession code PXD020453, reviewer password fCrQG2u8.

References
1. Randow F, Lehner PJ. Viral avoidance and exploitation of the ubiquitin system. Nat Cell Biol. 2009;11(5):527-34. doi: 10.1038/ncb0509-527.
2. Isaacson MK, Ploegh HL. Ubiquitination, ubiquitin-like modifiers, and deubiquitination in viral infection. Cell host & microbe. 2009;5(6):559-70. doi: 10.1016/j.chom.2009.05.012.
3. Gustin JK, Moses AV, Fruh K, Douglas JL. Viral takeover of the host ubiquitin system. Front Microbiol. 2011;2:161. doi: 10.3389/fmicb.2011.00161.
4. Barry M, Fruh K. Viral modulators of cullin RING ubiquitin ligases: culling the host defense. Science's STKE : signal transduction knowledge environment. 2006;2006(335):pe21. Epub 2006/05/18. doi: 10.1126/stke.3352006pe21.
5. Mahon C, Krogan NJ, Craik CS, Pick E. Cullin E3 ligases and their rewiring by viral factors. Biomolecules. 2014;4(4):897-930. Epub 2014/10/15. doi: 10.3390/biom4040897.
6. Becker T, Le-Trilling VTK, Trilling M. Cellular Cullin RING Ubiquitin Ligases: Druggable Host Dependency Factors of Cytomegaloviruses. Int J Mol Sci. 2019;20(7). doi: 10.3390/ijms20071636.
7. Seissler T, Marquet R, Paillart JC. Hijacking of the Ubiquitin/Proteasome Pathway by the HIV Auxiliary Proteins. Viruses. 2017;9(11). doi: 10.3390/v9110322.
8. Zheng N, Shabek N.
Ubiquitin Ligases: Structure, Function, and Regulation. Annu Rev Biochem. 2017;86:14.1-29.
9. Sauter D, Kirchhoff F. Key Viral Adaptations Preceding the AIDS Pandemic. Cell host & microbe. 2019;25(1):27-38. doi: 10.1016/j.chom.2018.12.002.
10. Sharp PM, Hahn BH. Origins of HIV and the AIDS pandemic. Cold Spring Harbor perspectives in medicine. 2011;1(1):a006841. Epub 2012/01/10. doi: 10.1101/cshperspect.a006841.
11. Hatziioannou T, Del Prete GQ, Keele BF, Estes JD, McNatt MW, Bitzegeio J, et al. HIV-1-induced AIDS in monkeys. Science. 2014;344(6190):1401-5. Epub 2014/06/21. doi: 10.1126/science.1250761.
12. Malim MH, Bieniasz PD. HIV Restriction Factors and Mechanisms of Evasion. Cold Spring Harbor perspectives in medicine. 2012;2(5):a006940. Epub 2012/05/04. doi: 10.1101/cshperspect.a006940.
13. Fischer ES, Scrima A, Bohm K, Matsumoto S, Lingaraju GM, Faty M, et al. The molecular basis of CRL4DDB2/CSA ubiquitin ligase architecture, targeting, and activation. Cell. 2011;147(5):1024-39. Epub 2011/11/29. doi: 10.1016/j.cell.2011.10.035.
14. Lee J, Zhou P. DCAFs, the missing link of the CUL4-DDB1 ubiquitin ligase. Molecular cell. 2007;26(6):775-80. Epub 2007/06/26. doi: 10.1016/j.molcel.2007.06.001.
15. Angers S, Li T, Yi X, MacCoss MJ, Moon RT, Zheng N. Molecular architecture and assembly of the DDB1-CUL4A ubiquitin ligase machinery. Nature. 2006;443(7111):590-3. Epub 2006/09/12. doi: 10.1038/nature05175.
16. Scrima A, Konickova R, Czyzewski BK, Kawasaki Y, Jeffrey PD, Groisman R, et al. Structural basis of UV DNA-damage recognition by the DDB1-DDB2 complex. Cell. 2008;135(7):1213-23.
Epub 2008/12/27. doi: 10.1016/j.cell.2008.10.045.
17. Zimmerman ES, Schulman BA, Zheng N. Structural assembly of cullin-RING ubiquitin ligase complexes. Current opinion in structural biology. 2010;20(6):714-21. Epub 2010/10/01. doi: 10.1016/j.sbi.2010.08.010.
18. Andrejeva J, Young DF, Goodbourn S, Randall RE. Degradation of STAT1 and STAT2 by the V proteins of simian virus 5 and human parainfluenza virus type 2, respectively: consequences for virus replication in the presence of alpha/beta and gamma interferons. Journal of virology. 2002;76(5):2159-67. doi: 10.1128/jvi.76.5.2159-2167.2002.
19. Li T, Chen X, Garbutt KC, Zhou P, Zheng N. Structure of DDB1 in complex with a paramyxovirus V protein: viral hijack of a propeller cluster in ubiquitin ligase. Cell. 2006;124(1):105-17. Epub 2006/01/18. doi: 10.1016/j.cell.2005.10.033.
20. Trilling M, Le VT, Fiedler M, Zimmermann A, Bleifuss E, Hengel H. Identification of DNA-damage DNA-binding protein 1 as a conditional essential factor for cytomegalovirus replication in interferon-gamma-stimulated cells. PLoS pathogens. 2011;7(6):e1002069. doi: 10.1371/journal.ppat.1002069.
21. Paradkar PN, Duchemin JB, Rodriguez-Andres J, Trinidad L, Walker PJ. Cullin4 Is Pro-Viral during West Nile Virus Infection of Culex Mosquitoes. PLoS pathogens. 2015;11(9):e1005143. doi: 10.1371/journal.ppat.1005143.
22. Decorsiere A, Mueller H, van Breugel PC, Abdul F, Gerossier L, Beran RK, et al. Hepatitis B virus X protein identifies the Smc5/6 complex as a host restriction factor. Nature. 2016;531(7594):386-9. doi: 10.1038/nature17170.
23. Murphy CM, Xu Y, Li F, Nio K, Reszka-Blanco N, Li X, et al. Hepatitis B Virus X Protein Promotes Degradation of SMC5/6 to Enhance HBV Replication. Cell reports. 2016;16(11):2846-54. doi: 10.1016/j.celrep.2016.08.026.
24. Lim ES, Fregoso OI, McCoy CO, Matsen FA, Malik HS, Emerman M. The ability of primate lentiviruses to degrade the monocyte restriction factor SAMHD1 preceded the birth of the viral accessory protein Vpx. Cell host & microbe. 2012;11(2):194-204. Epub 2012/01/31. doi: 10.1016/j.chom.2012.01.004.
25. Romani B, Cohen EA. Lentivirus Vpr and Vpx accessory proteins usurp the cullin4-DDB1 (DCAF1) E3 ubiquitin ligase. Current opinion in virology. 2012;2(6):755-63. Epub 2012/10/16. doi: 10.1016/j.coviro.2012.09.010.
26. Fabryova H, Strebel K. Vpr and Its Cellular Interaction Partners: R We There Yet? Cells. 2019;8(11). doi: 10.3390/cells8111310.
27. Greenwood EJD, Williamson JC, Sienkiewicz A, Naamati A, Matheson NJ, Lehner PJ. Promiscuous Targeting of Cellular Proteins by Vpr Drives Systems-Level Proteomic Remodeling in HIV-1 Infection. Cell reports. 2019;27(5):1579-96 e7. doi: 10.1016/j.celrep.2019.04.025.
28. Schrofelbauer B, Yu Q, Zeitlin SG, Landau NR. Human immunodeficiency virus type 1 Vpr induces the degradation of the UNG and SMUG uracil-DNA glycosylases. Journal of virology. 2005;79(17):10978-87. doi: 10.1128/JVI.79.17.10978-10987.2005.
29. Lahouassa H, Blondot ML, Chauveau L, Chougui G, Morel M, Leduc M, et al. HIV-1 Vpr degrades the HLTF DNA translocase in T cells and macrophages. Proceedings of the National Academy of Sciences of the United States of America. 2016;113(19):5311-6. doi: 10.1073/pnas.1600485113.
30. Laguette N, Bregnard C, Hue P, Basbous J, Yatim A, Larroque M, et al. Premature activation of the SLX4 complex by Vpr promotes G2/M arrest and escape from innate immune sensing. Cell. 2014;156(1-2):134-45. Epub 2014/01/15. doi: 10.1016/j.cell.2013.12.011.
31. Zhou X, DeLucia M, Ahn J. SLX4-SLX1 Protein-independent Down-regulation of MUS81-EME1 Protein by HIV-1 Viral Protein R (Vpr). The Journal of biological chemistry. 2016;291(33):16936-47. doi: 10.1074/jbc.M116.721183.
32. Romani B, Shaykh Baygloo N, Aghasadeghi MR, Allahbakhshi E. HIV-1 Vpr Protein Enhances Proteasomal Degradation of MCM10 DNA Replication Factor through the Cul4-DDB1[VprBP] E3 Ubiquitin Ligase to Induce G2/M Cell Cycle Arrest. The Journal of biological chemistry. 2015;290(28):17380-9. doi: 10.1074/jbc.M115.641522.
33. Lv L, Wang Q, Xu Y, Tsao LC, Nakagawa T, Guo H, et al. Vpr Targets TET2 for Degradation by CRL4(VprBP) E3 Ligase to Sustain IL-6 Expression and Enhance HIV-1 Replication. Molecular cell. 2018;70(5):961-70 e5. doi: 10.1016/j.molcel.2018.05.007.
34. Su J, Rui Y, Lou M, Yin L, Xiong H, Zhou Z, et al. HIV-2/SIV Vpx targets a novel functional domain of STING to selectively inhibit cGAS-STING-mediated NF-kappaB signalling. Nat Microbiol. 2019;4(12):2552-64. doi: 10.1038/s41564-019-0585-4.
35. Chougui G, Munir-Matloob S, Matkovic R, Martin MM, Morel M, Lahouassa H, et al. HIV-2/SIV viral protein X counteracts HUSH repressor complex. Nat Microbiol. 2018;3(8):891-7. doi: 10.1038/s41564-018-0179-6.
36. Yurkovetskiy L, Guney MH, Kim K, Goh SL, McCauley S, Dauphin A, et al. Primate immunodeficiency virus proteins Vpx and Vpr counteract transcriptional repression of proviruses by the HUSH complex. Nat Microbiol. 2018;3(12):1354-61. doi: 10.1038/s41564-018-0256-x.
37. Hrecka K, Hao C, Gierszewska M, Swanson SK, Kesik-Brodacka M, Srivastava S, et al.
Vpx relieves inhibition of HIV-1 infection of macrophages mediated by the SAMHD1 protein. Nature. 2011;474(7353):658-61. Epub 2011/07/02. doi: 10.1038/nature10195.
38. Laguette N, Sobhian B, Casartelli N, Ringeard M, Chable-Bessia C, Segeral E, et al. SAMHD1 is the dendritic- and myeloid-cell-specific HIV-1 restriction factor counteracted by Vpx. Nature. 2011;474(7353):654-7. Epub 2011/05/27. doi: 10.1038/nature10117.
39. Powell RD, Holland PJ, Hollis T, Perrino FW. Aicardi-Goutieres syndrome gene and HIV-1 restriction factor SAMHD1 is a dGTP-regulated deoxynucleotide triphosphohydrolase. The Journal of biological chemistry. 2011;286(51):43596-600. Epub 2011/11/10. doi: 10.1074/jbc.C111.317628.
40. Goldstone DC, Ennis-Adeniran V, Hedden JJ, Groom HC, Rice GI, Christodoulou E, et al. HIV-1 restriction factor SAMHD1 is a deoxynucleoside triphosphate triphosphohydrolase. Nature. 2011;480(7377):379-82. Epub 2011/11/08. doi: 10.1038/nature10623.
41. Zhu C, Gao W, Zhao K, Qin X, Zhang Y, Peng X, et al. Structural insight into dGTP-dependent activation of tetrameric SAMHD1 deoxynucleoside triphosphate triphosphohydrolase. Nature communications. 2013;4:2722. Epub 2013/11/13. doi: 10.1038/ncomms3722.
42. Kim B, Nguyen LA, Daddacha W, Hollenbaugh JA. Tight interplay among SAMHD1 protein level, cellular dNTP levels, and HIV-1 proviral DNA synthesis kinetics in human primary monocyte-derived macrophages. The Journal of biological chemistry. 2012;287(26):21570-4. Epub 2012/05/17. doi: 10.1074/jbc.C112.374843.
43. Lahouassa H, Daddacha W, Hofmann H, Ayinde D, Logue EC, Dragin L, et al. SAMHD1 restricts the replication of human immunodeficiency virus type 1 by depleting the intracellular pool of deoxynucleoside triphosphates. Nature immunology. 2012;13(3):223-8. Epub 2012/02/14. doi: 10.1038/ni.2236.
44. St Gelais C, de Silva S, Amie SM, Coleman CM, Hoy H, Hollenbaugh JA, et al. SAMHD1 restricts HIV-1 infection in dendritic cells (DCs) by dNTP depletion, but its expression in DCs and primary CD4+ T-lymphocytes cannot be upregulated by interferons. Retrovirology. 2012;9:105. Epub 2012/12/13. doi: 10.1186/1742-4690-9-105.
45. Rehwinkel J, Maelfait J, Bridgeman A, Rigby R, Hayward B, Liberatore RA, et al. SAMHD1-dependent retroviral control and escape in mice. The EMBO journal. 2013;32(18):2454-62. Epub 2013/07/23. doi: 10.1038/emboj.2013.163.
46. Morris ER, Taylor IA. The missing link: allostery and catalysis in the anti-viral protein SAMHD1. Biochem Soc Trans. 2019;47(4):1013-27. doi: 10.1042/BST20180348.
47. Baldauf HM, Pan X, Erikson E, Schmidt S, Daddacha W, Burggraf M, et al. SAMHD1 restricts HIV-1 infection in resting CD4(+) T cells. Nature medicine. 2012;18(11):1682-7. Epub 2012/09/14. doi: 10.1038/nm.2964.
48. Shingai M, Welbourn S, Brenchley JM, Acharya P, Miyagi E, Plishka RJ, et al. The Expression of Functional Vpx during Pathogenic SIVmac Infections of Rhesus Macaques Suppresses SAMHD1 in CD4+ Memory T Cells. PLoS pathogens. 2015;11(5):e1004928. doi: 10.1371/journal.ppat.1004928.
49. Fregoso OI, Ahn J, Wang C, Mehrens J, Skowronski J, Emerman M. Evolutionary toggling of Vpx/Vpr specificity results in divergent recognition of the restriction factor SAMHD1. PLoS pathogens. 2013;9(7):e1003496. Epub 2013/07/23. doi: 10.1371/journal.ppat.1003496.
50. Schwefel D, Groom HC, Boucherit VC, Christodoulou E, Walker PA, Stoye JP, et al. Structural basis of lentiviral subversion of a cellular protein degradation pathway. Nature. 2014;505(7482):234-8. Epub 2013/12/18.
doi: 10.1038/nature12815.
51. Schwefel D, Boucherit VC, Christodoulou E, Walker PA, Stoye JP, Bishop KN, et al. Molecular Determinants for Recognition of Divergent SAMHD1 Proteins by the Lentiviral Accessory Protein Vpx. Cell host & microbe. 2015;17(4):489-99. Epub 2015/04/10. doi: 10.1016/j.chom.2015.03.004.
52. Wu Y, Koharudin LM, Mehrens J, DeLucia M, Byeon CH, Byeon IJ, et al. Structural Basis of Clade-specific Engagement of SAMHD1 (Sterile alpha Motif and Histidine/Aspartate-containing Protein 1) Restriction Factors by Lentiviral Viral Protein X (Vpx) Virulence Factors. The Journal of biological chemistry. 2015;290(29):17935-45. doi: 10.1074/jbc.M115.665513.
53. Spragg CJ, Emerman M. Antagonism of SAMHD1 is actively maintained in natural infections of simian immunodeficiency virus. Proceedings of the National Academy of Sciences of the United States of America. 2013;110(52):21136-41. Epub 2013/12/11. doi: 10.1073/pnas.1316839110.
54. Wu Y, Zhou X, Barnes CO, DeLucia M, Cohen AE, Gronenborn AM, et al. The DDB1-DCAF1-Vpr-UNG2 crystal structure reveals how HIV-1 Vpr steers human UNG2 toward destruction. Nature structural & molecular biology. 2016;23(10):933-40. doi: 10.1038/nsmb.3284.
55. Enchev RI, Schulman BA, Peter M. Protein neddylation: beyond cullin-RING ligases. Nature reviews Molecular cell biology. 2015;16(1):30-44. doi: 10.1038/nrm3919.
56. Schneider M, Belsom A, Rappsilber J. Protein Tertiary Structure by Crosslinking/Mass Spectrometry. Trends in biochemical sciences. 2018;43(3):157-69. Epub 2018/02/06. doi: 10.1016/j.tibs.2017.12.006.
57.
Duda DM, Borg LA, Scott DC, Hunt HW, Hammel M, Schulman BA. Structural insights into NEDD8 activation of cullin-RING ligases: conformational control of conjugation. Cell. 2008;134(6):995-1006. doi: 10.1016/j.cell.2008.07.022.
58. Fischer ES, Bohm K, Lydeard JR, Yang H, Stadler MB, Cavadini S, et al. Structure of the DDB1-CRBN E3 ubiquitin ligase in complex with thalidomide. Nature. 2014;512(7512):49-53. doi: 10.1038/nature13527.
59. DeLucia M, Mehrens J, Wu Y, Ahn J. HIV-2 and SIVmac accessory virulence factor Vpx down-regulates SAMHD1 enzyme catalysis prior to proteasome-dependent degradation. The Journal of biological chemistry. 2013;288(26):19116-26. doi: 10.1074/jbc.M113.469007.
60. Berger G, Lawrence M, Hue S, Neil SJ. G2/M cell cycle arrest correlates with primate lentiviral Vpr interaction with the SLX4 complex. Journal of virology. 2014. Epub 2014/10/17. doi: 10.1128/JVI.02307-14.
61. Guenzel CA, Herate C, Benichou S. HIV-1 Vpr-a still "enigmatic multitasker". Front Microbiol. 2014;5:127. doi: 10.3389/fmicb.2014.00127.
62. Stivahtis GL, Soares MA, Vodicka MA, Hahn BH, Emerman M. Conservation and host specificity of Vpr-mediated cell cycle arrest suggest a fundamental role in primate lentivirus evolution and biology. Journal of virology. 1997;71(6):4331-8.
63. Planelles V, Jowett JB, Li QX, Xie Y, Hahn B, Chen IS. Vpr-induced cell cycle arrest is conserved among primate lentiviruses. Journal of virology. 1996;70(4):2516-24.
64. Schapira M, Calabrese MF, Bullock AN, Crews CM. Targeted protein degradation: expanding the toolbox. Nat Rev Drug Discov. 2019;18(12):949-63. doi: 10.1038/s41573-019-0047-y.
65. Hanzl A, Winter GE. Targeted protein degradation: current and future challenges. Curr Opin Chem Biol. 2020;56:35-41. doi: 10.1016/j.cbpa.2019.11.012.
66. Baek K, Krist DT, Prabu JR, Hill S, Klugel M, Neumaier LM, et al. NEDD8 nucleates a multivalent cullin-RING-UBE2D ubiquitin ligation assembly. Nature. 2020;578(7795):461-6. doi: 10.1038/s41586-020-2000-y.
67. Zhao Y, Chapman DA, Jones IM. Improving baculovirus recombination. Nucleic acids research. 2003;31(2):E6-. Epub 2003/01/16.
68. Wilkins MR, Gasteiger E, Bairoch A, Sanchez JC, Williams KL, Appel RD, et al. Protein identification and analysis tools in the ExPASy server. Methods Mol Biol. 1999;112:531-52. Epub 1999/02/23. doi: 10.1385/1-59259-584-7:531.
69. Kabsch W. Xds. Acta crystallographica Section D, Biological crystallography. 2010;66(Pt 2):125-32. Epub 2010/02/04. doi: 10.1107/S0907444909047337.
70. Vagin A, Teplyakov A. Molecular replacement with MOLREP. Acta crystallographica Section D, Biological crystallography. 2010;66(Pt 1):22-5. Epub 2010/01/09. doi: 10.1107/S0907444909042589.
71. Emsley P, Cowtan K. Coot: model-building tools for molecular graphics. Acta crystallographica Section D, Biological crystallography. 2004;60(Pt 12 Pt 1):2126-32. Epub 2004/12/02. doi: 10.1107/S0907444904019158.
72. Liebschner D, Afonine PV, Baker ML, Bunkoczi G, Chen VB, Croll TI, et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr D Struct Biol. 2019;75(Pt 10):861-77. Epub 2019/10/08. doi: 10.1107/S2059798319011471.
73. Sparta KM, Krug M, Heinemann U, Mueller U, Weiss MS. XDSAPP2.0. Journal of Applied Crystallography. 2016;49(3):1085-92. doi: 10.1107/S1600576716004416.
74. McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ. Phaser crystallographic software. Journal of Applied Crystallography.
2007;40(4):658-74. doi: 10.1107/S0021889807021206.
75. Kuroki R, Weaver LH, Matthews BW. Structural basis of the conversion of T4 lysozyme into a transglycosidase by reengineering the active site. Proceedings of the National Academy of Sciences of the United States of America. 1999;96(16):8949-54. Epub 1999/08/04. doi: 10.1073/pnas.96.16.8949.
76. Morellet N, Bouaziz S, Petitjean P, Roques BP. NMR structure of the HIV-1 regulatory protein VPR. Journal of molecular biology. 2003;327(1):215-27. Epub 2003/03/05.
77. Carragher B, Kisseberth N, Kriegman D, Milligan RA, Potter CS, Pulokas J, et al. Leginon: an automated system for acquisition of images from vitreous ice specimens. J Struct Biol. 2000;132(1):33-45. Epub 2000/12/21. doi: 10.1006/jsbi.2000.4314.
78. Suloway C, Pulokas J, Fellmann D, Cheng A, Guerra F, Quispe J, et al. Automated molecular microscopy: the new Leginon system. J Struct Biol. 2005;151(1):41-60. Epub 2005/05/14. doi: 10.1016/j.jsb.2005.03.010.
79. Moriya T, Saur M, Stabrin M, Merino F, Voicu H, Huang Z, et al. High-resolution Single Particle Analysis from Electron Cryo-microscopy Images Using SPHIRE. J Vis Exp. 2017;(123). Epub 2017/06/02. doi: 10.3791/55448.
80. Grant T, Rohou A, Grigorieff N. cisTEM, user-friendly software for single-particle image processing. Elife. 2018;7. Epub 2018/03/08. doi: 10.7554/eLife.35383.
81. Zivanov J, Nakane T, Forsberg BO, Kimanius D, Hagen WJ, Lindahl E, et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife. 2018;7. Epub 2018/11/10. doi: 10.7554/eLife.42166.
82.
Zheng SQ, Palovcak E, Armache JP, Verba KA, Cheng Y, Agard DA. MotionCor2: anisotropic 925 correction of beam-induced motion for improved cryo-electron microscopy. Nat Methods. 2017;14(4):331-2. 926 Epub 2017/03/03. doi: 10.1038/nmeth.4193. 927 83. Mindell JA, Grigorieff N. Accurate determination of local defocus and specimen tilt in electron 928 microscopy. J Struct Biol. 2003;142(3):334-47. Epub 2003/06/05. doi: 10.1016/s1047-8477(03)00069-8. 929 84. Punjani A, Rubinstein JL, Fleet DJ, Brubaker MA. cryoSPARC: algorithms for rapid unsupervised 930 cryo-EM structure determination. Nat Methods. 2017;14(3):290-6. Epub 2017/02/07. doi: 10.1038/nmeth.4169. 931 85. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al. UCSF Chimera--a 932 visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605-12. Epub 933 2004/07/21. doi: 10.1002/jcc.20084. 934 86. Hayward S, Lee RA. Improvements in the analysis of domain motions in proteins from conformational 935 change: DynDom version 1.50. J Mol Graph Model. 2002;21(3):181-3. Epub 2002/12/05. doi: 10.1016/s1093-936 3263(02)00140-7. 937 87. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. Journal of 938 molecular biology. 2007;372(3):774-97. Epub 2007/08/08. doi: 10.1016/j.jmb.2007.05.022. 939 88. Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, et al. The EMBL-EBI search and 940 sequence analysis tools APIs in 2019. Nucleic acids research. 2019;47(W1):W636-W41. Epub 2019/04/13. doi: 941 10.1093/nar/gkz268. 942 89. Nicholas KB, Nicholas Jr., H. B., Deerfield II., D. W. GeneDoc: Analysis and Visualization of Genetic 943 Variation. embnetnews. 1997;4(2):1-4. 944 (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 1, 2021. 
90. Shevchenko A, Tomas H, Havlis J, Olsen JV, Mann M. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nature protocols. 2006;1(6):2856-60. Epub 2007/04/05. doi: 10.1038/nprot.2006.468.
91. Rappsilber J, Ishihama Y, Mann M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal Chem. 2003;75(3):663-70. Epub 2003/02/15. doi: 10.1021/ac026117i.
92. Kolbowski L, Mendes ML, Rappsilber J. Optimizing the Parameters Governing the Fragmentation of Cross-Linked Peptides in a Tribrid Mass Spectrometer. Anal Chem. 2017;89(10):5311-8. Epub 2017/04/14. doi: 10.1021/acs.analchem.6b04935.
93. Mendes ML, Fischer L, Chen ZA, Barbon M, O'Reilly FJ, Giese SH, et al. An integrated workflow for crosslinking mass spectrometry. Mol Syst Biol. 2019;15(9):e8994. Epub 2019/09/27. doi: 10.15252/msb.20198994.
94. Fischer L, Rappsilber J. Quirks of Error Estimation in Cross-Linking/Mass Spectrometry. Anal Chem. 2017;89(7):3829-33. Epub 2017/03/08. doi: 10.1021/acs.analchem.6b03745.
95. Yang J, Zhang Y. Protein Structure and Function Prediction Using I-TASSER. Curr Protoc Bioinformatics. 2015;52:5.8.1-5.8.15. Epub 2015/12/19. doi: 10.1002/0471250953.bi0508s52.
96. Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. Curr Protoc Bioinformatics. 2016;54:5.6.1-5.6.37. Epub 2016/06/21. doi: 10.1002/cpbi.3.
97. van Zundert GC, Trellet M, Schaarschmidt J, Kurkcuoglu Z, David M, Verlato M, et al. The DisVis and PowerFit Web Servers: Explorative and Integrative Modeling of Biomolecular Complexes. Journal of molecular biology. 2017;429(3):399-407. Epub 2016/12/13. doi: 10.1016/j.jmb.2016.11.032.
98. Perez-Riverol Y, Csordas A, Bai J, Bernal-Llinares M, Hewapathirana S, Kundu DJ, et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic acids research. 2019;47(D1):D442-D50. Epub 2018/11/06. doi: 10.1093/nar/gky1106.

Figures

Fig 1. Biochemical analysis of Vprmus-induced CRL4DCAF1 specificity redirection.
(A) GF analysis of in vitro reconstitution of protein complexes containing DDB1/DCAF1-CtD, Vprmus and SAMHD1 constructs. A schematic of the SAMHD1 constructs is shown above the chromatograms. SAM – sterile α-motif domain, HD – histidine-aspartate domain, T4L – T4 Lysozyme. (B) SDS-PAGE analysis of fractions collected during GF runs in A, boxes are colour-coded with respect to the chromatograms. Note that during preparation of the GF run containing SAMHD1-ΔCtD (green trace), the GST-affinity tag, which forms dimers in solution, was not removed completely from DDB1. Accordingly, the GF trace contains an additional dimeric GST-DDB1/DCAF1-CtD/Vprmus component in fractions 4-5. (C-F) In vitro ubiquitylation reactions with purified protein components in the absence (C) or presence (D-F) of Vprmus, with the indicated SAMHD1 constructs as substrate. Reactions were stopped after the indicated times, separated on SDS-PAGE and visualised by staining.

Fig 2. Crystal structure of the DDB1/DCAF1-CtD/Vprmus complex.
(A) Overall structure of the complex in two views. DCAF1-CtD is shown as grey cartoon and semi-transparent surface. Vprmus is shown as a dark green cartoon with the co-ordinated zinc ion shown as grey sphere. T4L and DDB1 have been omitted for clarity. (B) Superposition of apo-DCAF1-CtD (light blue cartoon) with Vprmus-bound DCAF1-CtD (grey/green cartoon). Only DCAF1-CtD regions with significant structural differences between apo- and Vprmus-bound forms are shown. Disordered loops are indicated as dashed lines. (C) Comparison of the binary Vprmus/DCAF1-CtD and ternary Vpxsm/DCAF1-CtD/SAMHD1-CtD complexes. For DCAF1-CtD, only the N-terminal “acidic loop” region is shown. Vprmus, DCAF1-CtD and bound zinc are coloured as in A; Vpxsm is represented as orange cartoon and SAMHD1-CtD as pink cartoon. Selected Vpr/Vpx/DCAF1-CtD side chains are shown as sticks, and electrostatic interactions between these side chains are indicated as dotted lines. (D) In vitro reconstitution of protein complexes containing DDB1/DCAF1-CtD/Vprmus or the Vprmus R15E/R75E mutant, and SAMHD1, analysed by analytical GF. SDS-PAGE analysis of corresponding GF fractions is shown next to the chromatogram.

Fig 3. Mechanism of SAMHD1-CtD recruitment by Vprmus.
(A) Two views of the cryo-EM reconstruction of the CRL4-NEDD8DCAF1-CtD/Vprmus/SAMHD1 core. The crystal structure of the DDB1/DCAF1-CtD/Vprmus complex was fitted as a rigid body into the cryo-EM density and is shown in the same colours as in Fig 2A. The DDB1 BPB model and density were removed for clarity. The red arrows mark additional density on the upper surface of the Vprmus helix bundle. (B) Schematic representation of Sulfo-SDA cross-links (grey lines) between CRL4DCAF1/Vprmus and SAMHD1, identified by CLMS. Proteins are colour-coded as in A, CUL4 is coloured orange, SAMHD1 black/white. SAMHD1-CtD is highlighted in red, and cross-links to SAMHD1-CtD are highlighted in violet. (C) The accessible interaction space of SAMHD1-CtD, calculated by the DisVis server [97], consistent with at least 14 of 26 observed cross-links, is visualised as grey mesh. DCAF1-CtD and Vprmus are oriented and coloured as in A. (D) Detailed view of the SAMHD1-CtD electron density. The model is in the same orientation as in A, left panel. Selected Vprmus residues W29 and A66, which are in close contact to the additional density, are shown as red space-fill representation. (E) In vitro reconstitution of protein complexes containing DDB1/DCAF1-CtD, Vprmus or the Vprmus W29A/A66W mutant, and SAMHD1, assessed by analytical GF. SDS-PAGE analysis of corresponding GF fractions is shown below the chromatogram.

Fig 4. Variability of neo-substrate recognition in Vpx/Vpr proteins.
Comparison of neo-substrate recognition modes of Vprmus (A), Vpxsm (B), Vpxmnd2 (C) and VprHIV-1 (D) proteins. DCAF1-CtD is shown as grey cartoon and semi-transparent surface; Vprmus (green), Vpxsm (orange), Vpxmnd2 (blue) and VprHIV-1 (light brown) are shown as cartoon. Models of the recruited ubiquitylation substrates are shown as strongly filtered, semi-transparent calculated electron density maps with the following colouring scheme: SAMHD1-CtD bound to Vprmus, yellow; SAMHD1-CtD bound to Vpxsm (PDB 4cc9) [50], mint green; SAMHD1-NtD bound to Vpxmnd2 (PDB 5aja) [51], magenta; UNG2 bound to VprHIV-1 (PDB 5jk7) [54], light violet.

Fig 5. Cryo-EM analysis of CRL4-NEDD8DCAF1-CtD conformational states.
(A) Two views of an overlay of CRL4-NEDD8DCAF1-CtD/Vprmus/SAMHD1 cryo-EM reconstructions (conformational state-1, light green; state-2, salmon; state-3, purple). The portions of the densities corresponding to DDB1 BPA/BPC, DCAF1-CtD and Vprmus have been superimposed. (B) Two views of a superposition of DDB1/DCAF1-CtD/Vprmus and CUL4/ROC1 (PDB 2hye) [15] molecular models, which have been fitted as rigid bodies to the corresponding cryo-EM densities; the models are oriented as in A.
DDB1/DCAF1-CtD/Vprmus is shown as in Fig 2A, CUL4 is shown as cartoon, coloured as in A, and ROC1 is shown as cyan cartoon. Cryo-EM density corresponding to SAMHD1-CtD is shown in yellow, to illustrate the SAMHD1-CtD binding site in the context of the whole CRL4 assembly.

Fig 6. Schematic illustration of structural plasticity in Vprmus-modified CRL4DCAF1-CtD, and implications for ubiquitin transfer.
(A) Rotation of the CRL4 stalk increases the space accessible to catalytic elements at the distal tip of the stalk, forming a ubiquitylation zone around the core. (B) Modification of CUL4-WHB with NEDD8 leads to increased mobility of these distal stalk elements (CUL4-WHB, ROC1 RING domain) [57], further extending the ubiquitylation zone and activating the formation of a catalytic assembly for ubiquitin transfer (see also D) [66]. (C) Flexible tethering of SAMHD1 to the core by Vprmus places the bulk of SAMHD1 in the ubiquitylation zone and optimises surface accessibility. (D) Dynamic processes A-C together create numerous possibilities for assembly of the catalytic machinery (NEDD8-CUL4-WHB, ROC1, ubiquitin-(ubi-)charged E2) on surface-exposed SAMHD1 lysine side chains. Here, three of these possibilities are exemplified schematically. In this way, ubiquitin coverage on SAMHD1 is maximised.

10_1101-2020_12_31_424971 ---- Insights into Genome Recoding from the Mechanism of a Classic +1-Frameshifting tRNA

Insights into Genome Recoding from the Mechanism of a Classic +1-Frameshifting tRNA

Howard Gamper1,5, Haixing Li2,5, Isao Masuda1, D. Miklos Robkis3, Thomas Christian1, Adam B. Conn4, Gregor Blaha4, E. James Petersson3, Ruben L. Gonzalez, Jr2,#, and Ya-Ming Hou1,#,*

1Department of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, PA 19107, USA
2Department of Chemistry, Columbia University, New York, NY 10027, USA
3Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104, USA
4Department of Biochemistry, University of California, Riverside, CA 92521, USA
5These authors contributed equally to this work.

#Corresponding authors:
rlg2118@columbia.edu (T) 212-854-1096; (F) 212-932-1289; ORCID: 0000-0002-1344-5581
ya-ming.hou@jefferson.edu (T) 215-503-4480; (F) 215-503-4954; ORCID: 0000-0001-6546-2597

*Lead contact: Ya-Ming Hou (ya-ming.hou@jefferson.edu)

Running Title: Mechanism of SufB2-induced +1 frameshifting

bioRxiv preprint doi: https://doi.org/10.1101/2020.12.31.424971; this version posted January 2, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

ABSTRACT
While genome recoding using quadruplet codons to incorporate non-proteinogenic amino acids is attractive for biotechnology and bioengineering purposes, the mechanism through which such codons are translated is poorly understood. Here we investigate translation of quadruplet codons by a +1-frameshifting tRNA, SufB2, that contains an extra nucleotide in its anticodon loop. Natural post-transcriptional modification of SufB2 in cells prevents it from frameshifting using a quadruplet-pairing mechanism such that it preferentially employs a triplet-slippage mechanism. We show that SufB2 uses triplet anticodon-codon pairing in the 0-frame to initially decode the quadruplet codon, but subsequently shifts to the +1-frame during tRNA-mRNA translocation. SufB2 frameshifting involves perturbation of an essential ribosome conformational change that facilitates tRNA-mRNA movements at a late stage of the translocation reaction. Our results provide a molecular mechanism for SufB2-induced +1 frameshifting and suggest that engineering of a specific ribosome conformational change can improve the efficiency of genome recoding.

Key words: SufB2 frameshift suppressor tRNA, +1 ribosomal frameshifting, quadruplet codon, genome expansion, m1G37 methylation

INTRODUCTION
The ability to recode the genome and expand the chemical repertoire of proteins to include non-proteinogenic amino acids promises novel tools for probing protein structure and function. While most recoding employs stop codons as sites for incorporating non-proteinogenic amino acids, only two stop codons can be simultaneously recoded due to the cellular need to reserve the third stop codon for termination of protein synthesis. The use of quadruplet codons as additional sites for incorporating non-proteinogenic amino acids has thus emerged as an attractive alternative [1,2]. Recoding at a quadruplet codon requires a +1-frameshifting tRNA that is aminoacylated with the non-proteinogenic amino acid of interest. The primary challenge faced by this technology has been the low efficiency with which the full-length protein carrying the non-proteinogenic amino acid can be synthesized. One reason for this is the poor recoding efficiency of the +1-frameshifting aminoacyl (aa)-tRNA, and the second is the failure of the +1-frameshifting aa-tRNA to compete with canonical aa-tRNAs that read the first three nucleotides of the quadruplet codon at the ribosomal aa-tRNA binding (A) site during the aa-tRNA selection step of the translation elongation cycle. While directed evolution by synthetic biologists has yielded +1-frameshifting tRNAs, efficient recoding requires cell lines that have been engineered to deplete potential competitor tRNAs [3-8]. These problems emphasize the need to better understand the mechanism through which quadruplet codons are translated by +1-frameshifting tRNAs.
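
To make the reading-frame problem concrete, the following minimal sketch (an illustration added here, not part of the study) decodes a short, hypothetical mRNA that carries a CCC-C motif in two ways: once with the motif read as an ordinary CCC triplet, and once with it read as a quadruplet so that every downstream codon is decoded in the +1-frame. The sequence, the partial codon table, and the `decode` helper are all assumptions made for illustration, and the quadruplet reader is assumed to carry Pro, as SufB2 does.

```python
# Toy codon table: only the codons that occur in the example below.
CODON_TABLE = {
    "AUG": "Met", "CCC": "Pro", "CGA": "Arg", "AUU": "Ile",
    "CUA": "Leu", "GAA": "Glu", "UUC": "Phe", "UAA": "Stop",
}


def decode(mrna, quadruplet_at=None):
    """Translate mrna codon by codon, starting at index 0.

    If quadruplet_at is given, the codon starting at that index is treated
    as a quadruplet: its first three nucleotides select the amino acid, and
    the reading position then advances by four nucleotides instead of three.
    """
    peptide = []
    i = 0
    while i + 3 <= len(mrna):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
        i += 4 if i == quadruplet_at else 3
    return peptide


# Hypothetical message: AUG followed by a CCC-C motif that starts at index 3.
MRNA = "AUGCCCCGAAUUCUAA"

triplet_reading = decode(MRNA)                      # motif read as CCC (0-frame)
quadruplet_reading = decode(MRNA, quadruplet_at=3)  # motif read as CCC-C (+1-frame)

print("-".join(triplet_reading))     # Met-Pro-Arg-Ile-Leu
print("-".join(quadruplet_reading))  # Met-Pro-Glu-Phe
```

The two readings share Met-Pro and diverge immediately afterwards, which is why a canonical tRNA that reads only the first three nucleotides of the motif yields a different downstream product than a +1-frameshifting tRNA.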

In bacteria, +1-frameshifting tRNAs that suppress single-nucleotide insertion mutations that shift the translational reading frame to the +1-frame have been isolated from genetic studies [9,10]. These +1-frameshifting tRNAs typically contain an extra nucleotide in the anticodon loop, a property that has led to the proposal of two competing models for their mechanism of action. In the quadruplet-pairing model, the inserted nucleotide joins the triplet anticodon in pairing with the quadruplet codon in the A site and this quadruplet anticodon-codon pair is translocated to the ribosomal peptidyl-tRNA binding (P) site [11]. In the triplet-slippage model, the expanded anticodon loop forms an in-frame (0-frame) triplet anticodon-codon pair in the A site and subsequently shifts to the +1-frame at some point later in the elongation cycle [12,13], possibly during translocation of the +1-frameshifting tRNA from the A to P sites [14] or within the P site [15]. The triplet-slippage model is supported by structural studies of ribosomal complexes in which the expanded anticodon-stem-loops (ASLs) of +1-frameshifting tRNAs have been found to use triplet anticodon-codon pairing in the 0-frame at the A site [16-18] and in the +1-frame at the P site [19].
Nonetheless, these structures do not eliminate the possibility that two competing triplet pairing schemes (0-frame and +1-frame) can co-exist when a quadruplet codon motif occupies the A site [15], that some amount of +1 frameshifting can occur via the quadruplet-pairing model, and that the quadruplet-pairing model may even dominate for particular +1-frameshifting tRNAs, codon sequences, and/or reaction conditions [10]. We also do not know how each model determines the efficiency of +1 frameshifting or whether any competition between the two models is driven by the kinetics of frameshifting or the thermodynamics of base pairing. In addition, virtually all natural tRNAs contain a purine at nucleotide position 37 on the 3'-side of the anticodon (http://trna.bioinf.uni-leipzig.de/), which is invariably post-transcriptionally modified and is important for maintaining the translational reading frame in the P site [15]. While most +1-frameshifting tRNAs sequenced to date also contain a purine nucleotide at position 37 [8], we do not know whether it is post-transcriptionally modified or how the modification affects +1 frameshifting. Perhaps most importantly, while the structural studies described above provide snapshots of the initial and final states of +1 frameshifting, they do not reveal where, when, or how the shift occurs, thereby precluding an understanding of the structural basis and mechanism of +1 frameshifting. These open questions have limited our ability to increase the efficiency of genome recoding at quadruplet codons.

To address these questions, we have investigated the mechanism of +1 frameshifting by SufB2 (Figure 1a), a +1-frameshifting tRNA that was isolated from Salmonella typhimurium as a suppressor of a single C insertion into a proline (Pro) CCC codon [20].
The observed high +1-frameshifting efficiency of SufB2 at the CCC-C motif, nearly 80-fold above background [20], demonstrates its ability to successfully compete with the naturally occurring ProL and ProM isoacceptor tRNAs that read the CCC codon. Using the ensemble ‘codon-walk’ methodology [21] and single-molecule fluorescence resonance energy transfer (smFRET), we have compared the +1-frameshifting activity of SufB2 with that of its closest counterpart, ProL, at a CCC-C motif, and determined the position and timing of the shift. Our results show that SufB2 is naturally N1-methylated at G37 in cells, generating an m1G37 that blocks quadruplet pairing and forces SufB2 to use 0-frame triplet anticodon-codon pairing to decode the quadruplet codon at the A site. Additionally, we find that SufB2, and likely all +1-frameshifting tRNAs, shifts to the +1-frame during the subsequent translocation reaction in which the translational GTPase elongation factor (EF)-G catalyzes the movement of SufB2 from the A to P sites (i.e., a triplet-slippage mechanism). More specifically, we show that this frameshift occurs in the later steps of translocation, during which EF-G catalyzes a series of conformational rearrangements of the ribosomal pre-translocation (PRE) complex that enable the tRNA ASLs and their associated codons to move to their respective post-translocation positions within the ribosomal small (30S in bacteria) subunit [22-28].
Thus, efforts to increase the recoding efficiency of +1-frameshifting tRNAs should focus on enforcing triplet anticodon-codon pairing in the 0-frame at the A site and on directed evolution to optimize conformational rearrangements of the ribosomal PRE complex during the late stages of translocation.

RESULTS
Native-state SufB2 is N1-methylated at G37 and is readily aminoacylated with Pro
SufB2 contains an extra G37a nucleotide inserted between G37 and U38 of ProL [20] (Figure 1a). Whether the extra G37a is methylated and how it affects methylation of G37 are unknown. We thus determined the methylation status of the G37-G37a motif using RNase T1 cleavage inhibition assays and primer extension inhibition assays. We first generated a plasmid-encoded SufB2 by inserting G37a into an existing Tac-inducible plasmid encoding Escherichia coli ProL [29], which has an identical sequence to S. typhimurium ProL. We then expressed and purified the plasmid-encoded SufB2 and ProL from an E. coli ProL knock-out (ProL-KO) strain [30] containing all the endogenous enzymes necessary for processing SufB2 and ProL to their S. typhimurium native states such that they possess the full complement of naturally occurring post-transcriptional modifications (termed the native-state tRNAs). In addition, we prepared in vitro transcripts of SufB2 and ProL lacking all post-transcriptional modifications (termed the G37-state tRNAs), or enzymatically methylated with purified E.
coli TrmD [30,31] such that they possess only the N1-methylation at G37 and no other post-transcriptional modifications (termed the m1G37-state tRNAs). In the case of SufB2, RNase T1 cleavage inhibition assays demonstrated cleavage at G37 and G37a of the G37-state tRNA, but inhibition of cleavage at either position upon treatment with TrmD (Figure 1b), indicating that both nucleotides are N1-methylated in the m1G37-state tRNA.

Primer extension inhibition assays, which were previously validated by mass spectrometry analysis [30], showed inhibition of extension at G37 and G37a in m1G37- and native-state SufB2 (Figure 1c), confirming that both nucleotides are N1-methylated in these species. Notably, N1 methylation shifted almost entirely to G37 in native-state SufB2, indicating that m1G37 is the dominant methylation product in cells. As a control, no inhibition of extension at G37 or G37a was observed for G37-state SufB2. Complementary kinetics experiments showed that the yield and rate of N1-methylation of G37-state SufB2 were similar to those of G37-state ProL (Figure 1d). Likewise, kinetics experiments revealed that the yield and rate of aminoacylation of native-state SufB2 with Pro were similar to those of native-state ProL (Figure 1e). In contrast, aminoacylation of G37-state SufB2 was inhibited (Figure 1f). These results demonstrate that the native-state SufB2 synthesized in cells is quantitatively N1-methylated to generate m1G37 and is readily aminoacylated with Pro.

SufB2 promotes +1 frameshifting using triplet-slippage and possibly other mechanisms
We next determined the mechanism(s) through which SufB2 promotes +1 frameshifting in a cellular context. We created a pair of isogenic E. coli strains expressing SufB2 or ProL from the chromosome in a trmD-knockdown (trmD-KD) background [30]. This background strain was designed to evaluate the effect of m1G37 on +1 frameshifting and it was generated by deleting chromosomal trmD and controlling cellular levels of m1G37 using arabinose-induced expression of the human counterpart trm5, which is competent to stoichiometrically N1-methylate intracellular tRNA substrates [30]. The isogenic SufB2 and ProL strains were assayed for +1 frameshifting in a cell-based lacZ reporter assay in which a CCC-C motif was inserted into the 2nd codon position of lacZ such that a +1-frameshifting event at the motif was necessary to synthesize full-length β-galactosidase (β-Gal) [29]. The efficiency of +1 frameshifting was calculated as the ratio of β-Gal expressed in cells containing the CCC-C insertion relative to cells containing an in-frame CCC insertion.

In the m1G37-abundant (m1G37+) condition, SufB2 displayed a high +1-frameshifting efficiency (8.2%, Figure 2a) relative to ProL (1.4%). In the m1G37-deficient (m1G37–) condition, SufB2 exhibited an even higher efficiency (20.8%) and, consistent with our previous work [29], ProL also displayed an increased efficiency (7.0%) relative to background (1.4%). Because N1-methylation in the m1G37+ condition was stoichiometric (Figure 1c), thereby preventing quadruplet-pairing, we attribute the 8.2% efficiency of SufB2 in this condition exclusively to triplet-slippage. In the m1G37– condition, we observed an increase in +1-frameshifting efficiency of SufB2 to 20.8%.
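
The reporter calculation above is a simple ratio, and the comparisons between conditions are ratios of those ratios. The sketch below restates this arithmetic under stated assumptions: the function and variable names are hypothetical, the percentages are the efficiencies quoted in the text, and no raw β-Gal activities are reproduced or implied.

```python
def frameshift_efficiency(bgal_ccc_c: float, bgal_ccc: float) -> float:
    """Return the +1-frameshifting efficiency in percent.

    bgal_ccc_c: reporter activity from the construct with the CCC-C insertion.
    bgal_ccc:   reporter activity from the in-frame CCC control construct.
    Both must be in the same (arbitrary) units.
    """
    if bgal_ccc <= 0:
        raise ValueError("in-frame control activity must be positive")
    return 100.0 * bgal_ccc_c / bgal_ccc


# Efficiencies (%) as quoted in the text for the lacZ reporter.
REPORTED = {
    ("SufB2", "m1G37+"): 8.2,
    ("ProL", "m1G37+"): 1.4,
    ("SufB2", "m1G37-"): 20.8,
    ("ProL", "m1G37-"): 7.0,
}


def fold_change(numerator, denominator):
    """Ratio of two reported efficiencies, keyed by (tRNA, condition)."""
    return REPORTED[numerator] / REPORTED[denominator]


# SufB2 versus ProL when m1G37 is abundant.
print(round(fold_change(("SufB2", "m1G37+"), ("ProL", "m1G37+")), 1))
# Effect of m1G37 deficiency on each tRNA.
print(round(fold_change(("SufB2", "m1G37-"), ("SufB2", "m1G37+")), 1))
print(round(fold_change(("ProL", "m1G37-"), ("ProL", "m1G37+")), 1))
```

Expressed this way, the quoted values correspond to roughly a 6-fold difference between SufB2 and ProL in the m1G37+ condition, and to roughly 2.5-fold and 5-fold increases for SufB2 and ProL, respectively, when m1G37 is deficient.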
While multiple mechanisms may account for the increased +1 frameshifting, one possibility is that SufB2 uses both triplet-slippage and quadruplet-pairing in this condition.

To confirm our results, we performed similar studies with the isogenic SufB2 and ProL strains on the endogenous E. coli lolB gene, encoding the outer membrane lipoprotein. The lolB gene naturally contains a CCC-C motif at the 2nd codon position such that +1 frameshifting at this motif would decrease protein synthesis due to premature termination. As a reference, we used E. coli cysS, encoding cysteinyl-tRNA synthetase (CysRS) [30], which has no CCC-C motif in the first 16 codons and would be less sensitive to +1 frameshifting at CCC-C motifs during protein synthesis. The ratio of protein synthesis of lolB to cysS for the control sample ProL in the m1G37+ condition, measured from Western blots (Methods), was normalized to 1.00, denoting that lolB and cysS were maximally translated in the 0-frame without +1 frameshifting (i.e., a relative +1 frameshifting efficiency of 0.00) (Figures 2b, 2c). In the m1G37+ condition, SufB2 displayed a ratio of LolB to CysRS of 0.62, indicating an increase in the relative +1 frameshifting efficiency to 0.38, and in the m1G37– condition, it displayed a ratio of 0.17, indicating an increase in the relative +1 frameshifting efficiency to 0.83 (Figures 2b, 2c).
Similarly, ProL in the m1G37– condition displayed a ratio of LolB to CysRS of 0.47, indicating an increase in the +1-frameshifting efficiency to 0.53.

SufB2 can insert non-proteinogenic amino acids at CCC-C motifs

We next asked whether SufB2 can deliver non-proteinogenic amino acids to the ribosome by inducing +1 frameshifting at a CCC-C motif (Figure 2d). We inserted a CCC-C motif at the 5th codon position of the E. coli folA gene, encoding dihydrofolate reductase (DHFR). A SufB2-induced +1 frameshifting event at the insertion would result in full-length DHFR, whereas the absence of +1 frameshifting would result in a C-terminal truncated DHFR fragment (ΔC). SufB2 was aminoacylated with non-proteinogenic amino acids using a Flexizyme32 and subsequently tested in [35S]-Met-dependent in vitro translation reactions using the E. coli PURExpress system. The resulting protein products were separated by sodium dodecyl sulfate (SDS)-polyacrylamide gel electrophoresis and quantified by phosphorimaging. Control experiments with no SufB2 or with a non-acylated SufB2 showed no full-length DHFR, demonstrating that synthesis of full-length DHFR depended upon SufB2 delivery of an amino acid as a result of +1 frameshifting at the CCC-C motif. We showed that SufB2 was able to deliver Pro, Arg, Val, and the Pro analogs cis-hydroxypro, trans-hydroxypro, azetidine, and thiapro (Supplementary Figure 1) to the ribosome in response to the CCC-C motif, and that the efficiency of delivery by G37-state SufB2
was generally higher than that by native-state SufB2. Notably, the PURExpress system contains all canonical tRNAs, including ProL and ProM, indicating the ability of SufB2 to successfully compete with these tRNAs.

SufB2 uses triplet pairing in the 0-frame at the A site

To determine at which step in the elongation cycle SufB2 undergoes +1 frameshifting in response to a CCC-C motif, we used an E. coli in vitro translation system composed of purified components and supplemented with requisite tRNAs and translation factors to perform a series of ensemble rapid kinetic studies. We began with a GTPase assay that reports on the yield and rate with which the translational GTPase EF-Tu hydrolyzes GTP upon delivery of a ternary complex (TC), composed of EF-Tu, [γ-32P]-GTP, and prolyl-SufB2 (SufB2-TC) or ProL (ProL-TC), to the A site of a ribosomal 70S initiation complex (70S IC) carrying an initiator fMet-tRNAfMet in the P site and a programmed CCC-C motif at the A site. The results of these experiments showed that the yield and rate of GTP hydrolysis (kGTP,obs) upon delivery of SufB2-TC were quantitatively similar to those of ProL-TC for both the native- and G37-state tRNAs (Figure 3a).

We next performed a dipeptide formation assay that reports on the synthesis of a peptide bond between the [35S]-fMet moiety of a P-site [35S]-fMet-tRNAfMet in a 70S IC and the Pro moiety of a SufB2- or ProL-TC delivered to the A site. This assay revealed that the rate of [35S]-fMet-Pro (fMP) formation (kfMP,obs) for SufB2-TC was within 2-fold of that for ProL-TC for both the native- and G37-state tRNAs (Figure 3b, Supplementary Table 2).
To test whether native-state SufB2-TC can effectively compete with ProL-TC for delivery to the A site and peptide-bond formation, we varied the dipeptide formation assay such that an equimolar mixture of the two TCs was used in the reaction (Figure 3c). Since aminoacylation of both tRNAs with Pro would create dipeptides of the same identity (i.e., fMP), we used a Flexizyme to aminoacylate them with different amino acids and generate distinct dipeptides. Control experiments showed that ProL charged with Pro or Arg (Figure 3c, Bars 1 and 2) and SufB2 charged with Pro or Arg (Bars 3 and 4) generated the same amount of fMP and fMR, indicating that the amino-acid identity did not affect the level of dipeptide formation. We found that the amount of dipeptide formed by SufB2-TC and ProL-TC in these competition assays was similar, although the amount formed by SufB2-TC was slightly less (45% vs. 55%), in both the native- (Bars 5-8) and G37-state tRNAs (Supplementary Figure 2a). These competition experiments provide direct evidence that SufB2-TC effectively competes with ProL-TC for delivery to the A site and peptide-bond formation.

Collectively, the results of our GTPase, dipeptide formation, and competition assays indicate that SufB2-TC is delivered to the A site and participates in peptide-bond formation in the same way as ProL-TC, suggesting that SufB2 uses triplet pairing in the 0-frame at the A site that successfully competes with triplet pairing by ProL.
To support this interpretation, we measured kfMP,obs in our dipeptide formation assay, using G37-state SufB2-TC and a series of mRNA variants in which single nucleotides in the CCC-C motif were substituted. We showed that kfMP,obs did not decrease upon substitution of the 4th nucleotide of the CCC-C motif, but that it decreased substantially upon substitution of any of the first three nucleotides of the motif (Figure 3d, Supplementary Figure 2b). Thus, triplet pairing of SufB2 to the first three Cs of the CCC-C motif is necessary and sufficient for rapid delivery of the tRNA to the A site and its participation in peptide-bond formation.

The A-site activity of SufB2 depends on the sequence of the anticodon loop

We next asked how delivery of SufB2-TC to the A site and peptide-bond formation depend on the sequence of the SufB2 anticodon loop. Starting from G37-state SufB2, we created two variants containing a G-to-C substitution in nucleotide 37 (G37C) or 34 (G34C) within the anticodon loop and adapted our dipeptide formation assay to measure the fMP yield and kfMP,obs generated by each variant at the CCC-C motif at the A site. We showed that the G37C variant resulted in an fMP yield of 32% and a kfMP,obs of 0.14 ± 0.01 s–1, most likely by triplet pairing of nucleotides 34-36 of the anticodon loop with the 0-frame of the CCC-C motif (Figure 4a).
In contrast, the G34C variant resulted in an fMP yield of 30% and a kfMP,obs of 0.28 ± 0.04 s–1, most likely by triplet pairing of nucleotides 35-37 of the anticodon loop with the 0-frame of the CCC-C motif (Figure 4b). Our interpretation that nucleotides 35-37 of the anticodon loop of the G34C variant most likely triplet pair with the 0-frame of the CCC-C motif is consistent with the observations that the fMP yield of the G34C variant is similar to, and its kfMP,obs 2-fold higher than, those of the G37C variant. If nucleotides 34-36 of the anticodon loop of the G34C variant were to form a triplet pair with the CCC-C motif, we would have expected it to pair in the +2-frame, which would have most likely reduced the fMP yield and kfMP,obs of the G34C variant relative to the G37C variant. These results suggest that G37-state SufB2 exhibits some plasticity as to whether it can undergo triplet pairing with anticodon loop nucleotides 34-36 or 35-37, consistent with a previous study33.

SufB2 shifts to the +1-frame during translocation

Although SufB2 uses triplet pairing in the 0-frame when it is delivered to the A site, it is a highly efficient +1-frameshifting tRNA (Figure 2). We therefore asked whether +1 frameshifting occurs during or after translocation of SufB2 into the P site. We addressed this question by adapting our previously developed tripeptide formation assays29. We rapidly delivered EF-G and an equimolar mixture of G37-state SufB2-, tRNAVal-, and tRNAArg-TCs to 70S ICs assembled on an mRNA in which the 2nd codon was a CCC-C motif and the 3rd codon was either a GUU codon encoding Val in the +1-frame or a CGU codon encoding Arg in the 0-frame.
As soon as translocation of the PRE complex and the associated movement of SufB2 from the A to P sites formed a ribosomal post-translocation (POST) complex with an empty A site in these experiments, tRNAVal- and tRNAArg-TC would compete for the codon at the A site to promote formation of an fMPV tripeptide or an fMPR tripeptide. Thus, the fMPV yield and kfMPV,obs report on the sub-population of SufB2 that shifted to the +1-frame, whereas the fMPR yield and kfMPR,obs report on the sub-population that remained in the 0-frame29,34. The results showed that the yield of fMPV was much higher than that of fMPR (90% vs. 10%, Figure 5a), demonstrating the high efficiency with which G37-state SufB2 induces +1 frameshifting. Notably, relative to the +1 frameshifting of ProL we have previously reported29, kfMPV,obs of SufB2 (0.09 s–1) was comparable to the rate of +1 frameshifting of ProL during translocation (0.1 s–1) rather than that of +1 frameshifting after translocation into the P site (~10–3 s–1)29, indicating that SufB2 underwent +1 frameshifting during translocation. Our observation that the fMPV yield plateaus at 90% at long reaction times suggests that the sub-populations of SufB2 that will shift to the +1-frame and remain in the 0-frame are likely established in the A site, even before EF-G binds to the PRE complex.
Given that SufB2 exhibits triplet pairing in the 0-frame at the A site (Figures 3a-c, Supplementary Table 2, and Supplementary Figure 2a) and shifts into the +1-frame during translocation (Figure 5a), the two sub-populations of SufB2 in the A site seem to differ primarily in their propensity to undergo +1 frameshifting during translocation. The sub-population that encompasses 90% of the total would exhibit a high propensity to undergo +1 frameshifting during translocation, whereas the sub-population that encompasses 10% of the total would exhibit a low propensity to undergo +1 frameshifting during translocation, preferring instead to remain in the 0-frame.

We next determined whether the 10% sub-population of G37-state SufB2 that remained in the 0-frame during translocation could undergo +1 frameshifting after arrival at the P site. We varied our tripeptide formation assay so as to deliver the TCs in two steps separated by a defined time interval (Figure 5b). In the first step, G37-state SufB2-TC and EF-G were delivered to the 70S IC to form a POST complex, which was then allowed the opportunity to shift to the +1-frame over a systematically increasing time interval. In the second step, an equimolar mixture of tRNAArg- and tRNAVal-TCs was delivered to the POST complex. The results showed that fMPV was rapidly formed at a high yield and exhibited a kfMP+V,obs (where the “+” denotes the time interval between the delivery of translation components) that did not increase as a function of time. In contrast, fMPR was formed at a low yield and exhibited a kfMP+R,obs that did not decrease as a function of
time. Together, these results indicate that the sub-population of P site-bound SufB2 in the 0-frame does not undergo +1 frameshifting. This interpretation is supported by the observation that EF-P, an elongation factor which we showed suppresses +1 frameshifting within the P site29, had no effect on the fMPV yield (Supplementary Figure 2c and Supplementary Table 3).

Having shown that +1 frameshifting of SufB2 occurs only during translocation, we evaluated the effect of m1G37 on the frequency of this event. We began by delivering G37-, m1G37-, or native-state SufB2-TCs together with EF-G to 70S ICs to form the corresponding POST complexes and then delivered an equimolar mixture of tRNAArg- and tRNAVal-TCs to each POST complex to determine the relative formation of fMPV and fMPR. The results showed that m1G37- and native-state SufB2 displayed a reduced fMPV yield and a concomitantly increased fMPR yield relative to G37-state SufB2 (Figure 5c, Supplementary Figures 2d-f), consistent with the notion that the presence of m1G37 compromises +1 frameshifting.

We then used the same tripeptide formation assay to determine how +1 frameshifting during translocation of G37-state SufB2 depends on the identity of the 4th nucleotide of the CCC-C motif. A series of POST complexes was generated by delivering G37-state SufB2-TCs and EF-G to 70S ICs programmed with a CCC-N motif at the 2nd codon position. Each POST complex was then rapidly mixed with tRNAVal-TC to monitor the yield of fMPV and kfMP+V,obs (Figure 5d). The results showed a high fMPV yield and high kfMP+V,obs at the CCC-[C/U] motifs, but a low yield and low kfMP+V,obs at the CCC-[A/G] motifs.
This indicates that highly efficient SufB2-induced +1 frameshifting during translocation requires the presence of a [C/U] at the 4th nucleotide of the CCC-C motif. Because SufB2 in these experiments was in the G37-state, it is possible that a sub-population underwent +1 frameshifting via quadruplet-pairing with the [C/U] at the 4th nucleotide of the CCC-[C/U] motif during translocation. It is also possible that a sub-population underwent +1 frameshifting via triplet-slippage, which could potentially be inhibited by the presence of [G/A] at the 4th nucleotide of the motif. To verify that the POST complex formed with the CCC-A sequence was largely in the 0-frame, we rapidly mixed the complex with an equimolar mixture of tRNASer-TC, cognate to the next A-site codon in the 0-frame (AGU), and tRNAVal-TC, cognate to the next A-site codon in the +1-frame (GUU) (Figure 5e). The results showed a high yield and high kfMP+S,obs, supporting the notion that the POST complex formed with the CCC-A motif was largely in the 0-frame. Thus, the 4th nucleotide of the CCC-C motif plays a role in determining +1 frameshifting during translocation of SufB2 from the A site to the P site.
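The two readouts of the tripeptide competition assay in this section can be summarized numerically. The fMPV share of total tripeptide estimates the sub-population that shifted to the +1-frame, and the distance of kfMPV,obs from each ProL reference rate on a log scale indicates which pathway it resembles. A minimal sketch using the values quoted above; the function names are illustrative, and the log-scale comparison is a convenience for this example rather than the analysis used in the study:

```python
import math


def frameshift_fraction(fmpv_yield: float, fmpr_yield: float) -> float:
    """Fraction of SufB2 that shifted to the +1-frame (fMPV share of tripeptide)."""
    total = fmpv_yield + fmpr_yield
    if total <= 0:
        raise ValueError("yields must sum to a positive value")
    return fmpv_yield / total


def closest_pathway(k_obs: float, references: dict[str, float]) -> str:
    """Name of the reference rate nearest to k_obs on a log10 scale."""
    return min(references, key=lambda name: abs(math.log10(k_obs / references[name])))


# Values quoted in the text (yields in %, rates in s^-1).
fraction = frameshift_fraction(90.0, 10.0)
prol_rates = {"during translocation": 0.1, "after translocation": 1e-3}
pathway = closest_pathway(0.09, prol_rates)
print(f"+1-frame fraction: {fraction:.2f}; SufB2 rate resembles ProL {pathway}")
```

A log scale is used because the two ProL reference rates differ by about two orders of magnitude, so a linear difference would understate how far 0.09 s–1 lies from the post-translocation rate.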
The +1-frameshifting efficiency of SufB2 depends on sequences of the anticodon loop and the CCC-C motif

To determine whether the +1-frameshifting efficiency of SufB2 during translocation is influenced by sequences of the anticodon loop and the CCC-C motif, we performed tripeptide formation assays and monitored the yield of fMPV. In these experiments, we varied the sequence of the SufB2 anticodon loop and/or the CCC-C motif at the 2nd codon position of the mRNA. To explore the possibilities of both triplet-slippage and quadruplet-pairing, we used variants of G37-state SufB2. We showed that variants with the potential to undergo quadruplet-pairing with the CCC-C motif resulted in fMPV yields of 87% and 62% (Figures 4c, d). The different yields suggest that G37-state SufB2 variants can induce triplet-slippage and/or engage in quadruplet-pairing with different efficiencies during translocation. Analogous experiments showed that SufB2 variants that were restricted to triplet-pairing resulted in reduced fMPV yields (26% and 20%, respectively) upon pairing with a CCC-C motif (Figures 4e, f). Collectively, these results suggest that there is considerable plasticity in the mechanisms that SufB2 uses to induce +1 frameshifting during translocation and in the efficiencies of these mechanisms.

An smFRET signal that reports on ribosome dynamics during individual elongation cycles

To address the mechanism of SufB2-induced +1 frameshifting during translocation, we used a previously developed smFRET signal to determine whether and how SufB2 alters the rates with which the ribosome undergoes a series of conformational changes that drive and regulate the
elongation cycle35 (Figures 6a-c). This signal is generated using a ribosomal large, or 50S, subunit that has been Cy3- and Cy5-labeled at ribosomal proteins bL9 and uL1, respectively, to report on ‘opening’ and ‘closing’ of the L1 stalk of the 50S subunit. Accordingly, individual FRET efficiency (EFRET) vs. time trajectories recorded using this signal exhibit transitions between two FRET states corresponding to the ‘open’ (EFRET = ~0.55) and ‘closed’ (EFRET = ~0.31) conformations of the L1 stalk (Figure 6d).

Previously, we have shown that open→closed and closed→open L1 stalk transitions correlate with a complex series of conformational changes that take place during an elongation cycle35-37. The L1 stalk initially occupies the open conformation as an aa-tRNA is delivered to the A site of a 70S IC or POST complex and peptide-bond formation generates a PRE complex that is in a global conformation we refer to as global state (GS) 1. The PRE complex then undergoes a large-scale structural rearrangement that includes an open→closed transition of the L1 stalk so as to occupy a second global conformation we refer to as GS2 (i.e., the 0.55→0.31 EFRET transition denoted by the rate k70S IC→GS2 in Figures 6d and e, corresponding to the multi-step 70S IC→GS2 transition in Figure 6a).
Subsequently, in the absence of EF-G, the L1 stalk goes through successive closed→open and open→closed transitions as the PRE complex undergoes multiple GS2→GS1 and GS1→GS2 transitions that establish a GS1⇄GS2 equilibrium (i.e., the 0.55⇄0.31 EFRET transitions denoted by the rates kGS1→GS2 and kGS2→GS1 and the equilibrium constant Keq = (kGS1→GS2)/(kGS2→GS1) in Figure 6d, corresponding to the GS1⇄GS2 transitions in Figure 6a). In the presence of EF-G, however, a single closed→open L1 stalk transition reports on conformational changes of the PRE complex as it undergoes EF-G binding and completes translocation (i.e., the 0.31→0.55 EFRET transition denoted by the rate kGS2→POST in Figures 6d and e, corresponding to the multi-step GS2→POST transition that takes place in the presence of EF-G and bridges across Figures 6a and b). Using this approach, we have successfully monitored the conformational dynamics of ribosomal complexes during individual elongation cycles36,38-41, including in a study of –1 frameshifting41.

SufB2 interferes with elongation complex dynamics during late steps in translocation

We began by asking whether SufB2 alters the dynamics of elongation complexes during the earlier steps of the elongation cycle. We stopped-flow delivered SufB2- or ProL-TC to 70S ICs and recorded pre-steady-state movies during delivery, and steady-state movies 1 min post-delivery (Figures 6a, d, and f, Supplementary Figures 3, 4a, and 4b).
The results showed that k70S IC→GS2, as well as kGS1→GS2, kGS2→GS1, and Keq at 1 min post-delivery, for SufB2-TC each differed by less than 2-fold from the corresponding value for ProL-TC (Supplementary Table 4). The close correspondence of these rates indicates that SufB2-TC is delivered to the A site, participates in peptide-bond formation, undergoes GS2 formation, and exhibits GS1→GS2 and GS2→GS1 transitions within the GS1⇄GS2 equilibrium in a manner that is similar to ProL-TC. This is consistent with the results of ensemble kinetic assays (Figures 3a-c, Supplementary Table 2, and Supplementary Figure 2a) and strengthens our interpretation that SufB2 uses triplet pairing in the 0-frame at the A site during the early stages of the elongation cycle that precede EF-G binding and EF-G-catalyzed translocation. Although we could not confidently detect the presence of two sub-populations of A site-bound SufB2 in the smFRET data that might differ in their propensity to undergo +1 frameshifting, as suggested by the results presented in Figure 5a, it is possible that the distance between our smFRET probes and/or the time spent in one of the observed FRET states are not sensitive enough to detect the structural and/or energetic differences between these sub-populations of A site-bound SufB2. The development of different smFRET signals and/or the use of variants of SufB2 and/or the CCC-C motif with different propensities to undergo +1 frameshifting may allow future smFRET investigations to identify and characterize such sub-populations.
We then investigated whether SufB2 alters the dynamics of elongation complexes during the later steps of the elongation cycle. We stopped-flow delivered SufB2- or ProL-TC and EF-G to 70S ICs and recorded pre-steady-state movies during delivery, and steady-state movies 1, 3, 10, and 20 min post-delivery (Figures 6b, e, and g, Supplementary Figures 4c, 4d, and 5). The results showed that the k70S IC→GS2 values for SufB2- and ProL-TC were within error of each other (Supplementary Table 5), again suggesting that SufB2-TC is delivered to the A site, participates in peptide-bond formation, and undergoes GS2 formation in a manner that is similar to ProL-TC. Notably, the k70S IC→GS2 values obtained in the presence of EF-G were within error of those obtained in the absence of EF-G, consistent with reports that EF-G has little to no effect on the rate with which PRE complexes undergo GS1→GS2 transitions37,42.

Once it transitions into GS2, however, the SufB2 PRE complex can bind EF-G37,42 and we find that it becomes arrested in an EF-G-bound GS2-like conformation for up to several minutes, during which it slowly undergoes a GS2→POST transition (Figure 6g, Supplementary Figure 5). While the limited number of time points did not allow rigorous determination of kGS2→POST for the SufB2 PRE complex, visual inspection (Figure 6g) and quantitative analysis (Supplementary Tables 5 and 6) showed that the GS2→POST reaction was complete between 3 and 10 min post-delivery (i.e., kGS2→POST = ~0.0017–0.0060 s–1). Remarkably, this range of kGS2→POST is up to 2-3 orders of magnitude lower than kGS2→POST measured for the ProL PRE complex (Supplementary Table 5).
It is also up to 2-3 orders of magnitude lower than kGS2→POST for a different PRE complex measured using a different smFRET signal under the same conditions43 and the rate of translocation measured using ensemble rapid kinetic approaches under similar conditions44,45. This observation suggests that SufB2 adopts a conformation within the EF-G-bound PRE complex that significantly impedes conformational rearrangements of the complex that are known to take place during late steps in translocation. These rearrangements include the severing of interactions between the decoding center of the 30S subunit and the anticodon-codon duplex in the A site22-25; forward and reverse swiveling of the ‘head’ domain of the 30S subunit27,28 associated with opening and closing, respectively, of the ‘E-site gate’ of the 30S subunit26; reverse relative rotation of the ribosomal subunits46,47; and opening of the L1 stalk35,37,48. Collectively, these dynamics facilitate movement of the tRNA ASLs and their associated codons from the P and A sites to the E and P sites of the 30S subunit.

We next explored whether SufB2 alters the dynamics of elongation complexes after it is translocated into the P site. We prepared PRE-like complexes carrying deacylated SufB2 or ProL in the P site and a vacant A site (denoted PRE–A complexes) and recorded steady-state movies for the resulting GS1⇄GS2 equilibria (Figures 6c and h, Supplementary Figure 6).
The results showed that kGS1→GS2 and kGS2→GS1 for the SufB2 PRE–A complex were 45% lower and 36% higher, respectively, than for the ProL PRE–A complex, driving a 2.5-fold shift towards GS1 in the GS1⇄GS2 equilibrium (Supplementary Table 7), suggesting that SufB2 adopts a conformation at the P site that is different from that of ProL. Consistent with this interpretation, a recent structural study has shown that the conformation of P site-bound SufA6, a +1-frameshifting tRNA with an extra nucleotide in the anticodon loop, is significantly distorted relative to a canonical tRNA49.

DISCUSSION

Here we leverage the high efficiency of recoding by SufB2 to identify the steps of the elongation cycle during which it induces +1 frameshifting at a quadruplet codon, thus answering the key questions of where, when, and how +1 frameshifting occurs. We are not aware of any other studies of +1 frameshifting that have addressed these questions as precisely. In addition to elucidating the determinants of reading-frame maintenance and the mechanisms of SufB2-induced +1 frameshifting, our findings reveal new principles that can be used to engineer genome recoding with higher efficiencies.

Integrating our results with the available structural, biophysical, and biochemical data on the mechanism of translation elongation results in the structure-based model for SufB2-induced +1 frameshifting that we present in Figure 7.
In this model, POST complexes to which SufB2 or ProL are delivered exhibit virtually indistinguishable conformational dynamics in the early steps of the elongation cycle, up to and including the initial GS1→GS2 transition. However, POST complexes to which SufB2 is delivered exhibit a kGS2→POST that is more than an order of magnitude lower than those to which ProL is delivered. Notably, kGS2→POST comprises a series of conformational rearrangements of the EF-G-bound PRE complex that facilitate translocation of the tRNA ASLs and associated codons within the 30S subunit. These rearrangements encompass the severing of decoding center interactions with the anticodon-codon duplex in the A site22-25; forward and reverse head swiveling27,28,50 and associated opening and closing, respectively, of the E-site gate26; reverse relative rotation of the subunits46,47; and opening of the L1 stalk35,37,48 (steps PRE-G2 to PRE-G4, denoted with red arrows, in Figure 7). Given the importance of these rearrangements in translocation of the tRNA ASLs and their associated codons within the 30S subunit, we propose that SufB2-mediated perturbation of these rearrangements underlies +1 frameshifting. More specifically, because SufB2 does not seem to impede the reverse relative rotation of the subunits or opening of the L1 stalk during the GS2→GS1 transitions within the GS1⇄GS2 equilibrium in the absence of EF-G (compare kGS2→GS1 for SufB2-TC vs. ProL-TC in Supplementary Table 4), it most likely interferes with the severing of decoding center interactions with the anticodon-codon duplex in the A site and/or forward and/or reverse head swiveling and associated opening and/or closing, respectively, of the E-site gate.
The latter rearrangement is particularly important for movement of the tRNA ASLs and their associated codons within the 30S subunit26-28,50, suggesting that SufB2-mediated perturbation of head swiveling may make the most important contribution to +1 frameshifting. Consistent with this, a recent structural study showed that upon forward head swiveling, the ASLs of the P- and A-site tRNAs can disengage from their associated codons and occupy positions similar to a partial +1 frameshift, even in the presence of a non-frameshift suppressor tRNA in the A site and the absence of EF-G51.

While previous structural studies have demonstrated that +1 frameshifting tRNAs bind to the A site in the 0-frame16,17,49 and to the P site in the +1-frame19, these studies lacked EF-G and the observed structures were obtained by directly binding a deacylated +1 frameshifting tRNA to the P site. Specifically, a +1 frameshifting peptidyl-tRNA was not translocated from the A to P sites, as would be the case during an authentic translocation event. In contrast, our elucidation of the +1-frameshifting mechanism was executed in the presence of EF-G and is based on extensive comparison of the kinetics with which SufB2 and ProL undergo individual reactions of the elongation cycle (i.e., aa-tRNA selection, peptide-bond formation, and translocation) and the associated conformational rearrangements of the elongation complex.
Additionally, all of our in vitro biochemical assays and most of our ensemble rapid kinetics assays were performed under conditions in which the A site is always occupied by an aa- or peptidyl-tRNA, leaving no chance of a vacant A site. Therefore, the +1 frameshifting mechanism we present here is distinct from that presented by Farabaugh and co-workers13, in which the ribosome is stalled due to a vacant A site, thus giving the +1-frameshifting-inducing tRNA at the P site an opportunity to rearrange into the +1-frame. The fact that all well-characterized +1-frameshifting tRNAs contain an extra nucleotide in the anticodon loop, despite differences in their primary sequences, the amino acids they carry, and whether the extra nucleotide is inserted at the 3'- or 5'-side of the anticodon, suggests that the results we report here for SufB2 are likely applicable to other +1-frameshifting tRNAs with an expanded anticodon loop.

While an expanded anticodon loop is a strong feature associated with +1 frameshifting, it is not associated with –1 frameshifting, which instead is typically induced by structural barriers in the mRNA that stall a translating ribosome, thus providing the ribosome with an opportunity to shift backwards in the –1 direction10,52. Given the unique role of the expanded anticodon loop in +1 frameshifting, here we have identified the determinants that drive the ribosome to shift in the +1 direction.
We show that SufB2 exclusively uses the triplet-slippage mechanism of +1 frameshifting in the m1G37+ condition, but that it explores other mechanisms (e.g., quadruplet-pairing) in the m1G37– condition during translocation from the A site to the P site. Under conditions that only permit the triplet-slippage mechanism (e.g., in the presence of m1G37), SufB2 exhibits a relatively low +1-frameshifting efficiency of ~30%, whereas under conditions that permit quadruplet-pairing during translocation (e.g., in the absence of m1G37), it exhibits a relatively high +1-frameshifting efficiency of ~90% (Figures 4c-f, 5a). This feature is observed in various sequence contexts. One advantage of a quadruplet-pairing mechanism during translocation is that it would enhance the thermodynamic stability of anticodon-codon pairing during the large EF-G-catalyzed conformational rearrangements that PRE complexes undergo during translocation to form POST complexes. Nonetheless, SufB2 is naturally methylated with m1G37 (Figure 1c), indicating that it makes exclusive use of the triplet-slippage mechanism in vivo. This mechanism is likely also exclusively used in vivo by all other +1-frameshifting tRNAs that have evolved from canonical tRNAs to retain a purine at position 37, which is almost universally post-transcriptionally modified to block quadruplet-pairing mechanisms.

The key insight from this work suggests an entirely novel pathway to increase the efficiency of genome recoding at quadruplet codons. While initial success in genome recoding has been achieved by engineering the anticodon-codon interactions of a +1-frameshifting-inducing tRNA at the A site6,53, or by engineering a new bacterial genome with a minimal set of codons for all amino acids54, we suggest that efforts to engineer the ‘neck’ structural element of the 30S subunit that regulates head swiveling would be as, or even more, effective.
This can be achieved by screening for 30S subunit variants that exhibit high +1-frameshifting efficiencies mediated by +1-frameshifting tRNAs at quadruplet codons while preserving 0-frame translation by canonical tRNAs at triplet codons. Specifically, head swiveling is driven by the synergistic action of two hinges within the 16S ribosomal RNA elements that comprise the 30S subunit neck55. Hinge 1 is composed of two G-U wobble base pairs that are separated by a bulged G within helix 28 (h28), while hinge 2 is composed of a GACU linker between h34 and h35/36 within a three-helical junction with h38. Co-engineering these two hinges by directed evolution should identify such 30S subunit variants. To complement the directed evolution approach, we suggest that our recently developed time-resolved cryogenic electron microscopy (TR cryo-EM) method56,57 can be used to obtain structures of SufB2 and ProL in EF-G-bound PRE complexes captured in intermediate states of translocation. Such cryo-EM structures would help further define how the two hinges that control head swiveling are differentially modulated during translocation of SufB2 vs. ProL to provide a structure-based roadmap for engineering them. In addition, detailed comparison of such structures would offer the opportunity to identify ribosomal structural elements beyond the two hinges that play a role in +1 frameshifting and can thus serve as additional targets for engineering.
Furthermore, antibiotics that bind to the 30S subunit and act as translocation modulators can be exploited to further increase the +1-frameshifting efficiency at a quadruplet codon with either wildtype or highly efficient 30S subunit variants. Implemented in combination and integrated into a recently described in vivo ‘designer organelle’ strategy58, these approaches should provide a novel and powerful platform for increasing the efficiency of genome recoding at quadruplet codons with minimal off-target effects.

METHODS

Construction of E. coli strains. E. coli strains that expressed a plasmid-borne ProL or SufB2 for isolation of native-state tRNAs were made in a ProL-KO strain, which was constructed by inserting the Kan-resistance (Kan-R) gene, PCR-amplified from pKD4, into the ProL locus of E. coli BL21(DE3) using the λ-Red recombination method59, followed by removal of the Kan-R gene using FLP recombination30. The pKK223-SufB2 plasmid was made by site-directed mutagenesis to introduce G37a into the pKK223-ProL plasmid29. E. coli strains that expressed ProL or SufB2 from the chromosome as an isogenic pair for reporter assays were made using the λ-Red technique30. To construct the E. coli SufB2 strain, the SufB2 gene was PCR-amplified from pKK223-SufB2, and the 5' end of the amplified gene was joined with Kan-R (from pKD4) by PCR using reverse-2 primer, while the 3' end was homologous to the ProL 3' flanking region.
The PCR-amplified SufB2-Kan product was used to replace ProL in λ-Red expressing cells. An isogenic counterpart strain expressing ProL-Kan was also made. These ProL-Kan and SufB2-Kan loci were independently transferred to the trmD-KO strain29 by P1 transduction, followed by pCP20-dependent FLP recombination, generating the isogenic pair of ProL and SufB2 strains in the trmD-KO background. These strains were transformed with the pKK223-3-lacZ reporter plasmid that has the CCC-C motif at the 2nd codon position of the lacZ gene, and the β-Gal activity was measured29. All primer sequences used in this work are shown in Supplementary Table 1.

Preparation of translation components for ensemble biochemical experiments. The mRNA used for most in vitro translation reactions is shown below, including the Shine-Dalgarno sequence, the AUG start codon, and the CCC-C motif:
5'-GGGAAGGAGGUAAAAAUGCCCCGUUCUAAG(CAC)7.
Variants of this mRNA had a base substitution in the CCC-C motif. All mRNAs were transcribed from double-stranded DNA templates with T7 RNA polymerase and purified by gel electrophoresis. E. coli strains over-expressing native-state tRNAfMet, tRNAArg (anticodon ICG, where I = inosine), and tRNAVal (anticodon U*AC, where U* = cmo5U) were grown to saturation and were used to isolate total tRNA.
The over-expressed tRNA species in each total tRNA sample was aminoacylated by the cognate aminoacyl-tRNA synthetase and used directly in the TC formation reaction and subsequent TC delivery to 70S ICs or POST complexes. E. coli tRNASer (anticodon ACU) was prepared by in vitro transcription. Aminoacyl-tRNAs with the cognate proteinogenic amino acid were prepared using the respective aminoacyl-tRNA synthetase and those with a non-proteinogenic amino acid were prepared using the dFx Flexizyme and the 3,5-dinitrobenzyl ester (DBE) of the respective amino acid (Supplementary Figure 1). Aminoacylation and formylation of tRNAfMet were performed in a one-step reaction in which formyl transferase and the formyl donor 10-formyltetrahydrofolate were added to the aminoacylation reaction29. Aminoacyl-tRNAs were stored in 25 mM sodium acetate (NaOAc) (pH 5) at –70 °C, as were six-His-tagged E. coli initiation and elongation factors and tight-coupled 70S ribosomes isolated from E. coli MRE600 cells. Recombinant His-tagged E. coli EF-P bearing a β-lysyl-K34 was expressed and purified from cells co-expressing efp, yjeA, and yjeK and stored at –20 °C29.

Preparation of translation components for smFRET experiments. 30S subunits and 50S subunits lacking ribosomal proteins bL9 and uL1 were purified from a previously described bL9-uL1 double deletion E. coli strain35,60 using previously described protocols35,37,60. A previously described single-cysteine variant of bL9 carrying a Gln-to-Cys substitution mutation at residue position 18 (bL9(Q18C))35 and a previously described single-cysteine variant of uL1 carrying a Thr-to-Cys substitution mutation at residue position 202 (uL1(T202C))35,37 were purified, labeled with Cy3- and Cy5-maleimide, respectively, to generate bL9(Cy3) and uL1(Cy5), and reconstituted into the 50S subunits lacking bL9 and uL1 following previously described protocols35.
The reconstituted bL9(Cy3)- and uL1(Cy5)-labeled 50S subunits were then re-purified using sucrose density gradient ultracentrifugation35,43. 50S subunits lacking bL9(Cy3) and/or uL1(Cy5) or harboring unlabeled bL9 and/or uL1 do not generate bL9(Cy3)-uL1(Cy5) smFRET signals and therefore do not affect data collection or analysis. Previously, we have shown that 70S ICs formed with these bL9(Cy3)- and uL1(Cy5)-containing 50S subunits can undergo peptide-bond formation and two rounds of translocation elongation with similar efficiency as 70S ICs formed with wild-type 50S subunits35.

The sequence of the mRNA used for assembling ribosomal complexes for smFRET studies is shown below, including the Shine-Dalgarno sequence, the AUG start codon, and the CCC-C motif:
5'-GCAACCUAAAACUCACACAGGGCCCUAAGGACAUAAAAAUGCCCCGUUAUCCUCCUGCUGCACUCGCUGCACAAAUCGCUCAACGGCAAUUAAGGA.
The mRNA was synthesized by in vitro transcription using T7 RNA polymerase, and then hybridized to a previously described 3'-biotinylated DNA oligonucleotide (Supplementary Table 1) that was complementary to the 5' end of the mRNA and was chemically synthesized by Integrated DNA Technologies60. Hybridized mRNA:DNA-biotin complexes were stored in 10 mM Tris-OAc (pH = 7.5 at 37 °C), 1 mM EDTA, and 10 mM KCl at –80 °C until they were used in ribosomal complex assembly. Aminoacylation and formylation of tRNAfMet (purchased from MP Biomedicals) were achieved simultaneously using E. coli methionyl-tRNA synthetase and E. coli formylmethionyl-tRNA formyltransferase60. Expression and purification of IF1, IF2, IF3, EF-Tu, EF-Ts, and EF-G followed previously published procedures60.

Preparation and purification of SufB2 and ProL. Native-state SufB2 was isolated from a derivative of E. coli JM109 lacking the endogenous ProL, but expressing SufB2 from the pKK223-3 plasmid (Supplementary Table 1), while native-state ProL was purified from total tRNA isolated from E. coli JM109 cells over-expressing ProL from the pKK223-3 plasmid. The ProL-KO strain lacking the endogenous ProL was described previously30. Each native-state tRNA was isolated by a biotinylated capture probe attached to streptavidin-derivatized Sepharose beads29. G37-state SufB2 and ProL were also prepared by in vitro transcription. Each primary transcript contained a ribozyme domain on the 5'-side of the tRNA sequence, which self-cleaved to release the tRNA. m1G37-state SufB2 and ProL were prepared by TrmD-catalyzed and S-adenosyl methionine (AdoMet)-dependent methylation of each G37-state tRNA. Due to the lability of the aminoacyl linkage to Pro, stocks of SufB2 and ProL aminoacylated with Pro were either used immediately or stored no longer than 2-3 weeks at –70 °C in 25 mM NaOAc (pH 5.0).

Primer extension inhibition assays. Primer extension inhibition analyses of native-, G37-, and m1G37-state SufB2 and ProL were performed as described30.
A DNA primer complementary to the sequence of C41 to A57 of SufB2 and ProL was chemically synthesized, 32P-labeled at the 5'-end by T4 polynucleotide kinase, annealed to each tRNA, and extended by Superscript III reverse transcriptase (Invitrogen) at 200 units/µL with 6 µM each dNTP in 50 mM Tris-HCl (pH 8.3), 3 mM MgCl2, 75 mM KCl, and 1 mM DTT at 55 °C for 30 min; the reaction was terminated by heating at 70 °C for 15 min. Extension was quenched with 10 mM EDTA and products of extension were separated by 12% denaturing polyacrylamide gel electrophoresis (PAGE/7M urea) and analyzed by phosphorimaging. In these assays, the length of the read-through cDNA is 54-55 nucleotides, as in the case of the G37-state SufB2 and ProL, whereas the length of the primer-extension inhibited cDNA products is 21-22 nucleotides, as in the case of the m1G37-state and native-state.

RNase T1 cleavage inhibition assays. RNase T1 cleaves on the 3'-side of G, but not m1G. Cleavage of tRNAs was performed as previously described29. Each tRNA (1 µg) was 3'-end labeled using Bacillus stearothermophilus CCA-adding enzyme (10 nM) with [α-32P]ATP at 60 °C in 100 mM glycine (pH 9.0) and 10 mM MgCl2. The labeled tRNA was digested by RNase T1 (Roche, cat # 109193) at a final concentration of 0.02 units/µL for 20 min at 50 °C in 20 mM sodium citrate (pH 5.5) and 1 mM ethylene diamine tetraacetic acid (EDTA).
The RNA fragments generated from cleavage were separated by 12% PAGE/7M urea along with an RNA ladder generated by alkali hydrolysis of the tRNA of interest. Cleavage was analyzed by phosphorimaging.

Methylation assays. Pre-steady-state assays under single-turnover conditions61 were performed on a rapid quench-flow apparatus (Kintek RQF-3). The tRNA substrate was heated to 85 °C for 2.5 min followed by addition of 10 mM MgCl2, and slowly cooled to 37 °C in 15 min. N1-methylation of G37 in the pre-annealed tRNA (final concentration 1 µM) was initiated with the addition of E. coli TrmD (10 µM) and [3H]-AdoMet (Perkin Elmer, 4200 DPM/pmol) at a final concentration of 15 µM in a buffer containing 100 mM Tris-HCl (pH 8.0), 24 mM NH4Cl, 6 mM MgCl2, 4 mM DTT, 0.1 mM EDTA, and 0.024 mg/mL BSA in a reaction of 30 µL. The buffer used was optimized for TrmD in order to evaluate its in vitro activity61. Reaction aliquots of 5 µL were removed at various time points and precipitated in 5% (w/v) trichloroacetic acid (TCA) on filter pads for 10 min twice. Filter pads were washed with 95% ethanol twice, with ether once, air dried, and measured for radioactivity in an LS6000 scintillation counter (Beckman). Counts were converted to pmoles using the specific activity of the [3H]-AdoMet after correcting for the signal quenching by filter pads. In these assays, a negative control was always included, in which no enzyme was added to the reaction61, and signal from the negative control was subtracted from signal of each sample for determining the level of methylation.

Aminoacylation assays. Each SufB2 or ProL tRNA was aminoacylated with Pro by a recombinant E. coli ProRS expressed from the plasmid pET22 and purified from E. coli BL21 (DE3)62. Each tRNA was heat-denatured at 80 °C for 3 min, and re-annealed at 37 °C for 15 min.
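The conversion of filter-pad counts to pmol of product described for the methylation assays can be sketched as follows. This is only an illustration of the arithmetic: the function name, the example counts, and the filter-pad quench factor of 0.5 are hypothetical values chosen for the example, and only the specific activity (4200 DPM/pmol), the 1 µM tRNA concentration, and the 5 µL aliquot volume come from the text.

```python
def dpm_to_pmol(sample_dpm, blank_dpm, specific_activity_dpm_per_pmol,
                quench_factor=1.0):
    """Convert scintillation counts (DPM) to pmol of labeled product.

    sample_dpm: counts from the reaction aliquot.
    blank_dpm: counts from the no-enzyme negative control.
    quench_factor: fraction of the true signal detected on a filter pad
        (hypothetical parameter; 1.0 means no quenching).
    """
    if specific_activity_dpm_per_pmol <= 0:
        raise ValueError("specific activity must be positive")
    if not 0 < quench_factor <= 1:
        raise ValueError("quench_factor must be in (0, 1]")
    # Subtract the negative control, never reporting a negative signal.
    net_dpm = max(sample_dpm - blank_dpm, 0.0)
    # Undo the filter-pad quenching, then divide by the specific activity.
    return net_dpm / quench_factor / specific_activity_dpm_per_pmol


SPECIFIC_ACTIVITY = 4200.0          # DPM/pmol, [3H]-AdoMet as stated in the text
TRNA_PMOL_PER_ALIQUOT = 1.0 * 5.0   # 1 µM tRNA x 5 µL aliquot = 5 pmol

# Hypothetical counts for one time point and its negative control.
methyl_pmol = dpm_to_pmol(11000.0, 500.0, SPECIFIC_ACTIVITY, quench_factor=0.5)
fraction_methylated = methyl_pmol / TRNA_PMOL_PER_ALIQUOT
print(methyl_pmol, fraction_methylated)
```

The same conversion would apply to the [3H]-Pro counts in the aminoacylation assays once the specific activity is expressed in DPM/pmol.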
Aminoacylation under pre-steady state conditions was performed at 37 °C with 10 µM tRNA, 1 µM ProRS, and 15 µM [3H]-Pro (Perkin Elmer, 7.5 Ci/mmol) in a buffer containing 20 mM KCl, 10 mM MgCl2, 4 mM dithiothreitol (DTT), 0.2 mg/mL bovine serum albumin (BSA), 2 mM ATP (pH 8.0), and 50 mM Tris-HCl (pH 7.5) in a reaction of 30 µL. Reaction aliquots of 5 µL were removed at different time intervals and precipitated with 5% (w/v) TCA on filter pads for 10 min twice. Filter pads were washed with 95% ethanol twice, with ether once, air dried, and measured for radioactivity in an LS6000 scintillation counter (Beckman). Counts were converted to pmoles using the specific activity of the [3H]-Pro after correcting for signal quenching by filter pads.

Cell-based +1-frameshifting reporter assays. Isogenic E. coli strains expressing chromosomal copies of SufB2 or ProL were created in a previously developed trmD-knockdown (trmD-KD) background, in which the chromosomal trmD is deleted but cell viability is maintained through the arabinose-induced expression of a plasmid-borne trm5, the human counterpart of trmD29,30 that is competent for m1G37 synthesis to support bacterial growth (Supplementary Table 1). Due to the essentiality of trmD for cell growth, a simple knock-out cannot be made. We chose human Trm5 as the maintenance protein in the trmD-KD background, because this enzyme is rapidly degraded in E. coli once its expression is turned off, allowing immediate arrest of m1G37 synthesis. In the isogenic SufB2 and ProL strains, the level of m1G37 is determined by the concentration of the added arabinose in a cellular context that expresses ProM as the only competing tRNAPro species. In the m1G37+ condition, where arabinose was added to 0.2% in the medium, tRNA substrates of N1-methylation were confirmed to be 100% methylated by mass spectrometry, whereas in the m1G37– condition, where arabinose was not added to the medium, tRNA substrates of N1-methylation were confirmed to be 20% methylated by mass spectrometry30. Each strain was transformed with the pKK223-3 plasmid expressing an mRNA with a CCC-C motif at the 2nd codon position of the reporter lacZ gene. To simplify the interpretation, the natural AUG codon at the 5th position of lacZ was removed. A +1 frameshift at the CCC-C motif would enable expression of lacZ. The activity of β-Gal was directly measured from lysates of cells grown in the presence or absence of 0.2% arabinose to induce or not induce, respectively, the plasmid-borne human trm5. In these assays, decoding of the CCC-C codon motif would be mediated by SufB2 and ProM in the SufB2 strain, and would be mediated by ProL and ProM in the ProL strain. Due to the presence of ProM in both strains, there would be no vacancy at the CCC-C codon motif.

Cell-based +1 frameshifting lolB assays.
To quantify the +1-frameshifting efficiency at the CCC-C motif at the 2nd codon position of the natural lolB gene, the ratio of protein synthesis of lolB to cysS was measured by Western blots. Overnight cultures of the isogenic strains expressing SufB2 or ProL were separately inoculated into fresh LB media in the presence or absence of 0.2% arabinose and were grown for 4 h to produce the m1G37+ and m1G37– conditions, respectively. Cultures were diluted 10- to 16-fold into fresh media to an optical density (OD) of ~0.1 and grown for another 3 h. Cells were harvested and 15 µg of total protein from cell lysates was separated on 12% SDS-PAGE and probed with rabbit polyclonal primary antibodies against LolB (at a 10,000-fold dilution) and against CysRS (at a 20,000-fold dilution), followed by goat polyclonal anti-rabbit IgG secondary antibody (Sigma-Aldrich, #A0545). The ratio of protein synthesis of lolB to cysS was quantified using Super Signal West Pico Chemiluminescent substrate (Thermo Fisher) in a Chemi-Doc XR imager (Bio-Rad) and analyzed by Image Lab software (Bio-Rad, SOFT-LIT-170-9690-ILSPC-V-6-1). To measure the +1-frameshifting efficiency, we measured the ratio of protein synthesis of lolB to cysS for each tRNA in each condition, and we normalized the observed ratio in the control sample (i.e., ProL in the m1G37+ condition) to 1.0, indicating that protein synthesis of these two genes was in the 0-frame with no +1 frameshifting. A decrease of this ratio was interpreted as a proxy of +1 frameshifting at the CCC-C motif at the 2nd codon position of lolB. From the observed ratio of each sample in each condition, we calculated the +1 frameshifting efficiency relative to the control sample.

Cell-free PURExpress in vitro translation assays. The folA gene, provided as part of the E. coli PURExpress (New England BioLabs) in vitro translation system, was modified by site-directed mutagenesis to introduce a CCC-C motif into the 5th codon position. If SufB2 induced +1 frameshifting at this motif, a full-length DHFR would be made, whereas if SufB2 failed to do so, a C-terminal truncated fragment (ΔC) would be made due to premature termination of protein synthesis. Because SufB2 has no orthogonal tRNA synthetase for aminoacylation with a non-proteinogenic amino acid, we used the Flexizyme ribozyme technology32 for this purpose. Coupled in vitro transcription-translation of the modified E. coli folA gene containing the CCC-C motif at the 5th codon position was conducted in the presence of [35S]-Met using the PURExpress system. SDS-PAGE analysis was used to detect [35S]-Met-labeled polypeptides, which included the full-length DHFR, the ΔC fragment, and a ΔN fragment that likely resulted from initiation of translation at a cryptic site downstream from the CCC-C motif (Figure 2d). The fraction of the full-length folA gene product, the ΔC fragment, and the ΔN fragment was calculated from the amount of each in the sum of all three products. We attribute the overall low recoding efficiency (0.5 – 5.0%) to a combination of the rapid hydrolysis of the prolyl linkage, which is the least stable among aminoacyl linkages63, and the lack of SufB2 re-acylation in the PURExpress system. In these assays, each tRNA was tested in the G37-state and each was normalized by the flexizyme aminoacylation efficiency, which was ~30% for Pro and Pro analogues.
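The product-fraction calculation for the PURExpress assays can be illustrated with a short sketch. The band intensities below are hypothetical, the function names are ours, and treating "normalized by the flexizyme aminoacylation efficiency" as division by that efficiency is an assumption; the text does not spell out the exact operation.

```python
def product_fractions(full_length, delta_c, delta_n):
    """Fraction of each [35S]-labeled product in the sum of all three."""
    total = full_length + delta_c + delta_n
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return {
        "full_length": full_length / total,
        "delta_C": delta_c / total,
        "delta_N": delta_n / total,
    }


def normalize_by_charging(fraction, charging_efficiency=0.30):
    """Scale a fraction by the tRNA charging efficiency.

    Assumption: normalization is a simple division by the flexizyme
    aminoacylation efficiency (~30% for Pro and Pro analogues).
    """
    if not 0 < charging_efficiency <= 1:
        raise ValueError("charging_efficiency must be in (0, 1]")
    return fraction / charging_efficiency


# Hypothetical phosphorimager band intensities (arbitrary units).
fractions = product_fractions(full_length=30.0, delta_c=900.0, delta_n=70.0)
normalized_full_length = normalize_by_charging(fractions["full_length"])
print(fractions, normalized_full_length)
```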
The PURExpress system contained all natural E. coli tRNAs, such that the CCC-C codon motif would not have a chance of vacancy even when a specific CCC-reading tRNA was absent.

Rapid kinetic GTPase assays. Ensemble GTPase assays were performed using the codon-walk approach, in which an E. coli in vitro translation system composed of purified components is supplemented with the requisite tRNAs and translation factors to interrogate individual steps of the elongation cycle. Programmed with a previously validated synthetic AUG-CCC-CGU-U mRNA template29,34, a 70S IC was assembled that positioned the AUG start codon and an initiator fMet-tRNAfMet at the P site and the CCC-C motif at the A site. Reactions to monitor the EF-Tu-dependent hydrolysis of GTP during delivery and accommodation of a TC to the A site were conducted at 20 °C in a buffer containing 50 mM Tris-HCl (pH 7.5), 70 mM NH4Cl, 30 mM KCl, 7 mM MgCl2, 1 mM DTT, and 0.5 mM spermidine29. Each TC was formed by incubating EF-Tu with 8 nM [γ-32P]-GTP (6000 Ci/mmole) for 15 min at 37 °C, after which aminoacylated SufB2 or ProL was added and the incubation continued for 15 min at 4 °C. Unbound [γ-32P]-GTP was removed from the TC solution by gel filtration through a spin cartridge (CentriSpin-20; Princeton Separations). Equal volumes of each purified TC and a solution of 70S ICs were rapidly mixed in the RQF-3 Kintek chemical quench apparatus29.
Final concentrations in these reactions were 0.5 µM for the 70S IC; 0.8 µM for mRNA; 0.65 µM each for IFs 1, 2, and 3; 0.65 µM for fMet-tRNAfMet; 1.8 µM for EF-Tu; 0.4 µM for aminoacylated SufB2 or ProL; and 0.5 mM for cold GTP. The yield of GTP hydrolysis and kGTP,obs upon rapid mixing of each TC with excess 70S ICs were measured by removing aliquots of the reaction at defined time points, quenching the aliquots with 40% formic acid, separating [γ-32P] from [γ-32P]-GTP using thin layer chromatography (TLC), and quantifying the amount of each as a function of time using phosphorimaging29. We adjusted reaction conditions such that the kGTP,obs increased linearly as a function of 70S IC concentration.

Rapid kinetic di- and tripeptide formation assays. Di- and tripeptide formation assays were performed using the codon-walk approach described above in 50 mM Tris-HCl (pH 7.5), 70 mM NH4Cl, 30 mM KCl, 3.5 mM MgCl2, 1 mM DTT, 0.5 mM spermidine, at 20 °C unless otherwise indicated29. 70S ICs were formed by incubating 70S ribosomes, mRNA, [35S]-fMet-tRNAfMet, IFs 1, 2, and 3, and GTP for 25 min at 37 °C in the reaction buffer. Separately, TCs were formed in the reaction buffer by incubating EF-Tu and GTP for 15 min at 37 °C followed by adding the requisite aa-tRNAs and incubating in an ice bath for 15 min. In dipeptide formation assays, 70S ICs templated with the specified variants of an AUG-NNN-NGU-U mRNA were mixed with SufB2-TC or ProL-TC.
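Time courses from quench-flow experiments such as these are commonly summarized by a yield and an observed rate constant. The sketch below shows one way to extract both, assuming a single-exponential model y(t) = A(1 − e^(−kt)); the model choice, the function name, and the synthetic data are assumptions made for illustration and are not the fitting procedure or data of this work.

```python
import math


def fit_single_exponential(times, values, k_min=1e-3, k_max=1e3, iterations=200):
    """Least-squares fit of y = A * (1 - exp(-k * t)).

    For each trial k the best amplitude A has a closed form, so only k is
    searched, using a golden-section search on log(k).
    Returns (A, k).
    """
    def best_amplitude(k):
        basis = [1.0 - math.exp(-k * t) for t in times]
        denom = sum(b * b for b in basis)
        if denom == 0.0:
            return 0.0
        return sum(y * b for y, b in zip(values, basis)) / denom

    def sum_squared_error(log_k):
        k = math.exp(log_k)
        a = best_amplitude(k)
        return sum((y - a * (1.0 - math.exp(-k * t))) ** 2
                   for t, y in zip(times, values))

    lo, hi = math.log(k_min), math.log(k_max)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if sum_squared_error(c) < sum_squared_error(d):
            hi = d   # minimum lies in [lo, d]
        else:
            lo = c   # minimum lies in [c, hi]
    k = math.exp((lo + hi) / 2.0)
    return best_amplitude(k), k


# Synthetic, noise-free time course (hypothetical values, time in seconds).
times = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
values = [0.8 * (1.0 - math.exp(-2.0 * t)) for t in times]
amplitude, k_obs = fit_single_exponential(times, values)
print(amplitude, k_obs)
```

Profiling out the amplitude keeps the search one-dimensional, which avoids any dependency beyond the standard library; a dedicated nonlinear least-squares package would be the more usual choice for real data with noise.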
fMP formation was monitored in an RQF-3 Kintek chemical quench apparatus. In tripeptide formation assays, 70S ICs templated with the specified variants of the AUG-NCC-NGU-U mRNA were mixed, either in one step or in two steps, with equimolar mixtures of SufB2-, tRNAVal (anticodon U*AC, where U* = cmo5U)-, and tRNAArg (anticodon ICG, where I = inosine)-TCs and EF-G. Formation of fMPV and fMPR was monitored in an RQF-3 Kintek chemical quench apparatus. Tripeptide formation assays with one-step delivery of TCs were initiated by rapidly mixing the 70S IC with two or more of the TCs in the RQF-3 Kintek chemical quench apparatus. Final concentrations in these reactions were 0.37 µM for the 70S IC; 0.5 µM for mRNA; 0.5 µM each for IFs 1, 2, and 3; 0.25 µM for [35S]-fMet-tRNAfMet; 2.0 µM for EF-G; 0.75 µM for EF-Tu for each aa-tRNA; 0.5 µM each for the aa-tRNAs; and 1 mM for GTP. For tripeptide formation assays with one-step delivery of G37-state SufB2-, tRNAVal-, and tRNAArg-TCs to the 70S ICs, the yield of fMPV and kfMPV,obs report on the activity of ribosomes that shifted to the +1-frame, whereas the yield of fMPR and kfMPR,obs report on the activity of ribosomes that remained in the 0-frame29,34. We chose G37-state SufB2 to maximize its +1-frameshifting efficiency but native-state tRNAVal and tRNAArg to prevent them from undergoing unwanted frameshifting (note that, for simplicity, we have not denoted the aminoacyl or dipeptidyl moieties of the tRNAs). Tripeptide formation assays with two-step delivery of TCs29 were performed in a manner similar to those with one-step delivery of TCs, except that the 70S ICs were incubated with a SufB2- or ProL-TC and 2.0 µM EF-G for 0.5-10 min, as specified, followed by manual addition of an equimolar mixture of tRNAArg- and tRNAVal-TCs. Reactions were conducted at 20 °C unless otherwise specified, and were quenched by adding concentrated KOH to 0.5 M.
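Why fMPR reports on the 0-frame and fMPV on the +1-frame follows directly from the mRNA sequence, as the sketch below illustrates. It reads the first three codons of the ensemble mRNA given earlier in the Methods, with or without skipping one nucleotide after the CCC codon. The function name and the reduced codon table are ours, and the code models only the resulting reading frame, not the triplet-slippage or quadruplet-pairing mechanism that produces it.

```python
# Subset of the standard genetic code: only the codons needed here.
CODONS = {"AUG": "M", "CCC": "P", "CGU": "R", "GUU": "V"}

# mRNA used for most ensemble in vitro translation reactions (from the text).
MRNA = "GGGAAGGAGGUAAAAAUGCCCCGUUCUAAG" + "CAC" * 7


def tripeptide(mrna, plus_one_after_second_codon=False):
    """Translate the first three codons starting at the first AUG.

    If plus_one_after_second_codon is True, one nucleotide is skipped
    after the 2nd codon, mimicking the outcome of a +1 frameshift at
    the CCC-C motif.
    """
    position = mrna.index("AUG")
    peptide = []
    for codon_number in (1, 2, 3):
        codon = mrna[position:position + 3]
        peptide.append(CODONS[codon])
        position += 3
        if plus_one_after_second_codon and codon_number == 2:
            position += 1  # shift into the +1 frame
    return "".join(peptide)


zero_frame_product = tripeptide(MRNA)
plus_one_product = tripeptide(MRNA, plus_one_after_second_codon=True)
print(zero_frame_product, plus_one_product)
```

In the 0-frame the third codon is CGU (Arg), whereas in the +1-frame it is GUU (Val), which is why tRNAArg- and tRNAVal-TCs are delivered together to score the two outcomes.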
After a brief incubation at 37 °C, aliquots of 0.65 µL were spotted onto a cellulose-backed plastic TLC sheet and electrophoresed at 1000 V in PYRAC buffer (62 mM pyridine, 3.48 M acetic acid, pH 2.7) until the marker dye bromophenol blue reached the water-oil interface at the anode29. The position of the origin was adjusted to maximize separation of the expected oligopeptide products. The separation of unreacted [35S]-fMet and each of the [35S]-fMet-peptide products was visualized by phosphorimaging and quantified using ImageQuant (GE Healthcare), and kinetic plots were fitted using Kaleidagraph (Synergy Software).

Assembly and purification of 70S ICs, TCs, POST, and PRE–A complexes for use in smFRET experiments. 70S ICs were assembled in a manner analogous to those for the ensemble rapid kinetic studies described above, except that the mRNA containing an AUG-CCC-CGU-U coding sequence was 5'-biotinylated and the 50S subunits were labeled with bL9(Cy3) and uL1(Cy5). More specifically, 70S ICs were assembled in three steps. First, 15 pmol of 30S subunits, 27 pmol of IF1, 27 pmol of IF2, 27 pmol of IF3, 18 nmol of GTP, and 25 pmol of biotin-mRNA in 7 µL of Tris-Polymix Buffer (50 mM Tris-(hydroxymethyl)-aminomethane acetate (Tris-OAc) (pH at 25 °C = 7.0), 100 mM KCl, 5 mM NH4OAc, 0.5 mM Ca(OAc)2, 0.1 mM EDTA, 10 mM 2-mercaptoethanol (BME), 5 mM putrescine dihydrochloride, and 1 mM spermidine (free base)) at 5 mM Mg(OAc)2 were incubated for 10 min at 37 °C.
Then 20 pmol of fMet-tRNAfMet in 2 µL of 10 mM KOAc (pH = 5) was added to the reaction, followed by an additional incubation of 10 min at 37 °C. Finally, 10 pmol of bL9(Cy3)- and uL1(Cy5)-labeled 50S subunits in 1 µL of Reconstitution Buffer (20 mM Tris-HCl (pH at 25 °C = 7.8), 8 mM Mg(OAc)2, 150 mM NH4Cl, 0.2 mM EDTA, and 5 mM BME) was added to the reaction to give a final volume of 10 µL, followed by a final incubation of 10 min at 37 °C. The reaction was then adjusted to 100 µL with Tris-Polymix Buffer at 20 mM Mg(OAc)2, loaded onto a 10-40% (w/v) sucrose gradient prepared in Tris-Polymix Buffer at 20 mM Mg(OAc)2, and purified by sucrose density gradient ultracentrifugation to remove any free mRNA, IFs, and fMet-tRNAfMet. Purified 70S ICs were aliquoted, flash frozen in liquid nitrogen, and stored at –80 °C until use in smFRET experiments.

TCs were prepared in two steps. First, 300 pmol of EF-Tu and 200 pmol of EF-Ts in 8 µL of Tris-Polymix Buffer at 5 mM Mg(OAc)2 supplemented with GTP Charging Components (1 mM GTP, 3 mM phosphoenolpyruvate, and 2 units/mL pyruvate kinase) were incubated for 1 min at 37 °C. Then, 30 pmol of aa-tRNA in 2 µL of 25 mM NaOAc (pH = 5) was added to the reaction, followed by an additional incubation of 1 min at 37 °C. This resulted in a TC solution with a final volume of 10 µL that was then stored on ice until used for smFRET experiments.

To prepare PRE–A complexes, we first needed to assemble POST complexes.
POST complexes were assembled by first preparing a 10-µL solution of 70S IC and a 10-µL solution of TC as described above. Separately, a solution of GTP-bound EF-G was prepared by incubating 120 pmol EF-G in 5 µL of Tris-Polymix Buffer at 5 mM Mg(OAc)2 supplemented with GTP Charging Components for 2 min at room temperature. Then 10 µL of the 70S IC, 10 µL of the TC, and 2.5 µL of the GTP-bound EF-G solution were mixed and incubated for 5 min at room temperature and for an additional 5 min on ice. The resulting POST complex was diluted by adjusting the reaction volume to 100 µL with Tris-Polymix Buffer at 20 mM Mg(OAc)2 and purified via sucrose density gradient ultracentrifugation as described above for the 70S ICs. Purified POST complexes were aliquoted, flash frozen in liquid nitrogen, and stored at –80 °C until use in smFRET experiments. PRE–A complexes were then generated by mixing 3 µL of POST complex, 2 µL of a 10 mM puromycin solution (prepared in Nanopure water and filtered using a 0.22 µm filter), and 15 µL of Tris-Polymix Buffer at 15 mM Mg(OAc)2 and incubating the mixture for 10 min at room temperature. PRE–A complexes were used for smFRET experiments immediately upon preparation.

smFRET imaging using total internal reflection fluorescence (TIRF) microscopy. 70S ICs or PRE–A complexes were tethered to the PEG/biotin-PEG-passivated and streptavidin-derivatized surface of a quartz microfluidic flowcell via a biotin-streptavidin-biotin bridge between the biotin-mRNA and the biotin-PEG37,43.
Untethered 70S ICs or PRE–A complexes were removed from the flowcell, and the flowcell was prepared for smFRET imaging experiments, by flushing it with Tris-Polymix Buffer at 15 mM Mg(OAc)2 supplemented with an Oxygen-Scavenging System (2.5 mM protocatechuic acid (pH = 9) (Sigma Aldrich) and 250 nM protocatechuate-3,4-dioxygenase (pH = 7.8) (Sigma Aldrich))64 and a Triplet-State-Quencher Cocktail (1 mM 1,3,5,7-cyclooctatetraene (Aldrich) and 1 mM 3-nitrobenzyl alcohol (Fluka))65.

Tethered 70S ICs or PRE–A complexes were imaged at single-molecule resolution using a laboratory-built, wide-field, prism-based total internal reflection fluorescence (TIRF) microscope with a 532-nm, diode-pumped, solid-state laser (Laser Quantum) excitation source delivering a power of 16-25 mW as measured at the prism to ensure the same power density on the imaging plane. The Cy3 and Cy5 fluorescence emissions were simultaneously collected by a 1.2 numerical aperture, 60×, water-immersion objective (Nikon) and separated based on wavelength using a two-channel, simultaneous-imaging system (Dual View™, Optical Insights LLC). The Cy3 and Cy5 fluorescence intensities were recorded using a 1024 × 1024 pixel, back-illuminated electron-multiplying charge-coupled-device (EMCCD) camera (Andor iXon Ultra 888) operating with 2 × 2 pixel binning at an acquisition time of 0.1 seconds per frame, controlled by μManager 1.4 software.
This microscope allows direct visualization of thousands of individual 70S ICs or PRE–A complexes in a field-of-view of 115 × 230 µm². Each movie was composed of 600 frames in order to ensure that the majority of the fluorophores in the field-of-view were photobleached within the observation period. For stopped-flow experiments using tethered 70S ICs, we delivered 0.25 µM of G37-state SufB2- or ProL-TC in the absence of EF-G or, when specified, in the presence of a 2 µM saturating concentration of EF-G. Stopped-flow experiments proceeded by recording an initial pre-steady-state movie of a field-of-view that captured conformational changes taking place during delivery, followed by recording of one or more steady-state movies of different fields-of-view that captured conformational changes taking place the specified number of minutes post-delivery.

Analysis of smFRET experiments. For each TIRF microscopy movie, we identified fluorophores, aligned Cy3 and Cy5 imaging channels, and generated fluorescence intensity vs. time trajectories for each pair of Cy3 and Cy5 fluorophores using custom-written software (manuscript in preparation; Jason Hon, Colin Kinz-Thompson, Ruben L. Gonzalez) as described previously66. For each time point, Cy5 fluorescence intensity values were corrected for Cy3 bleedthrough by subtracting 5% of the Cy3 fluorescence intensity value in the corresponding Cy3 fluorescence intensity vs. time trajectory. EFRET vs. time trajectories were generated by using the Cy3 fluorescence intensity (ICy3) and the bleedthrough-corrected Cy5 fluorescence intensity (ICy5) from each aligned pair of Cy3 and Cy5 fluorophores to calculate the EFRET value at each time point using EFRET = (ICy5 / (ICy5 + ICy3)).

For both pre-steady-state and steady-state movies (Figures 6d-6h and Supplementary Figures 3, 5, and 6, Supplementary Tables 4-7), an EFRET vs. time trajectory was selected for further analysis if all of the transitions in the fluorescence intensity vs. time trajectory were anti-correlated for the corresponding, aligned pair of Cy3 and Cy5 fluorophores, and the Cy3 fluorescence intensity vs. time trajectory underwent single-step Cy3 photobleaching, demonstrating it arose from a single ribosomal complex. In the case of pre-steady-state movies (Figures 6d-6g, Supplementary Figures 3 and 5 and Tables 4-6), EFRET vs. time trajectories had to meet two additional criteria in order to be selected for further analysis: (i) EFRET vs. time trajectories had to be stably sampling EFRET = 0.55 prior to TC delivery, thereby confirming that the corresponding ribosomal complex was a 70S IC carrying an fMet-tRNAfMet at the P site and (ii) EFRET vs. time trajectories had to exhibit at least one 0.55→0.31 transition after delivery of TCs, thereby confirming that the corresponding 70S IC had accommodated a Pro-SufB2 or Pro-ProL into the A site, that the A site-bound Pro-SufB2 or Pro-ProL had participated as the acceptor in peptide-bond formation, and that the resulting PRE complex was capable of undergoing GS1→GS2 transitions. We note that the second criterion might result in the exclusion of EFRET vs. time trajectories in which Cy3 or Cy5 simply photobleached prior to undergoing a 0.55→0.31 transition, and could therefore result in a slight overestimation of k70S IC→GS2 and/or kGS1→GS2 (see below for a detailed description of how k70S IC→GS2, kGS1→GS2, and other kinetic and thermodynamic parameters were estimated). Nonetheless, the number of such EFRET vs. time trajectories should be exceedingly small. This is because the rates with which the fluorophore that photobleached the fastest, Cy5, entered into the photobleached state (∅) from the GS1, GS2, EF-G-bound GS2-like, and POST states were kGS1→∅ = 0.04 ± 0.02 s⁻¹, kGS2→∅ = 0.07 ± 0.01 s⁻¹, kGS2(G)→∅ = 0.07 ± 0.01 s⁻¹ (where the subscript “(G)” denotes experiments performed in the presence of EF-G), and kPOST→∅ = 0.05 ± 0.02 s⁻¹, respectively (see below for a detailed description of how kGS1→∅, kGS2→∅, kGS2(G)→∅, and kPOST→∅ were estimated). These rates are, on average, about 11-fold lower than those of k70S IC→GS2 and kGS1→GS2 (0.3–0.6 s⁻¹ and 0.58–0.82 s⁻¹ (Supplementary Table 4)). Consequently, we do not expect the measurements of k70S IC→GS2 and kGS1→GS2 to be limited by Cy3 or Cy5 photobleaching. Additionally, even if k70S IC→GS2 and kGS1→GS2 were slightly overestimated, they would be expected to be equally overestimated for SufB2- and ProL ribosomal complexes given that the rate of photobleaching would be expected to be very similar for SufB2- and ProL ribosomal complexes.
Furthermore, because we are primarily concerned with the relative values of k70S IC→GS2 and kGS1→GS2 for SufB2- vs. ProL ribosomal complexes, rather than with the absolute values of k70S IC→GS2 and kGS1→GS2 for the SufB2- and ProL ribosomal complexes, such slight overestimations do not affect the conclusions of the work presented here.

To calculate k70S IC→GS2 and the corresponding error from the pre-steady-state experiments, we analyzed the 70S IC survival probabilities (Supplementary Figure 4, Tables 4 and 5)37,67. Briefly, for each trajectory, we extracted the time interval during which we were waiting for the 70S IC to undergo a transition to GS2 and used these ‘waiting times’ to construct a 70S IC survival probability distribution, as shown in Supplementary Figure 4. All 70S IC survival probability distributions were best described by a single exponential decay function of the type

$$Y = A\,e^{-t/\tau_{\mathrm{70S\ IC}}} \tag{1}$$

where Y is survival probability, A is the initial population of 70S IC, t is time, and τ70S IC is the time constant with which 70S IC transitions to a PRE complex in the GS2 state. k70S IC→GS2 was then calculated using the equation k70S IC→GS2 = 1 / τ70S IC. Errors were calculated as the standard deviation of technical triplicates.

Six sets of kinetic and/or thermodynamic parameters were calculated from hidden Markov model (HMM) analyses of the recorded movies.
These parameters are defined here as: (i) kGS1→GS2, kGS2→GS1, and Keq from the pre-steady-state and steady-state movies recorded for the delivery of SufB2- and ProL-TCs in the absence of EF-G (Figures 6d, 6f, and Supplementary Figure 3 and Table 4); (ii) kGS2→POST from the pre-steady-state movie recorded for the delivery of ProL-TC in the presence of EF-G (Figures 6e, 6g, and Supplementary Figure 5 and Table 5); (iii) the fractional population of the POST complex from the pre-steady-state and steady-state movies recorded for the delivery of SufB2- and ProL-TCs in the presence of EF-G (Figures 6e, 6g, and Supplementary Figure 5 and Table 5); (iv) kGS1→GS2, kGS2→GS1, and Keq from a sub-population of PRE complexes that lacked an A site-bound, deacylated SufB2 in the steady-state movies recorded for the longer time points (i.e., 3, 10, and 20 min) after the delivery of SufB2-TC in the presence of EF-G (Figure 6g, Supplementary Table 6); (v) kGS1→GS2, kGS2→GS1, and Keq from the steady-state movies recorded for the SufB2- and ProL PRE–A complexes (Figure 6h and Supplementary Figure 6 and Table 7); and (vi) kGS1→∅, kGS2→∅, kGS2(G)→∅, and kPOST→∅ from the movies described in (i)-(v) (Figures 6d-6h, Supplementary Figures 3, 5, and 6, and reported two paragraphs above). To calculate these parameters, we extended the variational Bayes approach we introduced in the vbFRET algorithm68 to estimate a ‘consensus’ (i.e., ‘global’) HMM of the EFRET vs. time trajectories. In this approach, we use Bayesian inference to estimate a single, consensus HMM that is most consistent with all the EFRET vs. time trajectories in a movie, rather than to estimate a separate HMM for each trajectory in the movie.
To estimate such a consensus HMM, we assume each trajectory is independent and identically distributed, thereby enabling us to perform the inference using the likelihood function

$$\mathcal{L} = \prod_{i \,\in\, \mathrm{trajectories}} \mathcal{L}_i \tag{2}$$

where ℒi is the variational approximation of the likelihood function for a single trajectory. Subsequently, the single, consensus HMM that is most consistent with all of the trajectories is estimated using the expectation-maximization algorithm that we have previously described68. Viterbi paths (Supplementary Figures 3, 5, and 6), representing the most probable hidden-state trajectory, were then calculated from the HMM using the Viterbi algorithm69. Based on extensive smFRET studies of translation elongation using the bL9(Cy3)-uL1(Cy5) smFRET signal35,36,38, we selected a consensus HMM composed of three states for further analysis of the data. For calculation of the kinetic and/or thermodynamic parameters in (i), (iv), and (v), the three states corresponded to GS1, GS2, and ∅ and for calculation of the kinetic and/or thermodynamic parameters in (ii) and (iii), the three states corresponded to EF-G-bound GS2-like, POST, and ∅. The transition matrix of the consensus HMM was then used to calculate kGS1→GS2 and kGS2→GS1 in (i), (iv), and (v); kGS2→POST in (ii); kGS1→∅, kGS2→∅, kGS2(G)→∅, and kPOST→∅ in (vi); and the errors corresponding to each of these parameters.
This transition matrix is a 3 × 3 matrix in which the off-diagonal elements correspond to the number of times a transition takes place between each pair of the GS1, GS2, and ∅ states (in (i), (iv), (v), and (vi)) or each pair of the EF-G-bound GS2-like, POST, and ∅ states (in (ii) and (vi)) and the on-diagonal elements correspond to the number of times a transition does not take place out of the GS1, GS2, and ∅ states (in (i), (iv), (v), and (vi)) or out of the EF-G-bound GS2-like, POST, and ∅ states (in (ii) and (vi)). Each element of this matrix parameterizes a Dirichlet distribution, from which we calculated the mean and the square root of the variance for four transition probabilities pGS1→GS2, pGS2→GS1, pGS1→∅, and pGS2→∅ (in (i), (iv), (v), and (vi)) or for three transition probabilities pGS2→POST, pGS2(G)→∅, and pPOST→∅ (in (ii) and (vi)). These transition probabilities were then used to calculate the corresponding four rate constants, kGS1→GS2, kGS2→GS1, kGS1→∅, and kGS2→∅ (in (i), (iv), (v), and (vi)) or three rate constants, kGS2→POST, kGS2(G)→∅, and kPOST→∅ (in (ii) and (vi)) using the equation

$$k = -\frac{\ln(1 - p)}{t} \tag{3}$$

where t is the time interval between data points (t = 0.1 s). We propagated the error for the transition probabilities into the error for the rate constants using the equation

$$\sigma_k = \frac{\sigma_p}{(1 - p) \times t} \tag{4}$$

where σp is the standard deviation of p and σk is the standard deviation of k.
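As a minimal numerical sketch of Equations 3 and 4, and not the analysis code used in this study, the following Python example converts a matrix of transition counts into transition probabilities, rate constants, and propagated errors. It assumes that each row of the count matrix parameterizes its own Dirichlet distribution; the function names and the counts are hypothetical and chosen only for illustration.

```python
import numpy as np

FRAME_TIME_S = 0.1  # time interval between data points, as stated in the text


def dirichlet_mean_std(counts):
    """Mean and standard deviation of each transition probability.

    Assumption: each row of `counts` parameterizes one Dirichlet
    distribution over the transitions out of that row's state.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=1, keepdims=True)
    mean = counts / total
    variance = counts * (total - counts) / (total**2 * (total + 1.0))
    return mean, np.sqrt(variance)


def rate_constant(p, sigma_p, t=FRAME_TIME_S):
    """Equation 3 (k) and Equation 4 (sigma_k) for a transition probability p."""
    k = -np.log(1.0 - p) / t
    sigma_k = sigma_p / ((1.0 - p) * t)
    return k, sigma_k


# Hypothetical counts for the GS1 and GS2 rows.
# Columns: GS1, GS2, photobleached state.
counts = [[9400, 560, 40],
          [420, 5540, 40]]

p, sigma_p = dirichlet_mean_std(counts)
k_12, err_12 = rate_constant(p[0, 1], sigma_p[0, 1])  # GS1 -> GS2
k_21, err_21 = rate_constant(p[1, 0], sigma_p[1, 0])  # GS2 -> GS1
K_eq = k_12 / k_21  # ratio of the forward and reverse rate constants

print(f"k_GS1->GS2 = {k_12:.3f} +/- {err_12:.3f} s^-1")
print(f"k_GS2->GS1 = {k_21:.3f} +/- {err_21:.3f} s^-1")
print(f"K_eq = {K_eq:.2f}")
```

Because the counts are invented, the printed values carry no experimental meaning; only the arithmetic mirrors Equations 3 and 4.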
Keq in (i), (iv), and (v) was determined using the equation Keq = kGS1→GS2 / kGS2→GS1. The fractional populations of the POST complex in (iii) and the corresponding errors were calculated by marginalizing, which in this case simply amounts to calculating the mean and the standard error of the mean, for the conditional probabilities of each EFRET data point given each hidden state. Because the data points preceding the initial 70S IC→GS2 transition in the pre-steady-state movies do not contribute to the kinetic and/or thermodynamic parameters in (i)-(vi), these data points were not included in the analyses that were used to determine these parameters.

QUANTIFICATION AND STATISTICAL ANALYSES
All ensemble biochemical experiments and cell-based reporter assays were repeated at least three times and the mean values and standard deviations for each experiment or assay are reported. All smFRET experiments were performed in at least three technical replicates, and trajectories from all of the technical replicates for each experiment were combined prior to generating the surface contour plot of the time evolution of population FRET and modeling with the HMM. Mean values and errors for the transition rates and fractional populations determined from modeling with an HMM are reported (for details see “Analysis of smFRET experiments” in Methods). Mean values and standard deviations for the k70S IC→GS2 values were determined from technical triplicates of the survival plots analysis for each experiment and are reported.
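The survival-probability analysis of Equation 1 can be illustrated with a short Python sketch. This is not the authors' fitting code: the fitting procedure is not specified in the text, so the example substitutes a log-linear least-squares fit of the single exponential, uses synthetic waiting times, and assumes the sample standard deviation (ddof = 1) across three replicates. All function names are hypothetical.

```python
import numpy as np


def survival_curve(waiting_times):
    """Empirical survival probability at each sorted waiting time.

    The value at the i-th sorted time is the fraction of complexes
    that had not yet transitioned just before that time.
    """
    t = np.sort(np.asarray(waiting_times, dtype=float))
    n = t.size
    surv = 1.0 - np.arange(n) / n
    return t, surv


def fit_single_exponential(t, surv):
    """Fit Y = A * exp(-t / tau) by linear least squares on ln(Y)."""
    slope, intercept = np.polyfit(t, np.log(surv), 1)
    tau = -1.0 / slope
    return np.exp(intercept), tau


def rate_with_error(replicates):
    """Mean and standard deviation of k = 1 / tau across replicates."""
    rates = []
    for waits in replicates:
        t, surv = survival_curve(waits)
        _, tau = fit_single_exponential(t, surv)
        rates.append(1.0 / tau)
    rates = np.array(rates)
    return rates.mean(), rates.std(ddof=1)


# Synthetic waiting times (seconds) for three hypothetical replicates.
rng = np.random.default_rng(0)
replicates = [rng.exponential(scale=2.0, size=500) for _ in range(3)]

k_mean, k_sd = rate_with_error(replicates)
print(f"k = {k_mean:.2f} +/- {k_sd:.2f} s^-1")
```

A log-linear fit weights the sparsely sampled tail of the distribution as heavily as the early time points, so a nonlinear least-squares fit of Equation 1 may be preferable for real data.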
DATA AND CODE AVAILABILITY

Data Availability
With the exception of the smFRET data, all data supporting the findings of this study are presented within this article. Due to the lack of a public repository for smFRET data, the smFRET data supporting the findings of this study are available from the corresponding authors upon request. Source data are provided with this paper.

Code Availability
The code used to analyze the TIRF movies in this study is described in a manuscript in preparation (Jason Hon, Colin Kinz-Thompson, Ruben L. Gonzalez), where R.L.G. is the corresponding author. Therefore, the code is available from R.L.G. upon request.

REFERENCES
1. Wang, K., Schmied, W.H. & Chin, J.W. Reprogramming the genetic code: from triplet to quadruplet codes. Angew Chem Int Ed Engl 51, 2288-97 (2012).
2. Chen, Y. et al. Controlling the Replication of a Genomically Recoded HIV-1 with a Functional Quadruplet Codon in Mammalian Cells. ACS Synth Biol 7, 1612-1617 (2018).
3. Lee, B.S., Kim, S., Ko, B.J. & Yoo, T.H. An efficient system for incorporation of unnatural amino acids in response to the four-base codon AGGA in Escherichia coli. Biochim Biophys Acta 1861, 3016-3023 (2017).
4. Chatterjee, A., Lajoie, M.J., Xiao, H., Church, G.M. & Schultz, P.G. A bacterial strain with a unique quadruplet codon specifying non-native amino acids. Chembiochem 15, 1782-6 (2014).
5. Niu, W., Schultz, P.G. & Guo, J. An expanded genetic code in mammalian cells with a functional quadruplet codon. ACS Chem Biol 8, 1640-5 (2013).
6. Wang, N., Shang, X., Cerny, R., Niu, W. & Guo, J. Systematic Evolution and Study of UAGN Decoding tRNAs in a Genomically Recoded Bacteria. Sci Rep 6, 21898 (2016).
7. Neumann, H., Wang, K., Davis, L., Garcia-Alai, M. & Chin, J.W. Encoding multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome. Nature 464, 441-4 (2010).
8. Wang, K. et al. Optimized orthogonal translation of unnatural amino acids enables spontaneous protein double-labelling and FRET. Nat Chem 6, 393-403 (2014).
9. Atkins, J.F., Loughran, G., Bhatt, P.R., Firth, A.E. & Baranov, P.V. Ribosomal frameshifting and transcriptional slippage: From genetic steganography and cryptography to adventitious use. Nucleic Acids Res 44, 7007-78 (2016).
10. Atkins, J.F. & Bjork, G.R. A gripping tale of ribosomal frameshifting: extragenic suppressors of frameshift mutations spotlight P-site realignment. Microbiol Mol Biol Rev 73, 178-210 (2009).
11. Roth, J.R. Frameshift suppression. Cell 24, 601-2 (1981).
12. Bossi, L. & Roth, J.R. Four-base codons ACCA, ACCU and ACCC are recognized by frameshift suppressor sufJ. Cell 25, 489-96 (1981).
13. Qian, Q. et al. A new model for phenotypic suppression of frameshift mutations by mutant tRNAs. Mol Cell 1, 471-82 (1998).
14. Weiss, R.B., Dunn, D.M., Shuh, M., Atkins, J.F. & Gesteland, R.F. E. coli ribosomes re-phase on retroviral frameshift signals at rates ranging from 2 to 50 percent. New Biol 1, 159-69 (1989).
15. Jager, G., Nilsson, K. & Bjork, G.R. The phenotype of many independently isolated +1 frameshift suppressor mutants supports a pivotal role of the P-site in reading frame maintenance. PLoS One 8, e60246 (2013).
16. Fagan, C.E., Maehigashi, T., Dunkle, J.A., Miles, S.J. & Dunham, C.M. Structural insights into translational recoding by frameshift suppressor tRNASufJ. RNA 20, 1944-54 (2014).
17. Maehigashi, T., Dunkle, J.A., Miles, S.J. & Dunham, C.M. Structural insights into +1 frameshifting promoted by expanded or modification-deficient anticodon stem loops. Proc Natl Acad Sci U S A 111, 12740-5 (2014).
18. Dunham, C.M. et al. Structures of tRNAs with an expanded anticodon loop in the decoding center of the 30S ribosomal subunit. RNA 13, 817-23 (2007).
19. Hong, S. et al. Mechanism of tRNA-mediated +1 ribosomal frameshifting. Proc Natl Acad Sci U S A 115, 11226-11231 (2018).
20. Sroga, G.E., Nemoto, F., Kuchino, Y. & Bjork, G.R. Insertion (sufB) in the anticodon loop or base substitution (sufC) in the anticodon stem of tRNA(Pro)2 from Salmonella typhimurium induces suppression of frameshift mutations. Nucleic Acids Res 20, 3463-9 (1992).
21. Caliskan, N., Katunin, V.I., Belardinelli, R., Peske, F. & Rodnina, M.V. Programmed -1 frameshifting by kinetic partitioning during impeded translocation. Cell 157, 1619-31 (2014).
22. Taylor, D.J. et al. Structures of modified eEF2 80S ribosome complexes reveal the role of GTP hydrolysis in translocation. EMBO J 26, 2421-31 (2007).
23. Khade, P.K. & Joseph, S. Messenger RNA interactions in the decoding center control the rate of translocation. Nat Struct Mol Biol 18, 1300-2 (2011).
24. Liu, G. et al. EF-G catalyzes tRNA translocation by disrupting interactions between decoding center and codon-anticodon duplex. Nat Struct Mol Biol 21, 817-24 (2014).
25. Abeyrathne, P.D., Koh, C.S., Grant, T., Grigorieff, N. & Korostelev, A.A. Ensemble cryo-EM uncovers inchworm-like translocation of a viral IRES through the ribosome. Elife 5, doi: 10.7554/eLife.14874 (2016).
26. Schuwirth, B.S. et al. Structures of the bacterial ribosome at 3.5 Å resolution. Science 310, 827-34 (2005).
27. Pulk, A. & Cate, J.H. Control of ribosomal subunit rotation by elongation factor G. Science 340, 1235970 (2013).
28. Ratje, A.H. et al. Head swivel on the ribosome facilitates translocation by means of intra-subunit tRNA hybrid sites. Nature 468, 713-6 (2010).
29. Gamper, H.B., Masuda, I., Frenkel-Morgenstern, M. & Hou, Y.M. Maintenance of protein synthesis reading frame by EF-P and m(1)G37-tRNA. Nat Commun 6, 7226 (2015).
30. Masuda, I. et al. tRNA Methylation Is a Global Determinant of Bacterial Multi-drug Resistance. Cell Syst 8, 302-314 e8 (2019).
31. Christian, T. & Hou, Y.M. Distinct determinants of tRNA recognition by the TrmD and Trm5 methyl transferases. J Mol Biol 373, 623-32 (2007).
32. Murakami, H., Ohta, A., Ashigai, H. & Suga, H. A highly flexible tRNA acylation method for non-natural polypeptide synthesis. Nat Methods 3, 357-9 (2006).
33. Walker, S.E. & Fredrick, K. Recognition and positioning of mRNA in the ribosome by tRNAs with expanded anticodons. J Mol Biol 360, 599-609 (2006).
34. Gamper, H.B., Masuda, I., Frenkel-Morgenstern, M. & Hou, Y.M. The UGG Isoacceptor of tRNAPro Is Naturally Prone to Frameshifts. Int J Mol Sci 16, 14866-83 (2015).
35. Fei, J. et al. Allosteric collaboration between elongation factor G and the ribosomal L1 stalk directs tRNA movements during translation. Proc Natl Acad Sci U S A 106, 15702-7 (2009).
36. Ning, W., Fei, J. & Gonzalez, R.L., Jr. The ribosome uses cooperative conformational changes to maximize and regulate the efficiency of translation. Proc Natl Acad Sci U S A 111, 12073-8 (2014).
37. Fei, J., Kosuri, P., MacDougall, D.D. & Gonzalez, R.L., Jr. Coupling of ribosomal L1 stalk and tRNA dynamics during translation elongation. Mol Cell 30, 348-59 (2008).
38. Fei, J., Richard, A.C., Bronson, J.E. & Gonzalez, R.L., Jr. Transfer RNA-mediated regulation of ribosome dynamics during protein synthesis. Nat Struct Mol Biol 18, 1043-51 (2011).
39. Boel, G. et al. The ABC-F protein EttA gates ribosome entry into the translation elongation cycle. Nat Struct Mol Biol 21, 143-51 (2014).
40. Chen, B. et al. EttA regulates translation by binding the ribosomal E site and restricting ribosome-tRNA dynamics. Nat Struct Mol Biol 21, 152-9 (2014).
41. Kim, H.K. et al. A frameshifting stimulatory stem loop destabilizes the hybrid state and impedes ribosomal translocation. Proc Natl Acad Sci U S A 111, 5538-43 (2014).
42. Munro, J.B., Wasserman, M.R., Altman, R.B., Wang, L. & Blanchard, S.C. Correlated conformational events in EF-G and the ribosome regulate translocation. Nat Struct Mol Biol 17, 1470-7 (2010).
43. Blanchard, S.C., Kim, H.D., Gonzalez, R.L., Jr., Puglisi, J.D. & Chu, S. tRNA dynamics on the ribosome during translation. Proc Natl Acad Sci U S A 101, 12893-8 (2004).
44. Studer, S.M., Feinberg, J.S. & Joseph, S. Rapid kinetic analysis of EF-G-dependent mRNA translocation in the ribosome. J Mol Biol 327, 369-81 (2003).
45. Wintermeyer, W. & Rodnina, M.V. Translational elongation factor G: a GTP-driven motor of the ribosome. Essays Biochem 35, 117-29 (2000).
46. Ermolenko, D.N. et al. Observation of intersubunit movement of the ribosome in solution using FRET. J Mol Biol 370, 530-40 (2007).
47. Ermolenko, D.N. & Noller, H.F. mRNA translocation occurs during the second step of ribosomal intersubunit rotation. Nat Struct Mol Biol 18, 457-62 (2011).
48. Cornish, P.V. et al. Following movement of the L1 stalk between three functional states in single ribosomes. Proc Natl Acad Sci U S A 106, 2571-6 (2009).
49. Nguyen, H.A., Hoffer, E.D. & Dunham, C.M. Importance of a tRNA anticodon loop modification and a conserved, noncanonical anticodon stem pairing in tRNACGGPro for decoding. J Biol Chem 294, 5281-5291 (2019).
50. Guo, Z. & Noller, H.F. Rotation of the head of the 30S ribosomal subunit during mRNA translocation. Proc Natl Acad Sci U S A 109, 20391-4 (2012).
51. Zhou, J., Lancaster, L., Donohue, J.P. & Noller, H.F. Spontaneous ribosomal translocation of mRNA and tRNAs into a chimeric hybrid state. Proc Natl Acad Sci U S A 116, 7813-7818 (2019).
52. Korniy, N., Samatova, E., Anokhina, M.M., Peske, F. & Rodnina, M.V. Mechanisms and biomedical implications of -1 programmed ribosome frameshifting on viral and bacterial mRNAs. FEBS Lett 593, 1468-1482 (2019).
53. Lajoie, M.J. et al. Genomically recoded organisms expand biological functions. Science 342, 357-60 (2013).
54. Wang, K., de la Torre, D., Robertson, W.E. & Chin, J.W. Programmed chromosome fission and fusion enable precise large-scale genome rearrangement and assembly. Science 365, 922-926 (2019).
55. Mohan, S., Donohue, J.P. & Noller, H.F. Molecular mechanics of 30S subunit head rotation. Proc Natl Acad Sci U S A 111, 13325-30 (2014).
56. Kaledhonkar, S. et al. Late steps in bacterial translation initiation visualized using time-resolved cryo-EM. Nature 570, 400-404 (2019).
57. Chen, B. et al. Structural dynamics of ribosome subunit association studied by mixing-spraying time-resolved cryogenic electron microscopy. Structure 23, 1097-105 (2015).
58. Reinkemeier, C.D., Girona, G.E. & Lemke, E.A. Designer membraneless organelles enable codon reassignment of selected mRNAs in eukaryotes. Science 363 (2019).
59. Datsenko, K.A. & Wanner, B.L. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci U S A 97, 6640-5 (2000).
60. Fei, J. et al. A highly purified, fluorescently labeled in vitro translation system for single-molecule studies of protein synthesis. Methods Enzymol 472, 221-59 (2010).
61. Christian, T., Lahoud, G., Liu, C. & Hou, Y.M. Control of catalytic cycle by a pair of analogous tRNA modification enzymes. J Mol Biol 400, 204-17 (2010).
62. Zhang, C.M., Perona, J.J., Ryu, K., Francklyn, C. & Hou, Y.M. Distinct kinetic mechanisms of the two classes of Aminoacyl-tRNA synthetases. J Mol Biol 361, 300-11 (2006).
63. Peacock, J.R. et al. Amino acid-dependent stability of the acyl linkage in aminoacyl-tRNA. RNA 20, 758-64 (2014).
; https://doi.org/10.1101/2020.12.31.424971doi: bioRxiv preprint https://doi.org/10.1101/2020.12.31.424971 http://creativecommons.org/licenses/by-nc-nd/4.0/ 45 64. Aitken, C.E., Marshall, R.A. & Puglisi, J.D. An oxygen scavenging system for 1168 improvement of dye stability in single-molecule fluorescence experiments. Biophys J 94, 1169 1826-35 (2008). 1170 65. Gonzalez, R.L., Jr., Chu, S. & Puglisi, J.D. Thiostrepton inhibition of tRNA delivery to the 1171 ribosome. RNA 13, 2091-7 (2007). 1172 66. Desai, B.J. & Gonzalez, R.L., Jr. Multiplexed, bioorthogonal labeling of multicomponent, 1173 biomolecular complexes using genomically encoded, non-canonical amino acids. 1174 bioRxiv doi: 10.1101/730465(2019). 1175 67. MacDougall, D.D. & Gonzalez, R.L., Jr. Translation initiation factor 3 regulates switching 1176 between different modes of ribosomal subunit joining. J Mol Biol 427, 1801-18 (2015). 1177 68. Bronson, J.E., Fei, J., Hofman, J.M., Gonzalez, R.L., Jr. & Wiggins, C.H. Learning rates 1178 and states from biophysical time series: a Bayesian approach to model selection and 1179 single-molecule FRET data. Biophys J 97, 3196-205 (2009). 1180 69. Viterbi, A.J. Error bounds for convolutional codes and an asymptotically optimum 1181 decoding algorithm. IEEE Trans. Inform. Theory 13, 260-269 (1967). 1182 70. Agirrezabala, X. et al. Visualization of the hybrid state of tRNA binding promoted by 1183 spontaneous ratcheting of the ribosome. Mol Cell 32, 190-7 (2008). 1184 1185 1186 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 2, 2021. ; https://doi.org/10.1101/2020.12.31.424971doi: bioRxiv preprint https://doi.org/10.1101/2020.12.31.424971 http://creativecommons.org/licenses/by-nc-nd/4.0/ 46 ACKNOWLEDGEMENTS 1187 We thank Dr. 
Hajime Tokuda for rabbit polyclonal anti-LolB antibodies, and Dr. Colin Kinz-Thompson and Korak Kumar Ray for help with smFRET data analysis. R.L.G. and H.L. thank the Columbia University Precision Biomolecular Characterization Facility for access to and support of instrumentation. This work was supported by NIH grants GM134931 to Y-M.H. and GM119386 to R.L.G., a Charles H. Revson Foundation Postdoctoral Fellowship in Biomedical Science 19-24 to H.L., a Japanese JSPS overseas postdoctoral fellowship to I.M., and NSF grant CHE-1708759 to E.J.P.

AUTHOR CONTRIBUTIONS
H.G. conceived of and performed ensemble rapid kinetic assays, R.L.G. and H.L. conceived of and designed smFRET assays, H.L. performed smFRET assays, I.M. performed cell-based reporter assays, D.M.R. and E.J.P. generated aminoacyl-DBE derivatives, T.C. performed G37 methylation and aminoacylation assays, and A.B.C. and G.B. provided E. coli 70S ribosomes. Y.M.H. and R.L.G. wrote the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing interests.

CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contacts Ruben L. Gonzalez, Jr. (rlg2118@columbia.edu) and Ya-Ming Hou (ya-ming.hou@jefferson.edu).

FIGURE LEGENDS

Figure 1. Methylation and aminoacylation of SufB2 and ProL.
a Sequence and secondary structure of native-state SufB2, showing the N1-methylated G37 in red and the G37a insertion to ProL in blue. b RNase T1 cleavage inhibition assays of TrmD-methylated G37-state SufB2 transcript confirm the presence of m1G37 and m1G37a. Cleavage products are marked by the nucleotide positions of Gs. L: the molecular ladder of tRNA fragments generated from alkali hydrolysis. c Primer extension inhibition assays identify m1G37 in native-state SufB2. Red and blue arrows indicate positions of primer extension inhibition products at the methylated G37 and G37a, respectively, which are offset by one nucleotide relative to ProL. The first primer extension inhibition product for SufB2 corresponds to m1G37a, the second corresponds to m1G37, while the primer extension inhibition product for ProL corresponds to m1G37. Due to the propensity of primer extension to make multiple stops on a long transcript of tRNA, the read-through primer extension product (54-55 nucleotides) had a reduced intensity relative to the primer extension inhibition products (21-22 nucleotides). Molecular size markers are provided by the primer alone (17 nucleotides) and the run-off products (54-55 nucleotides). d TrmD-catalyzed N1 methylation of G37-state SufB2 and ProL as a function of time. e, f ProRS-catalyzed aminoacylation. e Aminoacylation of native-state SufB2 and ProL. f Aminoacylation of G37-state SufB2 and ProL as a function of time. In panels b, c, gels were performed three times with similar results, while in panels d-f, the bars are SD of three independent (n = 3) experiments, and the data are presented as mean values ± SD.

Figure 2. SufB2-induced +1 frameshifting and genome recoding. a The +1-frameshifting efficiency in cell-based lacZ assay for SufB2 and ProL strains in m1G37+ and m1G37– conditions.
The bars in the graph are SD of four, five, or six independent (n = 4, 5, or 6) biological repeats, and the data are mean values ± SD. b The difference in the ratio of protein synthesis of lolB to cysS for SufB2 and ProL strains in m1G37+ and m1G37– conditions relative to ProL in the m1G37+ condition. c Measurements underlying the bar plots in panel b. Each ratio was measured directly and the ratio of ProL in the m1G37+ condition was normalized to 1.0. The difference of each ratio relative to the normalized ratio represented the +1-frameshifting efficiency at the CCC-C motif at the 2nd codon of lolB. The bars in the graph are SD of three independent (n = 3) biological repeats, and the data are mean values ± SD. In a, b, decoding of the CCC-C motif was mediated by SufB2 and ProM in the SufB2 strain, and by ProL and ProM in the ProL strain, where the presence of ProM ensured no vacancy at the CCC-C motif. The increased +1 frameshifting in the m1G37– condition vs. the m1G37+ condition indicates that SufB2 and ProL are each an active determinant in decoding the CCC-C motif. d SufB2-mediated insertion of non-proteinogenic amino acids at the CCC-C motif in the 5th codon position of folA using [35S]-Met-dependent in vitro translation. Reporters of folA are denoted by +/– CCC-C, where “+” and “–” indicate constructs with and without the CCC-C motif.
SDS-PAGE analysis identifies full-length DHFR resulting from a +1-frameshift event at the CCC-C motif by SufB2 pre-aminoacylated with the amino acid shown at the top of each lane, a DC fragment resulting from lack of the +1-frameshift event, and a DN fragment resulting from translation initiation at the AUG codon likely at position 17 or 21 downstream from the CCC-C motif. Gel samples were derived from the same experiment, which was performed five times with similar results. Gels for each experiment were processed in parallel. Lane 1: full-length DHFR as the molecular marker; deacyl: deacylated tRNA.

Figure 3. SufB2 uses a triplet anticodon-codon pairing scheme at the A site. a GTP hydrolysis by EF-Tu as a function of time for delivery of G37- or native-state SufB2- or ProL-TC to the A site of a 70S IC. Although the concentration of TCs was limiting, which would limit the rate of binding of TCs to the 70S IC, the observed differences in the yield of GTPase activity indicated that binding was not the sole determinant, but that other factors, such as the identity and the methylation state of the tRNA, affected the GTPase activity. b Dipeptide fMP formation as a function of time for delivery of G37- or native-state SufB2- or ProL-TC to the A site of a 70S IC. Due to the limiting concentration of the 70S IC, which did not include the tRNA substrate, the yield of di- or tri-peptide formation assays was constant even with different tRNAs in TCs.
c The yield of fMP and fMR in dipeptide formation assays in which equimolar mixtures of native-state SufB2-TC, carrying Pro and/or Arg, and/or native-state ProL-TC, carrying Pro and/or Arg, are delivered to 70S ICs. The mRNA in 70S ICs in (a-c) is AUG-CCC-CGU-U. d Dipeptide formation rate kfMP,obs for delivery of G37-state SufB2-TC to 70S ICs containing sequence variants of the CCC-C motif in the A site. In panels a, b, the bars in the graphs are SD of three independent (n = 3) experiments, in panel c, the bars in the graphs are SD of four independent (n = 4) experiments, and in panel d, the bars in the graphs are SD of three or four independent (n = 3 or 4) experiments. All data are presented as mean values ± SD. ∆t: a time interval, ND: not detected.

Figure 4. Plasticity of SufB2-induced +1 frameshifting. a fMP formation as a function of time upon delivery of the G37C variant of G37-state SufB2-TC to the A site of a 70S IC, allowing nucleotides 34-36 to pair with a CCC-C motif at the A site. b fMP formation as a function of time upon delivery of the G34C variant of G37-state SufB2-TC to the A site of a 70S IC, allowing nucleotides 35-37 to pair with a CCC-C motif. c-f Results of fMPV formation assays in which SufB2-TC is delivered to an A site programmed with a quadruplet codon at the 2nd position and sequences of the SufB2 anticodon loop and/or quadruplet codon are varied. Yields of fMPV formation represent +1 frameshifting during translocation of SufB2 from the A site to the P site.
Possible +1-frame anticodon-codon pairing schemes of SufB2 during translocation: c G37-state SufB2 capable of frameshifting at a CCC-C motif via quadruplet pairing and/or triplet slippage, d G37C variant of G37-state SufB2 capable of frameshifting at a GCC-C motif via quadruplet pairing and/or triplet slippage, e m1G37-state SufB2 capable of frameshifting at a CCC-C motif via only triplet slippage, and f G37C variant of G37-state SufB2 capable of frameshifting at a CCC-C motif via only triplet slippage. In panels a, b, the bars in the graphs are SD of three (n = 3) independent experiments, and the data are presented as mean values ± SD. ∆t: a time interval.

Figure 5. SufB2 shifts to the +1-frame during translocation. a Relative fMPV and fMPR formation as a function of time upon rapid delivery of EF-G and an equimolar mixture of G37-state SufB2-, tRNAVal-, and tRNAArg-TCs to 70S ICs carrying a CCC-C motif in the A site. b Relative fMPV and fMPR formation as a function of time when a defined time interval is introduced between delivery of G37-state SufB2-TC and EF-G and delivery of an equimolar mixture of tRNAArg- and tRNAVal-TCs. c Relative fMPV and fMPR formation after reacting fMP-POST complexes with a mixture of tRNAVal- and tRNAArg-TCs based on the time courses in Supplementary Figures 2d-f. d fMPV formation as a function of time upon rapid delivery of tRNAVal-TC to an fMP-POST complex carrying a CCC-N motif in the A site.
e Relative fMPV and fMPS formation as a function of time upon rapid delivery of an equimolar mixture of tRNAVal- and tRNASer-TCs to an fMP-POST complex carrying a CCC-A motif in the A site. In panels a-e, the bars are SD of three (n = 3) independent experiments and the data are presented as mean values ± SD. Arg: arginyl-tRNAArg; Val: valyl-tRNAVal.

Figure 6. SufB2 interferes with elongation complex dynamics during late steps of translocation. a-c Cartoon representation of elongation as a G37-state SufB2- or ProL-TC is delivered to the A site of a bL9(Cy3)- and uL1(Cy5)-labeled 70S IC; a in the absence, or b in the presence of EF-G, or c upon using puromycin (Pmn) to deacylate the P site-bound G37-state SufB2 or ProL and generate the corresponding PRE–A complex. The 30S and 50S subunits are tan and light blue, respectively; the L1 stalk is dark blue; Cy3 and Cy5 are bright green and red spheres, respectively; EF-Tu is pink; EF-G is purple; fMet-tRNAfMet is dark green; and SufB2 or ProL is dark red. d, e Hypothetical (top) and representative experimentally observed (bottom) EFRET vs. time trajectories recorded as ProL-TC is delivered to a 70S IC, d in the absence and e in the presence of EF-G as depicted in a, b. The waiting times associated with k70S IC→GS2, kGS1→GS2, kGS2→GS1, and kGS2→POST are indicated in each hypothetical trajectory. f, g, and h Surface contour plots of the time evolution of population FRET obtained by superimposing individual EFRET vs. time trajectories in the experiments in a, b, and c, respectively, for SufB2 (top) and ProL (bottom). N: the number of trajectories used to construct each contour plot. Surface contours are colored as denoted in the population color bars. For pre-steady-state experiments, the black dashed lines indicate the time at which the TC was delivered and the gray shaded areas denote the time required for the majority (54-68%) of the 70S ICs to transition to GS2. Note that the rate of deacylated SufB2 dissociation from the A site under our conditions is similar to that of EF-G-catalyzed translocation, thereby resulting in the buildup of a PRE complex sub-population over 3-20 min post-delivery that lacks an A site tRNA and is incapable of translocation. This sub-population exhibits kGS1→GS2, kGS2→GS1, and Keq values similar to those observed in experiments recorded in the absence of EF-G (Supplementary Table 6).

Figure 7. Structure-based mechanistic model for SufB2-induced +1 frameshifting. A SufB2-TC uses triplet anticodon-codon pairing in the 0-frame at a CCC-C motif, undergoes peptide-bond formation, and enables the resulting PRE complex to undergo a GS1→GS2 transition, all with rates similar to those of ProL-TC. During the GS1→GS2 transition, the 30S subunit rotates relative to the 50S subunit by 8° in the counter-clockwise (+) direction along the black curved arrow; the 30S subunit head swivels relative to the 30S subunit body by 5° in the clockwise (–) direction against the black curved arrow; the L1 stalk closes by ~60 Å; and the tRNAs are reconfigured from their P/P and A/A to their P/E and A/P configurations.
EF-G then binds to the PRE complex to form PRE-G1 and subsequently catalyzes a series of conformational rearrangements of the complex (PRE-G1 to PRE-G4) that encompass further counter-clockwise and clockwise rotations of the subunits; severing of decoding center interactions with the anticodon-codon duplex in the A site; counter-clockwise and clockwise swiveling of the head and the associated opening and closing of the E-site gate; opening of the L1 stalk; and reconfigurations of the tRNAs as they move from the P and A sites to the E and P sites. It is during these steps, shown in red arrows within the gray shaded box, that SufB2 impedes forward and/or reverse swiveling of the head and the associated opening and/or closing of the E-site gate, facilitating +1 frameshifting. Next, EF-G and the deacylated tRNA dissociate from PRE-G4, leaving a POST complex ready to enter the next elongation cycle. The cartoons depicting PRE-G1(GS1) and PRE-G1(GS2) were generated using Biological Assemblies 2 and 1, respectively, of PDB entry 4V9D. Due to the lack of an A-site tRNA or EF-G in 4V9D, cartoons of the A- and P-site tRNAs from previous structures1 were positioned into the two assemblies using the P-site tRNAs in 4V9D as guides and a cartoon of EF-G generated from 4V7D was manually positioned near the factor binding site of the ribosomes.
The cartoons depicting PRE-G2, PRE-G3, and PRE-G4 were generated from 4V7D, 4W29, and 4V5F, respectively, and colored as in Figure 6, with the head domain shown in orange.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6 (panel residue: EFRET vs. Time (s) trajectories for ProL-TC and ProL-TC + EF-G, with TCs delivered at 5 s; population FRET contour plots of EFRET vs. Time (s) and Time (min) for SufB2 and ProL)

Figure 7 (panel residue: states POST, PRE, PRE-G1, PRE-G2, PRE-G3, PRE-G4, POST; GS1 and GS2; annotations for intersubunit rotation, head swiveling, and a 60 Å L1 stalk displacement)

10_1101-2020_12_31_424969 ----
A genome wide copper-sensitized screen identifies novel regulators of mitochondrial cytochrome c oxidase activity

Natalie M. Garza1, Aaron T. Griffin1,2, Mohammad Zulkifli1, Chenxi Qiu1,3, Craig D. Kaplan1,4, Vishal M. Gohil*1

1Department of Biochemistry and Biophysics, MS 3474, Texas A&M University, College Station, TX 77843, USA
2Present Address: Department of Systems Biology, Columbia University, New York, NY 10032, USA
3Present Address: Department of Medicine, Division of Translational Therapeutics, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
4Present Address: Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA

*To whom the correspondence should be addressed: Vishal M.
Gohil, 301 Old Main Drive, MS 3474, Texas A&M University, College Station, TX 77843 USA; Email: vgohil@tamu.edu; Tel: (979) 847-6138; Fax: (979) 845-9274

Running Title: Genetic regulators of mitochondrial copper

Keywords: Copper, mitochondria, vacuole, cytochrome c oxidase, pH, AP-3, Rim20, Rim21

bioRxiv preprint doi: https://doi.org/10.1101/2020.12.31.424969; this version posted January 2, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

ABSTRACT
Copper is essential for the activity and stability of cytochrome c oxidase (CcO), the terminal enzyme of the mitochondrial respiratory chain. Loss-of-function mutations in genes required for copper transport to CcO result in fatal human disorders. Despite the fundamental importance of copper in mitochondrial and organismal physiology, systematic characterization of genes that regulate mitochondrial copper homeostasis is lacking. To identify genes required for mitochondrial copper homeostasis, we performed a genome-wide copper-sensitized screen using a DNA-barcoded yeast deletion library. Our screen recovered a number of genes known to be involved in cellular copper homeostasis while revealing genes previously not linked to mitochondrial copper biology. These newly identified genes include the subunits of the adaptor protein 3 complex (AP-3) and two components of the cellular pH-sensing pathway, Rim20 and Rim21, all of which are known to affect vacuolar function. We find that AP-3 and the Rim proteins support mitochondrial CcO function by maintaining vacuolar acidity.
CcO activity of these mutants could be rescued either by restoring vacuolar pH or by supplementing growth media with additional copper. Consistent with these genetic data, pharmacological inhibition of the vacuolar proton pump leads to decreased mitochondrial copper content and a concomitant decrease in CcO abundance and activity. Taken together, our study uncovered a number of novel genetic regulators of mitochondrial copper homeostasis and provided a mechanism by which vacuolar pH impacts mitochondrial respiration through copper homeostasis.

INTRODUCTION
Copper is an essential trace metal that serves as a cofactor for a number of enzymes in various biochemical processes, including mitochondrial bioenergetics (1). For example, copper is essential for the activity of cytochrome c oxidase (CcO), the evolutionarily conserved enzyme of the mitochondrial respiratory chain and the main site of cellular respiration (2). CcO metalation requires transport of copper to mitochondria followed by its insertion into Cox1 and Cox2, the two copper-containing subunits of CcO (3). Genetic defects that prevent copper delivery to CcO disrupt its assembly and activity, resulting in rare but fatal infantile disorders (4, 5, 6). Intracellular trafficking of copper poses a challenge because of the high reactivity of this transition metal. Copper in the aqueous environment of the cell can generate deleterious reactive oxygen species via Fenton chemistry (7) and can inactivate other metalloproteins by mismetallation (8). Consequently, organisms must tightly control copper import and trafficking to subcellular compartments to ensure proper cuproprotein biogenesis while preventing its toxicity. Indeed, aerobic organisms have evolved highly conserved proteins to import and distribute copper to cuproenzymes in cells (9).
Extracellular copper is imported by plasma membrane copper transporters and is immediately bound to metallochaperones Atx1 and Ccs1 for its delivery to different cuproenzymes residing in the Golgi and cytosol, respectively (10). However, copper transport to the mitochondria is not well understood. A non-proteinaceous ligand, whose molecular identity remains unknown, has been proposed to transport cytosolic copper to the mitochondria (3), where it is stored in the matrix (11). This mitochondrial matrix pool of copper is the main source of copper ions that are delivered to CcO subunits in a particularly complex process requiring multiple metallochaperones and thiol reductases (3, 12, 13). Specifically, copper from the mitochondrial matrix is exported to the intermembrane space via an as-yet-unidentified transporter, where it is inserted into the CcO subunits by metallochaperones Cox17, Sco1, and Cox11 that operate in a bucket-brigade manner (13). The copper-transporting function of metallochaperones requires disulfide reductase activities of Sco2 and Coa6, respectively (14, 15).

In addition to the mitochondria, vacuoles in yeast and vacuole-like lysosomes in higher eukaryotes have been identified as critical storage sites and regulators of cellular copper homeostasis (16-18). Copper enters the vacuole by an unknown mechanism and is proposed to be stored as Cu(II) coordinated to polyphosphate (19).
Depending on the cellular requirement, vacuolar copper is reduced to Cu(I), allowing its mobilization and export through Ctr2 (20, 21). Currently, the complete set of factors regulating the distribution of copper to mitochondria remains unknown.

Here, we sought to identify regulators of mitochondrial copper homeostasis by exploiting the copper requirement of CcO in a genome-wide screen using a barcoded yeast deletion library. Our screen was motivated by prior observations that respiratory growth of yeast mutants such as coa6Δ can be rescued by copper supplementation in the media (22-24). Thus, we designed a copper-sensitized screen to identify yeast mutants whose growth can be rescued by addition of copper in the media. Our screen recovered Coa6 and other genes with known roles in copper metabolism while uncovering genes involved in vacuolar function as regulators of mitochondrial copper homeostasis. Here, we have highlighted the roles of two cellular pathways, the adaptor protein 3 complex (AP-3) and the pH-sensing pathway Rim101, that converge on vacuolar function as important factors regulating CcO biogenesis by maintaining mitochondrial copper homeostasis.

RESULTS

A genome-wide copper-sensitized screen using barcoded yeast deletion mutant library

We chose the yeast, Saccharomyces cerevisiae, to screen for genes that impact mitochondrial copper homeostasis because it can tolerate mutations that inactivate mitochondrial respiration by surviving on glycolysis. This enables the discovery of novel regulators of mitochondrial copper metabolism whose knockout is expected to result in a defect in aerobic energy generation (25). Yeast cultured in glucose-containing media (YPD) uses glycolytic fermentation as the primary source of cellular energy; however, in glycerol/ethanol-containing non-fermentable media (YPGE), yeast must utilize the mitochondrial respiratory chain and its terminal cuproenzyme, CcO, for energy production.
Based on the nutrient-dependent utilization of different energy-generating pathways, we expect that deletion of genes required for respiratory growth will specifically reduce growth in non-fermentable (YPGE) medium but will not impair growth of those mutants in fermentable (YPD) medium. Moreover, if respiratory deficiency in yeast mutants is caused by defective copper delivery to mitochondria, then these mutants may be amenable to rescue via copper supplementation in YPGE respiratory growth media (Fig. 1). Therefore, to identify genes required for copper-dependent respiratory growth, we cultured the yeast deletion mutants in YPD and YPGE with or without 5 μM CuCl2 supplementation (Fig. 1). Our genome-wide yeast deletion mutant library was derived from the variomics library reported previously (26). It is composed of viable haploid yeast mutants, where each mutant has one nonessential gene replaced with the selection marker KANMX4 and two unique flanking sequences (Fig. 1). These flanking sequences, labeled “UP” and “DN”, contain universal priming sites as well as a 20-bp barcode sequence that is specific to each deletion strain. This unique barcode sequence allows for the quantification of the relative abundance of individual strains within a pool of competitively grown strains by DNA barcode sequencing (27). Here, we utilized this DNA barcode sequencing approach to quantify the relative fitness of each mutant grown in YPD and YPGE ± Cu to early stationary phase (Fig. 1).
Genes required for respiratory growth

We began the screen by identifying mutant strains with respiratory deficiency, since perturbation of mitochondrial copper metabolism is expected to compromise aerobic energy metabolism. To identify mutants with this growth phenotype, we compared the relative abundance of each barcode in YPD to that in YPGE using a T-score based on Welch’s t-test. The T-score provides a quantitative measure of the difference in the abundance of a given mutant in two growth conditions. A negative T-score identifies mutants that grow poorly in respiratory conditions; conversely, a positive T-score identifies mutants with better competitive growth in respiratory conditions. We rank ordered all the mutants from negative to positive T-scores and found that, as expected, the lower tail of the distribution was enriched in genes with known roles in respiratory chain function (Fig. 2A; Supplementary Table 1). The top “hits”, representing mutants with the most negative T-scores, included COQ3, COX5A, RCF2, COA4, and PET54, genes that are involved in coenzyme Q and respiratory complex IV function (Fig. 2A). To more systematically identify cellular pathways that were enriched for reduced respiratory growth, we performed gene ontology analysis using the online tool Gene Ontology enRichment anaLysis and visuaLizAtion (GOrilla) (28). The gene ontology (GO) analysis identified mitochondrial respiratory chain complex assembly (p-value: 7.73e-23) and cytochrome oxidase assembly (p-value: 5.09e-22) as the top-scoring biological process categories (Fig. 2B) and mitochondrial part (p-value: 1.40e-25) and mitochondrial inner membrane (p-value: 1.48e-20) as the top-scoring cellular component categories (Fig. 2C). This unbiased analysis identified the expected pathways and processes, validating our screening results.
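The T-score comparison described above can be sketched in a few lines of Python. This is a minimal illustration of Welch's t-statistic, not the authors' pipeline: the function name `welch_t`, the sign convention (YPGE minus YPD, so that a negative score indicates poorer respiratory growth), and the example abundance values are assumptions made here for illustration, and the source does not state how barcode counts were normalized before the test.

```python
import math
from statistics import mean, variance


def welch_t(ypge, ypd):
    """Return Welch's t-statistic and Welch-Satterthwaite degrees of freedom.

    Assumed sign convention: t < 0 when abundance in YPGE is lower than in YPD.
    """
    n1, n2 = len(ypge), len(ypd)
    if n1 < 2 or n2 < 2:
        raise ValueError("each condition needs at least two replicates")
    # Squared standard error of each condition (sample variance / n).
    se1 = variance(ypge) / n1
    se2 = variance(ypd) / n2
    if se1 + se2 == 0:
        raise ValueError("zero variance in both conditions; t is undefined")
    t = (mean(ypge) - mean(ypd)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation; unequal variances are allowed.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical normalized log-abundances for one barcode, four replicates each.
ypd_reps = [10.1, 9.8, 10.3, 10.0]
ypge_reps = [7.9, 8.4, 8.1, 7.6]
t_score, dof = welch_t(ypge_reps, ypd_reps)
print(f"T-score = {t_score:.2f}, df = {dof:.1f}")
```

Ranking all mutants by such a score, from most negative to most positive, would give the kind of ordered list that is passed to a ranked enrichment tool.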
We further benchmarked the performance of our screen by determining the enrichment of genes encoding mitochondria-localized and oxidative phosphorylation (OXPHOS) proteins at three different p-value thresholds (p<0.05, p<0.025, and p<0.01) (Supplementary Fig. 1). We observed that at a p-value of <0.05, ~25% of the genes encoded mitochondrially localized proteins, of which ~40% were OXPHOS proteins (Supplementary Fig. 1; Supplementary Table 2). The percentage of mitochondria-localized and OXPHOS genes increased progressively as we increased the stringency of our analysis by decreasing the significance cut-off from a p-value of 0.05 to 0.01 (Supplementary Fig. 1). A total of 370 genes were identified to have respiratory deficient growth at p<0.01, of which 116 are known to encode mitochondrial proteins (29); nearly half of these are OXPHOS proteins, out of a total of 137 known OXPHOS genes in yeast (Supplementary Fig. 1; Supplementary Table 2). As expected, the respiratory deficient mutants included genes required for mitochondrial NADH dehydrogenase (NDI1) and OXPHOS complexes II, III, IV, and V, as well as genes involved in cytochrome c and ubiquinone biogenesis, which together form the mitochondrial energy-generating machinery (Fig. 2D, Supplementary Table 2). Additionally, genes encoding TCA cycle enzymes and genes involved in mitochondrial translation were also scored as hits (Supplementary Fig. 2). Surprisingly, a large fraction of genes required for respiratory growth encoded non-mitochondrial proteins involved in vesicle-mediated transport (Supplementary Fig. 2).

Pathway analysis for copper-based rescue

Next, we focused on identifying mutants in which copper supplementation improved their fitness in respiratory growth conditions by comparing their abundance in YPGE + 5 μM CuCl2 versus YPGE growth conditions. We rank ordered the genes from positive to negative T-scores. Mutants with positive T-scores, which displayed improved respiratory growth upon copper supplementation, are present in the upper tail of the distribution (Fig. 3A, Supplementary Table 3). Notably, several genes known to be involved in copper homeostasis were recovered as high-scoring “hits” in our screen and were present in the expected upper tail of the distribution (Fig. 3A). For example, we recovered CTR1, which encodes the plasma membrane copper transporter (30); ATX1, which encodes a metallochaperone involved in copper trafficking to the Golgi body (31); GEF1 and KHA1, which encode proteins involved in copper loading into the cuproproteins in the Golgi compartment (22, 32); GSH1 and GSH2, which are required for biosynthesis of the copper-binding molecule glutathione; and COA6, which encodes a mitochondrial protein that we previously discovered to have a role in copper delivery to the mitochondrial CcO (15, 23, 33) (Fig. 3A). Nevertheless, for many of our other top-scoring hits, evidence supporting their role in mitochondrial copper homeostasis was either limited or lacking entirely. To determine which cellular pathways are essential for maintaining copper homeostasis, we performed gene ontology analysis using GOrilla. GO analysis identified the biological processes Golgi to vacuole transport (p-value: 1.49e-6) and post-Golgi vesicle-mediated transport (p-value: 3.75e-6) as the most significantly enriched pathways (Fig. 3B).
Additionally, the GO category transition metal ion homeostasis was also among the top five significantly enriched pathways (p-value: 1.75e-5) (Fig. 3B). GO analysis for cellular component categories identified adaptor protein 3 complex (AP-3), which is known to transport vesicles from the Golgi body to the vacuole, as the top-scoring cellular component (p-value: 2.85e-11) (Fig. 3C). All four subunits of the AP-3 complex (APL6, APM3, APL5, APS3) were in the top 10 of our rank list (Fig. 3A, Supplementary Table 3) (34, 35). Additionally, two components of the Rim101 pathway (RIM20 and RIM21), both of which are linked to vacuolar function (36), were also in our list of top-scoring genes (Supplementary Table 3). Of note, the seven major components of the Rim101 pathway were identified as top-scoring hits for respiratory deficient growth (Supplementary Fig. 2). Placing the hits from our screen on cellular pathways revealed a number of “hits” that were involved in Golgi bud formation (Sys1, Arf2), vesicle coating (AP-3 and AP-1 complex subunits), tethering and fusion of Golgi vesicle cargo to the vacuole (Vam7), and vacuolar ATPase expression and assembly (Rim20, Rim21, Rav2) (Fig. 3D). We reasoned that these biological processes and cellular components were likely high scoring due to the role of the vacuole as a major storage site of intracellular metals (16). We decided to focus on AP-3 and Rim mutants, as these cellular components were not previously linked to mitochondrial respiration or mitochondrial copper homeostasis.

AP-3 mutants exhibit reduced abundance of CcO and V-ATPase subunits

To validate our screening results and to determine the specificity of the copper-based rescue of AP-3 mutants, we compared the respiratory growth of the AP-3 deletion strains aps3Δ, apl5Δ, and apl6Δ on YPD and YPGE media with or without Cu, Mg, or Zn supplementation.
Each of the AP-3 mutants exhibited reduced respiratory growth in YPGE media at 37°C, which was fully restored by copper, but not by magnesium or zinc (Fig. 4A), indicating that the primary defect in these cells is dysregulated copper homeostasis. Here we used 37°C for growth measurement to fully uncover the growth defect on solid media. The coa6Δ mutant was used as a positive control because we have previously shown that respiratory growth deficiency of coa6Δ can be rescued by Cu supplementation (23). Since recent work has identified the role of the yeast vacuole in mitochondrial iron homeostasis (37, 38), we asked if iron supplementation could also rescue the respiratory growth of AP-3 mutants. Unlike copper, which rescued respiratory growth of AP-3 mutants at 5 μM concentration, low concentrations of iron (≤ 20 μM) did not rescue respiratory growth; but we did find that high iron supplementation (100 μM) improved their respiratory growth (Supplementary Fig. 3). To uncover the biochemical basis of reduced respiratory growth, we focused on Cox2, a copper-containing subunit of CcO, whose stability is dependent on copper availability and whose levels serve as a reliable proxy for mitochondrial copper content. The steady state levels of Cox2 were modestly but consistently reduced in all four AP-3 mutants tested (Fig. 4B). AP-3 complex function has not been directly linked to mitochondria but is linked to the trafficking of proteins from the Golgi body to the vacuole.
Therefore, the decreased abundance of Cox2 in AP-3 mutants could be due to an indirect effect involving the vesicular trafficking role of the AP-3 complex. A previous study has shown that the AP-3 complex interacts with a subunit of the V-ATPase in human cells (34). As perturbation in V-ATPase function had been linked to defective respiratory growth (37-41), we wondered if AP-3 impacts mitochondrial function via trafficking V-ATPase subunit(s) to the vacuole. To test this idea, we first measured vacuolar acidification and found that the AP-3 mutant, aps3Δ, exhibited significantly increased vacuolar pH (Fig. 4C). We hypothesized that the elevated vacuolar pH of aps3Δ cells could be due to a perturbation in the trafficking of V-ATPase subunit(s). To test this possibility, we measured the levels of the V-ATPase subunit Vma2 in wild type (WT) and aps3Δ cells by western blotting and found that Vma2 levels were indeed reduced in the isolated vacuolar fractions of aps3Δ cells but were unaffected in the whole cells (Fig. 4D). The decreased abundance of Vma2 in vacuoles of the yeast AP-3 mutant explains the decreased vacuolar acidification because Vma2 is an essential subunit of V-ATPase. Taken together, these results suggest that the AP-3 complex is required for maintaining vacuolar acidification, which in turn could impact mitochondrial copper homeostasis.

Genetic defects in Rim101 pathway perturb mitochondrial copper homeostasis

Next, we focused on two other hits from the screen, Rim20 and Rim21, which are members of the Rim101 pathway, a pathway that has previously been linked to V-ATPase expression (42-44). The activation of Rim101 results in the increased expression of V-ATPase subunits (43). Consistently, we found elevated vacuolar pH in rim20Δ cells (Fig. 5A). We then compared the respiratory growth of rim20Δ and rim21Δ on YPD and YPGE media with or without Cu, Mg, or Zn supplementation.
Consistent with our screening results, these mutants exhibited reduced respiratory growth that was fully restored by copper but not magnesium or zinc (Fig. 5B). To directly test the roles of these genes in cellular copper homeostasis, we measured the whole-cell copper levels of rim20Δ by inductively coupled plasma mass spectrometry (ICP-MS). The intracellular copper levels under basal or copper-supplemented conditions in rim20Δ cells were comparable to the WT cells, suggesting that the copper import or sensing machinery is not defective in this mutant (Fig. 5C). In contrast to the total cellular copper levels, rim20Δ did exhibit significantly reduced mitochondrial copper levels, which were restored by copper supplementation (Fig. 5D). The decrease in mitochondrial copper levels is expected to perturb the biogenesis of CcO in rim20Δ cells. Therefore, we measured the abundance and activity of this complex by western blot analysis and enzymatic assay, respectively. Consistent with the decrease in mitochondrial copper levels, rim20Δ cells exhibited a reduction in the abundance of Cox2 along with a decrease in CcO activity, both of which were rescued by copper supplementation (Fig. 5E and F). To further dissect the compartment-specific effect by which Rim20 impacts cellular copper homeostasis, we measured the abundance and activity of Sod1, a mainly cytosolic cuproenzyme. We found that unlike CcO, Sod1 abundance and activity remain unchanged in rim20Δ cells (Supplementary Fig. 4).
To determine if the decrease in CcO activity in the absence of Rim20 was due to its role in maintaining vacuolar pH, we manipulated vacuolar pH by changing the pH of the growth media. Previously, it has been shown that vacuolar pH is influenced by the pH of the growth media through endocytosis (45, 46). Indeed, acidifying growth media to pH 5.0 from the basal pH of 6.7 normalized vacuolar pH of rim20Δ to the WT levels and both strains exhibited lower vacuolar pH when grown in acidified media (Fig. 5G). Under these conditions of reduced vacuolar pH, the respiratory growth of rim20Δ was restored to WT levels (Fig. 5H). Notably, alkaline media also reduced the respiratory growth of WT cells, though the extent of growth reduction was lower than in rim20Δ, which is likely because of a fully functional V-ATPase in WT cells (Fig. 5H). To uncover the biochemical basis of the restoration of respiratory growth of rim20Δ by acidified media, we measured CcO enzymatic activity in WT and rim20Δ cells grown in either basal or acidified growth medium (pH 6.7 and 5.0, respectively). Consistent with the respiratory growth rescue, the CcO activity was also restored in cells grown at an ambient pH of 5.0 (Fig. 5I). Notably, the restoration of respiratory growth by copper supplementation was independent of growth media pH (Fig. 5J). Taken together, these findings causally link vacuolar pH to CcO activity via mitochondrial copper homeostasis.

Pharmacological inhibition of the V-ATPase results in decreased mitochondrial copper

To directly assess the role of vacuolar pH in maintaining mitochondrial copper homeostasis, we utilized Concanamycin A (ConcA), a small molecule inhibitor of V-ATPase. Treating WT cells with increasing concentrations of ConcA led to progressively increased vacuolar pH (Fig. 6A). Notably, the increase in vacuolar pH with pharmacological inhibition of V-ATPase by ConcA was much more pronounced (Fig. 6A) than via genetic perturbation in aps3Δ or rim20Δ cells (Figs.
4C and 5A). Correspondingly, we observed a pronounced decrease in CcO abundance and activity in ConcA-treated cells (Fig. 6B, C). This decrease in abundance of CcO is likely due to a reduction in mitochondrial copper levels (Fig. 6D). These data establish the role of the vacuole in regulating mitochondrial copper homeostasis and CcO function.

DISCUSSION

Mitochondria are the major intracellular copper storage sites that harbor important cuproenzymes like CcO. When faced with copper deficiency, cells prioritize mitochondrial copper homeostasis, suggesting its critical requirement for this organelle (47). However, the complete set of factors required for mitochondrial copper homeostasis has not been identified. Here, we report a number of novel genetic regulators of mitochondrial copper homeostasis that link mitochondrial bioenergetic function with vacuolar pH. Specifically, we show that when vacuolar pH is perturbed by genetic, environmental, or pharmacological factors, copper availability to the mitochondria is limited, which in turn reduces CcO function and impairs aerobic growth and mitochondrial respiration.

It has been known for a long time that V-ATPase mutants have severely reduced respiratory growth (39, 40) and more recent high-throughput studies have corroborated these observations (48-50). However, the molecular mechanisms underlying this observation have remained obscure. Recent studies have shown that a decrease in vacuolar acidity (i.e.
increased vacuolar pH) perturbs cellular and mitochondrial iron homeostasis, which impairs mitochondrial respiration, as iron is also required for electron transport through the mitochondrial respiratory chain due to its role in iron-sulfur cluster biogenesis and heme biosynthesis (37, 38, 51, 52). In an elegant series of experiments, Hughes et al. showed that when V-ATPase activity is compromised, there is an elevation in cytosolic amino acids because vacuoles with defective pH are unable to import and store amino acids. The resulting elevation in cytosolic amino acids, particularly cysteine, is toxic to the cells because it disrupts cellular iron homeostasis and iron-dependent mitochondrial respiration (38). Although this exciting study took us a step closer to our understanding of V-ATPase-dependent mitochondrial function, the mechanism by which elevated cysteine perturbs iron homeostasis is still unclear. Since cysteine can strongly bind cuprous ions (53, 54), sequestration of copper in the cytosol by cysteine would decrease its availability to Fet3, a multi-copper oxidase required for the uptake of extracellular iron, which in turn would aggravate iron deficiency (55). Thus, a defect in cellular copper homeostasis could cause a secondary defect in iron homeostasis. Consistent with this idea, we observed a rescue of AP-3 mutants’ respiratory growth with high iron supplementation (Supplementary Fig. 3). Interestingly, AP-3 has also been previously linked to vacuolar cysteine homeostasis (56). Our results showing diminished CcO activity and/or Cox2 levels in AP-3 and Rim20 mutants and in ConcA-treated cells (Figs. 4B, 5E and F, 6B and C) connect vacuolar pH to mitochondrial copper biology. However, a modest decrease in CcO activity may not be sufficient to reduce respiratory growth. Therefore, it is very likely that the decreased respiratory growth we have observed is a result of a defect not only in copper but also in iron homeostasis.
Consistent with this idea, previous high-throughput studies reported sensitivity of AP-3 and Rim101 pathway mutants in conditions of iron deficiency and overload (57, 58). Moreover, Rim20 and Rim101 mutants have been shown to display sensitivity to copper starvation in Cryptococcus neoformans, an opportunistic fungal pathogen (59), and partial knockdown of Ap3s1, a subunit of the AP-3 complex in zebrafish, sensitized developing melanocytes to hypopigmentation in low-copper environmental conditions (60). Thus, the Rim pathway and the AP-3 pathway are linked to copper homeostasis in multiple organisms. Our discovery of AP-3 pathway mutants and other mutants involved in Golgi-to-vacuole transport (Fig. 3) is also consistent with a previous genome-wide study, which identified the involvement of these genes in Cu-dependent growth of the yeast Saccharomyces cerevisiae (49); however, the biochemical mechanism(s) underlying the functional connection between the vacuole and mitochondrial CcO has not been previously elucidated. Thus, the results from our study are not only consistent with previous studies but also provide a biochemical mechanism elucidating how disruption in vacuolar pH perturbs mitochondrial respiratory function via the copper dependence of CcO. Interestingly, in both the genetic and pharmacological models of reduced V-ATPase function, mitochondrial copper levels were reduced (Fig. 5D and Fig.
6D) but were not absent, suggesting that the vacuole may only partially contribute to mitochondrial Cu homeostasis. Supporting this hypothesis, rescue of respiratory growth by copper supplementation was successful irrespective of vacuolar pH (Fig. 5A and J).

The results of this study could also provide insights into mechanisms underlying the pathogenesis of human diseases associated with aberrant copper metabolism and/or decreased V-ATPase function, including Alzheimer’s disease, amyotrophic lateral sclerosis (ALS), and Parkinson’s disease (61-67). Although multiple factors are known to contribute to the pathogenesis of these diseases, our study suggests that disrupted mitochondrial copper homeostasis may also be an important contributing factor. In contrast to these multi-factorial diseases, pathogenic mutations in AP-3 subunits are known to cause Hermansky-Pudlak syndrome (HPS), a rare autosomal disorder, which is often associated with high morbidity (68-70). Just as in yeast, AP-3 in humans is required for the transport of vesicles to the lysosome, which is evolutionarily and functionally related to the yeast vacuole. Our study linking AP-3 to mitochondrial function suggests that decreased mitochondrial function could contribute to HPS pathology. More generally, decreased activity of V-ATPase has been linked to age-related decrease in lysosomal function (34, 71, 72) and impaired acidification of the yeast vacuole has been shown to cause accelerated aging (41). Therefore, in addition to uncovering the fundamental aspects of cell biology of metal transport and distribution, our study suggests a possible role of mitochondrial dysfunction in multiple human disorders.

METHODS

Yeast strains and growth conditions

Individual yeast Saccharomyces cerevisiae mutants used in this study were obtained from Open Biosystems or were constructed by one-step gene disruption using a hygromycin cassette (73). All strains used in this study are listed in Table 1.
Authenticity of yeast strains was confirmed by polymerase chain reaction (PCR)-based genotyping. Yeast cells were cultured in either YPD (1% yeast extract, 2% peptone, and 2% dextrose) or YPGE (3% glycerol + 1% ethanol) medium. Solid YPD and YPGE media were prepared by addition of 2% agar. For metal supplementation experiments, growth medium was supplemented with divalent chloride salts of Cu, Mn, Mg, or Zn, or with FeSO4. For growth on solid media, 3 μL of 10-fold serial dilutions of pre-cultures were seeded onto YPD or YPGE plates and incubated at 37°C for the indicated period. For growth in liquid medium, yeast cells were pre-cultured in YPD, inoculated into YPGE, and grown to mid-log phase. To acidify or alkalinize liquid YPGE, equivalents of HCl or NaOH were added, respectively. For liquid growth assays in acidified or alkalinized YPGE, cultures were grown for 42 h before growth was compared. For growth in the presence of concanamycin A (ConcA), cells were first cultured in YPD, then transferred to YPGE and allowed to grow for 24 h; ConcA was then added, and the cells were allowed to grow for a further 20 h. Growth in liquid media was monitored spectrophotometrically at 600 nm.

Construction of yeast deletion pool

The yeast deletion collection for Bar-Seq analysis was derived from the Variomics library constructed previously (26) and was a kind gift of Xuewen Pan.
The heterozygous diploid deletion library was sporulated and selected in liquid haploid selection medium (SC-Arg-His-Leu+G418+Canavanine) to obtain haploid cells containing gene deletions. To do this, we followed a previously described protocol (26), with the modification of adding uracil to allow the growth of the deletion library lacking URA3. Prior to sporulation, the library pool was grown under conditions that first allowed loss of URA3 plasmids and then selected for cells lacking URA3 plasmids. The original deletion libraries were constructed such that each yeast open reading frame was replaced with a kanMX4 cassette containing two gene-specific barcode sequences, referred to as the UP tag and the DN tag because they are located upstream and downstream of the cassette, respectively (74).

Pooled growth assays

A stored glycerol stock of the haploid deletion pool containing 1.5 × 10^8 cells/mL (equivalent to 3.94 optical density/mL) was thawed, and approximately 60 μL was used to inoculate 6 mL of YPD, YPGE, or YPGE + 5 μM CuCl2 media in quadruplicate in 50 mL Falcon tubes at a starting optical density of 0.04, which corresponded to ~1.5 × 10^6 cells/mL. The cells were grown at 30°C in an incubator shaker at 225 rpm until they reached an optical density of ~5.0 before harvesting. Cells were pelleted by centrifugation at 3000×g for 5 min, washed once with sterile water, and stored at -80°C. Frozen cell pellets were thawed, resuspended in sterile nanopure water, and counted. Genomic DNA was extracted from 5 × 10^7 cells using the YeaStar Genomic DNA kit (Catalog No. D2002) from Zymo Research. The extracted DNA was used as a template to amplify the barcode sequences by PCR, followed by purification of the amplified DNA with the QIAquick PCR purification kit from Qiagen. The number of PCR cycles used for amplification was determined by quantitative real-time PCR such that barcode sequences were not amplified in a nonlinear way.
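The inoculation figures given for the pooled growth assay can be checked with simple dilution arithmetic. The sketch below assumes that 60 μL of stock is added to 6 mL of medium and that optical density scales linearly with cell number over this range, which is only approximately true at high densities. The helper name `dilute` and the derived doubling estimate are illustrative and are not reported in the source.

```python
import math


def dilute(stock_value, stock_volume_ml, medium_volume_ml):
    """Concentration (or OD) after adding stock to fresh medium."""
    total_volume_ml = stock_volume_ml + medium_volume_ml
    return stock_value * stock_volume_ml / total_volume_ml


STOCK_CELLS_PER_ML = 1.5e8   # glycerol stock density stated in the text
STOCK_OD_PER_ML = 3.94       # optical density equivalent stated in the text
INOCULUM_ML = 0.060          # ~60 uL of thawed stock
MEDIUM_ML = 6.0              # culture volume per tube
HARVEST_OD = 5.0             # approximate OD at harvest

start_cells = dilute(STOCK_CELLS_PER_ML, INOCULUM_ML, MEDIUM_ML)
start_od = dilute(STOCK_OD_PER_ML, INOCULUM_ML, MEDIUM_ML)

# Assumption: OD is proportional to cell number, so the number of
# population doublings is log2 of the fold change in OD.
doublings = math.log2(HARVEST_OD / start_od)

print(f"starting density: {start_cells:.2e} cells/mL")
print(f"starting OD: {start_od:.3f}")
print(f"estimated doublings to harvest: {doublings:.1f}")
```

Under these assumptions the starting density agrees with the stated ~1.5 × 10^6 cells/mL and optical density of 0.04, and growth to an optical density of ~5.0 corresponds to roughly seven population doublings.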
The amplified UP and DN barcode DNA were purified by gel electrophoresis and sequenced on an Illumina HiSeq 2500 with 50 base pair, paired-end sequencing at the Genomics and Bioinformatics Service of Texas A&M AgriLife Research.

Assessing fitness of barcoded yeast strains by DNA sequencing

The sequencing reads were aligned to the barcode sequences using Bowtie2 (version 2.2.4) with the -N flag set to 0. Bowtie2 outputs were processed and counted using Samtools (version 1.3.1). Barcode reads that were shorter than 15 nt or that mapped to multiple reference barcodes were discarded. We noted that the DN tag sequences were missing for many genes; we therefore used only UP tag sequences to calculate fitness scores using T statistics.

Gene Ontology analysis

To identify enriched gene ontology terms, we generated a rank-ordered list based on T-scores (Supplementary Tables 1 and 2) and used the reference genome for Saccharomyces cerevisiae in GOrilla (http://cbl-gorilla.cs.technion.ac.il/).

Cellular and mitochondrial copper measurements

Cellular and mitochondrial copper levels were measured by inductively coupled plasma (ICP) mass spectrometry using a NexION 300D instrument from PerkinElmer Inc. Briefly, intact yeast cells were washed twice with ultrapure metal-free water containing 100 μM EDTA (TraceSELECT; Sigma), followed by two more washes with ultrapure water to eliminate EDTA. For mitochondrial samples, the same procedure was performed using 300 mM mannitol (TraceSELECT; Sigma) to maintain mitochondrial integrity. After washing, samples were weighed, digested with 40% nitric acid (TraceSELECT; Sigma) at 90°C for 18 h, followed by 6 h digestion with 0.75% H2O2 (Sigma-Supelco), then diluted in ultrapure water and analyzed. Copper standard solutions were prepared by diluting commercially available mixed metal standards (BDH Aristar Plus).

Subcellular fractionation

Whole-cell lysates were prepared by resuspending ~100 mg of yeast cells in 350 μL SUMEB buffer (1.0% sodium dodecyl sulfate, 8 M urea, 10 mM MOPS, pH 6.8, 10 mM EDTA, 1 mM phenylmethanesulfonyl fluoride [PMSF] and 1X EDTA-free protease inhibitor cocktail from Roche) containing 350 mg of acid-washed glass beads (Sigma-Aldrich). Samples were then placed in a bead beater (mini bead beater from Biospec products) set at maximum speed. Bead beating consisted of five rounds of 50 s, each followed by a 50 s incubation on ice. Lysed cells were kept on ice for 10 min, then heated at 70°C for 10 min. Cell debris and glass beads were spun down at 14,000×g for 10 min at 4°C. The supernatant was transferred to a separate tube and used for SDS-PAGE/Western blotting.

Mitochondria were isolated as described previously (75). Briefly, 0.5-2.5 g of cell pellet was incubated in DTT buffer (0.1 M Tris-HCl, pH 9.4, 10 mM DTT) at 30°C for 20 min. The cells were then pelleted by centrifugation at 3,000×g for 5 min, resuspended in spheroplasting buffer (1.2 M sorbitol, 20 mM potassium phosphate, pH 7.4) at 7 mL/g, and treated with 3 mg zymolyase (US Biological Life Sciences) per gram of cell pellet for 45 min at 30°C.
Spheroplasts were pelleted by centrifugation at 3,000×g for 5 min and then homogenized in homogenization buffer (0.6 M sorbitol, 10 mM Tris-HCl, pH 7.4, 1 mM EDTA, 1 mM PMSF, 0.2% [w/v] BSA [essentially fatty acid-free, Sigma-Aldrich]) with 15 strokes using a glass-Teflon homogenizer with pestle B. After two centrifugation steps of 5 min each at 1,500×g and 4,000×g, the final supernatant was centrifuged at 12,000×g for 15 min to pellet mitochondria. Mitochondria were resuspended in SEM buffer (250 mM sucrose, 1 mM EDTA, 10 mM MOPS-KOH, pH 7.2, containing 1X protease inhibitor cocktail from Roche).

Isolation of pure vacuoles was performed as previously described (76). Yeast spheroplasts were pelleted at 3,000×g at 4°C for 5 min. Dextran-mediated spheroplast lysis of 1 g of yeast cells was performed by gently resuspending the pellet in 2.5 mL of 15% (w/v) Ficoll400 in Ficoll buffer (10 mM PIPES/KOH, 200 mM sorbitol, pH 6.8, 1 mM PMSF, 1X protease inhibitor cocktail), followed by addition of 200 μL of 0.4 mg/mL dextran in Ficoll buffer. The mixture was incubated on ice for 2 min, heated at 30°C for 75 s, and returned to ice. A Ficoll step gradient was constructed on top of the lysate with 3 mL each of 8%, 4%, and 0% (w/v) Ficoll400 in Ficoll buffer. The step gradient was centrifuged at 110,000×g for 90 min at 4°C. Vacuoles were removed from the 0%/4% Ficoll interface.

Crude cytosolic fractions used to quantify Sod1 activity and abundance were isolated as described previously (77). Briefly, ~70 mg of yeast cells were resuspended in 100 μL of solubilization buffer (20 mM potassium phosphate, pH 7.4, 4 mM PMSF, 1 mM EDTA, 1X protease inhibitor cocktail, 1% [w/v] Triton X-100) and incubated for 10 min on ice. The lysate was clarified by centrifugation at 21,000×g for 15 min at 4°C to remove the insoluble fraction.

Protein concentrations for all cellular fractions were determined by the BCA assay (Thermo Scientific).
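The per-gram quantities given for the spheroplasting step of the mitochondrial isolation scale with the wet weight of the cell pellet. A minimal sketch of that arithmetic follows, using the 7 mL/g buffer volume and 3 mg/g zymolyase amount stated in the text; the function name, the returned dictionary, and the range check are illustrative assumptions rather than part of the published protocol.

```python
def spheroplasting_amounts(pellet_g):
    """Scale spheroplasting reagents to pellet wet weight (values from the text)."""
    # The protocol describes pellets of 0.5-2.5 g; reject anything outside that.
    if not 0.5 <= pellet_g <= 2.5:
        raise ValueError("protocol describes pellets of 0.5-2.5 g")
    return {
        "buffer_ml": 7.0 * pellet_g,     # spheroplasting buffer at 7 mL/g
        "zymolyase_mg": 3.0 * pellet_g,  # 3 mg zymolyase per gram of pellet
    }

amounts = spheroplasting_amounts(1.5)
print(amounts)
```

For a 1.5 g pellet this gives 10.5 mL of spheroplasting buffer and 4.5 mg of zymolyase.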
SDS-PAGE and Western blotting

For SDS-polyacrylamide gel electrophoresis (SDS-PAGE)/Western blotting experiments, 20 μg of protein was loaded for whole cell lysate or mitochondrial samples, while 30 μg of protein was loaded for cytosolic and vacuolar fractions. Proteins were separated on either 4-20% stain-free gels (Bio-Rad) or 12% NuPAGE Bis-Tris mini protein gels (ThermoFisher Scientific) and blotted onto polyvinylidene difluoride membranes. Membranes were blocked for 1 h in 5% (w/v) nonfat milk dissolved in Tris-buffered saline with 0.1% (w/v) Tween 20 (TBST-milk), followed by overnight incubation at 4°C with a primary antibody in TBST-milk or TBST-5% bovine serum albumin. Primary antibodies were used at the following dilutions: Cox2, 1:50,000 (Abcam 110271); Por1, 1:100,000 (Abcam 110326); Pgk1, 1:50,000 (Life Technologies 459250); Sod1, 1:5,000; and Vma2, 1:10,000 (Sigma H9658). Secondary antibodies (GE Healthcare) were used at 1:5,000 for 1 h at room temperature. Membranes were developed using Western Lightning Plus-ECL (PerkinElmer) or SuperSignal West Femto (ThermoFisher Scientific).

Enzymatic activities

To measure Sod1 activity, we used an in-gel assay as described previously (78). Cytosolic protein (25 μg) was diluted in NativePAGE sample buffer (ThermoFisher Scientific) and separated on a 4-16% NativePAGE gel (ThermoFisher Scientific) at 4°C. The gel was then stained with 0.025% (w/v) nitroblue tetrazolium and 0.010% riboflavin for 20 min in the dark.
This solution was then replaced with 1% tetramethylethylenediamine for 20 min, and the gel was developed under bright light. The gel was imaged with a Bio-Rad ChemiDoc™ MP Imaging System, and densitometric analysis was performed using Image Lab software.

CcO and citrate synthase enzymatic activities were measured as described previously (79) using a BioTek Synergy™ Mx Microplate Reader in a clear 96-well plate (Falcon). To measure CcO activity, 15 µg of mitochondria were resuspended in 115 µL of CcO buffer (250 mM sucrose, 10 mM potassium phosphate, pH 6.5, 1 mg/mL BSA) and incubated for 5 min. The reaction was started by the addition of 60 µL of 200 μM oxidized cytochrome c (equine heart, Sigma) and 25.5 µL of 1% (w/v) n-dodecyl-β-D-maltoside. Oxidation of cytochrome c was monitored at 550 nm for 3 min, then the reaction was inhibited by the addition of 7 µL of 7 mM KCN. To measure citrate synthase activity, 10 µg of mitochondria were resuspended in 100 µL of citrate synthase buffer (10 mM Tris-HCl pH 7.5, 0.2% [w/v] Triton X-100, 200 µM 5,5'-dithio-bis-[2-nitrobenzoic acid]) and 50 µL of 2 mM acetyl-CoA and incubated for 5 min. To start the reaction, 50 µL of 2 mM oxaloacetate was added, and turnover of acetyl-CoA was monitored at 412 nm for 10 min. Enzyme activity was normalized to that of WT for each replicate.

Measuring vacuolar pH

Vacuolar pH was measured using a ratiometric pH indicator dye, BCECF-AM (2′,7′-bis-(2-carboxyethyl)-5-(and-6)-carboxyfluorescein [Life Technologies]), as described previously (80), using a BioTek Synergy™ Mx Microplate Reader. Briefly, 100 mg of cells were resuspended in 100 µL of YPGE containing 50 µM BCECF-AM and incubated for 30 min with shaking at 30°C. To remove extracellular BCECF-AM, cells were washed twice and resuspended in 100 µL of fresh YPGE. 25 µL of this cell suspension was added to 2 mL of 1 mM MES buffer, pH 6.7 or 5.0.
The fluorescence emission intensity at 535 nm was monitored using excitation wavelengths of 450 and 490 nm in a black, clear-bottom 96-well plate (Falcon). A calibration curve of fluorescence intensity in response to pH was generated as described (80).

STATISTICS

T-scores for each pairwise media comparison (e.g. YPD vs. YPGE) were calculated using Welch’s two-sample t-test on yeast knockout barcode abundance values normalized for sample sequencing depth (i.e. counts per million). Statistical analysis of bar charts was conducted using a two-sided Student’s t-test. Experiments were performed in three biological replicates, where biological replicates are defined as experiments performed on different days with different starting pre-cultures. Error bars represent the standard deviation; *(P<0.05), **(P<0.01), ***(P<0.001).

REFERENCES

1. Kim, B. E., Nevitt, T., and Thiele, D. J. (2008) Mechanisms for copper acquisition, distribution, and regulation. Nat. Chem. Biol. 4, 176-185 2. Little, A. G., Lau, G., Mathers, K. E., Leary, S. C., and Moyes, C. D. (2018) Comparative biochemistry of cytochrome c oxidase in animals. Comp. Biochem. Physiol. B. Biochem. Mol. Biol. 224, 170-184 3. Cobine, P. A., Moore, S. A., and Leary, S. C. (2020) Getting out what you put in: Copper in mitochondria and its impacts on human disease. Biochim. Biophys. Acta. Mol. Cell. Res. 1868, 118867 4. Baertling, F., van den Brand, M. M. A., Hertecant, J. L., Al-Shamsi, A., van den Heuvel, L.
P., Distelmaier, F., Mayatepek, E., Smeitink, J. A., Nijtmans, L. G., and Rodenburg, R. J. (2015) Mutations in COA6 cause cytochrome c oxidase deficiency and neonatal hypertrophic cardiomyopathy. Hum. Mutat. 36, 34-38 5. Papadopoulou, L. C., Sue, C. M., Davidson, M. M., Tanji, K., Nishino, I., Sadlock, J. E., Krishna, S., Walker, W., Selby, J., Glerum, D. M., Coster, R. V., Lyon, G., Scalais, E., Lebel, R., Kaplan, P., Shanske, S., De Vivo, D. C., Bonilla, E., Hirano, M., DiMauro, S., and Schon, E. A. (1999) Fatal infantile cardioencephalomyopathy with COX deficiency and mutations in SCO2, a COX assembly gene. Nat. Genet. 23, 333-337 6. Valnot, I., Osmond, S., Gigarel, N., Mehaye, B., Amiel, J., Cormier-Daire, V., Munnich, A., Bonnefont, J. P., Rustin, P., and Rötig, A. (2000) Mutations of the SCO1 gene in mitochondrial cytochrome c oxidase deficiency with neonatal-onset hepatic failure and encephalopathy. Am. J. Hum. Genet. 67, 1104-1109 7. Halliwell, B., and Gutteridge, J. M. (1984) Oxygen toxicity, oxygen radicals, transition metals and disease. Biochem. J. 219, 1-14 8. Foster, A. W., Dainty, S. J., Patterson, C. J., Pohl, E., Blackburn, H., Wilson, C., Hess, C. R., Rutherford, J. C., Quaranta, L., Corran, A., and Robinson, N. J. (2014) A chemical potentiator of copper-accumulation used to investigate the iron-regulons of Saccharomyces cerevisiae. Mol. Microbiol. 93, 317-330 9. Nevitt, T., Ohrvik, H., and Thiele, D. J. (2012) Charting the travels of copper in eukaryotes from yeast to mammals. Biochim. Biophys. Acta. 1823, 1580-1593 10. Robinson, N. J., and Winge, D. R. (2010) Copper metallochaperones. Annu. Rev. Biochem. 79, 537-562 11. Cobine, P. A., Ojeda, L. D., Rigby, K. M., and Winge, D. R. (2004) Yeast contain a non- proteinaceous pool of copper in the mitochondrial matrix. J. Biol. Chem. 279, 14447- 14455 12. Cobine, P. A., Pierrel, F., Bestwick, M. L., and Winge, D. R. 
(2006) Mitochondrial matrix copper complex used in metallation of cytochrome oxidase and superoxide dismutase. J. Biol. Chem. 281, 36552-36559 13. Timón-Gómez, A., Nývltová, E., Abriata, L. A., Vila, A. J., Hosler, J., and Barrientos, A. (2018) Mitochondrial cytochrome c oxidase biogenesis: Recent developments. Semin. Cell. Dev. Biol. 76, 163-178 14. Leary, S. C., Sasarman, F., Nishimura, T., and Shoubridge, E. A. (2009) Human SCO2 is required for the synthesis of CO II and as a thiol-disulphide oxidoreductase for SCO1. Hum. Mol. Genet. 18, 2230-2240 15. Soma, S., Morgada, M. N., Naik, M. T., Boulet, A., Roesler, A. A., Dziuba, N., Ghosh, A., Yu, Q., Lindahl, P. A., Ames, J. B., Leary, S. C., Vila, A. J., and Gohil, V. M. (2019) COA6 Is Structurally Tuned to Function as a Thiol-Disulfide Oxidoreductase in Copper Delivery to Mitochondrial Cytochrome c Oxidase. Cell Rep. 29, 4114-4126 16. Blaby-Haas, C. E., and Merchant, S. S. (2014) Lysosome-related organelles as mediators of metal homeostasis. J. Biol. Chem. 289, 28129-28136 17. Polishchuk, E. V., and Polishchuk, R. S. (2016) The emerging role of lysosomes in copper homeostasis. Metallomics. 8, 853-863 18. Portnoy, M. E., Schmidt, P. J., Rogers, R. S., and Culotta, V. C. (2001) Metal transporters that contribute copper to metallochaperones in Saccharomyces cerevisiae. Mol. Genet. Genomics. 265, 873-882 19. Nguyen, T. Q., Dziuba, N., and Lindahl, P. A.
(2019) Isolated Saccharomyces cerevisiae vacuoles contain low-molecular-mass transition-metal polyphosphate complexes. Metallomics. 11, 1298-1309. 20. Rees, E. M., Lee, J., and Thiele, D. J. (2004) Mobilization of intracellular copper stores by the ctr2 vacuolar copper transporter, J Biol Chem. 279, 54221-54229 21. Rees, E. M., and Thiele, D. J. (2007) Identification of a vacuole associated metalloreductase and its role in Ctr2-mediated intracellular copper mobilization, J. Biol. Chem. 282, 21629-21638 22. Wu X, Kim H, Seravalli J, Barycki JJ, Hart PJ, Gohara DW, Di Cera E, Jung WH, Kosman DJ, Lee J. Potassium and the K+/H+ Exchanger Kha1p Promote Binding of Copper to ApoFet3p Multi-copper Ferroxidase. J Biol Chem. 2016 Apr 29;291(18):9796- 9806 23. Ghosh, A., Trivedi, P. P., Timbalia, S. A., Griffin, A. T., Rahn, J. J., Chan, S. S., and Gohil, V. M. (2014) Copper supplementation restores cytochrome c oxidase assembly defect in a mitochondrial disease model of COA6 deficiency. Hum. Mol. Genet. 23, 3596- 3606 24. Glerum, D. M., Shtanko, A., and Tzagoloff, A. (1996) Characterization of COX17, a yeast gene involved in copper metabolism and assembly of cytochrome oxidase. J Biol Chem. 271, 14504-14509 25. Diaz-Ruiz, R., Uribe-Carvajal, S., Devin, A., and Rigoulet, M. (2009) Tumor cell energy metabolism and its common features with yeast metabolism. Biochim. Biophys. Acta. 1796, 252-265 26. Huang, Z., Chen, K., Zhang, J., Li, Y., Wang, H., Cui, D., Tang, J., Liu, Y., Shi, X., Li, W., Liu, D., Chen, R., Sucgang, R. S., and Pan, X. (2013) A functional variomics tool for discovering drug-resistance genes and drug targets. Cell. Rep. 3, 577-585 27. Smith, A. M., Heisler, L. E., Mellor, J., Kaper, F., Thompson, M. J., Chee, M., Roth, F. P., Giaever, G., and Nislow, C. (2009) Quantitative phenotyping via deep barcode sequencing. Genome. Res. 19, 1836-1842 28. Eden, E., Navon, R., Steinfeld, I., Lipson, D., and Yakhini, Z. 
(2009) GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. B. M. C. Bioinformatics. 10, 48 29. Vögtle FN, Burkhart JM, Gonczarowska-Jorge H, Kücükköse C, Taskin AA, Kopczynski D, Ahrends R, Mossmann D, Sickmann A, Zahedi RP, Meisinger C. Landscape of submitochondrial protein distribution. Nat Commun. 2017 Aug 18;8(1):290. 30. Dancis, A., Yuan, D. S., Haile, D., Askwith, C., Eide, D., Moehle, C., Kaplan, J., and Klausner, R. D. (1994) Molecular characterization of a copper transport protein in S. cerevisiae: an unexpected role for copper in iron transport. Cell. 76, 393-402 31. Lin, S. J., and Culotta, V. C. (1995) The ATX1 gene of Saccharomyces cerevisiae encodes a small metal homeostasis factor that protects cells against reactive oxygen toxicity. Proc. Natl. Acad. Sci. U. S. A. 92, 3784-3788 32. Gaxiola, R. A., Yuan, D. S., Klausner, R. D., and Fink, G. R. (1998) The yeast CLC chloride channel functions in cation homeostasis. Proc. Natl. Acad. Sci. U. S. A. 95, 4046-4050 33. Ghosh, A., Pratt, A. T., Soma, S., Theriault, S. G., Griffin, A. T., Trivedi, P. P., and Gohil, V. M. (2016) Mitochondrial disease genes COA6, COX6B, and SCO2 have overlapping roles in COX2 biogenesis. Hum. Mol. Genet. 25, 660-671 34. Bagh, M.B., Peng, S., Chandra, G., Zhang, Z., Singh, S. P., Pattabiraman, N., Liu, A., and Mukherjee, A.B. (2017) Misrouting of v-ATPase subunit V0a1 dysregulates lysosomal acidification in a neurodegenerative lysosomal storage disease model. Nat.
Commun. 8:14612 35. Dell'Angelica, E. C. (2009) AP-3-dependent trafficking and disease: the first decade. Curr. Opin. Cell. Biol. 21, 552-559 36. Lamb, T. M., Xu, W., Diamond, A., and Mitchell, A. P. (2001) Alkaline response genes of Saccharomyces cerevisiae and their relationship to the RIM101 pathway. J. Biol. Chem. 276,1850-1856 37. Chen, K. L., Ven, T. N., Crane, M. M., Brunner, M. L. C., Pun, A. K., Helget, K. L., Brower, K., Chen, D. E., Doan, H., Dillard-Telm, J. D., Huynh, E., Feng, Y. C., Yan, Z., Golubeva, A., Hsu, R. A., Knight, R., Levin, J., Mobasher, V., Muir, M., Omokehinde, V., Screws, C., Tunali, E., Tran, R. K., Valdez, L., Yang, E., Kennedy, S. R., Herr, A. J., Kaeberlein, M., and Wasko, B. M. (2020) Loss of vacuolar acidity results in iron-sulfur cluster defects and divergent homeostatic responses during aging in Saccharomyces cerevisiae. Geroscience. 42, 749-764 38. Hughes, C. E., Coody, T. K., Jeong, M. Y., Berg, J. A., Winge, D. R., and Hughes, A. L. (2020) Cysteine Toxicity Drives Age-Related Mitochondrial Decline by Altering Iron Homeostasis. Cell. 180, 296-310 39. Ohya, Y., Umemoto, N., Tanida, I., Ohta, A., Iida, H., and Anraku, Y. Calcium-sensitive cls mutants of Saccharomyces cerevisiae showing a Pet- phenotype are ascribable to defects of vacuolar membrane H(+)-ATPase activity. J. Biol. Chem. 266,13971-13977 40. Eide, D. J., Bridgham, J. T., Zhao, Z., and James, M. R. (1993) The vacuolar H+- ATPase of Saccharomyces cerevisiae is required for efficient copper detoxification, mitochondrial function, and iron metabolism. Mol. Gen. Genet. 241, 447-456 41. Hughes, A. L., and Gottschling, D. E. (2012) An early age increase in vacuolar pH limits mitochondrial function and lifespan in yeast. Nature. 492, 261-265 42. Maeda, T. (2012) The signaling mechanism of ambient pH sensing and adaptation in yeast and fungi. FEBS. J. 279, 1407-1413 43. Pérez-Sampietry, M., and Herrero, E. 
(2014) The PacC-family protein Rim101 prevents selenite toxicity in Saccharomyces cerevisiae by controlling vacuolar acidification. Fungal. Genet. biol. 71, 26-85 44. Xu, W., Smith, F. J., Subaran, R., and Mitchell, A. P. (2004) Multivesicular body-ESCRT components function in pH response regulation in Saccharomyces cerevisiae and Candida albicans. Mol Biol Cell. 15, 5528-5537 45. Brett, C. L., Kallay, L., Hua, Z., Green, R., Chyou, A., Zhang, Y., Graham, T. R., Donowitz, M., and Rao, R. (2011) Genome-wide analysis reveals the vacuolar pH-stat of Saccharomyces cerevisiae. PLoS. ONE. 6, e17619 46. Orij, R., Urbanus, M. L., Vizeacoumar, F. J., Giaever, G., Boone, C., Nislow, C., Brul, S., and Smits, G. J. (2012) Genome-wide analysis of intracellular pH reveals quantitative control of cell division rate by pH(c) in Saccharomyces cerevisiae. Genome. Biol. 13, R80 47. Dodani, S. C., Leary, S. C., Cobine, P. A., Winge, D. R., and Chang, C. J. (2011) A targetable fluorescent sensor reveals that copper-deficient SCO1 and SCO2 patient cells prioritize mitochondrial copper homeostasis. J. Am. Chem. Soc. 133, 8606-8616 48. Merz, S., and Westermann, B. (2009) Genome-wide deletion mutant analysis reveals genes required for respiratory growth, mitochondrial genome maintenance and mitochondrial protein synthesis in Saccharomyces cerevisiae. Genome. Biol. 10, R95 49. Schlecht, U., Suresh, S., Xu, W., Aparicio, A. M., Chu, A., Proctor, M. J., Davis, R. W., Scharfe, C., and St Onge, R. P.
(2014) A functional screen for copper homeostasis genes identifies a pharmacologically tractable cellular system. B. M. C. Genomics. 15, 263 50. Stenger, M., Le, D. T., Klecker, T., and Westermann, B. (2020) Systematic analysis of nuclear gene function in respiratory growth and expression of the mitochondrial genome in S. cerevisiae. Microb. Cell. 7, 234-249 51. Weber, R. A., Yen, F. S., Nicholson, S. P. V., Alwaseem, H., Bayraktar, E. C., Alam, M., Timson, R. C., La, K., Abu-Remaileh, M., Molina, H., and Birsoy, K. (2020) Maintaining Iron Homeostasis Is the Key Role of Lysosomal Acidity for Cell Proliferation. Mol. Cell. 7, 645-655 52. Yambire, K. F., Rostosky, C., Watanabe, T., Pacheu-Grau, D., Torres-Odio, S., Sanchez-Guerrero, A., Senderovich, O., Meyron-Holtz, E. G., Milosevic, I., Frahm, J., West, A. P., and Raimundo, N. (2019) Impaired lysosomal acidification triggers iron deficiency and inflammation in vivo. Elife. 8, e51031 53. Giles, N. M., Watts, A. B., Giles, G. I., Fry, F. H., Littlechild, J. A., and Jacob, C. (2003) Metal and redox modulation of cysteine protein function. Chem. Biol. 10, 677-693 54. Rigo, A., Corazza, A., di Paolo, M. L., Rossetto, M., Ugolini, R., and Scarpa, M. (2004) Interaction of copper with cysteine: stability of cuprous complexes and catalytic role of cupric ions in anaerobic thiol oxidation. J. Inorg. Biochem. 98, 1495-1501 55. Taylor, A. B., Stoj, C. S., Ziegler, L., Kosman, D. J., and Hart, P. J. (2005) The copper- iron connection in biology: structure of the metallo-oxidase Fet3p. Proc. Natl. Acad. Sci. U. S. A. 102, 15459-15464 56. Llinares, E., Barry, A. O., and Andre, B. (2015) The AP-3 adaptor complex mediates sorting of yeast and mammalian PQ-loop-family basic amino acid transporters to the vacuolar/lysosomal membrane. Sci. Rep. 5, 16665 57. Jo, W. J., Loguinov, A., Chang, M., Wintz, H., Nislow, C., Arkin, A. P., Giaever, G. and Vulpe, C. D. 
(2008) Identification of genes involved in the toxic response of Saccharomyces cerevisiae against iron and copper overload by parallel analysis of deletion mutants. Toxicol. Sci. 101, 140-151 58. Jo, W. J., Kim, J. H., Oh, E., Jaramillo, D., Holman, P., Loguinov, A. V., Arkin, A. P., Nislow, C., Giaever, G., and Vulpe, C. D. (2009) Novel insights into iron metabolism by integrating deletome and transcriptome analysis in an iron deficiency model of the yeast Saccharomyces cerevisiae. B.M.C. Genomics. 10, 130 59. Chun, C. D., and Madhani, H.D. (2010) Ctr2 links copper homeostasis to polysaccharide capsule formation and phagocytosis inhibition in the human fungal pathogen Cryptococcus neoformans. PLoS. ONE. 5, e12503 60. Ishizaki, H., Spitzer, M., Wildenhain, J., Anastasaki, C., Zeng, Z., Dolma, S., Shaw, M., Madsen, E., Gitlin, J., Marais, R., Tyers, M., and Patton, E. E. (2010) Combined zebrafish-yeast chemical-genetic screens reveal gene-copper-nutrition interactions that modulate melanocyte pigmentation. Dis. Model. Mech. 3, 639-651 61. Colacurcio, D. J., and Nixon, R. A. (2016) Disorders of lysosomal acidification-The emerging role of v-ATPase in aging and neurodegenerative disease. Ageing. Res. Rev. 32, 75-88 62. Corrionero, A., and Horvitz, H. R. (2018) A C9orf72 ALS/FTD Ortholog Acts in Endolysosomal Degradation and Lysosomal Homeostasis. Curr. Biol. 28, 1522-1535 63. Desai, V., and Kaler, S. G. (2008) Role of copper in human neurological disorders. Am. J. Clin. Nutr. 88, 855S-858S 64.
Kaler, S. G. (2013) Inborn errors of copper metabolism. Handb. Clin. Neurol. 113, 1745-1754 65. Nixon, R. A., Yang, D. S., and Lee, J. H. (2008) Neurodegenerative lysosomal disorders: a continuum from development to late age. Autophagy. 4, 590-599 66. Nguyen, M., Wong, Y. C., Ysselstein, D., Severino, A., and Krainc, D. (2019) Synaptic, mitochondrial, and lysosomal dysfunction in parkinson's disease. Trends. Neurosci. 42, 140-149 67. Stepien, K. M., Roncaroli, F., Turton, N., Hendriksz, C. J., Roberts, M., Heaton, R. A., and Hargreaves, I. (2020) Mechanism of mitochondrial dysfunction in lysosomal storage disorders: a review. J. Clin. Med. 9, 2596 68. Ammann, S., Schulz, A., Krägeloh-Mann, I., Dieckmann, N. M., Niethammer, K., Fuchs, S., Eckl, K. M., Plank, R., Werner, R., Altmüller, J., Thiele, H., Nürnberg, P., Bank, J., Strauss, A., von Bernuth, H., Zur Stadt, U., Grieve, S., Griffiths, G. M., Lehmberg, K., Hennies, H. C., and Ehl, S. (2016) Mutations in AP3D1 associated with immunodeficiency and seizures define a new type of Hermansky-Pudlak syndrome. Blood. 127, 997-1006 69. Dell'Angelica, E. C., Shotelersuk, V., Aguilar, R. C., Gahl, W. A., and Bonifacino, J. S. (1999) Altered trafficking of lysosomal proteins in Hermansky-Pudlak syndrome due to mutations in the beta 3A subunit of the AP-3 adaptor. Mol. Cell. 3, 11-21. 70. El-Chemaly, S., and Young, L. R. (2016) Hermansky-Pudlak Syndrome. Clin. Chest. Med. 37, 505-511 71. Korvatska, O., Strand, N. S., Berndt, J. D., Strovas, T., Chen, D. H., Leverenz, J. B., Kiianitsa, K., Mata, I. F., Karakoc, E., Greenup, J. L., Bonkowski, E., Chuang, J., Moon, R. T., Eichler, E. E., Nickerson, D. A., Zabetian, C. P., Kraemer, B. C., Bird, T. D., and Raskind, W. H. (2013) Altered splicing of ATP6AP2 causes X-linked parkinsonism with spasticity (XPDS). Hum. Mol Genet. 22, 3259-3268 72. Lee, J. H., Yu, W. H., Kumar, A., Lee, S., Mohan, P. S., Peterhoff, C. M., Wolfe, D. M., Martinez-Vicente, M., Massey, A.
C., Sovak, G., Uchiyama, Y., Westaway, D., Cuervo, A. M., and Nixon, R. A. (2010) Lysosomal proteolysis and autophagy require presenilin 1 and are disrupted by Alzheimer-related PS1 mutations. Cell. 141, 1146-1158 73. Janke, C., Magiera, M. M., Rathfelder, N., Taxis, C., Reber, S., Maekawa, H., Moreno-Borchart, A., Doenges, G., Schwob, E., Schiebel, E., and Knop, M. (2004) A versatile toolbox for PCR-based tagging of yeast genes: new fluorescent proteins, more markers and promoter substitution cassettes. Yeast. 21, 947-962 74. Pan, X., Yuan, D. S., Xiang, D., Wang, X., Sookhai-Mahadeo, S., Bader, J. S., Hieter, P., Spencer, F., and Boeke, J. D. (2004) A robust toolkit for functional profiling of the yeast genome. Mol. Cell. 16, 487-96. 75. Meisinger, C., Pfanner, N., and Truscott, K. N. (2006) Isolation of yeast mitochondria. Methods. Mol. Biol. 313, 33-39 76. Haas, A. (1995) A quantitative assay to measure homotypic vacuole fusion in vitro. Methods. Cell. sci. 17, 283-294 77. Horn, D., Al-Ali, H., and Barrientos, A. (2008) Cmc1p is a conserved mitochondrial twin CX9C protein involved in cytochrome c oxidase biogenesis. Mol. Cell. biol. 28, 4354-4364 78. Flohe, L., and Otting, F. (1984) Superoxide dismutase assays. Methods. Enzymol. 105, 93-104 79. Spinazzi, M., Casarin, A., Pertegato, V., Salviati, L., and Angelini, C. (2012) Assessment of mitochondrial respiratory chain enzymatic activities on tissues and cultured cells. Nat. Protoc. 7, 1235-1246 80. Diakov, T. T., Tarsio, M., and Kane, P.
M. (2013) Measurement of Vacuolar and Cytosolic pH In Vivo in Yeast Cell Suspensions. J. Vis. Exp. 19, 50261

FUNDING AND ADDITIONAL INFORMATION:

This work was supported by the National Institutes of Health awards R01GM111672 to VMG, R0GM1097260 to CDK, and 5F31GM128339 to NMG. NMG was also supported by National Science Foundation award HRD-1502335 in the first year of this work. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.

ACKNOWLEDGEMENTS

We thank Valentina Canedo Pelaez for assistance with growth measurements and Dr. Thomas Meek for allowing us to use the BioTek Synergy™ Mx Microplate Reader. We gratefully acknowledge Xuewen Pan for the kind gift of the Variomics library from which the barcoded deletion pool used here was derived.

CONFLICT OF INTEREST:

The authors declare that they have no conflicts of interest with the contents of this article.

AUTHOR CONTRIBUTIONS

VMG conceptualized the project. VMG, NMG, and ATG designed the experiments. NMG, ATG, and MZ performed the experiments. CDK and CQ designed the Bar-Seq protocols and generated the yeast deletion collection. NMG, ATG, and VMG analyzed the data and wrote the manuscript. VMG supervised the whole project and was responsible for the resources and primary funding acquisition. All authors commented on the manuscript.

FIGURE LEGENDS

Figure 1. Schematic of genome-wide copper-sensitized screen. The yeast deletion library is a collection of ~6000 mutants in which each mutant has a gene replaced with a KANMX4 cassette flanked by unique UP tag (UP) and DOWN tag (DN) sequences. The deletion mutant pool was grown in fermentable (YPD) and non-fermentable (YPGE) medium with and without 5 µM CuCl2 supplementation until cells reached an optical density of 5.0. Genomic DNA was isolated from harvested cells and used as a template to amplify UP and DN tag DNA barcode sequences using universal primers. PCR products were then sequenced and the resulting data analyzed. Mutants with deletions in genes required for respiratory growth are expected to grow poorly in non-fermentable medium, resulting in reduced barcode reads for those genes. However, if the function of the same genes is supported by copper supplementation, we expect increased barcode reads for those genes in copper-supplemented non-fermentable growth medium.

Figure 2. Yeast genes required for respiratory growth. (A) Growth of each mutant in the deletion collection cultured in YPGE and YPD media was measured by BarSeq and analyzed by T-scores. T(YPGE/YPD) is plotted for top and bottom 500 mutants. Known mitochondrial respiratory genes are highlighted in red. (B-C) Gene ontology analysis was used to identify the top five cellular processes (B) and cellular components (C) that were significantly enriched amongst our top scoring hits from a rank-ordered list, where ranking was done from the lowest to highest T-score. ES indicates enrichment score. (D) A schematic of mitochondrial OXPHOS subunits and assembly factors, where genes depicted in red were “hits” in the screen with T-score values below -2.35 (p-value ≤ 0.05).

Figure 3. Genes required for copper homeostasis. (A) T(YPGE + Cu/YPGE) score is plotted for top and bottom 500 mutants. Known copper homeostasis genes are highlighted in red.
Novel top hits belonging to two major cellular processes are highlighted in blue. (B-C) Gene ontology analysis was used to identify the top five cellular processes (B) and cellular components (C) that were significantly enriched in our top scoring hits. ES indicates enrichment score. (D) Top hits are mapped along the secretory pathway. Red arrows point to top hit genes. A dashed arrow indicates that the protein is not a subunit of the complex but is involved in maintaining the listed protein.

Figure 4. Loss of AP-3 results in reduced vacuolar and mitochondrial function. (A) Serial dilutions of WT and the indicated mutants were seeded onto YPD and YPGE plates with and without 5 μM CuCl2, MgCl2, or ZnCl2 and grown at 37°C for two (YPD) or four days (YPGE). coa6Δ cells, which have been previously shown to be rescued by CuCl2, were used as a control. (B) Whole cell protein lysate was analyzed by SDS-PAGE/western blot using a Cox2-specific antibody to detect CcO abundance. Stain-free imaging served as a loading control. coa6Δ cell lysate was used as a control for decreased Cox2 levels. (C) Vacuolar pH of WT and aps3Δ cells was measured using BCECF-AM dye. (D) Whole cell lysate and isolated vacuole fractions were analyzed by SDS-PAGE/western blot. Vma2 was used to determine V-ATPase abundance. Prc1 and Pgk1 served as loading controls for vacuolar and whole cell protein lysate, respectively.

Figure 5. Normalization of vacuolar pH in rim20Δ cells restores mitochondrial copper homeostasis. (A) Vacuolar pH of WT and rim20Δ cells was measured by BCECF-AM dye. (B) Serial dilutions of WT and the indicated mutants were seeded onto YPD and YPGE plates with and without 5 μM CuCl2, MgCl2, or ZnCl2 and grown at 37°C for two (YPD) or four days (YPGE). (C) Cellular and (D) mitochondrial copper levels were measured by ICP-MS.
(E) Mitochondrial proteins were analyzed by SDS-PAGE/western blot. Cox2 served as a marker for CcO levels, and Por1 served as a loading control. (F) CcO activity was measured spectrophotometrically and normalized to the citrate synthase activity. (G) Vacuolar pH of WT and rim20Δ cultured in standard (pH 6.7) or acidified (pH 5.0) YPGE medium was measured by BCECF-AM dye. (H) The growth of WT and rim20Δ cells in YPGE medium of different pH. Decrease in growth at each pH was calculated by normalizing to growth at pH 5.0. (I) CcO activity of WT and rim20Δ cultured in standard or acidified YPGE was normalized to citrate synthase activity. (J) The growth of WT and rim20Δ in YPGE + 5 µM CuCl2 medium of different pH. Decrease in growth at each pH was calculated by normalizing to growth at pH 5.0.

Figure 6. Pharmacological inhibition of V-ATPase decreases mitochondrial copper content. (A) Vacuolar pH of WT cells grown in the presence of either DMSO or 125, 250, 500, or 1000 nM ConcA. (B) Mitochondrial proteins in WT cells treated with DMSO or 500 nM ConcA were analyzed by SDS-PAGE/western blot. Cox2 served as a marker for CcO abundance; Atp2 and Por1 were used as loading controls. (C) CcO activity in WT cells treated with DMSO or 500 nM ConcA is shown after normalization to citrate synthase activity. (D) Mitochondrial copper levels in WT cells treated with DMSO or 500 nM ConcA were determined by ICP-MS.

Table 1: Saccharomyces cerevisiae strains used in this study.
Yeast Strains | Genotype | Source
BY4741 WT | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0 | Greenberg, M.L.
BY4741 coa6Δ | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, coa6Δ::kanMX4 | Open Biosystems
BY4741 gef1Δ | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, gef1Δ::kanMX4 | Open Biosystems
BY4741 aps3Δ | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, aps3Δ::kanMX4 | Open Biosystems
BY4741 aps3Δ - NMG | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, aps3Δ::kanMX4 | This study
BY4741 apm3Δ | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, apm3Δ::kanMX4 | Open Biosystems
BY4741 apl5Δ | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, apl5Δ::kanMX4 | Open Biosystems
BY4741 apl6Δ | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, apl6Δ::kanMX4 | Open Biosystems
BY4741 rim20Δ | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, rim20Δ::kanMX4 | Open Biosystems
BY4741 rim20Δ - NMG | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, rim20Δ::kanMX4 | This study
BY4741 rim21Δ | MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0, rim21Δ::kanMX4 | Open Biosystems

[Figure 1. Schematic: (1) yeast deletion collection, ~6,000 different mutants (Gene 1, Gene 2, ... Gene 5927), each with an Antibiotic R cassette flanked by UP and DN tags; (2) grow deletion collection to early stationary phase in YPD, YPGE, or YPGE + Cu; (3) harvest cells and isolate genomic DNA; (4) PCR amplify tags for each mutant; (5) sequence DNA barcodes; (6) statistical analysis.]

[Figure 2. (A) T-score (YPGE-YPD) for the top 500 and bottom 500 mutants; labeled mutants: cox5aΔ, rcf2Δ, coa4Δ, pet54Δ, coq3Δ. (B) Gene Set - Process, with P-values (7.73e-23 to 2.26e-13) and ES (9.61 to 17.40): mitochondrial respiratory chain complex assembly; cytochrome oxidase assembly; mitochondrial respiratory chain complex IV assembly; respiratory chain complex IV assembly; cellular respiration. (C) Gene Set - Component, with P-values (1.40e-25 to 7.02e-9) and ES (4.23 to 6.14): mitochondrial part; mitochondrial inner membrane; mitochondrial membrane part; mitochondrial membrane; organelle inner membrane. (D) Schematic of CII, CIII, CIV, and CV with CoQ and cyt c, showing subunits and assembly factors.]

[Figure 3. (A) T-score (YPGE Cu-YPGE) for the top 500 and bottom 500 mutants; labeled mutants: ccc1Δ, apl6Δ, atx1Δ, apl5Δ, aps3Δ, coa6Δ, gsh1Δ, apm3Δ, ctr1Δ, gef1Δ, gsh2Δ, kha1Δ. (B) Gene Set - Process, with P-values (1.49e-6 to 2.90e-5) and ES (1.76 to 73.67): Golgi to vacuole transport; post-Golgi vesicle-mediated transport; protein targeting to vacuole; transition metal ion homeostasis; chemical homeostasis. (C) Gene Set - Component, with P-values (2.85e-11 to 6.58e-6) and ES (4.09 to 515.70): AP-3 adaptor complex; AP-type membrane coat adaptor complex; Golgi apparatus; cytoplasmic vesicle; intracellular vesicle. (D) Secretory pathway map (Golgi, endosome, multivesicular body, vacuole) showing the AP-3 complex (Apm3, Aps3, Apl5, Apl6), the AP-1 complex (Apm1, Aps1, Apl4), the Rim101 pathway (Rim20, Rim21), Sys1, Arf2, Vam7, and Rav2.]

[Figure 4. (A) Growth of WT, coa6Δ, aps3Δ, apl5Δ, and apl6Δ on YPD and YPGE with no addition or + 5 µM CuCl2, MgCl2, or ZnCl2. (B) Western blot of Cox2 with stain-free loading control for WT, coa6Δ, apl6Δ, apm3Δ, apl5Δ, and aps3Δ. (C) Vacuole pH of WT and aps3Δ. (D) Western blot of Vma2, Prc1, and Pgk1 in isolated vacuoles and whole cell lysate from WT and aps3Δ.]

[Figure 5. Panels A-J. Axes: vacuolar pH; total cellular copper (ng/mg of cell pellet); mitochondrial copper (ng/mg of mitochondria); CcO activity (normalized to citrate synthase); decrease in growth (normalized to growth at pH 5.0) against YPGE pH or YPGE+Cu pH. Samples: WT, rim20Δ, WT+Cu, rim20Δ+Cu; growth assay strains WT, coa6Δ, rim20Δ, rim21Δ on YPD and YPGE with no addition or + 5 µM CuCl2, MgCl2, or ZnCl2; western blot of Cox2 and Por1 at 10 and 20 µg protein; media pH 6.7 and 5.0.]

[Figure 6. (A) Vacuole pH of WT cells against ConcA. (B) Western blot of Cox2, Atp2, and Por1 in WT and WT + ConcA. (C) CcO activity (normalized to citrate synthase) in WT and WT + ConcA. (D) Mitochondrial copper (ng/mg of mitochondria) in WT and WT + ConcA.]

10_1101-2020_12_31_424989 ----

Distinct cryo-EM Structure of α-synuclein Filaments derived by Tau

Alimohammad Hojjatian1, Anvesh K. R. Dasari2, Urmi Sengupta3, Dianne Taylor1, Nadia Daneshparvar1, Fatemeh Abbasi Yeganeh1, Lucas Dillard4, Brian Michael5, Robert G. Griffin5, Mario Borgnia4, Rakez Kayed3, Kenneth A. Taylor1, Kwang Hun Lim2,*

1Institute of Molecular Biophysics, Florida State University, Tallahassee, FL 32306-4380, USA.
2Department of Chemistry, East Carolina University, Greenville, NC 27858, USA.
3Departments of Neurology, Neuroscience and Cell Biology, University of Texas Medical Branch, Galveston, TX, 77555, USA.
4Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, NC, 27709, USA.
5Department of Chemistry and Francis Bitter Magnet Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.

Corresponding author: limk@ecu.edu

bioRxiv preprint doi: https://doi.org/10.1101/2020.12.31.424989; this version posted January 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

Recent structural studies of ex vivo amyloid filaments extracted from human patients demonstrated that the ex vivo filaments associated with different disease phenotypes adopt diverse molecular conformations distinct from those of in vitro amyloid filaments. A very recent cryo-EM structural study also revealed that ex vivo α-synuclein filaments extracted from multiple system atrophy (MSA) patients adopt molecular structures quite distinct from those of in vitro α-synuclein filaments, suggesting the presence of co-factors for α-synuclein aggregation in vivo. Here, we report structural characterizations of α-synuclein filaments derived by a potential co-factor, tau, using cryo-EM and solid-state NMR.
Our cryo-EM structure of the tau-promoted α-synuclein filament at 4.0 Å resolution is somewhat similar to one of the polymorphs of in vitro α-synuclein filaments. However, the N- and C-terminal regions of the tau-promoted α-synuclein filament have different molecular conformations. Our structural studies highlight the conformational plasticity of α-synuclein filaments, requiring additional structural investigation of not only more ex vivo α-synuclein filaments, but also in vitro α-synuclein filaments formed in the presence of diverse co-factors to better understand the molecular basis of the diverse molecular conformations of α-synuclein filaments.

Introduction

Aggregation of α-synuclein into amyloid filaments is associated with numerous neurodegenerative diseases including Parkinson's disease (PD), dementia with Lewy bodies (DLB), and multiple system atrophy (MSA), collectively termed synucleinopathies.1 Increasing evidence suggests that the protein aggregates play a key role in the initiation and spreading of pathology in the neurodegenerative diseases.2-6 It was shown that α-synuclein aggregates are capable of spreading through the brain and acting as seeds to promote misfolding and aggregation in a prion-like manner.7-10 Although the precise molecular mechanisms underlying the neurodegenerative disorders have remained elusive, misfolded α-synuclein aggregates including oligomeric and sonicated fibrillar species exhibit cytotoxic activities.11 In addition, injection of preformed filamentous α-synuclein aggregates into mice induced PD-like pathology.7, 12
Structural elucidation of filamentous α-synuclein aggregates is, therefore, essential to understanding the molecular basis of the neurotoxic properties of α-synuclein aggregates and to developing therapeutic strategies.

α-synuclein is a 140-residue protein expressed predominantly in the dopaminergic neurons.13 The intrinsically disordered protein adopts heterogeneous ensembles of conformations. The diverse conformers in the conformational ensemble might be induced to form distinct amyloid aggregates with different molecular conformations depending on experimental conditions (Figure 1).10, 14, 15 Indeed, recent high-resolution structural studies using solid-state NMR and cryo-EM revealed that α-synuclein filaments can adopt diverse molecular conformations under various in vitro experimental conditions.16-21 Structural analyses of α-synuclein aggregates seeded by brain extracts from PD and MSA patients suggested that the brain-derived aggregates are heterogeneous mixtures of filaments that are distinct from in vitro α-synuclein filaments.22 Very recently, high-resolution cryo-EM structures of α-synuclein filaments extracted from MSA and DLB patients were reported.23 Interestingly, two types of α-synuclein filaments consisting of two twisting asymmetric protofilaments were observed in MSA filaments extracted from 5 patients. On the other hand, ex vivo DLB filaments were untwisted and morphologically different from those of ex vivo MSA. The structural studies revealed that ex vivo α-synuclein filaments are structurally diverse and quite distinct from those of in vitro α-synuclein filaments produced in buffer, suggesting that diverse co-factors may exist in vivo and induce formation of different α-synuclein filaments.

Figure 1. A schematic diagram of energy landscape for α-synuclein aggregation.

α-synuclein remains largely unfolded at low protein concentrations (< 0.1 mM) under physiological conditions. The formation of filamentous aggregates is triggered at aggregation-prone conditions such as higher protein concentrations and more acidic pH.14, 15 Misfolding and aggregation of α-synuclein is also promoted by interactions with a variety of co-factors such as lipids, poly(ADP-ribose) (PAR), and other pathological aggregation-prone proteins such as tau and Aβ(1–42) peptides.14, 24-27 The co-factors may interact with monomeric α-synuclein and lead to distinct misfolding pathways, resulting in different molecular conformations. Comparative structural analyses of in vitro α-synuclein filaments derived by co-factors and brain-derived ex vivo α-synuclein filaments are required to identify co-factors that promote α-synuclein aggregation in vivo.

Our previous NMR study revealed that tau interacts with the C-terminal region of α-synuclein, accelerating the formation of α-synuclein filaments.28 Here we report a structural investigation of tau-promoted α-synuclein filaments using solid-state NMR and cryo-EM to investigate the effect of the interactions on the structure of α-synuclein filaments. Our initial solid-state NMR studies indicate that the tau-promoted α-synuclein filaments have similar structural features to those of one of the polymorphs. However, the cryo-EM structure of the tau-promoted α-synuclein filaments at 4.0 Å resolution revealed distinct molecular conformations in the N- and C-terminal regions with a much faster helical twist, suggesting that the co-factor, tau, directs α-synuclein into a distinct misfolding and aggregation pathway.

Experimental Methods

Protein expression and purification

α-Synuclein: Full-length α-synuclein was expressed in BL21(DE3) E. coli cells using pET21a plasmid (a gift from Michael J Fox Foundation, Addgene plasmid # 51486) and was purified at 4 °C as previously described.29 Briefly, the transformed E. coli cells were grown at 37 °C in LB medium to an OD600 of 0.8. Protein expression was induced by addition of IPTG to a final concentration of 1 mM and the cells were harvested by centrifugation after 12 hrs of incubation at 25 °C. The bacterial pellet was resuspended in lysis buffer (20 mM Tris, 150 mM NaCl, pH 8.0) and sonicated at 4 °C. The soluble fraction of the lysate was precipitated with ammonium sulfate (50%). The resulting protein pellet collected by centrifugation at 5000 g was resuspended in 10 mM Tris buffer (pH 8.0) and the protein solution was dialyzed against 10 mM Tris buffer overnight at 4 °C. α-Synuclein was purified by anion exchange chromatography (HiTrap Q HP; 20 mM Tris buffer, pH 8) and size exclusion chromatography (HiLoad 16/60 Superdex 75 pg; 10 mM phosphate buffer, pH 7.4) at 4 °C.

Tau: Recombinant full-length tau (2N4R) protein was expressed and purified from BL21(DE3) E. coli cells transformed with the pET15b plasmid (a gift from Dr. Smet-Nocca, Université de Lille, Sciences et Technologies, France) as previously described.30 Briefly, the cells were grown at 37 °C in LB medium to an OD of 0.8, induced by addition of 0.5 mM IPTG, and incubated for 3-4 hrs at 37 °C. After the induction, the cells were harvested by centrifugation. The bacterial pellet was resuspended in the lysis buffer and sonicated at 4 °C. The soluble fraction was heated at 80 °C for 20 min and the precipitates were removed by centrifugation. The supernatant containing tau protein was purified by cation exchange chromatography (HiTrap SP HP; 20 mM MES, 2 mM DTT, 1 mM MgCl2, 1 mM EGTA, 1 mM PMSF) followed by size exclusion chromatography (HiLoad 26/60 Superdex 200 pg; 10 mM phosphate buffer, 150 mM NaCl, 1 mM DTT, pH 7.4).

Preparation of tau-promoted α-synuclein filaments

To prepare α-synuclein filaments in the presence of tau, monomeric α-synuclein (70 µM in 10 mM phosphate buffer, pH 7.4) was mixed with tau monomers (20 µM in 10 mM phosphate buffer, pH 7.4) and incubated at 37 °C for 1 day under constant agitation at 250 rpm in an orbital shaker. Filamentous aggregates were examined with transmission electron microscopy (TEM).

TEM

The α-synuclein filament solution (1 mg/ml) was diluted 20-fold with 10 mM phosphate buffer (pH 7.4) and 5 µL of the diluted solution was placed on a formvar/carbon supported 400 mesh copper grid. After 30 sec incubation of the sample on the TEM grid, excess sample was blotted off with a filter paper.
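As a quick consistency check on the concentrations quoted for the filament preparation and the TEM sample, the conversion between molar and mass concentration can be sketched as follows. This is an illustrative calculation only: the molecular weight of α-synuclein (about 14.46 kDa) is an assumption not given in the text, and the function and variable names are hypothetical.

```python
# Assumed molecular weight of human alpha-synuclein (about 14.46 kDa).
# This value is not given in the text and is supplied here for illustration.
ASYN_MW_G_PER_MOL = 14_460.0


def micromolar_to_mg_per_ml(conc_um: float, mw_g_per_mol: float) -> float:
    """Convert a micromolar concentration to mg/mL."""
    # uM -> mol/L, then mol/L * g/mol = g/L, which is numerically mg/mL.
    return conc_um * 1e-6 * mw_g_per_mol


asyn_um = 70.0   # monomeric alpha-synuclein in the incubation (from the text)
tau_um = 20.0    # tau monomers in the incubation (from the text)
dilution = 20.0  # fold dilution before applying to the TEM grid (from the text)

stock_mg_ml = micromolar_to_mg_per_ml(asyn_um, ASYN_MW_G_PER_MOL)
grid_mg_ml = stock_mg_ml / dilution
molar_ratio = asyn_um / tau_um

print(f"alpha-synuclein stock: {stock_mg_ml:.2f} mg/mL")
print(f"after {dilution:.0f}-fold dilution: {grid_mg_ml:.3f} mg/mL")
print(f"alpha-synuclein : tau molar ratio = {molar_ratio:.1f} : 1")
```

Under the assumed molecular weight, 70 µM α-synuclein corresponds to roughly 1 mg/ml, in line with the concentration quoted for the TEM sample, and the incubation contains α-synuclein and tau at a 3.5 : 1 molar ratio.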
The grids were washed briefly with 10 µL of 1% uranyl acetate. The samples were then stained with 10 µL of 1% uranyl acetate for 30 sec and the excess stain was blotted off with a filter paper. The grids were then allowed to air dry and TEM images were collected using a Philips CM12 transmission electron microscope at an accelerating voltage of 80 kV.

Cryo-EM data collection

Four microliters of the α-synuclein filament solution were applied to the back of each glow-discharged R2/1 Quantifoil grid. To form vitrified ice, the grids were blotted with filter paper for 3 seconds and then manually plunge-frozen in liquid ethane cooled to liquid-nitrogen temperature. Grids were examined on a Titan Krios electron microscope operated at 300 kV and equipped with a Gatan K3 camera. The defocus was set randomly within the 5,000-25,000 Å range. The images were collected with the Gatan automated data collection software Latitude S (Gatan, Inc). The magnification was set to 81,000×, giving a nominal pixel size of 1.1 Å (the calibrated pixel size was found to be 1.07 Å).

Image processing

Movies were corrected for beam-induced motion (in frame and among frames) and dose-weighted using MotionCor2 31. Aligned (non-dose-weighted) integrated micrographs were used for contrast transfer function (CTF) estimation of each micrograph, using GCTF 32. Using Relion3-beta 33, filaments were manually picked and extracted with helical extraction.
Two-dimensional (2D) classification in cisTEM 34 was performed to identify the segments with better quality and no crossing filaments. Segments from the best-looking classes were selected and moved to Relion for further processing. Helical 2D classification in Relion moved all of the segments into a limited number of classes, independent of the values used for the regularization parameter (T). Using a cylinder (produced by relion_helix_toolbox) as the initial model and a very tight mask, the segments were helically 3D refined with local search for symmetry, starting from 4.7 Å and -1° for helical rise and helical twist, respectively 35. The result of the refinement (resolution: ~8 Å) was then low-pass filtered to 10 Å and used for 3D classification without alignment with a very tight mask (T=25) into 6 classes, which resulted in two improved classes containing the majority of the particles (~243,000 and ~109,000 particles). Each of these two classes was processed, but only the class with ~243,000 particles produced a higher resolution structure. Following the same methodology used in similar studies 36, we continued with 3D classification into 1 class, starting with the class average from the last 3D classification low-pass filtered to 8 Å (T=35) with the same tight mask to focus the refinement on the separation of the α-synuclein subunits within the mask. Increasing the value of T stepwise up to 45 resulted in a higher resolution for the reconstruction. Then, to down-weight the influence of the mask, we extended the binary mask well beyond the diameter of the filament so that the whole structure was included inside the mask. Local search for helical twist and helical rise converged to -1.19° and 4.76 Å. The structure hinted at a higher-level helical symmetry with 179.36° and 2.43 Å for helical twist and helical rise, respectively, and thus those values were used for further refinement. The handedness of the filaments was initially imposed arbitrarily.
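To give a feel for what the converged helical parameters above imply geometrically, the following sketch converts a rise and twist per subunit into a helical pitch and a crossover distance. It is an illustration under stated assumptions, not part of the processing pipeline: the crossover is taken as half the pitch, the 2_1-like screw relation is the idealized one (half the rise, 180° plus half the twist), and the function names are hypothetical.

```python
from typing import Tuple


def pitch_and_crossover(rise_angstrom: float, twist_deg: float) -> Tuple[float, float]:
    """Return (pitch, crossover) in angstroms for a given rise and twist per subunit."""
    if twist_deg == 0:
        raise ValueError("an untwisted filament has no finite pitch")
    # Number of subunits needed for a full 360-degree turn.
    subunits_per_turn = 360.0 / abs(twist_deg)
    pitch = rise_angstrom * subunits_per_turn
    # Assumption: the crossover distance corresponds to a 180-degree rotation.
    return pitch, pitch / 2.0


def ideal_two_one_screw(rise_angstrom: float, twist_deg: float) -> Tuple[float, float]:
    """Idealized 2_1-like screw (twist, rise) that reproduces (twist, rise) when applied twice."""
    return 180.0 + twist_deg / 2.0, rise_angstrom / 2.0


# Converged values quoted in the text.
rise, twist = 4.76, -1.19

pitch, crossover = pitch_and_crossover(rise, twist)
screw_twist, screw_rise = ideal_two_one_screw(rise, twist)

print(f"pitch = {pitch:.0f} A, crossover = {crossover:.0f} A")
print(f"idealized screw: twist = {screw_twist:.3f} deg, rise = {screw_rise:.2f} A")
```

With the converged values this gives a pitch of about 1440 Å and a crossover of about 720 Å. The idealized screw parameters (179.405° and 2.38 Å) are close to, but not identical with, the 179.36° and 2.43 Å reported in the text, so the sketch should be read as a geometric approximation rather than a derivation of the reported values.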
Later, using a tomography data set of the same filament, the filaments were verified as left-handed. Auto-refinement with T=90 resulted in the best map with the highest resolution. Two separate rounds of beam-tilt correction using CtfRefine in Relion were done to improve the overall resolution as well as the visual quality of the map. Per-particle CTF refinement, however, did not improve the resolution of the map. Using Relion post-processing, we determined the overall reconstruction resolution to be 4 Å (Figure S1). Local resolution was determined using the corresponding Relion tool and local sharpening was done using LocalDeblur 37. Over-sharpening was seen in the last iterations of local sharpening. Consequently, the result with the lowest amount of noise was selected for model building.

Atomic model building and refinement

The resolution of the tau-promoted α-synuclein filament density map was not high enough for de novo modeling of the structure. However, our solid-state NMR data showed that the resonances for certain residues in our filaments are similar to those of a previously reported structure (PDB: 6rt0) for α-synuclein filaments. Hence, the atomic model was built in Coot 38 using 6rt0 as the initial model, starting from the region containing residues with highly similar resonances (residues 53-65). Then a poly-alanine model was built into the density and the residues were later replaced by the correct sequence.
The atomic model was refined using Phenix real-space refinement 39, manually modified in Coot 38 and validated using Phenix 40 (Tables S1 and S2). The lack of well-resolved side chains makes it difficult to investigate salt bridges between protofilaments. Therefore, we used MDFF 41 with explicit solvent to examine the molecular dynamic interactions in the atomic model. Water molecules were added in VMD 42 and the solvent was neutralized with 150 mM NaCl to simulate physiological ionic strength conditions. Salt bridges between protofilaments (K45, E46) were detected using the Salt-bridge module of VMD.

Data availability: The electron density map is available in the Electron Microscopy Data Bank (EMDB) with ID EMD-23212 and the atomic model is available in the Protein Data Bank (PDB) with ID 7L7H. The raw data, intermediate maps, masks, and intermediate atomic models are all available from the authors upon request.

Results

Solid-state NMR of tau-promoted α-synuclein filaments

Monomeric α-synuclein (70 µM) was incubated in the presence of tau monomers (20 µM) at 37 °C in 10 mM phosphate buffer (pH 7.4). Long homogeneous filamentous aggregates were observed after 24 hrs of incubation in the presence of tau (Figure 2). Solid-state NMR was initially used to compare structural features of the tau-promoted filaments to those of previously reported in vitro α-synuclein filaments (Figure 3).
The two-dimensional 13C-13C correlation spectrum obtained with the dipolar-assisted rotational resonance (DARR) mixing scheme43 suggests that the tau-promoted filaments (Figure 3a) have molecular conformations distinct from those of two α-synuclein filaments (Figure 3c and 3d). In contrast, the 2D DARR spectrum of the tau-promoted filaments is somewhat similar to that of the in vitro filament (red in Figure 3b), with notable differences (black in Figure 3b). These solid-state NMR results suggest that the co-factor, tau, induces the formation of a specific fibrillar conformation.

Figure 2. Representative TEM images of tau-promoted α-synuclein filaments showing the homogeneous twisting filaments.

Figure 3. Overview of the aliphatic region of 2D 13C-13C DARR NMR spectra of uniformly 13C/15N labeled α-synuclein filament polymorphs. (a) Tau-promoted α-synuclein polymorph. (b) Fibril-type α-synuclein polymorph (BMRB 18860)44. (c) Ribbon-type α-synuclein polymorph (BMRB 17498).45 (d) Greek-key type α-synuclein polymorph (BMRB 25518)16. Cross-peaks with similar NMR resonances for the tau-promoted α-synuclein polymorph and ribbon-type polymorph are colored red in 3b. The ribbon- and fibril-type polymorphs of α-synuclein have distinct molecular packing arrangements and intermolecular interactions.44, 46 The NMR cross-peaks were drawn using our experimental DARR spectrum for the tau-promoted filaments (a) and chemical shifts reported in BMRB for the previously reported DARR spectra of α-synuclein filaments (b – d).
Cryo-EM structure of the tau-promoted α-synuclein filaments

Cryo-EM was then used to determine a near-atomic structure of the tau-promoted α-synuclein filaments. Preformed tau-promoted filaments were frozen on a carbon-coated grid and images were acquired at 81,000X magnification on a Titan Krios (300 kV) equipped with a Gatan K3 direct electron detector camera. About 240,000 segments extracted from 1,800 micrographs were analyzed using Relion reference-free two-dimensional (2D) classification. The initial classification analyses revealed one major species in the 2D classes (Figure 4a). The 2D classes show that the protofilaments twist around each other with a crossover distance of 610 Å (Figure 4b) and a helical rise of 4.8 Å based on the power spectrum (Figure 4c). The left-twisting handedness was determined by cryo-electron tomography. The 2D classes were used for three-dimensional (3D) helical reconstruction in Relion 3, which resulted in a 3D density map at 4.0 Å resolution (Figure 5a).

Figure 4. 2D class averages of the tau-promoted α-synuclein filaments. (a) Representative 2D class averages of tau-derived filaments using cisTEM (box size of 440 Å). (b) A sinogram representing the full rotation along the helical axis of the filament produced by relion_helix_inimodel2d as described by Scheres.47 (c) The power spectrum of selected 2D reference-free class averages.
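The crossover distance and helical rise read from the 2D classes are tied to the helical twist by simple geometry: the filament rotates by 180° over one crossover, so the twist per rung is roughly 180° × rise / crossover. The following is a minimal sketch of that relation, not the authors' processing script. The function names are hypothetical, and the result is only a rough starting estimate of the kind that would then be refined by a local symmetry search.

```python
def twist_from_crossover(rise_A, crossover_A, left_handed=True):
    """Approximate helical twist per rung (degrees).

    Assumes the filament rotates by 180 degrees over one crossover distance.
    By the usual convention a left-handed helix gets a negative twist.
    """
    twist = 180.0 * rise_A / crossover_A
    return -twist if left_handed else twist


def crossover_from_twist(rise_A, twist_deg):
    """Inverse relation: crossover distance (Angstrom) for a given twist per rung."""
    return rise_A * 180.0 / abs(twist_deg)


# Values quoted in the text for the 2D classes: rise 4.8 A, crossover 610 A.
estimate = twist_from_crossover(4.8, 610.0)
print(f"estimated twist: {estimate:.2f} deg per rung")  # -1.42

# The same relation applied to a twist/rise pair, here -1.19 deg and 4.76 A.
print(f"implied crossover: {crossover_from_twist(4.76, -1.19):.0f} A")  # 720
```

Because the estimate from 2D class averages depends on how the crossover is measured, it need not coincide with the values obtained after 3D refinement.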
Figure 5. Structural comparison of α-synuclein filament polymorphs. (a) Overlay of the tau-promoted α-synuclein filament atomic model on the density map. (b and c) α-synuclein filament polymorphs 2a and 2b, respectively, determined by previous cryo-EM structural studies.20 (d) Overlaid structures of the tau-promoted α-synuclein filament (purple) and polymorph 2b (green). The same salt bridge between the residues K45 and E46 was observed in the interfacial region of polymorph 2b (Figure 5c) and the tau-promoted α-synuclein polymorph (Figure 5d).

The 3D density map of the tau-promoted α-synuclein filaments was similar to that of polymorph 2b, as was suggested by our solid-state NMR (Figure 3a and 3b).
It is, however, interesting to note that tau induced the formation of only one polymorph (Figure 5c), although the previous study showed that the protofilaments assembled into two fibril polymorphs in the same buffer (Figure 5b and 5c).20 The tau-promoted α-synuclein filament also exhibits notable differences from polymorph 2b, particularly in the N- and C-terminal regions (Figure 5d and Figure S2). First, the interaction between the N-terminal (15-20) and C-terminal (85-91) regions is not observed in the tau-promoted filament. Second, a more extensive C-terminal region (80-140) is disordered than in the other structure (91-140), which might be due to interactions between the positively charged tau and the negatively charged C-terminal region of α-synuclein (Figure S2). Third, the tau-promoted filaments, with a half-pitch of 63 nm, twist much faster than polymorph 2b (96 nm) (Figure 5d and Figure S3).

The structural model for the tau-promoted filaments was compared to the previously reported structures of α-synuclein filaments (Figure 6). Our tau-promoted α-synuclein filaments adopt an overall Greek-key type structure observed in the first solid-state NMR structure of α-synuclein filaments.16 However, several regions including the N- and C-terminal regions (residues 36-48 and 66-79) are notably different from the previous Greek-key type structures, as was suggested by our solid-state NMR results (Figure 3a and 3d). In addition, interfacial contacts between the two protofilaments and the degree of helical twist (Table S3) are quite distinct from those of the previously reported structures. These results indicate that interactions between co-factors and α-synuclein may lead to distinct molecular conformations and intermolecular contacts between the protofilaments of α-synuclein.
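To put the half-pitch comparison in per-rung terms, the sketch below converts the two half-pitch values quoted above (63 nm and 96 nm) into an approximate twist per β-strand rung. It assumes the same rise of 4.8 Å for both filaments, which is an illustrative assumption rather than a value reported for polymorph 2b. The names used are hypothetical.

```python
# Assumed rise per rung in Angstrom, applied to both filaments for illustration.
ASSUMED_RISE_A = 4.8


def twist_per_rung(half_pitch_nm, rise_A=ASSUMED_RISE_A):
    """Magnitude of the twist per rung (degrees) for a 180-degree half-pitch."""
    half_pitch_A = half_pitch_nm * 10.0  # 1 nm = 10 Angstrom
    return 180.0 * rise_A / half_pitch_A


# Half-pitch values as quoted in the text.
half_pitch_nm = {"tau-promoted": 63.0, "polymorph 2b": 96.0}

twists = {name: twist_per_rung(hp) for name, hp in half_pitch_nm.items()}
for name, twist in twists.items():
    rungs = half_pitch_nm[name] * 10.0 / ASSUMED_RISE_A
    print(f"{name}: {twist:.2f} deg per rung, about {rungs:.0f} rungs per half-pitch")

# With equal rise, the twist ratio is simply the inverse ratio of the half-pitches.
ratio = twists["tau-promoted"] / twists["polymorph 2b"]
print(f"twist ratio (tau-promoted / 2b): {ratio:.2f}")  # 1.52
```

Under this assumption the tau-promoted filament twists about 1.5 times as much per rung as polymorph 2b, which is one way to read "twist much faster".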
Figure 6. Structural comparison of various polymorphs of full-length α-synuclein filaments. Representative structures of (a) α-synuclein polymorphs 1a (PDB 2n0a, 6a6b)16, 17 and polymorph 1b (PDB 6cu8)18. (b) α-synuclein polymorph 2a (PDB 6ssx)20, polymorph 2b (PDB 6sst)20 and tau-promoted α-synuclein polymorph (this study, PDB 7l7h). (c) MSA patient derived α-synuclein polymorph type-1 (PDB 6xyp)23 and type-2 (PDB 6xyo, 6xyq)23. (d-f) Overlay of protofilament folds of tau-promoted α-synuclein filament with polymorphs 1, 2 and ex vivo MSA polymorphs, respectively.

Discussion

The molecular mechanism by which α-synuclein self-assembles into fibrillar aggregates in vivo has remained largely unknown. It was previously shown that monomeric α-synuclein is stabilized by long-range interactions between the N- and C-terminal regions.48, 49 Perturbations of the long-range interactions may initiate misfolding and aggregation of α-synuclein.
Indeed, various co-factors that interact with the N- and/or C-terminal regions promoted the formation of fibrillar aggregates of α-synuclein.50-52 Recently solved cryo-EM structures of ex vivo α-synuclein filaments extracted from MSA and DLB patients revealed that the ex vivo filaments adopt molecular structures distinct from those of in vitro α-synuclein filaments,23 supporting the idea that co-factors may play important roles in promoting α-synuclein aggregation in vivo. Comparative structural analyses of α-synuclein aggregates derived by co-factors and ex vivo aggregates will, therefore, be required to identify co-factors that may play critical roles in α-synuclein aggregation in vivo.

Several lines of evidence indicate that pathological proteins such as β-amyloid (Aβ) peptides, tau and α-synuclein synergistically promote their mutual aggregation.24, 53-61 In particular, the co-existence of tau and α-synuclein aggregates in the brains of synucleinopathy patients suggests that tau may interact with α-synuclein, accelerating the formation of fibrillar α-synuclein aggregates in vivo. In this work, we solved the cryo-EM structure of α-synuclein filaments derived by tau and compared it to previously reported structures of α-synuclein filaments.

Previous structural studies of α-synuclein filaments revealed that α-synuclein can form diverse filamentous aggregates with distinct molecular conformations (Figure 6). Polymorphic structures were also observed for filaments formed even in the same buffer.16, 18, 20 It is plausible that multiple conformers in the conformational ensemble of disordered α-synuclein are able to form diverse α-synuclein filaments with different molecular conformations and/or different interfaces between the protofilaments (Figure 1). Interestingly, tau-promoted α-synuclein filaments adopt a Greek-key type structure similar to one of the polymorphic α-synuclein filaments.
However, the detailed molecular conformation and the degree of the helical twist are different from those of the polymorphs (Table S3). In addition, recent studies revealed that poly(ADP-ribose) may interact with α-synuclein in vivo62 and induce the formation of a more toxic α-synuclein strain with distinct molecular conformations.25 These results suggest that the interaction between co-factors and α-synuclein may direct the protein to a specific misfolding and aggregation pathway toward the distinct α-synuclein filament, highlighting the importance of cellular environments in protein misfolding and aggregation.63

Previous structural studies of the full-length and truncated α-synuclein filaments revealed that the C-terminally truncated α-synuclein filaments have increased helical twists even though the full-length and truncated filaments adopt almost identical core structures,19, 64 suggesting that the negatively charged C-terminal region affects the helical twist in the parallel alignment. Thus, the increased helical twist of the tau-promoted full-length α-synuclein filaments may result from the electrostatic interaction between the positively charged tau and the negatively charged C-terminal region of α-synuclein, which may reduce repulsive interactions between the C-terminal regions in the parallel alignment and facilitate the tighter helical twist.
The longer disordered C-terminal region (residues 81 – 140) in the tau-promoted filament compared to that of the previously reported α-synuclein filaments (residues 90 – 140) may also result from the interaction between tau and the C-terminal region of α-synuclein.

In summary, we report a distinct molecular structure of the α-synuclein filament formed in the presence of tau. The interaction between the C-terminal region of α-synuclein and tau leads to a distinct molecular conformation of the α-synuclein filament with a shorter helical pitch. These results suggest that interaction between α-synuclein and various potential co-factors in cellular environments may promote the formation of diverse α-synuclein filaments with different molecular conformations. More extensive comparative structural analyses of in vitro α-synuclein filaments derived by co-factors and ex vivo α-synuclein filaments extracted from patients are required to better understand the molecular mechanism of α-synuclein aggregation in vivo.

ASSOCIATED CONTENT

Supporting Information. FSC curve. Structural comparison of tau derived α-synuclein polymorph and polymorph 2b. Density maps showing the helical twisting patterns of the α-synuclein polymorph and polymorph 2b. Cryo-EM data collection, refinement, and validation statistics. Helical twists comparison of various α-synuclein polymorphs. The following files are available free of charge.

AUTHOR INFORMATION

Corresponding Author *limk@ecu.edu.

Author Contributions The manuscript was written through contributions of all authors.
All authors have given approval to the final version of the manuscript.

Funding Sources This work was supported in part by NIH R01 NS097490 (K.H.L.), R01 AG054025 (R.K.) and R01 NS094557 (R.K.).

Notes The authors declare no competing financial interest.

ACKNOWLEDGMENT We thank Dr. Jun-yong Choe (East Carolina University) for helpful discussion. We also thank Hamidreza Rahmani for helpful suggestions on molecular dynamics analysis on the atomic model.

ABBREVIATIONS NMR, nuclear magnetic resonance; TEM, transmission electron microscopy; DARR, dipolar assisted rotational resonance; cryo-EM, cryo-electron microscopy.

Accession Codes α-synuclein: UniProtKB entry P37840; tau: UniProtKB entry P10636

References

1. Goedert, M. (2001) Alpha-synuclein and neurodegenerative diseases. Nat. Rev. Neurosci. 2, 492-501. 2. Westermark, G. T., and Westermark, P. (2010) Prion-like aggregates: Infectious agents in human disease. Trends Mol. Med. 16, 501-507. 3. Clavaguera, F., Lavenir, I., Falcon, B., Frank, S., Goedert, M., and Tolnay, M. (2013) "Prion-like" templated misfolding in tauopathies. Brain Pathol. 23, 342-349. 4. Kim, J., and Holtzman, D. M. (2010) Medicine. prion-like behavior of amyloid-beta. Science. 330, 918-9. 5. Jucker, M., and Walker, L. C. (2013) Self-propagation of pathogenic protein aggregates in neurodegenerative diseases. Nature. 501, 45-51. 6. Frost, B., and Diamond, M. I. (2010) Prion-like mechanisms in neurodegenerative diseases. Nat Rev Neurosci. 11, 155-9. 7. Luk, K.
C., Kehm, V., Carroll, J., Zhang, B., O'Brien, P., Trojanowski, J. Q., and Lee, V. M. (2012) Pathological alpha-synuclein transmission initiates parkinson-like neurodegeneration in nontransgenic mice. Science. 338, 949-953. 8. Irwin, D. J., Lee, V. M., and Trojanowski, J. Q. (2013) Parkinson's disease dementia: Convergence of alpha-synuclein, tau and amyloid-beta pathologies. Nat. Rev. Neurosci. 14, 626-636. 9. Iba, M., Guo, J. L., McBride, J. D., Zhang, B., Trojanowski, J. Q., and Lee, V. M. (2013) Synthetic tau fibrils mediate transmission of neurofibrillary tangles in a transgenic mouse model of alzheimer's-like tauopathy. J. Neurosci. 33, 1024-1037. 10. Peng, C., Gathagan, R. J., Covell, D. J., Medellin, C., Stieber, A., Robinson, J. L., Zhang, B., Pitkin, R. M., Olufemi, M. F., Luk, K. C., Trojanowski, J. Q., and Lee, V. M. (2018) Cellular milieu imparts distinct pathological alpha-synuclein strains in alpha-synucleinopathies. Nature. 557, 558-563. 11. Wong, Y. C., and Krainc, D. (2017) Α-synuclein toxicity in neurodegeneration: Mechanism and therapeutic strategies. Nat. Med. 23, 1-13. 12. Sacino, A. N., Brooks, M., Thomas, M. A., McKinney, A. B., Lee, S., Regenhardt, R. W., McGarvey, N. H., Ayers, J. I., Notterpek, L., Borchelt, D. R., Golde, T. E., and Giasson, B. I. (2014) Intramuscular injection of alpha-synuclein induces CNS alpha-synuclein pathology and a rapid-onset motor phenotype in transgenic mice. Proc. Natl. Acad. Sci. U. S. A. 111, 10732-10737. 13. Spillantini, M. G., Schmidt, M. L., Lee, V. M., Trojanowski, J.
Q., Jakes, R., and Goedert, M. (1997) Alpha-synuclein in lewy bodies. Nature. 388, 839-840. 14. Stephens, A. D., Zacharopoulou, M., and Kaminski Schierle, G. S. (2019) The cellular environment affects monomeric α-synuclein structure. Trends Biochem. Sci. 44, 453-466. 15. Candelise, N., Schmitz, M., Thüne, K., Cramm, M., Rabano, A., Zafar, S., Stoops, E., Vanderstichele, H., Villar-Pique, A., Llorens, F., and Zerr, I. (2020) Effect of the micro- environment on α-synuclein conversion and implication in seeded conversion assays. Transl. Neurodegener. 9, 5-9. eCollection 2020. 16. Tuttle, M. D., Comellas, G., Nieuwkoop, A. J., Covell, D. J., Berthold, D. A., Kloepper, K. D., Courtney, J. M., Kim, J. K., Barclay, A. M., Kendall, A., Wan, W., Stubbs, G., Schwieters, C. D., Lee, V. M., George, J. M., and Rienstra, C. M. (2016) Solid-state NMR structure of a pathogenic fibril of full-length human alpha-synuclein. Nat. Struct. Mol. Biol. 23, 409-415. 17. Li, Y., Zhao, C., Luo, F., Liu, Z., Gui, X., Luo, Z., Zhang, X., Li, D., Liu, C., and Li, X. (2018) Amyloid fibril structure of alpha-synuclein determined by cryo-electron microscopy. Cell Res. 28, 897-903. 18. Li, B., Ge, P., Murray, K. A., Sheth, P., Zhang, M., Nair, G., Sawaya, M. R., Shin, W. S., Boyer, D. R., Ye, S., Eisenberg, D. S., Zhou, Z. H., and Jiang, L. (2018) Cryo-EM of full-length alpha-synuclein reveals fibril polymorphs with a common structural kernel. Nat. Commun. 9, 3609-2. 19. Ni, X., McGlinchey, R. P., Jiang, J., and Lee, J. C. (2019) Structural insights into alpha- synuclein fibril polymorphism: Effects of parkinson's disease-related C-terminal truncations. J. Mol. Biol. 431, 3913-3919. 20. Guerrero-Ferreira, R., Taylor, N. M., Arteni, A. A., Kumari, P., Mona, D., Ringler, P., Britschgi, M., Lauer, M. E., Makky, A., Verasdonck, J., Riek, R., Melki, R., Meier, B. H., Bockmann, A., Bousset, L., and Stahlberg, H. 
(2019) Two new polymorphic structures of human full-length alpha-synuclein fibrils solved by cryo-electron microscopy. Elife. 8, 10.7554/eLife.48907. 21. Guerrero-Ferreira, R., Taylor, N. M., Mona, D., Ringler, P., Lauer, M. E., Riek, R., Britschgi, M., and Stahlberg, H. (2018) Cryo-EM structure of alpha-synuclein fibrils. Elife. 7, 10.7554/eLife.36402. 22. Strohäker, T., Jung, B. C., Liou, S. H., Fernandez, C. O., Riedel, D., Becker, S., Halliday, G. M., Bennati, M., Kim, W. S., Lee, S. J., and Zweckstetter, M. (2019) Structural heterogeneity of α-synuclein fibrils amplified from patient brain extracts. Nat. Commun. 10, 5535-w. 23. Schweighauser, M., Shi, Y., Tarutani, A., Kametani, F., Murzin, A. G., Ghetti, B., Matsubara, T., Tomita, T., Ando, T., Hasegawa, K., Murayama, S., Yoshida, M., Hasegawa, M., Scheres, S. H. W., and Goedert, M. (2020) Structures of alpha-synuclein filaments from multiple system atrophy. Nature. 585, 464-469. 24. Giasson, B. I., Forman, M. S., Higuchi, M., Golbe, L. I., Graves, C. L., Kotzbauer, P. T., Trojanowski, J. Q., and Lee, V. M. (2003) Initiation and synergistic fibrillization of tau and alpha-synuclein. Science. 300, 636-640. 25. Kam, T. I., Mao, X., Park, H., Chou, S. C., Karuppagounder, S. S., Umanah, G. E., Yun, S. P., Brahmachari, S., Panicker, N., Chen, R., Andrabi, S. A., Qi, C., Poirier, G. G., Pletnikova, O., Troncoso, J. C., Bekris, L. M., Leverenz, J. B., Pantelyat, A., Ko, H. S., Rosenthal, L. S., Dawson, T. M., and Dawson, V. L.
(2018) Poly(ADP-ribose) drives pathologic α-synuclein neurodegeneration in parkinson's disease. Science. 362, eaat8407. doi: 10.1126/science.aat8407. 26. Galvagnion, C., Buell, A. K., Meisl, G., Michaels, T. C., Vendruscolo, M., Knowles, T. P., and Dobson, C. M. (2015) Lipid vesicles trigger α-synuclein aggregation by stimulating primary nucleation. Nat. Chem. Biol. 11, 229-234. 27. Sengupta, U., Puangmalai, N., Bhatt, N., Garcia, S., Zhao, Y., and Kayed, R. (2020) Polymorphic α-synuclein strains modified by dopamine and docosahexaenoic acid interact differentially with tau protein. Mol. Neurobiol. 57, 2741-2765. 28. Dasari, A. K. R., Kayed, R., Wi, S., and Lim, K. H. (2019) Tau interacts with the C-terminal region of α-synuclein, promoting formation of toxic aggregates with distinct molecular conformations. Biochemistry. 58, 2814-2821. 29. Ghee, M., Melki, R., Michot, N., and Mallet, J. (2005) PA700, the regulatory complex of the 26S proteasome, interferes with alpha-synuclein assembly. FEBS J. 272, 4023-4033. 30. Despres, C., Byrne, C., Qi, H., Cantrelle, F. X., Huvent, I., Chambraud, B., Baulieu, E. E., Jacquot, Y., Landrieu, I., Lippens, G., and Smet-Nocca, C. (2017) Identification of the tau phosphorylation pattern that drives its aggregation. Proc. Natl. Acad. Sci. U. S. A. 114, 9080- 9085. 31. Zheng, S. Q., Palovcak, E., Armache, J., Cheng, Y., and Agard, D. A. (2016) Anisotropic correction of beam-induced motion for improved single-particle electron cryo-microscopy. Cold Spring Harbor Laboratory, . 32. Zhang, K. (2015) Gctf: Real-time CTF determination and correction. Cold Spring Harbor Laboratory, . 33. Zivanov, J., Nakane, T., Forsberg, B. O., Kimanius, D., Hagen, W. J., Lindahl, E., and Scheres, S. H. (2018) New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife. 7, 1-22. 34. Grant, T., Rohou, A., and Grigorieff, N. (2018) cisTEM, user-friendly software for single- particle image processing. eLife. 7, 1-24. 
35. He, S., and Scheres, S. H. W. (2017) Helical reconstruction in RELION. Journal of structural biology. 198, 163-176. 36. Fitzpatrick, A. W. P., Falcon, B., He, S., Murzin, A. G., Murshudov, G., Garringer, H. J., Crowther, R. A., Ghetti, B., Goedert, M., and Scheres, S. H. W. (2017) Cryo-EM structures of tau filaments from alzheimer’s disease brain. Nature. 547, 185-190. 37. Ramírez-Aportela, E., Vilas, J. L., Glukhova, A., Melero, R., Conesa, P., Martínez, M., Maluenda, D., Mota, J., Jiménez, A., Vargas, J., Marabini, R., Sexton, P. M., Carazo, J. M., and Sorzano, C. O. S. (2019) Automatic local resolution-based sharpening of cryo-EM maps. Computer applications in the biosciences. 36, 765-772. 38. Emsley, P., Lohkamp, B., Scott, W. G., and Cowtan, K. (2010) Features and development of coot. Acta crystallographica. Section D, Biological crystallography. 66, 486-501. 39. DiMaio, F., Song, Y., Li, X., Brunner, M. J., Xu, C., Conticello, V., Egelman, E., Marlovits, T. C., Cheng, Y., and Baker, D. (2015) Atomic-accuracy models from 4.5-Å cryo-electron microscopy data with density-guided iterative local refinement. Nature methods. 12, 361-365. 40. Afonine, P. V., Klaholz, B. P., Moriarty, N. W., Poon, B. K., Sobolev, O. V., Terwilliger, T. C., Adams, P. D., and Urzhumtsev, A. (2018) New tools for the analysis and validation of cryo-EM maps and atomic models. Acta crystallographica. Section D, Structural biology. 74, 814-840. 41. Trabuco, L. G., Villa, E., Schreiner, E., Harrison, C. B., and Schulten, K.
(2009) Molecular dynamics flexible fitting: A practical guide to combine cryo-electron microscopy and X-ray crystallography. Methods (San Diego, Calif.). 49, 174-180. 42. Humphrey, W., Dalke, A., and Schulten, K. (1996) VMD: Visual molecular dynamics. Journal of molecular graphics. 14, 33-38. 43. Takegoshi, K., Nakamura, S., and Terao, T. (2003) 13C–1H dipolar-driven 13C–13C recoupling without 13C rf irradiation in nuclear magnetic resonance of rotating solids. The Journal of Chemical Physics. 118, 2325-2341. 44. Gath, J., Bousset, L., Habenstein, B., Melki, R., Bockmann, A., and Meier, B. H. (2014) Unlike twins: An NMR comparison of two alpha-synuclein polymorphs featuring different toxicity. PLoS One. 9, e90659. 45. Gath, J., Habenstein, B., Bousset, L., Melki, R., Meier, B. H., and Bockmann, A. (2012) Solid-state NMR sequential assignments of alpha-synuclein. Biomol. NMR Assign. 6, 51-55. 46. Bousset, L., Pieri, L., Ruiz-Arlandis, G., Gath, J., Jensen, P. H., Habenstein, B., Madiona, K., Olieric, V., Bockmann, A., Meier, B. H., and Melki, R. (2013) Structural and functional characterization of two alpha-synuclein strains. Nat. Commun. 4, 2575. 47. Scheres, S. H. W. (2020) Amyloid structure determination in RELION-3.1. Acta Crystallogr. D. Struct. Biol. 76, 94-101. 48. Bernado, P., Bertoncini, C. W., Griesinger, C., Zweckstetter, M., and Blackledge, M. (2005) Defining long-range order and local disorder in native alpha-synuclein using residual dipolar couplings. J. Am. Chem. Soc. 127, 17968-17969. 49. Dedmon, M.
M., Lindorff-Larsen, K., Christodoulou, J., Vendruscolo, M., and Dobson, C. M. (2005) Mapping long-range interactions in alpha-synuclein using spin-label NMR and ensemble molecular dynamics simulations. J. Am. Chem. Soc. 127, 476-477. 50. Fernandez, C. O., Hoyer, W., Zweckstetter, M., Jares-Erijman, E. A., Subramaniam, V., Griesinger, C., and Jovin, T. M. (2004) NMR of alpha-synuclein-polyamine complexes elucidates the mechanism and kinetics of induced aggregation. EMBO J. 23, 2039-2046. 51. Lemkau, L. R., Comellas, G., Lee, S. W., Rikardsen, L. K., Woods, W. S., George, J. M., and Rienstra, C. M. (2013) Site-specific perturbations of alpha-synuclein fibril structure by the parkinson's disease associated mutations A53T and E46K. PLoS One. 8, e49750. 52. Kim, C., Lv, G., Lee, J. S., Jung, B. C., Masuda-Suzukake, M., Hong, C. S., Valera, E., Lee, H. J., Paik, S. R., Hasegawa, M., Masliah, E., Eliezer, D., and Lee, S. J. (2016) Exposure to bacterial endotoxin generates a distinct strain of alpha-synuclein fibril. Sci. Rep. 6, 30891. 53. Moussaud, S., Jones, D. R., Moussaud-Lamodiere, E. L., Delenclos, M., Ross, O. A., and McLean, P. J. (2014) Alpha-synuclein and tau: Teammates in neurodegeneration? Mol. Neurodegener. 9, 43-43. 54. Fujishiro, H., Tsuboi, Y., Lin, W. L., Uchikado, H., and Dickson, D. W. (2008) Co- localization of tau and alpha-synuclein in the olfactory bulb in alzheimer's disease with amygdala lewy bodies. Acta Neuropathol. 116, 17-24. 55. Forman, M. S., Schmidt, M. L., Kasturi, S., Perl, D. P., Lee, V. M., and Trojanowski, J. Q. (2002) Tau and alpha-synuclein pathology in amygdala of parkinsonism-dementia complex patients of guam. Am. J. Pathol. 160, 1725-1731. 56. Castillo-Carranza, D. L., Guerrero-Munoz, M. J., Sengupta, U., Gerson, J. E., and Kayed, R. (2018) Alpha-synuclein oligomers induce a unique toxic tau strain. Biol. Psychiatry. 84, 499- 508. 57. Ishizawa, T., Mattila, P., Davies, P., Wang, D., and Dickson, D. W. 
(2003) Colocalization of tau and alpha-synuclein epitopes in lewy bodies. J. Neuropathol. Exp. Neurol. 62, 389-397. 58. Gerson, J. E., Farmer, K. M., Henson, N., Castillo-Carranza, D. L., Carretero Murillo, M., Sengupta, U., Barrett, A., and Kayed, R. (2018) Tau oligomers mediate alpha-synuclein toxicity and can be targeted by immunotherapy. Mol. Neurodegener. 13, 13-9. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 3, 2021. ; https://doi.org/10.1101/2020.12.31.424989doi: bioRxiv preprint https://doi.org/10.1101/2020.12.31.424989 http://creativecommons.org/licenses/by-nc-nd/4.0/ 23 59. Clinton, L. K., Blurton-Jones, M., Myczek, K., Trojanowski, J. Q., and LaFerla, F. M. (2010) Synergistic interactions between abeta, tau, and alpha-synuclein: Acceleration of neuropathology and cognitive decline. J. Neurosci. 30, 7281-7289. 60. Bhasne, K., Sebastian, S., Jain, N., and Mukhopadhyay, S. (2018) Synergistic amyloid switch triggered by early heterotypic oligomerization of intrinsically disordered α-synuclein and tau. J. Mol. Biol. 430, 2508-2520. 61. Lu, J., Zhang, S., Ma, X., Jia, C., Liu, Z., Huang, C., Liu, C., and Li, D. (2020) Structural basis of the interplay between α-synuclein and tau in regulating pathological amyloid aggregation. J. Biol. Chem. 295, 7470-7480. 62. Puentes, L. N., Lengyel-Zhand, Z., Lee, J. Y., Hsieh, C., Schneider, M. E., Edwards, K. J., Luk, K. C., Lee, V. M. -., Trojanowski, J. Q., and Mach, R. H. (2020) Poly (ADP-ribose) induces α-synuclein aggregation in neuronal-like cells and interacts with phosphorylated α-synuclein in post mortem PD samples. bioRxiv. , 2020.04.08.032250. 63. Cendrowska, U., Silva, P. J., Ait-Bouziad, N., Müller, M., Guven, Z. P., Vieweg, S., Chiki, A., Radamaker, L., Kumar, S. 
T., Fändrich, M., Tavanti, F., Menziani, M. C., Alexander-Katz, A., Stellacci, F., and Lashuel, H. A. (2020) Unraveling the complexity of amyloid polymorphism using gold nanoparticles and cryo-EM. Proc. Natl. Acad. Sci. U. S. A. 117, 6866-6874. 64. Iyer, A., Roeters, S. J., Kogan, V., Woutersen, S., Claessens, M M A E, and Subramaniam, V. (2017) C-terminal truncated alpha-synuclein fibrils contain strongly twisted beta-sheets. J. Am. Chem. Soc. 139, 15392-15400. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 3, 2021. ; https://doi.org/10.1101/2020.12.31.424989doi: bioRxiv preprint https://doi.org/10.1101/2020.12.31.424989 http://creativecommons.org/licenses/by-nc-nd/4.0/ 24 For Table of Contents use only .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 3, 2021. ; https://doi.org/10.1101/2020.12.31.424989doi: bioRxiv preprint https://doi.org/10.1101/2020.12.31.424989 http://creativecommons.org/licenses/by-nc-nd/4.0/ 10_1101-2021_01_01_425047 ---- Degradation of Photoreceptor Outer Segments by the Retinal Pigment Epithelium Requires Pigment Epithelium-derived Factor Receptor (PEDF-R) 1 Degradation of Photoreceptor Outer Segments by the Retinal Pigment 1 Epithelium Requires Pigment Epithelium-derived Factor Receptor (PEDF-R) 2 Jeanee Bullock1,2*, Federica Polato1*, Mones Abu-Asab3, Alexandra Bernardo-Colón1, Elma 3 Aflaki1, Martin-Paul Agbaga4, S. 
Patricia Becerra1a

1Section of Protein Structure and Function-LRCMB, National Eye Institute, National Institutes of Health, Bethesda, MD; 2Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington D.C.; 3Section of Histopathology, National Eye Institute, National Institutes of Health, Bethesda, MD; 4Departments of Cell Biology and Ophthalmology, Dean McGee Eye Institute, University of Oklahoma HSC, Oklahoma City, OK

*These authors contributed equally to this work.

aCorresponding author:
S. Patricia Becerra
NIH-NEI-LRCMB
Section of Protein Structure and Function
Bg. 6, Rm. 134
6 Center Drive MSC 0608
Bethesda, MD 20892-0608
becerrap@nei.nih.gov

Present address:
JB: Fort Washington, MD, USA; FP: Washington DC, USA; EA: National Institute of Alcohol Abuse and Alcoholism, NIH

Funding information: This work was supported by the Intramural Research Program of the National Eye Institute, NIHEY000306 to SPB and by NIH/NEI R01 EY030513 to MPA.

Word count: 7951

J. Bullock, None; F. Polato, None; M. Abu-Asab, None; A. Bernardo-Colón, None; E. Aflaki, None; M.P. Agbaga, None; S. P. Becerra, None.

Abbreviations:
AMD, age-related macular degeneration; BEL, bromoenol lactone; β-HB, beta hydroxybutyrate; cre, cyclization recombinase; DHA, docosahexaenoic acid; loxP, locus of X-over, P1; PEDF-R, pigment epithelium-derived factor receptor; PNPLA2, patatin-like phospholipase domain containing 2; POS, photoreceptor outer segments; ROI, regions of interest; RPE, retinal pigment epithelium; TEM, transmission electron microscopy; WT, wild type

Abstract

Purpose: To examine the contribution of PEDF-R to the phagocytosis process. Previously, we identified PEDF-R, the protein encoded by the PNPLA2 gene, as a phospholipase A2 in the retinal pigment epithelium (RPE).
During phagocytosis, RPE cells ingest abundant phospholipids and protein in the form of photoreceptor outer segment (POS) tips, which are then hydrolyzed. The role of PEDF-R in RPE phagocytosis is not known.

Methods: Mice in which PNPLA2 was conditionally knocked out in the RPE were generated (cKO). Mouse RPE/choroid explants were cultured. Human ARPE-19 cells were transfected with siPNPLA2 silencing duplexes. POS were isolated from bovine retinas. The phospholipase A2 inhibitor bromoenol lactone was used. Transmission electron microscopy, immunofluorescence, lipid labeling, pulse-chase experiments, western blots, and free fatty acid and β-hydroxybutyrate assays were performed.

Results: The RPE of the cKO mice accumulated lipids as well as more abundant and larger rhodopsin particles compared to littermate controls. Upon POS exposure, RPE explants from cKO mice released less β-hydroxybutyrate compared to controls. After POS ingestion during phagocytosis, rhodopsin degradation was stalled both in cells treated with bromoenol lactone and in PNPLA2-knockdown cells relative to their corresponding controls. Phospholipase A2 inhibition lowered β-hydroxybutyrate release from phagocytic RPE cells. PNPLA2 knockdown also resulted in a decline in fatty acid and β-hydroxybutyrate release from phagocytic RPE cells.

Conclusions: PEDF-R downregulation delayed POS digestion during phagocytosis. The findings imply that the efficiency of RPE phagocytosis depends on PEDF-R, thus identifying a novel contribution of this protein to POS degradation in the RPE.

A vital function of the retinal pigment epithelium (RPE) is to phagocytose the tips of the photoreceptors in the neural retina.
As one of the most active phagocytes in the body, RPE cells ingest daily a large amount of lipids and protein in the form of photoreceptor outer segment (POS) tips.1–5 On the one hand, as outer segments are constantly being renewed at the base of photoreceptors, the ingestion of POS tips (~10% of an outer segment) by RPE cells serves to balance outer segment renewal, which is necessary for the visual activity of photoreceptors. On the other hand, the ingested POS supply an abundant source of fatty acids, which are substrates for fatty acid β-oxidation and ketogenesis to support the energy demands of the RPE.6–8 The fatty acids liberated from phagocytosed POS are also used as essential precursors for lipid and membrane synthesis, and as bioactive mediators in cell signaling processes; for example, the main fatty acid in POS phospholipids is docosahexaenoic acid, which is involved in signaling in the retina.9 Rhodopsin, a pigment present in rod photoreceptors and involved in visual phototransduction, is the most abundant protein in POS. Approximately 85% of the total protein of isolated bovine POS is rhodopsin,10 which is embedded in a phospholipid bilayer at a molar ratio between rhodopsin and phospholipids of about 1:60.11 Conversely, the RPE lacks expression of the rhodopsin gene.
The importance of POS clearance by the RPE in the maintenance of photoreceptors was demonstrated in an animal model for retinal degeneration, the Royal College of Surgeons (RCS) rat, in which a genetic defect renders the RPE unable to effectively phagocytose POS, thereby leading to rapid photoreceptor degeneration.12,13 Moreover, human RPE phagocytosis declines moderately with age and the decline is significant in RPE of human donors with age-related macular degeneration (AMD), underscoring its importance in this disease.14 Therefore, there is increasing interest in studying regulatory hydrolyzing enzymes involved in RPE phagocytosis for maintaining retina function and the visual process.

We have previously reported that the human RPE expresses the PNPLA2 gene, which encodes a 503 amino acid polypeptide that exhibits phospholipase A2 (PLA2) activity and is termed pigment epithelium-derived factor receptor (PEDF-R).15 The enzyme liberates fatty acids from phospholipids, specifically those in which DHA is in the sn-2 position.16 RPE plasma membranes contain the PEDF-R protein,15,17 and photoreceptor membrane phospholipids have a high content of DHA in their sn-2 position,9 suggesting that upon POS ingestion the substrate lipid is available to interact with PEDF-R.
Other laboratories used different names for the PEDF-R protein (e.g., iPLA2ζ, desnutrin, adipose triglyceride lipase), and showed that it exhibits additional lipase activities: triglyceride lipase and acylglycerol transacylase enzymatic activities.18–20 In macrophages, the triglyceride hydrolytic activity is critical for efficient efferocytosis of bacteria and yeast.21 Interestingly, we and others have shown that the inhibitor of calcium-independent phospholipases A2 (iPLA2s), bromoenol lactone (BEL), inhibits the phospholipase and triolein lipase activities of PEDF-R/iPLA2ζ.15,18 In addition, BEL can impair the phagocytosis of POS by ARPE-19 cells, associating phospholipase A2 activity with the regulation of photoreceptor cell renewal.22 However, the phospholipase enzyme responsible for this activity in RPE phagocytosis is not yet known.

Given that the role of PEDF-R in RPE phagocytosis has not yet been studied, here we explored its contribution to this process. We hypothesized that PEDF-R is involved in the degradation of phospholipid-rich POS in RPE phagocytosis. To test this hypothesis, we silenced the PNPLA2 gene in vivo and in vitro. Results show that with downregulation of PNPLA2 expression and inhibition of the PLA2 activity of PEDF-R, RPE cells cannot break down rhodopsin, nor release β-hydroxybutyrate (β-HB) and fatty acids, thus identifying a novel contribution of this protein to POS degradation. We discuss the role that PEDF-R may play in the disposal of lipids from ingested OS, and in turn in the regulation of photoreceptor cell renewal.

Methods

Animals

The generation of desnutrin floxed mice (hereafter referred to as Pnpla2f/f)23 and the Tg(BEST1-cre)Jdun transgenic line24 (which will be named BEST1-cre in this report) have been previously reported. The desnutrin floxed transgenic mouse model was kindly donated to our laboratory by Dr. Hei Sook Sul.
The transgenic Tg(BEST1-cre)Jdun mouse model was a generous gift from Dr. Joshua Dunaief. It is an RPE-specific, cre-expressing transgenic mouse line, in which the activity of the human BEST1 promoter is restricted to the RPE and drives the RPE-specific expression of the targeted cre in the eye of transgenic mice.24 Homozygous floxed Pnpla2 (Pnpla2f/f) mice were crossed with transgenic BEST1-cre mice. The resulting mice carrying one floxed allele and the cre transgene (Pnpla2f/+/cre) were crossed with Pnpla2f/f mice to generate mice with Pnpla2 knockout specifically in the RPE, which are homozygous floxed mice expressing the cre transgene only in the RPE, Pnpla2f/f/cre (here also termed cKO). Pnpla2f/f/cre or Pnpla2f/+/cre mice were also used for breeding with Pnpla2f/f mice to expand the colony. Pnpla2f/+ or Pnpla2f/f littermates, obtained through this breeding, were used as control mice. All procedures involving mice were conducted following protocols approved by the National Eye Institute Animal Care and Use Committee and in accordance with the Association for Research in Vision and Ophthalmology Statement for the Use of Animals in Ophthalmic and Vision Research. The mice were housed in the NEI animal facility with lighting at around 280-300 lux in 12 h (6 AM-6 PM) light/12 h dark (6 PM-6 AM) cycles.

DNA isolation

DNA was isolated from mouse eyecups using the salt-chloroform DNA extraction method25 and dissolved in 200 µl of TE (Tris-EDTA composed of 10 mM Tris-HCl, pH 8, and 1 mM EDTA). Aliquots (2 µl) of the DNA solution were then used for each PCR reaction using oligonucleotide primers P1 and P2 (sequences kindly provided by the laboratory of Dr. Hei Sook Sul; Table 1).
RNA extraction, cDNA synthesis, and quantitative RT-PCR

RNA was isolated from the mouse RPE following the methodology previously described.26 Total RNA was purified from ARPE-19 cells using the RNeasy® Mini Kit (Qiagen, Germantown, MD) following the manufacturer's instructions. Between 100 and 500 ng of total RNA were used for reverse transcription using the SuperScript III first-strand synthesis system (Invitrogen, Carlsbad, CA). The PNPLA2 transcript levels in ARPE-19 cells were determined by quantitative RT-PCR using the QuantiTect SYBR Green PCR Kit (Qiagen) in the QuantStudio 7 Flex Real-Time PCR System (Thermo Fisher Scientific, Waltham, MA). The primer sequences used in this study are listed in Table 1. Murine PNPLA2 mRNA levels relative to HPRT transcript levels were measured by the QuantStudio 7 Flex Real-Time PCR System using Taqman® gene expression assays (PNPLA2, Mm00503040_m1; HPRT, Mm00446968_m1, Thermo Fisher Scientific). PNPLA2 expression relative to HPRT was calculated using the comparative ΔΔCt method.27

Eyecup flatmounts

Eyecup (RPE, choroid, sclera) flatmounts were prepared and processed as follows. After enucleation and removal of the cornea, lens, and retina, eyecups were fixed for 1 h in 4% paraformaldehyde at room temperature, and washed 3 times for 10 min each in Tris-Buffered Saline (TBS; 25 mM Tris HCl pH 7.4, 137 mM NaCl, 2.7 mM KCl). They were then blocked for 1 h with 10% normal goat serum (NGS) in 0.1% TBS-Ta (TBS containing 0.1% Triton-X, Sigma, St. Louis, MO). Primary antibodies against cre recombinase and rhodopsin (see Table 2) were diluted in 0.1% TBS-Ta containing 2% NGS and used at 4°C for 16 h.
Then, the eyecups were washed 3 times for 10 min each with TBS-Ta followed by incubation at room temperature for 1 h with the respective secondary antibodies, together with DAPI (to counterstain the nuclei) and Alexa Fluor 647-phalloidin (to label the RPE cytoskeleton), diluted in 0.1% TBS-Ta containing 2% NGS. Eyecups were then flattened by introducing incisions and mounted with Prolong Gold antifade reagent (Thermo Fisher Scientific). Images of the entire flatmounts were collected using the tiling feature of the epifluorescent Axio Imager Z1 microscope (Carl Zeiss Microscopy, White Plains, NY) at 20X magnification. The collected images were stitched together using the corresponding feature of the Zen Blue software (Carl Zeiss Microscopy). Eyecups were also imaged using confocal microscopy (Zeiss LSM 700) at 20X magnification, collecting z-stacks with slices spaced 2 µm apart and covering from the basal to the apical surface of the RPE cells. The image resulting from the maximum intensity projection of the z-stacks was employed for analysis.

Five regions of interest (ROI; 520 µm x 520 µm) were selected for each image of the flatmount from cKO mice and control mice. The percentage of cre-positive cells was determined by dividing the number of cells containing cre-stained nuclei by the number of RPE cells in each ROI (identified by F-actin staining).

For the phagocytosis assay, at least six ROI (320.5 µm x 320.5 µm) were analyzed per mouse. Rhodopsin-stained particles were counted using ImageJ, after adjusting the color threshold and size of the particles to eliminate the background.

Transmission electron microscopy

Mouse eyes were enucleated and doubly-fixed in 2.5% glutaraldehyde in PBS and 0.5% osmium tetroxide in PBS and embedded in epoxy resin.
Thin sections (90 nm in thickness) were generated and placed on 200-mesh copper grids, dried for 24 h, and double-stained with uranyl acetate and lead citrate. Sections were viewed and photographed with a JEOL JM-1010 transmission electron microscope.

Electroretinography (ERG)

In dim red light, overnight dark-adapted mice were anesthetized by intraperitoneal (IP) injection of ketamine (92.5 mg/kg) and xylazine (5.5 mg/kg). Pupils were dilated with a mixture of 1% tropicamide and 0.5% phenylephrine. A topical anesthetic, tetracaine (0.5%), was administered before positioning the electrodes on the cornea for recording. ERG was recorded from both eyes by the Espion E2 system with ColorDome (Diagnosys LLC, Lowell, MA, USA). Dark-adapted responses were elicited with increasing light impulses with intensity from 0.0001 to 10 candela-seconds per meter squared (sc cd.s/m2). Light-adapted responses were recorded after 2 min adaptation to a rod-saturating background (20 cd/m2) with light stimulus intensity from 0.3 to 100 sc cd.s/m2. During the recording, the body temperature of the mice was maintained at 37°C by placing them on a heating pad. Amplitudes for the a-wave were measured from baseline to negative peak, and b-wave amplitudes were measured from a-wave trough to b-wave peak.

DC ERG

For DC-ERG, silver chloride electrodes connected to glass capillary tubes filled with Hank's buffered salt solution (HBSS) were used for recording. The electrodes were kept in contact with the cornea for a minimum of 10 minutes until the electrical activity reached steady state. Responses to 7-min steady light stimulation were recorded.

Cell Culture

Human ARPE-19 cells (ATCC, Manassas, VA, USA, Cat.
# CRL-2302) were maintained in Dulbecco's modified Eagle medium/Nutrient Mixture F-12 (DMEM/F-12) (Gibco; Grand Island, NY) supplemented with 10% fetal bovine serum (FBS) (Gibco) and 1% penicillin/streptomycin (Gibco) at 37°C with 5% CO2. For assays described below, a total of 1 x 10^5 cells in 0.5 ml were plated per well of 24-well plates and incubated for 3 days in DMEM/F12 with 10% FBS and 1% penicillin-streptomycin. ARPE-19 cells were authenticated by Bio-Synthesis (Lewisville, TX) at passage 27. ARPE-19 cells in passage numbers 27-32 were used for all experiments.

Silencing of PNPLA2 in ARPE-19 cells using siRNA

Small interfering RNA (siRNA) oligo duplexes of 27 bases in length for human PNPLA2 were purchased from OriGene (Rockville, MD). Their sequences, and that of a Scramble siRNA (Scr) (Cat#: SR324651 and SR311349), are given in Table 3. Of the six duplexes, siRNAs C, D, and E consistently provided the highest silencing efficiency and therefore these three duplexes were used individually for silencing experiments and referred to as siPNPLA2. ARPE-19 cells were transfected by reverse transfection in 24-well tissue culture plates as follows: A total of 6 pmol of siRNA was diluted in 100 µl of OptiMem (Gibco) per well and mixed with 1 µl of Lipofectamine RNAiMAX (Invitrogen); mock-transfected cells received only 1 µl of Lipofectamine. The mixture was then added to each well. After incubation at room temperature for 10 min, a total of 1 x 10^5 cells in 500 µl antibiotic-free DMEM/F12 containing 10% FBS was added to each well and the plate was swirled gently to mix. Assays were performed 72 h post-transfection.

Phagocytosis of bovine POS by ARPE-19 cells

POS were isolated as previously described28 from freshly obtained cow eyes (J.W. Treuth & Sons, Catonsville, MD). POS pellets were stored at -80°C until use.
Quantification of POS units was performed using trypan blue and resulted in an average of 5 x 10^7 POS units per bovine eye. The concentration of protein from purified POS was 21 pg/POS unit. Proteins in the POS samples resolved by SDS-PAGE had the expected migration pattern for both reduced and non-reduced conditions, and the main bands stained with Coomassie Blue comigrated with rhodopsin-immunoreactive proteins in western blots of POS proteins (Fig. S1). The percentage of rhodopsin in the protein content of POS was estimated from the gels and revealed that 80% or more of the protein content corresponded to rhodopsin.

Using electrospray ionization tandem mass spectrometry (ESI-MS/MS) as previously described,29 we determined the lipid composition of the POS that were fed to the ARPE-19 cells.

Phagocytosis assays in ARPE-19 cells were performed as follows: ARPE-19 cells (1 x 10^5 cells per well) were attached to 24-well plates (commercial tissue culture-treated polystyrene plates, TCPS,30 purchased from Corning, Corning, NY) and cultured for 3 days to form confluent and polarized cell monolayers, as we reported previously.31 Ringer's solution was composed of the following: 120.6 mM NaCl, 14.3 mM NaHCO3, 4.2 mM KCl, 0.3 mM MgCl2, and 1.1 mM CaCl2, with 15 mM HEPES dissolved separately and adjusted to pH 7.4 with N-methyl-D-glucamine. Prior to use, L-carnitine was added to the Ringer's solution to a final concentration of 1 mM. Purified POS were diluted to a concentration of 1 x 10^7 POS/ml in Ringer's solution containing freshly prepared 5 mM glucose. A total of 500 µl of this solution (medium) was added to each well and the cultures were incubated for 30 min, 60 min or 2.5 h, at 37°C.
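The quantities stated above (POS concentration, volume per well, protein content per POS unit, and cell seeding density) can be combined into a per-well POS load. The following is a minimal arithmetic sketch, not part of the reported methods: the function name and variable names are hypothetical, and the POS-per-cell figure assumes the seeding density of 1 x 10^5 cells per well rather than a measured cell count at the time of the assay.

```python
def pos_load_per_well(pos_per_ml, volume_ml, protein_pg_per_pos, cells_seeded):
    """Return (POS units, POS protein in µg, POS units per seeded cell) for one well.

    Hypothetical helper for illustration; inputs are the values stated in the text.
    """
    pos_units = pos_per_ml * volume_ml                  # POS units added to the well
    protein_ug = pos_units * protein_pg_per_pos / 1e6   # convert pg to µg (1 µg = 1e6 pg)
    pos_per_cell = pos_units / cells_seeded             # assumes cell number = seeding density
    return pos_units, protein_ug, pos_per_cell


# Values taken from the text: 1 x 10^7 POS/ml, 500 µl per well,
# 21 pg protein per POS unit, 1 x 10^5 cells seeded per well.
pos_units, protein_ug, pos_per_cell = pos_load_per_well(1e7, 0.5, 21.0, 1e5)
print(f"POS units per well: {pos_units:.2e}")
print(f"POS protein per well: {protein_ug:.1f} µg")
print(f"POS units per seeded cell: {pos_per_cell:.0f}")
```

Under these assumptions each well receives 5 x 10^6 POS units, corresponding to about 105 µg of POS protein, or roughly 50 POS units per seeded cell.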
For pulse-chase experiments, after 2.5 h of incubation with POS (pulse), media with POS were removed from the wells and replaced with DMEM/F12 containing 10% FBS, and incubation was continued for a total of 16 h. The media were separated from the attached cells and stored frozen until use, and the cells were used for preparing protein extracts, which were either used immediately or stored frozen until use. For experiments using BEL (Sigma), BEL dissolved in vehicle dimethyl sulfoxide (DMSO) was mixed with Ringer's solution and the mixture added to the cells and incubated for 1 h prior to starting the phagocytosis assays. The mixture was removed and replaced with the POS mixture as described above containing DMSO or BEL during the pulse. The assays were performed in duplicate wells per condition and each set of experiments was repeated at least two times.

Cell viability by crystal violet staining

ARPE-19 cells were seeded in a 96-well plate at a density of 2 x 10^4 cells per well. The cells were incubated at 37°C for 3 d. The medium was removed and replaced with Ringer's solution containing various concentrations of BEL and incubation was continued at 37°C for 3.5 h. The medium was replaced with complete medium and the cultures incubated for a total of 16 h. After two washes of the cells with deionized H2O, the plate was inverted and tapped gently to remove excess liquid. A total of 50 µl of a 0.1% crystal violet (Sigma) staining solution in 25% methanol was added to each well and incubated at room temperature for 30 min on a bench rocker with a frequency of 20 oscillations per min. The cells in the wells were briefly washed with deionized H2O, and then the plates were inverted and placed on a paper towel to air dry without a lid for 10 min.
For crystal violet extraction, 200 µl of methanol were added to each well and the plate was covered with a lid and incubated at room temperature for 20 min on a bench rocker set at 20 oscillations per min. The absorbance of the plate was measured at 570 nm.

Western blot

ARPE-19 cells plated in multiwell cell culture dishes were washed twice with ice-cold DPBS (137 mM NaCl, 8 mM Na2HPO4-7H2O, 1.47 mM KH2PO4, 2.6 mM KCl, 490 μM MgCl2-6H2O, 900 μM CaCl2, pH 7.2). A total of 120 µl of cold RIPA Lysis and Extraction buffer (Thermo Fisher Scientific) with protease inhibitors (Roche, Indianapolis, IN, added as per manufacturer's instructions) was added to each well and the plate was incubated on ice for 10 min. Cell lysates were collected and sonicated for 20 s with a 50% pulse (Fisher Scientific Sonic Dismembrator Model 100, Hampton, NH), and cellular debris was removed from soluble cell lysates by centrifugation at 20,800 x g at 4°C for 10 min. Protein concentration in the lysates was determined using the Pierce™ BCA Protein Assay Kit (Thermo Fisher Scientific) and the cell lysates were stored at -20°C until use. Between 5 and 10 µg of cell lysates were used for western blots.

Proteins were resolved by SDS-PAGE and transferred to nitrocellulose membranes for immunodetection. The antibodies used are listed in Table 2. For PEDF-R immunodetection, membranes were incubated in 1% BSA (Sigma) in TBS-Tb (50 mM Tris pH 7.5, 150 mM NaCl containing 0.1% Tween-20 (Sigma)) at room temperature for 1 h. Then they were incubated in a solution of primary antibody against human PEDF-R at 1:1000 in 1% BSA/TBS-Tb at 4°C for over 16 h. Membranes were washed vigorously with TBS-Tb for 30 min and incubated with anti-rabbit-HRP (Kindlebio, Greenwich, CT) diluted 1:1000 in 1% BSA/TBS-Tb at room temperature for 30 min.
The membranes were washed vigorously with TBS-Tb for 30 min and immunoreactive proteins were visualized using the KwikQuant imaging system (Kindlebio). For rhodopsin immunodetection, membranes were incubated in 5% dry milk (Nestle, Arlington, VA) in PBS-T (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 2 mM KH2PO4, pH 7.4, 0.1% Tween 20) at room temperature for 1 h. Then, the membranes were incubated in a solution of primary antibody against human rhodopsin (Novus, Littleton, CO) at 1:5000 in a suspension of 5% dry milk in PBS-T at 4°C for over 16 h. The membranes were washed vigorously with PBS-T for 30 min, followed by incubation in a solution of anti-mouse-HRP (Kindlebio) at 1:1000 in 5% milk in PBS-T at room temperature for 30 min. The membranes were washed vigorously with PBS-T for 30 min and immunoreactive proteins were visualized using the KwikQuant imaging system. For the protein loading control, the antibodies were removed from membranes processed as described above using Restore™ Western Blot Stripping Buffer (Thermo Fisher Scientific), followed sequentially by blocking with 1% BSA in TBS-T at room temperature for 1 h and incubation in a solution of primary antibody against GAPDH (Genetex, cat. # GTX627408, Irvine, CA) at 1:10,000 in 1% BSA/TBS-T at 4°C for over 16 h. After washing the membranes vigorously with TBS-T at room temperature for 30 min, they were incubated in a solution of anti-mouse-HRP at 1:1000 in 1% BSA/TBS-T at room temperature for 30 min. After washes with TBS-T as described above, the immunoreactive proteins were visualized using the KwikQuant imaging system.
β-Hydroxybutyrate quantification assay

In mice, the assay was performed as described before.8 Briefly, after the removal of the cornea, lens and retina, optic nerve, and extra fat and muscles, the eyecup explant from one eye was placed in a well of a 96-well plate containing 170 µl Ringer's solution and the eyecup from the contralateral eye in another well with the same volume of Ringer's solution containing 5 mM glucose and purified bovine POS (200 µM phospholipid content, a kind gift from Dr. Kathleen Boesze-Battaglia). The eyecup explant cultures were then incubated for 2 h at 37°C with 5% CO2 and the media were collected and used immediately or stored frozen until use. In ARPE-19 cells, at the endpoint of the phagocytosis assay as described above, a total of 100 µl of the culturing medium was collected and used immediately or stored at -80°C until use. The levels of β-hydroxybutyrate (β-HB) released from the RPE cells were determined in the collected samples using the enzymatic activity of β-HB dehydrogenase in a colorimetric assay from the Stanbio Beta-hydroxybutyrate LiquiColor Test (Stanbio cat. # 2440058; Boerne, TX) with β-HB standards and following the manufacturer's instructions.

Free fatty acids quantification assay

A total of 50 µl of conditioned medium from ARPE-19 cell cultures was collected and used to quantify free fatty acids using the Free Fatty Acid Quantification Assay Kit (Colorimetric) (Abcam cat. # ab65341; Cambridge, MA) following the manufacturer's instructions.

Statistical analyses

Data were analyzed with the two-tailed unpaired Student t test or 2-way ANOVA (analysis of variance), and are shown as the mean ± standard deviation (SD). P values lower than 0.05 were considered statistically significant.
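To illustrate the summary statistics named above, the sketch below computes the mean ± SD of two groups and the pooled-variance (Student) t statistic for an unpaired comparison, using only the Python standard library. It is an illustration under stated assumptions, not the analysis pipeline used in this study: the function names and the example values are hypothetical, and the two-tailed P value would be obtained from the t distribution with the returned degrees of freedom (for instance with a statistics package), which is not done here.

```python
import math
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) of a list of measurements."""
    return statistics.mean(values), statistics.stdev(values)


def unpaired_student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for an unpaired Student t test.

    Uses the pooled-variance form, which assumes equal variances in both groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t_stat, df


# Hypothetical example values (arbitrary units), not data from this study.
control = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]

for name, group in (("control", control), ("treated", treated)):
    m, sd = mean_sd(group)
    print(f"{name}: {m:.2f} ± {sd:.2f} (mean ± SD, n={len(group)})")

t_stat, df = unpaired_student_t(control, treated)
print(f"t = {t_stat:.3f}, df = {df}")
```

The pooled-variance form is shown because the text names the Student t test; if equal variances cannot be assumed, Welch's variant would be the usual alternative.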
Results

Generation of an RPE-specific Pnpla2-KO mouse

To circumvent the premature lethality of PNPLA2-KO mice,32 a mouse model with RPE-specific knockout of the PNPLA2 gene was designed. For this purpose, we crossed Pnpla2f/f mice23 with BEST1-cre transgenic mice24 to obtain mice with conditional Pnpla2 knockout specific to the RPE, hereafter referred to as cKO (or Pnpla2f/f/cre). In the cKO mice, the promoter of the RPE-specific gene VMD2 (human bestrophin, here referred to as BEST1) drives the expression of cre (cyclization recombinase) and restricts it to the RPE. These mice carry two floxed alleles in the Pnpla2 gene and a copy of the BEST1-cre transgene (Pnpla2f/f/cre).

We performed PCR reactions with primers P1 and P2, upstream and downstream from the loxP sites flanking exon 1, respectively (Fig. 1A), with DNA extracted from cKO eyecups and found that the amplimers had the expected length of 253 bp corresponding to the recombined (cKO) allele (Fig. 1B), thus showing that the cre-loxP recombination occurred successfully and led to the deletion of the floxed region (exon 1) in the RPE of cKO mice (or Pnpla2f/f/cre). Conversely, we observed two PCR bands of 1749 bp and 1866 bp for littermate Pnpla2f/+ control mice carrying a WT and a floxed allele, respectively (the floxed allele contains two loxP sites) (Fig. 1B). In lanes for the cKO (or Pnpla2f/f/cre), we also observed very low intensity bands migrating at positions corresponding to 1749 bp and 1866 bp, which probably resulted from a few unsuccessful recombination events.

Reverse transcriptase PCR (RT-PCR) revealed that PNPLA2 transcript levels in the RPE were lower in cKO mice than in controls (with a mean that was about 32% of that in control mice) (Fig. 1C). We determined the percentage of RPE cells that produced the cre protein by immunofluorescence of RPE whole flatmounts.
Cells were visualized by co-staining with fluorescein-labelled phalloidin to detect the actin cytoskeleton. We observed cre-immunoreactivity in the RPE flatmounts isolated from cKO mice, while no cre-labeling was detected in the controls (Fig. 1D). The overall distribution was patchy and mosaic, as previously described for the BEST1-cre mice.24 The percentage of cre-positive cells in ROI (regions of interest) of flatmounts showed nine mice with expected percentages of cre-positive cells in RPE and one with low cre-positivity (Fig. 1E). The average of the mean values of cre-positive cells for each cKO mouse (mouse numbers 1, 2, 4-10) was 75% (ranging from 52% to 91%), which was within the range expected for cre positivity in the RPE of the BEST1-cre mouse.24 Cre-positive cells were not detected in RPE of control animals (Fig. 1D-E). Unfortunately, further protein analysis of PEDF-R in mouse retinas was not conclusive because several commercial antibodies to PEDF-R gave high background by immunofluorescence and in western blots. Nevertheless, the results demonstrate the successful generation of RPE-specific PNPLA2-knock-down mice.

Lipid accumulates in the RPE of Pnpla2-cKO mice

We examined the ultrastructure of the RPE by TEM imaging. Accumulation of large lipid droplets (LDs) was observed in cKO mice as early as 3 months of age compared to the control mice cohort (Fig. 2A), and LDs were still observed in the RPE of 13-month-old Pnpla2-cKO mice compared to controls (Fig. 2B). The presence of LDs was associated with either the lack or the decreased thickness of the basal infoldings (normally seen in the basal side) (Fig. S2A, S2H), and with granular cytoplasm, abnormal mitochondria (Fig. S2B), and disorganized localization of organelles (mitochondria and melanosomes) (Fig. S2A).
In some cells, LDs crowded the cytoplasm and clustered together the mitochondria and melanosomes into the apical region of the cells (Figs. S2A, S2C, S2D); however, the number and expansion of LDs within the cells appeared to be random (Fig. S2E). Normal apical cytoplasmic processes were lacking, and degeneration in the outer segment (OS) tips of the photoreceptors was apparent (Figs. S2A, S2F). Additionally, normal phagocytosis of the OS by RPE cells was not evident, implying a certain degree of impairment (Figs. S2A, S2E, S2G). There were apparently unhealthy nuclei with pyknotic chromatin and leakage of extranuclear DNA (enDNA), indicating the beginning of a necrotic process (Fig. S2B). Some RPE cells had lighter low-density cytoplasm, indicating degeneration of cytoplasmic components, in contrast to the denser and fuller cytoplasm in the RPE of the littermate controls (Fig. S2I, S2J). Thus, these observations imply that Pnpla2 downregulation caused lipid accumulation in the RPE.

Pnpla2 deficiency increases rhodopsin levels in the RPE of mice

Because the RPE does not express the rhodopsin gene, the level of rhodopsin protein in the RPE cells is directly proportional to their phagocytic activity.5,33 To investigate how the knock down of Pnpla2 affects RPE phagocytic activity in mice, we compared the rhodopsin-labeled particles present in the eyecup of cKO mice and those of control mice at 2-h and 5-h post-light onset in vivo. The ROIs for the mutant mice were selected from areas rich in cre-positive cells. Phalloidin-labeled flatmounts of control mice (n=10) showed that the RPE cells had the typical cobblestone morphology, while nine out of ten cKO mice had distorted cell morphology. Rhodopsin was detected in all ROIs and the labeled particles were more intense and larger in size in the majority of cKO flatmounts compared to those in the control mice.
Representative ROIs are shown in Fig. 3A. The observations implied that Pnpla2 knock down in the RPE prevented rhodopsin degradation in vivo.

Ketogenesis upon RPE phagocytosis in explants from cKO mice is impaired

Given that RPE phagocytosis is linked to ketogenesis,8 we also measured the levels of ketone body β-HB released by RPE/choroid explants of the cKO mice ex vivo and compared them with those of control littermates. The experiments were performed at 5-h (11 AM) and 8-h (2 PM) post-light onset, a time of day in which the amount of β-HB released due to endogenous phagocytosis is not expected to vary with time. A phagocytic challenge by exposure to exogenous bovine OS increased the amount of β-HB released by explants from both cKO and control littermates compared to the β-HB released under basal condition (without addition of exogenous OS) (Fig. 3B). The OS-mediated increase in β-HB release above basal levels of the cKO RPE/choroid explants (1.8 nmol at 11 AM, 0.9 nmol at 2 PM) was lower than that of the control explants (3 nmol at 11 AM and 2.5 nmol at 2 PM) (Fig. 3C). These observations reveal a deficiency in β-HB production by the RPE/choroid explants of cKO mice under phagocytic challenge ex vivo.

Electroretinography of the cKO mouse

To examine the functionality of the retina and RPE of cKO mice, we performed ERG and DC-ERG. Figure 4 shows histograms that revealed no differences among the animals, implying that the functionality was not affected in the RPE-Pnpla2-cKO mice.

Phagocytic ARPE-19 cells engulf and break down POS protein and lipid

The complexity of the interactions that occur in the native retina makes it difficult to evaluate the subcellular and biochemical changes involved in phagocytosis of POS. Cultured RPE cells provide an ideal alternative to perform these studies.
Accordingly, we designed and validated an assay with a human RPE cell line, ARPE-19, to which we added POS isolated from bovine retinas, as described in Methods. The lipid composition of the POS fed to the ARPE-19 cells included phosphatidylcholine (PC) containing very long chain polyunsaturated fatty acids (VLC-PUFAs) that was ~27 relative mole percent of total PC species in the POS. The other major PC species included PC 32:00, PC 40:06, and PC 54:10, comprising ~38 relative mole percent of the total PC phospholipids. The most abundant phosphatidylethanolamine (PE) species in the POS were PE 38:06, PE 40:05, and PE 40:06, which accounted for about 74 relative mole percent of the total PE phospholipids. The confluent monolayer of cells was exposed to the purified POS membranes for up to 2.5 h and then the ingested POS were chased for 16 h for pulse-chase experiments. The fate of rhodopsin, the main protein in POS, was followed by western blotting of cell lysates. Rhodopsin was detected in the cell lysates as early as 30 min and its levels increased at 1 h and 2.5 h during the POS pulse, and decreased with a 16 h chase (Fig. S3A). Quantification revealed that, after the chase, rhodopsin levels were 21% of those detected after 2.5 h of POS supplementation (Fig. S3B).

Free fatty acid and β-HB levels were also determined in the culture media during the pulse. The levels of free fatty acids in the medium of POS-challenged ARPE-19 cells were 7-, 5-, and 3-fold higher at 30 min, 60 min and 2.5 h of incubation, respectively, relative to those in the medium of cells not exposed to POS (Fig. S3C). The β-HB levels released into the medium after POS addition also increased by 10-, 2.5- and 4-fold after 30 min, 60 min and 2.5 h incubations, respectively, relative to those observed in the medium of cells not exposed to POS (Fig. S3D).
Altogether, these results show that under the specified conditions in this study, the batch of ARPE-19 cells phagocytosed, i.e., engulfed and digested, bovine POS protein and lipid components.

Bromoenol lactone blocks the degradation of POS components in phagocytic ARPE-19 cells

We investigated the role of PEDF-R PLA2 activity in RPE phagocytosis. As we have previously described, a calcium-independent phospholipase A2 inhibitor, bromoenol lactone (BEL), inhibits PEDF-R PLA2 enzymatic activity.15 First, we determined the concentrations of BEL that would maintain viability of ARPE-19 cells. Figure 5A shows the concentration response curve of BEL on ARPE-19 cell viability. The BEL concentration range tested was between 3.125 and 200 μM and the Hill plot estimated an IC50 (concentration that would lower cell viability by 50%) of 30.3 μM BEL. Therefore, to determine the effects of BEL on the ARPE-19 phagocytic activity, cultured cells were preincubated with the inhibitor at concentrations below the IC50 for cell viability prior to pulse-chase assays designed as described above. Pretreatment with DMSO alone without BEL was assayed as a control. Interestingly, the inhibitor at 10 μM and 25 μM blocked more than 90% of the degradation of rhodopsin during POS chase for 16 h in ARPE-19 cells (Figs. 5B-5C). Similar blocking effects of BEL (25 µM) were observed with time up to 24 h during the chase (Figs. 5D-5E). The inhibitor did not appear to affect rhodopsin ingestion. The rhodopsin levels in pulse-chase assays with cells pretreated with DMSO alone were like those without pretreatment (compare Figs. 5B and S3A). The cells observed under the microscope after the chase point and prior to the preparation of cell lysates had similar morphology and density among cultures with and without POS, and cultures before and after pulse.
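To make the IC50 reasoning above concrete, the sketch below evaluates a generic Hill-type concentration-response curve in Python. Only the IC50 value (30.3 μM) comes from the text; the function name, the Hill slope of 1, and the 100% to 0% viability range are illustrative assumptions, so the printed values are not the measured viabilities of Fig. 5A.

```python
def viability_percent(conc_um, ic50_um=30.3, hill_slope=1.0):
    """Predicted cell viability (%) from a Hill-type inhibition curve.

    viability = 100 / (1 + (conc / IC50) ** slope)

    ic50_um is the value reported in the text; hill_slope = 1.0 and the
    100%-to-0% range are assumptions made only for this illustration.
    """
    if conc_um < 0:
        raise ValueError("concentration must be non-negative")
    return 100.0 / (1.0 + (conc_um / ic50_um) ** hill_slope)


if __name__ == "__main__":
    # Concentrations used for BEL pretreatment, plus the IC50 itself.
    for conc in (10.0, 25.0, 30.3):
        print(f"{conc:5.1f} uM BEL -> {viability_percent(conc):.1f}% predicted viability")
```

By construction the curve returns 50% at the IC50, and any concentration below the IC50 gives a predicted viability above 50%, which is the rationale for choosing pretreatment concentrations below that value.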
Moreover, BEL blocked 40% of the β-HB releasing activity of ARPE-19 cells, whereas DMSO alone did not affect the activity (Fig. 5F). These observations demonstrate that while binding and engulfment were not affected by BEL under the conditions tested, phospholipase A2 activity was required for rhodopsin degradation and β-HB release by ARPE-19 cells during phagocytosis.

PNPLA2 downregulation in ARPE-19 cells impairs POS degradation

We also silenced PNPLA2 expression in ARPE-19 cells to investigate the possible requirement of PEDF-R for phagocytosis. First, we tested the silencing efficiency of six different siRNAs designed to target PNPLA2, along with a Scrambled siRNA sequence (Scr) as negative control (see sequences in Table 3). The siRNA-mediated knockdown of PNPLA2 resulted in significant decreases in the levels of PNPLA2 transcripts (siRNA A, C, D and E, Figs. 6A and S5) with a concomitant decline in PEDF-R protein levels (siRNA C, D and E, Fig. 6D) in ARPE-19 cell extracts. The siRNAs with the highest efficiency of silencing PNPLA2 mRNA (namely C, D, and E) were individually used for subsequent experiments, and denoted as siPNPLA2 (Fig. 6A). A time course of siPNPLA2 transfection revealed that the gene was silenced as early as 24 h and throughout 72 h post-transfection, as well as through the time corresponding to the pulse-chase (98.5 h; Figs. 6B, S5). There was no significant difference between mock transfected cells and cells transfected with Scr (Fig. 6C). Examining the cell morphology under the microscope, we did not notice differences between the scrambled and siPNPLA2-transfected cells. Western blots showed that protein levels of PEDF-R in ARPE-19 membrane extracts declined 72 h post-transfection (Fig. 6D). Thus, subsequent experiments with cells in which PNPLA2 was silenced were performed 72 h after transfection.

Second, we tested the effects of PNPLA2 silencing on ARPE-19 cell phagocytosis.
Here we monitored the outcome of rhodopsin in pulse-chase experiments. Interestingly, although PNPLA2 knock down did not affect ingestion, the siPNPLA2-transfected cells failed to degrade the ingested POS rhodopsin (88% and 24% remaining at 16 h and at 24 h, respectively), whereas Scr-transfected cells degraded it more efficiently (21% and 12% remaining at 16 h and 24 h, respectively) (Figs. 7A-7B).

Third, we also determined the levels of secreted free fatty acids and β-HB production in PNPLA2 silenced cells at 0.5 h, 1 h, and 2.5 h following POS addition. Free fatty acid levels in the culture medium were lower in siPNPLA2-transfected cells than in cells transfected with Scr at 30 min post-addition of POS, and no difference was observed between siPNPLA2 and Scr at 1 h and 2.5 h post-addition (Fig. 7C). Secreted β-HB levels in the culture medium were lower in siPNPLA2 cells than in Scr-transfected cells at all time points (Fig. 7D). To determine the effect of PNPLA2 knockdown on lipid and fatty acid levels in the ARPE-19 cells fed POS membranes, we used electrospray ionization-mass spectrometry (ESI/MS/MS) and gas chromatography-flame ionization detection to identify and quantify total lipids and fatty acid composition of the ARPE-19 cells at 2.5 and 16 h post POS feeding. Our results did not show any significant differences in the intracellular lipid and fatty acid levels between the siPNPLA2 knockdown cells and the Scr and WT control cells at either 2.5 or 16 h after POS addition (data not shown). Taken together, these results demonstrate that digestion of POS protein and lipid components was impaired in PNPLA2 silenced ARPE-19 cells undergoing phagocytosis.

Discussion

Here, we report that PEDF-R is required for efficient degradation of POS by RPE cells after engulfment during phagocytosis.
This conclusion is supported by the observed decrease in rhodopsin degradation, in fatty acid release and in β-HB production upon POS challenge when the PNPLA2 gene is downregulated or the PEDF-R lipase is inhibited. These observations occur in RPE cells in vivo, ex vivo and in vitro. The findings imply that RPE phagocytosis depends on PEDF-R for the release of fatty acids from POS phospholipids to facilitate POS protein hydrolysis, thus identifying a novel contribution of this enzyme in POS degradation and, in turn, in the regulation of photoreceptor cell renewal.

This is the first time that the PNPLA2 gene has been studied in the context of RPE phagocytosis of POS. Previously, we investigated its gene product, termed PEDF-R, as a phospholipase-linked cell membrane receptor for pigment epithelium-derived factor (PEDF), a retinoprotective factor encoded by the SERPINF1 gene and produced by RPE cells.15,17,34,35 Like RPE cells, non-inflammatory macrophages are phagocytic cells, but unlike RPE cells, they are found in all tissues, where they engulf and digest cellular debris, foreign substances, bacteria, other microbes, etc.36,37 The Kratky laboratory reported data on the effects of PNPLA2 silencing in efferocytosis obtained using PNPLA2-deficient mice (termed atgl-/- mouse), and demonstrated that their macrophages have lower triglyceride hydrolase activity, higher triglyceride content, lipid droplet accumulation, and impaired phagocytosis of bacterial and yeast particles,21 and that in these cells, intracellular lipid accumulation triggers apoptotic responses and mitochondrial dysfunction.38 We have shown that PNPLA2 gene knockdown causes RPE cells to be more responsive to oxidative stress-induced death.39 PNPLA2 gene silencing, PEDF-R peptides blocking ligand binding, and enzyme inhibitors abolish the activation of mitochondrial survival pathways by PEDF in photoreceptors and
other retinal cells.17,34,40 Consistently, overexpression of the PNPLA2 gene or exogenous additions of a PEDF-R peptide decreases both the death of RPE cells undergoing oxidative stress and the accumulation of biologically detrimental leukotriene LTB4 levels.31 The fact that PEDF is a ligand that enhances PEDF-R enzymatic activity suggests that exposure of RPE to this factor is likely to enhance phagocytosis. These implications are unknown and need further study. Exogenous additions of recombinant PEDF protein to ARPE-19 cells undergoing phagocytosis did not provide evidence for such enhancement (JB personal observations). This suggests that heterologous SERPINF1 overexpression in cells and/or an animal model of inducible knock-in of Serpinf1 may be useful to focus on the role of PEDF/PEDF-R in RPE phagocytosis unbiased by the endogenous presence of PEDF.

To investigate the consequences of PNPLA2 silencing in POS phagocytosis, we generated a mouse model with a targeted deletion of Pnpla2 in RPE cells in combination with the BEST1-cre system for its exclusive conditional silencing in RPE cells (cKO mouse). These mice are viable with no apparent changes in other organs and in weight compared with control littermates and wild type mice. The cKO mice live to an advanced age, in contrast to the constitutively silenced PNPLA2-KO mice in which the lack of the gene causes premature lethality (12-16 weeks) due to heart failure associated with massive accumulation of lipids in cardiomyocytes.32 The RPE cells of the cKO mouse have large lipid droplets at early and late age (Figs. 2A, S2), consistent with a buildup of substrates for the lipase activities of the missing enzyme.
In cKO mice, lipid accumulation associates with lack of or the decreased thickness of the basal infoldings, granular cytoplasm, abnormal mitochondria and disorganized localization of organelles (mitochondria and melanosomes) in some RPE cells (Fig. S2). Taken together, the TEM observations in combination with the greater rhodopsin accumulation and decline in β-HB release in cKO mice support the conclusion that PEDF-R is required for lipid metabolism and phagocytosis in the RPE. However, interestingly, the observed features do not seem to affect photoreceptor functionality (Fig. S3) and appear to be inconsequential to age-related retinopathies in the Pnpla2-cKO mouse. This unanticipated observation suggests that the remaining RPE cells expressing the Pnpla2 gene probably complement activities of those lacking the gene, thereby lessening photoreceptor degeneration and dysfunction in the cKO mouse. We note that the cKO mouse has a mosaic expression pattern with non-cre-expressing RPE cells, as shown before for the BEST1-cre transgenic line.24 At the same time, the ERG measurements performed correspond to global responses of the photoreceptors and RPE cells, thereby missing individual cell evaluation. The lack of photoreceptor dysfunction with RPE lipid accumulation due to PNPLA2 downregulation also suggests that during development a compensatory mechanism independent of Pnpla2/PEDF-R is likely to be activated, thereby minimizing retinal degeneration in the cKO mouse. Further study will be required to understand the implications of these unexpected findings. Animal models of constitutive heterozygous knockout or inducible knockdown of PNPLA2 may be instrumental to address the role of PNPLA2/PEDF-R in mature photoreceptors unbiased by compensatory mechanisms due to low silencing efficiency or during development.
Results obtained from experiments using RPE cell cultures further establish that PEDF-R deficiency affects phagocytosis. It is worth mentioning that the data obtained under our experimental conditions were essentially identical to those typically obtained in assays performed with cells attached to porous permeable membranes, and this provides an additional advantage to the field by requiring shorter time to complete (see Fig. S4). On one hand, the decrease in the levels of β-HB and in the release of fatty acids (the breakdown products of phospholipids and triglycerides) upon POS ingestion by cells pretreated with BEL, as well as by cells transfected with siPNPLA2, relative to the control cells indicates that PNPLA2 participates in RPE lipid metabolism. On the other hand, the fact that PEDF-R inhibition and PNPLA2 downregulation impair rhodopsin breakdown from ingested POS in RPE cells implies a likely dependence of POS protein proteolysis on PEDF-R-mediated phospholipid hydrolysis. In this regard, we envision that proteins in POS are mainly resistant to proteolytic hydrolysis, because the surrounding phospholipids block their access to proteases for cleavage. Phospholipase A2 activity would hydrolyze these phospholipids, likely liberating the proteins from the phospholipid membranes and making them available to proteases, such as cathepsin D, an aspartic protease responsible for 80% of rhodopsin degradation.41 It is important to note that the findings cannot discern whether PEDF-R is directly associated with the molecular pathway of rhodopsin degradation, or indirectly involved in downregulating cathepsin D or other proteases. It is also possible that PNPLA2 deficiency results in the alteration of critical genes regulating the phagocytosis pathway, such as LC3 and genes of the mTOR pathway.
Animal models deficient in such genes display retinal phenotypes such as impaired phagocytosis and lipid accumulation, similar to those observed in PEDF-R deficient cells.42–44 These implications need further exploration.

Given that BEL is an irreversible inhibitor of iPLA2, it has been used to discern the involvement of iPLA2 in biological processes. Previously, we demonstrated that BEL at 1 to 25 µM blocks 20–40% of the PLA activity of human recombinant PEDF-R.15 Jenkins et al showed that 2 µM BEL inhibits >90% of the triolein lipase activity of human recombinant PEDF-R (termed by this group as iPLA2ζ).18 In cell-based assays, Wagner et al showed that BEL at 20 µM inhibits 40% of this enzyme’s triglyceride lipase activity in hepatic cells.45 In the present study, to minimize cytotoxicity and ensure inhibition of the iPLA2 activity of PEDF-R in ARPE-19 cells, we selected 10 µM and 25 µM BEL concentrations that are below the IC50 determined for ARPE-19 cell viability (30.2 µM BEL; Fig. 5A). We note that these BEL concentrations are within the range used in an earlier study on ARPE-19 cell phagocytosis.22 We compared our results to those by Kolko et al22 regarding BEL effects on phagocytosis of ARPE-19 cells. Using Alexa-red labeled-POS, they reported the percent of phagocytosis inhibition caused by 5–20 µM BEL as 24% in ARPE-19 cells. However, the authors did not specify the time of incubation for this experiment and, based on the other experiments in the report, the pulse may have lasted at least 12 h, which would imply inhibition of POS ingestion; the effects of BEL on POS degradation were not described. With unmodified POS in pulse-chase assays, our findings show a percent of inhibition after chase of >90% for 10 µM and 25 µM BEL, indicating more effective inhibition of POS digestion.
The effect of BEL on POS ingestion during pulses of up to 2.5 h was insignificant, and its effect during pulses longer than 2.5 h remains unknown. In addition, we show that pretreatment with BEL results in a decrease in the release of β-HB, which is produced from the oxidation of fatty acids liberated from POS. Thus, our assay provides new information (e.g., pulse-chase, use of unmodified POS, β-HB release) beyond that reported by Kolko et al. We conclude that BEL can impair phagocytic processes in ARPE-19 cells. While BEL is recognized as a potent inhibitor of iPLA2, it can also inhibit non-PLA2 enzymes, such as magnesium-dependent phosphatidate phosphohydrolase and chymotrypsin.46,47 Consequently, a complementary genetic approach targeting PEDF-R is deemed reasonable and appropriate to investigate its role in RPE phagocytosis.

The complex and highly regulated phagocytic function of the RPE also serves to protect the retina against lipotoxicity. By engulfing lipid-rich POS and using ingested fatty acids for energy, the RPE prevents the accumulation of lipids in the retina, particularly phospholipids, which could trigger cytotoxicity when peroxidized.48,49 In this regard, the lack of observed differences in intracellular phospholipid and fatty acids between PEDF-R-deficient RPE and control cells leads us to speculate that in ARPE-19 cells exposed to POS the undigested lipids remain within the cells and contribute to the total lipid and fatty acid pool, some of which may be converted to other lipid byproducts to protect against lipotoxicity. Also, the duration of the in vitro chase is shorter than what pertains in vivo, where undigested POS accumulate and over time coalesce to form the large lipid droplets observed in the RPE in vivo.
Thus, future experiments aimed at detailed time-dependent characterization of specific lipid species and free fatty acid levels in the RPE in vivo, and in media and cells in vitro, will allow us to have a better understanding of the classes of lipids and fatty acids that contribute to the lipid droplet accumulation in the RPE in vivo due to PNPLA2 deletion. Nonetheless, a role of PEDF-R in POS degradation agrees with the previously reported involvement of a phospholipase A2 activity in the RPE phagocytosis of POS,22 and with the role of providing protection of photoreceptors against lipotoxicity.

In conclusion, this is the first study to identify a role for PEDF-R in RPE phagocytosis. The findings imply that efficient RPE phagocytosis of POS requires PEDF-R, thus highlighting a novel contribution of this protein in POS degradation and its consequences in the regulation of photoreceptor cell renewal.

Acknowledgements

This work was supported by the Intramural Research Program of the National Eye Institute, NIH (Project #EY000306) to SPB and by NIH/NEI R01 EY030513 to MPA. We thank the NEI animal house, Histopathology, Visual Function, Genetic Engineering and Biological Imaging Core facilities for technical support. We thank Dr. Hei Sook Sul, University of California, Berkeley, for kindly providing sequences for primers of Pnpla2 and the Desnutrin flox mouse; Dr. Joshua Dunaief, University of Pennsylvania, for kindly providing the transgenic Tg(BEST1-cre)Jdun mouse model; Dr. Kathleen Boesze-Battaglia’s laboratory for kindly providing POS; Drs. Eugenia Poliakov and Sheetal Uppal for help in isolating POS; Dr. Kiyoharu J Miyagishima for performing the dcERG experiments; Dr. Preeti Subramanian for technical assistance with cell culture and microscopy; and Dr. Ivan Rebustini for proofreading the manuscript and providing feedback and reagents for RT-PCR.

Table 1.
Primers used for qRT-PCR

Gene (Human)   Forward Primer                  Reverse Primer
PNPLA2         5’-AGCTCATCCAGGCCAATGTCT-3’     5’-TGTCTGAAATGCCACCATCCA-3’
18S            5’-GGTTGATCCTGCCAGTAG-3’        5’-GCGACCAAAGGAACCATAAC-3’
P1 and P2      5’-GCTTCAAACAGCTTCCTCATG-3’     5’-GGACTTTCGGTCATAGTTCCG-3’

Table 2. Antibodies used in the study

Antibody                       Type & host                  Application   Dilution          Company                     Catalog number
GAPDH                          Monoclonal mouse             WB            1:10,000          GeneTex                     GTX627408
PEDF-R                         Polyclonal rabbit            WB; IF        1:1000; 1:250     Protein tech                55190-1-AP
Rhodopsin (A531)               Monoclonal mouse             WB; IF        1:5000; 1:800     Novus Biologicals           NBP2-25159
Rhodopsin (B630)               Monoclonal mouse             IF            1:1000            Novus Biologicals           NBP2-25160
cre Recombinase                Monoclonal rabbit            IF            1:800             Cell Signaling Technology   15036
Alexa Fluor 488                Goat anti-Mouse IgG (H+L)    IF            1:500             ThermoFisher Scientific     A-11001
Alexa Fluor 555                Goat anti-Rabbit IgG (H+L)   IF            1:500             ThermoFisher Scientific     A-21428
Alexa Fluor 647 - phalloidin                                IF            1:100             Cell Signaling Technology   8940

Table 3. siRNA duplex sequences

siRNA Duplex   Identifier   Duplex sequences
SR311349A      A            rCrGrCrCrArArArGrCrArCrArUrGrUrArArUrArArArUrGCT
SR311349B      B            rGrGrCrArCrArUrArUrArGrArArCrGrUrArCrUrGrCrArUrUCC
SR311349C      C            rGrCrCrUrGrArGrArCrGrCrCrUrCrCrArUrUrArCrCrArCTG
SR324651A      D            rCrCrArArGrUrUrCrArUrUrGrArGrGrUrArUrCrUrArArAGA
SR324651B      E            rCrUrGrCrCrArCrUrCrUrArUrGrArGrCrUrUrArArGrArACA
SR324651C      F            rCrUrUrGrGrUrArArArUrArArArArArCrGrArArArArUrGTT

References

1. Goldman AI, Teirstein PS, O’Brien PJ. The role of ambient lighting in circadian disc shedding in the rod outer segment of the rat retina. Investigative Ophthalmology & Visual Science. 1980;19(11):1257-1267.
2. LaVail MM. Circadian nature of rod outer segment disc shedding in the rat. Investigative Ophthalmology & Visual Science. 1980;19(4):407-411.
3. Strauss O. The Retinal Pigment Epithelium. Physio Rev. 2005;85(3):845-881.
4. Kevany BM, Palczewski K. Phagocytosis of Retinal Rod and Cone Photoreceptors. Physiology.
2010;25(1):8-15. doi:10.1152/physiol.00038.2009
5. Mazzoni F, Safa H, Finnemann SC. Understanding photoreceptor outer segment phagocytosis: use and utility of RPE cells in culture. Exp Eye Res. 2014;126:51-60. doi:10.1016/j.exer.2014.01.010
6. Fliesler AJ, Anderson RE. Chemistry and metabolism of lipids in the vertebrate retina. Progress in Lipid Research. 1983;22(2):79-131. doi:10.1016/0163-7827(83)90004-8
7. Chen H, Anderson RE. Differential incorporation of docosahexaenoic and arachidonic acids in frog retinal pigment epithelium. Journal of Lipid Research. 1993;34(11):1943-1955.
8. Reyes-Reveles J, Dhingra A, Alexander D, Bragin A, Philp NJ, Boesze-Battaglia K. Phagocytosis-dependent ketogenesis in retinal pigment epithelium. J Biol Chem. 2017;292(19):8038-8047. doi:10.1074/jbc.M116.770784
9. SanGiovanni JP, Chew EY. The role of omega-3 long-chain polyunsaturated fatty acids in health and disease of the retina. Progress in Retinal and Eye Research. 2005;24(1):87-138. doi:10.1016/j.preteyeres.2004.06.002
10. Obin MS, Jahngen-Hodge J, Nowell T, Taylor A. Ubiquitinylation and Ubiquitin-dependent Proteolysis in Vertebrate Photoreceptors (Rod Outer Segments): EVIDENCE FOR UBIQUITINYLATION OF Gt AND RHODOPSIN. Journal of Biological Chemistry. 1996;271(24):14473-14484. doi:10.1074/jbc.271.24.14473
11. Palczewski K. G protein-coupled receptor rhodopsin. Annu Rev Biochem. 2006;75:743-767. doi:10.1146/annurev.biochem.75.103004.142743
12. Strauss O, Stumpff F, Mergler S, Wienrich M, Wiederholt M. The Royal College of Surgeons Rat: An Animal Model for Inherited Retinal Degeneration with a Still Unknown Genetic Defect. Cells Tissues Organs. 1998;162(2-3):101-111. doi:10.1159/000046474
13. D’Cruz PM, Yasumura D, Weir J, et al. Mutation of the receptor tyrosine kinase gene Mertk in the retinal dystrophic RCS rat. Human Molecular Genetics. 2000;9(4):645-651.
doi:10.1093/hmg/9.4.645
14. Inana G, Murat C, An W, Yao X, Harris IR, Cao J. RPE phagocytic function declines in age-related macular degeneration and is rescued by human umbilical tissue derived cells. J Transl Med. 2018;16(1):63-63. doi:10.1186/s12967-018-1434-6
15. Notari L, Baladron V, Aroca-Aguilar JD, et al. Identification of a Lipase-linked Cell Membrane Receptor for Pigment Epithelium-derived Factor. Journal of Biological Chemistry. 2006;281(49):38022-38037. doi:10.1074/jbc.M600353200
16. Pham TL, He J, Kakazu AH, Jun B, Bazan NG, Bazan HEP. Defining a mechanistic link between pigment epithelium–derived factor, docosahexaenoic acid, and corneal nerve regeneration. Journal of Biological Chemistry. 2017;292(45):18486-18499. doi:10.1074/jbc.M117.801472
17. Subramanian P, Locatelli-Hoops S, Kenealey J, DesJardin J, Notari L, Becerra SP. Pigment epithelium-derived factor (PEDF) prevents retinal cell death via PEDF Receptor (PEDF-R): identification of a functional ligand binding site. J Biol Chem. 2013;288(33):23928-23942. doi:10.1074/jbc.M113.487884
18. Jenkins CM, Mancuso DJ, Yan W, Sims HF, Gibson B, Gross RW. Identification, Cloning, Expression, and Purification of Three Novel Human Calcium-independent Phospholipase A2 Family Members Possessing Triacylglycerol Lipase and Acylglycerol Transacylase Activities. Journal of Biological Chemistry. 2004;279(47):48968-48975. doi:10.1074/jbc.M407841200
19. Villena JA, Roy S, Sarkadi-Nagy E, Kim K-H, Sul HS. Desnutrin, an Adipocyte Gene Encoding a Novel Patatin Domain-containing Protein, Is Induced by Fasting and Glucocorticoids: ECTOPIC EXPRESSION OF DESNUTRIN INCREASES TRIGLYCERIDE HYDROLYSIS. Journal of Biological Chemistry. 2004;279(45):47066-47075. doi:10.1074/jbc.M403855200
20. Zimmermann R, Strauss JG, Haemmerle G, et al. Fat Mobilization in Adipose Tissue Is Promoted by Adipose Triglyceride Lipase.
Science. 2004;306(5700):1383. 699 doi:10.1126/science.1100747 700 21. Chandak PG, Radovic B, Aflaki E, et al. Efficient phagocytosis requires triacylglycerol 701 hydrolysis by adipose triglyceride lipase. J Biol Chem. 2010;285(26):20192-20201. 702 doi:10.1074/jbc.M110.107854 703 22. Kolko M, Wang J, Zhan C, et al. Identification of Intracellular Phospholipases A2 in the 704 Human Eye: Involvement in Phagocytosis of Photoreceptor Outer Segments. Investigative 705 Ophthalmology & Visual Science. 2007;48(3):1401-1409. doi:10.1167/iovs.06-0865 706 23. Ahmadian M, Abbott MJ, Tang T, et al. Desnutrin/ATGL is regulated by AMPK and is 707 required for a brown adipose phenotype. Cell Metab. 2011;13(6):739-748. 708 doi:10.1016/j.cmet.2011.05.002 709 34 24. Iacovelli J, Zhao C, Wolkow N, et al. Generation of Cre transgenic mice with postnatal RPE-710 specific ocular expression. Invest Ophthalmol Vis Sci. 2011;52(3):1378-1383. 711 doi:10.1167/iovs.10-6347 712 25. Müllenbach R, Lagoda P, Welter C. An efficient salt-chloroform extraction of DNA from 713 blood and tissues. Trends in genetics : TIG. 1989;5(12):391. 714 26. Xin-Zhao Wang C, Zhang K, Aredo B, Lu H, Ufret-Vincenty RL. Novel method for the 715 rapid isolation of RPE cells specifically for RNA extraction and analysis. Exp Eye Res. 716 2012;102:1-9. doi:10.1016/j.exer.2012.06.003 717 27. Livak KJ, Schmittgen TD. Analysis of Relative Gene Expression Data Using Real-Time 718 Quantitative PCR and the 2−ΔΔCT Method. Methods. 2001;25(4):402-408. 719 doi:10.1006/meth.2001.1262 720 28. Schertler GFX, Hargrave PA. [7] Preparation and analysis of two-dimensional crystals of 721 rhodopsin. In: Methods in Enzymology. Vol 315. Academic Press; 2000:91-107. 722 doi:10.1016/S0076-6879(00)15837-9 723 29. Agbaga M-P, Stiles MA, Brush RS, et al. The Elovl4 Spinocerebellar Ataxia-34 Mutation 724 736T>G (p.W246G) Impairs Retinal Function in the Absence of Photoreceptor 725 Degeneration. Molecular Neurobiology. 
Published online August 11, 2020. 726 doi:10.1007/s12035-020-02052-8 727 30. Lerman MJ, Lembong J, Muramoto S, Gillen G, Fisher JP. The Evolution of Polystyrene as a 728 Cell Culture Material. Tissue Engineering Part B: Reviews. 2018;24(5):359-372. 729 doi:10.1089/ten.teb.2018.0056 730 31. Subramanian P, Mendez EF, Becerra SP. A Novel Inhibitor of 5-Lipoxygenase (5-LOX) 731 Prevents Oxidative Stress-Induced Cell Death of Retinal Pigment Epithelium (RPE) Cells. 732 Invest Ophthalmol Vis Sci. 2016;57(11):4581-4588. doi:10.1167/iovs.15-19039 733 32. Haemmerle G, Lass A, Zimmermann R, et al. Defective Lipolysis and Altered Energy 734 Metabolism in Mice Lacking Adipose Triglyceride Lipase. Science. 2006;312(5774):734. 735 doi:10.1126/science.1123965 736 33. LaVail MM. Rod outer segment disc shedding in relation to cyclic lighting. Experimental 737 Eye Research. 1976;23(2, Part 2):277-280. doi:10.1016/0014-4835(76)90209-8 738 34. Comitato A, Subramanian P, Turchiano G, Montanari M, Becerra SP, Marigo V. Pigment 739 epithelium-derived factor hinders photoreceptor cell death by reducing intracellular calcium 740 in the degenerating retina. Cell Death Dis. 2018;9(5):560-560. doi:10.1038/s41419-018-741 0613-y 742 35. Hernández-Pinto A, Polato F, Subramanian P, et al. PEDF peptides promote photoreceptor 743 survival in rd10 retina models. Experimental Eye Research. 2019;184:24-29. 744 doi:10.1016/j.exer.2019.04.008 745 35 36. Mayerson PL, Hall MO. Rat retinal pigment epithelial cells show specificity of phagocytosis 746 in vitro. J Cell Biol. 1986;103(1):299-308. doi:10.1083/jcb.103.1.299 747 37. Finnemann SC, Rodriguez-Boulan E. Macrophage and retinal pigment epithelium 748 phagocytosis: apoptotic cells and photoreceptors compete for alphavbeta3 and alphavbeta5 749 integrins, and protein kinase C regulates alphavbeta5 binding and cytoskeletal linkage. J Exp 750 Med. 1999;190(6):861-874. doi:10.1084/jem.190.6.861 751 38. Aflaki E, Radovic B, Chandak PG, et al. 
Triacylglycerol accumulation activates the 752 mitochondrial apoptosis pathway in macrophages. J Biol Chem. 2011;286(9):7418-7428. 753 doi:10.1074/jbc.M110.175703 754 39. Subramanian P, Becerra SP. Role of the PNPLA2 Gene in the Regulation of Oxidative Stress 755 Damage of RPE. In: Bowes Rickman C, Grimm C, Anderson RE, Ash JD, LaVail MM, 756 Hollyfield JG, eds. Retinal Degenerative Diseases. Springer International Publishing; 757 2019:377-382. 758 40. Kenealey J, Subramanian P, Comitato A, et al. Small Retinoprotective Peptides Reveal a 759 Receptor-binding Region on Pigment Epithelium-derived Factor. J Biol Chem. 760 2015;290(42):25241-25253. doi:10.1074/jbc.M115.645846 761 41. Rakoczy PE, Baines M, Kennedy CJ, Constable IJ. Correlation Between Autofluorescent 762 Debris Accumulation and the Presence of Partially Processed Forms of Cathepsin D in 763 Cultured Retinal Pigment Epithelial Cells Challenged with Rod Outer Segments. 764 Experimental Eye Research. 1996;63(2):159-167. doi:10.1006/exer.1996.0104 765 42. Dhingra A, Bell BA, Peachey NS, et al. Microtubule-Associated Protein 1 Light Chain 3B, 766 (LC3B) Is Necessary to Maintain Lipid-Mediated Homeostasis in the Retinal Pigment 767 Epithelium. Front Cell Neurosci. 2018;12:351-351. doi:10.3389/fncel.2018.00351 768 43. Cheng S-Y, Cipi J, Ma S, et al. Altered photoreceptor metabolism in mouse causes late stage 769 age-related macular degeneration-like pathologies. Proc Natl Acad Sci U S A. 770 2020;117(23):13094-13104. doi:10.1073/pnas.2000339117 771 44. Go Y-M, Zhang J, Fernandes J, et al. MTOR-initiated metabolic switch and degeneration in 772 the retinal pigment epithelium. The FASEB Journal. 2020;34(9):12502-12520. 773 doi:10.1096/fj.202000612R 774 45. Wagner C, Hois V, Pajed L, et al. Lysosomal acid lipase is the major acid retinyl ester 775 hydrolase in cultured human hepatic stellate cells but not essential for retinyl ester 776 degradation. Biochim Biophys Acta Mol Cell Biol Lipids. 
2020;1865(8):158730. doi:10.1016/j.bbalip.2020.158730
46. Balsinde J, Dennis EA. Bromoenol Lactone Inhibits Magnesium-dependent Phosphatidate Phosphohydrolase and Blocks Triacylglycerol Biosynthesis in Mouse P388D1 Macrophages. Journal of Biological Chemistry. 1996;271(50):31937-31941. doi:10.1074/jbc.271.50.31937
47. Jenkins CM, Han X, Mancuso DJ, Gross RW. Identification of Calcium-independent Phospholipase A2 (iPLA2) β, and Not iPLA2γ, as the Mediator of Arginine Vasopressin-induced Arachidonic Acid Release in A-10 Smooth Muscle Cells: ENANTIOSELECTIVE MECHANISM-BASED DISCRIMINATION OF MAMMALIAN iPLA2s. Journal of Biological Chemistry. 2002;277(36):32807-32814. doi:10.1074/jbc.M202568200
48. Ueta T, Inoue T, Furukawa T, et al. Glutathione peroxidase 4 is required for maturation of photoreceptor cells. J Biol Chem. 2012;287(10):7675-7682. doi:10.1074/jbc.M111.335174
49. Imai H, Matsuoka M, Kumagai T, Sakamoto T, Koumura T. Lipid Peroxidation-Dependent Cell Death Regulated by GPx4 and Ferroptosis. In: Nagata S, Nakano H, eds. Apoptotic and Non-Apoptotic Cell Death. Springer International Publishing; 2017:143-170. doi:10.1007/82_2016_508

Figure legends

Figure 1.
Generation of RPE-specific PNPLA2-cKO mice. (A) Scheme of Pnpla2 floxed and cre-mediated recombined allele. The loxP sites flank Exon 1. P1 and P2 are the primers homologous to sequences outside the floxed (flanked by the loxP sites) region used to detect cre-mediated recombination (generating recombined alleles) on genomic DNA. The sizes of the amplicons obtained by PCR using P1 and P2 are indicated. (B) Gel electrophoresis of PCR reaction products obtained using primers P1 and P2 and genomic DNA isolated from mouse eyecups from either cKO or control (Ctr) mice (Pnpla2f/+); lane 1 (MW) corresponds to molecular weight markers (GeneRuler DNA Ladder Mix).
One eyecup per lane from a 4-month-old mouse, n=2 cKO, n=2 Ctr. (C) Pnpla2 expression (vs. HPRT) in RPE from month-old cKO (Pnpla2f/f/cre) relative to control littermates (Pnpla2f/f). Each data point corresponds to the average of six PCR reactions per eyecup, six eyes from three cKO mice and six eyes from three control mice at 5-7 months old. (D) cre (red) and phalloidin (yellow) labeling of RPE/choroid flatmounts from control (Pnpla2f/f) (left) and littermate cKO (Pnpla2f/f/cre) (right). The scale bar corresponds to 20 µm. (n=2 images from individual mouse eyecups at 11-14 months old). (E) Plot of the percentage of cre-positive RPE cells in cKO animals (Pnpla2f/f/cre, n=10, age 10.5-18.5 months) as indicated in the x-axis. Each data point corresponds to the percentage of cre-positive RPE cells from an ROI, each bar corresponds to a flatmount of an individual cKO mouse, and the bar for control (Pnpla2f/f) has data from 10 mice.

Figure 2.
Lipid accumulation in the RPE of Pnpla2-cKO mice. Electron microscopy micrographs showing the RPE structure of 3-month-old (A) and 13-month-old (B) cKO mice and control animals. LD: lipid droplets; BI: basal infoldings. Scale bar corresponds to 2 µm. The representative images were selected from micrographs of 8 eyes of cKO (PNPLA2f/f cre+) mice and 7 eyes of control (PNPLA2f/f) mice at 1.75-3.75 months old, and of 3 eyes of cKO mice and 3 eyes of control mice at 12.5-13 months old.

Figure 3.
Phagocytosis and β-hydroxybutyrate production in the RPE of Pnpla2-cKO mice. (A) Representative ROI of the eyecup from one control and one cKO animal isolated at 2 h (8 AM) and 5 h (11 AM) post light onset (6 AM) after immunolabeling for rhodopsin (in green), phalloidin (in yellow) and cre (in red). The column to the right shows magnification of an area.
The mean rhodopsin immunolabel intensity in micrographs (n ≥ 6 ROIs) from flatmounts (as indicated in the x-axis) relative to control at 2 h was determined among three mice per condition and is shown in the plot. Age of mice was 10.5-18.5 months. (B) Ex-vivo β-HB release by the RPE of Pnpla2-cKO eyecups upon ingestion of outer segments (OS) in comparison to that of controls. Eyecups were isolated at 5 h (11 AM) and 8 h (2 PM) after light onset (6 AM). Statistical significance was calculated using 2-way ANOVA for the 2 groups (controls and cKO mice) with and without treatment (second variable) for each time after light onset (* p=0.02; ** p=0.006; *** p=0.0001); ns, not significant. (n=6 eyecups from 3 control (f/+) mice at 3.5 months; n=4 eyecups from 2 control (f/f cre-) mice at 3.5 months; n=10 eyecups from 5 mice (f/f cre+) at 2.75-3.5 months). (C) The OS-mediated increase in β-HB release above basal levels of the cKO RPE/choroid explants was calculated from the data in Panel (B) and plotted.

Figure 4.
RPE and Retinal functionality in RPE-Pnpla2-cKO mice. (A) Histogram showing the amplitude (mean, standard deviation) of the c-wave, fast oscillation (FO), light peak (LP) and off-response (OFF) measured by DC-ERG of 11-week-old cKO (n=4, empty histograms) and control mice (Pnpla2f/f and Pnpla2f/+, n=5, filled histograms). (B) Electroretinograms showing amplitude (y-axis) of scotopic a- and b-wave, and photopic b-wave, as a function of light intensity (x-axis) of 3- and 12-month-old cKO mice (empty circles) and littermate controls (Pnpla2f/f, filled circles) (n=3/genotype).

Figure 5.
Phagocytosis in ARPE-19 cells pretreated with BEL. (A) ARPE-19 cells were incubated with BEL at the indicated concentrations for 3.5 h. Then the mixture was removed, and the cells were washed gently with PBS and incubated with complete medium for a total of 16 h.
Cell viability was assessed by crystal violet staining, with three replicates per condition. (B) Representative immunoblot of total lysates of cells, which were pretreated with DMSO alone, 10 or 25 µM BEL/DMSO for 1 h prior to pulse-chase of POS, as described in Methods. Extracts of cells harvested at the indicated times (top of blot) were resolved by SDS-PAGE followed by immunoblotting with anti-rhodopsin. The migration position of rhodopsin is indicated to the right of the blot. (C) Quantification of rhodopsin from total lysates of cells of the pulse-chase experiments as in panel (B). For quantification, samples from each biological replicate were resolved by SDS-PAGE in duplicate for two experiments and singly for the third experiment. Intensities of the immunoreactive bands were determined and the percentage of the remaining rhodopsin after 16-h chase relative to rhodopsin at 2.5-h pulse was plotted. (D) Representative immunoblot of total lysates of cells, as in panel B, to determine the effects of BEL at 16 h and 24 h of chase (as indicated). (E) Quantification of rhodopsin from two independent experiments of the pulse-chase experiments as in panel D. Samples from each biological replicate were resolved in duplicate by SDS-PAGE for quantification. Intensities of the immunoreactive bands were determined and the percentage of the remaining rhodopsin after 16-h chase relative to rhodopsin at 2.5-h pulse was plotted. (F) Cells were preincubated with DMSO alone, 10 or 25 µM BEL/DMSO in Ringer’s solution at 37°C for 1 h. Then, the mixture was removed, and cells were incubated with Ringer’s solution containing 5 mM glucose and POS (1 x 10^7 units/ml) with DMSO alone, 10 or 25 µM BEL/DMSO for the indicated times (x-axis). Media were removed to determine the levels of β-HB secretion, which were plotted (y-axis). (n=3) Data are presented as means ± S.D. **p<0.01, ***p<0.001.

Figure 6.
Knockdown of PNPLA2 in ARPE-19 cells. ARPE-19 cells were transfected with Scr (Scrambled siRNA control) or siRNAs targeting PNPLA2, and mRNA and protein levels were tested. (A) RT-qPCR to measure PNPLA2 mRNA levels in ARPE-19 cells 72 h post-transfection with Scr and six different siRNAs (as indicated in the x-axis) was performed and a plot is shown. PNPLA2 mRNA levels were normalized to 18S. All siRNAs are represented as the percentage of the scrambled siRNA control. n = 3. (B) A plot is shown for a time course of PNPLA2 mRNA levels following transfection with Scr and siPNPLA2-C. n = 3. (C) RT-qPCR of mock-transfected cells, cells transfected with Scr, and siPNPLA2-C (x-axis) at 72 h after transfection. mRNA levels were normalized to the 18S RNA (y-axis). n = 3. (D) Total protein was obtained from cells harvested 72 h after transfection and resolved by SDS-PAGE followed by western blotting with anti-PNPLA2 and anti-GAPDH (loading control). The siRNAs used in transfections are indicated at the top, and migration positions for PEDF-R and GAPDH are to the right of the blot. Data are presented as means ± S.D. **p<0.01, ***p<0.001.

Figure 7.
Phagocytosis and fatty acid metabolism in siPNPLA2 cells. ARPE-19 cells were transfected with Scr or siRNAs targeting PNPLA2. At 72 h post-transfection, ARPE-19 cells were incubated with POS (1 x 10^7 units/ml) in 24-well tissue culture plates for pulse-chase experiments. (A) Representative immunoblot of total lysates of ARPE-19 cells at 0.5 h, 1 h, and 2.5 h of POS pulse and at a 16-h and 24-h chase period, as indicated at the top of the blot. Proteins in cell lysates were subjected to immunoblotting with anti-rhodopsin followed by reprobing with anti-GAPDH as the loading control.
(B) Quantification of rhodopsin from duplicate samples and 3 blots of cell lysates from pulse-chase experiments and time periods (indicated in the x-axis) as from panel (A). Intensities of the immunoreactive bands were determined and the percentage of the remaining rhodopsin after 16-h and 24-h chase relative to rhodopsin at 2.5-h pulse was plotted (y-axis). Data are presented as means ± S.D. ns, not significant, **p<0.01. (C-D) Levels of secreted free fatty acids (C) and β-HB (D) were measured in culture media of cells transfected with Scr or siPNPLA2 following incubation with POS for the indicated periods of time (x-axis). (n=3) Data are presented as means ± S.D. * p < 0.05, **p<0.01. Duplex siPNPLA2 C was used to generate the data (see Table 3 for sequences of duplexes).

Supplementary Information

Figure S1. Proteins in the POS samples were determined and resolved by SDS-PAGE in the same gel in two sets: one with 5 µg and another with 0.1 µg protein per lane. For each set, one sample was non-reduced and the other was reduced with DTT. After electrophoresis, the gel was cut in half lengthwise. The gel portion with 5 µg of protein was stained with Coomassie Blue and the other portion with 0.1 µg protein was transferred to a nitrocellulose membrane for immunostaining using anti-rhodopsin antibodies (as described in Methods). Photos of the stained gel and western blot are shown.
The proteins of POS isolated from bovine retina had the expected migration pattern for both reduced and non-reduced conditions, and the main bands stained with Coomassie Blue comigrated with rhodopsin-immunoreactive proteins in western blots of POS proteins.

Figure S2. Electron microscopy micrographs. Panels A-J show electron microscopy micrographs of RPE structures of 3-month-old RPE cKO prepared as described in the main text and Figure 2. Magnification is indicated for each image.
The presence of LDs was associated with a lack of (Fig. S2A) or decreased thickness of the basal infoldings, and with granular cytoplasm, abnormal mitochondria (Fig. S2B), and disorganized localization of organelles (mitochondria and melanosomes) (Fig. S2A). In some cells, the large LDs crowded the cytoplasm and clustered the mitochondria and melanosomes together in the apical region of the cells (Figs. S2A, S2C, S2D); however, LD number and expansion within the cells appeared to be random, and their expansion could go in any direction (Fig. S2E). Normal apical cytoplasmic processes were lacking; however, degeneration in the outer segment (OS) tips of the photoreceptors was visible (Figs. S2A, S2F). Additionally, normal phagocytosis of the OS was lacking, indicating impaired RPE phagocytosis (Figs. S2A, S2E, S2G). There were apparently unhealthy nuclei with pyknotic chromatin and leakage of extranuclear DNA (enDNA), indicating that the necrotic process had begun (Fig. S2B). Some RPE cells lacked basal infoldings, normally seen at the basal side (Fig. S2H). Occasionally some RPE cells had lighter low-density cytoplasm, indicating degeneration of cytoplasmic components, in contrast to the denser and fuller cytoplasm in the RPE of the littermate control (Fig. S2I, S2J).

Figure S3.
Phagocytosis in ARPE-19 cells. ARPE-19 cells were cultured in 24-well plates for 3 days, and then exposed to POS at 1 x 10^7 units/ml for up to a 2.5-h pulse followed by a 16-h chase period as described in Methods. (A) Representative immunoblots of total cell lysates during pulse-chase (times indicated at the top of the blot) with anti-rhodopsin followed by reprobing with anti-GAPDH as the loading control are shown. Migration positions of rhodopsin and GAPDH are indicated to the right of the blot. Duplicate biological replicates were performed. (B) Quantification of rhodopsin from duplicate samples per condition from pulse-chase experiments at time periods indicated in the x-axis as from panel (A). Intensities of the immunoreactive bands from duplicate samples of cell lysates were determined. The percentage of the remaining rhodopsin after 16-h chase relative to rhodopsin at 2.5-h pulse was plotted. (C-D) Levels of free fatty acids (C) and β-HB (D) measured in culture media of cells incubated with and without POS for the indicated periods of time (x-axis) were plotted and shown. n = 3. Data are presented as means ± S.D. * p < 0.05, ***p<0.001.

Figure S4. Phagocytosis in ARPE-19 cells in porous membranes. ARPE-19 cells were treated with 1 x 10^7 POS/ml. (A) Representative immunoblot showing rhodopsin internalization from total cell lysates of ARPE-19 cells following 30, 60, and 150 min of POS incubation following plating in 12-well transwell inserts for 3 weeks. Cell extracts were resolved by SDS-PAGE followed by immunoblotting with anti-rhodopsin. The blot was stripped and reprobed with anti-GAPDH as a loading control. (B) Levels of β-HB secreted towards the apical membrane of ARPE-19 cells following POS incubation for 30, 60, and 150 min. (n = 3) Data are presented as means ± S.D.
Methods:
To demonstrate a functional assay to study phagocytosis in ARPE-19 cells, we performed the assay with confluent cells attached on porous membranes.
ARPE-19 cells seeded on porous membranes were incubated for 3 weeks in culturing media. Then the media was removed and replaced with Ringer’s solution alone or Ringer’s solution containing 1 x 10^7 POS/ml and 5 mM glucose for the indicated time points. Rhodopsin was detected by western blotting.
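The rhodopsin quantification described for Figure S3B, in which the band intensity after the chase is expressed as a percentage of the intensity at the end of the 2.5-h pulse, can be illustrated with a minimal Python sketch. The function names, the pairing of each chase sample with the pulse sample of the same replicate, and the band intensities below are assumptions made for illustration only; they are not taken from the study.

```python
from statistics import mean, stdev


def percent_remaining(chase_intensity: float, pulse_intensity: float) -> float:
    """Chase band intensity as a percentage of the 2.5-h pulse band intensity."""
    if pulse_intensity <= 0:
        raise ValueError("pulse intensity must be positive")
    return 100.0 * chase_intensity / pulse_intensity


# Hypothetical densitometry values (arbitrary units) for two replicates.
# These are placeholder numbers, not measurements from the paper.
pulse_2_5h = [1200.0, 1000.0]  # rhodopsin band at the end of the 2.5-h pulse
chase_16h = [300.0, 200.0]     # rhodopsin band after the 16-h chase

# Assumption: each chase sample is paired with the pulse sample of the same replicate.
remaining = [percent_remaining(c, p) for c, p in zip(chase_16h, pulse_2_5h)]

# Summary statistics in the form used by the legends (mean and sample S.D.).
avg = mean(remaining)
sd = stdev(remaining)

print(f"rhodopsin remaining after 16-h chase: {avg:.1f} ± {sd:.1f} %")
```

A value near 100% would indicate that little of the ingested rhodopsin was degraded during the chase, whereas a low value would indicate efficient degradation.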
957 43 Rhodopsin levels in the lysates of cells incubated with POS were detected in as little as 30 min 958 and up to 2.5 h following POS incubation, while rhodopsin was undetectable in cells without 959 POS (Fig. S4A). β-HB levels released into the media of the apical chamber of transwells 960 following POS incubation increased four-fold and three-fold after 1 h and 2.5 h, respectively, 961 while released β-HB levels from cells incubated with Ringer’s solution alone did not increase 962 (Fig. S4B). 963 Figure S5: ARPE-19 cells were transfected with siScramble siRNA control or siRNAs targeting 964 PNPLA2 (siPNPLA2 A). RT-qPCR to measure PNPLA2 mRNA levels in ARPE-19 cells at (A) 965 72 h post-transfection and (B) 98.5h post transfection equivalent to pulse (2.5h) and chase (24h) 966 was performed with siRNA duplexes (as indicated in the x-axis). Treatment of cells in panel B 967 was as for pulse-chase (see diagram in Fig S3). PNPLA2 mRNA levels were normalized to 18S. 968 n =3 biological replicates, each data point corresponds to the average of triplicate PCR reactions. 969 The RT-PCR was repeated twice per biological replicate. Values that fell out of the standard 970 curve were not included in the plot. 971 The data shows that siPNPLA2 duplex silenced PNPLA2 in ARPE-19 at 72 h post-transfection 972 and that silencing was maintained throughout a 2.5 h and pulse-chase of 24 h. 973 Floxed allele Cre-recombined allele 1866 bp 253 bp MW cKO Ctr cKO Ctr Figure 1. B. A. C. D. E. co ntr ol 1 2 3 4 5 6 7 8 9 10 0 40 80 120 cKO mouse # C re p os iti ve c el ls (% ) control cKO cre/phalloidin Co ntr ol cK O 0.0 0.5 1.0 1.5 P np la 2/ H P R T Figure 2. BI LD A. B. BI LD BI BI Co nt ro l cK O Figure 3. A. 
2 ho ur s (8 A M ) cK O Co nt ro l rhodopsin/cre/phalloidin 5 ho ur s (1 1 A M ) cK O Co nt ro l 5 hours (11 AM) 8 hours (2 PM) 0 2 4 6 β -h yd ro xy bu ty ra te (n m ol ) Control Control +OS cKO cKO+OS ****** ** ** 5 h (11 AM) 8 h (2 pm) 0 1 2 3 Time after light onset ∆ β -H B (n m ol ) Control cKO * * B. C. co ntr ol cK O co ntr ol cK O 0 1 2 3 Mouse R ho do ps in (r el at iv e to c on tro l 2 h) 2 h 5 h Scotopic a-wave Scotopic b-wave Photopic b-wave -5 -4 -3 -2 -1 0 1 2 0 200 400 600 800 1000 -5 -4 -3 -2 -1 0 1 2 0 100 200 300 -2 -1 0 1 2 0 100 200 300 400 Control cKO -2 -1 0 1 2 0 100 200 300 400 -5 -4 -3 -2 -1 0 1 2 0 200 400 600 800 1000 -5 -4 -3 -2 -1 0 1 2 0 100 200 300 3 m on th 12 m on th Light intensity log (cd/s.m2) A m pl itu de (µ V) c-wave FO LP OFF 0.0 0.5 1.0 1.5 2.0 A m pl itu de (µ V) Figure 4. A. B. Figure 5. A. B. C. D. 1 10 100 1000 0.0 0.2 0.4 0.6 BEL (µM) C el l v ia bi lit y (A bs 57 0n m ) 0.5 1 2.5 0.5 1 2.5 0.5 1 2.5 0 1 2 3 Time (h) β -H B (n m ol es ) 0 10 25 ** *** ** ** *** *** BEL (µM) -Rhodopsin BEL (µM) 0 10 25 E. -Rhodopsin BEL (µM) 0 25 0.5 1 2.5 16 24 0.5 1 2.5 16 24 (h) F. 0.5 1 2.5 16 0.5 1 2.5 16 0.5 1 2.5 16 (h) 0 10 25 0 50 100 150 BEL (µM) R ho do ps in r em ai ni ng (% re la tiv e to 2 .5 h ) 16 h ✱✱✱ ✱✱ 2.5 16 24 2.5 16 24 0 50 100 150 Time (h) R ho do ps in r em ai ni ng (% re la tiv e to 2 .5 h ) BEL (µM) 0 25 ns **** *** Figure 6. B. A. C. D. 24 48 72 0.00 0.04 0.08 0.12 0.16 Time (h) P N P LA 2/ 18 S Scr siPNPLA2 *** *** *** No ne Sc r PN PL A2 0.00 0.15 0.30 0.45 siRNA PN PL A 2/ 18 S n.s. *** Scr A B C D E F 0 50 100 150 siRNA P N P LA 2/ 18 S (% ) *** *** *** *** *** ** None Scr C D E siRNA -GAPDH -PEDF-R Figure 7. A. C. D. Scr siPNPLA2 -GAPDH -Rhodopsin 0.5 1 2.5 16 24 0.5 1 2.5 16 24 (h) B. 
0.5 1.0 2.5 0.0 0.4 0.8 1.2 Time (h) β -h yd ro xy bu ty ra te (n m ol ) Scr siPNPLA2 * * ** 0.5 1.0 2.5 0.0 0.5 1.0 1.5 Time (h) Fr ee fa tt y ac id s (p m ol ) Scr siPNPLA2 * 16 24 0 30 60 90 120 Time (h) R ho do ps in r em ai ni ng (% re la tiv e to 2 .5 h ) Scr siPNPLA2 ✱✱ ✱✱✱ Degradation of Photoreceptor Outer Segments by the Retinal Pigment Epithelium Requires Pigment Epithelium-derived Factor Receptor (PEDF-R) Jeanee Bullock, Federica Polato, Mones Abu-Asab, Alexandra Bernardo-Colón, Ivan Rebustini, Elma Aflaki, Martin-Paul Agbaga, S. Patricia Becerra Supplementary Figures POS (µg) 5 5 0.1 0.1 DTT - + - + ~260 ~140 ~100 ~70 ~50 ~40 ~35 ~25 ~15 MW x 10-3 Coomassie Blue Ab-Rhodopsin Proteins in the POS samples were determined and resolved by SDS-PAGE in the same gel in two sets: one with 5 µg and another with 0.1 µg protein per lane. For each set, one sample was non-reduced and the other was reduced with DTT. After electrophoresis, the gels were cut in half lengthwise. The gel portion with 5 µg of protein was stained with Coomassie Blue and the other portion with 0.1 µg protein was transferred to a nitrocellulose membrane for immunostaining using anti-rhodopsin antibodies (as described in Methods). Photos of the stained gel and western blot are shown. The proteins of POS isolated from bovine retina had the expected migration pattern for both reduced and non-reduced conditions, and the main bands stained with Coomassie Blue comigrated with rhodopsin-immunoreactive proteins in western blots of POS proteins. Figure S1. SDS-PAGE and western blot of bovine POS A. B. C. D. E. F. G. H. I. J. Figure S2. TEM of RPE in RPE-Pnpla2-cKO mice The presence of LDs was associated with lack (Fig. S2A) of or the decreased thickness of the basal infoldings, and with granular cytoplasm, abnormal mitochondria (Fig. S2B), and disorganized localization of organelles (mitochondria and melanosomes) (Fig. S1A). 
In some cells, the large LDs crowded the cytoplasm and clustered together the mitochondria and melanosomes into the apical region of the cells (Figs. S2A, S2C, S2D); however, LDs number and expansion within the cells appeared to be random and their expansion could go into any direction (Fig. S2E). Normal apical cytoplasmic processes were lacking; however, degeneration in the outer segment (OS) tips of the photoreceptors was visible (Figs. S2A, S2F); . Additionally, normal phagocytosis of the OS was lacking indicating an impaired RPE phagocytosis (Figs. S2A, S2E, S2G). There were apparent unhealthy nuclei with pyknotic chromatin and leakage of extranuclear DNA (enDNA), indicating that the beginning of the necrotic process had started (Fig. S2B). Some RPE cells lacked basal infoldings, normally seen at the basal side (Fig. S2H). Occasionally some RPE cells had lighter low-density cytoplasm indicating degeneration of cytoplasmic components in contrast to the denser and fuller cytoplasm in the RPE of the littermate control (Fig. S2I, S2J). Figure S3. A. C. D. 2.5 16 24 0 25 50 75 100 Time (h) Rh od op si n re m ai ni ng (% re la tiv e to 2 .5 h ) 0.5 1 2.5 0 10 20 30 Time (h) Fr ee fa tt y ac id s (p m ol ) - POS + POS *** *** * 0.5 1 2.5 0 2 4 6 Time (h) β- hy dr ox yb ut yr at e (n m ol ) - POS + POS *** *** *** B. 0.5 1 2.5 16 24 (h) -GAPDH -Rhodopsin - 3 days 0 0.5h 1h 2.5h 16h 24h Plate cells +107 POS/ml Remove POS Add complete media Media → FFA, β-HB Cells → WB Pulse Chase Figure S3. Phagocytosis in ARPE-19 cells. ARPE-19 cells were cultured in 24-well plates for 3 days, and then exposed to POS at 1x107 units/ml for up to a 2.5-h pulse followed by an upto 24-h chase period as described in Methods. (A) Representative immunoblots of total cell lysates during pulse-chase (times indicated at the top of the blot) with anti-rhodopsin followed by reprobing with anti-GAPDH as the loading control are shown. 
Migration positions of rhodopsin and GAPDH are indicated to the right of the blot. Duplicate biological replicates were performed. (B) Quantification of rhodopsin from duplicate samples per condition from pulse-chase experiments at time periods indicated in the x-axis as from panel (A). Intensities of the immunoreactive bands from duplicate samples of cell lysates were determined. The percentage of the remaining rhodopsin after 16-h chase relative to rhodopsin at 2.5 h-pulse was plotted. (C-D) Levels of free fatty acids (C) and -HB (D) measured in culture media of cells incubated with and without POS for the indicated periods of time (x-axis) were plotted and shown. n = 3 Data are presented as means ± S.D. * p < 0.05, ***p<0.001. 30 60 150 30 60 150 min - POS +POS -GAPDH -Rhodopsin A. Cells on porous membranes B. Cells on porous membranes 30 60 150 0 2 4 6 8 Time (min) S ec re te d β- H B (n m ol es ) - POS + POS 30 60 150 30 60 150 min - POS +POS -Rhodopsin -GAPDH C. Cells on plastic Figure S5. Phagocytosis in ARPE-19 cells in porous membranes. ARPE-19 cells were treated with 1x107 POS/ml. (A) Representative immunoblot showing rhodopsin internalization from total cell lysates of ARPE-19 cells following 30, 60, and 150 min of POS incubation following plating in 12-well transwell inserts for 3 weeks. Cell extracts were resolved by SDS-PAGE followed by immunoblotting with anti-rhodopsin. The blot was stripped and reprobed with anti-GAPDH as a loading control. (B) Levels of B-HB secreted towards the apical membrane of ARPE-19 cells following POS incubation for 30, 60, and 150 min. Data are presented as means ± S.D. ARPE-19 cells plated on porous membranes engulf bovine outer segments To demonstrate a functional assay to study phagocytosis in ARPE-19 cells we perform the assay with confluent cells attached on porous membranes Methods: ARPE-19 cells seeded on porous membranes were incubated for 3 weeks in culturing media. 
Then the media was replaced with Ringer’s solution alone or Ringer’s solution containing 1 x 107 POS/ml and 5 mM glucose for the indicated time points. Rhodopsin was detected by western blotting. Rhodopsin levels in the lysates of cells incubated with POS were detected in as little as 30 min and up to 2.5 h following POS incubation, while rhodopsin was undetectable in cells without POS (Fig. S4A). B-HB levels released into the media of the apical chamber of transwells following POS incubation increased four-fold and three-fold after 60 and 150 min, respectively, while released B-HB levels from cells incubated with Ringer’s solution alone did not increase (Fig. S4B). Figure S4. siScramble siRNA A 0.0 0.2 0.4 0.6 0.8 1.0 P np la 2/ 18 S **** siScramble siRNA A 0.0 0.5 1.0 1.5 2.0 P np la 2/ 18 S **** ARPE-19 cells were transfected with siScramble siRNA control or siRNAs targeting PNPLA2 (siPNPLA2 A). RT-qPCR to measure PNPLA2 mRNA levels in ARPE-19 cells at (A) 72 h post-transfection and (B) 98.5h post transfection equivalent to pulse (2.5h) and chase (24h) was performed with siRNA duplexes (as indicated in the x-axis). Treatment of cells in panel B was as for pulse-chase (see diagram in Fig S3). PNPLA2 mRNA levels were normalized to 18S. n =3 biological replicates, each data point corresponds to the average of triplicate PCR reactions. The RT-PCR was repeated twice per biological replicate. Values that fell out of the standard curve were not included in the plot. The data shows that siPNPLA2 duplex silenced PNPLA2 in ARPE-19 at 72 h post-transfection and that silencing was maintained throughout a 2.5 h and pulse-chase of 24 h. Figure S5. A. 72h post transfection B. 98.5 h post transfection, parallel to pulse-chase aCorresponding author: S. Patricia Becerra NIH-NEI-LRCMB Section of Protein Structure and Function Bg. 6, Rm. 
134, 6 Center Drive MSC 0608, Bethesda, MD 20892-0608, becerrap@nei.nih.gov

10_1101-2021_01_02_425093 ---- 76719164

Priming mycobacterial ESX-secreted protein B to form a channel-like structure

Abril Gijsbers1, Vanesa Vinciauskaite1, Axel Siroy1,†, Ye Gao1, Giancarlo Tria1,‡, Anjusha Mathew2, Nuria Sánchez-Puig1,3, Carmen López-Iglesias1, Peter J. Peters1* and Raimond B. G.
Ravelli1*

1Division of Nanoscopy, Maastricht Multimodal Molecular Imaging Institute (M4I), Maastricht University, Universiteitssingel 50, 6229 ER, Maastricht, the Netherlands
2Division of Imaging Mass Spectrometry, Maastricht Multimodal Molecular Imaging Institute (M4I), Maastricht University, Universiteitssingel 50, 6229 ER, Maastricht, the Netherlands
3Departamento de Biomacromoléculas, Instituto de Química, Universidad Nacional Autónoma de México, Ciudad Universitaria, Ciudad de México, México
†Present address: European Institute of Chemistry and Biology (IECB), Pessac, France
‡Present address: Dipartimento di Chimica "Ugo Schiff", Università degli Studi di Firenze, Via della Lastruccia, 3-13 I-50019 Sesto Fiorentino, Italia
*Corresponding authors: rbg.ravelli@maastrichtuniversity.nl (RBGR), pj.peters@maastrichtuniversity.nl (PJP)

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.02.425093; this version posted January 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

ESX-1 is a major virulence factor of Mycobacterium tuberculosis, a secretion machinery directly involved in the survival of the microorganism against the host immune defence. It disrupts the phagosome membrane of the host cell through a contact-dependent mechanism. Recently, the structure of the inner-membrane core complex of the homologous ESX-3 and ESX-5 was resolved; however, the elements involved in the secretion through the outer membrane or those acting on the host cell membrane are unknown. Protein substrates might form this missing element.
Here, we describe the oligomerisation process of the ESX-1 substrate EspB, which occurs upon cleavage of its C-terminal region and is favoured by an acidic environment. Cryo-electron microscopy data are presented which show that EspB from different mycobacterial species has a conserved quaternary structure, except for the non-pathogenic species M. smegmatis. EspB assembles into a channel with dimensions and characteristics suitable for the transit of ESX-1 substrates, as shown by the presence of another EspB trapped within. Our results provide insight into the structure and assembly of EspB, and suggest a possible function as a structural element of ESX-1.

Keywords: Cryo-EM, EspB, ESX-1, mycobacteria, preferential orientation

Introduction

Tuberculosis (TB) is an infectious disease caused by the bacillus Mycobacterium tuberculosis. It is estimated that one-quarter of the world’s population is currently infected with latent bacteria. In 2018, 10 million people developed the disease, of which 0.5 million cases were caused by multidrug-resistant strains. Even though TB is curable, 1.5 million people succumb to it every year (World Health Organization, 2019). The current treatment is long and has serious side effects, often driving the patient to terminate the therapy before its conclusion (Schaberg et al., 1996). This has contributed to an increase in the number of patients suffering from multidrug- and extensively drug-resistant TB.
While treatment is available for some of these resistant strains, the regimen is usually longer, more expensive and sometimes more toxic. For this reason, research on mycobacterial pathogenesis is vital to find a proper target in order to develop more effective therapeutics and vaccines.

The high incidence of TB relates to the ability of M. tuberculosis to evade the host immune system (Ferluga et al., 2020). This ability is related to multiple factors, one of which is a complex cell envelope with low permeability that plays a crucial role in drug resistance and in survival under harsh conditions (Brennan & Nikaido, 1995). Likewise, pathogenic mycobacteria secrete virulence factors that manipulate the environment and compromise the host immune response. Mycobacteria have up to five specialised secretion machineries that carry out this process, named ESX-1 to -5 (together known as the type VII secretion system or T7SS). The core components of the inner-membrane part of T7SS have been identified (Pym et al., 2003). Nevertheless, it remains unknown whether the translocation of substrates through the inner and outer membrane is functionally coupled or not (one- or two-step, respectively), and whether it deploys a specific outer-membrane complex to do so (Bunduc et al., 2020a). Proteins from the PE/PPE family, characterised by Pro-Glu and Pro-Pro-Glu motifs and secreted by T7SS, are often associated with the outermost layer of the mycobacterial cell
envelope, and have been suggested to play a role in membrane channel formation (Burggraaf et al., 2019; Cascioferro et al., 2007; Wang et al., 2020). Recently, the intake of nutrients by M. tuberculosis was shown to be dependent on PE/PPE proteins, suggesting that these form small molecule–selective porins that allow the bacterium to take up nutrients over an otherwise impermeable barrier (Wang et al., 2020).

ESX-1 to -5 are paralogue protein complexes with specialised functions and substrates, unable to complement each other (Abdallah et al., 2007; Phan et al., 2017). ESX-1 is an essential player in the virulence of M. tuberculosis. It has been implicated in phagosomal escape, cellular inflammation, host cell death, and dissemination of the bacteria to neighbouring cells (Abdallah et al., 2011; Houben et al., 2012a; Simeone et al., 2012; Stanley et al., 2007; van der Wel et al., 2007). Our knowledge about the structure of the machinery as well as the mechanism of secretion and regulation remains limited. ESX-3 is involved in iron homeostasis (Siegrist et al., 2009), and only recently has the molecular architecture of its inner-membrane core been determined (Famelis et al., 2019; Poweleit et al., 2019). The complex consists of a dimer of protomers, each made of four proteins: ESX-conserved component (Ecc)-B, C, D(x2), and E. Despite the resolution achieved in both studies, there was no obvious channel through which the protein substrates could traverse. Rosenberg and collaborators have described that one of the elements of the secretion system (EccC) forms dimers upon substrate binding, which then form higher-order oligomers (Rosenberg et al., 2015).
This is in agreement with observations that ESX-5, which is involved in nutrient uptake (Ates et al., 2015) and host cell death (Abdallah et al., 2011), forms a hexamer (Beckham et al., 2020; Bunduc et al., 2020b; Houben et al., 2012b). A recent structure of the ESX-5 hexamer shows that it is stabilised by a mycosin protease (MycP) positioned in the periplasm on top of EccB5 (Bunduc et al., 2020b). ESX-2 and ESX-4 are the least characterised; ESX-4 is involved in DNA transfer (Gray et al., 2016) and is seen as the ancestor of the five ESX systems (Gey Van Pittius et al., 2001).

Located in different positions in the genome, the esx loci contain the genes that code for the four Ecc proteins, MycP, a heterodimer of EsxA/B-like proteins, and one or more PE-PPE pairs. With high sequence similarity and conservation between paralogues (Poweleit et al., 2019; van Winden et al., 2016), one could expect the inner-membrane core of the different systems to share a similar architecture. So what makes each one of them unique? Experimental data suggest that the answer lies with the substrates (Lou et al., 2017). The esx-1 locus encodes more than ten unique proteins that are known to be secreted (Sani et al., 2010), termed the ESX-1 secretion-associated proteins (Esp) (Bitter et al., 2009). Amongst those is EspC, a protein present in pathogenic organisms that was described to form filamentous structures in vitro and to localise on the surface of the bacteria in vivo (Lou et al., 2017).
Due to the similarities between EspC and the needle protein of the type III secretion system, Lou et al. hypothesised that ESX-1 could be an injectosome system with EspC as its needle. This is of particular importance because, compared to the other systems, ESX-1 function has been described to take place through a contact-dependent mechanism (Conrad et al., 2017), which makes the discovery of an outer-membrane complex essential for understanding the system. Other proteins, like EspE, which has been localised on the cell wall (Carlsson et al., 2009; Phan et al., 2018; Sani et al., 2010), are of interest as possible elements of the outer-membrane complex.

The protein EspB has been the focus of attention due to its ability to oligomerise upon secretion (Korotkova et al., 2015; Solomonson et al., 2015), making it a strong candidate as a structural component of the machinery (Piton et al., 2020). EspB belongs to the PE/PPE family, but unlike other family members that form heterodimers in mycobacteria, EspB consists of a single polypeptide chain fusing the PE and PPE domains (Korotkova et al., 2015). EspB is a 48-kDa protein that matures during secretion: its largely unstructured C-terminal region is cleaved in the periplasm by the protease MycP1, leaving a mature 38-kDa isoform (Ohol et al., 2010; Solomonson et al., 2013; Xu et al., 2007). The purpose of this maturation is not yet clear, but it was shown that inactivation of MycP1, and thus loss of EspB cleavage, deregulates the secretion of proteins by ESX-1 (Ohol et al., 2010). Chen et al.
observed specific binding of EspB to phosphatidylserine and phosphatidic acid after cleavage (Chen et al., 2013), suggesting that the C-terminal processing of EspB is important for its functioning, possibly involving lipid binding. The crystal structure of the monomeric N-terminal part of EspB from M. tuberculosis and M. smegmatis has been determined: it forms a four-helix bundle with high structural homology between species (Korotkova et al., 2015; Solomonson et al., 2015). During the preparation of this work for publication, the structure of an EspB oligomer from M. tuberculosis was published by Piton et al., showing features of a pore-like transport protein (Piton et al., 2020). EspB is the only member of the PE/PPE family described to date to form higher-order oligomers.

In this work, we studied the oligomerisation ability and structures of EspB from M. tuberculosis, M. marinum, M. haemophilum and M. smegmatis. We show that truncation of EspB at the MycP1 cleavage site and an acidic environment promote the oligomerisation of EspB from the three pathogenic species but not from non-pathogenic M. smegmatis. Oligomerisation is mediated by intermolecular hydrogen bonds and amide bridges between residues highly conserved in the pathogenic species, but absent in M. smegmatis. The structures of oligomeric EspB consist of two domains: an N-terminal region that forms a cylinder-like structure with a tunnel large enough to accommodate a folded PE-PPE pair, and a partly hydrophobic C-terminal region that interacts with hydrophobic surfaces. The oligomer has inner-pore dimensions similar to those described for the pore within the periplasmic region of ESX-5 (Bunduc et al., 2020b). Visualisation of a trapped EspB monomer within the channel supports the idea that secreted proteins could transit through its tunnel. Overall, in this work we describe factors that prime the oligomerisation of EspB, and provide insight into its potential role in the ESX-1 machinery.
Results

Oligomerisation is favoured by an acidic pH and maturation of EspB

Previously, it has been described that oligomerisation of EspB occurs after secretion (Korotkova et al., 2015). In the infection context, this secretion would lead the protein to the phagosomal lumen of a macrophage, an organelle known to have pH acidification as a functional mechanism. To evaluate the putative role of pH in the oligomerisation process, the mature form of M. tuberculosis EspB (residues 2–358) was incubated at different pH values and analysed by size exclusion chromatography (SEC). Results showed that the equilibrium is favoured towards an oligomer form at pH 5.5 compared to pH 8.0 (Fig 1A), as observed by a higher oligomer/monomer ratio at any protein concentration (Fig 1B). Native mass spectrometry experiments confirmed this behaviour and could identify different oligomeric states of EspB, with the heptamer being the most predominant. Intermediate states were observed (dimer to pentamer) and even higher oligomeric states (octamer) but in lower abundance compared to the heptamer (Fig 1C and D).

Because EspB undergoes proteolytic processing of its C-terminus during secretion, we investigated the effect of this cleavage on the quaternary structure of different EspB constructs, varying in their C-terminus lengths, from M. tuberculosis at pH 5.5 (Fig 2). With the exception of EspB7-278 that did not oligomerise, we observed that oligomerisation was favoured for all other constructs at pH 5.5 (Fig 2B).
The full-length EspB2-460 (Fig 2B, blue trace) presented the lowest amount of complex formation compared to the other constructs tested, while the highest amount was observed for the mature isoform, EspB2-358 (Fig 2B, orange trace). These results suggest that MycP1 cleaves EspB to allow oligomerisation, and that the remaining residues of the unstructured C-terminal region are needed, possibly, to stabilise the complex.

EspB from M. smegmatis is unable to oligomerise

To determine whether the oligomerisation ability is conserved across species, we performed cryo-EM analysis on different orthologues of the mature EspB. Proteins from the pathogenic species M. tuberculosis, M. marinum and M. haemophilum were able to oligomerise into ring-like structures while that from the non-pathogenic M. smegmatis was not, as seen by the lack of visible particles (Fig 3A); the structured region of an EspB monomer (30 kDa) has a signal-to-noise ratio too low to be visualised within these micrographs (Henderson, 1995; Zhang et al., 2020). Interestingly, comparison of the tertiary structure from the pathogenic species studied here with the published crystallographic model of EspB from M. smegmatis did not show substantial differences (RMSD Cα’s 0.98 – 1.13 Å), apart from an extended α-helix 2 (Fig 3B), absent in our oligomeric structures. To determine whether the differences in oligomerisation ability between M. smegmatis and the pathogenic species were due to differences in their primary structure, we performed sequence alignment of multiple EspB orthologues.
The species that presented oligomerisation showed high sequence identity whereas M. smegmatis has the lowest of all (Fig 3C and Fig EV1). Because EspB belongs to the PE/PPE family, we included in the analysis a PE-PPE pair with a structure already published (Ekiert & Cox, 2014). PE25-PPE41 did not oligomerise (Fig 3), despite sharing a similar tertiary structure (RMSD Cα’s 1.134 Å). Its low sequence identity (21.3%) supports the importance of specific amino acid sequences for the conservation of the quaternary structure.

High-resolution cryo-EM structures of EspB oligomers

Next, we aimed to solve the high-resolution structure of EspB oligomers by cryo-EM. Initial experiments were performed with EspB2-460 and EspB2-348 from M. tuberculosis, which displayed a very strong preferential orientation where only “top views” could be seen (Fig 4A). Cryo-electron tomography revealed these molecules to be attached to the air-water interface (Fig EV2). Different oligomers were found: hexamers, heptamers, rings with an extra density in the middle, and octamers, with the heptameric ensemble being the predominant one (Fig 4A), in agreement with the results obtained in solution (Fig 1C). Preliminary 3D reconstructions could be obtained from data that were collected at one or more tilt angles (Fig 4B). Data processing resulted in 3–4 Å resolution maps from which the first heptamer models were built.
Truncation of the C-terminus at residue 287 led to a different distribution of particles on cryo-EM grids, now with random orientations (Fig 4C), implying that this region interacts with the air-water interface on the EM grid (Noble et al., 2018) (Fig EV2). Experiments were repeated for construct EspB2-287 from M. tuberculosis and the equivalent construct from M. marinum (at 0°-stage tilt), leading to high-resolution EM maps of 2.3 Å and 2.5 Å average resolution, respectively (Fig 5A and B, and Fig EV3A–C). We observed high structural conservation between the two structures. Both displayed a four-helix bundle, like the EsxA-B complex and PE25-PPE41 complex, with the WxG and YxxxD motifs located on one end of the elongated molecule, referred to as the top hereafter, making an H-bond interaction between the nitrogen of W176 and the oxygen of Y81, as was observed in the crystal structure (Fig 5C) (Korotkova et al., 2015). The helical tip is located on the opposite end, referred to as the bottom, for both EspB and PE25-PPE41 (Korotkova et al., 2015; Solomonson et al., 2015). The C-terminal region starts near the top end of the elongated molecule. The overall structure shows seven copies tilted 32° with respect to the symmetry axis, forming a cylinder-like oligomer with a width and a height of 90 Å (Fig 5A and B).

The single particle analysis (SPA) map from M. tuberculosis EspB revealed three Q-Q and Q-N interaction pairs between monomers (Fig 5C). Q48 was conserved in all EspB orthologues analysed here with the exception of M. smegmatis, which showed no oligomerisation (Fig 3C and Fig EV1). A Q48A substitution in the M. tuberculosis orthologue resulted in the disruption of the oligomer (Fig EV4D,
yellow trace) as evidenced by the absence of a high molecular weight peak by SEC. Amide bridges are not commonly seen within the structures present in the Protein Data Bank (PDB) (Joosten et al., 2009), but they are stronger interactions than typical hydrogen bonds and are less affected by pH changes than salt bridges, another strong interaction (Xie et al., 2015). In addition to the Q-Q and Q-N interaction pairs, some hydrophobic interfacing residues were identified, including L51 and L161. Histidines, glutamates and aspartates were not found to interact directly with the neighbouring monomer. It seems that these residues mainly play a role in the pH-dependent overall charge distribution of the monomers (Fig 6A-B).

Our high-resolution EspB oligomer maps did not reveal a continuous density for the PE-PPE linker. The proposed location of the linker within the crystal structure (Korotkova et al., 2015) overlaps with the oligomerisation interface, so the linker would need to adopt a different position upon oligomerisation. Particle subtraction followed by focused classification showed partial densities for the linker at the periphery of the structure. We locked the PE-PPE linker in its crystal-structure position by making a double mutation, N55C (in the core of the monomer) and T119C (in the PE-PPE linker): this would prevent the linker from adopting the different conformation needed for oligomerisation. This double mutant abolished oligomerisation of EspB, suggesting that an intramolecular bond was formed that prevented the linker from moving (Fig EV3D, red trace).
EspB, a possible transport channel for T7SS proteins

The EspB cylinder-like structure has an internal pore diameter of 40 Å (Fig 6A-B), large enough to accommodate folded proteins such as EsxA/EsxB (diameter 35 Å), PE25-PPE41 (diameter 27 Å) or an EspB monomer itself (diameter 28 Å). Analysis of the degree of hydrophobicity in the structure showed that the internal surface of the oligomer is mainly hydrophilic (Fig 6D), allowing other hydrophilic molecules to pass.

During the cryo-EM data processing, additional densities were consistently found within the EspB heptamer of all the different constructs, including constructs that lack the C-terminal region. Fig 6E shows a high-resolution 2D class of EspB2-348 with a well-defined density inside the channel from a subset of the data collected at 0°-stage tilt. This 2D class was found in ~7% of the particles recorded. The 2D classes obtained at 40°-stage tilt could not be unambiguously manually assigned to specific oligomerisation forms. Instead, 3D classification in RELION (Scheres, 2012) was used to identify one class with solely C7 symmetry and one class with an extra density within the heptameric channel. Local symmetry averaging of the heptamer model while processing the overall map in C1 revealed an extra density spanning the entire channel, in which we could fit an EspB monomer model (Fig 6G-H).
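The size argument made at the start of this section (a 40 Å pore versus cargo diameters of 35, 27 and 28 Å) is simple arithmetic, and can be sketched as below. This is only an illustrative check under a strong simplifying assumption: each cargo is treated as a rigid object with a single diameter, ignoring shape, hydration and flexibility. The function and variable names are hypothetical and are not part of the authors' analysis.

```python
# Diameters in angstroms, taken from the values quoted in the text.
PORE_DIAMETER_A = 40.0
CARGO_DIAMETERS_A = {
    "EsxA/EsxB": 35.0,
    "PE25-PPE41": 27.0,
    "EspB monomer": 28.0,
}


def clearance(pore_diameter, cargo_diameter):
    """Return the diametral clearance (pore minus cargo) in angstroms."""
    return pore_diameter - cargo_diameter


def fit_report(pore_diameter, cargos):
    """For each cargo, record whether it is narrower than the pore and by how much."""
    return {
        name: {
            "fits": diameter < pore_diameter,
            "clearance": clearance(pore_diameter, diameter),
        }
        for name, diameter in cargos.items()
    }


report = fit_report(PORE_DIAMETER_A, CARGO_DIAMETERS_A)
for name, entry in report.items():
    print(f"{name}: fits={entry['fits']}, clearance={entry['clearance']:.0f} A")
```

Under this rigid-body simplification all three cargos are narrower than the pore, with EsxA/EsxB leaving the smallest margin.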
Integrity of the PE-PPE linker is not essential for the oligomerisation of EspB

To determine if the PE-PPE linker absent in our model was essential for oligomerisation, we performed limited proteolysis analysis on the M. tuberculosis constructs. Incubation with trypsin fully digested EspB7-278, perhaps due to its lower stability, but resulted in two major fragments for the constructs EspB2-460 and EspB2-348, as shown by SDS-PAGE (Fig EV4A). N-terminal sequencing and mass spectrometry analysis revealed that the larger fragment corresponded to a section of the protein comprising residues V122 to R343 (corresponding to the PPE domain), while the smaller fragment included the N-terminal end of the protein sequence, with a few residues from the affinity tag, up to residue R121 (PE domain and linker) (Fig EV4B–D). Despite being split within the PE-PPE linker into two fragments, EspB2-348 behaved in gel filtration as a single entity with the capacity to form oligomers (Fig EV4E and F), confirming that the integrity of this region is not necessary for the complex to form. It is noteworthy that trypsin did not cut before R343, even though there are cleavage sites in the so-called unfolded C-terminal region, raising the question of whether this region is actually fully unstructured.

Properties of the EspB C-terminal region

The function of the C-terminal region has puzzled the scientific community for a long time, partly because EspB is the only substrate of the MycP1 protease known to date.
Here, we describe its processing as an important factor for the oligomerisation of the N-terminal region (1-287); however, the cleavage leaves ~70 residues for no obvious reason. To gain insight into the properties of the C-terminal region that could hint at its function, and based on the preferential orientation effect seen in cryo-EM, we performed a hydrophobicity analysis of this region on the different EspB orthologues. The analysis revealed hydrophobic patches in the pathogenic species that are absent in EspB from M. smegmatis (Fig EV5A). Some of these patches are present in all the constructs with preferential orientation, leading us to speculate that residues 297–324 interact with the air-water interface of the cryo-EM grid (Noble et al., 2018). To understand whether this effect is related to a structural change or a particular characteristic of the C-terminal region of the protein, we expressed a construct corresponding to residues 279–460 and carried out circular dichroism (CD) studies on it. Far UV CD spectra analysis of this region showed a negative band around 198 nm (Fig EV5B), characteristic of random coil structures. This result is in line with the high fraction (54%) of “disorder-promoting” residues within this region (lysine, glutamine, serine, glutamic acid, proline and glycine: amino acids commonly found in intrinsically disordered protein regions). Interestingly, its proline content is 2.5 times higher than that observed for proteins in the PDB (Theillet et al., 2013; Uversky, 2013). Comparative analysis of the CD difference spectra obtained at different pH [Δ (pH 5.5 – pH 8.0)] revealed a positive signal close to 220 nm and a negative signal near 200 nm (Fig EV5B inset), showing that this region is able to adopt extended left-handed helical conformations [poly-L-proline type II or PPII (Rucker & Creamer, 2002)].
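The compositional figures quoted above (the fraction of "disorder-promoting" residues and the proline content) are simple counts over a sequence. The sketch below shows how such fractions could be computed; it uses the six residue types named in the text (K, Q, S, E, P, G). The function names and the short example sequence are hypothetical and chosen only for illustration; the sequence is not taken from EspB, and this is not the authors' analysis pipeline.

```python
# One-letter codes for the residues the text lists as disorder-promoting:
# lysine, glutamine, serine, glutamic acid, proline and glycine.
DISORDER_PROMOTING = frozenset("KQSEPG")


def residue_fraction(sequence: str, residues) -> float:
    """Fraction of positions in `sequence` occupied by any residue in `residues`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    wanted = set(residues)
    return sum(1 for aa in seq if aa in wanted) / len(seq)


def disorder_fraction(sequence: str) -> float:
    """Fraction of disorder-promoting residues (K, Q, S, E, P, G)."""
    return residue_fraction(sequence, DISORDER_PROMOTING)


# Hypothetical 10-residue peptide, for illustration only (not an EspB sequence).
example = "PKSEGQAPLV"
print(f"disorder-promoting fraction: {disorder_fraction(example):.2f}")
print(f"proline fraction: {residue_fraction(example, 'P'):.2f}")
```

Applied to the real sequence of residues 279–460, the same counting would be expected to give the value reported in the text, assuming the same residue set is used.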
CD analysis of the C-terminal region in the presence of different concentrations of 2,2,2-trifluoroethanol (TFE) showed that this region has an intrinsic ability to attain helicity based on the decrease in the ellipticity signal at 222 nm (Fig EV5C) (Luo & Baldwin, 1997). The lack of a single isodichroic point at 200 nm suggests that the conformational changes elicited by TFE do not comply with a two-state model and that most probably the transition is accompanied by an intermediate, e.g. the presence of more than one α-helix.

Discussion

In the present study, we describe different factors that facilitate oligomerisation of EspB: an acidic environment, the truncation of its C-terminal region, a flexible PE-PPE linker and the residues involved in the interaction. Our findings are in agreement with previous observations that EspB oligomerises upon secretion (Korotkova et al., 2015). Based on these results, the C-terminus of the full-length protein could prevent premature oligomerisation in the cytosol of mycobacteria, possibly through steric hindrance. However, this region is also likely to have other functions. Deletion of the EspB C-terminus does not affect its own secretion (McLaughlin et al., 2007; Xu et al., 2007) but rather the secretion of EsxA/EsxB, possibly by loss of interaction with the last residues of EspB (Xu et al., 2007). The sequence of the C-terminal end is highly conserved (Fig EV1), which makes it possible that this region interacts with other molecules in the cytoplasm of the bacterium.
This ability of EspB to oligomerise seems to be conserved across mycobacterial species, with the exception of M. smegmatis. This microorganism is a fast-growing, non-pathogenic species that uses the ESX-1 system for horizontal DNA transfer (Flint et al., 2004). The exact mechanism of this transfer is unknown; however, evidence suggests that ESX-1 is not the DNA conduit but rather secretes proteins that act like pheromones, which in turn induce the expression of esx-4 genes, resulting in mating-pair interactions (Gray et al., 2016). The ESX-1 substrate EsxA was shown to undergo a structural change that allows membrane insertion in M. tuberculosis when exposed to an acidic environment; however, this effect does not occur in its M. smegmatis orthologue (De Leon et al., 2012; Ma et al., 2015). Taking the aforementioned antecedents and the oligomerisation differences between EspB proteins observed in this work, it is plausible that the mechanism of action of ESX-1 is distinct between these two species.

EspB interacts with the lipids phosphatidylserine and phosphatidic acid (Chen et al., 2013). It was suggested that EspB could transport phosphatidic acid (Piton et al., 2020), but the interior of the complex is mainly hydrophilic, making this scenario less plausible. Despite the presence of lipids in the crystallisation set-up, Korotkova et al. (2015) could not find lipids within the crystal structure of EspB7-278, which lacks the C-terminus.
Our results show that the C-terminus of EspB contributes to the protein’s preferred orientation on an EM grid caused by an interaction with the hydrophobic air-water interface (Noble et al., 2018), analogous to what could happen on a lipid membrane. With a PPII helix at the end of the channel followed by hydrophobic patches at the C-terminus, we hypothesise that this secondary structure interacts with the head group of the lipids, as has been described for other PPII helices (Franz et al., 2016), allowing the hydrophobic residues to insert into bilayer membranes. Based on the chemical properties of the channel and supported by the evidence of an extra EspB monomer observed within the oligomer, we propose that EspB could be a structural element of ESX-1 allowing other substrates to transit through the channel.
The combined data presented in our work lead us to hypothesise three models of the role of the EspB oligomer. EspB within the cytosol is likely to be monomeric (Korotkova et al., 2015), either free or chaperoned by EspK (McLaughlin et al., 2007). Binding of a chaperone to the helical tip of EspB would leave the WxG and YxxxD bipartite secretion signal exposed on the top of EspB, ready to interact with the T7SS machinery. Upon exiting the ESX-1 inner-membrane pore, the pre-protein EspB will be cleaved within the periplasm by MycP1. Analogous to ESX-5 (Bunduc et al., 2020b), we expect MycP1 to cap the central periplasmic dome-like chamber formed by EccB1, and to have its proteolytic site facing its central pore.
The cleavage of the C-terminal region at A358 (Solomonson et al., 2013) will remove the most hydrophilic part of the C-terminus, leaving a hydrophobic tail (Fig EV6A). From here, we propose three possible pathways for the oligomerisation of EspB. In one scenario, after processing of the C-terminus, EspB binds the outer membrane of mycobacteria, increasing its critical concentration to form an oligomer (Fig 7 model 1). As suggested above, it can be assumed that EspB monomers would transit through the inner-membrane pore with the top first, where the C-terminus as well as the WxG and YxxxD motifs are located. In this position, the monomers would already be properly oriented to form an oligomer on the inner leaflet of the outer membrane, just as EspB2-358 attaches to the air-water interface of a cryo-EM grid (Fig 4A). The inner pore of the EspB heptamer has dimensions similar to those proposed by Beckham et al. for the ESX-5 hexameric structure (Beckham et al., 2017), although they later published a higher-resolution structure with a more constricted pore in a closed state (Beckham et al., 2020). The space between the inner and outer membranes has been reported to be 20–24 nm wide (Dulberger et al., 2020; Sani et al., 2010; Zuber et al., 2008), which could accommodate the 9-nm long EspB heptamer. It was postulated (Piton et al., 2020) that the positively charged interior of the EspB channel could play a role in the transfer of negatively charged substrates such as DNA or phospholipids. However, in analogy to the negative lumen of a bacteriophage tail that is used to transfer DNA (Zinke et al., 2020), we propose that the positively charged interior space of the EspB oligomer would channel substrates of the same charge, as negatively charged substrates would most likely bind and get trapped.
Since the heptameric structure presented here lacks any trans-membrane domains and is highly soluble, it is unlikely to be embedded within the outer membrane but could be anchored by its C-terminus, forming part of a larger machinery that completes the ESX-1 core complex. EspB is well known to be secreted to the culture medium of mycobacteria (Lodes et al., 2001); thus, in this model EspB will help in its own secretion by forming a channel through which additional substrates, like EspB itself, could travel.
In a second scenario, after EspB is secreted outside the bacterium, it could interact with either the phagosomal membrane or the external face of the outer membrane (Fig 7 model 2). The aforementioned hypothesis of how EspB is secreted (C-terminus and WxG/YxxxD motifs first) would favour interaction with the phagosomal membrane; however, there is also some evidence of EspB being extracted from the outermost layer of the bacterium (Sani et al., 2010). From different experiments (Fig 1), it is expected that oligomerisation is concentration dependent. Interaction with protein structures or a membrane could increase the local concentration, making the system more efficient.
In the third, more speculative model, EspB would undergo a conformational change, as observed for some pore-forming proteins such as the amphitropic gasdermins (Liu & Lieberman, 2020). Upon proteolysis, a pre-pore ring could assemble prior to membrane insertion (Ruan et al., 2018).
Recently, it was suggested that the PE/PPE family of proteins could form small-molecule-selective channels analogous to outer-membrane porins, allowing M. tuberculosis to take up nutrients through its almost impermeable cell wall (Wang et al., 2020). Despite this evidence, it remains a mystery how such soluble heterodimers would insert into a membrane. We hypothesise that, analogous to the heterodimer EsxA/EsxB where EsxA alone can insert into a membrane in acidic conditions (De Leon et al., 2012), the amphiphilic helices of either PE or PPE alone might insert into the membrane. EspB is fundamentally different from PE/PPE pairs in the sense that its PE and PPE parts are fused into a single protein, joined by one long flexible linker able to adopt multiple conformations (Piton et al., 2020). Unlike the EsxA-EsxB heterodimer, where EsxA would act independently from EsxB upon membrane insertion, the PE moiety of EspB would still be linked to its PPE counterpart even if the latter were to insert itself into a membrane. We speculate that such a linker could allow EspB to form tubular-like structures while exchanging PE and PPE domains between different molecules (Fig 7 model 3). Such higher-order oligomers, as described for EspC (Lou et al., 2017) and occasionally also found in our data (Fig EV7), could be a component of the secretion apparatus. Our hypothesis that EspB acts as a scaffold or structural component of the secretion apparatus is supported by earlier findings.
As ESX-1 works through a contact-dependent mechanism and not by secretion of toxins (Conrad et al., 2017), it is possible that the cytotoxic effects on macrophages observed by Chen et al. for EspB were the result of increased activity of the machinery (Chen et al., 2013). Most of the work described here favours model 1 or 2. More evidence needs to be gathered to falsify or verify any of the models. Techniques like in situ cryo-electron tomography of infected immune cells could be used to provide visual insight.
In summary, this study reveals factors that prime the oligomerisation of EspB and presents evidence that supports the hypothesis that EspB is a structural element of the ESX-1 secretion system, possibly acting on a lipid membrane. ESX-1 is a major player in the virulence of mycobacterial species, like M. tuberculosis. However, after decades of arduous research, our understanding of the structure and the mechanism of action of this system remains limited. Here we provide a structural and possibly functional understanding of an ESX-1 element. Full understanding of all the ESX-1 components and structural states could guide structure-based drug and vaccine design in order to tackle the global health threat posed by tuberculosis.
Materials and Methods
Cloning, expression and purification of EspB constructs
Different constructs used in this study are listed in S1 Table. DNA fragments were PCR-amplified with KOD Hot Start Master Mix (Novagen®) from genomic DNA of M. tuberculosis H37Rv, M. marinum or M. smegmatis [BEI Resources, National Institute of Allergy and Infectious Diseases (NIAID)], and
cloned in a modified pRSET backbone (Invitrogen™) using NsiI and HindIII restriction sites. Constructs included an N-terminal 6×His-tag followed by a tobacco etch virus (TEV) protease cleavage site. EspB mutants and the constructs EspB2-385 and EspB2-287 were generated using the KOD-Plus-Mutagenesis kit (Toyobo Co., Ltd.) from the plasmid encoding the full-length protein. All plasmids were sequenced to verify the absence of inadvertent mutations. The M. haemophilum and PE25-PPE41 constructs were synthesised and codon optimised for expression in Escherichia coli (Eurofins Genomics). For the non-codon-optimised constructs, proteins were expressed in Rosetta (DE3) E. coli cells in Overnight Express™ Instant LB Medium (EMD Millipore) supplemented with 100 μg/mL of carbenicillin and 25 μg/mL of chloramphenicol for 50 h at 25 °C. For the codon-optimised constructs, the protein was expressed in C41 (DE3) E. coli cells under the same conditions with the respective antibiotic. Prior to protein purification, cells were resuspended in buffer containing 20 mM Tris-HCl (pH 8.0), 300 mM NaCl, 1 mM PMSF, and 25 U/mL benzonase, and were lysed using an EmulsiFlex-C3 homogenizer (Avestin). Proteins were purified with HisPur™ Ni-NTA Resin (ThermoFisher) equilibrated in the lysis buffer and eluted in the same buffer supplemented with 400 mM imidazole. The 6×His-tag was cleaved using TEV protease, followed by a second Ni-NTA purification to remove the free 6×His-tag, uncleaved protein and the His-tagged protease (Kapust et al., 2001). When higher purity was needed, proteins were purified on a size-exclusion Superdex200 Increase 10/300 GL column (GE Healthcare) in buffer containing 20 mM Tris-HCl (pH 8.0), 300 mM NaCl. Protein was stored at -80 °C until further use.
Analytical size exclusion chromatography (SEC)
Samples were dialysed overnight in the corresponding buffers and different concentrations of protein were loaded onto a size-exclusion Superdex200 Increase 3.2/300 column (GE Healthcare Life Science) at a flow rate of 50 µL/min. The basic buffer comprised 20 mM Tris-HCl (pH 8.0), 150 mM NaCl, while the acidic buffer was 20 mM acetate buffer (pH 5.5), 150 mM NaCl.
Native mass spectrometry
Native mass spectrometry was used to obtain high-resolution mass information of the samples. M. tuberculosis EspB2-348 (5 mg/mL) was buffer exchanged with 100 mM NH4CH3CO2 (at pH 5.5 and 8.5) using a 3-kDa molecular weight cut-off dialysis membrane overnight, followed by an additional one-hour buffer exchange with a fresh NH4CH3CO2 solution at 4 °C. The buffer exchange of fragments produced by limited proteolysis (2 mg/mL) was performed using SEC on a Superdex 200 Increase 3.2/300 column (GE Healthcare Life Science) with 100 mM NH4CH3CO2 at pH 6.8. CH3COOH and NH4OH were used to adjust the pH of the NH4CH3CO2 solution. The mass spectrometry measurements were performed in positive ion mode on an ultra-high mass range (UHMR) Q-Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific) with a static nano-electrospray ionization (nESI) source. In-house pulled, gold-coated borosilicate capillaries were used for the sample introduction to the mass spectrometer, and a voltage of 1.2 kV was applied. Mass spectral resolution was set at 4,375 to 8,750 (at m/z=200) and an injection time of 100 to 200 ms was used.
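As a minimal illustration of how a neutral mass can be read from a native mass spectrum, the sketch below assigns a charge state from two adjacent peaks of the same species and converts m/z to mass. It assumes positive ion mode with protons as the only charge carriers; the function names and the 300 kDa example mass are hypothetical and are not taken from this study. It is a back-of-the-envelope check only and not the Bayesian deconvolution performed by UniDec.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mz_for(mass_da, charge):
    """m/z of an ion with neutral mass `mass_da` carrying `charge` protons."""
    return mass_da / charge + PROTON_MASS_DA


def charge_from_adjacent_peaks(mz_lower_charge, mz_higher_charge):
    """Charge z of the peak at `mz_lower_charge`.

    `mz_higher_charge` is the neighbouring peak of the same species with
    charge z + 1, which appears at lower m/z. Solving
    z * (a - p) = (z + 1) * (b - p) for z gives z = (b - p) / (a - b).
    """
    if mz_higher_charge >= mz_lower_charge:
        raise ValueError("the z + 1 peak must lie at lower m/z than the z peak")
    z = (mz_higher_charge - PROTON_MASS_DA) / (mz_lower_charge - mz_higher_charge)
    return round(z)


def neutral_mass(mz, charge):
    """Neutral mass in daltons from an m/z value and its charge state."""
    return charge * (mz - PROTON_MASS_DA)


if __name__ == "__main__":
    # Hypothetical 300 kDa complex observed at charge states 35+ and 36+.
    peak_35 = mz_for(300_000.0, 35)
    peak_36 = mz_for(300_000.0, 36)
    z = charge_from_adjacent_peaks(peak_35, peak_36)
    print(z, neutral_mass(peak_35, z))
```

In practice, peaks are broadened by adducts and incomplete desolvation, so a full charge-state series is normally fitted rather than a single pair of peaks.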
For each spectrum, 10 scans were combined, each containing 5 to 10 microscans. The inlet capillary temperature was kept at 320 °C. Parameters such as in-source trapping, transfer m/z, detector m/z, trapping gas pressure and mass range were optimized for each analyte separately. All mass spectra were analysed using Thermo Scientific Xcalibur software and spectral deconvolutions were performed with the UniDec software (Marty et al., 2015).
Cryo-EM sample preparation, data acquisition and image processing
Samples, in 20 mM acetate buffer (pH 5.5), 150 mM NaCl, were diluted to the respective concentrations (Table 1). A volume of 2.5 μL of each sample was applied on glow-discharged UltrAuFoil Au300 R1.2/1.3 grids (Quantifoil), and excess liquid was removed by blotting for 3 s (blot force 5) using filter paper, followed by plunge freezing in liquid ethane using a FEI Vitrobot Mark IV at 100% humidity at 4 °C. For PE25-PPE41, an acetate buffer (pH 6.5), 150 mM NaCl was used, due to precipitation of the protein at lower pH. Cryo-EM single particle analysis (SPA) data were collected using untilted and tilted schemes (Tan et al., 2017). For EspB2-287 from M. tuberculosis and EspB2-286 from M. marinum, untilted images were recorded on a Titan Krios at 300 kV with a K3 detector operated in super-resolution counting mode. Tilted SPA data were collected for EspB2-348 from M. tuberculosis on a 200-kV Tecnai Arctica TEM using SerialEM (Mastronarde, 2005), using a Falcon III detector in counting mode.
Table 2 shows all specifications and statistics for the data sets. Individual micrographs of EspB2-287 from M. haemophilum, EspB2-348 from M. smegmatis as well as PE25-PPE41 from M. tuberculosis were collected on the 200-kV Arctica. Data were processed using the RELION-3 pipeline (Zivanov et al., 2018). Movie stacks were corrected for drift (5 × 5 patches) and dose-weighted using MotionCor2 (Zheng et al., 2017). The local contrast transfer function (CTF) parameters were determined for the drift-corrected micrographs using Gctf (Zhang, 2016). The EspB2-348 data set was collected at two angles of the stage: 0 degrees and 40 degrees. For each tilt angle, a first set of 2D references was generated from manually picked particles in RELION (Scheres, 2012) and these were used for subsequent automatic particle picking. Table 2 lists the number of particles in the final data set after particle picking, 2D classification and 3D classification. The 3D classification was run without imposing symmetry and used to select the heptameric particles. Local CTF parameters were iteratively refined (Zivanov et al., 2018), which was particularly important for the tilted data set; beam-tilt parameters were estimated and particles were polished. Particle subtraction followed by focused classification was used to characterise densities other than those accounted for by the refined model described below. Due to the extreme preferred orientation in the EspB2-348 data sets, automatic masking and automatic B-factor estimation in post-processing were hampered by missing wedge artefacts. For this data set, parameters were
manually optimised by visual inspection of the resulting maps. Density within the heptameric pore was obtained by a combination of 2D and 3D classification. The initial density map of a loaded complex was generated by symmetry expansion of a C7 3D-refined particle list, followed by 3D classification in C1 without further image alignment. Later iterations employed 45,671 unique particles and 3D refinement in C1 while imposing local symmetry for the heptamer. The resulting 5.3 Å map was used to identify a total of eight EspB monomers (heptamer plus one in the middle), and was averaged using local symmetry. The final resolution of the heptamer maps, listed in Table 2, varied between 2.3 and 3.4 Å, using the gold-standard FSC=0.143 criterion (Scheres & Chen, 2012).
Structure determination and refinement
The PDB model 4XXX (Korotkova et al., 2015) was used as a starting model in Coot (Emsley & Cowtan, 2004) for manual docking and building into the tilted-scheme SPA data set of EspB2-348 of M. tuberculosis. The final model was refined against the high-resolution sharpened map of EspB2-287 of M. tuberculosis. This model was later used as a reference for the M. marinum model. Models were refined iteratively through rounds of manual adjustment in Coot (Emsley et al., 2010), real space refinement in Phenix (Afonine et al., 2018) and structure validation using MolProbity (Williams et al., 2018).
Limited proteolysis and Edman sequencing
Samples were incubated with trypsin for different lengths of time at a molar ratio of 1:6 (enzyme:substrate) following the Proti-Ace™ Kit (Hampton Research) recommendation. The reaction was stopped by adding SDS-PAGE loading buffer (63 mM Tris-HCl, 2% SDS, 10% glycerol, 0.1% bromophenol blue) and samples were resolved on a 12% polyacrylamide gel.
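For readers setting up a comparable digest, the following sketch converts the 1:6 enzyme:substrate molar ratio into a mass of trypsin for a given mass of substrate. The function name is hypothetical, and the molecular weights used in the example (about 23.3 kDa for trypsin and a round 48 kDa for the substrate) are illustrative assumptions and not values reported here.

```python
def enzyme_mass_ug(substrate_mass_ug, substrate_mw_da, enzyme_mw_da,
                   enzyme_per_substrate=1.0 / 6.0):
    """Mass of enzyme (µg) needed for a given enzyme:substrate molar ratio.

    Moles of substrate = mass / molecular weight; moles of enzyme are that
    amount times the molar ratio; the enzyme mass follows from its own
    molecular weight. Units cancel, so µg in gives µg out.
    """
    if substrate_mw_da <= 0 or enzyme_mw_da <= 0:
        raise ValueError("molecular weights must be positive")
    substrate_umol = substrate_mass_ug / substrate_mw_da  # µg / (g/mol) = µmol
    enzyme_umol = substrate_umol * enzyme_per_substrate
    return enzyme_umol * enzyme_mw_da                      # µmol * (g/mol) = µg


if __name__ == "__main__":
    # Assumed values for illustration only: 100 µg of a 48 kDa substrate,
    # trypsin taken as roughly 23.3 kDa.
    print(enzyme_mass_ug(100.0, 48_000.0, 23_300.0))
```

Because the ratio is molar, a substrate that is larger than the enzyme needs proportionally less enzyme by mass than a 1:6 weight ratio would suggest.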
Bands were transferred from the SDS-PAGE gel to a PVDF membrane and stained with 0.1% Coomassie Brilliant Blue R-250, 40% methanol, and 10% acetic acid until bands were visible. The membrane was then washed with water and dried, and EspB cleavage products were cut out. The first ten amino acids were determined by Edman sequencing at the Plateforme Protéomique PISSARO IRIB at the Université de Rouen.
Circular dichroism spectroscopy
The CD spectra of 5 µM EspB279-460 were recorded either in 50 mM phosphate (pH 8.0), 50 mM NaCl or 10 mM acetate (pH 5.5), 50 mM NaCl at 25 °C in the far-UV region using a Jasco J-1500 CD spectropolarimeter (JASCO Analytical Instruments) in a 0.1-cm path-length cell. Spectra correspond to the average of five repetitive scans acquired every 1 nm with 5-s average time per point and 1-nm band pass. Temperature was regulated with a Peltier temperature-controlled cell holder. Data were corrected by subtracting the CD signal of the buffer over the same wavelength region. The effect of 2,2,2-trifluoroethanol (TFE) was recorded using the aforementioned phosphate buffer. Secondary structure content was estimated by deconvolution using the program BeStSel (Micsonai et al., 2018).
Data availability
The final maps as well as the half-maps and masks will be deposited in EMPIAR. The refined M. tuberculosis and M. marinum models will be deposited in the Protein Data Bank.
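The circular dichroism processing described in the methods above (averaging repeated scans, subtracting the buffer signal) can be sketched as follows, together with the standard conversion from raw ellipticity in millidegrees to mean residue ellipticity. The function names are hypothetical; the residue count of 182 used in the example is simply 460 - 279 + 1 and is an assumption, since the expressed construct may carry extra residues. This is an illustration only and not the authors' processing script.

```python
def average_scans(scans):
    """Point-by-point average of repeated scans (each a list of mdeg values)."""
    if not scans:
        raise ValueError("at least one scan is required")
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


def subtract_buffer(sample_mdeg, buffer_mdeg):
    """Subtract the buffer spectrum recorded over the same wavelengths."""
    if len(sample_mdeg) != len(buffer_mdeg):
        raise ValueError("sample and buffer spectra must have the same length")
    return [s - b for s, b in zip(sample_mdeg, buffer_mdeg)]


def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert ellipticity (mdeg) to mean residue ellipticity.

    [theta] = theta_mdeg / (10 * c * l * n), in deg cm^2 dmol^-1, with c in
    mol/L, l in cm and n the number of residues (some conventions use the
    number of peptide bonds, n - 1, instead).
    """
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


if __name__ == "__main__":
    # Made-up three-point spectra, for illustration only.
    scans = [[-9.0, -8.0, -2.0], [-9.4, -8.2, -2.2]]
    buffer = [-0.1, -0.1, -0.1]
    corrected = subtract_buffer(average_scans(scans), buffer)
    # 5 µM protein, 0.1 cm cell, assumed 182 residues.
    print([mean_residue_ellipticity(t, 5e-6, 0.1, 182) for t in corrected])
```

Expressing spectra per residue makes constructs of different lengths and concentrations directly comparable before secondary structure estimation.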
Acknowledgments
We thank Paul van Schayck (UM) for indispensable SerialEM and IT support; the Microscopy CORE Lab (UM) for their technical and scientific support; Yue Zhang (UM) for help in model refinement; Chris Lewis (UM) for help with the tomograms; Laurent Coquet (Université de Rouen, France) for Edman sequencing; Florence Pojer and Stewart Cole (Global Health Institute, Lausanne, Switzerland) for initial sample aliquots and preliminary studies; Ron Heeren and Shane Ellis (UM) for native mass spectrometry support; and Hang Nguyen (UM) for critical reading of the manuscript. This research received funding from the Netherlands Organisation for Scientific Research (NWO) in the framework of the Fund New Chemical Innovations, numbers 731.016.407 and 184.034.014, and from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No 766970 Q-SORT. This research is also part of the M4I research programme supported by the Dutch Province of Limburg through the LINK programme.
Author Contributions
AG and RBGR designed the study and wrote the manuscript. AG, VV, YG, GT, AS, NSP and AM performed the experiments. AG, GT, NSP and RBGR analysed the data. CLP, PJP and RBGR supervised the project. All authors read and approved the final manuscript.
Declaration of Interests
The authors declare no competing interests.
References
Abdallah AM, Bestebroer J, Savage ND, de Punder K, van Zon M, Wilson L, Korbee CJ, van der Sar AM, Ottenhoff TH, van der Wel NN et al (2011) Mycobacterial secretion systems ESX-1 and ESX-5 play distinct roles in host cell death and inflammasome activation. J Immunol 187: 4744-4753 Abdallah AM, Gey van Pittius NC, Champion PA, Cox J, Luirink J, Vandenbroucke-Grauls CM, Appelmelk BJ, Bitter W (2007) Type VII secretion--mycobacteria show the way. Nat Rev Microbiol 5: 883-891 Afonine PV, Klaholz BP, Moriarty NW, Poon BK, Sobolev OV, Terwilliger TC, Adams PD, Urzhumtsev A (2018) New tools for the analysis and validation of cryo-EM maps and atomic models. Acta Crystallographica Section D 74: 814-840 Ates LS, Ummels R, Commandeur S, van de Weerd R, Sparrius M, Weerdenburg E, Alber M, Kalscheuer R, Piersma SR, Abdallah AM et al (2015) Essential Role of the ESX-5 Secretion System in Outer Membrane Permeability of Pathogenic Mycobacteria. PLoS Genet 11: e1005190 Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA (2001) Electrostatics of nanosystems: application to microtubules and the ribosome. Proc Natl Acad Sci U S A 98: 10037-10041 Beckham KS, Ciccarelli L, Bunduc CM, Mertens HD, Ummels R, Lugmayr W, Mayr J, Rettel M, Savitski MM, Svergun DI et al (2017) Structure of the mycobacterial ESX-5 type VII secretion system membrane complex by single-particle analysis. Nat Microbiol 2: 17047
Beckham KS, Ritter C, Chojnowski G, Mullapudi E, Rettel M, Savitski MM, Mortensen SA, Kosinski J, Wilmanns M (2020) Structure of the mycobacterial ESX-5 Type VII Secretion System hexameric pore complex. bioRxiv Bitter W, Houben EN, Bottai D, Brodin P, Brown EJ, Cox JS, Derbyshire K, Fortune SM, Gao LY, Liu J et al (2009) Systematic genetic nomenclature for type VII secretion systems. PLoS Pathog 5: e1000507 Brennan PJ, Nikaido H (1995) The envelope of mycobacteria. Annu Rev Biochem 64: 29-63 Bunduc CM, Bitter W, Houben ENG (2020a) Structure and Function of the Mycobacterial Type VII Secretion Systems. Annu Rev Microbiol Bunduc CM, Fahrenkamp D, Wald J, Ummels R, Bitter W, Houben EN, Marlovits TC (2020b) Structure and dynamics of the ESX-5 type VII secretion system of Mycobacterium tuberculosis. BioRxiv Burggraaf MJ, Ates LS, Speer A, van der Kuij K, Kuijl C, Bitter W (2019) Optimization of secretion and surface localization of heterologous OVA protein in mycobacteria by using LipY as a carrier. Microb Cell Fact 18: 44 Carlsson F, Joshi SA, Rangell L, Brown EJ (2009) Polar localization of virulence-related Esx-1 secretion in mycobacteria. PLoS Pathog 5: e1000285 Cascioferro A, Delogu G, Colone M, Sali M, Stringaro A, Arancia G, Fadda G, Palu G, Manganelli R (2007) PE is a functional domain responsible for protein translocation and localization on mycobacterial cell wall. Mol Microbiol 66: 1536-1547 Chen JM, Zhang M, Rybniker J, Boy-Rottger S, Dhar N, Pojer F, Cole ST (2013) Mycobacterium tuberculosis EspB binds phospholipids and mediates EsxA-independent virulence.
Mol Microbiol 89: 1154-1166 Conrad WH, Osman MM, Shanahan JK, Chu F, Takaki KK, Cameron J, Hopkinson-Woolley D, Brosch R, Ramakrishnan L (2017) Mycobacterial ESX-1 secretion system mediates host cell lysis through bacterium contact-dependent gross membrane disruptions. Proc Natl Acad Sci U S A 114: 1371-1376 De Leon J, Jiang G, Ma Y, Rubin E, Fortune S, Sun J (2012) Mycobacterium tuberculosis ESAT-6 exhibits a unique membrane-interacting activity that is not found in its ortholog from non-pathogenic Mycobacterium smegmatis. J Biol Chem 287: 44184-44191 Dolinsky TJ, Nielsen JE, McCammon JA, Baker NA (2004) PDB2PQR: an automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Res 32: W665-667 Dulberger CL, Rubin EJ, Boutte CC (2020) The mycobacterial cell envelope - a moving target. Nat Rev Microbiol 18: 47-59 Ekiert DC, Cox JS (2014) Structure of a PE-PPE-EspG complex from Mycobacterium tuberculosis reveals molecular specificity of ESX protein secretion. Proc Natl Acad Sci U S A 111: 14758-14763 Emsley P, Cowtan K (2004) Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60: 2126-2132 Emsley P, Lohkamp B, Scott WG, Cowtan K (2010) Features and development of Coot. Acta Crystallographica Section D 66: 486-501 Famelis N, Rivera-Calzada A, Degliesposti G, Wingender M, Mietrach N, Skehel JM, Fernandez-Leiro R, Bottcher B, Schlosser A, Llorca O et al (2019) Architecture of the mycobacterial type VII secretion system.
Nature Ferluga J, Yasmin H, Al-Ahdal MN, Bhakta S, Kishore U (2020) Natural and trained innate immunity against Mycobacterium tuberculosis. Immunobiology 225: 151951 Flint JL, Kowalski JC, Karnati PK, Derbyshire KM (2004) The RD1 virulence locus of Mycobacterium tuberculosis regulates DNA transfer in Mycobacterium smegmatis. Proc Natl Acad Sci U S A 101: 12598-12603 Franz J, Lelle M, Peneva K, Bonn M, Weidner T (2016) SAP(E) - A cell-penetrating polyproline helix at lipid interfaces. Biochim Biophys Acta 1858: 2028-2034 Gey Van Pittius NC, Gamieldien J, Hide W, Brown GD, Siezen RJ, Beyers AD (2001) The ESAT-6 gene cluster of Mycobacterium tuberculosis and other high G+C Gram-positive bacteria. Genome Biol 2: RESEARCH0044 Goddard TD, Huang CC, Meng EC, Pettersen EF, Couch GS, Morris JH, Ferrin TE (2018) UCSF ChimeraX: Meeting modern challenges in visualization and analysis. Protein Sci 27: 14-25 Gray TA, Clark RR, Boucher N, Lapierre P, Smith C, Derbyshire KM (2016) Intercellular communication and conjugation are mediated by ESX secretion systems in mycobacteria. Science 354: 347-350 Henderson R (1995) The potential and limitations of neutrons, electrons and X-rays for atomic resolution microscopy of unstained biological molecules. Q Rev Biophys 28: 171-193 Houben D, Demangel C, van Ingen J, Perez J, Baldeon L, Abdallah AM, Caleechurn L, Bottai D, van Zon M, de Punder K et al (2012a) ESX-1-mediated translocation to the cytosol controls virulence of mycobacteria. Cell Microbiol 14: 1287-1298 Houben EN, Bestebroer J, Ummels R, Wilson L, Piersma SR, Jimenez CR, Ottenhoff TH, Luirink J, Bitter W (2012b) Composition of the type VII secretion system membrane complex. Mol Microbiol 86: 472-484 Joosten RP, Salzemann J, Bloch V, Stockinger H, Berglund AC, Blanchet C, Bongcam-Rudloff E, Combet C, Da Costa AL, Deleage G et al (2009) PDB_REDO: automated re-refinement of X-ray structure models in the PDB. 
J Appl Crystallogr 42: 376-384 Kapust RB, Tozser J, Fox JD, Anderson DE, Cherry S, Copeland TD, Waugh DS (2001) Tobacco etch virus protease: mechanism of autolysis and rational design of stable mutants with wild-type catalytic proficiency. Protein Eng 14: 993-1000 Korotkova N, Freire D, Phan TH, Ummels R, Creekmore CC, Evans TJ, Wilmanns M, Bitter W, Parret AH, Houben EN et al (2014) Structure of the Mycobacterium tuberculosis type VII secretion system chaperone EspG5 in complex with PE25-PPE41 dimer. Mol Microbiol 94: 367-382 Korotkova N, Piton J, Wagner JM, Boy-Rottger S, Japaridze A, Evans TJ, Cole ST, Pojer F, Korotkov KV (2015) Structure of EspB, a secreted substrate of the ESX-1 secretion system of Mycobacterium tuberculosis. J Struct Biol 191: 236-244 Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R et al (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947-2948 Liu X, Lieberman J (2020) Knocking 'em Dead: Pore-Forming Proteins in Immune Defense. Annu Rev Immunol 38: 455-485 Lodes MJ, Dillon DC, Mohamath R, Day CH, Benson DR, Reynolds LD, McNeill P, Sampaio DP, Skeiky YA, Badaro R et al (2001) Serological expression cloning and immunological evaluation of MTB48, a novel Mycobacterium tuberculosis antigen. J Clin Microbiol 39: 2485-2493 Lou Y, Rybniker J, Sala C, Cole ST (2017) EspC forms a filamentous structure in the cell envelope of Mycobacterium tuberculosis and impacts ESX-1 secretion.
Mol Microbiol 103: 26-38 Luo P, Baldwin RL (1997) Mechanism of helix induction by trifluoroethanol: a framework for extrapolating the helix-forming properties of peptides from trifluoroethanol/water mixtures back to water. Biochemistry 36: 8413-8421 Ma Y, Keil V, Sun J (2015) Characterization of Mycobacterium tuberculosis EsxA membrane insertion: roles of N- and C-terminal flexible arms and central helix-turn-helix motif. J Biol Chem 290: 7314-7322 Marty MT, Baldwin AJ, Marklund EG, Hochberg GK, Benesch JL, Robinson CV (2015) Bayesian deconvolution of mass and ion mobility spectra: from binary interactions to polydisperse ensembles. Anal Chem 87: 4370-4376 Mastronarde DN (2005) Automated electron microscope tomography using robust prediction of specimen movements. J Struct Biol 152: 36-51 McLaughlin B, Chon JS, MacGurn JA, Carlsson F, Cheng TL, Cox JS, Brown EJ (2007) A mycobacterium ESX-1-secreted virulence factor with unique requirements for export. PLoS Pathog 3: e105 Micsonai A, Wien F, Bulyaki E, Kun J, Moussong E, Lee YH, Goto Y, Refregiers M, Kardos J (2018) BeStSel: a web server for accurate protein secondary structure prediction and fold recognition from the circular dichroism spectra. Nucleic Acids Res 46: W315-W322 Noble AJ, Wei H, Dandey VP, Zhang Z, Tan YZ, Potter CS, Carragher B (2018) Reducing effects of particle adsorption to the air-water interface in cryo-EM. Nat Methods 15: 793-795 Ohol YM, Goetz DH, Chan K, Shiloh MU, Craik CS, Cox JS (2010) Mycobacterium tuberculosis MycP1 protease plays a dual role in regulation of ESX-1 secretion and virulence. Cell Host Microbe 7: 210-220 Olsson MH, Sondergaard CR, Rostkowski M, Jensen JH (2011) PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions. J Chem Theory Comput 7: 525-537 Phan TH, Ummels R, Bitter W, Houben EN (2017) Identification of a substrate domain that determines system specificity in mycobacterial type VII secretion systems. 
Sci Rep 7: 42704 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2021.01.02.425093doi: bioRxiv preprint https://doi.org/10.1101/2021.01.02.425093 http://creativecommons.org/licenses/by-nc-nd/4.0/ 27 Phan TH, van Leeuwen LM, Kuijl C, Ummels R, van Stempvoort G, Rubio-Canalejas A, Piersma SR, Jimenez CR, van der Sar AM, Houben ENG et al (2018) EspH is a hypervirulence factor for Mycobacterium marinum and essential for the secretion of the ESX-1 substrates EspE and EspF. PLoS Pathog 14: e1007247 Piton J, Pojer F, Wakatsuki S, Gati C, Cole ST (2020) High resolution CryoEM structure of the ring- shaped virulence factor EspB from Mycobacterium tuberculosis. J Struct Biol: X 4 Poweleit N, Czudnochowski N, Nakagawa R, Trinidad DD, Murphy KC, Sassetti CM, Rosenberg OS (2019) The structure of the endogenous ESX-3 secretion system. Elife 8 Pym AS, Brodin P, Majlessi L, Brosch R, Demangel C, Williams A, Griffiths KE, Marchal G, Leclerc C, Cole ST (2003) Recombinant BCG exporting ESAT-6 confers enhanced protection against tuberculosis. Nat Med 9: 533-539 Rosenberg OS, Dovala D, Li X, Connolly L, Bendebury A, Finer-Moore J, Holton J, Cheng Y, Stroud RM, Cox JS (2015) Substrates Control Multimerization and Activation of the Multi-Domain ATPase Motor of Type VII Secretion. Cell 161: 501-512 Ruan J, Xia S, Liu X, Lieberman J, Wu H (2018) Cryo-EM structure of the gasdermin A3 membrane pore. Nature 557: 62-67 Rucker AL, Creamer TP (2002) Polyproline II helical structure in protein unfolded states: lysine peptides revisited. 
Protein Sci 11: 980-985 Sani M, Houben EN, Geurtsen J, Pierson J, de Punder K, van Zon M, Wever B, Piersma SR, Jimenez CR, Daffe M et al (2010) Direct visualization by cryo-EM of the mycobacterial capsular layer: a labile structure containing ESX-1-secreted proteins. PLoS Pathog 6: e1000794 Schaberg T, Rebhan K, Lode H (1996) Risk factors for side-effects of isoniazid, rifampin and pyrazinamide in patients hospitalized for pulmonary tuberculosis. Eur Respir J 9: 2026-2030 Scheres SH (2012) RELION: implementation of a Bayesian approach to cryo-EM structure determination. J Struct Biol 180: 519-530 Scheres SH, Chen S (2012) Prevention of overfitting in cryo-EM structure determination. Nat Methods 9: 853-854 Siegrist MS, Unnikrishnan M, McConnell MJ, Borowsky M, Cheng TY, Siddiqi N, Fortune SM, Moody DB, Rubin EJ (2009) Mycobacterial Esx-3 is required for mycobactin-mediated iron acquisition. Proc Natl Acad Sci U S A 106: 18792-18797 Simeone R, Bobard A, Lippmann J, Bitter W, Majlessi L, Brosch R, Enninga J (2012) Phagosomal rupture by Mycobacterium tuberculosis results in toxicity and host cell death. PLoS Pathog 8: e1002507 Smart OS, Goodfellow JM, Wallace BA (1993) The pore dimensions of gramicidin A. Biophys J 65: 2455-2460 Solomonson M, Huesgen PF, Wasney GA, Watanabe N, Gruninger RJ, Prehna G, Overall CM, Strynadka NC (2013) Structure of the mycosin-1 protease from the mycobacterial ESX-1 protein type VII secretion system. J Biol Chem 288: 17782-17790 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. 
; https://doi.org/10.1101/2021.01.02.425093doi: bioRxiv preprint https://doi.org/10.1101/2021.01.02.425093 http://creativecommons.org/licenses/by-nc-nd/4.0/ 28 Solomonson M, Setiaputra D, Makepeace KAT, Lameignere E, Petrotchenko EV, Conrady DG, Bergeron JR, Vuckovic M, DiMaio F, Borchers CH et al (2015) Structure of EspB from the ESX-1 type VII secretion system and insights into its export mechanism. Structure 23: 571-583 Stanley SA, Johndrow JE, Manzanillo P, Cox JS (2007) The Type I IFN response to infection with Mycobacterium tuberculosis requires ESX-1-mediated secretion and contributes to pathogenesis. J Immunol 178: 3143-3152 Tan YZ, Baldwin PR, Davis JH, Williamson JR, Potter CS, Carragher B, Lyumkis D (2017) Addressing preferred specimen orientation in single-particle cryo-EM through tilting. Nat Methods 14: 793-796 Theillet FX, Kalmar L, Tompa P, Han KH, Selenko P, Dunker AK, Daughdrill GW, Uversky VN (2013) The alphabet of intrinsic disorder: I. Act like a Pro: On the abundance and roles of proline residues in intrinsically disordered proteins. Intrinsically Disord Proteins 1: e24360 Uversky VN (2013) The alphabet of intrinsic disorder: II. Various roles of glutamic acid in ordered and intrinsically disordered proteins. Intrinsically Disord Proteins 1: e24684 van der Wel N, Hava D, Houben D, Fluitsma D, van Zon M, Pierson J, Brenner M, Peters PJ (2007) M. tuberculosis and M. leprae translocate from the phagolysosome to the cytosol in myeloid cells. Cell 129: 1287-1298 van Winden VJ, Ummels R, Piersma SR, Jimenez CR, Korotkov KV, Bitter W, Houben EN (2016) Mycosins Are Required for the Stabilization of the ESX-1 and ESX-5 Type VII Secretion Membrane Complexes. mBio 7 Wang Q, Boshoff HIM, Harrison JR, Ray PC, Green SR, Wyatt PG, Barry CE, 3rd (2020) PE/PPE proteins mediate nutrient transport across the outer membrane of Mycobacterium tuberculosis. 
Science 367: 1147-1151 Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ (2009) Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189-1191 Williams CJ, Headd JJ, Moriarty NW, Prisant MG, Videau LL, Deis LN, Verma V, Keedy DA, Hintze BJ, Chen VB et al (2018) MolProbity: More and better reference data for improved all-atom structure validation. Protein Science 27: 293-315 Williamson ZA, Chaton CT, Ciocca WA, Korotkova N, Korotkov KV (2020) PE5-PPE4-EspG3 heterotrimer structure from mycobacterial ESX-3 secretion system gives insight into cognate substrate recognition by ESX systems. J Biol Chem World Health Organization, 2019. Global tuberculosis report. Geneva: World Health Organization. Xie NZ, Du QS, Li JX, Huang RB (2015) Exploring Strong Interactions in Proteins with Quantum Chemistry and Examples of Their Applications in Drug Design. PLoS One 10: e0137113 Xu J, Laine O, Masciocchi M, Manoranjan J, Smith J, Du SJ, Edwards N, Zhu X, Fenselau C, Gao LY (2007) A unique Mycobacterium ESX-1 protein co-secretes with CFP-10/ESAT-6 and is necessary for inhibiting phagosome maturation. Mol Microbiol 66: 787-800 Zhang K (2016) Gctf: Real-time CTF determination and correction. J Struct Biol 193: 1-12 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2021.01.02.425093doi: bioRxiv preprint https://doi.org/10.1101/2021.01.02.425093 http://creativecommons.org/licenses/by-nc-nd/4.0/ 29 Zhang Y, Tammaro R, Peters PJ, Ravelli RBG (2020) Could Egg White Lysozyme be Solved by Single Particle Cryo-EM? 
J Chem Inf Model 60: 2605-2613 Zheng SQ, Palovcak E, Armache JP, Verba KA, Cheng Y, Agard DA (2017) MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat Methods 14: 331-332 Zinke M, Sachowsky KAA, Öster C, Zinn-Justin S, Ravelli RBG, Schröder GF, Habeck M, Lange A (2020) Spinal Column Architecture of the Flexible SPP1 Bacteriophage Tail Tube. Nature Communications 11: 5759 Zivanov J, Nakane T, Forsberg BO, Kimanius D, Hagen WJ, Lindahl E, Scheres SH (2018) New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife 7 Zuber B, Chami M, Houssin C, Dubochet J, Griffiths G, Daffe M (2008) Direct visualization of the outer membrane of mycobacteria and corynebacteria in their native state. J Bacteriol 190: 5672-5680 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 4, 2021. ; https://doi.org/10.1101/2021.01.02.425093doi: bioRxiv preprint https://doi.org/10.1101/2021.01.02.425093 http://creativecommons.org/licenses/by-nc-nd/4.0/ 30 Figure legends Fig 1. Oligomerisation of EspB is promoted by an acidic environment. (A) Size exclusion chromatography profiles of M. tuberculosis EspB2-358 at 210 µM in 20 mM acetate buffer (pH 5.5), 150 mM NaCl and 20 mM Tris (pH 8.0), 150 mM NaCl. Void volume corresponds to 0.8 mL elution volume. (B) Oligomer/monomer ratios at different protein concentrations in conditions from panel (A). The absorbance values of the oligomer were taken at 1.14 mL while the monomer values were at 1.48 mL. (C–D) Presence of the different oligomer species from M. tuberculosis EspB2-348 at pH 5.5 and pH 8.5 obtained by native mass spectrometry. Fig 2. Impact of EspB C-terminus processing on oligomerisation. 
(A) Scheme of the different constructs used in this work, where EspB2-460 is in blue, EspB2-358 in orange (MycP1 cleavage site), EspB2-348 in grey, EspB2-287 in yellow and EspB7-278 in green. The structural model is from PDB ID 4XXX, while the C-terminal region is drawn as a representation of an unfolded protein. Arrows represent the end of each construct. (B) Size exclusion chromatograms of each construct corresponding to the colours in panel (A), resulting from 50 µL sample injection at 220 µM eluted in 20 mM acetate buffer (pH 5.5), 150 mM NaCl. Void volume corresponds to 0.8 mL elution volume.
Fig 3. Oligomerisation differences between EspB orthologues despite sharing similar tertiary structure. (A) Evaluation of the oligomerisation of EspB orthologues and PE25-PPE41 by cryo-electron microscopy. Scale bars represent 50 nm. (B) Different views of the structural alignment of EspBMtb (yellow – this work), EspBMmar (green – this work), EspBMsmeg (light blue – PDB ID 4WJ1), and PE25-PPE41 (orange – PDB ID 4W4K). (C) Multi-alignment of amino acid sequences of different species from the Mycobacterium genus, as well as the protein pair PE25-PPE41. Numbering and sequence identity are based on the sequence of M. tuberculosis. Rectangles denote residues involved in the oligomerisation of EspB. Alignment was generated using the ClustalW server, and the figure was created using Jalview software (Waterhouse et al., 2009). The colour scheme of ClustalX is used (Larkin et al., 2007).
Fig 4. Loss of the EspB preferential orientation by removal of its C-terminal residues.
(A–B) Representative micrographs of EspB2-348 with preferential orientation at 0° tilt angle or 40° tilt angle. (C) Representative micrograph of EspB2-287 with random orientation taken at 0° tilt angle. Insets correspond to the respective 2D classes. Scale bars in A–C represent 50 nm; scale bars in insets represent 5 nm.
Fig 5. Cryo-EM reconstruction of EspB2-287 heptamer complex. (A–B) Density map and structural model made with ChimeraX (Goddard et al., 2018), showing each monomer in different colours. For (A) and (B), the upper panels show the top views and the bottom panels show the side views. (C) Model and densities of intramolecular interaction at W176-Y81 and intermolecular interaction at Q48-Q164. Colours follow the conventional colouring code for chemical elements.
Fig 6. Characterisation of the EspB oligomer. (A–B) Electrostatic potential of the EspB oligomer at (A) acidic and (B) neutral pH. The protonation state was assigned by PROPKA (Olsson et al., 2011) and electrostatic calculations were generated by APBS (Baker et al., 2001) and PDB2PQR (Dolinsky et al., 2004). (C) The smallest inner diameter of the EspB oligomer is 40 Å, as calculated by HOLE (Smart et al., 1993). (D) Surface representation of amino acid hydrophobicity according to the Kyte-Doolittle scale (polar residues – purple, non-polar residues – gold). (E) High-resolution 2D class of EspB heptamer with extra density in the middle. (F) 2D projection of the 3D map obtained for the 7+1 EspB oligomer. (G) C1 3D map of 7+1 EspB oligomer with local symmetry applied to the heptamer ring. (H) C1 3D map of the 7+1 EspB oligomer with 8-fold local symmetry applied and models fitted to the map.
Fig 7. Putative pathways for the oligomerisation of EspB. In Model 1, EspB is cleaved at its C-terminus by the protease MycP1 in the periplasm of mycobacteria, leaving hydrophobic residues to insert into the outer membrane; an increase in the local concentration on the membrane leads to oligomerisation of EspB. In Model 2, secretion of EspB across the double membrane after MycP1 cleavage allows the protein to bind to either the phagosomal membrane or the external part of the outer membrane. In Model 3, after cleavage in the periplasm and secretion to the exterior of the bacterium, EspB undergoes a conformational change dissociating the PE and PPE domains and exposing hydrophobic residues that would allow the insertion into the membrane; while the PPE gets embedded into the membrane in an oligomeric form, the respective PE is able to interact with the PPE of a second molecule, forming a tubular structure. Different colours are used for each heptamer subunit. Regardless of which oligomerisation pathway EspB follows, oligomerised EspB is hypothesised to form part of the larger machinery that completes the inner-membrane complex of ESX-1.

Tables and their legends

Table 1. Constructs used in this study
Plasmid name | Species | Gene product | Concentration used for cryo-EM experiments
pAG01 | M. tuberculosis | 6×His-EspB 2-460 | 0.5 mg/mL
pAG02 | M. tuberculosis | 6×His-EspB 2-358 | 0.5 mg/mL
pAG03 | M. tuberculosis | 6×His-EspB 2-348 | 0.5 mg/mL
pAG04 | M. tuberculosis | 6×His-EspB 2-287 | 6 mg/mL
pAG05 | M. tuberculosis | 6×His-EspB 7-278 | -
pAG06 | M. tuberculosis | 6×His-MBP-EspB 279-460 | -
pAG07 | M. tuberculosis | 6×His-EspB 2-348 Q48A | -
pAG08 | M. tuberculosis | 6×His-EspB 2-348 N55C/T119C | -
pAG09 | M. marinum | 6×His-EspB 2-355 | 0.5 mg/mL
pAG10 | M. marinum | 6×His-EspB 2-286 | 8.7 mg/mL
pAG11 | M. haemophilum | 6×His-EspB 2-287 | 1 mg/mL
pAG12 | M. smegmatis | 6×His-EspB 2-407 | 10 mg/mL
pAG13 | M. tuberculosis | PE25 / 6×His-PPE41 | 5 mg/mL

Table 2. Statistics of cryo-EM data collection, reconstruction and structure refinement
 | EspB 2-348 M. tuberculosis | EspB 2-287 M. tuberculosis | EspB 2-286 M. marinum
Grid type | Quantifoil UltraAuFoil Au200 mesh R2/2 | Quantifoil UltraAuFoil Au300 mesh R1.2/1.3 | Quantifoil UltraAuFoil Au300 mesh R1.2/1.3
Microscope | TFS Tecnai Arctica | TFS Krios | TFS Krios
Camera | Falcon III | K3 electron counting | K3 electron counting
Automated Data Acquisition Software | SerialEM | EPU | EPU
Nominal magnification (k×) | 110 | 105 | 105
Physical pixel size (Å) | 0.935 | 0.834 | 0.834
Exposure time (s) | 43 | 1.8 | 1.8
Fluence (e− Å−2) | 40 | 40 | 40
Micrographs | 1457 | 2334 | 2421
#fractions | 50 | 40 | 40
Particles | 914683 | 484786 | 435505
Symmetry imposed | C7 | C7 | C7
Average resolution (Å) | 3.4 | 2.29 | 2.43
FSC threshold | 0.143 | 0.143 | 0.143
Map sharpening B factor (Å2) | −180 | −80 | −91
Refinement
Initial model used (PDB entry) | 4XXX
Model resolution (Å), FSC threshold | 2.2 | 3.2
Model composition
Atoms | 13125 | 12488
Hydrogen atoms |
Protein residues | 1617 | 1603
Waters | 301 | 0
B factors (Å2) |
R.m.s. deviations
Bond lengths (Å) | 0.009 | 0.010
Bond angles (°) | 1.087 | 1.347
Correlation coefficients
Mask | 0.89 | 0.66
Box | 0.77 | 0.52
Validation
MolProbity score | 1.71 | 1.81
Clashscore | 4.03 | 1.18
Rotamer outliers (%) | 4.86 | 4.32
Ramachandran plot
Favored (%) | 98.24 | 91.62
Allowed (%) | 1.32 | 7.94
Disallowed (%) | 0.44 | 0.44

Expanded View Figure legends

Fig EV1. Sequence alignment of EspB from different mycobacterial species. Numbering and sequence identity are based on the sequence of M. tuberculosis. Alignment was generated using the ClustalW server, and the figure was created using Jalview software (Waterhouse et al., 2009). The colour scheme of ClustalX is used (Larkin et al., 2007).
Fig EV2. EspB preferential orientation caused by an interaction with the air-water interface. (A–B) Tomogram slice of EspB2-348 with 27 nm thickness in X,Y and X,Z orientation, respectively.
Fig EV3. Cryo-EM analysis of EspB structure. Gold-standard Fourier shell correlation (FSC) plot of EspB2-287 from M. tuberculosis (A) and EspB2-286 from M. marinum (B). (C) Quality of cryo-EM–derived density map.
Selected regions showing the fit of the derived atomic model to the cryo-EM density map (black mesh). (D) Size exclusion chromatograms of EspB2-348 from M. tuberculosis and mutants that affect oligomerisation.
Fig EV4. Oligomerisation of EspB is independent of the integrity of the PE-PPE linker. (A) Trypsin digestion of different EspB constructs from M. tuberculosis over 1–4 h. (B) Structural model of an EspB monomer (PDB ID 4XXX) showing the PE-region (gold), PPE-region (grey) and the trypsin cleavage site (arrow, residues R121–V112). (C, D) Native mass spectrometry of the trypsin-digested sample, raw and deconvoluted data. Colour coding as in panel (B). Inset table compares the respective mass of the fragments calculated from the sequence and native mass spectrometry. (E) SEC (top) and SDS-PAGE (bottom) of undigested (black) and trypsin-digested (red) EspB2-348.
Fig EV5. Characterisation of the C-terminal region of EspB. (A) Kyte-Doolittle hydrophobicity plot of residues 280–460 of EspB from M. tuberculosis. Inset shows the degree of hydrophobicity of residues 280–360 from different species. A window size of 9 was used. (B, C) Far UV circular dichroism spectra of M. tuberculosis EspB279-460 at different pH and TFE concentrations. Inset in (B) shows the spectrum-difference between pH 5.5 and pH 8.0.
Fig EV6. Higher-order oligomer formation. Size exclusion chromatography profiles of EspB2-348 from M. tuberculosis (20 mg/mL) injected onto a Superdex200 Increase 10/300 GL.
Inset corresponds to a Blue-Native PAGE of the SEC fractions highlighted in red.
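The legend of Fig EV5A reports a Kyte-Doolittle hydrophobicity plot computed with a window size of 9. As an illustration only, a sliding-window profile of this kind could be sketched as follows. The sketch assumes the standard Kyte and Doolittle hydropathy scale and a plain unweighted mean over each window; the function name and the example sequence are hypothetical, and this is not necessarily the tool or the exact averaging scheme used for the figure.

```python
# Standard Kyte-Doolittle hydropathy values, one per amino acid (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kyte_doolittle_profile(sequence, window=9):
    """Return (centre_position, mean_hydropathy) for every full window.

    Positions are 1-based and refer to the residue at the centre of the
    window. Assumption: each window is scored by its unweighted mean.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on unknown residues
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the call; not a sequence from the paper.
    example = "MSLLDAHIPQLVASQSAFAAKAGLMRHTIGQ"
    for position, value in kyte_doolittle_profile(example, window=9):
        print(position, round(value, 2))
```

Positive values indicate hydrophobic stretches and negative values hydrophilic ones; a wider window smooths the profile at the cost of positional detail.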
10_1101-2021_01_03_425155 ---- Lithium ions display weak interaction with amyloid-beta (Aβ) peptides and have minor effects on their aggregation

Elina Berntsson1,2, Suman Paul1, Faraz Vosough1, Sabrina B. Sholts3, Jüri Jarvet1,4, Per M. Roos5,6, Andreas Barth1, Astrid Gräslund1, Sebastian K. T. S. Wärmländer1,*

1 Department of Biochemistry and Biophysics, Stockholm University, Sweden.
2 Department of Chemistry and Biotechnology, Tallinn University of Technology, Estonia.
3 Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA.
4 The National Institute of Chemical Physics and Biophysics, Tallinn, Estonia.
5 Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden.
6 Department of Clinical Physiology, Capio St. Göran Hospital, Stockholm, Sweden.

* Correspondence: seb@dbb.su.se; Tel.: +46 8 162444

Abstract: Alzheimer’s disease (AD) is an incurable disease and the main cause of age-related dementia worldwide, despite decades of research. Treatment of AD with lithium (Li) has shown promising results, but the underlying mechanism is unclear. The pathological hallmark of AD brains is deposition of amyloid plaques, consisting mainly of amyloid-β (Aβ) peptides aggregated into amyloid fibrils. The plaques also contain metal ions of e.g. Cu, Fe, and Zn, and such ions are known to interact with Aβ peptides and modulate their aggregation and toxicity. The interactions between Aβ peptides and Li+ ions have, however, not been well investigated. Here, we use a range of biophysical techniques to characterize in vitro interactions between Aβ peptides and Li+ ions.
We show that Li+ ions display weak and non-specific interactions with Aβ peptides, and have minor effects on Aβ aggregation. These results indicate that possible beneficial effects of Li on AD pathology are not likely caused by direct interactions between Aβ peptides and Li+ ions.

Key Words: Alzheimer’s disease; protein aggregation; Metal-protein binding; Neurodegeneration; Pharmaceutics

Running Title: Li+ ions have minor effects on Aβ aggregation

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.03.425155; this version posted January 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

INTRODUCTION

Alzheimer’s disease (AD) is still an incurable disease and the main cause of age-related dementia worldwide (Querfurth & LaFerla, 2010; Prince et al., 2015; Frozza et al., 2018), despite decades of research on putative drugs (Luo et al., 2013; Wärmländer et al., 2013; Decker & Munoz-Torrero, 2016; Kisby et al., 2019). In addition to signs of neuroinflammation and oxidative stress (Agostinho et al., 2010; Al-Hilaly et al., 2013; Wang et al., 2014; Heppner et al., 2015; Regen et al., 2017), AD brains display characteristic lesions in the form of intracellular neurofibrillary tangles, consisting of aggregated hyperphosphorylated tau proteins (Goedert, 2018; Gibbons et al., 2019), and extracellular amyloid plaques, consisting mainly of insoluble fibrillar aggregates of amyloid-β (Aβ) peptides (Glenner & Wong, 1984; Querfurth & LaFerla, 2010).
These Aβ fibrils and plaques are the end-product of an aggregation process (Querfurth & LaFerla, 2010; Luo et al., 2016; Selkoe & Hardy, 2016) that involves extra- and/or intracellular formation of intermediate, soluble, and likely neurotoxic Aβ oligomers (Luo et al., 2014; Selkoe & Hardy, 2016; Sengupta et al., 2016; Lee et al., 2017) that can spread from neuron to neuron via exosomes (Nath et al., 2012; Sardar Sinha et al., 2018).

The Aβ peptides comprise 37-43 residues and are intrinsically disordered in aqueous solution. They have limited solubility in water due to the hydrophobicity of the central and C-terminal Aβ segments, which may fold into a hairpin conformation upon aggregation (Abelein et al., 2014; Baronio et al., 2019). The charged N-terminal segment is hydrophilic and readily interacts with cationic molecules and metal ions (Luo et al., 2013; Luo et al., 2014; Tiiman et al., 2016; Wallin et al., 2016; Wallin et al., 2017; Owen et al., 2019; Wallin et al., 2020), while the hydrophobic C-terminal segment can interact with membranes where Aβ may exert its toxicity (Österlund et al., 2018; Wärmländer et al., 2019). The interactions between Aβ and metal ions are of particular interest (Duce et al., 2011; Wärmländer et al., 2013; Mital et al., 2015; Wärmländer et al., 2019; Wallin et al., 2020), as altered metal concentrations indicative of metal dyshomeostasis are a prominent feature in the brains and fluids of AD patients (Wang et al., 2015; Szabo et al., 2016), and because AD plaques contain elevated amounts of metal ions of e.g. Cu, Fe, and Zn (Beauchemin & Kisilevsky, 1998; Lovell et al., 1998; Miller et al., 2006).

Interestingly, although the role of metal ions in AD pathogenesis remains debated (Duce et al., 2011; Modgil et al., 2014; Chin-Chan et al., 2015; Mital et al., 2015; Adlard & Bush, 2018; Huat et al., 2019; Wärmländer et al., 2019), monovalent ions of the alkali metal lithium [i.e., Li+ ions] may provide beneficial effects to patients with neurodegenerative disorders such as amyotrophic lateral sclerosis (ALS) (Fornai et al., 2008; Morrison et al., 2013) or AD (Engel et al., 2008; Mauer et al., 2014; Sutherland & Duthie, 2015; Decker & Munoz-Torrero, 2016; Donix & Bauer, 2016; Morris & Berk, 2016; Kerr et al., 2018; Hampel et al., 2019; Kisby et al., 2019; Priebe & Kanzawa, 2020). Lithium salts are commonly used in psychiatric medication, even though it is not understood how the Li+ ions affect the molecular mechanisms underlying the psychiatric disorders (Dell'Osso et al., 2016). Unlike other pharmaceuticals, Li+ is widely non-selective in its biochemical effects, possibly due to its general propensity to inhibit the many enzymes that have magnesium as a cofactor (Ge & Jakobsson, 2018).

Cell and animal studies have provided clues regarding how Li+ ions may affect the AD disease pathology (Nery et al., 2014; Sofola-Adesakin et al., 2014; Zhao et al., 2014; Budni et al., 2017; Habib et al., 2017; Cardillo et al., 2018; Kerr et al., 2018; Habib et al., 2019; Rocha et al., 2020; Wilson et al., 2020).
Due to its ability to down-regulate translation, Li+ caused a reduction in protein synthesis and thus Aβ42 levels in an adult-onset Drosophila model of AD (Sofola-Adesakin et al., 2014). Li+ reduces Aβ production by affecting the processing/cleavage of the amyloid-β precursor protein (AβPP) in cells and mice, presumably by down-regulating the levels of phosphorylated AβPP. A main target of Li+ is the glycogen synthase kinase 3-beta (GSK-3β) (Ryves & Harwood, 2001), which is implicated in AD pathogenesis (Caccamo et al., 2007; Forlenza et al., 2014). In AβPP-transgenic mice, reduced activation of the GSK-3β enzyme was associated with decreased levels of AβPP phosphorylation that resulted in decreased Aβ production (Rockenstein et al., 2007). One study on mice with traumatic brain injury reported that Li+ treatment improved spatial learning and reduced Aβ production, possibly by reducing the levels of both AβPP and the AβPP-cleaving enzyme BACE1 (Yu et al., 2012). More recent mouse studies have reported that treatment with Li+ ions improved Aβ clearance from the brain (Pan et al., 2018), reduced oxidative stress levels (Xiang et al., 2020), improved spatial memory (Habib et al., 2019), and reduced the amounts of Aβ plaques and phosphorylated tau while also improving spatial memory (Liu et al., 2020).

However, only a few studies have investigated how Li+ ions could affect the molecular events that appear to underlie AD pathology, such as Aβ aggregation. One study showed that increased ionic strength, i.e. 150 mM of NaF, NaCl, or LiCl, significantly accelerated the kinetics of Aβ amyloid formation by promoting surface-catalyzed secondary nucleation reactions (Abelein et al., 2016). Another recent study used molecular dynamics simulations to find small but distinct differences in how the three monovalent ions Li+, Na+, and K+ interact with Aβ oligomers (Huraskin & Horn, 2019). The therapeutic effect of Li+ on Aβ plaque quality and toxicity has been reported in mice, where Li+ treatment before pathology onset induced smaller plaques with higher Aβ compaction, reduced oligomeric-positive halo, and attenuated capacity to induce neuronal damage (Trujillo-Estrada et al., 2013). One hypothesis is that these neuroprotective effects of Li+ could be mediated by modifications of the plaque toxicity through the astrocytic release of heat shock proteins (Trujillo-Estrada et al., 2013).

Here, we use a range of biophysical techniques to characterize the in vitro interactions between Li+ ions and Aβ peptides, and how such interactions affect the Aβ amyloid aggregation processes and fibril formation.

MATERIALS AND METHODS

Sample preparation

Recombinant Aβ40 peptides were purchased from AlexoTech AB (Umeå, Sweden) in either unlabeled or uniformly 15N-labeled form. The lyophilized peptides were stored at -80 °C. Samples were dissolved to monomeric form immediately before each measurement. The peptides were first dissolved in 10 mM NaOH, and then sonicated in an ice-bath to avoid having pre-formed aggregates in the sample solutions.
Next, the samples were diluted in 20 mM buffer of either sodium phosphate or MES (2-[N-morpholino]ethanesulfonic acid). All preparation steps were performed on a bed of ice, and the peptide concentration was determined by weight. LiCl salt was purchased from Merck & Co. Inc. (USA), and MES hydrate was purchased from Sigma-Aldrich (USA).

Synthetic Aβ42 peptides were purchased from JPT Peptide Technologies (Germany) and used to prepare monomeric solutions via size exclusion chromatography. 1 mg of lyophilized Aβ42 powder was dissolved in 250 mL DMSO. A Sephadex G-250 HiTrap desalting column (GE Healthcare, Uppsala) was equilibrated with 5 mM NaOH solution (pH=12.3), and washed with 10-15 mL of 5 mM NaOD solution, pD=12.7 (Glasoe & Long, 1960). The peptide solution in DMSO was applied to the column, followed by injection of 1.25 mL of 5 mM NaOD. Collection of peptide fractions in 5 mM NaOD on ice was started at a 1 mg/mL flow rate. Ten fractions of 1 mL volumes were collected in 1.5 mL Eppendorf tubes. The absorbance for each fraction at 280 nm was measured with a NanoDrop instrument (Eppendorf, Germany), and peptide concentrations were determined using a molar extinction coefficient of 1280 M-1 cm-1 for the single Tyr in Aβ42 (Edelhoch, 1967). The peptide fractions were flash-frozen in liquid nitrogen, covered with argon gas on top in 1.5 mL Eppendorf tubes, and stored at -80 °C until used.
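The absorbance-based concentration determination described above follows the Beer-Lambert law, c = A / (ε · l). The short Python sketch below illustrates the arithmetic with the extinction coefficient given in the text. It assumes that the absorbance is reported for a 1 cm path length, which the text does not state, and the absorbance readings and the function name are hypothetical illustrations, not measured data.

```python
# Beer-Lambert estimate of peptide concentration from absorbance at 280 nm.
# EPSILON_TYR is the value quoted in the text; everything else is illustrative.
EPSILON_TYR = 1280.0  # M^-1 cm^-1, single Tyr in Abeta42


def peptide_concentration_uM(a280, path_cm=1.0, epsilon=EPSILON_TYR):
    """Return the peptide concentration in micromolar.

    path_cm=1.0 is an assumption (absorbance normalized to a 1 cm path).
    """
    if path_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    molar = a280 / (epsilon * path_cm)  # mol/L
    return molar * 1e6                  # convert to micromolar


if __name__ == "__main__":
    # Hypothetical absorbance readings for three column fractions.
    for a280 in (0.064, 0.128, 0.256):
        print(f"A280 = {a280:.3f} -> {peptide_concentration_uM(a280):.0f} uM")
```

Because a single tyrosine absorbs weakly, low-concentration fractions give small absorbance values, so the estimate is sensitive to baseline offsets in the reading.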
Sodium dodecyl sulfate (SDS)-stabilized Aβ42 oligomers of two well-defined sizes (approximately tetramers and dodecamers) were prepared according to a previously published protocol (Barghorn et al., 2005), but in D2O, at 4-fold lower peptide concentration and without the original dilution step (Vosough & Barth, manuscript). The reaction mixtures (100 µM Aβ42 in PBS and containing 0.05% or 0.2% SDS) were incubated together with 0-10 mM LiCl at 37 °C for 24 hours, and then flash-frozen in liquid nitrogen and stored at -20 °C for later analysis.

Thioflavin T kinetics

A FLUOstar Omega microplate reader (BMG LABTECH, Germany) was used to monitor the effect of Li+ ions on Aβ aggregation kinetics. Monomeric Aβ40 peptides (20 µM) were incubated in 20 mM MES buffer, pH 7.35, together with different concentrations of LiCl (0, 20 µM, 200 µM, 2000 µM) and 50 µM Thioflavin T (ThT). ThT is a fluorescent benzothiazole dye, and its fluorescence intensity increases when bound to amyloid aggregates (Gade Malmos et al., 2017). Samples were placed in a 96-well plate with a sample volume of 100 µL per well, and four replicates were measured per Li+ concentration. The temperature was +37 °C, the ThT dye was excited at 440 nm, and the ThT fluorescence emission at 480 nm was measured every five minutes. Each five-minute cycle involved 140 seconds of shaking at 200 rpm. The samples were incubated for a total of 15 hours, and the assay was repeated three times. To derive parameters for the aggregation kinetics, the ThT fluorescence curves were fitted to the sigmoidal equation 1:

$$F(t) = F_0 + m_0 t + \frac{F_\infty + m_\infty t}{1 + e^{-(t - t_{1/2})/\tau}}$$ (Eq. 1)

where F0 and F∞ are the intercepts of the initial and final fluorescence intensity baselines, m0 and m∞ are the slopes of the initial and final baselines, t½ is the time needed to reach halfway through the elongation phase (i.e., aggregation half-time), and τ is the elongation time constant (Gade Malmos et al., 2017). The apparent maximum rate constant, rmax, for the growth of fibrils is given by 1/τ.

Tyrosine fluorescence quenching

The binding affinity between Aβ40 peptides and Li+ ions was evaluated from Cu2+/Li+ binding competition experiments (Wallin et al., 2020). The affinity of the Cu2+·Aβ40 complex was measured via the quenching effect of Cu2+ ions on the intrinsic fluorescence of Y10, which is the only fluorophore in native Aβ peptides. The fluorescence emission intensity at 305 nm (excitation wavelength 276 nm) was recorded at 20 °C using a Jobin Yvon Horiba Fluorolog 3 fluorescence spectrophotometer (Longjumeau, France). The titrations were carried out by consecutive additions of 0.8-3.2 µL aliquots of either 2, 10, or 50 mM stock solutions of CuCl2 to 800 µL of 10 µM Aβ40 in 20 mM MES buffer, pH 7.35, in a quartz cuvette with 4 mm path length. After each addition of CuCl2 the solution was stirred for 30 seconds before recording fluorescence emission spectra. Copper titrations were conducted for Aβ40 samples both in the absence and the presence of 1 mM LiCl. The dissociation constant of the Cu2+·Aβ40 complex was determined by fitting the Cu2+ titration data to equation 2:

(Eq. 2)

where I0 is the initial fluorescence intensity without Cu2+ ions, I∞ is the steady-state (saturated) intensity at the end of the titration series, [Aβ] is the peptide concentration, [Cu] is the concentration of added Cu2+ ions, KD is the dissociation constant of the Cu2+·Aβ40 complex, and k is a constant accounting for the concentration-dependent quenching effect induced by free (non-bound) Cu2+ ions that may collide with the Y10 residue (Lindgren et al., 2013). This model assumes a single binding site. As no corrections for buffer conditions are made, i.e. in terms of possible interactions between the metal ions and the buffer, the calculated dissociation constant should be considered to be apparent.

Atomic force microscopy imaging

Samples of 20 µM Aβ40 in 5 mM MES buffer (total volume 100 µL, pH 7.35) with either 0, 20 µM, 200 µM or 2 mM LiCl were put in small Eppendorf tubes and incubated for 72 hours at 37 °C under continuous shaking at 300 rpm. A droplet (1 µL) of incubated solution was then placed on a fresh silicon wafer (Siegert Wafer GmbH, Germany) and left to dry for 2 minutes. Next, 10 µL of Milli-Q H2O was carefully added to the semi-dried sample droplet and soaked up immediately with a lint-free wipe, to remove excess salts in a mild manner. The wafer was left to dry in a covered container to protect it from dust, and atomic force microscopy (AFM) images were recorded on the same day. A neaSNOM scattering-type near-field optical instrument (Neaspec GmbH, Germany) was used to collect the AFM images under tapping mode (Ω: 280 kHz, tapping amplitude 50-55 nm) using Pt/Ir-coated monolithic ARROW-NCPt Si tips (NanoAndMore GmbH, Germany) with tip radius <10 nm. Images were acquired on 2.5 x 2.5 µm scan-areas (200 x 200-pixel size) under optimal scan-speed (i.e., 2.5 ms/pixel).
The recorded images were minimally processed using the Gwyddion software where a basic plane leveling was performed (Nečas & Klapetek, 2012).

Nuclear magnetic resonance spectroscopy

An Avance 700 MHz nuclear magnetic resonance (NMR) spectrometer (Bruker Inc., USA) equipped with a cryoprobe was used to investigate possible interactions between Li+ ions and monomeric Aβ40 peptides at the atomic level. 2D 1H-15N-HSQC spectra of 92.4 µM monomeric 15N-labeled Aβ40 peptides were recorded at 5 °C with 90/10 H2O/D2O, either in 20 mM MES buffer at pH 7.35 or in 1x PBS buffer (137 mM NaCl, 2.7 mM KCl, and 10 mM phosphate pH 7.4), before and after additions of LiCl. Diffusion measurements were performed on a sample of 55 µM unlabeled monomeric Aβ40 peptide in 20 mM sodium phosphate buffer, 100% D2O, pD 7.5, at 5 °C, before and after additions of LiCl dissolved in D2O. The diffusion experiments employed pulsed field gradients (PFGs) according to previously described methods (Danielsson et al., 2002), and methyl group signals between 0.7-0.4 ppm were integrated, evaluated, and corrected for the viscosity of D2O at 5 °C (Cho et al., 1999). All NMR data were processed with the Topspin version 3.6.2 software, and the HSQC crosspeak assignment for Aβ40 in buffer is known from previous studies (Danielsson et al., 2006).
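The PFG diffusion analysis is described above only by reference (Danielsson et al., 2002). As an illustration of how a diffusion coefficient can be extracted from integrated signal intensities, the sketch below assumes the standard Stejskal-Tanner attenuation, I(g) = I0 · exp(-D · γ² g² δ² (Δ - δ/3)), and fits ln(I) linearly against the gradient-dependent factor. This is an assumed analysis route, not necessarily the one used by the authors. The gradient strengths, the pulse timings δ and Δ, and the synthetic intensities are hypothetical values chosen for the example, and the viscosity correction mentioned in the text is not included.

```python
import numpy as np

GAMMA_1H = 2.675e8  # rad s^-1 T^-1, proton gyromagnetic ratio


def b_values(g, delta, big_delta, gamma=GAMMA_1H):
    """Stejskal-Tanner b factor (s/m^2) for gradient strengths g (T/m).

    delta is the gradient pulse length (s), big_delta the diffusion delay (s).
    """
    g = np.asarray(g, dtype=float)
    return (gamma * g * delta) ** 2 * (big_delta - delta / 3.0)


def fit_diffusion(g, intensity, delta, big_delta):
    """Least-squares fit of ln(I) = ln(I0) - D * b; returns D in m^2/s."""
    b = b_values(g, delta, big_delta)
    slope, _intercept = np.polyfit(b, np.log(intensity), 1)
    return -slope


if __name__ == "__main__":
    # Hypothetical acquisition parameters and noise-free synthetic decay.
    g = np.linspace(0.02, 0.5, 16)       # T/m
    delta, big_delta = 4e-3, 0.15        # s
    true_d = 6.0e-11                     # m^2/s, same order as the reported values
    signal = 100.0 * np.exp(-true_d * b_values(g, delta, big_delta))
    print(f"fitted D = {fit_diffusion(g, signal, delta, big_delta):.3e} m^2/s")
```

With real data, the integrated methyl-region intensities would replace the synthetic `signal` array, and a nonlinear fit with uncertainty estimates may be preferable when the signal-to-noise ratio is low at strong gradients.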
Circular dichroism spectroscopy

Circular dichroism (CD) spectra of 20 µM Aβ40 peptides in 20 mM sodium phosphate buffer, pH 7.35, were recorded at 20 °C using a Chirascan CD spectrometer (Applied Photophysics, UK) and a quartz cuvette with an optical path length of 2 mm. Measurements were done between 190 and 250 nm, with a step size of 1 nm and a sampling time of 4 s per data point. First, a spectrum was recorded for Aβ40 alone. Next, micelles of 50 mM SDS were added to create a membrane-mimicking environment. Finally, LiCl was titrated to the sample in steps up to a concentration of 512 µM.

Blue native polyacrylamide gel electrophoresis

Homogeneous solutions of 100 µM Aβ42 oligomers prepared in the presence of 0-10 mM Li+ ions were analyzed with blue native polyacrylamide gel electrophoresis (BN-PAGE) using the Invitrogen system. 4-16% Bis-Tris Novex gels (ThermoFisher Scientific, USA) were loaded with 10 µL of Aβ42 oligomer samples alongside the Amersham High Molecular Weight calibration kit for native electrophoresis (GE Healthcare, USA). The gels were run at 4 °C using the electrophoresis system according to the Invitrogen instructions (ThermoFisher Scientific, USA), and then stained using the Pierce Silver Staining kit according to the instructions (ThermoFisher Scientific, USA).
Infrared spectroscopy

Fourier-transformed infrared (FTIR) spectra of Aβ42 oligomers were recorded in transmission mode on a Tensor 37 FTIR spectrometer (Bruker Optics, Germany) equipped with a sample shutter and a liquid nitrogen-cooled MCT detector. The unit was continuously purged with dry air during the measurements. 8-10 µL of the 80 µM Aβ42 oligomer samples, containing 0-10 mM LiCl, were put between two flat CaF2 discs separated by a 50 µm plastic spacer covered with vacuum grease at the periphery. The assembled discs were mounted in a holder inside the instrument's sample chamber. The samples were allowed to sit for at least 15 minutes after closing the chamber lid, to avoid interference from CO2 and H2O vapor. FTIR spectra were recorded at room temperature in the 1900-800 cm-1 range, with 300 scans for both background and sample spectra, using a 6 mm aperture and a resolution of 2 cm-1. The light intensities above 2200 cm-1 and below 1500 cm-1 were blocked with a germanium filter and a cellulose membrane, respectively (Baldassarre & Barth, 2014). The spectra were analyzed and plotted with the OPUS 5.5 software, and second derivatives were calculated with a 17 cm-1 smoothing range.

RESULTS

ThT fluorescence: influence of Li+ ions on Aβ40 aggregation

The fluorescence intensity of the amyloid-marker molecule ThT was measured when 20 µM Aβ40 samples were incubated for 15 hours together with different concentrations of LiCl (Fig. 1). Fitting Eq. 1 to the ThT fluorescence curves yielded the kinetic parameters t1/2 (aggregation half-time) and rmax (maximum aggregation rate) (Fig. 1; Table 1). For 20 µM Aβ40 alone, the aggregation half-time is approximately 3.7 hours under the experimental conditions used, and the maximum aggregation rate is 0.5 hours-1 (Table 1). These kinetic parameters are not much affected by addition of LiCl in 1:1 or 10:1 Li+:Aβ ratios. At the Li+:Aβ ratio of 100:1, the rmax value remains largely unaffected while the aggregation half-time is increased to almost 5 hours (Fig. 1; Table 1). The observation that a Li+:Aβ ratio of 100:1 is required to shift the ThT curve clearly shows that Li+ ions do not have a strong effect on the Aβ40 aggregation kinetics.

AFM imaging: effects of Li+ ions on the morphology of Aβ40 aggregates

AFM images (Fig. 2) were recorded for the aggregation products of 20 µM Aβ40 peptide, incubated for three days without or with LiCl. The control sample without Li+ displays long (> 2 µm) amyloid fibrils that are around 6 nm thick, together with small (< 2 nm) aggregate particles that may be protofibrils (Fig. 2A). The distribution and sizes of these aggregates are rather typical for Aβ40 aggregates formed in vitro (Luo et al., 2014). The Aβ40 samples incubated in the presence of different concentrations of Li+ ions display amyloid fibrils of similar size and shape, although these fibrils are more densely packed and appear to be more numerous (Figs. 2B, 2C, 2D). Compared to the control sample, there are fewer small (< 2 nm) aggregate particles in the samples incubated together with Li+ ions in 10:1 and 100:1 Li+:Aβ ratios. This suggests that Li+ ions may induce some differences in the Aβ40 aggregation process.

NMR spectroscopy: interactions between Li+ ions and Aβ40 monomers

High-resolution liquid phase NMR experiments were conducted to investigate if residue-specific molecular interactions could be observed between Li+ ions and monomeric Aβ40 peptides. 2D 1H-15N-HSQC spectra showing the amide crosspeak region for 92.4 µM monomeric 15N-labeled Aβ40 peptides are presented in Fig. 3A, before and after addition of LiCl in 1:1, 1:10, and 1:100 Aβ:Li+ ratios in 20 mM MES buffer, pH 7.35. Addition of Li+ ions induces loss of signal intensity mainly for amide crosspeaks corresponding to residues in the N-terminal half of the peptide, indicating selective Li+ interactions in this region (Fig. 3B). The effects are clearly concentration-dependent. Because Li+ ions are not paramagnetic, this loss of signal intensity is arguably caused by chemical exchange related to structural rearrangements induced by the Li+ ions. As no chemical shift changes are observed for the crosspeak positions (Fig. 3A), these Li+-induced secondary structures appear to be short-lived. Figs. 3C and 3D show the results of similar experiments carried out in 1x PBS buffer, i.e. 137 mM NaCl, 2.7 mM KCl, and 10 mM phosphate pH 7.4. Here, the Li+ ions induce virtually no changes in the crosspeak intensities, showing that the weak Li+/Aβ40 interactions observed in pure MES buffer (Figs. 3A,B) disappear when the buffer and ionic strength correspond to physiological conditions.

Diffusion measurements were carried out for 55 µM Aβ40 peptides in D2O, before and after addition of LiCl in 1:1, 20:1, and 100:1 Li+:Aβ ratios. Addition of 1:1 Li+ produces an increase in the Aβ40 diffusion rate by around 4%, i.e. from 5.97·10-11 m2/s to 6.23·10-11 m2/s (Figs. 4A and 4B). This somewhat faster diffusion is likely caused by the Aβ40 peptide adopting a slightly more compact structure in the presence of Li+ ions, an effect similar to that previously reported for zinc ions (Abelein et al., 2015). Addition of even higher Li+ concentrations (20 and 100 times the Aβ40 concentration) produces diffusion rates that are similar to, but slightly lower than, the diffusion rate measured for the 1:1 Li+:Aβ40 ratio, i.e. 6.19·10-11 m2/s and 6.15·10-11 m2/s, respectively (Figs. 4C and 4D), indicating that the effect of Li+ on the Aβ40 secondary structure and diffusion has been saturated.

Fluorescence spectroscopy: Li+ binding affinity to the Aβ40 monomer

Binding affinities for metal ions to Aβ peptides can often be measured via the quenching effect on the intrinsic fluorescence of Y10, the only fluorophore in native Aβ peptides. However, not all metal ions interfere with tyrosine fluorescence, and initial experiments showed that addition of Li+ ions does not affect the Aβ40 fluorescence. The binding affinity of Li+ ions to Aβ40 was therefore evaluated from binding competition experiments with Cu2+ ions (Danielsson et al., 2007; Wallin et al., 2020), which induce much stronger tyrosine fluorescence quenching when bound to the peptide than when free in the solution (Lindgren et al., 2013). Fig. 5 shows the results of titrating CuCl2 to Aβ40, both in the absence (red circles) and in the presence (blue triangles) of 1 mM LiCl.
Three titrations were carried out for each condition. Without LiCl, the apparent KD values for the Cu2+·Aβ40 complex were 3.1 µM, 2.1 µM, and 5.1 µM, i.e. on average 3.4 ± 1.6 µM. With LiCl present, the values were 2.1 µM, 0.9 µM, and 0.8 µM, i.e. on average 1.3 ± 0.8 µM. The obtained values are in line with earlier fluorescence measurements of the Cu2+ binding affinity to the Aβ40 peptide, although this affinity is known to vary with the pH, the buffer, and other experimental conditions (Ghalebani et al., 2012; Alies et al., 2013). The difference between the average measured KD values is not significant at the 5% level with a two-tailed t-test, which indicates that Li+ ions are not able to compete with Cu2+ for binding to Aβ. Thus, the Li+ binding affinity for Aβ40 is likely in the millimolar range, or weaker.

CD spectroscopy: effects of Li+ ions on Aβ40 structure in SDS

Although Aβ peptides are generally disordered in aqueous solutions, they adopt an α-helical secondary structure in membranes and membrane-mimicking environments such as SDS micelles (Tiiman et al., 2016; Österlund et al., 2018). Thus, the CD spectrum for Aβ40 in sodium phosphate buffer displays the characteristic minimum for random coil structure at 198 nm (Fig. 6). Addition of 50 mM SDS, which is well above the critical concentration for micelle formation (Österlund et al., 2018), induces an α-helical structure with characteristic minima around 208 and 222 nm.
Titrating LiCl in concentrations up to 512 µM to the Aβ40 sample slightly increases the general CD intensity, but does not change the overall spectral shape: the minima remain at their respective positions. The intensity changes are not caused by dilution of the sample during the titration, as the added volumes are very small, and as dilution would not increase but rather decrease the CD intensity. The observed changes in CD intensity therefore suggest a small but distinct binding effect of Li+ ions. This binding effect appears to be much weaker than the structural rearrangements and Aβ coil-coil interactions previously reported to be induced by Cu2+ ions (Tiiman et al., 2016).

BN-PAGE: effects of Li+ ions on Aβ42 oligomer formation and stability

Well-defined and SDS-stabilized Aβ42 oligomers were prepared in the presence of different amounts of LiCl. SDS treatment of Aβ42 peptides at low concentrations (≤ 7 mM) leads to formation of stable and homogeneous Aβ42 oligomers of certain sizes and conformations (Barghorn et al., 2005; Rangachari et al., 2007). As shown in Fig. 7, two sizes of Aβ42 oligomers are formed in the presence of the two SDS concentrations. In 0.2% (6.9 mM) SDS, small oligomers with a molecular weight (MW) around 16-20 kDa are formed. These oligomers appear to contain a large fraction of tetramers (Vosough & Barth, manuscript). In 0.05% (1.7 mM) SDS, larger oligomers with MWs around 55-60 kDa are formed (Barghorn et al., 2005).
These larger oligomers, which most likely contain twelve Aβ42 monomers, display a globular morphology and are therefore sometimes called globulomers (Barghorn et al., 2005). All oligomers were analyzed by BN-PAGE instead of by SDS-PAGE to avoid disruption of the non-cross-linked Aβ42 oligomers by the high (>1%) SDS concentrations used in SDS-PAGE (Bitan et al., 2005). As shown in lanes 2-5 and 6-9 of Fig. 7, increasing LiCl concentrations have weak or no effects on the size or homogeneity of the formed Aβ42 oligomers, as the bands retain their shape and intensity. Only for the globulomers subjected to the highest LiCl concentration (10 mM) is the intensity of the BN-PAGE band slightly reduced (lane 5, Fig. 7).

FTIR spectroscopy: effects of Li+ ions on Aβ42 oligomer structure

The secondary structures of Aβ42 oligomers formed with different Li+ concentrations were studied with FTIR spectroscopy, where the amide I region (1700-1600 cm-1) is very sensitive to changes in the protein backbone conformation. The technique is also useful in amyloid research, given its capacity to characterize β-sheets (Barth, 2007; Sarroukh et al., 2013). Fig. 8 shows second derivative IR spectra for the amide I region of Aβ42 globulomers (Fig. 8A) and smaller oligomers (Fig. 8B), prepared with different concentrations of Li+ ions. Monomeric Aβ42 displayed a relatively broad band at 1639-1640 cm-1, which is in agreement with the position of the band for disordered (random coil) polypeptides measured in D2O (Barth, 2007). For both types of Aβ42 oligomers, this main band is much narrower and downshifted by about 10 cm-1, while a second smaller band appears around 1685 cm-1. This split band pattern is indicative of an anti-parallel β-sheet conformation (Cerf et al., 2009). Earlier studies in our laboratory have shown a relationship between the position and width of this main band, and the size and homogeneity of the Aβ42 oligomers (Vosough & Barth, manuscript). The lower band position of the larger oligomers is in line with this relationship and our previous results, and confirms the different sizes of the oligomers produced at the two SDS concentrations. We have recently observed that a number of transition metal ions induce significant effects on the main band position for Aβ42 oligomers (manuscript in preparation). Because the spectra for Aβ42 oligomers formed with different amounts of LiCl generally superimpose on the IR spectra for the Li+-free oligomers, with no shifts observed for the main band, it appears that Li+ ions have no significant effect on the oligomers' size or secondary structure.

DISCUSSION

Lithium as a therapeutic agent

Lithium has no known biological functions in the human body. Li+ ions readily pass biological membranes, and are evenly distributed in tissues and easily eliminated via the kidneys (Nordberg et al., 2015). Li+ ions are however far from inert, and several well-defined medical conditions related to abnormal Li+ concentrations exist. In low blood concentrations, Li+ is used as a medication for bipolar and schizoaffective disorders (Machado-Vieira et al., 2009), but at higher concentrations Li+ ions are neurotoxic (Sellers et al., 1982; Emilien & Maloteaux, 1996; Nordberg et al., 2015; Wen et al., 2019). This leaves a narrow therapeutic window of 0.6-1.2 mM that has to be closely monitored in order to prevent Li+ intoxication, which is easily recognized by EEG (Mignarri et al., 2013) and treatable by reducing the therapeutic dose.
Li+ intoxication (>1.5 mM) presents as apathy, vertigo, tremor and gastrointestinal symptoms, and in more severe cases as confusion, psychosis, myoclonus and cardiac arrhythmias (Nordberg et al., 2015). Li+ intoxication also affects the kidneys, with polyuria and elevated U-albumin, although overt renal failure is rare (Nordberg et al., 2015). Treatment of bipolar and schizoaffective disorders with Li+ has generated some knowledge about Li+ metabolism in the human body (Wen et al., 2019; Medic et al., 2020). Li+ accumulates to some extent in bone (Birch, 1974), and chronic Li+ effects are implicated in osteomalacia and severe osteoporosis (Roos, 2014). Patients treated with Li+ also show an increased frequency of hypothyroidism and goitre, and widespread effects on several facets of the endocrine system have been noted (Salata & Klein, 1987). The negative effects of Li+ on thyroid function have been clearly demonstrated in a study on populations in the Andean Mountains, where natural exposure to Li+ is high, and where urinary Li+ was found to correlate negatively with free thyroxine (T4) but positively with the pituitary gland hormone thyrotropin (Broberg et al., 2011). The toxicity of Li+ is further emphasized by studies from regions with naturally elevated concentrations of Li+ in potable water, where reduced fetal size has been noted to correlate linearly with increases in blood Li+ (Harari et al., 2015).
To what extent Li+ treatment reduces the development of AD symptoms is unclear (Engel et al., 2008; Mauer et al., 2014; Nordberg et al., 2015; Sutherland & Duthie, 2015). Bipolar disorder increases the risk of AD when compared to the general population, and Li+ treatment seems to reduce this risk (Velosa et al., 2020), but the mechanisms mediating this effect are far from elucidated (Kerr et al., 2018). In rare cases even regular-dose long-term Li+ therapy may cause severe intoxication of the central nervous system, characterized by cerebellar dysfunction and cognitive decline (Emilien & Maloteaux, 1996).

Lithium interactions with the Aβ40 peptide

The NMR (Figs. 3 and 4), fluorescence quenching (Fig. 5), and CD (Fig. 6) experiments show that Li+ ions interact only weakly with the Aβ40 peptide, where the binding affinity for the Li+·Aβ40 complex may be in the millimolar range. The IR and CD results show minor or no effects of Li+ ions on the secondary structures of Aβ40 monomers (Fig. 6) and Aβ42 oligomers (Fig. 8). The Li+ ions may have a small effect on Aβ aggregation, with minor perturbations of the morphology of aggregated Aβ40 fibrils (Fig. 2), and effects on the Aβ40 aggregation kinetics (Fig. 1; Table 1) and Aβ42 oligomer stability (Fig. 7) only at very high Li+ concentrations. These results are in line with previous computer modeling results, which suggest small differences between how the monovalent K+, Li+, and Na+ alkali ions affect Aβ oligomerization (Huraskin & Horn, 2019).
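To make the consequence of such weak binding concrete, the following is a minimal sketch, not part of the original analysis, that estimates the equilibrium fraction of Aβ binding sites occupied by Li+ under a simple single-site competitive binding model. The function name, the 1 mM dissociation constant for Li+ (an illustrative value within the millimolar range suggested above), and the competitor values are all assumptions chosen for illustration, and free ion concentrations are approximated by total concentrations.

```python
def fractional_occupancy(ligand_conc, kd, competitor_conc=0.0, competitor_kd=1.0):
    """Fraction of binding sites occupied by the ligand at equilibrium.

    Assumes one binding site per peptide, mutually exclusive binding of
    ligand and competitor, and free ion concentrations equal to the total
    concentrations (ions in large excess over peptide).
    All concentrations and dissociation constants are in mol/L.
    """
    if kd <= 0 or competitor_kd <= 0:
        raise ValueError("dissociation constants must be positive")
    own = ligand_conc / kd                     # ligand term [L]/Kd
    other = competitor_conc / competitor_kd    # competitor term [C]/Kd,C
    return own / (1.0 + own + other)


if __name__ == "__main__":
    # Hypothetical values: 1 mM Li+ (inside the 0.6-1.2 mM therapeutic
    # window) and an assumed Kd of 1 mM for the Li+-peptide complex.
    alone = fractional_occupancy(1e-3, 1e-3)
    # Hypothetical competitor: 1 uM of an ion with an assumed Kd of 1 nM.
    competed = fractional_occupancy(1e-3, 1e-3,
                                    competitor_conc=1e-6, competitor_kd=1e-9)
    print(f"Li+ occupancy without competitor: {alone:.3f}")
    print(f"Li+ occupancy with competitor:    {competed:.5f}")
```

Under these assumed numbers, Li+ would occupy about half of the sites in the absence of other ions, but only about 0.1% of them when a tightly binding competitor is present. The actual values depend on the true affinities and free ion concentrations, which this sketch does not attempt to model.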
As Aβ40 and Aβ42 have identical N-terminal sequences, the two peptide variants should interact very similarly with Li+ ions, which were found to bind to the N-terminal Aβ region (Fig. 3B). The weak affinity between Aβ40 and Li+ ions, and the fact that Li+ does not efficiently compete with Cu2+ ions for Aβ binding (Fig. 4), suggest that Li+ ions are not coordinated by specific binding ligands. Instead, Li+ likely engages in non-specific electrostatic interactions with the negatively charged Aβ residues, i.e. D1, E3, D7, E11, E22, and D23 (which are located in the N-terminal and central regions). The weak binding affinity to Aβ40 peptides is not caused by Li+ ions being monovalent, as e.g. monovalent Ag+ ions display rather strong and specific binding to Aβ peptides (Wallin et al., 2020). Moreover, divalent Pb2+ and trivalent Cr3+ ions do not bind strongly to Aβ, while divalent Cu2+, Mn2+ and Zn2+ as well as tetravalent Pb4+ ions do (Faller, 2009; Abelein et al., 2015; Tiiman et al., 2016; Wallin et al., 2016; Wallin et al., 2017). Thus, Aβ/metal interactions are not governed by the charge of the metal ion, but rather by its specific properties, such as ionic radius and electron configuration (1s2 for Li+).

It is illustrative to compare the Aβ interactions with Li+ ions to the well-studied interactions with Cu2+ and Zn2+ ions.
These two divalent ions display residue-specific interactions with Aβ peptides, with binding affinities in the micromolar-nanomolar range and strong effects on Aβ secondary structure, aggregation, and diffusion (Danielsson et al., 2007; Faller, 2009; Lindgren et al., 2013; Abelein et al., 2015; Tiiman et al., 2016; Owen et al., 2019). Aβ binding to Cu2+ and Zn2+ is coordinated mainly by residue-specific interactions with the N-terminal His residues, i.e. H6, H13, and H14 (Faller, 2009; Lindgren et al., 2013; Abelein et al., 2015; Tiiman et al., 2016). The biological relevance of Cu2+ and Zn2+ ions in AD pathology is demonstrated by their dysregulation in AD patients (Wang et al., 2015; Szabo et al., 2016), and by their accumulation in plaques of Aβ aggregates in AD brains (Beauchemin & Kisilevsky, 1998; Lovell et al., 1998; Miller et al., 2006). During neuronal signaling Cu2+ and Zn2+ ions are released into the synaptic clefts (Ayton et al., 2013), where they may interact with Aβ peptides to initiate Aβ aggregation (Branch et al., 2017), or modulate the formation and toxicity of Aβ oligomers (Stefaniak & Bal, 2019; Wärmländer et al., 2019). The current results indicate that Li+ ions are not able to compete with Cu2+ or Zn2+ ions for binding to Aβ peptides, and should therefore not be able to influence the in vivo effects of Cu2+ and Zn2+ ions on Aβ aggregation and toxicity. Although high concentrations of Li+ showed some effects on Aβ aggregation (Figs. 1-3, 7), these effects are likely at least partly related to ionic strength effects (Abelein et al., 2016). Under physiological ionic strength, no specific interactions are observed between Aβ40 monomers and Li+ ions (Fig. 3C,D).
Thus, we conclude that the previously reported possible beneficial effects of Li+ on Alzheimer’s disease progression (Mauer et al., 2014; Sutherland & Duthie, 2015; Kerr et al., 2018; Hampel et al., 2019; Velosa et al., 2020) do not appear to be caused by direct interactions between Li+ ions and Aβ peptides.

ACKNOWLEDGMENTS: We thank Elizabeth (Li) Wang for helpful discussions. This work was supported by grants from the Swedish Alzheimer Foundation and the Swedish Research Council to AG, the Swedish Brain Foundation to AG and AB, the Magnus Bergvall Foundation to SW and PR, the Ulla-Carin Lindquist ALS Foundation to PR, and from Olle Engkvist's Foundation, the Stockholm Region, and Knut and Alice Wallenberg Foundation to AB.

CONFLICT OF INTEREST: The authors declare no conflict of interest.

REFERENCES

Abelein A, Abrahams JP, Danielsson J, Gräslund A, Jarvet J, Luo J, Tiiman A, Wärmländer SK (2014). The hairpin conformation of the amyloid beta peptide is an important structural motif along the aggregation pathway. J Biol Inorg Chem 19(4-5): 623-634. doi: 10.1007/s00775-014-1131-8.
Abelein A, Gräslund A, Danielsson J (2015). Zinc as chaperone-mimicking agent for retardation of amyloid beta peptide fibril formation. Proc Natl Acad Sci U S A 112(17): 5407-5412. doi: 10.1073/pnas.1421961112.
Abelein A, Jarvet J, Barth A, Gräslund A, Danielsson J (2016). Ionic Strength Modulation of the Free Energy Landscape of Abeta40 Peptide Fibril Formation. J Am Chem Soc 138(21): 6893-6902. doi: 10.1021/jacs.6b04511.
Adlard PA, Bush AI (2018).
Metals and Alzheimer's Disease: How Far Have We Come in the Clinic? J Alzheimers Dis 62(3): 1369-1379. doi: 10.3233/JAD-170662.
Agostinho P, Cunha RA, Oliveira C (2010). Neuroinflammation, oxidative stress and the pathogenesis of Alzheimer's disease. Curr Pharm Des 16(25): 2766-2778. doi: 10.2174/138161210793176572.
Al-Hilaly YK, Williams TL, Stewart-Parker M, Ford L, Skaria E, Cole M, Bucher WG, Morris KL, Sada AA, Thorpe JR, Serpell LC (2013). A central role for dityrosine crosslinking of Amyloid-beta in Alzheimer's disease. Acta Neuropathol Commun 1: 83. doi: 10.1186/2051-5960-1-83.
Alies B, Renaglia E, Rozga M, Bal W, Faller P, Hureau C (2013). Cu(II) affinity for the Alzheimer's peptide: tyrosine fluorescence studies revisited. Anal Chem 85(3): 1501-1508. doi: 10.1021/ac302629u.
Ayton S, Lei P, Bush AI (2013). Metallostasis in Alzheimer's disease. Free Radic Biol Med 62: 76-89. doi: 10.1016/j.freeradbiomed.2012.10.558.
Baldassarre M, Barth A (2014). Pushing the detection limit of infrared spectroscopy for structural analysis of dilute protein samples. Analyst 139(21): 5393-5399. doi: 10.1039/c4an00918e.
Barghorn S, Nimmrich V, Striebinger A, Krantz C, P K, Janson B, Bahr M, Schmidt M, Bitner RS, Harlan J, Barlow E, Ebert U, Hillen H (2005). Globular amyloid β-peptide1-42 oligomer – a homogenous and stable neuropathological protein in Alzheimer’s disease. J Neurochem 95: 834–847.
Baronio CM, Baldassarre M, Barth A (2019).
Insight into the internal structure of amyloid-beta oligomers by isotope-edited Fourier transform infrared spectroscopy. Phys Chem Chem Phys 21(16): 8587-8597. doi: 10.1039/c9cp00717b.
Barth A (2007). Infrared spectroscopy of proteins. Biochim Biophys Acta 1767(9): 1073-1101. doi: 10.1016/j.bbabio.2007.06.004.
Beauchemin D, Kisilevsky R (1998). A method based on ICP-MS for the analysis of Alzheimer's amyloid plaques. Anal Chem 70(5): 1026-1029.
Birch NJ (1974). Lithium accumulation in bone after oral administration in rat and in man. Clin Sci Mol Med 46(3): 409-413. doi: 10.1042/cs0460409.
Bitan G, Fradinger EA, Spring SM, Teplow DB (2005). Neurotoxic protein oligomers--what you see is not always what you get. Amyloid 12(2): 88-95. doi: 10.1080/13506120500106958.
Branch T, Barahona M, Dodson CA, Ying L (2017). Kinetic Analysis Reveals the Identity of Abeta-Metal Complex Responsible for the Initial Aggregation of Abeta in the Synapse. ACS Chem Neurosci 8(9): 1970-1979. doi: 10.1021/acschemneuro.7b00121.
Broberg K, Concha G, Engstrom K, Lindvall M, Grander M, Vahter M (2011). Lithium in drinking water and thyroid function. Environ Health Perspect 119(6): 827-830. doi: 10.1289/ehp.1002678.
Budni J, Feijo DP, Batista-Silva H, Garcez ML, Mina F, Belletini-Santos T, Krasilchik LR, Luz AP, Schiavo GL, Quevedo J (2017). Lithium and memantine improve spatial memory impairment and neuroinflammation induced by beta-amyloid 1-42 oligomers in rats. Neurobiol Learn Mem 141: 84-92. doi: 10.1016/j.nlm.2017.03.017.
Caccamo A, Oddo S, Tran LX, LaFerla FM (2007). Lithium reduces tau phosphorylation but not A beta or working memory deficits in a transgenic model with both plaques and tangles. Am J Pathol 170(5): 1669-1675. doi: 10.2353/ajpath.2007.061178.
Cardillo GM, De-Paula VJR, Ikenaga EH, Costa LR, Catanozi S, Schaeffer EL, Gattaz WF, Kerr DS, Forlenza OV (2018).
Chronic Lithium Treatment Increases Telomere Length in Parietal Cortex and Hippocampus of Triple-Transgenic Alzheimer's Disease Mice. J Alzheimers Dis 63(1): 93-101. doi: 10.3233/JAD-170838.
Cerf E, Sarroukh R, Tamamizu-Kato S, Breydo L, Derclaye S, Dufrene YF, Narayanaswami V, Goormaghtigh E, Ruysschaert JM, Raussens V (2009). Antiparallel beta-sheet: a signature structure of the oligomeric amyloid beta-peptide. Biochem J 421(3): 415-423. doi: 10.1042/BJ20090379.
Chin-Chan M, Navarro-Yepes J, Quintanilla-Vega B (2015). Environmental pollutants as risk factors for neurodegenerative disorders: Alzheimer and Parkinson diseases. Front Cell Neurosci 9: 124. doi: 10.3389/fncel.2015.00124.
Cho CH, Urquidi J, Singh S, Wilse Robinson G (1999). Thermal Offset Viscosities of Liquid H2O, D2O, and T2O. J. Phys. Chem. B 103(11): 1991-1994.
Danielsson J, Andersson A, Jarvet J, Gräslund A (2006). 15N relaxation study of the amyloid beta-peptide: structural propensities and persistence length. Magn Reson Chem 44 Spec No: S114-121. doi: 10.1002/mrc.1814.
Danielsson J, Jarvet J, Damberg P, Gräslund A (2002). Translational diffusion measured by PFG‐NMR on full length and fragments of the Alzheimer Aβ(1–40) peptide. Determination of hydrodynamic radii of random coil peptides of varying length. Magnetic Resonance in Chemistry 40(13): S89-S97.
Danielsson J, Pierattelli R, Banci L, Gräslund A (2007). High-resolution NMR studies of the zinc-binding site of the Alzheimer's amyloid beta-peptide. FEBS J 274(1): 46-59. doi: 10.1111/j.1742-4658.2006.05563.x.
Decker M, Munoz-Torrero D (2016). Special Issue: "Molecules against Alzheimer". Molecules 21(12) doi: 10.3390/molecules21121736.
Dell'Osso L, Del Grande C, Gesi C, Carmassi C, Musetti L (2016). A new look at an old drug: neuroprotective effects and therapeutic potentials of lithium salts. Neuropsychiatr Dis Treat 12: 1687-1703. doi: 10.2147/NDT.S106479.
Donix M, Bauer M (2016). Population Studies of Association Between Lithium and Risk of Neurodegenerative Disorders. Curr Alzheimer Res 13(8): 873-878. doi: 10.2174/1567205013666160219112957.
Duce JA, Bush AI, Adlard PA (2011). Role of amyloid-β–metal interactions in Alzheimer’s disease. Future Neurol 6(5): 641–659.
Edelhoch H (1967). Spectroscopic Determination of Tryptophan and Tyrosine in Proteins. Biochemistry 6(7): 1948–1954.
Emilien G, Maloteaux JM (1996). Lithium neurotoxicity at low therapeutic doses Hypotheses for causes and mechanism of action following a retrospective analysis of published case reports. Acta Neurol Belg 96(4): 281-293.
Engel T, Goni-Oliver P, Gomez de Barreda E, Lucas JJ, Hernandez F, Avila J (2008). Lithium, a potential protective drug in Alzheimer's disease. Neurodegener Dis 5(3-4): 247-249. doi: 10.1159/000113715.
Faller P (2009). Copper and zinc binding to amyloid-beta: coordination, dynamics, aggregation, reactivity and metal-ion transfer. Chembiochem 10(18): 2837-2845. doi: 10.1002/cbic.200900321.
Forlenza OV, De-Paula VJ, Diniz BS (2014). Neuroprotective effects of lithium: implications for the treatment of Alzheimer's disease and related neurodegenerative disorders. ACS Chem Neurosci 5(6): 443-450. doi: 10.1021/cn5000309.
Fornai F, Longone P, Cafaro L, Kastsiuchenka O, Ferrucci M, Manca ML, Lazzeri G, Spalloni A, Bellio N, Lenzi P, Modugno N, Siciliano G, Isidoro C, Murri L, Ruggieri S, Paparelli A (2008). Lithium delays progression of amyotrophic lateral sclerosis.
Proc Natl Acad Sci U S A 105(6): 2052-2057. doi: 10.1073/pnas.0708022105.
Frozza RL, Lourenco MV, De Felice FG (2018). Challenges for Alzheimer's Disease Therapy: Insights from Novel Mechanisms Beyond Memory Defects. Front Neurosci 12: 37. doi: 10.3389/fnins.2018.00037.
Gade Malmos K, Blancas-Mejia LM, Weber B, Buchner J, Ramirez-Alvarado M, Naiki H, Otzen D (2017). ThT 101: a primer on the use of thioflavin T to investigate amyloid formation. Amyloid 24(1): 1-16. doi: 10.1080/13506129.2017.1304905.
Ge W, Jakobsson E (2018). Systems Biology Understanding of the Effects of Lithium on Affective and Neurodegenerative Disorders. Front Neurosci 12: 933. doi: 10.3389/fnins.2018.00933.
Ghalebani L, Wahlström A, Danielsson J, Wärmländer SK, Gräslund A (2012). pH-dependence of the specific binding of Cu(II) and Zn(II) ions to the amyloid-beta peptide. Biochem Biophys Res Commun 421(3): 554-560. doi: 10.1016/j.bbrc.2012.04.043.
Gibbons GS, Lee VMY, Trojanowski JQ (2019). Mechanisms of Cell-to-Cell Transmission of Pathological Tau: A Review. JAMA Neurol 76(1): 101-108. doi: 10.1001/jamaneurol.2018.2505.
Glasoe PK, Long FA (1960). Use of glass electrodes to measure acidities in deuterium oxide. J Phys Chem 64: 88–90.
Glenner GG, Wong CW (1984). Alzheimer's disease: initial report of the purification and characterization of a novel cerebrovascular amyloid protein. Biochem Biophys Res Commun 120(3): 885-890.
Goedert M (2018). Tau filaments in neurodegenerative diseases. FEBS Lett 592(14): 2383-2391. doi: 10.1002/1873-3468.13108.
Habib A, Sawmiller D, Li S, Xiang Y, Rongo D, Tian J, Hou H, Zeng J, Smith A, Fan S, Giunta B, Mori T, Currier G, Shytle DR, Tan J (2017). LISPRO mitigates beta-amyloid and associated pathologies in Alzheimer's mice. Cell Death Dis 8(6): e2880. doi: 10.1038/cddis.2017.279.
Habib A, Shytle RD, Sawmiller D, Koilraj S, Munna SA, Rongo D, Hou H, Borlongan CV, Currier G, Tan J (2019).
Comparing the effect of the novel ionic cocrystal of lithium salicylate proline (LISPRO) with lithium carbonate and lithium salicylate on memory and behavior in female APPswe/PS1dE9 Alzheimer's mice. J Neurosci Res 97(9): 1066-1080. doi: 10.1002/jnr.24438.
Hampel H, Lista S, Mango D, Nistico R, Perry G, Avila J, Hernandez F, Geerts H, Vergallo A, Alzheimer Precision Medicine I (2019). Lithium as a Treatment for Alzheimer's Disease: The Systems Pharmacology Perspective. J Alzheimers Dis 69(3): 615-629. doi: 10.3233/JAD-190197.
Harari F, Langeen M, Casimiro E, Bottai M, Palm B, Nordqvist H, Vahter M (2015). Environmental exposure to lithium during pregnancy and fetal size: a longitudinal study in the Argentinean Andes. Environ Int 77: 48-54. doi: 10.1016/j.envint.2015.01.011.
Heppner FL, Ransohoff RM, Becher B (2015). Immune attack: the role of inflammation in Alzheimer disease. Nat Rev Neurosci 16(6): 358-372. doi: 10.1038/nrn3880.
Huat TJ, Camats-Perna J, Newcombe EA, Valmas N, Kitazawa M, Medeiros R (2019). Metal Toxicity Links to Alzheimer's Disease and Neuroinflammation. J Mol Biol 431(9): 1843-1868. doi: 10.1016/j.jmb.2019.01.018.
Huraskin D, Horn AHC (2019). Alkali ion influence on structure and stability of fibrillar amyloid-beta oligomers. J Mol Model 25(2): 37. doi: 10.1007/s00894-018-3920-4.
Kerr F, Bjedov I, Sofola-Adesakin O (2018). Molecular Mechanisms of Lithium Action: Switching the Light on Multiple Targets for Dementia Using Animal Models. Front Mol Neurosci 11: 297. doi: 10.3389/fnmol.2018.00297.
Kisby B, Jarrell JT, Agar ME, Cohen DS, Rosin ER, Cahill CM, Rogers JT, Huang X (2019). Alzheimer's Disease and Its Potential Alternative Therapeutics. J Alzheimers Dis Parkinsonism 9(5) doi: 10.4172/2161-0460.1000477.
Lee SJ, Nam E, Lee HJ, Savelieff MG, Lim MH (2017). Towards an understanding of amyloid-beta oligomers: characterization, toxicity mechanisms, and inhibitors. Chem Soc Rev 46(2): 310-323. doi: 10.1039/c6cs00731g.
Lindgren J, Segerfeldt P, Sholts SB, Gräslund A, Karlström AE, Wärmländer SK (2013). Engineered non-fluorescent Affibody molecules facilitate studies of the amyloid-beta (Abeta) peptide in monomeric form: low pH was found to reduce Abeta/Cu(II) binding affinity. J Inorg Biochem 120: 18-23. doi: 10.1016/j.jinorgbio.2012.11.005.
Liu M, Qian T, Zhou W, Tao X, Sang S, Zhao L (2020). Beneficial effects of low-dose lithium on cognitive ability and pathological alteration of Alzheimer's disease transgenic mice model. Neuroreport 31(13): 943-951. doi: 10.1097/WNR.0000000000001499.
Lovell MA, Robertson JD, Teesdale WJ, Campbell JL, Markesbery WR (1998). Copper, iron and zinc in Alzheimer's disease senile plaques. J Neurol Sci 158(1): 47-52.
Luo J, Mohammed I, Wärmländer SK, Hiruma Y, Gräslund A, Abrahams JP (2014). Endogenous polyamines reduce the toxicity of soluble abeta peptide aggregates associated with Alzheimer's disease. Biomacromolecules 15(6): 1985-1991. doi: 10.1021/bm401874j.
Luo J, Otero JM, Yu CH, Wärmländer SK, Gräslund A, Overhand M, Abrahams JP (2013). Inhibiting and reversing amyloid-beta peptide (1-40) fibril formation with gramicidin S and engineered analogues. Chemistry 19(51): 17338-17348. doi: 10.1002/chem.201301535.
Luo J, Wärmländer SK, Gräslund A, Abrahams JP (2014). Alzheimer peptides aggregate into transient nanoglobules that nucleate fibrils. Biochemistry 53(40): 6302-6308. doi: 10.1021/bi5003579.
Luo J, Wärmländer SK, Gräslund A, Abrahams JP (2016).
Cross-interactions between the Alzheimer Disease Amyloid-beta Peptide and Other Amyloid Proteins: A Further Aspect of the Amyloid Cascade Hypothesis. J Biol Chem 291(32): 16485-16493. doi: 10.1074/jbc.R116.714576.
Luo J, Yu CH, Yu H, Borstnar R, Kamerlin SC, Gräslund A, Abrahams JP, Wärmländer SK (2013). Cellular polyamines promote amyloid-beta (Abeta) peptide fibrillation and modulate the aggregation pathways. ACS Chem Neurosci 4(3): 454-462. doi: 10.1021/cn300170x.
Machado-Vieira R, Manji HK, Zarate CA, Jr. (2009). The role of lithium in the treatment of bipolar disorder: convergent evidence for neurotrophic effects as a unifying hypothesis. Bipolar Disord 11 Suppl 2: 92-109. doi: 10.1111/j.1399-5618.2009.00714.x.
Mauer S, Vergne D, Ghaemi SN (2014). Standard and trace-dose lithium: a systematic review of dementia prevention and other behavioral benefits. Aust N Z J Psychiatry 48(9): 809-818. doi: 10.1177/0004867414536932.
Medic B, Stojanovic M, Stimec BV, Divac N, Vujovic KS, Stojanovic R, Colovic M, Krstic D, Prostran M (2020). Lithium - Pharmacological and Toxicological Aspects: The Current State of the Art. Curr Med Chem 27(3): 337-351. doi: 10.2174/0929867325666180904124733.
Mignarri A, Chini E, Rufa A, Rocchi R, Federico A, Dotti MT (2013). Lithium neurotoxicity mimicking rapidly progressive dementia. J Neurol 260(4): 1152-1154. doi: 10.1007/s00415-012-6820-z.
Miller LM, Wang Q, Telivala TP, Smith RJ, Lanzirotti A, Miklossy J (2006).
Synchrotron-based infrared and X-ray imaging shows focalized accumulation of Cu and Zn co-localized with beta-amyloid deposits in Alzheimer's disease. J Struct Biol 155(1): 30-37. doi: 10.1016/j.jsb.2005.09.004.
Mital M, Wezynfeld NE, Fraczyk T, Wiloch MZ, Wawrzyniak UE, Bonna A, Tumpach C, Barnham KJ, Haigh CL, Bal W, Drew SC (2015). A Functional Role for Abeta in Metal Homeostasis? N-Truncation and High-Affinity Copper Binding. Angew Chem Int Ed Engl 54(36): 10460-10464. doi: 10.1002/anie.201502644.
Modgil S, Lahiri DK, Sharma VL, Anand A (2014). Role of early life exposure and environment on neurodegeneration: implications on brain disorders. Transl Neurodegener 3: 9. doi: 10.1186/2047-9158-3-9.
Morris G, Berk M (2016). The Putative Use of Lithium in Alzheimer's Disease. Curr Alzheimer Res 13(8): 853-861. doi: 10.2174/1567205013666160219113112.
Morrison KE, Dhariwal S, Hornabrook R, Savage L, Burn DJ, Khoo TK, Kelly J, Murphy CL, Al-Chalabi A, Dougherty A, Leigh PN, Wijesekera L, Thornhill M, Ellis CM, O'Hanlon K, Panicker J, Pate L, Ray P, Wyatt L, Young CA, Copeland L, Ealing J, Hamdalla H, Leroi I, Murphy C, O'Keeffe F, Oughton E, Partington L, Paterson P, Rog D, Sathish A, Sexton D, Smith J, Vanek H, Dodds S, Williams TL, Steen IN, Clarke J, Eziefula C, Howard R, Orrell R, Sidle K, Sylvester R, Barrett W, Merritt C, Talbot K, Turner MR, Whatley C, Williams C, Williams J, Cosby C, Hanemann CO, Iman I, Philips C, Timings L, Crawford SE, Hewamadduma C, Hibberd R, Hollinger H, McDermott C, Mils G, Rafiq M, Shaw PJ, Taylor A, Waines E, Walsh T, Addison-Jones R, Birt J, Hare M, Majid T (2013). Lithium in patients with amyotrophic lateral sclerosis (LiCALS): a phase 3 multicentre, randomised, double-blind, placebo-controlled trial. Lancet Neurol 12(4): 339-345. doi: 10.1016/S1474-4422(13)70037-1.
Nath S, Agholme L, Kurudenkandy FR, Granseth B, Marcusson J, Hallbeck M (2012).
Spreading of neurodegenerative pathology via neuron-to-neuron transmission of beta-amyloid. J Neurosci 32(26): 8767-8777. doi: 10.1523/JNEUROSCI.0615-12.2012.
Nečas D, Klapetek P (2012). Gwyddion: an open-source software for SPM data analysis. Central European Journal of Physics 10: 181-188. doi: https://doi.org/10.2478.
Nery LR, Eltz NS, Hackman C, Fonseca R, Altenhofen S, Guerra HN, Freitas VM, Bonan CD, Vianna MR (2014). Brain intraventricular injection of amyloid-beta in zebrafish embryo impairs cognition and increases tau phosphorylation, effects reversed by lithium. PLoS One 9(9): e105862. doi: 10.1371/journal.pone.0105862.
Nordberg G, Fowler B, Nordberg M, (eds). (2015). Handbook on the Toxicology of Metals, Elsevier.
Owen MC, Gnutt D, Gao M, Wärmländer SKTS, Jarvet J, Gräslund A, Winter R, Ebbinghaus S, Strodel B (2019). Effects of in vivo conditions on amyloid aggregation. Chem Soc Rev 48(14): 3946-3996. doi: 10.1039/c8cs00034d.
Pan Y, Short JL, Newman SA, Choy KHC, Tiwari D, Yap C, Senyschyn D, Banks WA, Nicolazzo JA (2018). Cognitive benefits of lithium chloride in APP/PS1 mice are associated with enhanced brain clearance of beta-amyloid. Brain Behav Immun 70: 36-47. doi: 10.1016/j.bbi.2018.03.007.
Priebe GA, Kanzawa MM (2020). Reducing the progression of Alzheimer's disease in Down syndrome patients with micro-dose lithium. Med Hypotheses 137: 109573. doi: 10.1016/j.mehy.2020.109573.
Prince M, Wimo A, Guerchet M, Ali G-C, Wu Y-T, Prina M (2015). World Alzheimer Report 2015 - The Global Impact of Dementia. London, UK.
Querfurth HW, LaFerla FM (2010). Alzheimer's disease. N Engl J Med 362(4): 329-344. doi: 10.1056/NEJMra0909142.
Rangachari V, Moore BD, Reed DK, Sonoda LK, Bridges AW, Conboy E, Hartigan D, Rosenberry TL (2007). Amyloid-beta(1-42) rapidly forms protofibrils and oligomers by distinct pathways in low concentrations of sodium dodecylsulfate. Biochemistry 46(43): 12451-12462. doi: 10.1021/bi701213s.
Regen F, Hellmann-Regen J, Costantini E, Reale M (2017). Neuroinflammation and Alzheimer's Disease: Implications for Microglial Activation. Curr Alzheimer Res 14(11): 1140-1148. doi: 10.2174/1567205014666170203141717.
Rocha NKR, Themoteo R, Brentani H, Forlenza OV, De Paula VJR (2020). Neuronal-Glial Interaction in a Triple-Transgenic Mouse Model of Alzheimer's Disease: Gene Ontology and Lithium Pathways. Front Neurosci 14: 579984. doi: 10.3389/fnins.2020.579984.
Rockenstein E, Torrance M, Adame A, Mante M, Bar-on P, Rose JB, Crews L, Masliah E (2007). Neuroprotective effects of regulators of the glycogen synthase kinase-3beta signaling pathway in a transgenic model of Alzheimer's disease are associated with reduced amyloid precursor protein phosphorylation. J Neurosci 27(8): 1981-1991. doi: 10.1523/JNEUROSCI.4321-06.2007.
Roos PM (2014). Osteoporosis in neurodegeneration. J Trace Elem Med Biol 28(4): 418-421. doi: 10.1016/j.jtemb.2014.08.010.
Ryves WJ, Harwood AJ (2001). Lithium inhibits glycogen synthase kinase-3 by competition for magnesium. Biochem Biophys Res Commun 280(3): 720-725. doi: 10.1006/bbrc.2000.4169.
Salata R, Klein I (1987). Effects of lithium on the endocrine system: a review. J Lab Clin Med 110(2): 130-136.
Sardar Sinha M, Ansell-Schultz A, Civitelli L, Hildesjo C, Larsson M, Lannfelt L, Ingelsson M, Hallbeck M (2018). Alzheimer's disease pathology propagation by exosomes containing toxic amyloid-beta oligomers. Acta Neuropathol 136(1): 41-56. doi: 10.1007/s00401-018-1868-1.
Sarroukh R, Goormaghtigh E, Ruysschaert JM, Raussens V (2013). ATR-FTIR: a "rejuvenated" tool to investigate amyloid proteins. Biochim Biophys Acta 1828(10): 2328-2338. doi: 10.1016/j.bbamem.2013.04.012.
Selkoe DJ, Hardy J (2016). The amyloid hypothesis of Alzheimer's disease at 25 years. EMBO Mol Med 8(6): 595-608. doi: 10.15252/emmm.201606210.
Sellers J, Tyrer P, Whiteley A, Banks DC, Barer DH (1982). Neurotoxic effects of lithium with delayed rise in serum lithium levels. Br J Psychiatry 140: 623-625. doi: 10.1192/bjp.140.6.623.
Sengupta U, Nilson AN, Kayed R (2016). The Role of Amyloid-beta Oligomers in Toxicity, Propagation, and Immunotherapy. EBioMedicine 6: 42-49. doi: 10.1016/j.ebiom.2016.03.035.
Sofola-Adesakin O, Castillo-Quan JI, Rallis C, Tain LS, Bjedov I, Rogers I, Li L, Martinez P, Khericha M, Cabecinha M, Bahler J, Partridge L (2014). Lithium suppresses Abeta pathology by inhibiting translation in an adult Drosophila model of Alzheimer's disease. Front Aging Neurosci 6: 190. doi: 10.3389/fnagi.2014.00190.
Stefaniak E, Bal W (2019). Cu(II) Binding Properties of N-Truncated Abeta Peptides: In Search of Biological Function. Inorg Chem 58(20): 13561-13577. doi: 10.1021/acs.inorgchem.9b01399.
Sutherland C, Duthie AC (2015). Invited commentary on ... Lithium treatment and risk for dementia in adults with bipolar disorder. Br J Psychiatry 207(1): 52-54. doi: 10.1192/bjp.bp.114.161729.
Szabo ST, Harry GJ, Hayden KM, Szabo DT, Birnbaum L (2016). Comparison of Metal Levels between Postmortem Brain and Ventricular Fluid in Alzheimer's Disease and Nondemented Elderly Controls. Toxicol Sci 150(2): 292-300. doi: 10.1093/toxsci/kfv325.
Tiiman A, Luo J, Wallin C, Olsson L, Lindgren J, Jarvet J, Roos PM, Sholts SB, Rahimipour S, Abrahams JP, Karlström AE, Gräslund A, Wärmländer SKTS (2016). Specific Binding of Cu(II) Ions to Amyloid-Beta Peptides Bound to Aggregation-Inhibiting Molecules or SDS Micelles Creates Complexes that Generate Radical Oxygen Species.
J Alzheimers Dis 54(3): 971-982. doi: 10.3233/JAD-160427.
Trujillo-Estrada L, Jimenez S, De Castro V, Torres M, Baglietto-Vargas D, Moreno-Gonzalez I, Navarro V, Sanchez-Varo R, Sanchez-Mejias E, Davila JC, Vizuete M, Gutierrez A, Vitorica J (2013). In vivo modification of Abeta plaque toxicity as a novel neuroprotective lithium-mediated therapy for Alzheimer's disease pathology. Acta Neuropathol Commun 1: 73. doi: 10.1186/2051-5960-1-73.
Velosa J, Delgado A, Finger E, Berk M, Kapczinski F, de Azevedo Cardoso T (2020). Risk of dementia in bipolar disorder and the interplay of lithium: a systematic review and meta-analyses. Acta Psychiatr Scand doi: 10.1111/acps.13153.
Vosough F, Barth A (manuscript). Characterization of homogeneous and heterogeneous amyloid-β42 oligomer preparations with biochemical methods and infrared spectroscopy reveals a correlation between infrared spectrum and oligomer size.
Wallin C, Friedemann M, Sholts SB, Noormagi A, Svantesson T, Jarvet J, Roos PM, Palumaa P, Gräslund A, Wärmländer SKTS (2020). Mercury and Alzheimer's Disease: Hg(II) Ions Display Specific Binding to the Amyloid-beta Peptide and Hinder Its Fibrillization. Biomolecules 10(1): 44. doi: 10.3390/biom10010044.
Wallin C, Jarvet J, Biverstål H, Wärmländer S, Danielsson J, Gräslund A, Abelein A (2020). Metal ion coordination delays amyloid-beta peptide self-assembly by forming an aggregation-inert complex. J Biol Chem 295(21): 7224-7234. doi: 10.1074/jbc.RA120.012738.
Wallin C, Kulkarni YS, Abelein A, Jarvet J, Liao Q, Strodel B, Olsson L, Luo J, Abrahams JP, Sholts SB, Roos PM, Kamerlin SC, Gräslund A, Wärmländer SK (2016). Characterization of Mn(II) ion binding to the amyloid-beta peptide in Alzheimer's disease. J Trace Elem Med Biol 38: 183-193. doi: 10.1016/j.jtemb.2016.03.009. Wallin C, Sholts SB, Österlund N, Luo J, Jarvet J, Roos PM, Ilag L, Gräslund A, Wärmländer S (2017). Alzheimer's disease and cigarette smoke components: effects of nicotine, PAHs, and Cd(II), Cr(III), Pb(II), Pb(IV) ions on amyloid-beta peptide aggregation. Sci Rep 7(1): 14423. doi: 10.1038/s41598-017-13759-5. Wang X, Wang W, Li L, Perry G, Lee HG, Zhu X (2014). Oxidative stress and mitochondrial dysfunction in Alzheimer's disease. Biochim Biophys Acta 1842(8): 1240-1247. doi: 10.1016/j.bbadis.2013.10.015. Wang ZX, Tan L, Wang HF, Ma J, Liu J, Tan MS, Sun JH, Zhu XC, Jiang T, Yu JT (2015). Serum Iron, Zinc, and Copper Levels in Patients with Alzheimer's Disease: A Replication Study and Meta-Analyses. J Alzheimers Dis 47(3): 565-581. doi: 10.3233/JAD-143108. Wen J, Sawmiller D, Wheeldon B, Tan J (2019). A Review for Lithium: Pharmacokinetics, Drug Design, and Toxicity. CNS Neurol Disord Drug Targets 18(10): 769-778. doi: 10.2174/1871527318666191114095249. Wilson EN, Do Carmo S, Welikovitch LA, Hall H, Aguilar LF, Foret MK, Iulita MF, Jia DT, Marks AR, Allard S, Emmerson JT, Ducatenzeiler A, Cuello AC (2020). NP03, a Microdose Lithium Formulation, Blunts Early Amyloid Post-Plaque Neuropathology in McGill-R-Thy1-APP Alzheimer-Like Transgenic Rats. J Alzheimers Dis 73(2): 723-739. doi: 10.3233/JAD-190862. Wärmländer S, Tiiman A, Abelein A, Luo J, Jarvet J, Söderberg KL, Danielsson J, Gräslund A (2013). Biophysical studies of the amyloid beta-peptide: interactions with metal ions and small molecules. Chembiochem 14(14): 1692-1704. doi: 10.1002/cbic.201300262. Wärmländer SKTS, Österlund N, Wallin C, Wu J, Luo J, Tiiman A, Jarvet J, Gräslund A (2019).
Metal binding to the Amyloid-β peptides in the presence of biomembranes: potential mechanisms of cell toxicity. Journal of Biological Inorganic Chemistry 24: 1189–1196. Xiang J, Cao K, Dong YT, Xu Y, Li Y, Song H, Zeng XX, Ran LY, Hong W, Guan ZZ (2020). Lithium chloride reduced the level of oxidative stress in brains and serums of APP/PS1 double transgenic mice via the regulation of GSK3beta/Nrf2/HO-1 pathway. Int J Neurosci 130(6): 564-573. doi: 10.1080/00207454.2019.1688808. Yu F, Zhang Y, Chuang DM (2012). Lithium reduces BACE1 overexpression, beta amyloid accumulation, and spatial learning deficits in mice with traumatic brain injury. J Neurotrauma 29(13): 2342-2351. doi: 10.1089/neu.2012.2449. Zhao L, Gong N, Liu M, Pan X, Sang S, Sun X, Yu Z, Fang Q, Zhao N, Fei G, Jin L, Zhong C, Xu T (2014). Beneficial synergistic effects of microdose lithium with pyrroloquinoline quinone in an Alzheimer's disease mouse model. Neurobiol Aging 35(12): 2736-2745. doi: 10.1016/j.neurobiolaging.2014.06.003. Österlund N, Kulkarni YS, Misiaszek AD, Wallin C, Krüger DM, Liao Q, Mashayekhy Rad F, Jarvet J, Strodel B, Wärmländer SKTS, Ilag LL, Kamerlin SCL, Gräslund A (2018). Amyloid-beta Peptide Interactions with Amphiphilic Surfactants: Electrostatic and Hydrophobic Effects. ACS Chem Neurosci 9(7): 1680-1692. doi: 10.1021/acschemneuro.8b00065.

Table 1. Kinetic parameters for Aβ40 fibril formation, i.e. aggregation half-time (t1/2) and maximum aggregation rate (rmax), derived from fitting the curves in Fig. 1 to Eq. 1.

                  20 μM Aβ40   1:1 Li+:Aβ   10:1 Li+:Aβ   100:1 Li+:Aβ
t1/2 [hours]      3.7 ± 0.7    3.6 ± 2.1    3.8 ± 0.6     4.9 ± 1.3
rmax [hours-1]    0.5 ± 0.1    0.4 ± 0.2    0.5 ± 0.2     0.4 ± 0.1

FIGURES

Fig. 1. Amyloid fibril formation monitored by ThT aggregation. Samples of 20 μM Aβ40 peptides in 20 mM MES buffer, pH 7.35, were incubated at +37 °C together with 50 μM Thioflavin-T and different concentrations of LiCl: 0 μM – black; 20 μM – red; 200 μM – green; 2000 μM – blue. The circles represent average data points for four replicates, while the solid lines are derived from fitting to Eq. 1.

Fig. 2. Solid state AFM images (A1-D1) of aggregates of 20 µM Aβ40, incubated in 5 mM MES buffer, pH 7.35, for 72 hours at +37 °C with 300 rpm shaking, together with different concentrations of LiCl. A. control sample - no LiCl; B. 20 µM LiCl; C. 200 µM LiCl; D. 2 mM LiCl. The height profile graphs (A2-D2) below the AFM images correspond to the cross-sections of Aβ40 fibrils shown as white lines in the AFM images.

Fig. 3. NMR experiments for interactions between Aβ40 monomers and Li+ ions.
(A) 2D 1H-15N-HSQC spectra of 92.4 μM 15N-labeled Aβ40 peptides in 20 mM MES buffer, pH 7.35 at +5 °C, recorded for Aβ40 peptides alone (dark sky blue) and in the presence of either 924 μM LiCl (1:10 Aβ:Li ratio; passion red) or 9.24 mM LiCl (1:100 Aβ:Li ratio; Robin egg blue). (B) Relative intensities of Aβ40 residue crosspeaks shown in (A), after addition of LiCl in 1:1, 1:10, and 1:100 Aβ:Li ratios. (C and D) Similar experiments as in A and B, but carried out in the presence of 1x PBS buffer, and for Aβ:Li ratios of 1:1, 1:5, and 1:50.

Fig. 4. NMR diffusion data for 55 μM Aβ40 peptides in sodium phosphate buffer, pH 7.35 at +5 °C, recorded both in absence (A) and presence of different Li+ concentrations, i.e. 55 μM (B), 1.1 mM (C), and 5.5 mM (D).

Fig. 5. Binding curves for the Cu2+·Aβ40 complex, obtained from the quenching effect of Cu2+ ions on the intrinsic fluorescence of Aβ residue Y10. CuCl2 was titrated to 10 µM Aβ40 in 20 mM MES buffer, pH 7.35 at 20 °C, both in the absence (red dots) and the presence (blue triangles) of 1 mM LiCl.

Fig. 6.
CD spectra of 20 μM Aβ40 peptides at 20 °C in 20 mM sodium phosphate buffer, pH 7.35. Spectra were recorded for Aβ in buffer only (black), after addition of 50 mM micellar SDS (brown), and after subsequent addition of between 2 µM (blue) and 512 µM (gray) of LiCl. The inset figure shows a close-up of the CD signals for the LiCl titration in the 210-230 nm range.

Fig. 7. BN-PAGE gel showing the effects of different concentrations of Li+ ions on the formation of SDS-stabilized Aβ42 oligomers. Lane 1: monomers prepared in 5 mM NaOD. Lanes 2-5: Aβ42 globulomers formed after 24 hours of incubation with 0.05% SDS and different LiCl concentrations. Lanes 6-9: Aβ42 oligomers formed after 24 hours of incubation with 0.2% SDS and different LiCl concentrations.

Fig. 8. Second derivatives of infrared absorbance spectra for 100 µM Aβ42 monomers (black) and 80 µM SDS-stabilized Aβ42 oligomers formed in absence (blue) and presence of 0.1 mM (red), 1 mM (purple), and 10 mM (green) of LiCl. The results are shown for Aβ42 globulomers at 0.05% SDS (A) and smaller oligomers at 0.2% SDS (B).
10_1101-2021_01_03_425159 ---- 6-gingerol interferes with amyloid-beta (Aβ) peptide aggregation

6-gingerol interferes with amyloid-beta (Aβ) peptide aggregation

Elina Berntsson1, Suman Paul1, Sabrina B. Sholts2, Jüri Jarvet1,3, Andreas Barth1, Astrid Gräslund1, Sebastian K. T. S. Wärmländer1,*

1 Department of Biochemistry and Biophysics, Stockholm University, Sweden.
2 Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA.
3 The National Institute of Chemical Physics and Biophysics, Tallinn, Estonia.

* Correspondence: seb@dbb.su.se; Tel.: +46-8-162444

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.03.425159; this version posted January 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

Alzheimer’s disease (AD) is the most prevalent age-related cause of dementia. AD affects millions of people worldwide, and to date there is no cure. The pathological hallmark of AD brains is deposition of amyloid plaques, which mainly consist of amyloid-β (Aβ) peptides, commonly 40 or 42 residues long, that have aggregated into amyloid fibrils. Intermediate aggregates in the form of soluble Aβ oligomers appear to be highly neurotoxic. Cell and animal studies have previously demonstrated positive effects of the molecule 6-gingerol on AD pathology. Gingerols are the main active constituents of the ginger root, which in many cultures is a traditional nutritional supplement for memory enhancement.
Here, we use biophysical experiments to characterize in vitro interactions between 6-gingerol and Aβ40 peptides. Our experiments with atomic force microscopy imaging, and nuclear magnetic resonance and Thioflavin-T fluorescence spectroscopy, show that the hydrophobic 6-gingerol molecule interferes with formation of Aβ40 aggregates, but does not interact with Aβ40 monomers. Thus, together with its favourable toxicity profile, 6-gingerol appears to display many of the desired properties of an anti-AD compound.

Key Words: Alzheimer’s disease; Amyloid aggregation; Neurodegeneration; Ginger; Therapeutics; Dementia

INTRODUCTION

Alzheimer’s disease (AD) is a progressive and currently incurable neurodegenerative disorder, and the leading cause of age-related dementia worldwide (Frozza et al., 2018; Querfurth and LaFerla, 2010). Although AD brains typically display signs of neuroinflammation and oxidative stress (Agostinho et al., 2010; Regen et al., 2017; Wang et al., 2014b), the main characteristic lesions in AD brains are extracellular amyloid plaques (Querfurth and LaFerla, 2010; Selkoe and Hardy, 2016), which mainly consist of insoluble fibrillar aggregates of amyloid-β (Aβ) peptides (Querfurth and LaFerla, 2010). The Aβ peptides comprise 37-43 residues and are intrinsically disordered in aqueous solution. They have limited solubility in water due to the hydrophobicity of the central and C-terminal segments, which may fold into a hairpin conformation upon aggregation (Abelein et al., 2014; Baronio et al., 2019).
The charged N-terminal segment of Aβ peptides is hydrophilic and interacts readily with cationic molecules and metal ions (Luo et al., 2014a; Owen et al., 2019; Wärmländer et al., 2013). The Aβ fibrils and plaques that characterize AD neuropathology are the end-products of Aβ aggregation processes (Owen et al., 2019; Selkoe and Hardy, 2016) that involve extra- and/or intracellular formation of intermediate, soluble, and likely neurotoxic Aβ oligomers (Luo et al., 2014b; Sengupta et al., 2016) which may transfer from neuron to neuron via e.g. exosomes (Sardar Sinha et al., 2018). Oligomers of Aβ42 appear to be the most cell-toxic species (Sengupta et al., 2016). The formation of Aβ oligomers is influenced by interactions with various entities such as cellular membranes, small molecules, other proteins, and metal ions (Luo et al., 2016a, b; Owen et al., 2019; Wärmländer et al., 2019; Österlund et al., 2018a). Significant effort has been put into finding suitable molecules, i.e. drug candidates, that may modulate the Aβ aggregation processes (Leshem et al., 2019; Luo et al., 2013; Richman et al., 2013), but so far no drug has been approved (Frozza et al., 2018).

Some investigations of potential anti-AD substances have focused on natural plant compounds, such as gingerols, which are phenolic phytochemical compounds present in the subterranean stem, or rhizome, of angiosperms of the ginger (Zingiberaceae) family (Wang et al., 2014a).
Consumed worldwide as a spice and herbal medicine, the rhizome of ginger (Zingiber officinale) has demonstrated anti-inflammatory, antioxidant, antiemetic, analgesic, and antimicrobial effects (Sharifi-Rad et al., 2017). Ginger is a common ingredient in traditional healthy diets in many cultures (Iranshahy and Javadi, 2019; Khodaie and Sadeghpoor, 2015). According to Arabian folk wisdom, ginger improves memory and enhances cognition (Saenghong et al., 2012). Gingerols are generally considered to be safe for humans (Kaul and Joshi, 2001; Wang et al., 2014a). Yet they are cytotoxic towards blood cancer and lung cancer cells (de Lima et al., 2018; Semwal et al., 2015), and in vitro studies have also demonstrated positive effects on bowel (Jeong et al., 2009), breast (Lee et al., 2008), ovarian (Rhode et al., 2007), and pancreatic cancer (Park et al., 2006). The major pharmacologically active variant is 6-gingerol, which has been associated with the prevention and treatment of neurodegenerative diseases such as AD (Choi et al., 2018; Jeong et al., 2013; Mohd Sahardi and Makpol, 2019; Wang et al., 2014a). Its chemical structure is shown in Fig. 1. The anti-oxidant and anti-inflammatory properties of 6-gingerol are potentially useful against AD (Mohd Sahardi and Makpol, 2019), which may explain why 6-gingerol has been reported to reduce markers for neuroinflammation and oxidative stress, as well as decrease Aβ levels, in mouse and cell models of AD (Halawany et al., 2017; Zeng et al., 2015). However, little is known about the molecular mechanisms by which 6-gingerol exerts its positive effects in these AD pathology models. For example, interactions between gingerols and Aβ peptides have not been studied at the molecular level.
Here, we use biophysical techniques, namely liquid-phase fluorescence and nuclear magnetic resonance (NMR) spectroscopy together with solid-state atomic force microscopy (AFM), to investigate possible in vitro interactions between 6-gingerol and Aβ40 peptides, and how such interactions may affect the Aβ40 aggregation and amyloid formation processes.

Figure 1. Chemical structure for the hydrophobic plant metabolite 6-gingerol. MW = 294.4 g/mol.

MATERIALS AND METHODS

Reagents and sample preparation

6-gingerol was purchased as a powder from Sigma-Aldrich Inc. (USA), and dissolved in DMSO (dimethyl sulfoxide). Recombinant unlabeled or uniformly 15N-labeled Aβ40 peptides, with the primary sequence DAEFR5HDSGY10EVHHQ15KLVFF20AEDVG25SNKGA30IIGLM35VGGVV40, were purchased lyophilized from AlexoTech AB (Umeå, Sweden). The peptides were stored at -80 °C until used. The peptide concentration was determined by weight, and the peptide samples were dissolved to monomeric form immediately before each measurement. In brief, the peptides were dissolved in 10 mM sodium hydroxide, pH 12, at a 1 mg/ml concentration and sonicated in an ice-bath for at least three minutes to avoid having pre-formed aggregates in the peptide solutions. The peptide solution was then further diluted in 20 mM buffer of either sodium phosphate or MES (2-[N-morpholino]ethanesulfonic acid) at pH 7.35. All sample preparation steps were performed on ice.
ThT fluorescence monitoring Aβ aggregation kinetics

To monitor the effect of 6-gingerol on Aβ40 aggregation kinetics, 15 µM monomeric Aβ40 peptides were incubated in 20 mM MES buffer pH 7.35 in the presence of five different concentrations of 6-gingerol (15, 75, 150, 300, and 1500 µM) together with DMSO (0.1%, 0.6%, 1%, 2% and 10%; vol/vol). Additionally, a control sample without 6-gingerol but containing 2% DMSO was prepared. All samples contained 50 μM Thioflavin T (ThT), which is a benzothiazole dye that displays increased fluorescence intensity when bound to amyloid aggregates (Gade Malmos et al., 2017). The ThT dye was excited at 440 nm, and the fluorescence emission at 480 nm was measured every five minutes in a 96-well plate in a FLUOstar Omega microplate reader (BMG LABTECH, Germany). The sample volume in each well was 35 µl, four replicates per condition were measured, the temperature was +37 °C, and each five-minute cycle involved 140 seconds of shaking at 200 rpm. The assay was repeated three times. Even though the ThT fluorescence signal reached its maximum value after about seven hours, the incubation in the microplate reader continued for 72 hours to allow the samples to aggregate into mature fibrils that could be observed with AFM imaging (below). To derive parameters for the aggregation kinetics, the ThT fluorescence curves were fitted to the sigmoidal equation 1:

\[ F(t) = F_0 + m_0 t + \frac{F_\infty + m_\infty t}{1 + e^{-(t - \tau_{1/2})/\tau_{elon}}} \qquad \text{(Eq. 1)} \]

where F0 and F∞ are the intercepts of the initial and final fluorescence intensity baselines, m0 and m∞ are the slopes of the initial and final baselines, τ½ is the time needed to reach halfway through the elongation phase (i.e., aggregation half-time), and τelon is the elongation time constant (Gade Malmos et al., 2017). The apparent maximum rate constant for fibrillar growth, rmax, is defined as 1/τelon.

Atomic force microscopy (AFM) imaging of Aβ fibrils

Samples for AFM imaging were taken from the samples used in the ThT fluorescence measurements, after 72 h of incubation. AFM images were recorded for the two control samples of 15 µM Aβ40 in MES buffer, with and without 2% added DMSO, and for the three samples of 15 µM Aβ40 together with 15 µM, 75 µM, and 300 µM of 6-gingerol. Droplets of 1 µl incubated sample were placed on fresh silicon wafers (Siegert Wafer GmbH, Germany) and allowed to sit for 2 minutes. Next, 10 µl Milli-Q water was added to the droplets, and all excess fluid was removed immediately with a lint-free wipe. The wafers were left to dry in a covered container to protect from dust, and AFM images were recorded on the same day. A neaSNOM scattering-type near-field optical instrument (Neaspec GmbH, Germany) was used to collect the AFM images under tapping mode (Ω: 280 kHz, tapping amplitude 50-55 nm) using a Pt/Ir-coated monolithic ARROW-NCPt Si tip (NanoAndMore GmbH, Germany) with a tip radius <10 nm.
Images were acquired on 2.5 x 2.5 µm scan-areas (200 x 200-pixel size) under optimal scan-speed (i.e., 2.5 ms/pixel), and both topographic and mechanical phase images were recorded. Images were minimally processed using the Gwyddion software where a basic plane levelling was performed (Nečas and Klapetek, 2012).

Nuclear magnetic resonance (NMR) spectroscopy

An Avance 700 MHz NMR spectrometer (Bruker Inc., USA) equipped with a cryogenic probe was used to record 2D 1H-15N-HSQC spectra at +20 °C of 92.4 μM monomeric 15N-labeled Aβ40 peptides (500 μl), either in only 20 mM sodium phosphate buffer at pH 7.35 (90/10 H2O/D2O), or in phosphate buffer together with 50 mM SDS (sodium dodecyl sulphate) detergent. As the critical micelle concentration (CMC) for SDS is around 8 mM (Österlund et al., 2018b), most of the SDS was present as micelles. Both samples were titrated, first with additions of pure DMSO, and then by 6-gingerol dissolved in DMSO. The NMR data was processed with the Topspin version 3.6.2 software, and the Aβ40 HSQC crosspeak assignment in buffer (Danielsson et al., 2006) and in SDS micelles (Jarvet et al., 2007) is known from previous work.

RESULTS

ThT fluorescence kinetics

Fig. 2 shows ThT fluorescence intensity curves for 15 µM Aβ40 peptides, incubated in the presence of varying concentrations of 6-gingerol and DMSO. These curves reflect the formation of amyloid aggregates, and they all display a generally sigmoidal shape. Fitting Eq. 1 to the curves produces the kinetic parameters τ½, rmax, and τlag (Table 1).
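As an illustration of how the three kinetic parameters relate to one another, the following is a minimal Python sketch, not the authors' fitting code. It rests on two assumptions that are not stated explicitly in the text: that Eq. 1 has the standard sloping-baseline sigmoidal form described by Gade Malmos et al. (2017), and that the lag time is defined as τlag = τ½ − 2τelon. The second assumption is supported only indirectly, in that it reproduces the τlag values of Table 1 from the tabulated τ½ and rmax to within rounding. The function names and the `TABLE_1` dictionary are hypothetical.

```python
import math


def tht_sigmoid(t, f0, m0, f_inf, m_inf, t_half, tau_elon):
    """ThT fluorescence at time t for a sigmoid with sloping baselines.

    Assumed form of Eq. 1 (standard sloping-baseline sigmoid):
    F(t) = F0 + m0*t + (F_inf + m_inf*t) / (1 + exp(-(t - t_half)/tau_elon))
    """
    growth = 1.0 / (1.0 + math.exp(-(t - t_half) / tau_elon))
    return f0 + m0 * t + (f_inf + m_inf * t) * growth


def r_max_from_tau_elon(tau_elon):
    """Apparent maximum rate constant, defined in the text as 1/tau_elon."""
    return 1.0 / tau_elon


def lag_time(t_half, r_max):
    """Lag time under the ASSUMED definition tau_lag = t_half - 2*tau_elon."""
    tau_elon = 1.0 / r_max
    return t_half - 2.0 * tau_elon


# Values copied from Table 1: (t_half [h], r_max [1/h], reported tau_lag [h]).
TABLE_1 = {
    "Abeta control in buffer": (2.17, 1.62, 0.94),
    "Abeta control in 2% DMSO": (1.95, 2.05, 0.98),
    "+15 uM 6-gingerol": (2.10, 1.80, 0.99),
    "+75 uM 6-gingerol": (3.34, 1.01, 1.35),
    "+150 uM 6-gingerol": (1.70, 1.66, 0.50),
    "+300 uM 6-gingerol": (2.03, 1.86, 0.96),
    "+1500 uM 6-gingerol": (1.80, 2.69, 1.04),
}

if __name__ == "__main__":
    for sample, (t_half, r_max, reported_lag) in TABLE_1.items():
        computed = lag_time(t_half, r_max)
        print(f"{sample}: computed lag {computed:.2f} h, reported {reported_lag:.2f} h")
```

Recovering the parameters from raw fluorescence curves would additionally require a nonlinear least-squares routine applied to `tht_sigmoid`; that step is omitted here because the fitting software used in the study is not specified.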
Addition of DMSO alone, which was used to dissolve the 6-gingerol, has minor effects on the aggregation kinetics, i.e. by slightly increasing the lag time from 0.94 to 0.98 hrs and decreasing the aggregation half time from 2.2 to 1.9 hrs (Fig. 2, Table 1). With 6-gingerol, some additions produce aggregation kinetics that differ from the control samples. For example, addition of 75 µM 6-gingerol appears to slow down the aggregation (τlag = 1.3 h; τ½ = 3.3 h), while addition of 150 µM 6-gingerol appears to speed up the aggregation (τlag = 0.5 h; τ½ = 1.7 h). There is however variation in these measurements, and there is no overall trend of faster or slower kinetics for the series of 6-gingerol additions. Thus, these data indicate that 6-gingerol has no systematic effect on Aβ40 aggregation or amyloid formation.

Figure 2. ThT fluorescence curves showing the aggregation kinetics of 15 µM Aβ40 in 20 mM MES buffer, pH 7.35, at 37 °C. Black: buffer only; Red: 2% DMSO; Blue: 15 µM 6-gingerol; Pink: 75 µM 6-gingerol; Green: 150 µM 6-gingerol; Dark blue: 300 µM 6-gingerol; and Purple: 1500 µM 6-gingerol. Average curves from four replicates are shown.

Table 1. Kinetic parameters (τ½, τlag, and rmax) for fibril formation of 15 µM Aβ40 peptides, derived from fitting Eq. 1 to the ThT fluorescence curves shown in Fig. 2.
Sample                     τ½ (hours)     τlag (hours)   rmax (hours-1)
Aβ control in buffer       2.17 ± 0.1     0.94 ± 0.12    1.62 ± 0.06
Aβ control in 2% DMSO      1.95 ± 0.03    0.98 ± 0.08    2.05 ± 0.07
+15 µM 6-gingerol          2.1 ± 0.04     0.99 ± 0.08    1.80 ± 0.07
+75 µM 6-gingerol          3.34 ± 0.08    1.35 ± 0.12    1.01 ± 0.07
+150 µM 6-gingerol         1.7 ± 0.06     0.50 ± 0.08    1.66 ± 0.05
+300 µM 6-gingerol         2.03 ± 0.07    0.96 ± 0.13    1.86 ± 0.11
+1500 µM 6-gingerol        1.80 ± 0.05    1.04 ± 0.15    2.69 ± 0.17

AFM imaging

AFM images were recorded for some of the samples used in the ThT fluorescence measurements, i.e. the two control samples of 15 µM Aβ40 peptides in buffer with and without 2% DMSO, and the samples with additions of 15 µM, 75 µM, and 300 µM of 6-gingerol (Fig. 3). These samples were incubated for 72 h, to ensure aggregation into the mature elongated fibrils seen in Fig. 3A. Incubation in the presence of 2% DMSO produced similar fibrils, although together with small non-fibrillar clumps (Fig. 3B). Somewhat similar results, although with even more clumps, were obtained for the samples incubated together with 15 and 75 µM 6-gingerol, which also contained 0.1% and 0.6% DMSO, respectively (Figs. 3C and 3D). The sample with 300 µM of 6-gingerol and 2% DMSO, however, displays a different morphology, as it clearly contains more amorphous clumps than elongated fibrils (Fig. 3E). A confounding factor when evaluating these samples is that DMSO appears to slightly affect fibril formation. However, the sample with 300 µM 6-gingerol contains 2% DMSO (Fig. 3E), i.e. the same amount of DMSO as the control sample with DMSO (Fig. 3B). Thus, the different morphologies of the Aβ40 aggregates in these two samples are clearly caused by the added 6-gingerol and not by the DMSO alone.
Figure 3. AFM images showing aggregates of 15 µM Aβ40 peptide. (A) Aβ40 in buffer. (B) Aβ40 in DMSO. (C) Aβ40 and 15 µM 6-gingerol in DMSO, (D) Aβ40 and 75 µM 6-gingerol in DMSO, (E) Aβ40 and 300 µM 6-gingerol in DMSO. Top row: height profiles. Bottom row: mechanical phase images.

NMR spectroscopy

NMR experiments were conducted to investigate possible molecular interactions between 6-gingerol and the monomeric Aβ40 peptide. The fingerprint region of the 1H,15N-HSQC spectrum of 92 μM monomeric 15N-labeled Aβ40 peptide is shown in Fig. 4 (blue spectrum), both for Aβ40 in buffer and for Aβ40 bound to SDS micelles. The SDS micelles were here used as a simple model for a membrane environment that is suitable for NMR studies (Österlund et al., 2018a; Österlund et al., 2018b). In both environments, addition of DMSO (2% in the buffer sample and 3% in the sample with SDS micelles) induces chemical shifts of most crosspeaks (Fig. 4, red spectra). This is consistent with previous NMR studies of Aβ40 in DMSO (Wallin et al., 2017). Addition of 6-gingerol dissolved in DMSO increased the DMSO concentration to 4% in the buffer sample and to 5% in the sample with SDS micelles. This addition induces chemical shift changes for the NMR crosspeaks that are perfectly consistent with the changes induced by DMSO alone (Fig. 4, orange spectra). This shows that 6-gingerol does not have any strong interaction of its own with monomeric Aβ40, either in aqueous solution or in a membrane environment.
Figure 4. 2D NMR 1H,15N-HSQC spectra recorded at +20 °C for 92 μM monomeric Aβ40 peptide in 20 mM sodium phosphate buffer, pH 7.3, for (A) Aβ40 in buffer alone, and (B) Aβ40 bound to micelles of 50 mM SDS. The spectra were recorded before (blue) and after addition of DMSO (red), and then after addition of 1.84 mM 6-gingerol in DMSO (orange).

DISCUSSION

Given the ancient history and cultural importance of ginger in many parts of the world (Iranshahy and Javadi, 2019; Khodaie and Sadeghpoor, 2015; Saenghong et al., 2012), it is desirable to understand the molecular mechanisms behind its proposed benefits to human health. Such mechanistic investigations may also expand ethnomedical research, which often focuses on population-level medical effects and exposure/uptake levels (Sholts et al., 2017; Wärmländer et al., 2011).

Here, we show that 6-gingerol interferes with Aβ40 peptide aggregation by inducing aggregation into amorphous clumps rather than into elongated fibrils (Fig. 3). Our ThT fluorescence assays show that 6-gingerol has no systematic effect on the kinetics of the Aβ40 aggregation process, and that approximately the same amount of amyloid aggregates is formed with and without 6-gingerol (Fig. 2).
From a medical perspective, however, the most important aspect of Aβ aggregation may not be the amount or speed of aggregation, but rather the properties of the aggregates. The neuronal death in AD appears to be mainly caused by small oligomeric Aβ aggregates of unknown composition and structure (Luo et al., 2014b; Sardar Sinha et al., 2018; Sengupta et al., 2016) that might disrupt cell membranes (Wärmländer et al., 2019). Thus, the observed interference of 6-gingerol with the Aβ aggregation processes could provide a molecular explanation of the previously observed beneficial effects of gingerols on cell and animal models of AD pathology (Choi et al., 2018; Halawany et al., 2017; Jeong et al., 2013; Mohd Sahardi and Makpol, 2019; Wang et al., 2014a; Zeng et al., 2015).

The NMR results show that 6-gingerol does not interact with monomeric Aβ40, either in aqueous solution or in membrane-mimicking micelles. Thus, interaction appears to take place only when oligomers or larger aggregates have formed. This is not unreasonable, as Aβ oligomers are considered to be more hydrophobic than the amphiphilic Aβ monomers (Wärmländer et al., 2019), and thus more likely to interact with the hydrophobic 6-gingerol molecules. In fact, the ideal AD drug is a molecule that interferes with toxic Aβ aggregates but not with the Aβ monomers, as the latter may have beneficial biological functions in their non-aggregated form (Dominy et al., 2019; Frozza et al., 2018; Querfurth and LaFerla, 2010; Rajendran and Annaert, 2012).

As a molecule that is non-toxic (Kaul and Joshi, 2001), easy to produce and administer, and small enough to easily pass through the blood-brain barrier, 6-gingerol has suitable properties for use as a drug. This study suggests that 6-gingerol may be used to combat AD by interfering with the aggregation of Aβ peptides.
CONFLICT OF INTEREST

The authors declare no conflicts of interest.

ACKNOWLEDGMENTS

We thank Teodor Svantesson and Georgia Pilkington for helpful discussions and advice.

REFERENCES

Abelein, A., Abrahams, J. P., Danielsson, J., Gräslund, A., Jarvet, J., Luo, J., Tiiman, A. and Wärmländer, S. K. (2014). The hairpin conformation of the amyloid beta peptide is an important structural motif along the aggregation pathway. J Biol Inorg Chem 19, 623-634. Agostinho, P., Cunha, R. A. and Oliveira, C. (2010). Neuroinflammation, oxidative stress and the pathogenesis of Alzheimer's disease. Curr Pharm Des 16, 2766-2778. Baronio, C. M., Baldassarre, M. and Barth, A. (2019). Insight into the internal structure of amyloid-beta oligomers by isotope-edited Fourier transform infrared spectroscopy. Phys Chem Chem Phys 21, 8587-8597. Choi, J. G., Kim, S. Y., Jeong, M. and Oh, M. S. (2018). Pharmacotherapeutic potential of ginger and its compounds in age-related neurological disorders. Pharmacol Ther 182, 56-69. Danielsson, J., Andersson, A., Jarvet, J. and Gräslund, A. (2006). 15N relaxation study of the amyloid beta-peptide: structural propensities and persistence length. Magn Reson Chem 44 Spec No, S114-121. de Lima, R. M. T., Dos Reis, A. C., de Menezes, A. P. M., Santos, J. V. O., Filho, J., Ferreira, J. R. O., de Alencar, M., da Mata, A., Khan, I. N., Islam, A., Uddin, S. J., Ali, E. S., Islam, M. T., Tripathi, S., Mishra, S. K., Mubarak, M. S. and Melo-Cavalcante, A. A. C. (2018).
Protective and therapeutic potential of ginger (Zingiber officinale) extract and [6]-gingerol in cancer: A comprehensive review. Phytother Res 32, 1885-1907. Dominy, S. S., Lynch, C., Ermini, F., Benedyk, M., Marczyk, A., Konradi, A., Nguyen, M., Haditsch, U., Raha, D., Griffin, C., Holsinger, L. J., Arastu-Kapur, S., Kaba, S., Lee, A., Ryder, M. I., Potempa, B., Mydel, P., Hellvard, A., Adamowicz, K., Hasturk, H., Walker, G. D., Reynolds, E. C., Faull, R. L. M., Curtis, M. A., Dragunow, M. and Potempa, J. (2019). Porphyromonas gingivalis in Alzheimer's disease brains: Evidence for disease causation and treatment with small-molecule inhibitors. Sci Adv 5, eaau3333. Frozza, R. L., Lourenco, M. V. and De Felice, F. G. (2018). Challenges for Alzheimer's Disease Therapy: Insights from Novel Mechanisms Beyond Memory Defects. Front Neurosci 12, 37. Gade Malmos, K., Blancas-Mejia, L. M., Weber, B., Buchner, J., Ramirez-Alvarado, M., Naiki, H. and Otzen, D. (2017). ThT 101: a primer on the use of thioflavin T to investigate amyloid formation. Amyloid 24, 1-16. Halawany, A. M. E., Sayed, N. S. E., Abdallah, H. M. and Dine, R. S. E. (2017). Protective effects of gingerol on streptozotocin-induced sporadic Alzheimer's disease: emphasis on inhibition of beta-amyloid, COX-2, alpha-, beta-secretases and APH1a. Sci Rep 7, 2902. Iranshahy, M. and Javadi, B. (2019). Diet therapy for the treatment of Alzheimer’s disease in view of traditional Persian medicine: A review. Iranian Journal of Basic Medical Sciences 22, 1102-1117.
Jarvet, J., Danielsson, J., Damberg, P., Oleszczuk, M. and Gräslund, A. (2007). Positioning of the Alzheimer Abeta(1-40) peptide in SDS micelles using NMR and paramagnetic probes. J Biomol NMR 39, 63-72. Jeong, C. H., Bode, A. M., Pugliese, A., Cho, Y. Y., Kim, H. G., Shim, J. H., Jeon, Y. J., Li, H., Jiang, H. and Dong, Z. (2009). [6]-Gingerol suppresses colon cancer growth by targeting leukotriene A4 hydrolase. Cancer Res 69, 5584-5591. Jeong, J. K., Moon, M. H., Park, Y. G., Lee, J. H., Lee, Y. J., Seol, J. W. and Park, S. Y. (2013). Gingerol- induced hypoxia-inducible factor 1 alpha inhibits human prion peptide-mediated neurotoxicity. Phytother Res 27, 1185-1192. Kaul, P. N. and Joshi, B. S. (2001). Alternative medicine: Herbal drugs and their critical appraisal - Part II. In Progress in Drug Research (E. Jucker, Ed., Vol. 57, pp. 1-75. Birkhäuser, Basel, Switzerland. Khodaie, L. and Sadeghpoor, O. (2015). Ginger from ancient times to the new outlook. Jundishapur J Nat Pharm Prod 10, e18402. Lee, H. S., Seo, E. Y., Kang, N. E. and Kim, W. K. (2008). [6]-Gingerol inhibits metastasis of MDA-MB- 231 human breast cancer cells. J Nutr Biochem 19, 313-319. Leshem, G., Richman, M., Lisniansky, E., Antman-Passig, M., Habashi, M., Gräslund, A., Wärmländer, S. K. T. S. and Rahimipour, S. (2019). Photoactive chlorin e6 is a multifunctional modulator of amyloid-beta aggregation and toxicity via specific interactions with its histidine residues. Chem Sci 10, 208-217. Luo, J., Mohammed, I., Wärmländer, S. K., Hiruma, Y., Gräslund, A. and Abrahams, J. P. (2014a). Endogenous polyamines reduce the toxicity of soluble abeta peptide aggregates associated with Alzheimer's disease. Biomacromolecules 15, 1985-1991. Luo, J., Otero, J. M., Yu, C. H., Wärmländer, S. K., Gräslund, A., Overhand, M. and Abrahams, J. P. (2013). Inhibiting and reversing amyloid-beta peptide (1-40) fibril formation with gramicidin S and engineered analogues. Chemistry 19, 17338-17348. Luo, J., Wärmländer, S. 
K., Gräslund, A. and Abrahams, J. P. (2014b). Alzheimer peptides aggregate into transient nanoglobules that nucleate fibrils. Biochemistry 53, 6302-6308. Luo, J., Wärmländer, S. K., Gräslund, A. and Abrahams, J. P. (2016a). Cross-interactions between the Alzheimer Disease Amyloid-beta Peptide and Other Amyloid Proteins: A Further Aspect of the Amyloid Cascade Hypothesis. J Biol Chem 291, 16485-16493. Luo, J., Wärmländer, S. K., Gräslund, A. and Abrahams, J. P. (2016b). Reciprocal Molecular Interactions between the Abeta Peptide Linked to Alzheimer's Disease and Insulin Linked to Diabetes Mellitus Type II. ACS Chem Neurosci 7, 269-274. Mohd Sahardi, N. F. N. and Makpol, S. (2019). Ginger (Zingiber officinale Roscoe) in the Prevention of Ageing and Degenerative Diseases: Review of Current Evidence. Evid Based Complement Alternat Med 2019, 5054395. Nečas, D. and Klapetek, P. (2012). Gwyddion: an open-source software for SPM data analysis. Central European Journal of Physics 10, 181-188. Owen, M. C., Gnutt, D., Gao, M., Wärmländer, S. K. T. S., Jarvet, J., Gräslund, A., Winter, R., Ebbinghaus, S. and Strodel, B. (2019). Effects of in vivo conditions on amyloid aggregation. Chem Soc Rev 48, 3946-3996. Park, Y. J., Wen, J., Bang, S., Park, S. W. and Song, S. Y. (2006). [6]-Gingerol induces cell cycle arrest and cell death of mutant p53-expressing pancreatic cancer cells. Yonsei Med J 47, 688-697. Querfurth, H. W. and LaFerla, F. M. (2010). Alzheimer's disease. N Engl J Med 362, 329-344. Rajendran, L. and Annaert, W. (2012). Membrane trafficking pathways in Alzheimer's disease. Traffic 13, 759-770.
Regen, F., Hellmann-Regen, J., Costantini, E. and Reale, M. (2017). Neuroinflammation and Alzheimer's Disease: Implications for Microglial Activation. Curr Alzheimer Res 14, 1140-1148. Rhode, J., Fogoros, S., Zick, S., Wahl, H., Griffith, K. A., Huang, J. and Liu, J. R. (2007). Ginger inhibits cell growth and modulates angiogenic factors in ovarian cancer cells. BMC Complement Altern Med 7, 44. Richman, M., Wilk, S., Chemerovski, M., Wärmländer, S. K., Wahlström, A., Gräslund, A. and Rahimipour, S. (2013). In vitro and mechanistic studies of an antiamyloidogenic self-assembled cyclic D,L-alpha-peptide architecture. J Am Chem Soc 135, 3474-3484. Saenghong, N., Wattanathorn, J., Muchimapura, S., Tongun, T., Piyavhatkul, N., Banchonglikitkul, C. and Kajsongkram, T. (2012). Zingiber officinale Improves Cognitive Function of the Middle-Aged Healthy Women. Evid Based Complement Alternat Med 2012, 383062. Sardar Sinha, M., Ansell-Schultz, A., Civitelli, L., Hildesjö, C., Larsson, M., Lannfelt, L., Ingelsson, M. and Hallbeck, M. (2018). Alzheimer's disease pathology propagation by exosomes containing toxic amyloid-beta oligomers. Acta Neuropathol 136, 41-56. Selkoe, D. J. and Hardy, J. (2016). The amyloid hypothesis of Alzheimer's disease at 25 years. EMBO Mol Med 8, 595-608. Semwal, R. B., Semwal, D. K., Combrinck, S. and Viljoen, A. M. (2015). Gingerols and shogaols: Important nutraceutical principles from ginger. Phytochemistry 117, 554-568. Sengupta, U., Nilson, A. N. and Kayed, R. (2016). The Role of Amyloid-beta Oligomers in Toxicity, Propagation, and Immunotherapy. EBioMedicine 6, 42-49. Sharifi-Rad, M., Varoni, E. M., Salehi, B., Sharifi-Rad, J., Matthews, K. R., Ayatollahi, S. A., Kobarfard, F., Ibrahim, S. A., Mnayer, D., Zakaria, Z. A., Sharifi-Rad, M., Yousaf, Z., Iriti, M., Basile, A. and Rigano, D.
(2017). Plants of the Genus Zingiber as a Source of Bioactive Phytochemicals: From Tradition to Pharmacy. Molecules 22. Sholts, S. B., Smith, K., Wallin, C., Ahmed, T. M. and Wärmländer, S. (2017). Ancient water bottle use and polycyclic aromatic hydrocarbon (PAH) exposure among California Indians: a prehistoric health risk assessment. Environmental health : a global access science source 16, 61. Wallin, C., Sholts, S. B., Österlund, N., Luo, J., Jarvet, J., Roos, P. M., Ilag, L., Gräslund, A. and Wärmländer, S. K. T. S. (2017). Alzheimer's disease and cigarette smoke components: effects of nicotine, PAHs, and Cd(II), Cr(III), Pb(II), Pb(IV) ions on amyloid-beta peptide aggregation. Sci Rep 7, 14423. Wang, S., Zhang, C., Yang, G. and Yang, Y. (2014a). Biological properties of 6-gingerol: a brief review. Nat Prod Commun 9, 1027-1030. Wang, X., Wang, W., Li, L., Perry, G., Lee, H. G. and Zhu, X. (2014b). Oxidative stress and mitochondrial dysfunction in Alzheimer's disease. Biochimica et biophysica acta 1842, 1240- 1247. Wärmländer, S., Tiiman, A., Abelein, A., Luo, J., Jarvet, J., Söderberg, K. L., Danielsson, J. and Gräslund, A. (2013). Biophysical studies of the amyloid beta-peptide: interactions with metal ions and small molecules. Chembiochem 14, 1692-1704. Wärmländer, S. K., Sholts, S. B., Erlandson, J. M., Gjerdrum, T. and Westerholm, R. (2011). Could the health decline of prehistoric California indians be related to exposure to polycyclic aromatic hydrocarbons (PAHs) from natural bitumen? Environ Health Perspect 119, 1203-1207. Wärmländer, S. K. T. S., Österlund, N., Wallin, C., Wu, J., Luo, J., Tiiman, A., Jarvet, J. and Gräslund, A. (2019). Metal binding to the Amyloid-β peptides in the presence of biomembranes: potential mechanisms of cell toxicity. Journal of Biological Inorganic Chemistry 24, 1189–1196. Zeng, G. F., Zong, S. H., Zhang, Z. Y., Fu, S. W., Li, K. K., Fang, Y., Lu, L. and Xiao, D. Q. (2015). 
The Role of 6-Gingerol on Inhibiting Amyloid beta Protein-Induced Apoptosis in PC12 Cells. Rejuvenation Res 18, 413-421. Österlund, N., Kulkarni, Y. S., Misiaszek, A. D., Wallin, C., Krüger, D. M., Liao, Q., Mashayekhy Rad, F., Jarvet, J., Strodel, B., Wärmländer, S. K. T. S., Ilag, L. L., Kamerlin, S. C. L. and Gräslund, A. (2018a). Amyloid-beta Peptide Interactions with Amphiphilic Surfactants: Electrostatic and Hydrophobic Effects. ACS Chem Neurosci 9, 1680-1692. Österlund, N., Luo, J., Wärmländer, S. K. T. S. and Gräslund, A. (2018b). Membrane-mimetic systems for biophysical studies of the amyloid-beta peptide. Biochim Biophys Acta Proteins Proteom.
10_1101-2021_01_04_425171 ----

Dynamic closed states of a ligand-gated ion channel captured by cryo-EM and simulations

Urška Rovšnik1, Yuxuan Zhuang1, Björn O Forsberg1,2, Marta Carroni1, Linnea Yvonnesdotter1, Rebecca J Howard1, Erik Lindahl1,3

1Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, 17121 Solna, Sweden
2Division of Structural Biology, Wellcome Centre for Human Genetics, University of Oxford, OX3 7BN Oxford, United Kingdom
3Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, 17121 Solna, Sweden

Corresponding author: Erik Lindahl, Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 17121 Solna, Sweden; erik.lindahl@scilifelab.se

Abstract

Ligand-gated ion channels are critical mediators of electrochemical signal transduction across evolution. Biophysical and pharmacological characterization of these receptor proteins relies on high-quality structures in multiple, subtly distinct functional states. However, structural data in this family remain limited, particularly for resting and intermediate states on the activation pathway. Here we report cryo-electron microscopy (cryo-EM) structures of the proton-activated Gloeobacter violaceus ligand-gated ion channel (GLIC) under three pH conditions.
Decreased pH was associated with improved resolution and sidechain rearrangements at the subunit/domain interface, particularly involving functionally important residues in the β1–β2 and M2–M3 loops. Molecular dynamics simulations substantiated flexibility in the closed-channel extracellular domains relative to the transmembrane ones, and supported electrostatic remodeling around E35 and E243 in proton-induced gating. Exploration of secondary cryo-EM classes further indicated a low-pH population with an expanded pore. These results support a dissection of protonation and activation steps in pH-stimulated conformational cycling in GLIC, including interfacial rearrangements largely conserved in the pentameric channel family. 1 .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425171doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425171 http://creativecommons.org/licenses/by/4.0/ Introduction Pentameric ligand-gated ion channels are major mediators of fast synaptic transmission in the mammalian nervous system, and serve a variety of biological roles across evolution [1]. Representative X-ray and cryo-electron microscopy (cryo-EM) structures in this family have confirmed a five-fold pseudosymmetric architecture, conserved from prokaryotes to humans [2]. The extracellular domain (ECD) of each subunit contains β-strands β1–β10, with the characteristic Cys- or Pro-loop [3] connecting β6–β7, and loops A–F enclosing a canonical ligand-binding site [4] at the interface between principal and complementary subunits. The transmembrane domain (TMD) contains α-helices M1–M4, with M2 lining the channel pore, and an intracellular domain of varying length (2–80 residues) inserted between M3 and M4. 
Extracellular agonist binding is thought to favor subtle structural transitions from resting to intermediate or ‘flip’ states [5], opening of a transmembrane pore [6], and in most cases a refractory desensitized phase [7]. Accordingly, a detailed understanding of pentameric channel biophysics and pharmacology depends on high-quality structural templates in multiple functional states. However, high-resolution structures can be biased by stabilizing measures such as ligands, mutations, and crystallization, leaving open questions as to the wild-type activation process.

As a model system in this family, the Gloeobacter violaceus proton-gated ion channel (GLIC) has historically offered both insights and limitations [8]. This prokaryotic receptor has been functionally characterized in multiple cell types [9] and crystallizes readily under activating conditions (pH ≤ 5.5) [10], [11], producing apparent open structures up to 2.22 Å resolution [12] in the absence and presence of various ligands [13]–[22] and mutations [23]–[26]. Additional low-pH X-ray structures of GLIC have been reported in lipid-modulated [27] and so-called locally closed states [28]–[31], with a hydrophobic constriction at the pore midpoint (I233, I9’ in prime notation) as predicted for closed channels throughout the family [32]. Crystallography at neutral pH has also been reported, but only to relatively low resolution (4.35 Å), suggesting a resting state with a relatively expanded, twisted ECD as well as a contracted pore [33], [34].
Alternative structural methods have supported the existence of multiple nonconducting conformations [35]–[37], and biochemical studies have implicated titratable residues including E35 and E243 in pH sensing [12], [26], [37], [38]. However, due in part to limited structural data for wild-type GLIC in resting, intermediate, or desensitized states, the mechanism of proton gating remains unclear.

Here, we report single-particle cryo-EM structures and molecular dynamics (MD) simulations of GLIC at pH 7, 5, and 3. Taking advantage of the relatively flexible conditions accessible to cryo-EM, we resolve multiple closed structures, distinct from those previously reported by crystallography. We find rearrangements of E35 and E243 differentiate deprotonated versus protonated conditions, providing a dynamic rationale for proton-stimulated remodeling. Classification of cryo-EM data further indicated a minority population with a contracted ECD and expanded pore. These results support a dissection of protonation and activation steps in pH-stimulated conformational cycling, by which GLIC preserves a general gating pathway via interfacial electrostatics rather than ligand binding.

Results

Differential resolution of GLIC cryo-EM structures with varying pH

To characterize the resting state of the prokaryotic pentameric channel GLIC, we first obtained single-particle cryo-EM data under resting conditions (pH 7), resulting in a map to 4.1 Å overall resolution (Fig 1A–B, Fig EV1, Appendix Fig S1, Appendix Fig S2, Table 1). Local resolution was between 3.5 and 4.0 Å in the TMD, including complete backbone traces for all four transmembrane helices. Sidechains in the TMD core were clearly resolved (Fig EV2A), including a constriction at the I233 hydrophobic gate (I9’, 2.9 Å Cβ-atom radius), consistent with a closed pore.
Whereas some extracellular regions were similarly well resolved (Fig EV2B), local resolution in the ECD was generally lower (Fig 1B), with some atoms that could not be definitively built in the β1–β2 loop, β8–β9 loop (loop F), and at the apical end of the ECD (Fig 2B).

GLIC has been thoroughly documented as a proton-gated ion channel, conducting currents in response to low extracellular pH with half-maximal activation around pH 5 [9]. Taking advantage of the flexible buffer conditions accessible to cryo-EM, we obtained additional reconstructions under partial and maximal (pH 5 and pH 3) activating conditions, producing maps to 3.4 Å and 3.6 Å, respectively (Fig 1C–D, Fig EV1, Appendix Fig S1, Appendix Fig S2). Overall map quality improved at lower pH, though local resolution in the TMD remained high relative to the ECD (Fig 1C–D). As a partial check for our map comparisons, we also selected random subsets containing equivalent numbers of particles from each dataset; we found the pH-5 and pH-3 datasets still produced higher-quality reconstructions than those at pH 7 (Appendix Fig S3), indicating that differential resolution could not be trivially attributed to data quantity. Surprisingly, backbone alignments of models at both pH 5 and pH 3 indicated close fits to the pH-7 model (root mean-squared deviation over non-loop Cα atoms, RMSD ≤ 0.6 Å) in both the ECD and TMD, including a closed conformation of the transmembrane pore (Fig 1B–D, Fig 2A).
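The backbone comparison above is summarized as a root mean-squared deviation over non-loop Cα atoms. As a minimal sketch of that metric, the Python snippet below computes the RMSD between two coordinate sets, assuming they have already been superposed and list the same atoms in the same order. The function name `ca_rmsd` and the toy coordinates are illustrative assumptions; the source does not state which software performed the alignment.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD (in Å) between two pre-superposed, equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Hypothetical Cα positions (Å) for three matched residues in two models;
# these numbers are made up for illustration and are not taken from the paper.
model_ph7 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_ph3 = [(0.3, 0.0, 0.0), (3.8, 0.4, 0.0), (7.6, 0.0, 0.0)]

print(round(ca_rmsd(model_ph7, model_ph3), 3))
```

In practice the superposition step (for example a least-squares fit on the same atom selection) would precede this calculation; it is omitted here to keep the sketch dependency-free.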
All three models deviated moderately from the resting X-ray structure (PDB ID: 4NPQ, ECD RMSD ≤ 1.4 Å, TMD RMSD ≤ 0.8 Å) but further from the open X-ray structure (PDB ID: 4HFI, ECD RMSD ≤ 2.2 Å, TMD RMSD ≤ 1.9 Å), suggesting systematic differences in EM versus crystallized conditions, as well as general alignment to a conserved closed-state backbone. Still, variations in local resolution and sidechain orientation indicated pH-dependent conformational changes at the subunit-domain interface, as described below.

Sidechain rearrangements in low-pH structures

In the ECD, differential resolution was notable in the β1–β2 loop, particularly in the principal proton-sensor residue E35 [12], [26]. At pH 7 and pH 5, little definitive density was associated with this sidechain (Fig 2B, left, center); conversely at pH 3, it clearly extended towards the complementary loop F, forming a possible hydrogen bond with T158 (3.5 Å donor-acceptor; Fig 2B, right). Notably, this interaction mirrored that observed in open X-ray structures (Fig EV3), despite the general absence of open-like backbone rearrangements in the cryo-EM structure. At the midpoint of the same β1–β2 loop, density surrounding basic residue K33 was similarly absent at pH 7 and pH 5, but clearly defined a sidechain oriented down towards the TMD at pH 3 (Fig 2B). An additional acidic residue, D31, could also be uniquely built at pH 3, oriented in towards the central vestibule. Although not in direct contact with neighboring sidechains or domains, its enhanced definition further supported stabilization of the β1–β2 loop.
Among seven other acidic residues (E75, D97, D115, D122, D145, D161, D178) associated with improved densities at low pH, only D122 has been shown to substantially influence channel properties [30]; this residue is involved in an electrostatic network conserved across evolution, with substitutions decreasing channel expression as well as function [26], suggesting its role may involve assembly or architecture more than proton sensitivity.

In the TMD, rearrangements were observed particularly in the M2–M3 loop, a region thought to couple ECD activation to TMD-pore opening. At pH 7, K248 at the loop midpoint oriented down toward the M2 helix, where it could form an intrasubunit hydrogen bond with E243. Conversely, at pH 5 and pH 3, K248 reoriented out towards the complementary subunit. Residue K248 has been implicated in GLIC ECD-TMD coupling [28], while E243 was shown to be an important proton sensor [12]; indeed, rearrangement of K248 to an interfacial orientation is also evident in open X-ray structures, with an accompanying iris-like motion of the M2–M3 region (including both K248 and E243) outward from the channel pore (Fig EV3). Thus, sidechain arrangements in both the ECD and TMD were consistent with proton activation, while maintaining a closed pore.

Remodeled electrostatic contacts revealed by molecular dynamics

To elucidate the basis for variations in local resolution (Fig 1B–D) and sidechain orientation (Fig 2B–D) described above, and to assess whether these variations are properties of the state or of the experiment, we ran quadruplicate 1-µs all-atom MD simulations of each cryo-EM structure, embedded in a lipid bilayer and 150 mM NaCl. To further test the role of pH, we ran parallel simulations with a subset of acidic residues modified to approximate the probable protonation pattern under activating conditions, as previously described [14].
For comparison, X-ray structures reported previously under resting and activating conditions were also simulated, at neutral and low-pH protonation states respectively. Simulation RMSD converged to a similar degree within 250 ns (Fig EV4A), with all except the open X-ray structure dehydrated around the hydrophobic gate (Fig EV4B). Simulations of all three cryo-EM structures exhibited elevated RMSD for the extracellular domains (RMSD < 3.5 Å) versus transmembrane regions (RMSD < 2.0 Å), consistent with higher flexibility in the ECD; both domains exhibited similarly low RMSD in simulations of the open X-ray structure (Fig EV4A).

In the ECD, simulations suggested a dynamic basis for pH-dependent interactions of the E35 proton sensor at the intersubunit β1–β2/loop-F interface (Fig 3A–C). Under resting (deprotonated) conditions, negatively charged E35 attracted cations from the extracellular medium, forming a direct electrostatic contact with Na+ in >35% of simulation frames (Fig 3A–B). These environmental ions were not coordinated by other protein motifs in a rigid binding site, potentially explaining poorly resolved densities in this region in neutral-pH structures. Cation coordination decreased slightly in the pH-3 structure even under deprotonated conditions, but was effectively eliminated in all simulations under activating (protonated) conditions.
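Contact prevalence of the kind reported above (the percentage of simulation frames in which E35 contacts Na+) can be illustrated with a short sketch. The Python snippet below counts frames whose residue-to-ion distance falls at or below a cutoff and averages that fraction over replicates. The 4.0 Å cutoff, the function names, and the distance values are assumptions for illustration only; the source text does not specify the contact criterion used.

```python
def contact_fraction(distances, cutoff=4.0):
    """Fraction of frames whose distance (Å) is at or below the cutoff."""
    if not distances:
        raise ValueError("need at least one frame")
    return sum(1 for d in distances if d <= cutoff) / len(distances)

def mean_contact_fraction(replicates, cutoff=4.0):
    """Average the per-replicate contact fraction over all replicates."""
    fractions = [contact_fraction(r, cutoff) for r in replicates]
    return sum(fractions) / len(fractions)

# Hypothetical minimum E35-to-Na+ distances (Å) per frame, two short replicates each.
# The values are invented to show the calculation, not measured from any trajectory.
deprotonated = [[2.4, 2.5, 6.1, 2.3], [7.0, 2.6, 2.4, 8.2]]
protonated = [[9.1, 8.7, 7.9, 10.2], [6.5, 7.1, 9.9, 8.8]]

print(mean_contact_fraction(deprotonated))
print(mean_contact_fraction(protonated))
```

Averaging per replicate rather than pooling all frames gives each independent simulation equal weight, which is one reasonable choice when replicates differ in length.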
In parallel, mean Cα-distances between E35 and the complementary T158 contracted in protonated simulations to values approaching the open X-ray structure (Fig 3A, C), as the now-uncharged glutamate released Na+ and became available to interact with the proximal threonine.

In the TMD, simulations further substantiated gating-like rearrangements in the M2–M3 loop (Fig 3D–F). In simulations of the pH-7 structure under deprotonated conditions, the K248 sidechain was attracted down in each subunit towards the negatively charged E243; similar to the starting structure (Fig 2C–D), these residues formed an electrostatic contact in >70% of trajectory frames (Fig 3D–E). In simulations of the pH-3 structure, K248 more often oriented out toward the subunit interface (Fig 3D–E), also as seen in the corresponding structure (Fig 2C–D). Moreover, E243-K248 interactions decreased in protonated versus deprotonated simulations of all three structures, with the prevalence of this contact in protonated simulations at pH 3 (<25%) approaching that in open X-ray structures (Fig 3E).

Projecting the M2–M3 loop conformations onto the two lowest principal component (PC) degrees of freedom further revealed distinct populations at pH 7, pH 5, and pH 3 (Fig 3F). The two dominant PCs for this motif were associated with flipping of K248 from a downward to outward orientation (PC1), and stretching of the loop across the subunit/domain interface (PC2).
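The projection onto two dominant principal components described above can be sketched in a few lines. The pure-Python example below centers a set of per-frame feature vectors, builds their covariance matrix, extracts the first two PCs by power iteration with deflation, and projects each frame onto them. The feature choice, function names, and toy data are assumptions; the source does not state which coordinates or software were used, and power iteration stands in here for a library eigendecomposition.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every frame."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - mean[j] for j in range(dim)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of mean-centered frames."""
    n, dim = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]

def leading_eigenpair(matrix, iterations=1000):
    """Largest eigenvalue and unit eigenvector by power iteration.

    Starts from a uniform vector, so it assumes the leading eigenvector
    is not exactly orthogonal to that starting direction.
    """
    dim = len(matrix)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(dim) for j in range(dim))
    return value, vec

def first_two_pcs(rows):
    """Return PC1, PC2, and the (PC1, PC2) score of every frame."""
    centered = center(rows)
    cov = covariance(centered)
    dim = len(cov)
    val1, pc1 = leading_eigenpair(cov)
    # Deflation: remove the PC1 contribution so the next iteration finds PC2.
    deflated = [[cov[i][j] - val1 * pc1[i] * pc1[j] for j in range(dim)]
                for i in range(dim)]
    _, pc2 = leading_eigenpair(deflated)
    scores = [(sum(r[k] * pc1[k] for k in range(dim)),
               sum(r[k] * pc2[k] for k in range(dim))) for r in centered]
    return pc1, pc2, scores

# Hypothetical 3-feature frames: large correlated motion in the first two
# features, small independent motion in the third (values are invented).
frames = [[-2.0, -2.0, 0.1], [-1.0, -1.0, -0.2], [0.0, 0.0, 0.2],
          [1.0, 1.0, -0.2], [2.0, 2.0, 0.1]]

pc1, pc2, scores = first_two_pcs(frames)
print([round(x, 3) for x in pc1], [round(x, 3) for x in pc2])
```

Frames from different structures or protonation states could then be compared by plotting their scores on the same PC1 and PC2 axes, provided all frames are projected onto components computed from a common reference set.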
Projected along these axes, structures determined in decreasing pH conditions increasingly approximated the open X-ray structure, particularly in protonated simulations. Thus, in addition to substantiating differential stability in extracellular and transmembrane regions, MD simulations offered a rationale for dynamic pH-dependent rearrangements at the subunit/domain interface.

Minority classes suggest alternative states

Compared to the best-quality reconstructions obtained at each pH (state 1, Fig 1B–D), cryo-EM data classification in all cases identified minority populations, indicating the presence of multiple conformations that could correspond to functionally relevant states. In particular, a minority class (state 2) at pH 3 was visibly contracted and rotated in the ECD relative to pH 3 (state 1) (Fig EV5A). Although a complete atomic model could not be built at this resolution (4.9 Å), refinement of the pH-3 state-1 backbone into the state-2 density revealed systematic reductions in ECD spread and domain twist, echoing transitions from resting to open X-ray structures (Fig EV5B) [33], [34]. Minority classes could also be reconstructed at pH 7 and pH 5, although to lower resolution (5.8 Å and 5.1 Å respectively), and with less apparent divergence from state 1 in each condition (Appendix Fig S4A–C). In the TMD, pH-3 state 2 also exhibited a tilted conformation of the upper M2 helices, outward towards the complementary subunit and away from the channel pore relative to state 1 (Fig 4A–C). Whereas the upper pore in state-1 models was almost indistinguishable from that of the resting X-ray structure (Fig 4, Appendix Fig S4A–C), in pH-3 state 2 it transitioned substantially towards the open X-ray state (Fig 4B). Static pore profiles [39] revealed expansion of pH-3 state 2 at channel-facing residues S230–I240 (S6’–I16’) (Fig 4D).
The open X-ray structure was initially even more expanded: MD simulations of that state consistently converged to a more contracted pore at and above S6’; indeed, some open-state replicates sampled profiles overlapping pH-3 state 2 (Fig 4D), while remaining hydrated at the I9’ hydrophobic gate (Fig EV4B). In contrast, simulations of state-1 cryo-EM and resting X-ray structures did not substantially contract in the upper pore (Appendix Fig S4D–H). Thus, minority classes indicated the presence of alternative functional states consistent with activating transitions at low pH.

Discussion

Structures of GLIC in this work represent the first reported by cryo-EM, to our knowledge, covering multiple pH conditions and revealing electrostatic interactions at key subunit interfaces which are further substantiated by microsecond-scale MD simulations. Our data support a multi-step model for proton activation, in which closed states are characterized by a relatively flexible expanded ECD and a contracted upper pore (Fig 5A). Protonation of both ECD (E35) and TMD (E243) glutamates relieves charge interactions associated with the resting state, enabling sidechain remodeling particularly in the β1–β2 and M2–M3 loops, without necessarily altering the backbone fold (Fig 5B). Further rearrangements of the backbone are proposed to retain protonated sidechain arrangements by contracting the ECD and expanding the TMD pore, as indicated both by a minority class in our low-pH cryo-EM data (Fig 4), and by comparisons with apparent open X-ray structures (Fig 5C).
Direct involvement of extracellular loops β1–β2 and F in proton sensing proved consistent with several recent predictions. Mutations at β1–β2 residue E35 were among the most impactful of any acidic residues in previous scanning experiments [26]. Moreover, past spectroscopic studies showed the pH of receptor activation recapitulates the individual pKa of this residue, implicating it as the key proton sensor [12]. In contrast, mutations at K33 have not been shown to dramatically influence channel function; indeed, previous crosslinking with the M2–M3 loop showed this position can either preserve or inhibit proton activation [28], suggesting the improved definition we observed for this sidechain at low pH was more a byproduct of local remodeling than a determinant of gating. At E35’s closest contact, loop-F residue T158, chemical labeling has been shown to reversibly inhibit activation [12], supporting a role in channel function. Interestingly, loop F adopted a different conformation in our structures at pH 5 compared to pH 7 or pH 3 (Fig 2B–C), suggesting this region samples a range of conformations; indeed, previous spin-labeling studies indicated this position, along with several neighbors on the β8 strand, to be highly dynamic [40]. Although its broader role in pentameric channel gating remains controversial, loop F has often been characterized as an unstructured motif that undergoes substantial rearrangement during ligand binding [41], echoing the mechanism proposed here for GLIC.
Transmembrane residues E243 and K248 have been similarly implicated in channel function, albeit secondary to E35 in proton sensing. Residue E243 on the upper M2 helix is exposed to solvent, and has been predicted to protonate at low pH [14], [38]. Previous studies have shown some mutations at this position to be silent, while others dramatically alter pH sensitivity [12], [26], [37], [42], suggesting its involvement in state-dependent interactions is complex. Interestingly, E243 has also been shown to mediate interactions with allosteric modulators via a cavity at the intersubunit interface [16], indicating a role for this residue in agonist sensitivity and/or coupling. At K248, cysteine substitution was previously shown to increase proton sensitivity [28], consistent with a weakening of charge interactions specific to the resting state (Fig 5). Past simulations based on X-ray structures also showed K248 to prefer intrasubunit interactions at rest, versus intersubunit interactions in the open state [38], although E243/K248 interactions were particularly apparent in the present work.

Our reconstructions offer a structural rationale for the predominance of open and locally closed states in the crystallographic literature. The apparent resting state (pH 7) was characterized by relatively low reconstructed resolution (Fig 1B, Fig 2A) and flexibility in the ECD (Fig EV4A, Fig 5A), particularly at the domain interface and peripheral surfaces, potentially conferring entropic favorability. Crystallization enforces conformational homogeneity, and may select for rigidified states particularly at crystal-contact surfaces; according to the model above (Fig 5), such conditions could bias towards a more uniform open state. Interestingly, our simulations suggested the apparent open pore of the X-ray structure may not persist outside the crystal, potentially sampling more contracted conformations similar to pH-3 state 2 (Fig 4) while remaining generally hydrated (Fig EV4). Conversely, cryo-EM could be expected to reveal favored but flexible states (Fig 2, Fig 4), with the caveat that there might instead be a bias towards higher-resolution states. A heterogeneous mixture of closed states is notably consistent with previous atomic force microscopy studies in GLIC [36]. Whereas loose packing of the ECD core has been proposed as a gating strategy specific to eukaryotic members of this channel family [43], our data indicate an expanded, flexible ECD may also be important to earlier evolutionary branches.

Multiple GLIC structures reported in this work were characterized by closed pores, including states consistent with either deprotonated or protonated conditions. It is theoretically possible that electrostatic conditions might be modified in cryo-EM by interaction with the glow-discharged grid or air-water interface, masking effects of protonation. However, we consistently noted subtle shifts in stability and conformation, indicating that local effects of protonation were reflected in the major resolved class. Indeed, improved resolution of several acidic residues at low pH appeared consistent with protonation, given the tendency of anionic sidechains to resolve poorly by cryo-EM [44].
Notably, the protonated closed state proposed here (Fig 5B) differs from previously reported locally closed and lipid-modulated forms, which have been captured for multiple GLIC variants at low pH [27], [29]–[31]; the ECD in these structures is generally indistinguishable from that of the open state, suggesting the corresponding variations or modulators decouple extracellular transitions from pore opening [7], [37]. In contrast, the minority class at pH 3 (state 2) approached open-state properties in both domains, including a contracted and untwisted ECD (Fig EV5) and a partly expanded pore (Fig 4). With a resting-like backbone configuration, but sidechains consistent with proton activation, the low-pH cryo-EM (state-1) structure may correspond to a pre-open state on the opening pathway [37], [45], [46]. The predominance of this state implies a submaximal open probability even at pH 3. Due in part to its low conductance in single-channel recordings [9], the open probability of GLIC is not well established; however, other family members, including some subtypes of nicotinic acetylcholine and GABAA receptors [47], [48], are known to flicker between conductance states even at high agonist concentrations, consistent with a large population of closed channels. An intriguing alternative is that this structure corresponds to a desensitized state, which would be expected to predominate at pH 3 subsequent to channel opening [35].
However, desensitized states in this family are generally thought to transition through an open state upon ligand dissociation, before returning to rest; aside from sidechain reorientation, no structural rearrangements are immediately obvious that would prevent transition directly to the resting state (Fig 2). Indeed, none of our cryo-EM models resembled desensitized structures of other pentameric channels, thought to retain an expanded upper TMD [27], but block conduction at a secondary, intracellular gate [7]. Although proton activation appears to be a particular adaptation in GLIC, remodeling at the subunit/domain interface mirrors putative gating mechanisms in several of its ligand-activated relatives (Appendix Fig S5). In particular, protonation of E35 and E243 is proposed to release charge interactions in the β1–β2 loop and upper M2 helix, enabling remodeling in loop F and the M2–M3 loop (Fig 2, Fig 3, Fig 5B). Further rearrangement to the open state contracts both the β1–β2/M2 and F/M2–M3 clefts (Fig 5C, Appendix Fig S5A). The same pattern is evident in agonist-bound versus apo structures of ELIC, GluCl, glycine and nicotinic receptors (Appendix Fig S5B–E) [49]–[54], and in open/desensitized versus inhibitor-bound structures of DeCLIC and GABAA receptors (Appendix Fig S5F–G) [55], [56]. A noted exception is the 5-HT3A receptor, in which loop F instead translocates outward and the M2–M3 loop inward (Appendix Fig S5H), suggesting that apparent open states reported for 5-HT3A may sample a divergent mechanism of gating [57]–[59]. The subtle dynamics of allosteric signal transduction in pentameric ligand-gated ion channels, and their sensitivity to drug modulation, have driven substantial interest in characterizing endpoint and intermediate structures along the gating pathway.
Our data substantiate a protonated closed state, accompanied by a minority population with an expanded pore, and spotlight intrinsic challenges in capturing flexible conformations. We further offer a rationale for proton-stimulated sidechain remodeling of multiple residues at key interfaces, with apparent parallels in other family members. Dissection of the gating landscape of a ligand-gated ion channel thus illuminates both insights and limitations of GLIC as a model system in this family, and supports a mechanistic model in which entropy favors a flexible, expanded ECD, with agonists stabilizing rearrangements at the subunit/domain interface.

Materials and Methods

GLIC expression and purification

Expression and purification of GLIC-MBP was adapted from protocols published by Nury and colleagues [14]. Briefly, C43(DE3) E. coli transformed with GLIC-MBP in pET-20b were cultured overnight at 37 °C. Cells were inoculated 1:50 into 2xYT media with 100 μg/mL ampicillin, grown at 37 °C to OD600 = 0.7, induced with 100 μM isopropyl-β-D-1-thiogalactopyranoside, and shaken overnight at 20 °C. Membranes were harvested from cell pellets by sonication and ultracentrifugation in buffer A (300 mM NaCl, 20 mM Tris-HCl pH 7.4) supplemented with 1 mg/mL lysozyme, 20 μg/mL DNase I, 5 mM MgCl2, and protease inhibitors, then frozen or immediately solubilized in 2% n-dodecyl-β-D-maltoside (DDM).
Fusion proteins were purified in batch by amylose affinity (NEB), eluting in buffer B (buffer A with 0.02% DDM) with 2–20 mM maltose, then further purified by size exclusion chromatography in buffer B. After overnight thrombin digestion, GLIC was isolated from its fusion partner by size exclusion, and concentrated to 3–5 mg/mL by centrifugation.

Cryo-EM sample preparation and data acquisition

For freezing, Quantifoil 1.2/1.3 Cu 300 mesh grids (Quantifoil Micro Tools) were glow-discharged in methanol vapor prior to sample application. A 3 μL sample was applied to each grid, which was then blotted for 1.5 s and plunge-frozen into liquid ethane using an FEI Vitrobot Mark IV. Micrographs were collected on an FEI Titan Krios 300 kV microscope with a post energy filter Gatan K2-Summit direct detector camera. Movies were collected at nominal 165,000x magnification, equivalent to a pixel spacing of 0.82 Å. A total dose of 40.8 e−/Å² was used to collect 40 frames over 6 s, using a nominal defocus range covering -2.0 to -3.8 µm.

Image processing

Motion correction was carried out with MotionCor2 [60]. All subsequent processing was performed through the RELION 3.1 pipeline [61]. Defocus was estimated from the motion corrected micrographs using CtfFind4 [62]. Following manual picking, initial 2D classification was performed to generate references for autopicking. Particles were extracted after autopicking, binned and aligned to a 15 Å density generated from the GLIC crystal structure (PDB ID: 4HFI [17]) by 3D auto-refinement.
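As a quick consistency check on the acquisition parameters reported above (0.82 Å pixel spacing; 40.8 e−/Å² over 40 frames in 6 s), the per-frame dose, the dose rate and the Nyquist limit follow from simple arithmetic. The sketch below is illustrative only; the function and key names are our own and do not belong to any processing package.

```python
def acquisition_summary(total_dose, n_frames, exposure_s, pixel_size):
    """Derive simple quantities from reported acquisition parameters.

    total_dose : accumulated dose per movie (e-/A^2)
    n_frames   : number of frames per movie
    exposure_s : total exposure time (s)
    pixel_size : calibrated pixel spacing (A)
    """
    return {
        "dose_per_frame": total_dose / n_frames,   # e-/A^2 per frame
        "dose_rate": total_dose / exposure_s,      # e-/A^2 per second
        "nyquist_limit": 2.0 * pixel_size,         # best resolvable spacing (A)
    }


# Values taken from the data acquisition description.
summary = acquisition_summary(total_dose=40.8, n_frames=40,
                              exposure_s=6.0, pixel_size=0.82)
```

With these inputs the dose is about 1.02 e−/Å² per frame, delivered at about 6.8 e−/Å² per second, and the unbinned Nyquist limit is 1.64 Å, comfortably beyond the resolutions reported for the reconstructions.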
The acquired alignment parameters were used to identify and remove aberrant particles and noise through multiple rounds of pre-aligned 2D- and 3D-classification. The pruned set of particles was then refined, using the initially obtained reconstruction as reference. Per-particle CTF parameters were estimated from the resulting reconstruction using RELION 3.1. Global beam-tilt was estimated from the micrographs and correction applied. Micelle density was eventually subtracted and the final 3D auto-refinement was performed using a soft mask covering the protein, followed by post-processing, utilizing the same mask. Local resolution was estimated using the RELION implementation. Post-processed densities were improved using ResolveCryoEM, a part of the PHENIX package (release 1.18 and later) [63] based on maximum-likelihood density modification, previously used to improve maps in X-ray crystallography [64]. Densities from both RELION post-processing and ResolveCryoEM were used for building; figures show output from ResolveCryoEM (Fig 2, Fig EV2). Densities for minority classes were obtained by systematic and extensive 3D-classification rounds in RELION 3.1, with iterative modifications to parameters including angular search, T parameter, and class number.

Model building

Models were built starting from a template using an X-ray structure determined at pH 7 (PDB ID: 4NPQ [33], chain A), fitted to each reconstructed density.
PHENIX 1.18.2-3874 [63] real-space refinement was used to refine this model, imposing 5-fold symmetry through NCS restraints detected from the reconstructed cryo-EM map. The model was incrementally adjusted in COOT 0.8.9.1 EL [65] and re-refined until conventional quality metrics were optimized in agreement with the reconstruction. Model statistics are summarized in Table 1. Model alignments were performed using the match function in UCSF Chimera [66] on Cα atoms, excluding extracellular loops, for residues 17–192 (ECD) or 196–314 (TMD).

MD simulations

Manually built cryo-EM structures, as well as previously published X-ray structures (resting, PDB ID: 4NPQ [33]; open, PDB ID: 4HFI [17]), were used as starting models for MD simulations. The Amber99sb-ILDN force field [67] was used to describe protein interactions. Each protein was embedded in a bilayer of 520 Berger [68] 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine lipids. Each system was solvated in a 14 × 14 × 15 nm³ box using the TIP3P water model [69], and NaCl was added to bring the system to neutral charge and an ionic strength of 150 mM. All simulations were performed with GROMACS 2019.3 [70]. Systems were energy-minimized using the steepest descent algorithm, then relaxed for 100 ps in the NVT ensemble at 300 K using the velocity rescaling thermostat [71]. Bond lengths were constrained [72], particle mesh Ewald long-range electrostatics used [73], and virtual sites for hydrogen atoms implemented, enabling a time step of 5 fs. Heavy atoms of the protein were restrained during relaxation, followed by another 45 ns of NPT relaxation at 1 bar using Parrinello-Rahman pressure coupling [74] and gradually releasing the restraints. Finally, the system was relaxed with all unresolvable residues unrestrained for an additional 150 ns. For each relaxed system, four replicates of 1 μs unrestrained simulations were generated.
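The simulation protocol above implies a few numbers that are easy to verify: integration steps per replicate, frames available when sampling every 10 ns (the stride used in the analysis), total relaxation time, and a rough ceiling on the number of NaCl pairs at 150 mM. The sketch below is back-of-envelope arithmetic under stated assumptions, not a reproduction of the actual setup. In particular, the ion estimate treats the whole 14 × 14 × 15 nm³ box as solvent, so it is only an upper bound (protein and lipids occupy part of the volume, and neutralizing ions are not counted). All function and variable names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def n_steps(sim_time_ns, dt_fs):
    """Number of integration steps for a run of `sim_time_ns` at time step `dt_fs`."""
    return sim_time_ns * 1_000_000 // dt_fs  # 1 ns = 1,000,000 fs


def max_ion_pairs(box_nm, conc_molar):
    """Upper bound on salt pairs if the entire box volume were solvent."""
    x, y, z = box_nm
    volume_litres = x * y * z * 1e-24  # 1 nm^3 = 1e-24 L
    return conc_molar * AVOGADRO * volume_litres


steps_per_replicate = n_steps(sim_time_ns=1000, dt_fs=5)   # 1 us at 5 fs
frames_per_condition = 4 * (1000 // 10)                    # 4 replicates, 10 ns stride
relaxation_ns = 0.1 + 45 + 150                             # NVT + NPT + final relaxation
ion_pairs_upper_bound = max_ion_pairs((14, 14, 15), 0.150)
```

Each 1 μs replicate therefore corresponds to 2 × 10⁸ steps, the 10 ns stride yields the 400 frames per condition used in the analysis, and the relaxation stages sum to roughly 195 ns on our reading of the protocol. The salt estimate comes out at roughly 265 pairs before accounting for excluded volume.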
Analyses were performed using VMD [75], CHAP [39], and MDTraj [76]. Time-dependent RMSDs were calculated for Cα atoms in generally resolved regions of the ECD (residues 15–48, 66–192) or TMD (residues 197–313). The number of sodium ions around E35 was quantified within a distance of 5 Å, using simulation frames sampled every 10 ns (400 total frames from 4 simulations in each condition), as described in Fig 3. PC analysis of the M2–M3 loop was performed on Cα atoms of residues E243–P250 of five superposed static models (three cryo-EM structures, resting and open X-ray structures), treating each subunit separately. The simulations were then projected onto PC1 (36% of the variance) versus PC2 (26% of the variance), and were plotted using kernel density estimation. Representative motions for PC1 and PC2 were visualized as sequences of snapshots from blue (negative values) to purple (positive values). ECD radius and domain twist were quantified as in previous work [38]. ECD radius was determined by the average distance from the Cα-atom center-of-mass (COM) of each subunit ECD to that of the full ECD, projected onto a plane perpendicular to the channel axis. Domain twist was determined by the average dihedral angle defined by COM coordinates of 1) a single subunit-ECD, 2) the full ECD, 3) the full TMD, and 4) the same single-subunit TMD.
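The two geometric descriptors defined above can be sketched in a few lines. The following is a minimal illustration, not the analysis code used in this work. It assumes that the channel axis lies along z, that Cα atoms have equal mass (so the COM reduces to a centroid), and that coordinates are supplied as one list of (x, y, z) tuples per subunit. All function names and the toy pentamer are hypothetical. The sign of the dihedral depends on the convention chosen, so only its magnitude is compared here.

```python
import math


def centroid(coords):
    """Centroid of a list of (x, y, z) points; equals the COM for equal masses."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def _sub(a, b):
    return tuple(a[i] - b[i] for i in range(3))


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def ecd_radius(ecd_subunits):
    """Mean in-plane (xy) distance from each subunit ECD centroid to the full ECD centroid."""
    full = centroid([atom for sub in ecd_subunits for atom in sub])
    dists = []
    for sub in ecd_subunits:
        c = centroid(sub)
        dists.append(math.hypot(c[0] - full[0], c[1] - full[1]))
    return sum(dists) / len(dists)


def domain_twist(ecd_subunits, tmd_subunits):
    """Mean dihedral: subunit ECD, full ECD, full TMD, same-subunit TMD centroids."""
    full_ecd = centroid([atom for sub in ecd_subunits for atom in sub])
    full_tmd = centroid([atom for sub in tmd_subunits for atom in sub])
    angles = [dihedral_deg(centroid(e), full_ecd, full_tmd, centroid(t))
              for e, t in zip(ecd_subunits, tmd_subunits)]
    return sum(angles) / len(angles)


def toy_pentamer(radius, z, offset_deg):
    """Five single-point 'subunits' on a ring: a stand-in for real coordinates."""
    return [[(radius * math.cos(math.radians(72 * k + offset_deg)),
              radius * math.sin(math.radians(72 * k + offset_deg)), z)]
            for k in range(5)]


ecd = toy_pentamer(radius=25.0, z=20.0, offset_deg=0.0)
tmd = toy_pentamer(radius=15.0, z=-20.0, offset_deg=12.0)
radius = ecd_radius(ecd)
twist = domain_twist(ecd, tmd)
```

For the toy pentamer, the radius recovers the 25 Å ring radius and the twist has a magnitude of 12°, matching the rotational offset built into the model. Real structures would supply Cα coordinates for the residue ranges listed above, and the simple arithmetic mean of angles is adequate only while the per-subunit twists stay far from the ±180° wraparound.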
Data Availability

Three-dimensional cryo-EM density maps of the pentameric ligand-gated ion channel GLIC in detergent micelles have been deposited in the Electron Microscopy Data Bank under accession numbers EMD-11202 (pH 7), EMD-11208 (pH 5) and EMD-11209 (pH 3), respectively. Each deposition includes the cryo-EM sharpened and unsharpened maps, both half-maps and the mask used for final FSC calculation. Coordinates of all models have been deposited in the Protein Data Bank. The accession numbers for the three GLIC structures are 6ZGD (pH 7), 6ZGJ (pH 5) and 6ZGK (pH 3). Full input data, parameters, settings, commands and trajectory subsets from MD simulations are archived at Zenodo.org under DOI: 10.5281/zenodo.4320552. Densities for minority classes are available upon request.

Acknowledgments

The authors would like to thank the Swedish Cryo-EM National Facility staff, in particular Julian Conrad, José Miguel de la Rosa Trevin and Stefan Fleischmann from Stockholm and Michael Hall from Umeå, for kind assistance with data collection, modeling and supervision. This work was supported by grants from the Knut and Alice Wallenberg Foundation, the Swedish Research Council (2017-04641, 2018-06479, 2019-02433), the Swedish e-Science Research Centre, and the BioExcel Center of Excellence (EU 823830). UR was supported by a scholarship from the Sven and Lilly Lawski Foundation. The cryo-EM data were collected at the Swedish national cryo-EM facility funded by the Knut and Alice Wallenberg Foundation, Erling Persson and Kempe Foundations.
Computational resources were provided by the Swedish National Infrastructure for Computing. Author Contributions Conceptualisation: RJH, EL; methodology: UR, YZ, BOF, RJH; software: UR, YZ, BOF; validation: UR, YZ, BOF, MC, LY; formal analysis: UR, YZ; investigation: UR, YZ, RJH; resources: MC, RJH, EL; data curation: UR, YZ, RJH, EL; original draft: UR, YZ, RJH; review & editing: UR, YZ, BOF, MC, LY, RJH, EL; visualization: UR, YZ, RJH; supervision: RJH, EL; project administration: MC, RJH; funding acquisition: EL. Conflict of interest The authors declare that they have no conflict of interest. 16 .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425171doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425171 http://creativecommons.org/licenses/by/4.0/ References [1] A. Tasneem, L. M. Iyer, E. Jakobsson, and L. Aravind, “Identification of the prokaryotic ligand-gated ion channels and their implications for the mechanisms and origins of animal Cys-loop ion channels,” Genome Biol., vol. 6, p. R4, Dec. 2004, doi: 10.1186/gb-2004-6-1-r4. [2] Á. Nemecz, M. S. Prevost, A. Menny, and P.-J. Corringer, “Emerging Molecular Mechanisms of Signal Transduction in Pentameric Ligand-Gated Ion Channels,” Neuron, vol. 90, no. 3, pp. 452–470, May 2016, doi: 10.1016/j.neuron.2016.03.032. [3] M. Jaiteh, A. Taly, and J. Hénin, “Evolution of Pentameric Ligand-Gated Ion Channels: Pro-Loop Receptors,” PLoS ONE, vol. 11, no. 3, Mar. 2016, doi: 10.1371/journal.pone.0151934. [4] T. Lynagh and S. A. Pless, “Principles of agonist recognition in Cys-loop receptors,” Front. Physiol., vol. 5, p. 160, 2014, doi: 10.3389/fphys.2014.00160. [5] Á. Nemecz, M. S. Prevost, A. Menny, and P.-J. 
Corringer, “Emerging Molecular Mechanisms of Signal Transduction in Pentameric Ligand-Gated Ion Channels,” Neuron, vol. 90, no. 3, pp. 452–470, May 2016, doi: 10.1016/j.neuron.2016.03.032. [6] C. J. B. daCosta and J. E. Baenziger, “Gating of Pentameric Ligand-Gated Ion Channels: Structural Insights and Ambiguities,” Structure, vol. 21, no. 8, pp. 1271–1283, Aug. 2013, doi: 10.1016/j.str.2013.06.019. [7] M. Gielen and P.-J. Corringer, “The dual-gate model for pentameric ligand-gated ion channels activation and desensitization,” J. Physiol., vol. 596, no. 10, pp. 1873–1902, 15 2018, doi: 10.1113/JP275100. [8] P.-J. Corringer et al., “Atomic structure and dynamics of pentameric ligand-gated ion channels: new insight from bacterial homologues,” J. Physiol., vol. 588, no. 4, pp. 565–572, 2010, doi: 10.1113/jphysiol.2009.183160. [9] N. Bocquet et al., “A prokaryotic proton-gated ion channel from the nicotinic acetylcholine receptor family,” Nature, vol. 445, no. 7123, p. 116, Jan. 2007, doi: 10.1038/nature05371. [10] R. J. C. Hilf and R. Dutzler, “Structure of a potentially open state of a proton-activated pentameric ligand-gated ion channel,” Nature, vol. 457, no. 7225, pp. 115–118, Jan. 2009, doi: 10.1038/nature07461. [11] N. Bocquet et al., “X-ray structure of a pentameric ligand-gated ion channel in an apparently open conformation,” Nature, vol. 457, no. 7225, pp. 111–114, Jan. 2009, doi: 10.1038/nature07462. [12] H. Hu et al., “Electrostatics, proton sensor, and networks governing the gating transition in GLIC, a proton-gated pentameric ion channel,” Proc. Natl. Acad. Sci. U. S. A., vol. 115, no. 52, pp. E12172–E12181, Dec. 2018, doi: 10.1073/pnas.1813378116. [13] R. J. C. Hilf, C. Bertozzi, I. Zimmermann, A. Reiter, D. Trauner, and R. Dutzler, “Structural basis of open channel block in a prokaryotic pentameric ligand-gated ion channel,” Nat. Struct. Mol. Biol., vol. 17, no. 11, pp. 1330–1336, Nov. 2010, doi: 10.1038/nsmb.1933. [14] H. 
Nury et al., “X-ray structures of general anaesthetics bound to a pentameric ligand-gated ion channel,” Nature, vol. 469, no. 7330, pp. 428–431, Jan. 2011, doi: 10.1038/nature09647. 17 .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425171doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425171 http://creativecommons.org/licenses/by/4.0/ [15] J. Pan et al., “Structure of the pentameric ligand-gated ion channel GLIC bound with anesthetic ketamine,” Struct. Lond. Engl. 1993, vol. 20, no. 9, pp. 1463–1469, Sep. 2012, doi: 10.1016/j.str.2012.08.009. [16] L. Sauguet et al., “Structural basis for potentiation by alcohols and anaesthetics in a ligand-gated ion channel,” Nat. Commun., vol. 4, p. ncomms2682, Apr. 2013, doi: 10.1038/ncomms2682. [17] L. Sauguet et al., “Structural basis for ion permeation mechanism in pentameric ligand-gated ion channels,” EMBO J., vol. 32, no. 5, pp. 728–741, Mar. 2013, doi: 10.1038/emboj.2013.17. [18] Z. Fourati, L. Sauguet, and M. Delarue, “Genuine open form of the pentameric ligand-gated ion channel GLIC,” Acta Crystallogr. D Biol. Crystallogr., vol. 71, no. 3, pp. 454–460, Mar. 2015, doi: 10.1107/S1399004714026698. [19] L. Sauguet, Z. Fourati, T. Prangé, M. Delarue, and N. Colloc’h, “Structural Basis for Xenon Inhibition in a Cationic Pentameric Ligand-Gated Ion Channel,” PLOS ONE, vol. 11, no. 2, p. e0149795, Feb. 2016, doi: 10.1371/journal.pone.0149795. [20] B. Laurent, S. Murail, A. Shahsavar, L. Sauguet, M. Delarue, and M. Baaden, “Sites of Anesthetic Inhibitory Action on a Cationic Ligand-Gated Ion Channel,” Structure, vol. 24, no. 4, pp. 595–605, Apr. 2016, doi: 10.1016/j.str.2016.02.014. [21] Z. 
Fourati et al., “Structural Basis for a Bimodal Allosteric Mechanism of General Anesthetic Modulation in Pentameric Ligand-Gated Ion Channels,” Cell Rep., vol. 23, no. 4, pp. 993–1004, Apr. 2018, doi: 10.1016/j.celrep.2018.03.108. [22] Z. Fourati, L. Sauguet, and M. Delarue, “Structural evidence for the binding of monocarboxylates and dicarboxylates at pharmacologically relevant extracellular sites of a pentameric ligand-gated ion channel,” Acta Crystallogr. Sect. Struct. Biol., vol. 76, no. 7, pp. 668–675, Jul. 2020, doi: 10.1107/S205979832000772X. [23] H. Nury et al., “One-microsecond molecular dynamics simulation of channel gating in a nicotinic receptor homologue,” Proc. Natl. Acad. Sci., vol. 107, no. 14, pp. 6275–6280, Apr. 2010, doi: 10.1073/pnas.1001832107. [24] D. Mowrey, Q. Chen, Y. Liang, J. Liang, Y. Xu, and P. Tang, “Signal Transduction Pathways in the Pentameric Ligand-Gated Ion Channels,” PLOS ONE, vol. 8, no. 5, p. e64326, maj 2013, doi: 10.1371/journal.pone.0064326. [25] G. Gonzalez-Gutierrez, Y. Wang, G. D. Cymes, E. Tajkhorshid, and C. Grosman, “Chasing the open-state structure of pentameric ligand-gated ion channels,” J. Gen. Physiol., p. jgp.201711803, Oct. 2017, doi: 10.1085/jgp.201711803. [26] Á. Nemecz, H. Hu, Z. Fourati, C. Van Renterghem, M. Delarue, and P.-J. Corringer, “Full mutational mapping of titratable residues helps to identify proton-sensors involved in the control of channel gating in the Gloeobacter violaceus pentameric ligand-gated ion channel,” PLoS Biol., vol. 15, no. 12, Dec. 2017, doi: 10.1371/journal.pbio.2004470. [27] S. Basak, N. Schmandt, Y. Gicheru, and S. Chakrapani, “Crystal structure and dynamics of a lipid-induced potential desensitized-state of a pentameric ligand-gated channel,” eLife, vol. 6, 06 2017, doi: 10.7554/eLife.23886. [28] M. S. Prevost et al., “A locally closed conformation of a bacterial pentameric proton-gated ion channel,” Nat. Struct. Mol. Biol., vol. 19, no. 6, p. 
nsmb.2307, May 2012, doi: 10.1038/nsmb.2307.
[29] G. Gonzalez-Gutierrez, L. G. Cuello, S. K. Nair, and C. Grosman, “Gating of the proton-gated ion channel from Gloeobacter violaceus at pH 4 as revealed by X-ray crystallography,” Proc. Natl. Acad. Sci. U. S. A., vol. 110, no. 46, pp. 18716–18721, Nov. 2013, doi: 10.1073/pnas.1313156110.
[30] C. Bertozzi, I. Zimmermann, S. Engeler, R. J. C. Hilf, and R. Dutzler, “Signal Transduction at the Domain Interface of Prokaryotic Pentameric Ligand-Gated Ion Channels,” PLOS Biol., vol. 14, no. 3, p. e1002393, Mar. 2016, doi: 10.1371/journal.pbio.1002393.
[31] Z. Fourati et al., “Barbiturates Bind in the GLIC Ion Channel Pore and Cause Inhibition by Stabilizing a Closed State,” J. Biol. Chem., vol. 292, no. 5, pp. 1550–1558, Feb. 2017, doi: 10.1074/jbc.M116.766964.
[32] A. J. Thompson, H. A. Lester, and S. C. R. Lummis, “The structural basis of function in Cys-loop receptors,” Q. Rev. Biophys., vol. 43, no. 4, pp. 449–499, Nov. 2010, doi: 10.1017/S0033583510000168.
[33] L. Sauguet et al., “Crystal structures of a pentameric ligand-gated ion channel provide a mechanism for activation,” Proc. Natl. Acad. Sci. U. S. A., vol. 111, no. 3, pp. 966–971, Jan. 2014, doi: 10.1073/pnas.1314997111.
[34] A. Taly, J. Hénin, J.-P. Changeux, and M. Cecchini, “Allosteric regulation of pentameric ligand-gated ion channels,” Channels, vol. 8, no. 4, pp. 350–360, Jul. 2014, doi: 10.4161/chan.29444.
[35] P. Velisetty and S. Chakrapani, “Desensitization Mechanism in Prokaryotic Ligand-gated Ion Channel,” J. Biol. Chem., vol.
287, no. 22, pp. 18467–18477, May 2012, doi: 10.1074/jbc.M112.348045.
[36] Y. Ruan et al., “Structural titration of receptor ion channel GLIC gating by HS-AFM,” Proc. Natl. Acad. Sci. U. S. A., vol. 115, no. 41, pp. 10333–10338, Oct. 2018, doi: 10.1073/pnas.1805621115.
[37] A. Menny et al., “Identification of a pre-active conformation of a pentameric channel receptor,” eLife, vol. 6, doi: 10.7554/eLife.23955.
[38] B. Lev et al., “String method solution of the gating pathways for a pentameric ligand-gated ion channel,” Proc. Natl. Acad. Sci., vol. 114, no. 21, pp. E4158–E4167, May 2017, doi: 10.1073/pnas.1617567114.
[39] G. Klesse, S. Rao, M. S. P. Sansom, and S. J. Tucker, “CHAP: A Versatile Tool for the Structural and Functional Annotation of Ion Channel Pores,” J. Mol. Biol., vol. 431, no. 17, pp. 3353–3365, Aug. 2019, doi: 10.1016/j.jmb.2019.06.003.
[40] P. Velisetty, S. V. Chalamalasetti, and S. Chakrapani, “Structural basis for allosteric coupling at the membrane-protein interface in GLIC,” J. Biol. Chem., p. jbc.M113.523050, Dec. 2013, doi: 10.1074/jbc.M113.523050.
[41] M. Nys, D. Kesters, and C. Ulens, “Structural insights into Cys-loop receptor function and ligand recognition,” Biochem. Pharmacol., vol. 86, no. 8, pp. 1042–1053, Oct. 2013, doi: 10.1016/j.bcp.2013.07.001.
[42] R. J. Howard et al., “Structural basis for alcohol modulation of a pentameric ligand-gated ion channel,” Proc. Natl. Acad. Sci. U. S. A., vol. 108, no. 29, pp. 12149–12154, Jul. 2011, doi: 10.1073/pnas.1104480108.
[43] C. D. Dellisanti, S. M. Hanson, L. Chen, and C. Czajkowski, “Packing of the extracellular domain hydrophobic core has evolved to facilitate pentameric ligand-gated ion channel function,” J. Biol. Chem., vol. 286, no. 5, pp. 3658–3670, Feb. 2011, doi: 10.1074/jbc.M110.156851.
[44] C. F. Hryc et al., “Accurate model annotation of a near-atomic resolution cryo-EM map,” Proc. Natl. Acad. Sci., Mar. 2017, doi: 10.1073/pnas.1621152114.
[45] R. Lape, D. Colquhoun, and L. G. Sivilotti, “On the nature of partial agonism in the nicotinic receptor superfamily,” Nature, vol. 454, no. 7205, pp. 722–727, Aug. 2008, doi: 10.1038/nature07139.
[46] N. Mukhtasimova, W. Y. Lee, H.-L. Wang, and S. M. Sine, “Detection and trapping of intermediate states priming nicotinic receptor channel opening,” Nature, vol. 459, no. 7245, p. 451, May 2009, doi: 10.1038/nature07923.
[47] C. Carignano, E. P. Barila, and G. Spitzmaul, “Analysis of neuronal nicotinic acetylcholine receptor α4β2 activation at the single-channel level,” Biochim. Biophys. Acta, vol. 1858, no. 9, pp. 1964–1973, Sep. 2016, doi: 10.1016/j.bbamem.2016.05.019.
[48] A. L. Germann, S. R. Pierce, T. C. Senneff, A. B. Burbridge, J. H. Steinbach, and G. Akk, “Steady-state activation and modulation of the synaptic-type α1β2γ2L GABAA receptor by combinations of physiological and clinical ligands,” Physiol. Rep., vol. 7, no. 18, p. e14230, 2019, doi: 10.14814/phy2.14230.
[49] P. Kumar et al., “Cryo-EM structures of a lipid-sensitive pentameric ligand-gated ion channel embedded in a phosphatidylcholine-only bilayer,” Proc. Natl. Acad. Sci., vol. 117, no. 3, pp. 1788–1798, Jan. 2020, doi: 10.1073/pnas.1906823117.
[50] T. Althoff, R. E. Hibbs, S. Banerjee, and E. Gouaux, “X-ray structures of GluCl in apo states reveal a gating mechanism of Cys-loop receptors,” Nature, vol. 512, no. 7514, pp. 333–337, Aug. 2014, doi: 10.1038/nature13669.
[51] R. E. Hibbs and E.
Gouaux, “Principles of activation and permeation in an anion-selective Cys-loop receptor,” Nature, vol. 474, no. 7349, pp. 54–60, Jun. 2011, doi: 10.1038/nature10139.
[52] A. Kumar et al., “Mechanisms of activation and desensitization of full-length glycine receptor in lipid nanodiscs,” Nat. Commun., vol. 11, no. 1, p. 3752, Jul. 2020, doi: 10.1038/s41467-020-17364-5.
[53] M. M. Rahman et al., “Structure of the Native Muscle-type Nicotinic Receptor and Inhibition by Snake Venom Toxins,” Neuron, vol. 106, no. 6, pp. 952-962.e5, Jun. 2020, doi: 10.1016/j.neuron.2020.03.012.
[54] A. Gharpure et al., “Agonist Selectivity and Ion Permeation in the α3β4 Ganglionic Nicotinic Receptor,” Neuron, vol. 104, no. 3, pp. 501-511.e6, Nov. 2019, doi: 10.1016/j.neuron.2019.07.030.
[55] H. Hu, R. J. Howard, U. Bastolla, E. Lindahl, and M. Delarue, “Structural basis for allosteric transitions of a multidomain pentameric ligand-gated ion channel,” Proc. Natl. Acad. Sci., vol. 117, no. 24, pp. 13437–13446, Jun. 2020, doi: 10.1073/pnas.1922701117.
[56] J. J. Kim et al., “Shared structural mechanisms of general anaesthetics and benzodiazepines,” Nature, vol. 585, no. 7824, pp. 303–308, Sep. 2020, doi: 10.1038/s41586-020-2654-5.
[57] S. Basak, Y. Gicheru, S. Rao, M. S. P. Sansom, and S. Chakrapani, “Cryo-EM reveals two distinct serotonin-bound conformations of full-length 5-HT3A receptor,” Nature, vol. 563, no. 7730, p. 270, Nov. 2018, doi: 10.1038/s41586-018-0660-7.
[58] L. Polovinkin et al., “Conformational transitions of the serotonin 5-HT3 receptor,” Nature, vol. 563, no. 7730, pp. 275–279, Nov. 2018, doi: 10.1038/s41586-018-0672-3.
[59] S. Basak et al., “Cryo-EM structure of 5-HT3A receptor in its resting conformation,” Nat. Commun., vol. 9, no. 1, Dec. 2018, doi:
10.1038/s41467-018-02997-4.
[60] S. Q. Zheng, E. Palovcak, J.-P. Armache, K. A. Verba, Y. Cheng, and D. A. Agard, “MotionCor2 - anisotropic correction of beam-induced motion for improved cryo-electron microscopy,” Nat. Methods, vol. 14, no. 4, pp. 331–332, Apr. 2017, doi: 10.1038/nmeth.4193.
[61] J. Zivanov et al., “New tools for automated high-resolution cryo-EM structure determination in RELION-3,” eLife, vol. 7, p. e42166, Nov. 2018, doi: 10.7554/eLife.42166.
[62] A. Rohou and N. Grigorieff, “CTFFIND4: Fast and accurate defocus estimation from electron micrographs,” J. Struct. Biol., vol. 192, no. 2, pp. 216–221, Nov. 2015, doi: 10.1016/j.jsb.2015.08.008.
[63] P. D. Adams et al., “PHENIX: a comprehensive Python-based system for macromolecular structure solution,” Acta Crystallogr. D Biol. Crystallogr., vol. 66, no. 2, pp. 213–221, Feb. 2010, doi: 10.1107/S0907444909052925.
[64] T. C. Terwilliger, S. J. Ludtke, R. J. Read, P. D. Adams, and P. V. Afonine, “Improvement of cryo-EM maps by density modification,” Nat. Methods, vol. 17, no. 9, Art. no. 9, Sep. 2020, doi: 10.1038/s41592-020-0914-9.
[65] P. Emsley and K. Cowtan, “Coot: model-building tools for molecular graphics,” Acta Crystallogr. D Biol. Crystallogr., vol. 60, no. 12, pp. 2126–2132, Dec. 2004, doi: 10.1107/S0907444904019158.
[66] E. F. Pettersen et al., “UCSF Chimera--a visualization system for exploratory research and analysis,” J. Comput. Chem., vol. 25, no. 13, pp. 1605–1612, Oct. 2004, doi: 10.1002/jcc.20084.
[67] K. Lindorff-Larsen et al., “Improved side-chain torsion potentials for the Amber ff99SB protein force field,” Proteins, vol. 78, no. 8, pp. 1950–1958, Jun. 2010, doi: 10.1002/prot.22711.
[68] O. Berger, O. Edholm, and F.
Jähnig, “Molecular dynamics simulations of a fluid bilayer of dipalmitoylphosphatidylcholine at full hydration, constant pressure, and constant temperature,” Biophys. J., vol. 72, no. 5, pp. 2002–2013, May 1997.
[69] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, “Comparison of simple potential functions for simulating liquid water,” J. Chem. Phys., vol. 79, no. 2, pp. 926–935, Jul. 1983, doi: 10.1063/1.445869.
[70] M. J. Abraham et al., “GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers,” SoftwareX, vol. 1–2, pp. 19–25, Sep. 2015, doi: 10.1016/j.softx.2015.06.001.
[71] G. Bussi, D. Donadio, and M. Parrinello, “Canonical sampling through velocity rescaling,” J. Chem. Phys., vol. 126, no. 1, p. 014101, Jan. 2007, doi: 10.1063/1.2408420.
[72] B. Hess, “P-LINCS: A Parallel Linear Constraint Solver for Molecular Simulation,” J. Chem. Theory Comput., vol. 4, no. 1, pp. 116–122, Jan. 2008, doi: 10.1021/ct700200b.
[73] U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen, “A smooth particle mesh Ewald method,” J. Chem. Phys., vol. 103, no. 19, pp. 8577–8593, Nov. 1995, doi: 10.1063/1.470117.
[74] M. Parrinello and A. Rahman, “Crystal Structure and Pair Potentials: A Molecular-Dynamics Study,” Phys. Rev. Lett., vol. 45, no. 14, pp. 1196–1199, Oct. 1980, doi: 10.1103/PhysRevLett.45.1196.
[75] W. Humphrey, A. Dalke, and K. Schulten, “VMD: Visual molecular dynamics,” J. Mol. Graph., vol. 14, no. 1, pp. 33–38, Feb.
1996, doi: 10.1016/0263-7855(96)00018-5.
[76] R. T. McGibbon et al., “MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories,” Biophys. J., vol. 109, no. 8, pp. 1528–1532, Oct. 2015, doi: 10.1016/j.bpj.2015.08.015.

Figure Legends

Figure 1: Differential resolution of GLIC cryo-EM structures with varying pH. A. Cartoon representations of GLIC, viewed from the membrane plane (top) or from the extracellular side (bottom). Pentameric rings represent the connected extracellular (ECD, light gray) and transmembrane (TMD, medium gray) domains, with the latter embedded in a lipid bilayer (gradient) and surrounding a membrane-spanning pore formed by the second helix from each subunit (M2, dark gray). B. Cryo-EM density for the majority class (state 1) at pH 7 to 4.1 Å overall resolution, viewed as in panel A from the membrane plane (top) or from the extracellular side (bottom). Density is colored by local resolution according to scale bar at far right, and contoured at both high (left) and low threshold (right) to reveal fine and coarse detail, respectively. C. Density viewed as in panel B for state 1 at pH 5, reconstructed to 3.4 Å overall resolution. D. Density as in panel B for state 1 at pH 3, reconstructed to 3.6 Å overall resolution.

Figure 2: Sidechain rearrangements at subunit interfaces in low-pH structures. A. Overlay of predominant (state-1) GLIC cryo-EM structures at pH 7 (blue), pH 5 (green), and pH 3 (lavender), aligned on the full pentamer.
Two adjacent subunits are viewed as ribbons from the channel pore, showing key motifs including the β1–β2 and Pro loops and M1–M4 helices from the principal subunit (P), and loop F from the complementary subunit (C). B. Zoom views of the upper gray-boxed region in panel A, showing cryo-EM densities (mesh at σ = 0.25) and sidechain atoms (sticks, colored by heteroatom) around the intersubunit ECD interface between a single principal β1–β2 loop and complementary loop F at each pH. As indicated by dotted circles, sidechains including β1–β2 residues K33 and E35 could not be definitively built at pH 7 (left) or pH 5 (center), but were better resolved at pH 3 (right), including a possible hydrogen bond between E35 and T158 (dashed line, 3.2 Å). C. Zoom views of the black-boxed region in panel A, showing key sidechains (sticks, colored by heteroatom) at the domain interface between one principal β1–β2, pre-M1, and M2–M3 region, and the complementary loop-F and M2 region. Dotted circles indicate sidechains that could not be definitively built in the corresponding conditions; dashed lines indicate possible hydrogen bonds implicated here in proton-stimulated conformational cycling. Residues contributing to a conserved electrostatic network at the domain interface (D32, R192, Y197) are also shown. D. Zoom views of the lower gray-boxed region in panel A, showing cryo-EM densities (mesh) and sidechain atoms (sticks, colored by heteroatom) around the intersubunit TMD interface between principal and complementary M2–M3 regions at each pH.
A potential hydrogen bond between E243 and K248 at pH 7 (left, dashed line, 3.1 Å) is disrupted at pH 5 (center) and pH 3 (right), allowing K248 to reorient towards the subunit interface.

Figure 3: Remodeled electrostatic contacts revealed by molecular dynamics. A. Zoom views as in Fig 2B of the ECD interface between a single principal (P, right) β1–β2 loop and complementary (C, left) loop F (lavender ribbons) in representative snapshots from MD simulations of the pH-3 (state-1) cryo-EM structure, with sidechains modified to approximate resting (deprotonated, top) or activating (protonated, bottom) conditions. Depicted residues and proximal ions (sticks, colored by heteroatom) show deprotonated E35 in contact with Na+, while protonated E35 interacts with T158. B. Charge contacts between E35 and environmental Na+ ions in simulations under deprotonated (solid) but not protonated (striped) conditions of state-1 cryo-EM structures determined at pH 7 (blue), pH 5 (green), or pH 3 (lavender). Histograms represent median ± 95% confidence interval (CI) over all simulations in the corresponding condition. Horizontal bars represent median ± CI values for simulations of resting (gray) or open (black) X-ray structures. C. Histograms as in panel B showing intersubunit Cα-distances between E35 and T158, which decrease in protonated (striped) versus deprotonated (solid) conditions. D. Zoom views as in Fig 2D of the TMD interface between principal (P, right) and complementary (C, left) M2–M3 loops (lavender ribbons) in representative snapshots from simulations of the pH-3 (state-1) cryo-EM structure. Depicted residues (sticks, colored by heteroatom) show K248 oriented down towards E243 in deprotonated conditions (top), but out towards the subunit interface in protonated conditions (bottom). E.
Histograms as in panel B showing electrostatic contacts between E243 and K248, which decrease in pH-3 (lavender) versus pH-7 (blue) and pH-5 structures (green), and in protonated (striped) versus deprotonated (solid) simulation conditions. F. Principal component (PC) analysis of M2–M3 loop motions in simulations under deprotonated (top) or protonated conditions (bottom) of state-1 cryo-EM structures determined at pH 7 (blue), pH 5 (green), and pH 3 (lavender). For comparison, simulations of previous resting (gray) and open (black) X-ray structures are shown at right, and open-structure results are superimposed in each panel. Inset cartoons illustrate structural transitions associated with dominant PCs (blue–lavender from negative to positive values), representing flipping of residue K248 (PC1) and stretching of the M2–M3 loop (PC2).

Figure 4: Minority classes suggest alternative states. A. Overlay as in Figure 2A of state-1 (lavender) and state-2 (purple) GLIC cryo-EM structures, along with apparent resting (white, PDB ID: 4NPQ) and open (gray, PDB ID: 4HFI) X-ray structures, aligned on the full pentamer. Adjacent principal (P) and complementary (C) subunits are viewed as ribbons from the channel pore. B. Zoom views of the black-boxed region in panel A, showing key motifs at the domain interface between one principal β1–β2, pre-M1, and M2–M3 region, and the complementary loop-F and M2 region, for resting (white) and open (gray) X-ray structures overlaid with pH-3 cryo-EM state 1 (top, lavender) or state 2 (bottom, purple). C.
Zoom views as in panel B, showing cryo-EM densities (mesh) and backbone ribbons for pH-3 state 1 (top, lavender) or state 2 (bottom, purple). D. Pore profiles [39] representing Cα radii for pH-3 cryo-EM state-1 (lavender) and state-2 (purple) structures, open X-ray (black) structure, and quadruplicate 1-μs MD simulations of the open X-ray model (median, dashed black; 95% confidence interval, gray).

Figure 5: Protonation and activation in GLIC pH gating. A. Cartoon of the GLIC resting state, corresponding to a deprotonated closed conformation, as represented by the predominant cryo-EM structure at pH 7. Views are of the full protein (top) from the membrane plane, and of the ECD (middle) and TMD (bottom) from the extracellular side, showing key motifs at two opposing subunit interfaces including the principal β1–β2 (green) and M2–M3 loops (blue), complementary F (purple) and β5–β6 (dark gray) loops, and the remainder of the protein in light gray. By the model proposed here, under resting conditions the key acidic residue E35 (green circles) in the β1–β2 loop is deprotonated, and involved in transient interactions with environmental cations (e.g. Na+, black circles). Flexibility of the corresponding ECD is indicated by motion lines, associated with relatively low resolution by cryo-EM and high RMSD in MD simulations. In parallel, deprotonated E243 (light blue circles) in the M2 helix attracts K248 (dark blue circles) in the M2–M3 loop, maintaining a contracted upper pore. B.
Cartoon as in panel A, showing a protonated but still closed conformation, as represented by the predominant cryo-EM structure at pH 3. In the ECD, protonation of E35 releases environmental cations and enables it instead to form a stabilizing contact with the complementary subunit via T158 (purple circles) in loop F, associated with partial rigidification of the ECD. In the TMD, protonation of E243 releases K248, allowing it to orient outward/upward towards the subunit/domain interface. C. Cartoon as in panel A, showing the putative protonated open state, as represented by previous open X-ray structures. Key sidechains (E35, T158, E243, K248) are arranged similarly to the protonated closed state, accompanied by general contraction of the ECD including loop F, expansion of the upper TMD including the M2–M3 loop, and opening of the ion conduction pathway.

Table

Table 1: Cryo-EM data processing and model building statistics.
Data collection and processing    pH 7 data set     pH 5 data set     pH 3 data set
Microscope                        FEI Titan Krios   FEI Titan Krios   FEI Titan Krios
Magnification                     165,000           165,000           165,000
Voltage (kV)                      300               300               300
Electron exposure (e⁻/Å²)         ~50               ~50               ~50
Defocus range (μm)                2.0–3.8           2.0–3.8           2.0–3.8
Pixel size (Å)                    0.82              0.82              0.83
Symmetry imposed                  C5                C5                C5
Number of images                  ~5300             ~7000             ~6400
Particles picked                  ~700,000          ~1 million        ~690,000
Particles refined                 86,201            351,643           214,463

Refinement
Initial model used                4NPQ              4NPQ              4NPQ
Resolution (Å)                    4.1               3.4               3.6
FSC threshold                     0.143             0.143             0.143
Map sharpening B-factor           -278              -223              -225

Model composition
Non-hydrogen protein atoms        10,175            11,555            11,630
Protein residues                  1440              1540              1535
Ligands                           0                 0                 0
B-factor (Å²)                     57                20                34

RMSD
Bond lengths (Å)                  0.006             0.005             0.006
Bond angles (º)                   0.616             0.599             0.664

Validation
Molprobity score                  1.87              1.93              1.77
Clashscore                        10.73             9.66              6.12
Poor rotamers (%)                 0                 0                 0

Ramachandran plot
Favored (%)                       95.4              93.7              93.4
Allowed (%)                       4.6               6.3               6.6
Outliers (%)                      0                 0                 0

Expanded View Figure Legends

Figure EV1: Cryo-EM image-processing pipeline. A. Representative micrograph from grid screening on a Falcon-3 detector (Talos-Arctica), showing detergent-solubilized GLIC particles. B. Representative 2D class averages at 0.82 Å/px in a 256 × 256 pixel box and a 180-Å mask. C. Overview of cryo-EM processing pipelines for data collected at pH 7 (blue), pH 5 (green), and pH 3 (lavender) (see Methods).

Figure EV2: Cryo-EM densities in α-helical and β-strand regions. A.
Density (mesh) and corresponding atomic model (sticks, colored by heteroatom) for the M2 helix (E222–E243) at pH 7 (blue, left), pH 5 (green, center), and pH 3 (lavender, right). B. Density and corresponding model, shown as in panel A, for the β7 strand (P120–I128). Sidechains that could not be definitively built at pH 7 (D122, Q124, L126) are represented by Cβ atoms.

Figure EV3: Interfacial rearrangements in previous X-ray structures. A. Overlay as in Figure 2A of previous X-ray structures crystallized under resting (white, PDB ID: 4NPQ) and activating (gray, PDB ID: 4HFI) conditions. Two adjacent subunits are viewed as ribbons from the channel pore, showing key motifs including the β1–β2 and Pro loops and M1–M4 helices from the principal subunit (P), and loop F from the complementary subunit (C). B. Zoom views as in Figure 2C of the black-boxed region in panel A, showing key sidechains (sticks, colored by heteroatom) at a single domain interface in resting (white, left) and open (gray, right) X-ray structures. Dotted circle indicates the sidechain of K33, which could not be definitively built in resting conditions. Center panel shows major backbone transitions from overlaid resting to open states (orange arrows).

Figure EV4: ECD flexibility in closed-pore simulations. A. Root mean-squared deviations (RMSDs) over time for Cα-atoms of the ECD (solid) and TMD (dotted) in four replicate 1-μs MD simulations of cryo-EM structures determined at pH 7 (blue), pH 5 (green), and pH 3 (lavender).
Simulations were performed with sidechain charges approximating resting (deprotonated, top) or activating (protonated, bottom) conditions [14]. Reference simulations of resting (gray, top) and open (black, bottom) X-ray structures are shown at right. B. Hydration at the hydrophobic gate during simulations under deprotonated (solid) or protonated (striped) conditions as depicted in panel A, quantified by water occupancy between I233 (I9’) and A237 (A13’) in the channel pore. Histograms represent median ± 95% confidence interval (CI) over all simulations in the corresponding condition.

Figure EV5: Contraction and untwisting of the ECD in pH-3 state 2. A. Views as in Figure 1B of pH-3 state 1 (lavender) and state 2 (purple) cryo-EM densities, shown from the membrane plane (left) or extracellular side (right). Arrows represent inward contraction and counter-clockwise untwisting of the ECD in state 2 relative to state 1. B. Histograms indicating parallel trends in ECD contraction (left) and untwisting (right) from resting (gray) to open (black) X-ray structures, and from pH-3 state-1 (lavender) to state-2 (purple) cryo-EM structures.
[Figure 5 image. Panel titles: A, Deprotonated Closed; B, Protonated Closed; C, Protonated Open. Labels: ECD, TMD, M2, β1–β2, F, M2–M3.]
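Several of the legends above summarize simulation observables as a median ± 95% confidence interval over replicate simulations. The legends do not state how the interval was computed, so the following is only a minimal sketch of one common choice, a percentile bootstrap, and should not be read as the authors' procedure. The function name `bootstrap_median_ci` and the replicate values are hypothetical and purely illustrative.

```python
import random
import statistics


def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Return (median, lower, upper) using a percentile bootstrap.

    Assumption: a percentile bootstrap is used here for illustration;
    the source does not specify its confidence-interval method.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return statistics.median(values), medians[lo_idx], medians[hi_idx]


# Hypothetical per-replicate values (e.g. a contact fraction from four runs);
# these numbers are illustrative only and are not taken from the study.
replicates = [0.42, 0.47, 0.39, 0.51]
med, lo, hi = bootstrap_median_ci(replicates)
print(f"median = {med:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

With only four replicates per condition, any such interval is necessarily coarse: the bootstrap bounds can never extend beyond the smallest and largest observed values.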
10_1101-2021_01_04_425177 ----

The engineered peptide construct NCAM1-Aβ inhibits aggregation of the human prion protein (PrP)

Maciej Gielnik 1, Lilia Zhukova 2, Igor Zhukov 2, Astrid Gräslund 3, Maciej Kozak 1,4, Sebastian K.T.S. Wärmländer 3,*

1 Department of Macromolecular Physics, Adam Mickiewicz University, Poznań, Poland; maciejgielnik@amu.edu.pl (M.G.); mkozak@amu.edu.pl (M.K.)
2 Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warszawa, Poland; lilia@ibb.waw.pl (L.Z.); igor@ibb.waw.pl (I.Z.)
3 Department of Biochemistry and Biophysics, Arrhenius Laboratories, Stockholm University, 106 91 Stockholm, Sweden; astrid@dbb.su.se (A.G.); seb@dbb.su.se (S.W.)
4 National Synchrotron Radiation Centre SOLARIS, Jagiellonian University, Kraków, Poland.
* Correspondence: seb@dbb.su.se; Tel.: +46-8-16 24 44

Abstract: In prion diseases, the prion protein (PrP) becomes misfolded and forms fibrillar aggregates, which are resistant to proteinase degradation and become responsible for prion infectivity and pathology. So far, no drug or treatment procedures have been approved for prion disease treatment. We have previously shown that engineered cell-penetrating peptide constructs can reduce the amount of prion aggregates in infected cells. The molecular mechanisms underlying this effect are, however, unknown. Here, we use atomic force microscopy (AFM) imaging to show that the aggregation of the human PrP protein can be inhibited by equimolar amounts of the 25-residue engineered peptide construct NCAM1-Aβ.
Keywords: Creutzfeldt-Jakob disease; AFM imaging; amyloid; drug design; drug transport; protein-peptide binding

bioRxiv preprint, this version posted January 4, 2021. doi: https://doi.org/10.1101/2021.01.04.425177. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Prion and amyloid diseases are both characterized by aggregation of misfolded proteins or peptides (Jaunmuktane and Brandner, 2019; Miller, 2009; Sengupta and Udgaonkar, 2018; Verma et al., 2015), such as the prion (PrP) protein (Creutzfeldt-Jakob disease), α-synuclein (Parkinson’s disease), and amyloid-β (Aβ) and tau (Alzheimer’s disease). Many of these proteins and peptides may co-aggregate or at least influence each other’s aggregation (Luo et al., 2016, 2017; Ren et al., 2019; Wallin et al., 2018). Factors that modulate the aggregation of one of these proteins, such as small molecules, potential drug compounds, lipids, and metal ions, can often also modulate the aggregation processes of other proteins in this family (Ambadi Thody et al., 2018; Chemerovski-Glikman et al., 2016; Gielnik et al., 2019; Owen et al., 2019; Richman et al., 2013; Robinson and Pinheiro, 2010; Wallin et al., 2017; Wärmländer et al., 2013; Wärmländer et al., 2019; Österlund et al., 2018). This suggests that the underlying mechanisms may be the same in prion and amyloid diseases (Jaunmuktane and Brandner, 2019; Jucker and Walker, 2018; Miller, 2009).
Prion aggregates are, however, particularly infectious, as they spread between cells (Jaunmuktane and Brandner, 2019; Jucker and Walker, 2018) and are not degraded by cellular processes such as proteinase digestion (Jaunmuktane and Brandner, 2019; Löfgren et al., 2008; Söderberg et al., 2014). The toxic species in amyloid and prion diseases are generally considered to be small oligomeric aggregates (Sengupta and Udgaonkar, 2018; Verma et al., 2015), but so far no drugs or treatments that target such aggregates have been approved against prion diseases (Hyeon et al., 2020; Lee et al., 2019; Mashima et al., 2020). Potential drug molecules may interfere with oligomer formation in various ways: by reducing production of the protein, by inhibiting its aggregation, by diverting the aggregation pathway(s) towards non-toxic forms, or by reducing the lifetime of the toxic forms, for example by promoting rapid aggregation into larger non-toxic aggregates.

We have previously demonstrated anti-prion properties in short peptide constructs (up to 30 residues) with sequences derived from the unprocessed N-termini of mouse and bovine prion proteins: such PrP-derived peptides reduced the amount of proteinase K-resistant prion aggregates in prion-infected cells (Löfgren et al., 2008; Söderberg et al., 2014).
The PrP-derived peptides consisted of an N-terminal signal peptide segment (different for mouse and bovine PrP), together with a conserved positively charged and hydrophobic hexapeptide (KKRPKP) corresponding to the first six residues of the processed PrP protein. Our earlier studies had shown that peptides with such sequences were able to interact with and penetrate cell membranes (Lundberg et al., 2002; Magzoub et al., 2005; Magzoub et al., 2006; Oglecka et al., 2008). The anti-prion effects of the PrP-derived peptides were lost when the KKRPKP hexapeptide was coupled to various peptides with cell-penetrating properties (Söderberg et al., 2014). The anti-prion effects were, however, retained when KKRPKP was coupled to the signal sequence of the neural cell adhesion molecule-1, i.e., NCAM1(1-19) (Söderberg et al., 2014). The mouse PrP(1-28) segment and the NCAM1(1-19)-KKRPKP construct are both amyloidogenic in themselves, as they form amyloid fibrils by self-aggregation (Mukundan et al., 2017; Pansieri et al., 2019). The NCAM1(1-19)-KKRPKP construct was recently shown to inhibit aggregation of the amyloid-β peptide involved in Alzheimer’s disease (Henning-Knechtel et al., 2020), and to promote in vitro aggregation of the amyloid protein S100A9 (Pansieri et al., 2019), which is involved in amyloid-related and other inflammatory processes (Horvath et al., 2018; Wang et al., 2019; Wang et al., 2014). Almost identical results were obtained for a similar amyloidogenic 25-residue construct, i.e., NCAM1(1-19)-KKLVFF (from here onwards: NCAM1-Aβ) (Pansieri et al., 2019). The KLVFF sequence originates from the hydrophobic core (residues 16-20) of the Aβ peptide: this pentapeptide is known to inhibit aggregation of the full-length Aβ peptide (Tjernberg et al., 1996). In the NCAM1-Aβ construct, an additional lysine residue was added to the KLVFF sequence for increased solubility (Pansieri et al., 2019).
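The composition of the construct described above can be checked with a short sketch. It assembles the NCAM1-Aβ sequence from its three parts, using the sequences listed in Table 1, and estimates an integer net charge near pH 7. The charge rule is a deliberate simplification and an assumption of this sketch (Lys/Arg count as +1, Asp/Glu as -1, His as neutral, a free N-terminus as +1, a free C-terminus as -1, an amidated C-terminus as 0); it is not the method used to compute the values in Table 1, and the function and variable names are illustrative only.

```python
# Sequences in one-letter code, as listed in Table 1.
NCAM1_SIGNAL = "MLRTKDLIWTLFFLGTAVS"  # NCAM1 signal sequence, residues 1-19
LINKER_LYS = "K"                       # extra lysine added for solubility
ABETA_CORE = "KLVFF"                   # Abeta residues 16-20


def integer_charge_ph7(seq: str, amidated_c_term: bool) -> int:
    """Crude integer net charge near pH 7 (simplified rule, see lead-in)."""
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    n_terminus = 1                              # free amino group, protonated
    c_terminus = 0 if amidated_c_term else -1   # amide is neutral, carboxylate is -1
    return positive - negative + n_terminus + c_terminus


# Full construct: signal sequence + lysine + KLVFF
construct = NCAM1_SIGNAL + LINKER_LYS + ABETA_CORE

if __name__ == "__main__":
    print("NCAM1-Abeta:", construct, "length", len(construct))
    print("charge, amidated construct:", integer_charge_ph7(construct, True))
    print("charge, amidated NCAM1(1-19):", integer_charge_ph7(NCAM1_SIGNAL, True))
    print("charge, KKLVFF with free C-terminus:",
          integer_charge_ph7(LINKER_LYS + ABETA_CORE, False))
```

Under these assumptions the counts can be compared directly with the "Net charge at pH 7" column of Table 1. A pKa-based calculation would give fractional values instead of integers.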
The molecular properties of the NCAM1-Aβ sequence and its segments are shown in Table 1, including hydrophobicity values calculated according to the Wimley-White whole residue hydrophobicity scale (Wang et al., 2016; Wimley and White, 1996).

As the NCAM1-Aβ construct inhibits fibrillation of the Aβ peptide (Henning-Knechtel et al., 2020), but promotes (co-)aggregation of the S100A9 protein (Pansieri et al., 2019), it is unclear how the construct may affect the aggregation of the PrP protein (if at all). Here, we use atomic force microscopy (AFM) imaging to investigate if there is a direct effect of the NCAM1-Aβ construct on the in vitro aggregation of the human PrP protein. Answering this question might help clarify the mechanisms underlying the previously observed beneficial effects of such peptide constructs on PrP infectivity (Löfgren et al., 2008; Söderberg et al., 2014).

Table 1. Primary sequences and molecular properties of the human PrP protein, the NCAM1-Aβ peptide construct, and its parts.
Protein | Sequence | Isoelectric point (pI) | Molecular weight [g mol⁻¹] | Net charge at pH 7 | Theoretical hydrophobicity [kcal mol⁻¹]
huPrP(23-231) | UniProt ID: P04156 (209 aa) | 9.39 | 22747 | +7 | -
NCAM1(1-19)-K-Aβ(16-20) (NCAM1-Aβ) | NH2-MLRTKDLIWTLFFLGTAVSKKLVFF-NH2 | 11.67 | 2974.7 | +4 | -3.83
NCAM1(1-19) (NCAM1) | NH2-MLRTKDLIWTLFFLGTAVS-NH2 | 11.39 | 2211.7 | +2 | -3.06
KKLVFF | NH2-KKLVFF-COOH | 10.69 | 781 | +2 | -0.77

2. Materials and Methods

2.1 Sample preparation

Human recombinant prion protein (huPrP) was prepared according to a previously published protocol (Morillas et al., 1999; Zahn et al., 1997), albeit with some modifications. The plasmid encoded the full-length huPrP(23-231) protein fused to an N-terminal HisTag and a thrombin cleavage site, cloned into the pRSETB vector (Invitrogen, USA). The construct was expressed in E. coli BL21(DE3) grown in LB growth medium with 100 µg/mL ampicillin. Expression was induced by isopropyl β-D-1-thiogalactopyranoside (IPTG) at OD600 = 0.8. Sonication of the lysates was performed in a buffer containing 100 mM Tris at pH 8, 10 mM K2HPO4, 10 mM glutathione (GSH), 6 M GuHCl, and 0.5 mM phenylmethanesulfonyl fluoride (PMSF). The solution was centrifuged and the supernatant loaded onto Ni-NTA resin (GE Healthcare) and eluted with buffer E (100 mM Tris at pH 5.8, 10 mM K2HPO4, and 500 mM imidazole). After washing the resin, the protein was purified with two-step dialysis, initially against 10 mM phosphate buffer with 0.1 mM PMSF at pH 5.8, and then against Milli-Q H2O with 0.1 mM PMSF.
After thrombin cleavage, the pure huPrP protein (i.e., with the HisTag removed) was concentrated using an Amicon Ultra 0.5 ml centrifugal filter (Merck & Co., USA) with an NMWL cutoff of 3 kDa. The final protein concentration was determined by spectrophotometry using an extinction coefficient of ε280 = 57995 M⁻¹ cm⁻¹ (Gasteiger et al., 2005). The quality of the final protein was verified by mass spectrometry (molecular mass 22747 Da; Table 1).

The NCAM1-Aβ peptide (Table 1) was purchased as a custom order from the PolyPeptide Group (France) in lyophilized form. The peptide was dissolved in Milli-Q water, and its concentration was determined via triplicate UV absorption measurements at 280 nm, using a DS-11 spectrophotometer (DeNovix, USA) and an extinction coefficient of ε280 = 5500 M⁻¹ cm⁻¹ (Gasteiger et al., 2005).

2.2 Sample incubation

The initial buffer in the huPrP solution was exchanged to ultrapure water by triplicate diafiltration using an Amicon Ultra 0.5 ml centrifugal filter (Merck & Co., USA) with an NMWL cutoff of 3 kDa. Samples of 0.5 µM NCAM1-Aβ, 2.5 µM NCAM1-Aβ, 0.5 µM huPrP, and 0.5 µM NCAM1-Aβ + 0.5 µM huPrP were then prepared in 10 mM sodium phosphate buffer, pH 7.5, with 100 mM NaCl and 2 M urea. The urea was added as it has previously been shown to promote unfolding of the native PrP structure, which is the first step towards aggregation (Julien et al., 2009; Swietnicki et al., 2000). The samples were incubated for 72 hours at 50 ℃ with magnetic stirring at 400 rpm.
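As an illustration of the concentration determination and sample preparation described above, the following sketch applies the Beer-Lambert law (c = A / (ε · l)) with the two extinction coefficients quoted in Section 2.1, followed by a standard C1·V1 = C2·V2 dilution to a 0.5 µM working concentration. The absorbance reading, the 1 cm path length, the stock concentration, and the 1000 µl final volume are hypothetical values chosen for the example and are not taken from the study; the function names are likewise illustrative.

```python
# Extinction coefficients at 280 nm quoted in Section 2.1 (M^-1 cm^-1).
EPS280_HUPRP = 57995.0
EPS280_NCAM1_ABETA = 5500.0


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: concentration in mol/L from absorbance at 280 nm."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


def stock_volume_ul(stock_um: float, target_um: float, final_volume_ul: float) -> float:
    """C1*V1 = C2*V2: stock volume (in µl) needed for the target concentration."""
    if target_um > stock_um:
        raise ValueError("target concentration exceeds stock concentration")
    return target_um * final_volume_ul / stock_um


if __name__ == "__main__":
    a280_reading = 0.58  # hypothetical absorbance of a huPrP stock, 1 cm path assumed
    stock_um = molar_concentration(a280_reading, EPS280_HUPRP) * 1e6
    print(f"huPrP stock concentration: {stock_um:.2f} µM")

    # Hypothetical 1000 µl sample at the 0.5 µM working concentration of Section 2.2
    volume = stock_volume_ul(stock_um, target_um=0.5, final_volume_ul=1000.0)
    print(f"stock volume for 0.5 µM in 1000 µl: {volume:.1f} µl")
```

Because the extinction coefficient of the peptide is roughly ten times lower than that of huPrP, an equal absorbance reading corresponds to a roughly ten times higher molar concentration of peptide.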
Subsamples were taken out for AFM imaging (below) after 8 and 72 hours, respectively.

2.3 Atomic force microscopy (AFM) imaging

Incubated samples (5 μl) were transferred to freshly cleaved mica plates and left to adsorb for 1 min, rinsed three times with 300 μl of pure water, and then dried under a gentle flow of nitrogen. AFM imaging was performed on a JPK Nanowizard 4 (Bruker, Germany) AFM unit using Tap150Al-G cantilevers (Ted Pella Inc., USA) in air in intermittent contact mode. The scan rate was 0.3–0.7 Hz, and the scan area was 5 μm × 5 μm or 10 μm × 10 μm, with 512 × 512 or 1024 × 1024 pixel resolution, respectively. The AFM images were analyzed using the Gwyddion 2.54 software (Necas and Klapetek, 2012).

3. Results and Discussion

AFM images of the aggregation products present in the samples after 8 hours of incubation are shown in Figs. 1A-D. The sample of 0.5 µM huPrP readily self-aggregated into long fibrils (Fig. 1A) that are approximately 3–4 nm thick (judged by their measured height, as width is not accurately represented in AFM images). This is somewhat thinner but still in line with the results of previous studies on PrP fibrils (Terry and Wadsworth, 2019; Vazquez-Fernandez et al., 2017; Yamaguchi and Kuwata, 2018). A few very large aggregate clumps, over 10 nm high, can also be seen (Fig. 1A). For NCAM1-Aβ, the 0.5 µM sample shows small aggregate clumps (Fig. 1B). Some of them are relatively large, with heights over 6 nm, and may or may not be early stages of fibrillar aggregates (Luo et al., 2014).
The 2.5 µM NCAM1-Aβ sample shows numerous mature fibrils, about 2–3 nm high, together with aggregate clumps (Fig. 1C). The greater abundance of fibrils at 2.5 µM NCAM1-Aβ confirms earlier results showing that NCAM1-Aβ self-aggregates faster at higher concentrations (Pansieri et al., 2019).

Figure 1. AFM images of: (A) 0.5 µM huPrP protein; (B) 0.5 µM NCAM1-Aβ peptide; (C) 2.5 µM NCAM1-Aβ peptide; and (D) 0.5 µM huPrP protein + 0.5 µM NCAM1-Aβ peptide. All samples in A-D were incubated for 8 hours. (E) 0.5 µM huPrP protein + 0.5 µM NCAM1-Aβ peptide, incubated for 72 hours. All studied samples were incubated at 50 ℃ in 10 mM sodium phosphate buffer, pH 7.5, with 100 mM NaCl and 2 M urea, and with magnetic stirring at 400 rpm. The white scale bars are 500 nm.

Interestingly, the sample containing both 0.5 µM NCAM1-Aβ and 0.5 µM huPrP displays no fibrils, but only numerous small aggregate clumps, about 2 nm high (Fig. 1D). Even after 72 hours no fibrils can be seen, but the aggregate clumps are then fewer and larger, around 3–4 nm high (Fig. 1E). As it cannot be ruled out that these small aggregate clumps will eventually form fibrils, it is not possible to tell if fibrillation is completely inhibited, or if the fibrillation rate is merely significantly reduced. Nonetheless, the absence of fibrillar aggregates of huPrP in the presence of equimolar concentrations of NCAM1-Aβ clearly shows that the peptide construct directly interacts with the huPrP protein and interferes with its aggregation.
As both molecules are positively charged (Table 1), it stands to reason that they interact mainly via hydrophobic forces.

The aggregation-inhibiting effect of NCAM1-Aβ (Fig. 1) appears to provide an explanation, at a molecular level, for our earlier observations that such peptide constructs significantly reduce the levels of prion aggregates in prion-infected cells (Löfgren et al., 2008; Söderberg et al., 2014). As both the NCAM1-Aβ peptide and the huPrP protein can form amyloid fibrils by themselves (Figs. 1A and 1C), the two molecules may interact via cross-aggregation, to form smaller non-fibrillar co-aggregates (Fig. 1E) that could be less toxic than pure huPrP aggregates (Luo et al., 2016, 2017). If so, the huPrP/NCAM1-Aβ interactions would be similar to the interactions between Aβ and NCAM1-Aβ (Henning-Knechtel et al., 2020). In any case, the huPrP/NCAM1-Aβ interactions are very different from the interactions between NCAM1-Aβ and S100A9 protein, where amyloid aggregation is promoted (Pansieri et al., 2019). Because the NCAM1-Aβ construct has different effects on different aggregating proteins, it would be interesting to study how this construct might affect the aggregation of other disease-related prion proteins, such as those involved in animal diseases like bovine spongiform encephalopathy (BSE), chronic wasting disease in cervids, and sheep scrapie (Vazquez-Fernandez et al., 2017).

4. Conclusions

Our atomic force microscopy images show that the in vitro aggregation of the human PrP protein is inhibited by equimolar amounts of the 25-residue engineered peptide NCAM1-Aβ. Thus, a very likely molecular-level explanation for our previous observation that such cell-penetrating peptide constructs can reduce the amount of prion aggregates in infected cells is that these peptide constructs directly interact with the PrP protein and prevent its fibrillation.

Funding: The research of MG, IZ, LZ and MK was supported by an OPUS research grant (2014/15/B/ST4/04839) from the National Science Centre (Poland). AG was supported by grants from the Swedish Research Council and from Byggmästare Engkvist's Foundation.

Conflicts of Interest: The authors declare no conflicts of interest.

References

Ambadi Thody, S., Mathew, M.K., Udgaonkar, J.B., 2018. Mechanism of aggregation and membrane interactions of mammalian prion protein. Biochim Biophys Acta Biomembr.
Chemerovski-Glikman, M., Rozentur-Shkop, E., Richman, M., Grupi, A., Getler, A., Cohen, H.Y., Shaked, H., Wallin, C., Wärmländer, S.K., Haas, E., Gräslund, A., Chill, J.H., Rahimipour, S., 2016. Self-Assembled Cyclic d,l-alpha-Peptides as Generic Conformational Inhibitors of the alpha-Synuclein Aggregation and Toxicity: In Vitro and Mechanistic Studies. Chemistry 22, 14236-14246.
Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S.e., Wilkins, M.R., Appel, R.D., Amos, B., 2005. Protein Identification and Analysis Tools on the ExPASy Server, in: Walker, J.M.
(Ed.), The Proteomics Protocols Handbook. Humana Press, pp. 571-607.
Gielnik, M., Pietralik, Z., Zhukov, I., Szymańska, A., Kwiatek, W.M., Kozak, M., 2019. PrP (58–93) peptide from unstructured N-terminal domain of human prion protein forms amyloid-like fibrillar structures in the presence of Zn2+ ions. RSC Advances 9, 22211–22219.
Henning-Knechtel, A., Kumar, S., Wallin, C., Król, S., Wärmländer, S., Jarvet, J., Esposito, G., Kirmizialtin, S., Gräslund, A., Hamilton, A.D., Magzoub, M., 2020. Designed cell-penetrating peptide inhibitors of amyloid-beta aggregation and cytotoxicity. Cell Reports Physical Science 1, 100014.
Horvath, I., Iashchishyn, I.A., Moskalenko, R.A., Wang, C., Wärmländer, S.K.T.S., Wallin, C., Gräslund, A., Kovacs, G.G., Morozova-Roche, L.A., 2018. Co-aggregation of pro-inflammatory S100A9 with alpha-synuclein in Parkinson's disease: ex vivo and in vitro studies. J Neuroinflammation 15, 172.
Hyeon, J.W., Noh, R., Choi, J., Lee, S.M., Lee, Y.S., An, S.S.A., No, K.T., Lee, J., 2020. BMD42-2910, a Novel Benzoxazole Derivative, Shows a Potent Anti-prion Activity and Prolongs the Mean Survival in an Animal Model of Prion Disease. Exp Neurobiol 29, 93-105.
Jaunmuktane, Z., Brandner, S., 2019. The role of prion-like mechanisms in neurodegenerative diseases. Neuropathol Appl Neurobiol.
Jucker, M., Walker, L.C., 2018. Propagation and spread of pathogenic protein assemblies in neurodegenerative diseases. Nat Neurosci 21, 1341-1349.
Julien, O., Chatterjee, S., Thiessen, A., Graether, S.P., Sykes, B.D., 2009. Differential stability of the bovine prion protein upon urea unfolding. Protein Sci 18, 2172-2182.
Kristensen, M., Birch, D., Morck Nielsen, H., 2016. Applications and Challenges for Use of Cell-Penetrating Peptides as Delivery Vectors for Peptide and Protein Cargos. Int J Mol Sci 17.
Lee, S.M., Kim, S.S., Kim, H., Kim, S.Y., 2019. THERPA v2: an update of a small molecule database related to prion protein regulation and prion disease progression.
Prion 13, 197-198.
Lundberg, P., Magzoub, M., Lindberg, M., Hallbrink, M., Jarvet, J., Eriksson, L.E., Langel, U., Gräslund, A., 2002. Cell membrane translocation of the N-terminal (1-28) part of the prion protein. Biochem Biophys Res Commun 299, 85-90.
Luo, J., Wärmländer, S.K., Gräslund, A., Abrahams, J.P., 2014. Alzheimer peptides aggregate into transient nanoglobules that nucleate fibrils. Biochemistry 53, 6302-6308.
Luo, J., Wärmländer, S.K., Gräslund, A., Abrahams, J.P., 2016. Reciprocal Molecular Interactions between the Abeta Peptide Linked to Alzheimer's Disease and Insulin Linked to Diabetes Mellitus Type II. ACS Chem Neurosci 7, 269-274.
Luo, J., Wärmländer, S.K., Gräslund, A., Abrahams, J.P., 2017. Cross-interactions between the Alzheimer disease amyloid-beta peptide and other amyloid proteins. A FURTHER ASPECT OF THE AMYLOID CASCADE HYPOTHESIS. J Biol Chem 292, 2046.
Löfgren, K., Wahlström, A., Lundberg, P., Langel, U., Gräslund, A., Bedecs, K., 2008. Antiprion properties of prion protein-derived cell-penetrating peptides. FASEB J 22, 2177-2184.
Magzoub, M., Oglecka, K., Pramanik, A., Eriksson, G.L.E., Gräslund, A., 2005. Membrane perturbation effects of peptides derived from the N-termini of unprocessed prion proteins. Biochim Biophys Acta 1716, 126-136.
Magzoub, M., Sandgren, S., Lundberg, P., Oglecka, K., Lilja, J., Wittrup, A., Eriksson, G.L.E., Langel, U., Belting, M., Gräslund, A., 2006. N-terminal peptides from unprocessed prion proteins enter cells by macropinocytosis. Biochem Biophys Res Commun 348, 379-385.
Mashima, T., Lee, J.H., Kamatari, Y.O., Hayashi, T., Nagata, T., Nishikawa, F., Nishikawa, S., Kinoshita, M., Kuwata, K., Katahira, M., 2020. Development and structural determination of an anti-PrP(C) aptamer that blocks pathological conformational conversion of prion protein. Sci Rep 10, 4934.
Miller, G., 2009. Neurodegeneration. Could they all be prion diseases? Science 326, 1337-1339.
Morillas, M., Swietnicki, W., Gambetti, P., Surewicz, W.K., 1999. Membrane environment alters the conformational structure of the recombinant human prion protein. J Biol Chem 274, 36859-36865.
Mukundan, V., Maksoudian, C., Vogel, M.C., Chehade, I., Katsiotis, M.S., Alhassan, S.M., Magzoub, M., 2017. Cytotoxicity of prion protein-derived cell-penetrating peptides is modulated by pH but independent of amyloid formation. Arch Biochem Biophys 613, 31-42.
Necas, D., Klapetek, P., 2012. Gwyddion: an open-source software for SPM data analysis. Central European Journal of Physics 10, 181-188.
Oglecka, K., Lundberg, P., Magzoub, M., Eriksson, G.L.E., Langel, U., Gräslund, A., 2008. Relevance of the N-terminal NLS-like sequence of the prion protein for membrane perturbation effects. Biochim Biophys Acta 1778, 206-213.
Owen, M.C., Gnutt, D., Gao, M., Wärmländer, S.K.T.S., Jarvet, J., Gräslund, A., Winter, R., Ebbinghaus, S., Strodel, B., 2019. Effects of in vivo conditions on amyloid aggregation. Chem Soc Rev 48, 3946-3996.
Pansieri, J., Ostojic, L., Iashchishyn, I.A., Magzoub, M., Wallin, C., Wärmländer, S., Gräslund, A., Nguyen Ngoc, M., Smirnovas, V., Svedruzic, Z., Morozova-Roche, L.A., 2019. Pro-Inflammatory S100A9 Protein Aggregation Promoted by NCAM1 Peptide Constructs. ACS Chem Biol 14, 1410-1417.
Ren, B., Zhang, Y., Zhang, M., Liu, Y., Zhang, D., Gong, X., Feng, Z., Tang, J., Chang, Y., Zheng, J., 2019. Fundamentals of cross-seeding of amyloid proteins: an introduction. J Mater Chem B 7, 7267-7282.
Richman, M., Wilk, S., Chemerovski, M., Wärmländer, S.K., Wahlström, A., Gräslund, A., Rahimipour, S., 2013. In vitro and mechanistic studies of an antiamyloidogenic self-assembled cyclic D,L-alpha-peptide architecture. J Am Chem Soc 135, 3474-3484.
Robinson, P.J., Pinheiro, T.J., 2010. Phospholipid composition of membranes directs prions down alternative aggregation pathways. Biophys J 98, 1520-1528.
Santuccione, A., Sytnyk, V., Leshchyns'ka, I., Schachner, M., 2005. Prion protein recruits its neuronal receptor NCAM to lipid rafts to activate p59fyn and to enhance neurite outgrowth. J Cell Biol 169, 341-354.
Schmitt-Ulms, G., Legname, G., Baldwin, M.A., Ball, H.L., Bradon, N., Bosque, P.J., Crossin, K.L., Edelman, G.M., DeArmond, S.J., Cohen, F.E., Prusiner, S.B., 2001. Binding of neural cell adhesion molecules (N-CAMs) to the cellular prion protein. J Mol Biol 314, 1209-1225.
Sengupta, I., Udgaonkar, J.B., 2018. Structural mechanisms of oligomer and amyloid fibril formation by the prion protein. Chem Commun (Camb) 54, 6230-6242.
Swietnicki, W., Morillas, M., Chen, S.G., Gambetti, P., Surewicz, W.K., 2000. Aggregation and fibrillization of the recombinant human prion protein huPrP90-231. Biochemistry 39, 424-431.
Söderberg, K.L., Guterstam, P., Langel, U., Gräslund, A., 2014. Targeting prion propagation using peptide constructs with signal sequence motifs. Arch Biochem Biophys 564, 254-261.
Terry, C., Wadsworth, J.D.F., 2019. Recent Advances in Understanding Mammalian Prion Structure: A Mini Review. Front Mol Neurosci 12, 169.
Tjernberg, L.O., Näslund, J., Lindqvist, F., Johansson, J., Karlström, A.R., Thyberg, J., Terenius, L., Nordstedt, C., 1996. Arrest of beta-amyloid fibril formation by a pentapeptide ligand. J Biol Chem 271, 8545-8548.
Vazquez-Fernandez, E., Young, H.S., Requena, J.R., Wille, H., 2017. The Structure of Mammalian Prions and Their Aggregates. Int Rev Cell Mol Biol 329, 277-301.
Verma, M., Vats, A., Taneja, V., 2015. Toxic species in amyloid disorders: Oligomers or mature fibrils. Ann Indian Acad Neurol 18, 138-145.
Wallin, C., Hiruma, Y., Wärmländer, S., Huvent, I., Jarvet, J., Abrahams, J.P., Gräslund, A., Lippens, G., Luo, J., 2018. The Neuronal Tau Protein Blocks in Vitro Fibrillation of the Amyloid-beta (Abeta) Peptide at the Oligomeric Stage. J Am Chem Soc 140, 8138-8146.
Wallin, C., Luo, J., Jarvet, J., Wärmländer, S.K.T.S., Gräslund, A., 2017. The Amyloid-β Peptide in Amyloid Formation Processes: Interactions with Blood Proteins and Naturally Occurring Metal Ions. Israel Journal of Chemistry 57, 674-685.
Wang, C., Iashchishyn, I.A., Kara, J., Fodera, V., Vetri, V., Sancataldo, G., Marklund, N., Morozova-Roche, L.A., 2019. Proinflammatory and amyloidogenic S100A9 induced by traumatic brain injury in mouse model. Neurosci Lett 699, 199-205.
Wang, C., Klechikov, A.G., Gharibyan, A.L., Wärmländer, S.K.T.S., Jarvet, J., Zhao, L., Jia, X., Narayana, V.K., Shankar, S.K., Olofsson, A., Brännström, T., Mu, Y., Gräslund, A., Morozova-Roche, L.A., 2014. The role of pro-inflammatory S100A9 in Alzheimer's disease amyloid-neuroinflammatory cascade. Acta Neuropathol 127, 507-522.
Wang, G., Li, X., Wang, Z., 2016. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res 44, D1087-1093.
Wimley, W.C., White, S.H., 1996. Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Biol 3, 842-848.
Wärmländer, S.K.T.S., Tiiman, A., Abelein, A., Luo, J., Jarvet, J., Söderberg, K.L., Danielsson, J., Gräslund, A., 2013. Biophysical studies of the amyloid beta-peptide: interactions with metal ions and small molecules. Chembiochem 14, 1692-1704.
Wärmländer, S.K.T.S., Österlund, N., Wallin, C., Wu, J., Luo, J., Tiiman, A., Jarvet, J., Gräslund, A., 2019. Metal binding to the amyloid-beta peptides in the presence of biomembranes: potential mechanisms of cell toxicity. J Biol Inorg Chem 24, 1189-1196.
Yamaguchi, K.I., Kuwata, K., 2018. Formation and properties of amyloid fibrils of prion protein. Biophys Rev 10, 517-525.
Zahn, R., von Schroetter, C., Wüthrich, K., 1997. Human prion proteins expressed in Escherichia coli and purified by high-affinity column refolding. FEBS Lett 417, 400-404.
Österlund, N., Kulkarni, Y.S., Misiaszek, A.D., Wallin, C., Kruger, D.M., Liao, Q., Mashayekhy Rad, F., Jarvet, J., Strodel, B., Wärmländer, S.K.T.S., Ilag, L.L., Kamerlin, S.C.L., Gräslund, A., 2018. Amyloid-beta Peptide Interactions with Amphiphilic Surfactants: Electrostatic and Hydrophobic Effects. ACS Chem Neurosci 9, 1680-1692.
10_1101-2021_01_04_425209 ----
Amino acids targeted based metabolomics study in non-segmental Vitiligo: a pilot study

Rezvan Marzabani1, Hassan Rezadoost1, Peyman Chopanian4, Nikoo Mozafari2, Mohieddin Jafari3, Mehdi Mirzaie4, Mehrdad Karimi5

1 Department of Phytochemistry, Medicinal Plants and Drugs Research Institute, Shahid Beheshti University, G.C., Tehran, Iran
2 Skin Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
3 Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
4 Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Tehran, Iran
5 School of Traditional Medicine, Tehran University of Medical Sciences, Tehran, Iran

bioRxiv preprint, this version posted January 6, 2021. doi: https://doi.org/10.1101/2021.01.04.425209. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license (http://creativecommons.org/licenses/by/4.0/).

Abstract

Introduction: Vitiligo is an asymptomatic disorder that results from the loss of pigment (melanin), causing skin or mucosal depigmentation and affecting appearance.
Objective: The pathogenesis of vitiligo is complex, and several theories have been proposed to explain it, including the autoimmune theory, oxidative stress, the neural theory, and intrinsic defects of melanocytes. Given this complexity and the broad role of amino acids in body metabolism, a targeted amino acid metabolomics approach was set up to follow any changes associated with the disease.

Methodology: Amino acid profiles in the plasma of people with non-segmental vitiligo were studied using liquid chromatography with a fluorescence detector, in order to find biomarkers for the diagnosis of vitiligo and the evaluation of disease severity. Twenty-two amino acids, derivatized with o-phthalaldehyde (OPA) and 9-fluorenylmethyloxycarbonyl chloride (FMOC), were quantified. The concentrations of these twenty-two amino acids and their corresponding molar ratios were then calculated in 37 patients (18 females and 19 males) and 34 healthy individuals (18 females and 16 males). The data were analyzed in R to compare the patient and healthy groups and to find suitable and reliable biomarkers.

Results: In the patient group, tyrosine, cysteine, the ratio of tyrosine to lysine, and the ratio of cysteine to ornithine were increased, whereas arginine, lysine, ornithine and glycine ratios to cysteine were decreased. These amino acids were selected for identifying patients, with a detection accuracy of approximately 0.95 as assessed by logistic regression.

Conclusion: These results indicate disrupted melanin production, increased immune activity, and oxidative stress, all of which are also implicated in vitiligo. Therefore, these amino acids could be used as biomarkers for evaluating risk, preventing complications in individuals at risk, and monitoring the treatment process.
Keywords: Vitiligo, plasma, metabolomics, amino acids, liquid chromatography, R programming

Introduction
Vitiligo is a common chronic skin disorder in which the pigment-producing cells (melanocytes) are damaged, resulting in varying patterns and degrees of skin depigmentation. Patients are characterized by loss of epidermal melanocytes and progressive depigmentation. The disease occurs in two main types, non-segmental (generalized) and segmental (Armstrong, 2011; Sahoo et al., 2017). Despite much research, the etiology of vitiligo and the reasons for melanocyte death are still unclear (Singh et al., 2016). A complex combination of immune, genetic, environmental, and biochemical causes lies behind vitiligo, and the exact molecular mechanisms of its development and progression are not clear (Liang et al., 2019; Sahoo et al., 2017; Singh et al., 2016). Although several vitiligo susceptibility loci have been identified by genome-wide association studies, a study of monozygotic twins reported a vitiligo concordance rate of 23%, suggesting a strong environmental contribution to the pathogenesis (Singh et al., 2016). Zheleva et al. (2018) showed that oxidative stress is a triggering event in melanocyte destruction and is probably involved in the etiopathogenesis of vitiligo. Oxidative stress biomarkers can be found in the skin and blood of vitiligo patients. Hamidizadeh et al.
(Hamidizadeh et al., 2020), in their study, compared hopelessness, anxiety, depression, and general health between vitiligo patients and normal controls and confirmed that anxiety and hopelessness levels were significantly higher in vitiligo patients than in healthy controls. It is worth noting that the worldwide prevalence of vitiligo is in the range of 0.5% to 2% (Ding et al., 2014). One of the main problems accompanying vitiligo is its psychological burden, which is experienced by many patients around the globe (Grimes and Miller, 2018). In addition to social or psychological distress, people with vitiligo may be at increased risk of sunburn, skin cancer, eye problems such as inflammation of the iris (iritis), and hearing loss (Jakku et al., 2019). There are many conventional and unconventional therapies for vitiligo. They include L-phenylalanine, PGE2 and antioxidant agents, alpha lipoic acid, flavonoids, glutathione (GSH), fluorouracil, L-DOPA, levamisole, melagenine, omega-3 polyunsaturated fatty acids, cream containing pseudocatalase, resveratrol, soybeans, metals such as zinc, and minoxidil (Gianfaldoni et al., 2018). Although the pathophysiology of vitiligo is complex, studies have revealed that vitiligo cells have unique lipid and metabolite profiles (Sahoo et al., 2017). This raises the question of which factors are associated with vitiligo activity in skin and blood. Such biomarkers would allow an early and accurate determination of treatment response and of the progression of the disease.
Up to now, some biomarkers have been proposed for vitiligo, and several markers have been linked to vitiligo and associated with disease activity. Besides providing insights into the driving mechanisms of vitiligo, these findings could reveal potential biomarkers. Although genomic analyses have been performed to investigate the pathogenesis of vitiligo, the role of small molecules and serum proteins in vitiligo remains unknown. Metabolomics is a powerful and promising analytical tool that allows assessment of global low-molecular-weight metabolites in biological systems. It has great potential for identifying useful biomarkers for early diagnosis, prognosis, and assessment of therapeutic interventions in clinical practice (Liang et al., 2019; Speeckaert et al., 2017). Despite the current evidence that the metabolic system affects the immune system and oxidative stress, two important factors in the development of vitiligo, further investigation of metabolite fluctuations in this disease is needed. We were keen to establish whether the levels of important substrates such as amino acids, the most important primary metabolites, were altered in vitiligo cells, which might contribute to the vitiligo phenotype in melanocytes. The aim of this study was therefore to investigate a comprehensive profile of amino acids in the plasma of people with vitiligo in comparison with healthy people in order to find a rapidly determinable biomarker. For this purpose, liquid chromatography with a fluorescence detector was applied.
Material and methods

Patient samples
The study protocol was approved by the ethics committee of Shahid Beheshti University of Medical Sciences, and written informed consent was obtained voluntarily from each participant at the time of enrollment. Table 1 gives the complete characterization of the study cohort. In summary, 37 patients with vitiligo and 34 healthy individuals attending the dermatology clinic of Shohadaye Tajrish Educational Hospital were included. The diagnosis of vitiligo was based on the characteristic loss of skin pigmentation and examination under Wood's lamp. Blood samples were collected in 10 mL vacutainer tubes containing 0.15 K2EDTA (to prevent clotting) and were centrifuged at 4000 rpm at 4 °C for 20 minutes. The supernatant was separated and stored at -80 °C until HPLC-FD analysis.

Table 1. Demographics of the study cohort
Information                                                   HCs*         Vitiligo
Male                                                          16           19
Female                                                        18           18
Age, years**                                                  35.8±11.8    35.1±12.1
Duration of the disease (years)**                             _            10.8±9.5
Illness severity (body surface area involvement, %)**         _            30.8±20.9
Active disease (having new lesions during last 6 months)      _            23
Positive family history                                       _            24
* HCs = healthy controls, ** = mean ± SD

Amino acid analysis
To prepare the samples for analysis, they were transferred from the -80 °C freezer and placed on ice to thaw.
To 50 µL of sample were added 20 norleucine (500 μM) and then 200 µL of methanol kept at -20 °C, and the mixture was mixed for five seconds. For complete deproteinization, the samples were kept at -20 °C for 2 hours. In the next stage, the samples were centrifuged at 13000 rpm for twelve minutes at 4 °C. The supernatant was completely transferred to a Heidolph rotary evaporator and dried in vacuo. These dried samples can be stored at 4 °C for four weeks. For HPLC analysis, the previously dried samples were dissolved in 100 µL of water (containing 0.01 formic acid) with the help of an ultrasonic device for 5 minutes. To 10 µL of each sample, 10 µL of OPA (for derivatization of primary amino acids) was added, followed one minute later by 10 µL of FMOC (for derivatization of secondary amino acids); 20 µL of this mixture was then injected onto the HPLC column (Fekkes, 2012; Wu et al., 2016). For the HPLC-DAD method, a Knauer system (WellChrom, Germany) was used, equipped with a K-1001 pump, a K-2800 fast scanning UV detector with simultaneous detection at four wavelengths, an S3900 autosampler (Midas), a K-5004 analytical degasser, and a 2301 Rheodyne injector with a 20 µL loop. HPLC separation was achieved using a Eurospher C18 column (4.6 mm × 250 mm, 5 µm) with a gradient elution program at a flow rate of 1.0 mL min−1. The mobile phase was composed of A (acetonitrile + 0.05% trifluoroacetic acid, v/v) and B (0.05% aqueous trifluoroacetic acid, v/v). The following gradient was applied: 0–10 min, isocratic 70% B; 10–30 min, linear 70–40% B; 30–40 min, linear 40–20% B; 40–50 min, linear 20–0% B; 50–65 min, linear 0–70% B; 65–75 min, isocratic 70% B. The UV absorbance was monitored at 335 nm. All injection volumes of sample and standard solutions were 20 µL. The chromatographic peaks of the sample solution were identified by spiking and by comparing their retention times and UV spectra with those of reference standards. Quantitative analysis was carried out by peak integration using the external standard method.
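The gradient program above can be read as a piecewise-linear function of time. The following is a minimal sketch, not part of the original method: it transcribes the breakpoints stated in the text and assumes that each "linear" segment is a straight line between them. The names `GRADIENT_B`, `percent_b`, and `percent_a` are hypothetical and chosen only for illustration.

```python
# Breakpoints transcribed from the gradient described in the text:
# (time in minutes, %B at that time). Segments between breakpoints are
# assumed to be linear; isocratic steps have equal %B at both ends.
GRADIENT_B = [
    (0.0, 70.0),
    (10.0, 70.0),
    (30.0, 40.0),
    (40.0, 20.0),
    (50.0, 0.0),
    (65.0, 70.0),
    (75.0, 70.0),
]


def percent_b(t_min):
    """Return the %B of the mobile phase at time t_min (minutes)."""
    if not GRADIENT_B[0][0] <= t_min <= GRADIENT_B[-1][0]:
        raise ValueError("time is outside the 0-75 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT_B, GRADIENT_B[1:]):
        if t0 <= t_min <= t1:
            # Linear interpolation inside the segment [t0, t1].
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_a(t_min):
    """Return the %A (acetonitrile phase) as the remainder to 100%."""
    return 100.0 - percent_b(t_min)
```

For example, `percent_b(20)` falls halfway through the 10–30 min segment, and `percent_a(50)` corresponds to the end of the 40–50 min segment, where the program reaches 0% B.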
Amino acids were identified by fluorescence at 337 nm and 470 nm (excitation and emission, respectively) for primary amino acids, while the PDA detector was set to 262 nm (for secondary amino acids) and 338 nm (for primary amino acids). To check the precision of the procedure, five plasma samples from patients were analyzed, and an RSD of less than 7 was obtained for each of the 22 amino acids (Amorini et al., 2017; Douglas, 2003; Wu et al., 2016).

Statistical methods
For statistical analysis, we used MetaboAnalyst 4.0. Before the analysis, the data were transformed, mean-centered, and finally quantile-normalized (normality was assessed with the Shapiro-Wilk test in R, and the data were not normally distributed for some amino acids). To compare the study groups in R, we used the Mann-Whitney U test with FDR correction (Benjamini-Hochberg) (Table 2). In addition to comparing the patient and healthy groups, the relationship of each amino acid with disease severity, disease activity, family history, and disease duration was evaluated using Mann-Whitney U tests with FDR correction (Benjamini-Hochberg) and the random forest mean prediction score (Table 5, Figure 2). To examine the trend of differences in metabolite levels (Figure 6), the samples in the two groups were clustered using the partial least squares discriminant analysis (PLS-DA) method (Figure 7).
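The Benjamini-Hochberg FDR correction mentioned above can be illustrated with a short sketch. This is not the authors' code (the study used R and MetaboAnalyst); it is a plain-Python version of the standard step-up adjustment, and it assumes that one raw p-value per amino acid is already available, for example from Mann-Whitney U tests. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value with rank r (1 = smallest) among m tests is scaled
    to p * m / r, and the adjusted values are made monotone by taking a
    running minimum from the largest p-value downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four analytes (not data from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

An analyte would then be called significant when its adjusted p-value falls below the chosen FDR threshold.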
In addition, to investigate the pairwise correlations between amino acids across all participants, the correlation matrix was plotted as a heatmap with significant differences marked (Figure 5). A heatmap was also plotted to investigate the relationship between each amino acid and the participants (Figure 8). Also, to investigate the effect of each amino acid and of the amino acid ratios (the top 20 ratios by p-value) selected as biomarkers on the probability of disease occurrence or severity, we used logistic regression and report the sensitivity and specificity of the test and the receiver operating characteristic (ROC) curves (Figure 6).

Results
In total, 22 amino acids were determined in the studied samples. Table S1 gives the absolute concentrations of the twenty-two amino acids in the participants (34 healthy cases and 37 vitiligo cases). First, we performed principal component analysis with all samples, which showed that the samples were well clustered into two completely separated clusters (Figure 1).

Figure 1. PCA shows the homogeneity of the data obtained by HPLC-FLD. Samples are completely grouped into two separated clusters. PC1 and PC2 cover 34% and 22% of the data, respectively.
Next, the distribution of the amino acid data was evaluated by the Shapiro test. A t-test was used to show the differences in amino acid concentrations between vitiligo and healthy samples, and adjusted p-values were calculated by the Benjamini-Hochberg method. Figure 2(a) shows a volcano graph in which the horizontal and vertical axes correspond to the log2 fold change of sample concentrations and the -log10 adjusted p-values, respectively. As illustrated in Figure 2(c-d), there is a significant increase in Cys, Pro, and Glu, while Lys, Arg, Orn, His, and Gly are decreased in vitiligo patients. Figure 2(b) shows the Gini error reduction diagram (mean decrease in accuracy, mean prediction score) obtained from the random forest algorithm with 500 trees. Green dots mark amino acids that are increased in vitiligo, and red dots mark those that are decreased.

Figure 2. (a) Volcano graph of the changes in amino acid concentrations in the studied vitiligo samples, (b) Gini error reduction diagram with 500 trees, (c-d) box plots of amino acid fluctuation in healthy and vitiligo samples. The red boxes show the metabolite concentrations in healthy individuals (control), while the blue ones show the metabolite concentrations in patients (vitiligo).
The adjusted p-value for each metabolite is given in the figure.

To show the specificity and sensitivity of the studied biomarkers, ROC graphs were used. An individual ROC curve was also plotted for the amino acids with the largest changes (Figure 3(b)). Interestingly, Cys and Lys showed the maximum area under the curve (AUC), up to 0.91. For these two amino acids, a logistic regression was performed and the corresponding ROC diagram was drawn. A positive or negative coefficient indicates the role of each selected amino acid in increasing or decreasing the risk of vitiligo. Next, a confusion matrix was developed based on the random forest method, in which the two groups of our study are completely classified (Figure 3).

Figure 3. (a) ROC curves showing the sensitivity and specificity of the studied amino acids (Cys, Lys, Tyr, Orn, Pro, Glu, Leu, and Gly), (b) selected ROC curve for Cys and with the highest variation, (c) confusion matrix based on random forest.

When amino acid concentrations were compared between vitiligo patients in two categories of disease extent, more than 25% and less than 25%, Glu was found to be a reliable biomarker. Its concentration is significantly decreased (log2 FC < -0.5) in patients with more than 25% involvement (Figure 4(a-b)).

Figure 4. Volcano diagram for patients with more and less than 25% of vitiligo. (a) Glu classifies the cases according to vitiligo severity, (b) Glu is decreased in patients with more than 25% of vitiligo.
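The area under a ROC curve, as reported above for single amino acids, equals the probability that a randomly chosen patient has a higher value than a randomly chosen control, with ties counted as one half. The sketch below computes the AUC from this rank-based definition. It is only an illustration and is not the authors' analysis, which used logistic regression in R; the function name `roc_auc` and the concentrations are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 when the positive score is higher, 0.5 when the two
    scores are tied, and 0 otherwise.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical concentrations (not data from the study).
patients = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.0]
auc = roc_auc(patients, controls)
```

For an analyte that is lower in patients than in controls, the same function can be called with the two groups swapped, which is equivalent to taking 1 minus the AUC.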
As ratios of biomarkers, especially of amino acids, can be a reliable sign of disease, a volcano diagram was prepared for the different amino acid ratios in the vitiligo samples. According to Figure 5, ratios including Cys/Orn, Gly/Cys, and… significantly separate the groups of the study.

Figure 5. Volcano diagram of amino acid variation between the studied cases.

Metabolic Pathway Analysis
Metabolite set enrichment analysis (MSEA) was used to explore the metabolites that are highly enriched and associated with possible metabolic pathways. Pathway-associated and disease-associated metabolite set analyses were performed to show the metabolic pathways that are significantly altered in vitiligo cases. Using pathway-associated metabolite sets with enrichment analysis, the main affected pathways were detected. Pathway impact as checked by MetaboAnalyst showed that about 35 pathways differ between vitiligo and healthy samples, of which the first 12 pathways are very significant. The following pathways and metabolic cycles were found to be changed in vitiligo cases: arginine and proline metabolism, glycine and serine metabolism, glutathione metabolism, urea cycle, ammonia recycling, glutamate metabolism, alanine metabolism, carnitine synthesis, cysteine metabolism, lysine degradation, beta-alanine metabolism, aspartate metabolism, and methylhistidine metabolism. These pathways and metabolic cycles differed significantly between vitiligo cases (VCs) and healthy controls (HCs).
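Over-representation analysis of metabolite sets is commonly scored with a hypergeometric test, and the paper's own pathway figure names this test. The sketch below shows that calculation for a single pathway. It is an illustration only and does not reproduce MetaboAnalyst; the function name `hypergeom_enrichment_p` and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_pathway, n_selected, n_hits):
    """Upper-tail hypergeometric p-value P(X >= n_hits).

    n_background: metabolites in the reference set
    n_in_pathway: background metabolites annotated to the pathway
    n_selected:   metabolites in the list being tested
    n_hits:       selected metabolites that belong to the pathway
    """
    upper = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    favourable = sum(
        comb(n_in_pathway, i)
        * comb(n_background - n_in_pathway, n_selected - i)
        for i in range(n_hits, upper + 1)
    )
    return favourable / total


# Hypothetical counts (not taken from the study): 10 background
# metabolites, 4 of them in the pathway, 3 selected, 2 of which hit.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

In a full analysis, one such p-value would be computed per pathway and then corrected for multiple testing.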
In addition, disease-associated metabolite sets were compared between VCs and HCs. Ornithine transcarbamylase deficiency (OTC), hyperornithinemia with gyrate atrophy (HOGA), delta-pyrroline-5-carboxylate synthase, continuous ambulatory peritoneal dialysis, hyperprolinemia type II, short bowel syndrome (under arginine-free), argininosuccinic aciduria (ASL), acute seizures, 2-hydroxyglutaric acidemia, 3-phosphoglycerate dehydrogenase deficiency dementia, dicarboxylic aminoaciduria, histidinemia, hyperlysinemia I (familial), phosphoserine aminotransferase deficiency, short bowel syndrome, and SOTOS syndrome are the disease-associated metabolite sets most strongly identified here.

Figure 6. Pathway-associated metabolite and disease-associated metabolite analyses.

Figure 7. Pathway analysis algorithms. Over-representation analysis: hypergeometric test. Pathway topology analysis: relative-betweenness centrality.

Discussion
To the best of our knowledge, there are few studies on the role of amino acids in vitiligo.
They focus only on one or two amino acids and their metabolites that are associated with the melanin production pathway (phenylalanine, tyrosine and glucosamine, trimethylamine, cysteine, homocysteine and thiol). However, no studies have investigated the profile of free amino acids and their changes, or the metabolic pathways of vitiligo. Amino acids play an important role in detoxification and immune responses by regulating the activation of T lymphocytes, B lymphocytes, natural killer cells, and macrophages (1), the cellular redox state, gene expression, and lymphocyte proliferation (2), and the production of antibodies, cytokines, and other cytotoxic substances (3). In most cell types, arginine is produced from citrulline as a precursor and is involved in regulating the activity of the immune system by producing nitric oxide. Proline and glutamate are used to synthesize ornithine via pyrroline-5-carboxylate (P5C). In addition, proline is catabolized by proline oxidase in different organs to produce hydrogen peroxide and P5C. The conversion of P5C into proline by the NADPH-dependent P5C reductase shifts the ratio of NADP+ to NADPH. The proline-P5C cycle regulates the cellular redox state and cell proliferation. In addition, ornithine is converted into citrulline, which regenerates arginine using aspartate.
Arginine is involved in several metabolic pathways, and each of its products has a specific function. Ornithine, a product of arginine, proline, and glutamate, contributes to the production of glutamate, glutamine, and polyamines and to mitochondrial integrity. Polyamines, the products of arginine and methionine, affect gene expression, DNA and protein production, ion channel activity, cell death, antioxidants, cellular activity, and the proliferation and differentiation of lymphocytes. Creatine, a product of arginine, methionine, and glycine, has antioxidant, antiviral, and anti-tumor activity. Therefore, a concomitant decrease in arginine and ornithine and an increase in proline may indicate impaired arginine and proline metabolism and an impaired urea cycle. As a result, the response to oxidative stress and cell damage is disrupted. There are several serine pathways involving one-carbon metabolism, one of which is glycine synthesis. Glycine is involved in synthesizing many important physiological molecules, including purine nucleotides, glutathione, and heme (a cofactor containing an iron atom). In addition, glycine itself is a potent antioxidant that scavenges free radicals. Glycine is therefore essential for the proliferation and antioxidative defense of leukocytes and is an anti-inflammatory, immunomodulatory, and cytoprotective agent. Its reduction indicates impaired glycine/serine and glutathione metabolism, which, in turn, disrupts cellular immunity and the response to oxidative stress. Ammonia is an important source of nitrogen and a by-product of cellular metabolism. It is assimilated through reductive amination catalyzed by glutamine synthetase and glutamate dehydrogenase, and secondary reactions enable other amino acids such as glutamate, proline, and aspartate to obtain this nitrogen directly.
Glutamate regulates the expression of inducible nitric oxide synthase (iNOS) in specific tissues and is indirectly involved in regulating the animal immune system. Aspartate, acting as a precursor for nucleotide synthesis, contributes to various metabolic pathways and is important for lymphocyte proliferation. Further, it is necessary for regenerating arginine from citrulline in active macrophages and for maintaining the intracellular concentration of arginine to sustain the NO level in response to immune challenges. Glutamate and aspartate play excitatory roles in the central and peripheral nervous systems, affecting ionotropic and metabotropic receptors (peptide or polypeptide hormone receptors and neurotransmitters on the plasma membrane which play an important role in the immune system). They transport reducing equivalents across the mitochondrial membrane, thereby regulating glycolysis and the cellular redox state through the malate/aspartate shuttle. In addition, alanine, a major substrate for hepatic glucose synthesis, is a significant energy substrate for leukocytes and thereby affects immune function. β-Alanine is the only naturally occurring non-essential beta amino acid and is formed in various organs. Additionally, these amino acids are involved in producing glutamate, aspartate, glutamine, and glycine in part of their metabolic pathways. Aspartate and glutamate, along with glutamine, are the main source of energy for enterocytes (intestinal epithelial cells).
The results showed that glutamate was increased in the patient group compared with the control group, indicating increased immune system activity and an impaired cellular redox state. Methionine is converted into homocysteine (used as a source of sulfur) in the course of its metabolism, and cysteine is produced after homocysteine binds to serine to form the intermediate cystathionine. Some studies examined homocysteine and thiols in vitiligo patients and found an increase in homocysteine, attributed to a lack of folate and other cofactors essential for the activity of methionine synthetase and, consequently, to a decrease in its activity and in the methionine regeneration cycle, which led to an increase in cysteine. Tyrosine is converted by tyrosinase into dopaquinone, a highly reactive intermediate that is important for regulating melanogenesis. As cysteine increases, dopaquinone reacts rapidly with it and is directed into the production of pheomelanin, a common melanin pigment of hair and skin whose color changes from yellow to red as its concentration increases. When the cysteine level does not decrease, the reaction does not lead to the production of eumelanin, the pigment whose color changes from light brown to black as its concentration increases [1]. Increased thiol levels therefore impair the production of melanin. In addition, the dynamic thiol/disulfide homeostasis regulates the storage of antioxidants, detoxification, apoptosis, and many signaling mechanisms, including cell division and growth. The results indicated an increase in cysteine and in the ratio of cysteine to ornithine, and a decrease in the ratios of glycine, arginine, ornithine, and lysine to cysteine, in the patient group.
Thus, impaired cysteine metabolism disrupts pigment production, increases the activity of the immune system, and weakens the defense against oxidative stress because of deficient production of antioxidant compounds such as taurine and glutathione, which results in melanocyte damage and loss of pigment. Lysine, which was reduced in the group of vitiligo patients, has multiple catabolic pathways, the main one of which is in the liver, where saccharopine, glutamate, alpha-aminoadipate 6-semialdehyde, and acetyl-CoA are produced [2]. In the human body, carnitine, which is involved in fatty acid metabolism, is biosynthesized from the amino acids lysine and methionine. Carnitine and its esters help reduce oxidative stress [3]. In addition, dietary or extracellular lysine can modulate the entry of arginine into leukocytes and the production of NO by iNOS because it shares transport systems with arginine. Histidine is converted into urocanic acid (UCA) through one of its metabolic pathways by the enzymatic action of histidine ammonia-lyase. UCA is a unique photoreceptor: trans-UCA is converted into cis-UCA by absorbing ultraviolet (UV) radiation from the sun, which modulates the activity of the immune system in response to UV radiation. An increase or decrease of the histidine level from the normal state disrupts the function of the skin immune system [4, 5].
Decreased histidine in the patients triggers the activity of the immune system in response to existing stimuli, making their skin cells more vulnerable to UV radiation than in the normal state. 3-Methylhistidine is formed by the posttranslational methylation of histidine residues of the major myofibrillar proteins (actin and myosin). In humans, it is associated with a variety of diseases, including type 2 diabetes, eosinophilic esophagitis, and kidney disease. In addition, 3-methylhistidine is associated with the metabolic disorder propionic acidemia. Measuring 3-methylhistidine provides an indicator of the rate at which muscle protein breaks down. It is also a biological marker for meat intake, muscle protein breakdown, and intestinal proteins. The clinical features of vitiligo are classified in different ways, one of which is based on the extent of the spots on the body surface. When the patients were divided into two groups, with limited extent of spots (less than 25%) and large extent of spots (greater than 25%), glutamate decreased with increasing spot extent. Given the role of glutamate in regulating protein synthesis and breakdown in the cell and the cell cycle, its lower level in these patients can indicate impaired cell metabolism and increased cell death, resulting in increased complications in people with more severe disease.
Given these cases, the diseases that the meta-analysis associates with the impaired pathways are better understood. They include: ornithine transcarbamylase (OTC) deficiency (an inherited disorder in which ammonia accumulates in the blood because of a deficiency in the transcarbamylase); hyperornithinemia with gyrate atrophy (HOGA) (an inherited disorder characterized by progressive vision loss, in which the production of ornithine aminotransferase is disrupted; this enzyme helps convert ornithine into another molecule, P5C, which can in turn be converted into the amino acids glutamate and proline); delta-pyrroline-5-carboxylate synthase deficiency (difficulty in degrading proline to P5C); continuous ambulatory peritoneal dialysis (difficulty in excreting all urea and ammonia and, therefore, the need for dialysis); hyperprolinemia type II (problems with proline degradation increase proline and P5C); and short bowel syndrome (SBS) under an arginine-free diet (the small intestine is required for arginine synthesis). Limited access to essential amino acids in patients with SBS therefore leads to a defect in the intermediates of the urea cycle (ornithine, citrulline, and arginine) and to a reduction in these amino acids, which may lead to hyperammonemia. A further suggested disease is argininosuccinic aciduria (ASL), a urea cycle disorder that causes ammonia to accumulate in the blood. The other suggested disorders are all inherited diseases that cause complications and metabolic disorders. Based on the data reviewed here and on the results of previous studies in this area, reduced melanin production due to increased cysteine in the patients, together with autoimmunity and oxidative stress (increased glutamic acid and proline and decreased arginine, glycine, lysine, histidine, and ornithine in patients), can simultaneously damage melanocytes, resulting in vitiliginous lesions on the skin surface of patients.
Thus, examining the proposed biomarkers may be helpful in the early diagnosis of at-risk patients; in addition, changes in glutamic acid levels may be useful as a biomarker for determining the prognosis of the disease. Understanding the role of these biomarkers in vitiligo can also provide the scientific basis for the development of novel therapeutic approaches to this disease.
Conclusion
Acknowledgments
Amorini, A.M., Lazzarino, G., Di Pietro, V., Signoretti, S., Lazzarino, G., Belli, A., Tavazzi, B., (2017). Severity of experimental traumatic brain injury modulates changes in concentrations of cerebral free amino acids. Journal of cellular and molecular medicine 21(3), 530-542. Armstrong, A., (2011). Advances in Malignant Melanoma: Clinical and Research Perspectives. BoD–Books on Demand. Ding, X., Du, J., Zhang, J., (2014). The Epidemiology and Treatment of Vitiligo: A Chinese Perspective. Pigmentary Disorders 1(148), 2376-0427.1000148. Douglas, C.A., (2003). Amino acid analysis in wines by liquid chromatography: UV and fluorescence detection without sample enrichment. Stellenbosch: Stellenbosch University.
Fekkes, D., (2012). Automated analysis of primary amino acids in plasma by high-performance liquid chromatography, Amino Acid Analysis. Springer, pp. 183-200. Gianfaldoni, S., Tchernev, G., Lotti, J., Wollina, U., Satolli, F., Rovesti, M., França, K., Lotti, T., (2018). Unconventional Treatments for Vitiligo: Are They (Un) Satisfactory? Open access Macedonian journal of medical sciences 6(1), 170. Grimes, P., Miller, M., (2018). Vitiligo: Patient stories, self-esteem, and the psychological burden of disease. International journal of women's dermatology 4(1), 32-37. Hamidizadeh, N., Ranjbar, S., Ghanizadeh, A., Parvizi, M.M., Jafari, P., Handjani, F., (2020). Evaluating prevalence of depression, anxiety and hopelessness in patients with Vitiligo on an Iranian population. Health and Quality of Life Outcomes 18(1), 20. Jakku, R., Thappatla, V., Kola, T., Kadarla, R.K., (2019). VITILIGO-An Overview. Asian Journal of Pharmaceutical Research and Development 7(5), 124-132. Liang, L., Li, Y., Tian, X., Zhou, J., Zhong, L., (2019). Comprehensive lipidomic, metabolomic and proteomic profiling reveals the role of immune system in vitiligo. Clinical and experimental dermatology 44(7), e216-e223. Sahoo, A., Lee, B., Boniface, K., Seneschal, J., Sahoo, S.K., Seki, T., Wang, C., Das, S., Han, X., Steppie, M., (2017). MicroRNA-211 regulates oxidative phosphorylation and energy metabolism in human vitiligo. Journal of Investigative Dermatology 137(9), 1965-1974. Singh, R.K., Lee, K.M., Vujkovic-Cvijin, I., Ucmak, D., Farahnik, B., Abrouk, M., Nakamura, M., Zhu, T.H., Bhutani, T., Wei, M., (2016). The role of IL-17 in vitiligo: A review. Autoimmunity reviews 15(4), 397-404. Speeckaert, R., Speeckaert, M., De Schepper, S., van Geel, N., (2017). Biomarkers of disease activity in vitiligo: A systematic review. Autoimmunity reviews 16(9), 937-945. Wu, J.-L., Yu, S.-Y., Wu, S.-H., Bao, A.-M., (2016). 
A sensitive and practical RP-HPLC-FLD for determination of the low neuroactive amino acid levels in body fluids and its application in depression. Neuroscience letters 616, 32-37. Zheleva, A., Nikolova, G., Karamalakova, Y., Hristakieva, E., Lavcheva, R., Gadjeva, V., (2018). Comparative study on some oxidative stress parameters in blood of vitiligo patients before and after combined therapy. Regulatory Toxicology and Pharmacology 94, 234-239.
10_1101-2021_01_04_425218 ---- Extracellular endosulfatase Sulf-2 harbours a chondroitin/dermatan sulfate chain that modulates its enzyme activity
Short title: Extracellular endosulfatase Sulf-2 is a new proteoglycan
Authors and Affiliation
Rana El Masri1$, Amal Seffouh1$, Caroline Roelants2, Ilham Seffouh3, Evelyne Gout1, Julien Pérard4, Fabien Dalonneau1, Kazuchika Nishitsuji5, Fredrik Noborn6, Mahnaz Nikpour6, Göran Larson6, Yoann Crétinon1, Kenji Uchimura7, Régis Daniel3, Hugues Lortat-Jacob1, Odile Filhol2 and Romain R. Vivès*1
From 1Univ.
Grenoble Alpes, CNRS, CEA, IBS, Grenoble, France, 2Inovarion, 75005 Paris, France, 3Université Paris-Saclay, Univ Evry, CNRS, LAMBE, 91025, Evry-Courcouronnes, France, 4Univ-Grenoble alpes, CNRS, IRIG - DIESE - CBM, CEA-Grenoble, 38000 Grenoble, France, 5Department of Biochemistry, Wakayama Medical University, Wakayama, 641-8509 Japan, 6Department of Laboratory Medicine, University of Gothenburg, Sahlgrenska University Hospital, Gothenburg, Sweden, 7Univ. Lille, CNRS, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, F-59000 Lille, France, 8Université Grenoble Alpes, Inserm, CEA, IRIG-Biology of Cancer and Infection, UMR_S 1036, F-38000 Grenoble.
$The authors contributed equally to this work
*Correspondence should be addressed to: Romain R. Vivès, IBS, 71 Avenue des Martyrs CS 10090, 38044 GRENOBLE CEDEX 9, France. Phone: (+33) 4.57.42.85.08; Fax: (+33) 4.76.50.19.90; Email: romain.vives@ibs.fr, and for in vivo studies to Odile Filhol, CEA-Grenoble, 17 Avenue des Martyrs, 38054 GRENOBLE CEDEX 9, France. Phone: (+33) 4.38.78.56.45; Fax: (+33) 4.38.78.50.58; Email: odile.filhol-cochet@cea.fr.
bioRxiv preprint doi: https://doi.org/10.1101/2021.01.04.425218; this version posted January 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license (http://creativecommons.org/licenses/by-nd/4.0/).
Key words: sulfatase, enzyme, glycosaminoglycan, proteoglycan, post-translational modifications
Author contributions
RElM and AS performed most of the biochemical experiments, with additional contributions from EG, FD and YC. CR and RElM performed in vivo experiments and data processing under the supervision of OF. JP performed SAXS analysis and modelling. KN and KU performed biochemical analysis of HSulf-2 endogenous expression.
All the glycoproteomics LC-MS/MS analyses were prepared, performed and interpreted by FN, MN and GL in collaboration with the BIOMS proteomics core facility at the University of Gothenburg. RD and IS performed MS analysis. RV, OF and HLJ interpreted the data and supervised experimental work. RV, RElM, KU and OF wrote the manuscript with the help of all co-authors.
Abstract
Sulfs represent a class of unconventional sulfatases, which differ from all other members of the sulfatase family by their structures, catalytic features and biological functions. Through their specific endosulfatase activity in the extracellular milieu, Sulfs provide an original post-synthetic regulatory mechanism for heparan sulfate complex polysaccharides and have been implicated in multiple physiopathological processes, including cancer. However, Sulfs remain poorly characterized enzymes, with major discrepancies regarding their in vivo functions. Here we show that human Sulf-2 (HSulf-2) features a unique polysaccharide post-translational modification. We identified a chondroitin/dermatan sulfate glycosaminoglycan (GAG) chain, attached to the enzyme substrate binding domain. We found that this GAG chain affects enzyme/substrate recognition and tunes HSulf-2 activity in vitro, and in vivo in a mouse model of tumorigenesis and metastasis. In addition, we showed that mammalian hyaluronidase acted as a promoter of HSulf-2 activity by digesting its GAG chain.
In conclusion, our results highlight HSulf-2 as a unique proteoglycan enzyme and its newly identified GAG chain as a critical non-catalytic modulator of the enzyme activity. These findings help clarify the conflicting data on the activities of the Sulfs and introduce a new paradigm into the study of these enzymes.
Introduction
Eukaryotic sulfatases have historically been defined as intracellular exoenzymes participating in the metabolism of a large array of sulfated substrates such as steroids, glycolipids, and glycosaminoglycans (GAGs), through hydrolysis of sulfate ester bonds under acidic conditions (Hanson et al., 2004). However, the field took a dramatic turn two decades ago, with the discovery of the Sulfs (Dhoot et al., 2001; Morimoto-Tomita et al., 2002). Unlike all other sulfatases, Sulfs were shown to be extracellular endosulfatases that catalyzed the specific 6-O-desulfation of cell-surface and extracellular matrix heparan sulfate (HS), a polysaccharide with vast protein binding properties and biological functions (El Masri et al., 2017; Li and Kusche-Gullberg, 2016; Sarrazin et al., 2011). And unlike all other sulfatases, Sulfs could not be related to a straightforward metabolic function, but rapidly emerged as a novel major regulatory mechanism of HS biological activities, with roles in many physiopathological processes, including embryonic development, tissue homeostasis and cancer (Bret et al., 2011; Rosen and Lemjabbar-Alaoui, 2010; Vives et al., 2014). Sulfs share a common molecular organization (Figure S1).
The furin-processed mature form features a general sulfatase-conserved N-terminal catalytic domain (CAT) including the enzyme active site (and notably, the catalytic N-formylglycine (FGly) converted cysteine residue), and a unique highly basic hydrophilic domain (HD), which shares no homology with any other known protein and is responsible for high affinity binding to HS substrates (Ai et al., 2006; Frese et al., 2009; Seffouh et al., 2013, 2019a; Tang and Rosen, 2009). Sulfs display a number of post-translational modifications (PTMs) (Morimoto-Tomita et al., 2002). Furin cleavage (Tang and Rosen, 2009) and N-glycosylations (Ambasta et al., 2007; Seffouh et al., 2019b) may be dispensable for the enzyme activity, but play a role in the enzyme attachment to the cell surface, while conversion of C88 into a FGly residue is a hallmark of all sulfatases and is essential for the catalytic activity (Dierks et al., 2005). Recent studies reported that human Sulfs (HSulfs) catalyzed the 6-O-desulfation of HS through an original, processive and orientated mechanism (Seffouh et al., 2013), and that substrate recognition by the enzyme HD domain involved multiple, highly dynamic, non-conventional interactions (Harder et al., 2015; Walhorn et al., 2018). However, despite increasing interest, Sulfs remain highly elusive enzymes. Little is known about their molecular structures, catalytic mechanisms and substrate specificities. Our limited understanding of these enzymes is well illustrated by the wealth of conflicting data in the literature, reporting major discrepancies between in vitro and in vivo data, according to the biological system or the enzyme isoforms considered. This is particularly clear in cancer, where both anti-oncogenic and pro-oncogenic effects of the Sulfs have been reported (Rosen and Lemjabbar-Alaoui, 2010; Vives et al., 2014; Yang et al., 2011).
Results
HSulf-2 is an enzyme modified with a CS/DS chain
Recently, we achieved for the first time high yield expression and purification of HSulf-2, which paved the way to progress in the biochemical characterization of this enzyme (Seffouh et al., 2019a). Surprisingly, the size-exclusion chromatography purification step revealed an unexpectedly high apparent molecular weight (aMW) for the enzyme (> ~1000 kDa, for a theoretical molecular weight of 98,170 Da, Figure 1A), although possible protein aggregation or high order oligomerization were ruled out by quality control negative staining electron microscopy (Seffouh et al., 2019a). Notably, we also failed to detect the C-terminal chain containing the enzyme HD domain using PAGE analysis (Figure 1D, lane 1), even though the presence of both chains was ascertained by Edman N-terminal sequencing (Seffouh et al., 2019a). In line with this, we previously reported unusually weak mass spectrometry ionization efficiency of the HSulf-2 C-terminal chain (Seffouh et al., 2019b). Small angle X-ray scattering (SAXS) analysis of the protein yielded Guinier plots and pair-distribution function in accordance with a Dmax of 40 ± 3 nm, suggesting an elongated molecular shape with an aMW of ~700 kDa, which supported our size-exclusion chromatography data (Figure S2A-E). Furthermore, results suggested the presence of two distinct domains within HSulf-2: a globular domain and an extended, flexible, probably partially unfolded region.
Interestingly, similar analysis performed on a HD-devoid HSulf-2 variant (HSulf-2ΔHD) showed only the globular domain (Figure S2F-K), whose 11 nm size fitted that of a modelled structure of the CAT domain (Figure S2K). However, it seemed unlikely that the HD domain on its own could account for the second, large flexible region. We thus initially speculated that the enzyme could have been purified in complex with HS substrate polysaccharide chains. To test this, HSulf-2 was treated with heparinases (to digest potentially bound HS substrate) or with chondroitinase ABC (to digest non-substrate GAGs of CS/DS types) prior to size-exclusion chromatography. Results showed no effect of the heparinase treatment (Figure S3B), while digestion with chondroitinase ABC dramatically delayed HSulf-2 size-exclusion chromatography elution time, thus indicating the presence of CS/DS associated with the enzyme (Figure 1B). Attempts to dissociate the HSulf-2-CS/DS complex with high NaCl concentrations were unsuccessful (Figure S3C), thereby suggesting covalent linkage between the polysaccharide and the protein. In addition, chondroitinase ABC treatment allowed for detection of a broad additional band of ~50 kDa apparent molecular weight (Figure 1D). This band was assigned to the enzyme C-terminal subunit, as confirmed by Western blot (WB) analysis (Figure 1D). Of note, the HSulf-2ΔHD variant (lacking the HD but not the enzyme C-terminal region) did not exhibit such a band on PAGE (Seffouh et al., 2019a).
We therefore concluded from these results that a CS/DS chain was covalently attached to the HSulf-2 HD domain. The presence of such a chain could account for the high aMW determined by SAXS and size-exclusion chromatography, and could also hinder migration/detection of the C-terminal subunit in PAGE/WB. GAGs are covalently bound to specific glycoproteins termed proteoglycans (PGs), through a specific attachment site involving the serine residue of a SG dipeptide, primed by a xylose residue (Esko and Zhang, 1996). Xylosides are widely used inhibitors of GAG assembly on such motifs (Chua and Kuberan, 2017). Accordingly, size-exclusion chromatography analysis of HSulf-2 expressed in xyloside-treated HEK 293 cells showed a dramatic reduction of the high aMW form and a concomitant increase of a form eluting like the chondroitinase ABC-treated HSulf-2 (Figure S3D), further supporting the presence of a covalently attached GAG chain. Examination of the HSulf-2 amino-acid sequence showed two SG dipeptides: S508G and S583G. We thus expressed and produced a HSulf-2 variant lacking these two motifs (HSulf-2ΔSG). The HSulf-2ΔSG variant eluted at the same time as the chondroitinase ABC-treated wild type (WT) HSulf-2 (Figure 1C), with restored detection of the C-terminal chain by Coomassie blue-stained PAGE and WB analysis (Figure 1D). Both SG dipeptides are located within the enzyme HD domain, but on each side of the furin cleavage site (Figure S1). As our PAGE/WB data located the CS/DS chain on the C-terminal subunit, we speculated that the S583G motif downstream of the furin cleavage site was the actual GAG attachment site on HSulf-2. To assess this, we performed single mutations of the first and second sites. Size-exclusion analysis of the resulting variants validated the presence of a CS/DS-type GAG chain on the S583G, but not S508G, dipeptide motif (Figure S3E and S3F).
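The sequence inspection described above can be illustrated with a short sketch. It only shows how candidate SG dipeptides can be located and labeled in the S508G/S583G style used in the text; the function name `find_sg_sites` and the toy sequence are hypothetical and do not correspond to the real HSulf-2 sequence. An SG dipeptide is a candidate site only: as the single-mutation data show, the S583G motif carries the chain whereas S508G does not.

```python
def find_sg_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based serine position, label) for every Ser-Gly dipeptide.

    Labels follow the notation of the text, e.g. "S583G" for a serine at
    position 583 immediately followed by a glycine.
    """
    seq = sequence.upper()
    sites = []
    for i in range(len(seq) - 1):
        if seq[i] == "S" and seq[i + 1] == "G":
            position = i + 1  # convert 0-based index to 1-based residue number
            sites.append((position, f"S{position}G"))
    return sites


# Hypothetical toy sequence, NOT the HSulf-2 sequence.
toy_sequence = "MKSGAALSPGSGK"

for position, label in find_sg_sites(toy_sequence):
    print(position, label)
```

Running the same function on the mature HSulf-2 sequence would be expected to list the two motifs named in the text, provided the residue numbering matches the one used by the authors.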
Finally, we also confirmed that the presence of N- and C-terminal tags did not bias the results, as tobacco etch virus (TEV) protease digestion did not affect size-exclusion chromatography elution times of HSulf-2 WT, chondroitinase ABC-treated HSulf-2 WT or HSulf-2ΔSG (Figure S4). To characterize the HSulf-2 GAG chain further, we analyzed both HSulf-2 WT and HSulf-2ΔSG variants by mass spectrometry. MALDI-TOF MS analysis of HSulf-2ΔSG showed a major peak at m/z 53,885 that we assigned to the doubly charged ion [M+2H]2+ of the whole HSulf-2 variant. Interestingly, the corresponding singly and triply charged ions at m/z 108,250 and 37,180 were also observed. Based on this distribution of multiply charged ions, an average experimental mass value of 108,388 ± 506 g.mol-1 was determined for the whole HSulf-2ΔSG. HSulf-2ΔSG thus exhibited a ~25,000 Da lower average mass value compared to the HSulf-2 WT average molecular weight previously determined by MALDI MS (133,115 Da) (Seffouh et al., 2019b). This mass difference likely corresponds to the mass contribution of the GAG chain (Figure S5). Covalent linkage of a ~25 kDa sulfated GAG polysaccharide to HSulf-2 would result in a huge increase of the hydrodynamic volume, which is consistent with the high aMW observed in size-exclusion chromatography (Figure 1A) and in SAXS analysis (Figure S2). Altogether, these data provide converging evidence that HSulf-2 features a unique PTM at the level of its HD domain, corresponding to a covalently-linked CS/DS polysaccharide chain.
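The arithmetic behind the ~25 kDa estimate can be checked directly. The sketch below uses only the two average masses quoted in the text; the variable names are illustrative, and no uncertainty is propagated because an error estimate is given for HSulf-2ΔSG only.

```python
# Average masses quoted in the text (Da).
WT_MASS_DA = 133_115   # HSulf-2 WT, MALDI MS (Seffouh et al., 2019b)
DSG_MASS_DA = 108_388  # HSulf-2ΔSG, average over the observed charge states

# Mass attributed to the GAG chain: difference between WT and GAG-free variant.
gag_mass_da = WT_MASS_DA - DSG_MASS_DA

# Share of the WT mass carried by the GAG chain.
gag_fraction = gag_mass_da / WT_MASS_DA

print(f"GAG chain contribution: {gag_mass_da} Da")
print(f"Fraction of WT mass: {gag_fraction:.1%}")
```

The difference is 24,727 Da, which the text rounds to ~25,000 Da.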
The presence of this covalently linked chain thus identifies HSulf-2 as a new member of the large PG family.
Endogenous expression of GAG-modified HSulf-2
We identified a GAG chain on HSulf-2 when overexpressed in transfected HEK cells. To confirm the physiological relevance of these findings, we sought to demonstrate the presence of this GAG chain on the naturally occurring enzyme. To this end, we first used a strategy originally designed to identify new proteoglycans (Noborn et al., 2015). Nano-scale liquid chromatography MS/MS analysis of trypsin- and chondroitinase ABC-digested PGs isolated from the culture medium of human neuroblastoma SH-SY5Y cells led to the identification of a HSulf-2 specific, 21 amino acid long glycopeptide highlighting a CS/DS attachment site on the S583 residue of HSulf-2 (Figure S6). To get further insights into GAG modification of endogenously expressed HSulf-2, we analyzed HSulf-2 expressed by two additional cell types: MCF7 human breast cancer cells and human umbilical vein endothelial cells (HUVECs). Detection and characterization of endogenous Sulfs were challenging. Expression yields are usually low, and WB immunodetection yields different band patterns, depending on cells, PTMs and furin cleavages (see Figure S1). To address these issues, we took advantage of the high N-glycosylation content of the Sulfs and enriched the culture medium by lectin affinity chromatography. We analyzed enriched conditioned medium by WB, using antibodies raised against either HSulf-2 N-terminal (H2.3) (Uchimura et al., 2006) or C-terminal (2B4) (Lemjabbar-Alaoui et al., 2010) subunits (Figure S1). WB analysis of HSulf-2 secreted in the conditioned medium of MCF7 using 2B4 yielded two broad diffuse bands, in the ~66-130 and ~185-270 kDa size ranges (Figure 2A, lane 1). We attributed these bands to CS/DS-conjugated C-terminal subunit fragments and to a CS/DS-conjugated full-length, furin-uncleaved HSulf-2 form, respectively.
The presence of the CS/DS chain was confirmed by chondroitinase ABC treatment, which converted the broad bands into two sets of well-defined bands, corresponding to GAG-depolymerized C-terminal fragments and full-length unprocessed forms (Figure 2A, lane 2). These changes could be unambiguously attributed to the cleavage of the GAG chains, as the treatment with heat-inactivated chondroitinase ABC showed the same band pattern as the non-treated samples (Figure 2A, lane 3). Analysis of HSulf-2 from HUVEC pre-purified conditioned medium yielded similar results (Figure 2B), but with notable differences. First, detected signals were markedly less intense. Although quantification by immunodetection should be considered cautiously, this suggested that expression levels of HSulf-2 were different between MCF7 and HUVEC cells. We also noticed discrepancies regarding furin processing activity (distinct ratios between processed and unprocessed forms). WB analysis using the N-terminal reactive H2.3 antibody confirmed the presence of the N-terminal subunit being unaffected by the chondroitinase ABC treatment (~75 kDa size), and supported the identification of full-length unprocessed forms within the analyzed samples (~160-250 kDa size range, Figure 2C and 2D). Finally, although WB analysis showed similar band patterns for these two cell lines (Figure 2A and 2B), GAG-conjugated fragments from HUVEC cells migrated at slightly lower aMW (~60-75 and ~160-250 kDa, Figure 2B, lane 1).
In addition, we also detected bands corresponding to GAG-lacking fragments, for HSulf-2 from HUVEC at least (C-ter and unprocessed enzyme, see Figure 2C and 2D). Altogether, these results confirm that endogenously expressed HSulf-2 harbors a CS/DS chain and indicate the co-existence of GAG-modified and GAG-free forms. Furthermore, our data suggest cell-dependent specificities of HSulf-2 PTM (e.g. furin processing, and the GAG structure), which could provide additional regulation/diversity of the enzyme structural and functional features.
HSulf-2 GAG chain modulates enzyme activity in vitro
The HD is a major functional domain of the Sulfs. This domain is required for the enzyme high affinity binding to HS substrates and for processive 6-O-endosulfatase activity (Frese et al., 2009; Seffouh et al., 2013; Tang and Rosen, 2009). We thus anticipated that the presence of a GAG chain on this domain would significantly affect the enzyme substrate recognition and activity. To study this, we assessed HSulf-2 WT and HSulf-2ΔSG 6-O-endosulfatase activities, using heparin as a surrogate of HS. We analyzed the disaccharide composition of HSulf-2 treated heparin and measured the content of [UA(2S)-GlcNS(6S)] trisulfated disaccharide, which is the enzyme’s primary substrate (Frese et al., 2009; Pempe et al., 2012; Seffouh et al., 2013). Results showed enhanced digestion of the disaccharide substrate with HSulf-2ΔSG versus HSulf-2 WT, and a concomitant increase in the [UA(2S)-GlcNS] disaccharide product (Figure 3A). We speculated that the observed increase in endosulfatase activity could be due to an improved enzyme-substrate interaction. We thus analyzed the binding of HSulf-2 WT and HSulf-2ΔSG to surface coated biotinylated heparin. Results showed a modest increase in HSulf-2ΔSG binding to heparin, with calculated KDs of 10.2±0.4 nM and 7.1±0.8 nM for HSulf-2 WT and HSulf-2ΔSG, respectively (Figure 3B). The second functional domain of the Sulfs, CAT, comprises the enzyme active site. CAT alone is unable to catalyze HS 6-O-desulfation, but it exhibits a generic arylsulfatase activity that can be measured using the fluorogenic pseudosubstrate 4-methyl umbelliferyl sulfate (4-MUS). Surprisingly, HSulf-2ΔSG showed greater (~2.5 fold increase) arylsulfatase activity than HSulf-2 WT (Figure 3C). We thus concluded from these observations that the newly identified CS/DS chain of HSulf-2 regulates the enzyme activity, both by modulating HD domain/substrate interaction and by hindering access to the active site. We hypothesize that these effects could be due to electrostatic hindrance preventing the interaction of the enzyme functional domains with sulfated substrates. Aside from enzyme activity, the interaction of HS with the Sulf HD domain is also involved in the retention of the enzyme at the cell surface, a mechanism that may also govern diffusion and bioavailability of the enzyme within tissues (Frese et al., 2009). To investigate this, we analyzed the interaction of HSulf-2 WT and HSulf-2ΔSG with cellular HS by FACS, using human amniotic epithelial Wish cells as a model. Again, results showed a significant increase in binding of the HSulf-2 form lacking the CS/DS chain to Wish cells (Figure 3D). These data therefore suggest that the HSulf-2 GAG chain may also influence enzyme retention at the cell surface.
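To put the two heparin dissociation constants reported above in perspective, the following sketch converts them into equilibrium occupancies. It assumes a simple 1:1 binding model, which is an assumption made here for illustration; the text does not state which model was used to derive the KDs, and the function name `fraction_bound` and the chosen enzyme concentrations are hypothetical.

```python
def fraction_bound(enzyme_nm: float, kd_nm: float) -> float:
    """Equilibrium fraction of heparin sites occupied, assuming 1:1 binding.

    theta = [E] / (KD + [E]), with the enzyme in excess over immobilized sites.
    """
    if enzyme_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return enzyme_nm / (kd_nm + enzyme_nm)


# Dissociation constants quoted in the text (nM).
KD_WT_NM = 10.2   # HSulf-2 WT
KD_DSG_NM = 7.1   # HSulf-2ΔSG

# Hypothetical enzyme concentrations chosen to bracket both KDs.
for conc_nm in (1.0, 10.0, 100.0):
    wt = fraction_bound(conc_nm, KD_WT_NM)
    dsg = fraction_bound(conc_nm, KD_DSG_NM)
    print(f"{conc_nm:6.1f} nM enzyme: WT {wt:.2f}, dSG {dsg:.2f}")
```

Under this model the two forms differ most near their KDs and converge at saturating concentrations, which is consistent with the text describing the gain in heparin binding as modest.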
As the GAG-lacking HSulf-2ΔSG variant exhibited enhanced HS 6-O-endosulfatase activity, we sought to investigate whether enzymatic removal of the HSulf-2 GAG chain would lead to a similar effect. Hyaluronidases are the only mammalian enzymes to exhibit chondroitinase activity (Csoka et al., 2001; Bilong M. et al., manuscript in revision). We found that treatment of HSulf-2 WT with hyaluronidase allowed WB detection of the ~50 kDa band corresponding to the enzyme C-terminal subunit (Figure 3E), and boosted heparin 6-O-desulfation (Figure 3F), with an efficiency similar to that of the HSulf-2ΔSG variant.
HSulf-2 GAG chain modulates tumor growth and metastasis in vivo
We next investigated the effect of the HSulf-2 GAG chain on tumor progression in vivo, using a mouse xenograft model of tumorigenesis and metastasis. For this, we overexpressed by lentiviral transduction either HSulf-2 WT or HSulf-2ΔSG in MDA-MB-231, a human breast cancer cell line that does not express any HSulfs endogenously (Peterson et al., 2010). After selection, stable expression of Sulfs in transduced cells was confirmed by WB. In contrast, cells transduced with an unrelated protein (DsRed) showed no endogenous expression of the Sulfs (Figure S7A). We also validated the endosulfatase activity of HSulf-2 produced in MDA-MB-231, by treating heparin with concentrated conditioned medium from transduced cells.
Results showed no activity for the medium of DsRed-transduced cells, while conditioned medium from either HSulf-2 WT or HSulf-2ΔSG transduced cells efficiently digested heparin, as shown by the increase of [UA(2S)-GlcNS] disaccharide product (Figure S7B). Again, results suggested higher endosulfatase activity for HSulf-2ΔSG transduced cells. Finally, we confirmed the presence of the CS/DS chain on MDA-MB-231 HSulf-2 WT, by treating conditioned medium with chondroitinase ABC, followed by WB analysis (Figure S7C). Of note, results also showed a significant proportion of GAG-free and full-length, unprocessed forms of HSulf-2 in the chondroitinase ABC-untreated conditioned medium (Figure S7C, lane 1). DsRed, HSulf-2 WT or HSulf-2ΔSG transduced MDA-MB-231 cells were then xenografted into the mammary gland of mice with severe combined immunodeficiency (SCID). Tumor volumes were monitored every 2 days and xenografted SCID mice were euthanized when the first tumors reached 1 cm³ in size (Day 52), in accordance with the European ethical rule on animal experimentation. Primary tumors, along with lymph nodes and lungs, were collected for further analysis. Results showed little effect of HSulf-2 WT expression on tumor size (Figure 4A). Our data are therefore in disagreement with previous work, which reported either anti-oncogenic (Peterson et al., 2010) or pro-oncogenic (Zhu et al., 2016) effects of HSulf-2 WT expression in MDA-MB-231 cells using similar in vivo mouse models. However, it should be noted that a major difference between these three studies is the size of xenograft tumors achieved (~0.04 cm³ and 3-4 cm³, respectively, in the studies mentioned above). Such conflicting data clearly exemplify the complexity of HSulf regulatory functions and the possible bias that could result from the experimental design. In contrast, expression of the HSulf-2ΔSG variant significantly promoted tumor growth, in comparison to both DsRed and HSulf-2 WT conditions.
Notably, WB analysis of tumors confirmed sustained expression of the enzyme in both HSulf-2 WT and HSulf-2ΔSG (but not DsRed) conditions (Supp. Figure S7D). Histological analysis of tumor sections using eosin/hematoxylin staining showed greater necrotic area in control tumors than in HSulf-expressing tumors (Figure 4B). As necrosis is a hallmark of hypoxia in growing tumors that is mainly due to lack of angiogenesis, we studied tumor vascularization using α Smooth Muscle Actin (αSMA) immunostaining. Results showed no apparent changes in αSMA labelling upon HSulf-2 WT expression. However, tumor vascularization was increased in HSulf-2ΔSG tumors (Figure S8A and S8B). We next analyzed lymph nodes and lungs for secondary tumors. Lung, which is a primary target for metastasis in this tumor model, was affected in all conditions (Figure 4C). However, the size of metastasis-induced secondary tumors was significantly greater in HSulf-2ΔSG expressing tumors (Figure 4D and 4E). Moreover, tumor metastasis could be observed with higher frequency in lateral (Left Axillary LN) but also contra-lateral (Right Axillary LN) lymph nodes for HSulf-2 expressing tumors (Figure S8C). The CS/DS chain borne by HSulf-2 is thus functionally relevant in vivo, at least in the context of cancer, where it attenuated the effect of the enzyme on tumor growth and metastatic invasion.
In contrast, forms of HSulf-2 lacking the CS/DS chain stimulate the metastatic properties of cancer cells, thus highlighting the importance of HSulf-2 GAG modification status when considering the enzyme as a potential therapeutic target for treating human cancer.

Discussion

In this study, we have shown the presence of a covalently linked CS/DS chain on the extracellular sulfatase HSulf-2, and demonstrated its functional relevance. Although CS/DS chains have been previously identified on the mucin-like domain of ADAMTS (Mead et al., 2018), the identification of HSulf-2 as a new secreted PG is unprecedented. It is well established that GAG chains provide most of PG’s biological activities, usually through the ability of the polysaccharide to bind and modulate a wide array of structural and signaling proteins. However, we show here that the GAG chain present on HSulf-2 directly modulates its enzyme activity. These findings open new and unexpected perspectives in the understanding of the enzyme's biological functions, and should help clarify discrepancies in the literature. Here, we first demonstrated that the CS/DS chain modulates HSulf-2 6-O-endosulfatase activity in vitro, most likely by competing with sulfated substrates for HS binding site occupancy, and/or through electrostatic hindrance. In support of this, we located this GAG chain on the HD domain of HSulf-2, which is critical for substrate binding. However, in an in vivo biological context, we anticipate that the HSulf-2 GAG chain could also modulate the enzyme function through other mechanisms. GAGs bind a wide array of cell-surface and extracellular matrix proteins. The HSulf-2 CS/DS chain could therefore promote the recruitment of GAG-binding proteins, with potentially significant functional consequences. These interactions may involve HSulf-2 in the regulation of matricrine signaling activities, or influence the diffusion and distribution of the enzyme within tissues.
In line with this, our FACS-based cell-binding assay suggested enhanced attachment to the cell surface of the HSulf-2ΔSG variant vs the HSulf-2 WT form. Consequently, in vivo HSulf-2 “GAGosylation” status may not only influence the extent of HS 6-O-desulfation, but also the range of the enzyme activity and access to specific HS subsets in tissues. We thus next analyzed the effect of the HSulf-2 GAG chain in an in vivo mouse xenograft model of cancer. Our data showed that overexpression of the HSulf-2ΔSG variant with enhanced activity in vitro significantly promoted tumor growth, vascularization and metastasis in vivo. The development of metastasis is a multistep process, which is a major factor of poor prognosis in cancer. Although the biological mechanisms that drive metastasis are relatively unknown, the role of cancer cell-derived matrisome proteins as prometastatic has been recently highlighted (Tian et al., 2020). Based on our data, we speculate that HSulf-2 may also participate in the extracellular matrix remodeling process, and could provide an additional target to act on metastasis development. Furthermore, HSulf-2 “GAGosylation” status may serve as a new metastasis-promoting marker. Beyond the field of cancer, this concept of “GAGosylation” status should prove to be critical for studying the biological functions of the Sulfs, as it may confer on the enzyme a tremendous level of functional and structural heterogeneity.
First, it is well known that the structure and binding properties of GAGs vary according to the biological context. We therefore anticipate further regulation of HSulf-2 catalytic activity and/or diffusion properties, depending on structural features of its CS/DS chain. In addition, our data highlighted differences in HSulf-2 furin processing amongst analyzed cell types. This could be functionally relevant, as furin maturation may affect HSulf-2 cell surface/extracellular localization as well as in vivo activity (Tang and Rosen, 2009). As GAGs have been previously shown to promote furin activity (Pasquato et al., 2007), we could thus hypothesize that the presence of the newly identified HSulf-2 CS/DS chain in the vicinity of the two major furin cleavage sites may also influence HSulf-2 maturation status. Here, we used a mutagenesis-generated GAG-lacking HSulf-2 variant in our functional assays. However, our data suggest the co-existence of both GAG-conjugated and GAG-free HSulf-2, as PAGE analysis of GAG-conjugated HSulf-2 fragments from MCF-7 cells and HUVECs yielded distinctly different band patterns (Figure 2). The balance of expression between these two forms may therefore be critical for the control of the HS 6-O-desulfation process. The underlying mechanisms are likely to be complex and multifactorial. Interestingly, we showed that hyaluronidases could efficiently digest the HSulf-2 GAG chain and enhance its endosulfatase activity (Figure 3E and F). Mammalian hyaluronidases are a family of 6 enzymes that catalyze the degradation of hyaluronic acid (HA) and also exhibit the ability to depolymerize CS (Csoka et al., 2001; Jedrzejas and Stern, 2005; Kaneiwa et al., 2010). Hyaluronidase expression is increased in some cancers (McAtee et al., 2014), with suggested roles in tumor invasion and tumor-associated inflammation (Dominguez Gutierrez et al., 2020; McAtee et al., 2014). However, their precise contribution remains poorly understood, and reports are contradictory.
Here, we propose a new function for these enzymes, which may provide an activating mechanism of HSulf-2, through their ability to “unleash” the enzyme from its GAG chain. This perspective calls for a detailed investigation of the interplay of HSulf-2 and hyaluronidases during tumor progression as well as in other physiopathological conditions. In line with this, we have analyzed in detail the molecular features of the HSulf-2 GAG chain and the effects of hyaluronidase on its structure and activity (Seffouh et al., manuscript in preparation). Last but not least, analysis of the other human isoform HSulf-1 showed an absence of any GAG chain, at least in our HEK293 overexpression system (Figure S3G). “GAGosylation” status could thus account for the functional differences reported between these two secreted endosulfatases. In conclusion, we report here a most unexpected PTM of HSulf-2, by identifying the presence of a CS/DS chain on the enzyme. Our data highlight this GAG chain as a novel non-catalytic regulatory element of HSulf-2 activity, and pave the way to new directions in the study of this highly intriguing enzyme and complex regulatory mechanism of HS activity. Finally, it is worth noting that such a structurally and functionally relevant feature as a GAG chain on HSulf-2 has remained overlooked for more than 20 years. Beyond the field of the Sulfs, our findings therefore strongly encourage reconsidering afresh the importance of PTMs in complex enzymatic systems.
Material and Methods

Antibodies against HSulf-2

The epitopes of antibodies against HSulf-2 are summarized in Fig. S1. Polyclonal antibody H19 was newly produced by Biotem (Apprieu, France), by immunizing rabbits with a mix of 2 peptides derived from the HSulf-2 sequence (C506DSGDYKLSLAGRRKKLF and T563KRHWPGAPEDQDDKDG), located within the HD domain, on each side of the furin cleavage site (see Fig. S1). Consequently, H19 is specific for the HD domain and recognizes both HSulf-2 N- and C-terminal subunits. The 2B4 monoclonal antibody, which is specific for the HSulf-2 C-terminal subunit, was purchased from R&D Systems (Mab7087). Of note, analysis of whole lysates prepared from cultured cells or tissues with 2B4 yields a sharp ~130 kDa band. This band presumably corresponds to a form in synthesis, such as non furin-processed/GAG-unmodified HSulf-2. Meanwhile, analysis of conditioned medium with 2B4 shows multiple bands corresponding to HSulf-2 unmodified or Furin/GAG-modified C-terminal subunit. The H2.3 polyclonal antibody, which is specific to the HSulf-2 N-terminal subunit, was previously described (Uchimura et al., 2006).

Production and purification of recombinant WT and mutant HSulf-2

The expression and purification of HSulf-2 and mutants were performed as described previously (Seffouh et al., 2019a). Briefly, FreeStyle HEK293 cells (Thermo Fisher Scientific) were transfected with pcDNA3.1 vector encoding HSulf-2 cDNA flanked by TEV-cleavable SNAP (20.5 kDa) and His tags at the N- and C-terminus, respectively.
The protein was purified from conditioned medium by cation exchange chromatography on an SP Sepharose column (GE Healthcare) in 50 mM Tris, 5 mM MgCl2, 5 mM CaCl2, pH 7.5, using a 0.1-1 M NaCl gradient, followed by size exclusion chromatography (Superdex200, GE Healthcare) in 50 mM Tris, 300 mM NaCl, 5 mM MgCl2, 5 mM CaCl2, pH 7. Treatment of HSulf-2 with chondroitinase ABC was achieved by incubating 250 µg of enzyme with 100 mU chondroitinase ABC (Sigma) overnight at 4°C. HSulf-2ΔSG mutants (ΔSG, ΔSG1, ΔSG2) were generated by site-directed mutagenesis (ISBG Robiomol platform) and purified as above.

Analysis of HSulf-2 expression

MDA-MB-231 cells were lysed with RIPA buffer for 2 h at 4°C and tissues were disrupted and lysed in RIPA buffer (Sigma-Aldrich) using a MagNA Lyser instrument (Roche) with ceramic beads. Supernatants were collected and protein concentration was determined using a BCA protein Assay kit (Thermo Scientific). Cell lysates (eq. of 3 × 10⁵ cells), tumor lysates (eq. of 50 µg of total proteins) or purified recombinant proteins were then separated by SDS-PAGE, followed by transfer onto PVDF membrane. Proteins were probed using either rabbit polyclonal H19 (dil. 1/1000) or mouse monoclonal 2B4 (dil. 1/500) antibodies, followed by incubation with HRP-conjugated anti-rabbit (Thermo Scientific, dil. 1/5000) or anti-mouse (Thermo Scientific, dil. 1/5000) secondary antibodies. Endogenous CS/DS modification of HSulf-2 was analyzed in two cell lines: MCF-7 human breast cancer cells and human umbilical vein endothelial cells (HUVECs). MCF-7 cells were cultured at 37 °C for 48 h in OPTI-MEM, after which culture medium was collected and concentrated on Amicon Ultra Filters (30 kDa cut-off, Millipore, Burlington, MA). Conditioned medium from MDA-MB-231 cells was prepared likewise, using FreeStyle medium instead of OPTI-MEM. HSulf-2 in concentrated samples was analyzed by Western blotting as described below.
HUVECs were cultured at 37 °C in OPTI-MEM containing 0.5 % FBS for 24 h, after which culture medium was collected and concentrated on Amicon Ultra Filters. Concentrated samples were incubated with GlcNAc-binding wheat germ agglutinin (WGA)-coated beads (Vector Laboratories, Burlingame, CA) at 4 °C overnight, and proteins that were captured by WGA beads were analyzed by Western blotting. For elimination of CS/DS chains, the concentrated MCF-7 culture media or WGA bead-bound materials were treated with chondroitinase ABC (1 U/mL) or heat-inactivated chondroitinase ABC (1 U/mL), at 37 °C for 1 h. Proteins in the samples were separated by SDS-PAGE with 5–20% gels (Wako Pure Chemical Industries, Osaka, Japan) and were transferred to PVDF membranes. HSulf-2 proteins were probed with the 2B4 mouse monoclonal anti-HSulf-2 antibody (dil. 1/500) or the H2.3 rabbit polyclonal anti-HSulf-2 antibody (dil. 1/500) followed by a horseradish peroxidase-labeled anti-mouse or rabbit IgG antibody (Cell Signaling Technology, Danvers, MA) and ImmunoStar LD (Wako Pure Chemical Industries). Signals were visualized by using a LuminoGraph image analyzer (ATTO, Tokyo, Japan).

SAXS analysis

SAXS data were collected at the European Synchrotron Radiation Facility (Grenoble, France) on the BM29 beamline at BioSAXS. The standard energy was set to 12.5 keV and a Pilatus 1M detector was used to record the scattering patterns. The sample-to-detector distance was set to 2.867 m (q-range 0.025–6 nm⁻¹).
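As a rough illustration of the SAXS quantities used in this section, the sketch below converts the 12.5 keV beam energy to a wavelength, computes the momentum transfer q for a given scattering angle, and fits the Guinier law ln I(q) = ln I(0) − Rg²q²/3 by ordinary least squares. It assumes the common convention q = 4π sin(θ)/λ, with 2θ the scattering angle. The function names are hypothetical and are not part of ATSAS or PRIMUS, which were the programs actually used for the analysis.

```python
import math

HC_KEV_NM = 1.239841984  # Planck constant times speed of light, in keV*nm


def wavelength_nm(energy_kev):
    """X-ray wavelength (nm) for a photon energy given in keV."""
    return HC_KEV_NM / energy_kev


def momentum_transfer(two_theta_rad, wavelength):
    """q = 4*pi*sin(theta)/lambda, in inverse units of `wavelength` (assumed convention)."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength


def guinier_fit(q, intensity):
    """Fit ln I = ln I(0) - (Rg^2 / 3) * q^2 by least squares; returns (Rg, I0).

    The caller is responsible for restricting the data to the low-q
    Guinier region (for example q * Rg < 1.5).
    """
    x = [qi ** 2 for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Guinier slope must be negative to define Rg")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)
```

With synthetic, noise-free intensities the fit returns the Rg and I(0) used to generate them; real data would additionally need buffer subtraction and a check of the fitted q·Rg range.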
Samples were set in a quartz glass capillary with an automated sample changer. The scattering curve of the buffer solution (measured before and after the sample) was subtracted from the sample’s SAXS curves. Scattering profiles were measured at several concentrations, from 0.5 to 1.5 mg/mL at room temperature. Data were processed using standard procedures with the ATSAS v2.8.3 suite of programs (Petoukhov et al., 2012). The ab initio determination of the molecular shape of the proteins was performed as previously described (Pérard et al., 2018). Radius of gyration (Rg) and forward intensity at zero angle (I(0)) were determined with the program PRIMUS (Konarev et al., 2003), by using the Guinier approximation at low Q values, in the range Q·Rg < 1.5:

$$\ln I(Q) = \ln I(0) - \frac{R_g^{2} Q^{2}}{3}$$

Porod volumes and Kratky plots were determined using the Guinier approximation and the PRIMUS program. The pairwise distance distribution function P(r) was calculated by indirect Fourier transform with the program GNOM (Svergun, 1992). The maximum dimension (Dmax) value was adjusted so that the Rg value obtained from GNOM agreed with that obtained from Guinier analysis. In order to build ab initio models, several independent DAMMIF (Franke and Svergun, 2009) models were calculated in slow mode with the pseudo chain option and merged using the program DAMAVER (Konarev et al., 2003).

MALDI-TOF MS analysis of HSulf-2ΔSG

MS experiments were carried out on a MALDI Autoflex speed TOF/TOF MS instrument (Bruker Daltonics, Germany), equipped with a SmartBeam II™ laser pulsed at 1 kHz.
The spectra were recorded in the positive linear mode (delay: 600 ns; ion source 1 (IS1) voltage: 19.0 kV; ion source 2 (IS2) voltage: 16.6 kV; lens voltage: 9.5 kV). MALDI data acquisition was carried out in the mass range 5,000–150,000 Da, and 10,000 shots were summed for each spectrum. Mass spectra were processed using FlexAnalysis software (version 3.3.80.0, Bruker Daltonics). The instrument was calibrated using mono- and multi-charged ions of BSA (BSA Calibration Standard Kit, AB SCIEX, France). HSulf-2ΔSG was desalted as previously described (Seffouh et al., 2019b). MALDI-TOF MS analysis was achieved by mixing 1.5 μL of sinapinic acid matrix at 20 mg/mL in acetonitrile/water (50/50; v/v), 0.1% TFA, with 1.5 μL of the desalted protein solution (0.57 mg/mL).

LC-MS/MS identification of HSulf-2 GAG chain and its attachment site

The glycoproteomics protocol used for characterizing proteoglycans has been published earlier (Noborn et al., 2015) and most recently summarized in detail for analyses of CS proteoglycans of human cerebrospinal fluid (Noborn et al., Methods in Molecular Biology, in press). In the present work, the starting material was conditioned cell media, without fetal calf serum, obtained from SH-SY5Y cells kindly provided by Drs. Thomas Daugbjerg-Madsen and Katrine Schjoldager, University of Copenhagen, Denmark.

In vitro enzyme activity assays

Detailed protocols for arylsulfatase and endosulfatase assays have been described elsewhere (Seffouh et al., 2019a). For the arylsulfatase assay, the enzyme (2 µg) was incubated with 10 mM 4-MUS (Sigma) in 50 mM Tris, 10 mM MgCl2 pH 7.5 for 1-4 h at 37°C, and the reaction was followed by fluorescence monitoring (excitation 360 nm, emission 465 nm). Results are expressed as a fold increase in fluorescence compared to the negative control (4-MUS alone) and correspond to means +/- SD of three independent experiments.
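The fold-increase readout of the arylsulfatase assay can be sketched as follows. This is only an illustration: the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) across the three experiments is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def fold_increase(sample_fluorescence, control_fluorescence):
    """Fluorescence of the enzyme reaction relative to the 4-MUS-alone control."""
    if control_fluorescence <= 0:
        raise ValueError("control fluorescence must be positive")
    return sample_fluorescence / control_fluorescence


def summarize(folds):
    """Mean and sample SD (assumed estimator) over independent experiments."""
    return mean(folds), stdev(folds)
```

For example, three hypothetical experiments giving fold increases of 2.0, 2.5 and 3.0 would be reported as a mean of 2.5 with an SD of 0.5.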
The endosulfatase assay was achieved by incubating 25 µg of heparin with 3 µg of enzyme in 50 mM Tris, 2.5 mM MgCl2 pH 7.5, for 4 h at 37°C. Disaccharide composition of Sulf-treated heparin was then determined by exhaustive digestion of the polysaccharide (48 hours at 37°C) with a cocktail of heparinase I, II and III (10 mU each), followed by RPIP-HPLC analysis (Henriet et al., 2017), using NaCl gradients calibrated with authentic standards (Iduron). HSulf-2 (2 µg) digestion with hyaluronidase (Sigma/Aldrich, 2 µg) was achieved after a 2 h incubation in 50 mM Tris pH 7.5 at 37°C, prior to the endosulfatase assay, which was performed as above. Incubation of heparin with hyaluronidase alone showed no effect on the disaccharide analysis (data not shown).

Analysis of HSulf-2/heparin binding immuno-assay

As reported before (Seffouh et al., 2019a), microtiter plates were first coated with 10 µg/ml streptavidin in TBS buffer, then incubated with 10 µg/ml biotinylated heparin, and saturated with 2% BSA. These incubations were carried out for 1 h at RT, in 50 mM Tris-Cl, 150 mM NaCl, pH 7.5 (TBS) buffer. Next, the recombinant protein was added, then probed with H2.3 primary rabbit polyclonal anti-HSulf-2 antibody (dil. 1/1000) followed by fluorescent-conjugated secondary anti-rabbit antibody (Jackson ImmunoResearch Laboratories, dil. 1/500). These incubations were performed for 2 h at 4 °C in TBS, 0.05% Tween, and were separated by extensive washes with TBS, 0.05% Tween.
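For the binding data produced by this immunoassay, KD values are obtained by Scatchard analysis. A minimal sketch of that transformation is given below, assuming a single class of binding sites and assuming that the free protein concentration can be approximated by the concentration added to the well; the function name and the example values are hypothetical.

```python
def scatchard_fit(free, bound):
    """Linear fit of bound/free versus bound.

    For one-site binding, bound/free = (Bmax - bound) / KD, so the slope
    of the line is -1/KD and the x-intercept is Bmax.
    Returns (KD, Bmax) in the units of `free` and `bound`.
    """
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Scatchard slope must be negative for saturable binding")
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = -intercept / slope  # x-intercept of the fitted line
    return kd, bmax
```

Because the Scatchard transform distorts measurement error, a direct nonlinear fit of the binding isotherm is often preferred in practice; the linear form is shown here only because it is the method named in the text.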
Finally, fluorescence of each well was measured (excitation 485 nm, emission 535 nm). KDs were determined by Scatchard analysis of the binding data. Results shown are representative of three independent experiments.

FACS analysis

Wish cells (1 million for each condition) were washed with PBS, 1% BSA (the same buffer was used throughout the experiment), and incubated with 5 µg of HSulf-2 enzymes (2 h at 4°C). After extensive washing, cells were incubated with H2.3 primary antibody (dil. 1/500, 1 h at 4°C), washed again, then incubated with secondary Alexa 488-conjugated antibody (Jackson ImmunoResearch Laboratories, dil. 1/500, 1 h at 4°C). FACS analysis of cell fluorescence was performed on a MACSQuant Analyzer (Miltenyi Biotec, excitation 485 nm, emission 535 nm) by calculating the median over 25,000 events. Data are represented as means +/- SD of three independent experiments.

Lentiviral transduction of MDA-MB-231 cells

HSulf-2 (WT and variant) encoding cDNAs were cloned into the pLVX lentiviral vector (Clontech). This vector was then used in combination with viral vectors GAG POL (psPAX2) and ENV VSV-G (pCMV) to transfect HEK293T cells and produce lentiviruses released into the extracellular medium. The pLVX-Ds-Red N1 (Clontech) was used as negative control. MDA-MB-231 cells were purchased from ATCC and were cultured in Leibovitz’s medium (Life Technologies) supplemented with 10% fetal bovine serum, 100 U/ml of penicillin, 100 µg/ml of streptomycin (Life Technologies).
For infection, MDA-MB-231 cells were plated into 6-well plates (8 × 10⁵ cells in 2 mL of serum-supplemented Leibovitz’s medium). The day after, adherent cells were incubated with 1 mL of lentiviral medium diluted in 1 mL of serum-supplemented medium containing 8 μg/μL of polybrene (Sigma/Aldrich). After 4 h, 1 mL of medium was added to cultures and transduction was maintained for 16 h before washing the cells and changing the medium. For stable transduction, puromycin selection was started 36 h post-infection (at a concentration of 2 μg/mL, Life Technologies) and was maintained thereafter.

In vivo experiments

In vivo experiment protocols were approved in accordance with the institutional guidelines and those of the European Community for the Use of Experimental Animals. Seven-week-old female NOD SCID GAMMA/J mice were purchased from Charles River and maintained in the Animal Resources Centre of our department. 1 × 10⁶ MDA-MB-231 cells resuspended in 50 % Matrigel™ (Becton Dickinson) in Leibovitz medium (Life Technologies) were injected into the fat pad of the #4 left mammary gland. Tumor growth was recorded by sequential determination of tumor volume using a caliper. Tumor volume was calculated according to the formula V = ab²/2 (a, length; b, width). Mice were euthanized after 52 days through cervical dislocation. Tumors and axillary lymph nodes were collected, weighed and either fixed for 2 h in 4 % paraformaldehyde (PFA) and embedded in paraffin, or stored at -80°C for WB analysis. Tissue necrosis was assessed by hematoxylin/eosin staining and ImageJ quantification. For vascularization analysis, sections (5 μm thick) of formalin-fixed, paraffin-embedded tumor tissue samples were dewaxed, rehydrated and subjected to antigen retrieval in citrate buffer (Antigen Unmasking Solution, Vector Laboratories) with heat. Slides were incubated for 10 min in hydrogen peroxide (H2O2) to block endogenous peroxidases and then 30 min in saturation solution (Histostain, Invitrogen) to block non-specific antibody binding.
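The caliper-based volume estimate used in these in vivo experiments, V = ab²/2, can be sketched as below. The function names are hypothetical, measurements are assumed to be in centimeters so that the result is in cm³, and the 1 cm³ threshold reflects the endpoint stated in the Results rather than a general rule.

```python
def tumor_volume(length, width):
    """Estimated tumor volume V = a * b^2 / 2 (a, length; b, width)."""
    if length < 0 or width < 0:
        raise ValueError("caliper measurements must be non-negative")
    return length * width ** 2 / 2.0


def reached_endpoint(volume_cm3, limit_cm3=1.0):
    """True once a tumor reaches the size limit (1 cm^3 in this study)."""
    return volume_cm3 >= limit_cm3
```

For instance, a tumor measuring 1.2 cm by 1.0 cm gives an estimated volume of 0.6 cm³, which is below the endpoint.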
This was followed by overnight incubation, at 4°C, with primary antibody against αSMA (Ab124964, Abcam, dil. 1/500). After washing, sections were incubated with a suitable biotinylated secondary antibody (Histostain kit, Invitrogen) for 10 min. Antigen-antibody complexes were visualized by applying a streptavidin-biotin complex (Histostain, Invitrogen) for 10 min followed by NovaRED substrate (Vector Laboratories). Sections were counterstained with hematoxylin to visualize nuclei. Control sections were incubated with secondary antibody alone. Lungs were inflated using 4% PFA and embedded in paraffin. The metastatic burden was assessed by serial sectioning of the entire lungs, every 200 µm. Hematoxylin and eosin staining was performed on lung and lymph node sections (5 µm thick). Images were acquired using an AxioScan Z1 (Zeiss) slide scanner and quantified using Fiji software.

Statistical analysis

Experimental data are shown as mean ± standard error of the mean (SEM) unless specified otherwise. Comparisons between multiple groups were carried out by a repeated-measures two-way analysis of variance (ANOVA) with Tukey’s multiple comparisons test to evaluate the significance of differential tumor growth between three groups of mice; an ordinary two-way ANOVA with Bonferroni’s test and an ordinary one-way ANOVA were carried out to evaluate in vitro activity and binding of HSulf-2, the differential level of necrosis and vascularization inside tumors, and pulmonary metastases (number and area).
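To make the summary statistics above concrete, the sketch below computes a mean ± SEM and the F statistic of an ordinary one-way ANOVA in plain Python. It is an illustration only: the function names and any example values are hypothetical, the two-way and repeated-measures designs and the post hoc tests are not covered, and no p-value is computed (the analyses in this study were run in dedicated statistics software).

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def one_way_anova_f(groups):
    """F statistic of an ordinary one-way ANOVA.

    Returns (F, df_between, df_within).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    ss_within = sum(
        (v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The F statistic would then be compared against the F distribution with the returned degrees of freedom to obtain a probability value.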
Prism 6 (GraphPad Software, Inc., CA) was used for analyses. A probability value of less than 0.05 was considered to be significant. * P < 0.05, ** P < 0.01, *** P < 0.001 and **** P < 0.0001.

Acknowledgments

The authors would like to thank the animal unit staff (Jeannin I., Bama S., Magallon C., Chaumontel N. and Pointu H.) at the Interdisciplinary Research Institute of Grenoble (IRIG) for animal husbandry. This work used the platforms of the Grenoble Instruct-ERIC center (ISBG; UMS 3518 CNRS-CEA-UJF-EMBL) within the Grenoble Partnership for Structural Biology (PSB). Platform access was supported by FRISBI (ANR-10-INBS-05-02) and GRAL, a project of the University Grenoble Alpes graduate school (Ecoles Universitaires de Recherche) CBH-EUR-GS (ANR-17-EURE-0003). This work was also supported by the CNRS and the GDR GAG (GDR 3739), the “Investissements d’avenir” program Glyco@Alps (ANR-15-IDEX-02), by grants from the Agence Nationale de la Recherche (ANR-12-BSV8-0023 and ANR-17-CE11-0040) and Université Grenoble-Alpes (UGA AGIR program), the Swedish Research Council (2017-00955 to GL and to the Swedish National Infrastructure for Biological Mass Spectrometry (BIOMS)), and the Inga-Britt and Arne Lundbergs Forskningsstiftelse. IBS acknowledges integration into the Interdisciplinary Research Institute of Grenoble (IRIG, CEA).

References

Ai, X., Do, A.T., Kusche-Gullberg, M., Lindahl, U., Lu, K., and Emerson, C.P., Jr. (2006). Substrate specificity and domain functions of extracellular heparan sulfate 6-O-endosulfatases, QSulf1 and QSulf2. J Biol Chem 281, 4969–4976.
Ambasta, R.K., Ai, X., and Emerson, C.P., Jr. (2007). Quail Sulf1 function requires asparagine-linked glycosylation. J Biol Chem 282, 34492–34499.
Bret, C., Moreaux, J., Schved, J.F., Hose, D., and Klein, B. (2011). SULFs in human neoplasia: implication as progression and prognosis factors. J Transl Med 9, 72.
Chua, J.S., and Kuberan, B. (2017). Synthetic Xylosides: Probing the Glycosaminoglycan Biosynthetic Machinery for Biomedical Applications. Acc. Chem. Res. 50, 2693–2705.
Csoka, A.B., Frost, G.I., and Stern, R. (2001). The six hyaluronidase-like genes in the human and mouse genomes. Matrix Biol. 20, 499–508.
Dhoot, G.K., Gustafsson, M.K., Ai, X., Sun, W., Standiford, D.M., and Emerson, C.P., Jr. (2001). Regulation of Wnt signaling and embryo patterning by an extracellular sulfatase. Science 293, 1663–1666.
Dierks, T., Dickmanns, A., Preusser-Kunze, A., Schmidt, B., Mariappan, M., von Figura, K., Ficner, R., and Rudolph, M.G. (2005). Molecular basis for multiple sulfatase deficiency and mechanism for formylglycine generation of the human formylglycine-generating enzyme. Cell 121, 541–552.
Dominguez Gutierrez, P.R., Kwenda, E.P., Donelan, W., O’Malley, P., Crispen, P.L., and Kusmartsev, S. (2020). Hyal2 expression in tumor-associated myeloid cells mediates cancer-related inflammation in bladder cancer. Cancer Res.
El Masri, R., Seffouh, A., Lortat-Jacob, H., and Vivès, R.R. (2017). The “in and out” of glucosamine 6-O-sulfation: the 6th sense of heparan sulfate. Glycoconj. J. 34, 285–298.
Esko, J.D., and Zhang, L. (1996). Influence of core protein sequence on glycosaminoglycan assembly. Curr. Opin. Struct. Biol. 6, 663–670.
Franke, D., and Svergun, D.I. (2009). DAMMIF, a program for rapid ab-initio shape determination in small-angle scattering. J. Appl. Crystallogr. 42, 342–346.
Frese, M.A., Milz, F., Dick, M., Lamanna, W.C., and Dierks, T. (2009). Characterization of the human sulfatase Sulf1 and its high affinity heparin/heparan sulfate interaction domain. J Biol Chem 284, 28033–28044.
Hanson, S.R., Best, M.D., and Wong, C.H. (2004). Sulfatases: structure, mechanism, biological activity, inhibition, and synthetic utility. Angew Chem Int Ed Engl 43, 5736–5763.
Harder, A., Möller, A.-K., Milz, F., Neuhaus, P., Walhorn, V., Dierks, T., and Anselmetti, D. (2015). Catch bond interaction between cell-surface sulfatase Sulf1 and glycosaminoglycans. Biophys. J. 108, 1709–1717.
Henriet, E., Jäger, S., Tran, C., Bastien, P., Michelet, J.-F., Minondo, A.-M., Formanek, F., Dalko-Csiba, M., Lortat-Jacob, H., Breton, L., et al. (2017). A jasmonic acid derivative improves skin healing and induces changes in proteoglycan expression and glycosaminoglycan structure. Biochim. Biophys. Acta 1861, 2250–2260.
Jedrzejas, M.J., and Stern, R. (2005). Structures of vertebrate hyaluronidases and their unique enzymatic mechanism of hydrolysis. Proteins Struct. Funct. Bioinforma. 61, 227–238.
Kaneiwa, T., Mizumoto, S., Sugahara, K., and Yamada, S. (2010). Identification of human hyaluronidase-4 as a novel chondroitin sulfate hydrolase that preferentially cleaves the galactosaminidic linkage in the trisulfated tetrasaccharide sequence. Glycobiology 20, 300–309.
Konarev, P.V., Volkov, V.V., Sokolova, A.V., Koch, M.H.J., and Svergun, D.I. (2003). PRIMUS: a Windows PC-based system for small-angle scattering data analysis. J. Appl. Crystallogr. 36, 1277–1282.
Lemjabbar-Alaoui, H., van Zante, A., Singer, M.S., Xue, Q., Wang, Y.Q., Tsay, D., He, B., Jablons, D.M., and Rosen, S.D. (2010). Sulf-2, a heparan sulfate endosulfatase, promotes human lung carcinogenesis. Oncogene 29, 635–646.
Li, J.-P., and Kusche-Gullberg, M. (2016). Heparan Sulfate: Biosynthesis, Structure, and Function. Int. Rev. Cell Mol. Biol. 325, 215–273.
McAtee, C.O., Barycki, J.J., and Simpson, M.A. (2014). Emerging roles for hyaluronidase in cancer metastasis and therapy. Adv. Cancer Res. 123, 1–34.
Mead, T.J., McCulloch, D.R., Ho, J.C., Du, Y., Adams, S.M., Birk, D.E., and Apte, S.S. (2018). The metalloproteinase-proteoglycans ADAMTS7 and ADAMTS12 provide an innate, tendon-specific protective mechanism against heterotopic ossification. JCI Insight 3.
Morimoto-Tomita, M., Uchimura, K., Werb, Z., Hemmerich, S., and Rosen, S.D. (2002). Cloning and characterization of two extracellular heparin-degrading endosulfatases in mice and humans. J Biol Chem 277, 49175–49185.
Noborn, F., Gomez Toledo, A., Sihlbom, C., Lengqvist, J., Fries, E., Kjellen, L., Nilsson, J., and Larson, G. (2015). Identification of chondroitin sulfate linkage region glycopeptides reveals prohormones as a novel class of proteoglycans. Mol Cell Proteomics 14, 41–49.
Pasquato, A., Dettin, M., Basak, A., Gambaretto, R., Tonin, L., Seidah, N.G., and Di Bello, C. (2007). Heparin enhances the furin cleavage of HIV-1 gp160 peptides. FEBS Lett 581, 5807–5813.
Pempe, E.H., Burch, T.C., Law, C.J., and Liu, J. (2012). Substrate specificity of 6-O-endosulfatase (Sulf-2) and its implications in synthesizing anticoagulant heparan sulfate. Glycobiology 22, 1353–1362.
Pérard, J., Nader, S., Levert, M., Arnaud, L., Carpentier, P., Siebert, C., Blanquet, F., Cavazza, C., Renesto, P., Schneider, D., et al. (2018). Structural and functional studies of the metalloregulator Fur identify a promoter-binding mechanism and its role in Francisella tularensis virulence. Commun. Biol. 1, 93.
Peterson, S.M., Iskenderian, A., Cook, L., Romashko, A., Tobin, K., Jones, M., Norton, A., Gomez-Yafal, A., Heartlein, M.W., Concino, M.F., et al. (2010). Human Sulfatase 2 inhibits in vivo tumor growth of MDA-MB-231 human breast cancer xenografts. BMC Cancer 10, 427.
Petoukhov, M.V., Franke, D., Shkumatov, A.V., Tria, G., Kikhney, A.G., Gajda, M., Gorba, C., Mertens, H.D.T., Konarev, P.V., and Svergun, D.I. (2012). New developments in the ATSAS program package for small-angle scattering data analysis. J. Appl. Crystallogr. 45, 342–350.
Rosen, S.D., and Lemjabbar-Alaoui, H. (2010). Sulf-2: an extracellular modulator of cell signaling and a cancer target candidate. Expert Opin Ther Targets 14, 935–949.
Sarrazin, S., Lamanna, W.C., and Esko, J.D. (2011). Heparan sulfate proteoglycans. Cold Spring Harb Perspect Biol 3.
Seffouh, A., Milz, F., Przybylski, C., Laguri, C., Oosterhof, A., Bourcier, S., Sadir, R., Dutkowski, E., Daniel, R., van Kuppevelt, T.H., et al. (2013). HSulf sulfatases catalyze processive and oriented 6-O-desulfation of heparan sulfate that differentially regulates fibroblast growth factor activity. Faseb J 27, 2431–2439.
Seffouh, A., El Masri, R., Makshakova, O., Gout, E., Hassoun, Z.E.O., Andrieu, J.-P., Lortat-Jacob, H., and Vivès, R.R. (2019a). Expression and purification of recombinant extracellular sulfatase HSulf-2 allows deciphering of enzyme sub-domain coordinated role for the binding and 6-O-desulfation of heparan sulfate. Cell. Mol. Life Sci. CMLS 76, 1807–1819.
Seffouh, I., Przybylski, C., Seffouh, A., El Masri, R., Vivès, R.R., Gonnet, F., and Daniel, R. (2019b). Mass spectrometry analysis of the human endosulfatase Hsulf-2. Biochem. Biophys. Rep. 18, 100617.
Svergun, D.I. (1992). Determination of the regularization parameter in indirect-transform methods using perceptual criteria. J. Appl. Crystallogr. 25, 495–503.
Tang, R., and Rosen, S.D. (2009). Functional consequences of the subdomain organization of the sulfs. J Biol Chem 284, 21505–21514.
Tian, C., Öhlund, D., Rickelt, S., Lidström, T., Huang, Y., Hao, L., Zhao, R.T., Franklin, O., Bhatia, S.N., Tuveson, D.A., et al. (2020). Cancer Cell-Derived Matrisome Proteins Promote Metastasis in Pancreatic Ductal Adenocarcinoma. Cancer Res. 80, 1461–1474.
Uchimura, K., Morimoto-Tomita, M., Bistrup, A., Li, J., Lyon, M., Gallagher, J., Werb, Z., and Rosen, S.D. (2006). HSulf-2, an extracellular endoglucosamine-6-sulfatase, selectively mobilizes heparin-bound growth factors and chemokines: effects on VEGF, FGF-1, and SDF-1. BMC Biochem 7, 2.
Vives, R.R., Seffouh, A., and Lortat-Jacob, H. (2014). Post-Synthetic Regulation of HS Structure: The Yin and Yang of the Sulfs in Cancer. Front Oncol 3, 331.
Walhorn, V., Möller, A.-K., Bartz, C., Dierks, T., and Anselmetti, D. (2018). Exploring the Sulfatase 1 Catch Bond Free Energy Landscape using Jarzynski’s Equality. Sci. Rep. 8, 16849.
Yang, J.D., Sun, Z., Hu, C., Lai, J., Dove, R., Nakamura, I., Lee, J.S., Thorgeirsson, S.S., Kang, K.J., Chu, I.S., et al. (2011). Sulfatase 1 and sulfatase 2 in hepatocellular carcinoma: associated signaling pathways, tumor phenotypes, and survival. Genes. Chromosomes Cancer 50, 122–135.
Zhu, C., He, L., Zhou, X., Nie, X., and Gu, Y. (2016). Sulfatase 2 promotes breast cancer progression through regulating some tumor-related factors. Oncol. Rep. 35, 1318–1328.

Figures

Fig. 1: Purification and characterization of HSulf-2 and HSulf-2ΔSG
Size exclusion chromatography profile of HSulf-2 WT (A), chondroitinase ABC pre-treated HSulf-2 (B) and HSulf-2ΔSG (C); grey bars indicate Sulf-containing fractions. (D) PAGE/Coomassie blue staining and Western blot analysis of HSulf-2 WT (WT, lanes 1), HSulf-2ΔSG (ΔSG, lanes 2) and chondroitinase ABC pre-treated HSulf-2 (CS, lane 3), using the anti-HD H19 antibody. Analysis shows a ~95 kDa band corresponding to the HSulf-2 N-terminal subunit in fusion with the SNAP-tag (SNAP-Nter) and multiple/broad ~50 kDa bands corresponding to the C-terminal subunit, which includes the HSulf-2 HD domain (Cter). Of note, a residual ~75 kDa band corresponding to the N-terminal subunit lacking its SNAP-tag could also be detected (Nter). In addition, Coomassie blue staining, but not WB, revealed the presence of a full-length, unprocessed GAG-free HSulf-2 form (SNAP-unprocessed).

Fig. 2: Endogenous expression of GAG-bearing HSulf-2 in MCF7 and HUVEC cells
Western blot analysis of pre-purified concentrated conditioned medium from MCF7 (A, C) and HUVEC (B, D) using anti-C-ter 2B4 (A, B) and anti-N-ter H2.3 antibodies (C, D), prior to (0, lanes 1) or after treatment with chondroitinase ABC (CS, lanes 2). Digestions with heat-inactivated chondroitinase ABC were used as controls (CS inac., lanes 3). The nature of the detected bands is indicated as follows: black symbols for CS/DS-conjugated fragments; white symbols for GAG-free fragments; triangles for the N-terminal subunit; squares for the C-terminal subunit. Of note, the analysis indicates the presence in both samples of unprocessed forms (triangle + square, sharp band at ~125 kDa) and, at least in the HUVEC conditioned medium, the presence of GAG-free HSulf-2 forms (bands corresponding to C-ter fragments within the 40-57 kDa MW range, and an unprocessed form at 125 kDa detected in the untreated samples, gel B lanes 1 and 3).

Fig. 3: Biological activities of HSulf-2 WT and GAG-free HSulf-2.
(A) HS 6-O-endosulfatase activity of HSulf-2 WT (black symbols) and HSulf-2ΔSG (white symbols) was assessed by monitoring the time course of digestion of [UA(2S)-GlcNS(6S)] trisulfated disaccharides (NS2S6S, square) into [UA(2S)-GlcNS] disulfated disaccharides (NS2S, circle). Data are expressed as a percentage of total disaccharide content. Ordinary two-way ANOVA with time of incubation and type of HSulf-2 as factors revealed significant effects on 6-O-endosulfatase activity (time: F(3, 16) = 272.7, P < 0.0001; HSulf-2 type: F(1, 16) = 140.5, P < 0.0001; interaction: F(3, 16) = 28.57, P < 0.0001) (left panel) and a concomitant increase in the digested product (time: F(3, 16) = 394, P < 0.0001; HSulf-2 type: F(1, 16) = 195, P < 0.0001; interaction: F(3, 16) = 39.94, P < 0.0001) (right panel). Post-hoc Bonferroni’s test showed a significant difference in HS 6-O-endosulfatase activity at 1 h of incubation and thereafter until 7 h in HSulf-2ΔSG (n=3) compared with HSulf-2 WT (n=3). Error bars indicate SD. (B) Binding immunoassay of HSulf-2 WT (black) and HSulf-2ΔSG (white) to a streptavidin-immobilized heparin surface. Data are representative of three independent experiments. (C) The aryl-sulfatase activity of HSulf-2 WT (n=3, black) and HSulf-2ΔSG (n=3, white) was measured using the 4MUS fluorogenic pseudo-substrate.
Results are expressed as a fluorescence fold increase compared to the negative control (4MUS alone, n=3, grey). (D) Binding of HSulf-2 WT (n=3, black) and HSulf-2ΔSG (n=3, white) to the surface of human amnion-derived Wish cells was monitored by FACS using the H2.3 anti-HSulf-2 antibody. Ordinary one-way ANOVA with type of HSulf-2 as a factor revealed significant effects on sulfatase activity (F(2, 6) = 90.18, P < 0.0001) (C) and cell-surface binding (F(2, 6) = 536.7, P < 0.0001) (D). Post-hoc Tukey’s range test showed a significant difference in the 4MUS activity and binding to human Wish cells in HSulf-2ΔSG compared with control (n=3, grey) or HSulf-2 WT. Error bars indicate SD. (E) Western blot analysis of HSulf-2 WT (lane 1) and hyaluronidase-treated HSulf-2 WT (lane 2), using the anti-HD H19 antibody. (F) [UA(2S)-GlcNS(6S)] trisulfated disaccharide (NS2S6S, black) and [UA(2S)-GlcNS] disulfated disaccharide (NS2S, white) content (as in (A), expressed as a percentage of total disaccharide content, n=3) of heparin, without (Hp) or after digestion with HSulf-2 WT or hyaluronidase (HYAL)-treated HSulf-2 WT (4 h at 37 °C). Data show significantly increased heparin 6-O-desulfation for HYAL-treated HSulf-2 WT. Error bars indicate SD (****P<0.0001).

Fig. 4: Effects of HSulf-2 WT and HSulf-2ΔSG during tumor progression and metastasis.
(A) Time course measurement of tumor size induced by MDA-MB-231 cells expressing DsRed, HSulf-2 WT or HSulf-2ΔSG. Statistical analysis was performed using a two-way ANOVA test, ***P≤0.001 and **P<0.01.
Pictures representative of each tumor group at day 52 are shown (A, right panel). (B) Histological analysis of necrotic area using hematoxylin/eosin staining of tumors expressing mock DsRed, HSulf-2 WT and HSulf-2ΔSG. The percentage of necrotic area was determined on three sections from each of the six mice in each group (one-way ANOVA test, multiple comparison, Tukey’s test, n=6, ***P≤0.001 and **P<0.01). (C) Histological analysis of the percentage of pulmonary metastatic area from DsRed, HSulf-2 and HSulf-2ΔSG expressing tumors. The measurement was performed on three sections from each of the six mice in each group (one-way ANOVA test, multiple comparison, Tukey’s test, n=18, **P<0.01 and ***P<0.001). (D) The size of pulmonary metastasis in each group was quantified and analyzed as in C (n=12 **P<0.05 and ***P<0.001). (E) Representative images of hematoxylin/eosin stained sections of the indicated lung.

Supplementary material

Fig. S1: Schematic representation of HSulf-2 molecular organization, PMTs and antibody epitopes.
The HSulf-2 870 amino-acid (a.a.) pro-protein comprises a signal peptide (SP, black box) and a polypeptide processed through furin cleavage (black arrows) into two subunits, N-terminal (N-term) and C-terminal (C-term). HSulf-2 comprises two major functional domains: a catalytic domain (CAT, in grey) and a highly basic hydrophilic region (HD, hatched in grey), and features a C-terminal region sharing homology with glucosamine-6-sulfatase (C, dotted). Potential N-glycosylation sites (N), the catalytic FGly residue (FGly, in red and bold), the SG dipeptides (blue, in bold for S583G) and antibody epitopes (black bars) are indicated.

Fig. S2: Study of HSulf-2 and HSulf-2 HD by Small Angle X-Ray Scattering
SAXS analysis of HSulf-2 (panels A-E) and HSulf-2 HD (panels F-K). (A) Scattering curves of experimental data of HSulf-2 in solution. (B) Linear dependence of ln[I(Q)] vs Q² determined by Guinier plot at 0.6 mg/ml, with an Rg of 13 nm, an I0 of 700 and a Porod volume of more than 1000. HSulf-2 gives MWexp: 700 kDa, MWmalls: 1000 kDa, MWth protein: 98.53 kDa. These data indicate the presence of an elongated molecule with a potential rod shape. (C) Pair distribution function p(r) in arbitrary units (arb. u.) vs. r (nm) determined by GNOM, with a Dmax of 40 nm +/- 3 nm, indicates that HSulf-2 is an elongated molecule in solution. (D) Globularity and flexibility analysis of HSulf-2. The Kratky plot (I(q)·q² vs. q) of HSulf-2 does not converge to the q axis, which indicates the presence of a mixture of multidomain protein with flexible linkers and unfolded regions (which could be attributed to the GAG). (E) Final ab initio model of HSulf-2 generated from individual DAMMIF models in slow mode. DAMAVER classification by NSD value indicates the presence of several clusters (NSD > 1.5) for HSulf-2, suggesting the presence of flexible regions. The proposed final model of HSulf-2 combines 14 of the 49 calculated models, those with the best NSD (between 1.5 and 1.7). (F) Scattering curves of experimental data of the HSulf-2 HD domain in solution. (G) Linear dependence of ln[I(Q)] vs Q² determined by Guinier plot at several concentrations between 0.5 and 3 mg/ml gives a linear region with an Rg of 3.9 nm and an I0 of 87, with a Porod volume of 144. MWexp: 87 kDa (MWmalls: 84 kDa, MWth protein: 64 kDa). These data indicate the presence of a potentially globular protein. (H) Pair distribution function p(r) in arbitrary units (arb. u.) vs. r (nm) determined by GNOM gives a Dmax of 11 nm with a relatively globular shape. (I) The Kratky plot of HSulf-2 HD presents a "bell-shape" peak at low q and converges to the q axis at high q, corresponding to a well-folded globular protein. (J) Final ab initio model of HSulf-2 HD generated from individual DAMMIF models in slow mode and merged with DAMAVER (NSD < 0.7). The HSulf-2 HD ab initio model gives a globular envelope with a small elongated part.
(K) Superimposition of the predicted structure of HSulf-2 HD, based on the PDB structure 4UPL (from Silicibacter pomeroyi; PHYRE analysis), onto the HSulf-2 HD SAXS envelope with the Supcomb20 program.

Fig. S3: Size-exclusion chromatography of HSulf-2 WT, HSulf-2 variants and HSulf-1
Size exclusion chromatography profile of HSulf-2 WT without (A), or following pre-treatment with heparinase I, II, III (B), 2 M NaCl (C), or xyloside (D). Size exclusion chromatography profile of HSulf-2 ΔS508G (E), ΔS583G (F), or HSulf-1 (G). Grey bars show Sulf-containing peaks.

Fig. S4: Size-exclusion chromatography of TEV-treated HSulf-2 WT and variant.
Size exclusion chromatography profile of TEV-treated HSulf-2 WT (A), chondroitinase ABC-treated HSulf-2 WT (B), or HSulf-2ΔSG (C).

Fig. S5: MALDI-TOF mass spectrometry analysis of HSulf-2ΔSG.
Mass spectrum of HSulf-2ΔSG in positive ionization mode (100 kDa-filtered HSulf-2ΔSG mixed with sinapinic acid matrix, linear mode). HSulf-2ΔSG is detected as the protonated species [M+H]+ and the corresponding doubly and triply charged [M+2H]2+ and [M+3H]3+ ions.

Fig. S6: LC-MS/MS detection of a CS/DS GAG linkage region attached to S583 of HSulf-2
HSulf-2 glycopeptides were obtained by trypsin digestion of media of cultured SH-SY5Y cells, followed by enrichment on a SAX column and subsequent treatment with chondroitinase ABC. The spectral files were filtered for the MS2 diagnostic ion at m/z 362.1083 corresponding to the delta-hexuronic acid-N-acetylgalactosamine disaccharide, common to all CS/DS linkage region glycopeptides. (A) MS2 spectrum of the 578-DGGDFSGTGGLPDYSAANPIK-598 glycopeptide obtained by HCD with a normalized collision energy of 20%, providing prominent glycan fragmentations.
(B) MS2 spectrum of the same glycopeptide obtained at a normalized collision energy of 35%, displaying peptide sequence fragmentation with b- and y-ions annotated in the sequence. The positioning and distinction of sulfate (79.9568 u) and phosphate (79.9663 u) modifications were done by manually interpreting the MS2 spectra. The MS2 spectrum thus displayed a mass shift of 79.9570 u between m/z 362.1083 and m/z 442.0653, demonstrating the presence of a sulfate modification on the GalNAc residue. A mass shift of 212.0084 u was observed between m/z 1019.9724 (2+) and m/z 1125.9766 (2+), demonstrating the presence of a xylose plus phosphate modification of the peptide (the theoretical mass of this modification is 212.0086 u (132.0423 u + 79.9663 u)).

Fig. S7: Expression and activity of HSulf-2 in MDA-MB-231 transduced cells
(A) Western blot analysis of DsRed (lanes 1), HSulf-2 WT (lanes 2) or HSulf-2ΔSG (lanes 3) transduced MDA-MB-231 cell lysates, using the anti-C-terminal HSulf-2 2B4 antibody and an anti-actin antibody (Sigma, ref A-2066) as a loading control. (B) Endosulfatase activity was monitored by treating heparin with DsRed, HSulf-2 WT or HSulf-2ΔSG transduced MDA-MB-231 cell conditioned medium. Results are expressed as fold increase of NS2S disaccharide content compared to untreated heparin (control). (C) Western blot analysis (H19 antibody) of HSulf-2 WT transduced MDA-MB-231 cell conditioned medium prior to (WT, lane 1) or after (WT / CSase, lane 2) treatment with chondroitinase ABC. (D) Western blot analysis (2B4 antibody) of mouse tumor lysates resulting from injections of DsRed (lanes 1 and 2), HSulf-2 WT (lanes 3 and 4) or HSulf-2ΔSG (lanes 5 and 6) transduced MDA-MB-231 cells.

Fig. S8: Effects of HSulf-2 WT and HSulf-2ΔSG on tumor metastasis.
(A) Histological analysis of the vascularized area, using α Smooth Muscle Actin (αSMA) immunostaining of tumors expressing mock DsRed, HSulf-2 WT and HSulf-2ΔSG. The vascularized area of tumors was calculated on five mice in each group. For each mouse, 4 ROIs (Regions Of Interest) were quantified: the αSMA-positive area was measured for each ROI and divided by the total ROI area, giving for each mouse a percentage of vascularized area. In each group, the median of this percentage was divided by the mean of the medians of the 5 DsRed control mice (one-way ANOVA, multiple comparison, Bonferroni test, *P=0.07, **P=0.03). (B) Representative images of αSMA staining at different magnifications. (C) Percentage of mice with metastasis found in the lung, left axillary and right axillary lymph node (LN) from DsRed, HSulf-2 and HSulf-2ΔSG expressing tumors.

10_1101-2021_01_04_425348 ---- Distinct roles and actions of PDI family enzymes in catalysis of nascent-chain disulfide formation

Distinct roles and actions of PDI family enzymes in catalysis of nascent-chain disulfide formation

Chihiro Hirayama1, Kodai Machida2#, Kentaro Noi3#, Tadayoshi Murakawa4, Masaki Okumura1,5, Teru Ogura6,7, Hiroaki Imataka2, and Kenji Inaba1*

1 Institute of Multidisciplinary Research for Advanced Materials, Tohoku University, Sendai, Miyagi 980-8577, Japan
2 Graduate School of Engineering, University of Hyogo, Himeji, Hyogo 671-2280, Japan
3 Institute for NanoScience Design, Osaka University, Toyonaka, Osaka 560-8531, Japan
4 Graduate School of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Kanagawa, 226-8503, Japan
5 Frontier Research Institute for Interdisciplinary Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan
6 Institute of Molecular Embryology and Genetics, Kumamoto University, Kumamoto, Kumamoto 860-0811, Japan
7 Faculty of Life Sciences, Kumamoto University, Kumamoto 862-0973, Japan

# These authors contributed equally to this work

*Correspondence & Lead contact:
Kenji Inaba, Institute of Multidisciplinary Research for Advanced Materials, Tohoku University, Katahira 2-1-1, Aoba-ku, Sendai, Miyagi 980-8577, Japan
E-mail: kenji.inaba.a1@tohoku.ac.jp
Tel: +81-22-217-5604
Fax: +81-22-217-5605
ORCID: 0000-0001-8229-0467

Running title: Nascent-chain disulfide bond formation

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.04.425348; this version posted January 5, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract

The mammalian endoplasmic reticulum (ER) harbors more than 20 members of the protein disulfide isomerase (PDI) family that act to maintain proteostasis. Herein, we developed an in vitro system for directly monitoring PDI- or ERp46-catalyzed disulfide bond formation in ribosome-associated nascent chains (RNC) of human serum albumin. The results indicated that ERp46 more efficiently introduced disulfide bonds into nascent chains with short segments exposed outside the ribosome exit site than PDI. Single-molecule analysis by high-speed atomic force microscopy further revealed that PDI binds nascent chains persistently, forming a stable face-to-face homodimer, whereas ERp46 binds for a shorter time in monomeric form, indicating their different mechanisms for substrate recognition and disulfide bond introduction. Similarly to ERp46, a PDI mutant with an occluded substrate-binding pocket displayed shorter-time RNC binding and higher efficiency in disulfide introduction than wild-type PDI.
Altogether, ERp46 serves as a more potent disulfide introducer, especially during the early stages of translation, whereas PDI can catalyze disulfide formation in RNCs when longer nascent chains emerge from the ribosome.

Keywords
nascent chain, protein disulfide isomerase, ERp46, disulfide bond, co-translational folding, high-speed atomic force microscopy, ER proteostasis

Introduction
Over billions of years of evolution, living organisms have developed ingenious mechanisms to promote protein folding (Hartl et al, 2011). The oxidative network catalyzing protein disulfide bond formation in the endoplasmic reticulum (ER) is a prime example. While canonical protein disulfide isomerase (PDI) and ER oxidoreductin-1 (Ero1) were previously postulated to constitute a primary disulfide bond formation pathway (Araki & Inaba, 2012; Mezghrani et al, 2001; Tavender & Bulleid, 2010), more than 20 different PDI family enzymes and multiple PDI oxidases besides Ero1 have recently been identified in the mammalian ER, suggesting the development of highly diverse oxidative networks in higher eukaryotes (Nguyen et al, 2011; Schulman et al, 2010; Tavender et al, 2010). Each PDI family enzyme is likely to play a distinct role in catalyzing the oxidative folding of different substrates, concomitant with some functional redundancy, leading to the efficient production of a wide variety of secretory proteins with multiple disulfide bonds (Bulleid & Ellgaard, 2011; Okumura et al, 2015; Sato & Inaba, 2012).
Our previous in vitro studies using model substrates such as reduced and denatured bovine pancreatic trypsin inhibitor (BPTI) and ribonuclease A (RNase A) demonstrated that different PDI family enzymes participate in different stages of oxidative protein folding, resulting in the accelerated folding of native enzymes (Kojima et al, 2014; Sato et al, 2013). Multiple PDI family enzymes cooperate to synergistically increase the speed and fidelity of disulfide bond formation in substrate proteins. However, whether mechanistic insights gained by in vitro experiments using full-length substrates are applicable to real events of oxidative folding in the ER remains an important question. Indeed, some previous works demonstrated that newly synthesized polypeptide chains undergo disulfide bond formation and isomerization co-translationally, presumably via catalysis by specific PDI family members (Kadokura et al, 2020; Molinari & Helenius, 1999; Robinson & Bulleid, 2020; Robinson et al, 2020; Robinson et al, 2017). Furthermore, nascent chains play important roles in their own quality control by modulating the translation speed to increase the yield of native folding; if a nascent chain fails to fold or complete translation, then the resultant aberrant ribosome-nascent chain complexes are degraded or destabilized (Buhr et al, 2016; Chadani et al, 2017; Matsuo et al, 2017). These observations suggest that understanding real events of oxidative protein folding in cells requires systematic analysis of how PDI family enzymes act on nascent polypeptide chains during synthesis by ribosomes.
To this end, we herein developed an experimental system for directly monitoring disulfide bond formation in ribosome-associated human serum albumin (HSA) nascent chains of different lengths from the N-terminus. The resultant ribosome-nascent chain complexes (RNCs) were reacted with two ubiquitously expressed PDI family members, ER-resident protein 46 (ERp46) and canonical PDI. These two enzymes were previously shown to have distinct roles in catalyzing oxidative protein folding: ERp46 engages in rapid but promiscuous disulfide bond introduction during the early stages of folding, while PDI serves as an effective proofreader of non-native disulfides during the later stages (Kojima et al., 2014; Sato et al., 2013). The subsequent maleimidyl polyethylene glycol (mal-PEG) modification of free cysteines and Bis-Tris (pH 7.0) PAGE analysis enabled us to detect the oxidation status of the HSA nascent chains conjugated with transfer RNA (tRNA). Using high-speed atomic force microscopy (HS-AFM), we further visualized PDI and ERp46 acting on the RNCs at the single-molecule level. Collectively, the results indicated that although both ERp46 and PDI could introduce a disulfide bond into the ribosome-associated HSA nascent chains, they required different lengths of the HSA segment exposed outside the ribosome exit site, and displayed different mechanisms of action on the RNC.
The present systematic in vitro study using RNCs containing different lengths of HSA nascent chains mimics co-translational disulfide bond formation in the ER, and the results provide a framework for understanding the mechanistic basis of oxidative nascent-chain folding catalyzed by PDI family enzymes.

Results

The efficiency of disulfide bond introduction into HSA nascent chains by PDI/ERp46
To investigate whether PDI family enzymes can introduce disulfide bonds into a substrate during translation, we first prepared RNCs in vitro. For this purpose, we made use of a cell-free protein translation system reconstituted with eukaryotic elongation factors 1 and 2, eukaryotic release factors 1 and 3 (eRF1 and eRF3), aminoacyl-tRNA synthetases, tRNAs, and ribosome subunits, developed previously by Imataka and colleagues (Machida et al, 2014). HSA was chosen as a model substrate for the following reasons. Firstly, the three-dimensional structure of HSA has been solved at high resolution (Sugio et al, 1999), providing information on the exact location of 17 disulfide bonds in its native structure. Secondly, native-state HSA contains an unpaired cysteine, Cys34, near the N-terminal region, which has the potential to form a non-native disulfide bond with one of the subsequent cysteines, serving as a good indicator of whether a non-native disulfide is introduced by ERp46 or PDI during the early stage of translation.
Thirdly, the overall conformation and kinetics of disulfide bond regeneration have been characterized for reduced full-length HSA (Lee & Hirose, 1992), which is beneficial for discussing similarities and differences in post- and co-translational oxidative folding. Fourthly, no N-glycosylation sites are contained in the first 95 amino acids of HSA, implying that HSA nascent chains synthesized by the cell-free system are equivalent to those synthesized in the ER in regard to N-glycosylation. Finally, the involvement of PDI family enzymes in intracellular HSA folding has been demonstrated (Koritzinsky et al, 2013; Rutkevich et al, 2010; Rutkevich & Williams, 2012), ensuring the physiological relevance of the present study.

To stall the translation of HSA at specified sites, a uORF2 arrest sequence (Alderete et al, 1999) was inserted into appropriate sites of the expression plasmid (Fig 1A). We first prepared two versions of the RNC containing different lengths of HSA nascent chains: RNC 69-aa and RNC 82-aa. Since the ribosome exit tunnel accommodates a polypeptide chain of ~30 amino acid (aa) residues (Zhang et al, 2013), the N-terminal 57 residues of HSA (excluding the N-terminal 6-aa pro-sequence) are predicted to be exposed outside the ribosome exit tunnel in RNC 69-aa, including Cys34 and Cys53 (Fig 1A). In the RNC 82-aa construct, the N-terminal 70 residues of HSA, including Cys62 as well as Cys34/Cys53, are predicted to emerge from the ribosome (Fig 1A). Notably, Cys53 and Cys62 form a native disulfide bond, whereas Cys34 is unpaired in the native structure of HSA domain I.

When RNC 69-aa was employed as a substrate, neither PDI nor ERp46 could efficiently introduce a disulfide bond into the nascent chain (Fig 1C and 1D).
However, both enzymes introduced a disulfide bond into RNC 82-aa with higher efficiency than into RNC 69-aa (Fig 1E and 1F), suggesting that the length of the exposed HSA segment or the distance of a pair of cysteines from the ribosome exit site is critical for disulfide bond introduction by PDI and ERp46. For either construct, a faint band was seen between the bands of 'no SS' and '1 SS'; this band was even fainter without GSH/GSSG (the second lane from the left) and tended to get stronger at late time points. Presumably, this band represents a species in which one of the free cysteines is glutathionylated, and the species increased gradually in the course of the reaction.

Of note, ERp46 introduced a disulfide bond into RNC 82-aa at a much higher rate than PDI, indicating that ERp46 serves as a more competent disulfide bond introducer to RNCs than PDI (Fig 1F). The remarkable difference in disulfide bond introduction efficiency by these two enzymes seems unlikely to be explained simply by the different number of redox-active Trx-like domains in PDI (two) and ERp46 (three) (Fig 1B). Also, the redox states in the presence of 1 mM GSH and 0.2 mM GSSG are similar between these two enzymes (Fig EV1A and EV1B), suggesting their comparable redox potentials. Thus, the different ability of ERp46 and PDI to introduce a disulfide into 82-aa is likely caused by other factors such as different structural features and different mechanisms of substrate recognition, as discussed below.
Next, to identify which cysteine pair forms a disulfide bond in RNC 82-aa, we constructed three cysteine mutants in which either Cys34, Cys53, or Cys62 was mutated to alanine (Fig 2A). The assays using the mutants showed that whereas PDI was unable to introduce a disulfide bond into RNC 82-aa C34A and C53A (Fig 2B, top and middle), the enzyme introduced a Cys34-Cys53 non-native disulfide bond into RNC 82-aa C62A (Fig 2B, bottom), at almost the same rate as the generation of the '1 SS' species in 82-aa (Fig 1E and 1F). PDI could not introduce a Cys53-Cys62 native disulfide bond, presumably because this cysteine pair is located too close to the ribosome exit site (see also Fig 3B and 3C). Conversely, the slow but possible formation of a Cys34-Cys53 non-native disulfide in 82-aa by PDI suggests that the distance between a cysteine pair of interest and the ribosome exit site is key to allowing the enzyme to catalyze disulfide bond introduction into RNCs. Considering the different locations of the Cys34-Cys53 and Cys53-Cys62 pairs on RNC 82-aa, a distance of ~18 residues from the ribosome exit site appears to be necessary for the PDI-catalyzed reaction (see also the Discussion).

In contrast to PDI, ERp46 could introduce a native disulfide bond into RNC 82-aa C34A (Fig 2C, top). Like PDI, ERp46 also introduced a non-native disulfide bond between Cys34 and Cys53 into RNC 82-aa C62A, but its efficiency was lower than that of a Cys53-Cys62 native disulfide (Fig 2C, bottom).
No disulfide bond was formed between Cys34 and Cys62 by either ERp46 or PDI (Fig 2C, middle), presumably due to the considerable spatial separation of these two cysteines. Based on these results, we concluded that for efficient disulfide bond introduction into RNCs, ERp46 requires an intermediary polypeptide segment with a shorter distance between a cysteine pair of interest and the ribosome exit site than PDI. We here note that ERp46-catalyzed generation of the '1 SS' species was faster in 82-aa than in 82-aa C34A (Fig 1F and 2C). This observation may suggest the occurrence of Cys34-mediated disulfide bond formation in 82-aa, namely, the formation of a Cys34-Cys53 non-native disulfide and, possibly, its rapid isomerization to a Cys53-Cys62 native disulfide.

Accessibility of PDI/ERp46 to cysteines on the ribosome-HSA nascent chain complex
To examine the accessibility of PDI and ERp46 to Cys residues on RNC 82-aa, we constructed three RNC 82-aa mono-Cys mutants in which either Cys34, Cys53, or Cys62 on the HSA nascent chain was retained, and investigated whether a mixed disulfide could be formed between the RNC 82-aa mutant and a trapping mutant of PDI or ERp46 in which all CXXC redox-active sites were mutated to CXXA. Both PDI and ERp46 formed a mixed disulfide bond with Cys34 and Cys53 on RNC 82-aa with high probability, but covalent linkages to Cys62 were marginal (Fig 2D and 2E). The results suggest that the redox-active sites of PDI and ERp46 could gain access to Cys34 and Cys53 but, to a much lesser extent, to Cys62, probably due to steric collision with the ribosome.
Nevertheless, ERp46 efficiently introduced a native disulfide bond between Cys53 and Cys62 (Fig 2C, top), presumably because ERp46 first attacked Cys53 on the HSA nascent chain, and the resultant mixed disulfide was subjected to nucleophilic attack by Cys62 (Fig 2F, right). By contrast, the mixed disulfide between PDI and Cys53 on the HSA nascent chain seems unlikely to be attacked by Cys62, probably due to steric collision between PDI and the ribosome (Fig 2F, left). In line with this idea, PDI adopts a U-like overall conformation with restricted movements of four thioredoxin (Trx)-like domains (Tian et al, 2006; Wang et al, 2012), whereas ERp46 forms a highly flexible V-shape conformation composed of three Trx-like domains and two long (~20 aa) interdomain linkers (Kojima et al., 2014).

Correlations between cysteine accessibility and the efficiency of disulfide bond introduction by PDI/ERp46
Based on the results presented above, we hypothesized that the distance between cysteines of interest and the ribosome exit site is critical for efficient disulfide introduction by PDI and ERp46. To test this hypothesis, we increased the distance of the Cys53-Cys62 pair from the ribosome exit site by inserting an extended polypeptide segment composed of a [SG]5 or [SG]10 repeat immediately after Cys62 on RNC 82-aa C34A (Fig 3A), and investigated the effects of the insertions on the efficiency of disulfide bond formation.
While PDI was unable to introduce a Cys53-Cys62 native disulfide into RNC 82-aa C34A (Fig 2B, top), insertion of a [SG]5 repeat allowed this reaction, and nearly 70% of 82-aa C34A was disulfide-bonded within a reaction time of 360 s (Fig 3B, upper and 3C). The insertion of a longer repeat, [SG]10, further promoted disulfide bond formation (Fig 3B, lower and 3C).

A similar enhancement following [SG] repeat insertion was observed for ERp46-catalyzed reactions. However, ERp46 exhibited a striking difference from PDI: a [SG]5 repeat was long enough to allow introduction of a Cys53-Cys62 native disulfide into RNC 82-aa C34A within 15 s, and insertion of a [SG]10 repeat gave only a small additional enhancement (Fig 3D and 3E). Thus, the presence of a disordered or extended segment of ~18 aa (Asp63-Phe70 + [SG]5 repeat) between a cysteine pair of interest and the ribosome exit site was necessary and sufficient for ERp46 to generate a Cys53-Cys62 disulfide rapidly, whereas PDI required a longer segment of ~28 aa (Asp63-Phe70 + [SG]10 repeat) in this intermediary region for efficient introduction of a Cys53-Cys62 disulfide. ERp46 therefore seems to be more capable than PDI of introducing a disulfide bond near the ribosome exit site. In other words, ERp46 likely has a higher potential than PDI to introduce a disulfide bond into the HSA nascent chain during the earlier stages of translation.

To verify that Cys53-Cys62 disulfide formation facilitated by [SG]10 repeat insertion was ascribed to higher accessibility of PDI/ERp46 to Cys62, we again investigated mixed disulfide bond formation between trapping mutants of PDI/ERp46
and each cysteine on RNC 82-aa following [SG]10 repeat insertion. Both PDI and ERp46 formed a mixed disulfide with all cysteines including Cys62 (Fig 3F and 3G), indicating that there is a correlation between the accessibility of PDI/ERp46 to a target pair of cysteines and the efficiency of disulfide bond introduction by the enzymes.

Disulfide bond introduction into a longer HSA nascent chain by PDI/ERp46
In addition to the [SG]-repeat insertion, we examined the effect of natural HSA sequence extension on PDI- or ERp46-mediated disulfide formation. For this purpose, we prepared RNC 95-aa in which the N-terminal 83 amino acids of HSA (excluding the N-terminal 6-aa pro-sequence), including Cys34, Cys53, Cys62, and Cys75, are predicted to emerge from the ribosome (Fig 4A). With this construct, however, we had a technical problem with detection of the reduced species, because mal-PEG modification of four cysteines greatly diminished the gel-to-membrane transfer efficiency. We overcame this problem by using photo-cleavable mal-PEG (PEG-PCMal) and irradiating the SDS gel with UV light after gel electrophoresis and before membrane transfer.

Consequently, we observed that both PDI and ERp46 introduced a disulfide bond into 95-aa (Fig 4B), but the efficiency was lower than that into 82-aa (Fig 1E and 1F), although a longer polypeptide chain is exposed outside the ribosome exit site in RNC 95-aa. Thus, the effect of natural sequence extension was opposite to that of [SG]-repeat insertion. Formation of some higher-order structure or exposure of another cysteine may somehow prevent PDI and ERp46 from introducing a disulfide bond into RNC 95-aa. Thus, a longer polypeptide chain exposed outside the ribosome does not always lead to a higher disulfide formation rate.
Rather, it is suggested that PDI and ERp46 can introduce a disulfide bond into a nascent chain with higher efficiency when just the minimum necessary length has emerged from the ribosome.

Given that four cysteines are exposed outside the ribosome in RNC 95-aa, we next investigated whether PDI and ERp46 can catalyze nascent-chain disulfide formation additively or synergistically. The mixture of PDI and ERp46 generated a '1 SS' species, but not a '2 SS' species, like PDI or ERp46 alone (Fig 4B and 4C). Notably, the presence of PDI inhibited ERp46-mediated disulfide formation, possibly due to its competition with ERp46 for binding to RNC 95-aa. Thus, neither an additive nor a synergistic effect was observed (Fig 4B and 4C). In this regard, our previous observation of the synergistic cooperation of PDI and ERp46 in RNase A oxidative folding (Sato et al., 2013) did not hold for the ribosome-associated HSA nascent chain.

Single-molecule analysis of ERp46 by high-speed atomic force microscopy
To explore the mechanisms by which PDI and ERp46 recognize and act on RNCs at the molecular level, we employed HS-AFM (Kodera et al, 2010; Noi et al, 2013; Okumura et al, 2019; Uchihashi et al, 2018). While our previous HS-AFM analysis revealed that PDI molecules form homodimers in the presence of unfolded substrates (Okumura et al., 2019), the structure and dynamics of ERp46 have not been analyzed using this experimental approach. Therefore, we first observed ERp46 molecules alone by immobilizing the N-terminal His-tag on a Co²⁺-coated mica surface.
AFM images revealed various overall shapes of ERp46 (Fig 5A), and some particle images clearly demonstrated the presence of three thioredoxin (Trx)-like domains in ERp46 (Fig 5A, left). To assess the overall structures of ERp46, we calculated the circularity of each molecule and performed statistical analysis (Uchihashi et al., 2018). Circularity is a measure of how circular the outline of an observed molecule is, defined by the equation $4\pi S/L^{2}$, where $L$ and $S$ are the contour length of the outline and the area surrounded by the outline, respectively. Thus, a circularity of 1.0 indicates a perfect circle, and values <1 indicate a more extended conformation.

Statistical analysis based on circularity classified randomly chosen ERp46 particles into two major groups: opened V-shape and round/compact O-shape (Fig 5A). Histograms with Gaussian fitting curves indicated that ~80% of ERp46 molecules adopted V-shape conformations while ~20% adopted O-shape conformations (Fig 5B). There was no large difference in height between these two conformations, suggesting that the three Trx-like domains of ERp46 are arranged within the same plane in either conformation. Successive AFM images acquired every 100 ms revealed that ERp46 adopted an open V-shape conformation during nearly 75% of the observation time, while the protein also adopted an O-shape conformation occasionally (Fig 5C, 5D, 5E and Movie EV1). The histogram calculated from the time-course snapshots was similar to that calculated from images of 200 molecules at a certain timepoint (Fig 5B and 5E).
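As a small worked illustration of the circularity measure used above, the following Python sketch evaluates 4πS/L² (the standard form, which equals 1 for a perfect circle) for a closed outline given as polygon vertices. The function names and the polygon representation are assumptions made for illustration only; the text does not specify how molecular outlines were extracted from the AFM images.

```python
import math

def circularity(outline):
    """Return 4*pi*S / L**2 for a closed polygon given as (x, y) vertices.

    S is the enclosed area (shoelace formula) and L is the contour length.
    """
    n = len(outline)
    if n < 3:
        raise ValueError("an outline needs at least three vertices")
    twice_area = 0.0   # twice the signed area
    perimeter = 0.0    # contour length L
    for i in range(n):
        x0, y0 = outline[i]
        x1, y1 = outline[(i + 1) % n]   # wrap around to close the outline
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2.0
    return 4.0 * math.pi * area / perimeter ** 2

def regular_polygon(k, radius=1.0):
    """Hypothetical helper: k-sided regular polygon approximating a circle."""
    return [(radius * math.cos(2 * math.pi * i / k),
             radius * math.sin(2 * math.pi * i / k)) for i in range(k)]

# Toy outlines (not experimental data): a near-circle, a square, and a V-like shape.
near_circle = circularity(regular_polygon(720))
square = circularity([(0, 0), (1, 0), (1, 1), (0, 1)])
v_shape = circularity([(0, 0), (1, 3), (2, 0), (1, 1)])
```

In this toy example the near-circular outline scores close to 1, the square scores π/4, and the V-like outline scores lower still, which matches the qualitative reading that compact O-shapes have higher circularity than open V-shapes.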
Importantly, structural insights gained by HS-AFM analysis are in good agreement with those from small-angle X-ray scattering (SAXS) analysis: both analyses consistently indicate the coexistence of a major population of molecules with an open V-shape and a minor population with a compact O-shape (Kojima et al., 2014).

Single-molecule analysis of PDI/ERp46 acting on 82-aa RNC by HS-AFM
PDI and ERp46 are predicted to bind RNCs transiently during disulfide bond introduction, but transient interactions would make it harder to observe and analyze the mode of PDI/ERp46 binding to RNCs. More practically, at least 5 min is required to set up HS-AFM measurements after adding PDI or ERp46 to RNCs immobilized onto a mica surface. If we employed RNCs containing natural HSA sequences, PDI or ERp46 would complete nascent-chain disulfide formation during this setup time. We therefore constructed HSA 82-aa RNC with Cys34, Cys53, and Cys62 mutated to Ala (hereafter referred to as 82-aa CA RNC), with the intention of trapping RNC molecules bound to PDI/ERp46. After testing several RNC immobilization methods, we chose to immobilize RNC on a Ni²⁺-coated mica surface. As a result, most RNC molecules were observed to lie sideways on the mica surface, while nascent chains were difficult to visualize, probably due to their flexible and extended structural nature (Fig 6A).

When oxidized PDI or ERp46 was added onto the RNC-immobilized mica surface, PDI/ERp46-like particles were observed in the peripheral region of ribosomes.
When no-chain RNC (NC-RNC), comprising only the N-terminal FLAG tag and the subsequent uORF2 but no segment from HSA, was immobilized on the mica surface, far fewer particles were observed near RNCs (within 25 Å from the outline of ribosomes) by HS-AFM despite the presence of PDI/ERp46 (Fig EV2A and EV2B). These results confirm that we successfully observed PDI/ERp46 molecules acting on HSA nascent chains associated with ribosomes.

Notably, the HS-AFM analysis revealed that PDI bound RNCs in both monomeric and dimeric forms at an approximate ratio of 7:3 (Fig 6B), as reported previously for reduced and denatured BPTI and RNase A as substrates (Okumura et al., 2019). Thus, PDI likely recognizes HSA nascent chains in a similar manner to full-length substrates. Statistical analysis of RNC binding rates revealed that whereas most monomeric PDI molecules (52/55 molecules) bound RNC for 10 s or shorter (Fig 6D, Fig EV3A and Movie EV2), most homodimeric PDI molecules (17/19 molecules) bound RNC for 60 s or longer (Fig 6D, Fig EV3B and Movie EV3). By contrast, ERp46 molecules in the periphery of RNCs were only present in monomeric form (Fig 6C). Importantly, nearly 20% (12/59 molecules) of ERp46 molecules bound RNC for 10 to 20 s (Fig 6D, Fig EV3C and Movie EV4), while a smaller portion (8/59 molecules) bound RNC for ~60 s (Fig 6D). It is also notable that a significant portion of PDI and ERp46 molecules bound ribosomes for <5 s. This may indicate that PDI/ERp46 binds or approaches RNCs only transiently, possibly via diffusion, without tight interactions.
The histogram of the distance between the edge of ribosomes and the center of ribosome-neighboring PDI/ERp46 molecules indicated that both PDI and ERp46 bound RNCs at positions ~16 nm distant from ribosomes with a single-Gaussian distribution with a half width of ~11 nm (Fig 6E), suggesting that both enzymes recognize similar sites of the HSA nascent chain. Given that the distance between adjacent amino acids is approximately 3.5 Å along an extended strand, Cys34, Cys53, and Cys62 are calculated to be 130 Å, 63 Å, and 35 Å distant from the ribosome exit site, respectively. The distributions of PDI and ERp46 molecules bound to RNC 82-aa seem consistent with their accessibility to Cys34 and Cys53, but not to Cys62, as revealed by their mixed disulfide formation with RNC 82-aa (Fig 2D and E).

Role of the PDI hydrophobic pocket in oxidation of the HSA nascent chain
It is widely known that the PDI b' domain contains a hydrophobic pocket that acts as a primary substrate-binding site (Klappa et al, 1998). To examine the involvement of the hydrophobic pocket in PDI-catalyzed disulfide bond formation in the HSA nascent chain, we mutated I289, one of the central residues that constitute the hydrophobic pocket, to Ala, and compared the efficiency of disulfide bond introduction into RNC 82-aa between wild-type (WT) and mutant I289A proteins. In this mutant, the x-linker flanked by b' and a' domains tightly binds the hydrophobic pocket, unlike in WT, thereby preventing PDI from tightly binding an unfolded substrate (Bekendam et al, 2016; Nguyen et al, 2008).
ERp57, another primary member of the PDI family, has a U-shape domain arrangement similar to PDI, but does not contain the hydrophobic pocket in the b' domain. For comparison, we also monitored ERp57-catalyzed disulfide introduction into RNC 82-aa.

Despite the occlusion or lack of the hydrophobic substrate-binding pocket, both PDI I289A and ERp57 were found to introduce a disulfide bond into RNC 82-aa at a higher rate than PDI WT (Fig 7A and B). This result suggests that the hydrophobic pocket is involved in binding the HSA nascent chain, but that this binding appears instead to slow down disulfide introduction into a nascent chain.

To further explore the mechanism by which PDI I289A introduced a disulfide bond at a faster rate than PDI WT, we analyzed its binding to RNC using HS-AFM. The analysis revealed that, while nearly one-third of PDI I289A molecules formed dimers in the presence of RNC 82-aa like PDI WT, the mutant dimers bound RNC for a shorter time than the WT dimers (Fig 7C and Movie EV6). Thus, the RNC-binding time of PDI I289A showed a similar distribution to that of ERp46 (Fig 7D and Movies EV5 and EV6), which seems consistent with the higher disulfide introduction efficiency of PDI I289A than that of PDI WT. PDI I289A also bound RNCs at positions ~16 nm distant from the ribosome with a single-Gaussian distribution (Fig 7E), suggesting that PDI I289A recognizes similar sites of the HSA nascent chain as PDI and ERp46.
Discussion
A number of studies have recently investigated co-translational oxidative folding in the ER (Kadokura et al., 2020; Robinson et al., 2020; Robinson et al., 2017). The present study showed that while both PDI and ERp46 can introduce a disulfide bond into a nascent chain co-translationally, ERp46 catalyzes this reaction more efficiently than PDI and requires a shorter nascent chain segment exposed outside the ribosome exit. Thus, ERp46 appears to be capable of introducing a disulfide bond into a nascent chain at earlier stages of translation than PDI. The efficient introduction of a Cys53-Cys62 native disulfide on RNC 82-aa by ERp46 (Fig 2) suggests that a separation of ~8 aa residues between a C-terminal cysteine on a nascent chain and the ribosome exit site (i.e., residues 63-70) is sufficient for ERp46 to catalyze this reaction (Fig 8). When a nascent chain was elongated by the insertion of [SG]-repeat sequences, PDI could also introduce the native disulfide bond into RNCs to some extent (Fig 3B and 3C). Thus, PDI appears to act on a nascent chain to introduce a disulfide bond when the distance between a C-terminal cysteine on a nascent chain and the ribosome exit site reaches ~18 aa residues (i.e., residues 63-70 + [SG]5 repeat; Fig 8).

Disulfide bond formation in partially ER-exposed nascent chains was indeed observed with the ADAM10 disintegrin domain, which has a dense disulfide bonding pattern and little defined structure (Robinson et al., 2020). Thus, disulfide bond formation seems to be allowed before the higher order structure is defined in a nascent chain.
This could be the case with a Cys34-Cys53 nonnative disulfide and a Cys53-Cys62 native disulfide on RNC 82-aa, since the N-terminal 82-residue HSA fragment alone is unlikely to fold to a globular native-like structure, though the fragment of residues 35 to 56 is predicted to form an α-helix according to the HSA native structure. In contrast, some proteins including β2-microglobulin (β2M) and prolactin are shown to form disulfide bonds only after a folding domain is fully exposed to the ER or a polypeptide chain is released from the ribosome, suggesting their folding-driven disulfide bond formation. Notably, PDI binds β2M when the N-terminal ~80 residues of β2M are exposed to the ER, and completes disulfide bond introduction at even later stages of translation (Robinson et al., 2017). Thus, PDI has been demonstrated to engage in disulfide bond formation during late stages of translation or after translation in the ER.

Regarding mechanistic insight, the present HS-AFM analysis visualized PDI and ERp46 acting on nascent chains at the single-molecule level. We found that PDI forms a face-to-face homodimer that binds a nascent chain, as is the case with reduced and denatured full-length substrates (Okumura et al., 2019). On the other hand, ERp46 maintains a monomeric form while binding a nascent chain. Interestingly, the PDI dimer binds a nascent chain much more persistently than the PDI monomer and ERp46, suggesting that the PDI dimer holds a nascent chain tightly inside its central hydrophobic cavity.
In agreement with this observation, a hydrophobic-pocket mutant (I289A) of PDI bound a nascent chain for a shorter time and introduced a disulfide bond into a nascent chain more rapidly than the WT enzyme, as was the case with ERp46. In this context, PDI competed with ERp46 for acting on RNC 95-aa, and thereby inhibited ERp46-mediated disulfide introduction (Fig 4 and Fig 8). Thus, PDI family enzymes do not always work synergistically to accelerate oxidative protein folding, but may inhibit each other during co-translational disulfide bond formation.

How the ER membrane translocon channel is involved in co-translational oxidative folding catalyzed by PDI family enzymes remains an important question. It is possible that PDI and ERp46 form a supramolecular complex with ribosomes and the Sec61 translocon channel via a nascent chain. Indeed, PDI was previously identified as a luminal protein that was in close contact with translocating nascent chains (Klappa et al, 1995). Additionally, the oligosaccharyltransferase complex (Harada et al, 2009) and an ER chaperone calnexin (Farmery et al, 2000) have been reported to interact with the ribosome-associated Sec61 channel to catalyze N-glycosylation and folding of nascent chains in the ER, respectively. In this regard, it will be interesting to examine the close co-localization of PDI/ERp46 with the Sec61 channel in the presence or absence of nascent chains in transit into the ER lumen by super-resolution microscopy or other tools.
Systematic studies with a wider range of substrates of different lengths from the ribosome exit site and different numbers of cysteine pairs, and with other PDI family members potentially having different functional roles, will provide further mechanistic and physiological insights into co-translational oxidative folding and protein quality control in the ER.

Materials & Methods

Construction of HSA plasmids
DNA fragments encoding specific regions (69-aa, N-terminal pro-sequence 6-aa + the subsequent 63-aa; 82-aa, N-terminal pro-sequence 6-aa + the subsequent 76-aa; 95-aa, N-terminal pro-sequence 6-aa + the subsequent 89-aa) of HSA were amplified by PCR with appropriate primers and inserted into the pUC-T7-HCV-FLAG-2A-uORF expression plasmid, as described in Machida et al. (2014). The 2A region was replaced with the amplified fragments to generate pUC-T7-HCV-FLAG-HSA (69-aa or 82-aa)-uORF2. RNC 82-aa C34A/C53A/C62A and mono-Cys mutants were constructed using the QuikChange method with appropriate primers (Table 1). RNC 82-aa C34A constructs with [SG]5 or [SG]10 repeats were constructed by the Prime STAR MAX (Takara Bio Inc., Japan) method using appropriate primers (Table 1).

Expression and purification of PDI and ERp46
Overexpression and purification of human PDI and ERp46, and their mutants, were performed as described previously (Kojima et al., 2014; Sato et al., 2013). An ERp46 trapping mutant with a CXXA sequence in all Trx-like domains was constructed by the QuikChange method using appropriate sets of primers.
Preparation of RNCs using a translation system reconstituted with human factors
A cell-free translation system was reconstituted with eEF1 (50 µM), eEF2 (1 µM), eRF1/3 (0.5 µM), aminoacyl-tRNA synthetases (0.15 µg/µl), tRNAs (1 µg/µl), 40S ribosomal subunit (0.5 µM), 60S ribosomal subunit (0.5 µM), PPA1 (0.0125 µM), amino acids mixture (0.1 mM) and T7 RNA polymerase (0.015 µg/µl) (Machida et al., 2014). We added 1.0 µL template plasmid (0.5 mg/mL) into 19 µL of this cell-free system, and the mixture was incubated for at least 34.5 h at 32 °C. After HKMS buffer (comprising 25 mM HEPES-KOH (pH 7.0), 150 mM KCl, 5 mM Mg(OAc)2, and 1.0 M sucrose) was added, samples were ultra-centrifuged at 100,000 × g overnight at 4 °C to recover the RNC as a pellet. After removing the supernatant, pellets were resuspended in HKM buffer comprising 25 mM HEPES-KOH (pH 7.0), 150 mM KCl, and 5 mM Mg(OAc)2.

Monitoring PDI- and ERp46-mediated disulfide bond introduction into RNCs
The RNC suspension prepared as described above was mixed with PDI or ERp46 (0.1 µM each) and glutathione/oxidized glutathione (GSH/GSSG; 1.0 mM:0.2 mM; NACALAI TESQUE, INC., Japan). Aliquots were collected after incubation at 30 °C for the indicated times, and reactions were quenched with mal-PEG 5K (2 mM; NOF CORPORATION, Japan) for RNC 69-aa and RNC 82-aa. After cysteine alkylation at room temperature for 20 min, samples were separated by 12% Bis-Tris (pH 7.0) PAGE (Thermo Fisher Scientific K.K., Japan) in the presence of the reducing reagent β-mercaptoethanol (β-ME; 10% v/v; NACALAI TESQUE, INC., Japan).
After transferring onto a polyvinylidene fluoride (PVDF) membrane (Merck KGaA, Darmstadt, Germany), bands on the membrane were visualized using Chemi-Lumi One Ultra (NACALAI TESQUE, INC., Japan) and a ChemiDoc™ Imaging System (Bio-Rad Laboratories, Inc., CA, USA). Signal intensity was quantified using ImageLab software (Bio-Rad Laboratories, Inc., CA, USA).

For RNC 95-aa, reactions were quenched with PEG-PCMal (Dojindo, Japan). After cysteine alkylation at room temperature for 20 min, samples were separated by 10% Bis-Tris (pH 7.0) PAGE (Thermo Fisher Scientific K.K., Japan) in the presence of the reducing reagent β-ME (10% v/v). After gel electrophoresis, the gel was subjected to UV irradiation (302 nm, 8 W) for 30 min. The subsequent procedures were the same as described above.

Monitoring intermolecular disulfide bond linkage between PDI/ERp46 and ribosome-HSA nascent chain complexes
To detect the intermolecular disulfide bond linkage between PDI/ERp46 and the ribosome-HSA nascent chain complex, we employed RNC 82-aa mono-Cys mutants retaining one of Cys34, Cys53, or Cys62. The RNC suspension prepared as described above was mixed with a PDI or ERp46 trapping mutant (1 µM each) and diamide (100 µM). Aliquots were collected after incubation at 30 °C for 10 min, and reactions were quenched with N-ethylmaleimide (2 mM; NACALAI TESQUE, INC., Japan). Samples were analyzed by Nu-PAGE and western blotting as described above.
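The band intensities measured in the disulfide-introduction assays are reported in the figures as the fraction of disulfide-bonded species (n = 3). The text does not give the formula used. A common approach, assumed here, is to divide the 1SS band intensity by the summed intensity of the 1SS and noSS bands in each lane. The sketch below illustrates that calculation in Python; the function name and the intensity values are hypothetical placeholders, not measured data.

```python
from statistics import mean, stdev


def fraction_1ss(i_1ss: float, i_noss: float) -> float:
    """Fraction of single-disulfide species in one lane (assumed formula)."""
    total = i_1ss + i_noss
    if total <= 0:
        raise ValueError("lane has no quantifiable signal")
    return i_1ss / total


# Hypothetical (1SS, noSS) intensities for three replicate lanes
# at a single time point; real values would come from the imaging software.
replicates = [(30.0, 70.0), (35.0, 65.0), (25.0, 75.0)]

fractions = [fraction_1ss(i_1ss, i_noss) for i_1ss, i_noss in replicates]

# Mean and sample standard deviation across the replicates.
mean_fraction = mean(fractions)
sd_fraction = stdev(fractions)

print(f"1SS fraction: {mean_fraction:.2f} +/- {sd_fraction:.2f} (n = {len(fractions)})")
```

Normalizing within each lane makes the value insensitive to lane-to-lane loading differences, which is why this form is often preferred over raw intensities.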
High-speed atomic force microscopy imaging
The structural dynamics of PDI and ERp46 were probed using a high-speed AFM instrument developed by Toshio Ando’s group (Kanazawa University). Data acquisition for ERp46 was performed as described previously (Okumura et al., 2019). Briefly, His6-tagged ERp46 was immobilized on a Co²⁺-coated mica surface through the N-terminal His-tag. To this end, a droplet (10 µL) containing 1 nM ERp46 was loaded onto the mica surface. After a 3 min incubation, the surface was washed with TRIS buffer (50 mM TRIS-HCl pH 7.4, 300 mM NaCl). Single-molecule imaging was performed in tapping mode (spring constant, ~0.1 N/m; resonant frequency, 0.8–1 MHz; quality factor in water, ~2) and analyzed using Kodec4.4.7.39 software developed by Toshio Ando’s group (Kanazawa University). AFM observations were made in fixed imaging areas (400 × 400 Å²) at a scan rate of 0.1 s/frame. Each molecule was observed separately on a single frame with the highest pixel setting (60 × 60 pixels). Cantilevers (Olympus, Tokyo, Japan) were 6–7 µm long, 2 µm wide, and 90 nm thick. For AFM imaging, the free oscillation amplitude was set to ~1 nm, and the set-point amplitude was around 80% of the free oscillation amplitude. The estimated tapping force was <30 pN. A low-pass filter was used to remove noise from acquired images. The area of a single ERp46 molecule in each frame was calculated using LabView 2013 (National Instruments, Austin, TX, USA) with custom-made programs.
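The imaging settings above fix the spatial and temporal sampling of each movie. The short sketch below works these out from the stated values. The constant and function names are illustrative, and the 10 s trace length is the one mentioned in the legend of Figure 5C.

```python
# Acquisition settings quoted in the text.
SCAN_SIZE_A = 400.0        # imaging area side length (Å)
PIXELS_PER_SIDE = 60       # pixels per side at the highest pixel setting
SECONDS_PER_FRAME = 0.1    # scan rate (s/frame)
FREE_AMPLITUDE_NM = 1.0    # approximate free oscillation amplitude (nm)
SETPOINT_RATIO = 0.8       # set point as a fraction of the free amplitude

# Derived sampling parameters.
pixel_size_a = SCAN_SIZE_A / PIXELS_PER_SIDE          # Å per pixel
frames_per_second = 1.0 / SECONDS_PER_FRAME           # frame rate
setpoint_amplitude_nm = FREE_AMPLITUDE_NM * SETPOINT_RATIO


def frames_in_trace(duration_s: float) -> int:
    """Number of frames recorded over a trace of the given duration."""
    return round(duration_s / SECONDS_PER_FRAME)


print(f"pixel size: {pixel_size_a:.2f} Å")
print(f"frame rate: {frames_per_second:.0f} frames/s")
print(f"set-point amplitude: {setpoint_amplitude_nm:.1f} nm")
print(f"frames in a 10 s trace: {frames_in_trace(10.0)}")
```

With these settings each pixel covers a little under 7 Å, so features much smaller than a single Trx-like domain are not resolved in the lateral direction.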
To observe the binding of PDI/ERp46 to RNCs by HS-AFM, RNCs were immobilized on a Ni²⁺-coated mica surface via electrostatic interactions. To this end, a droplet (10 µL) containing RNCs was loaded onto the mica surface. After a 10 min incubation, the surface was washed with HSA buffer comprising 25 mM HEPES-KOH pH 7.0, 150 mM KCl, and 5 mM Mg(OAc)2. PDI/ERp46 lacking the N-terminal His6-tag was added to the RNC-immobilized mica surface at a final concentration of 1 nM. Measurements were performed under the same conditions described above.

Acknowledgments
This work was supported by Grants-in-Aid for Scientific Research from MEXT to K.I. (26116005 and 18H03978), the NAGASE Science Technology Foundation (K.I.) and the MITSUBISHI Foundation (K.I.). This work was also supported by a Grant-in-Aid for JSPS Fellows (Grant Number 20J11932 to C.H.) and a Grant-in-Aid of Tohoku University, Division for Interdisciplinary Advanced Research and Education (to C.H.).

Author contributions
C.H. and T.M. developed an experimental system for directly monitoring co-translational disulfide bond formation. K.M. and H.I. developed and prepared the cell-free protein translation system reconstituted with human factors. C.H. prepared various plasmids. C.H. and M.O. purified PDI and ERp46, and their mutants. C.H. and K.N. performed HS-AFM measurements and analyses. C.H., K.N., M.O. and T.O. discussed the results of HS-AFM. K.I. supervised the work. C.H. and K.N. prepared the Figures. C.H. and K.I. wrote the manuscript. All of the authors discussed the results and approved the manuscript.
Conflict of interests
We declare that there are no competing interests related to this work.

References
Alderete JP, Jarrahian S, Geballe AP (1999) Translational effects of mutations and polymorphisms in a repressive upstream open reading frame of the human cytomegalovirus UL4 gene. J Virol 73: 8330-8337

Araki K, Inaba K (2012) Structure, mechanism, and evolution of Ero1 family enzymes. Antioxidants & redox signaling 16: 790-799

Bekendam RH, Bendapudi PK, Lin L, Nag PP, Pu J, Kennedy DR, Feldenzer A, Chiu J, Cook KM, Furie B et al (2016) A substrate-driven allosteric switch that enhances PDI catalytic activity. Nature communications 7: 12579

Buhr F, Jha S, Thommen M, Mittelstaet J, Kutz F, Schwalbe H, Rodnina MV, Komar AA (2016) Synonymous Codons Direct Cotranslational Folding toward Different Protein Conformations. Molecular cell 61: 341-351

Bulleid NJ, Ellgaard L (2011) Multiple ways to make disulfides. Trends in biochemical sciences 36: 485-492

Chadani Y, Niwa T, Izumi T, Sugata N, Nagao A, Suzuki T, Chiba S, Ito K, Taguchi H (2017) Intrinsic Ribosome Destabilization Underlies Translation and Provides an Organism with a Strategy of Environmental Sensing. Molecular cell 68: 528-539.e525

Farmery MR, Allen S, Allen AJ, Bulleid NJ (2000) The role of ERp57 in disulfide bond formation during the assembly of major histocompatibility complex class I in a synchronized semipermeabilized cell translation system. The Journal of biological chemistry 275: 14933-14938
Harada Y, Li H, Li H, Lennarz WJ (2009) Oligosaccharyltransferase directly binds to ribosome at a location near the translocon-binding site. Proceedings of the National Academy of Sciences of the United States of America 106: 6945-6949

Hartl FU, Bracher A, Hayer-Hartl M (2011) Molecular chaperones in protein folding and proteostasis. Nature 475: 324-332

Kadokura H, Dazai Y, Fukuda Y, Hirai N, Nakamura O, Inaba K (2020) Observing the nonvectorial yet cotranslational folding of a multidomain protein, LDL receptor, in the ER of mammalian cells. Proceedings of the National Academy of Sciences of the United States of America 117: 16401-16408

Klappa P, Freedman RB, Zimmermann R (1995) Protein disulphide isomerase and a lumenal cyclophilin-type peptidyl prolyl cis-trans isomerase are in transient contact with secretory proteins during late stages of translocation. Eur J Biochem 232: 755-764

Klappa P, Ruddock LW, Darby NJ, Freedman RB (1998) The b' domain provides the principal peptide-binding site of protein disulfide isomerase but all domains contribute to binding of misfolded proteins. The EMBO journal 17: 927-935

Kodera N, Yamamoto D, Ishikawa R, Ando T (2010) Video imaging of walking myosin V by high-speed atomic force microscopy. Nature 468: 72-76

Kojima R, Okumura M, Masui S, Kanemura S, Inoue M, Saiki M, Yamaguchi H, Hikima T, Suzuki M, Akiyama S et al (2014) Radically different thioredoxin domain arrangement of ERp46, an efficient disulfide bond introducer of the mammalian PDI family. Structure (London, England : 1993) 22: 431-443

Koritzinsky M, Levitin F, van den Beucken T, Rumantir RA, Harding NJ, Chu KC, Boutros PC, Braakman I, Wouters BG (2013) Two phases of disulfide bond formation have differing requirements for oxygen.
The Journal of cell biology 203: 615-627

Lee JY, Hirose M (1992) Partially folded state of the disulfide-reduced form of human serum albumin as an intermediate for reversible denaturation. The Journal of biological chemistry 267: 14753-14758

Machida K, Mikami S, Masutani M, Mishima K, Kobayashi T, Imataka H (2014) A translation system reconstituted with human factors proves that processing of encephalomyocarditis virus proteins 2A and 2B occurs in the elongation phase of translation without eukaryotic release factors. The Journal of biological chemistry 289: 31960-31971

Matsuo Y, Ikeuchi K, Saeki Y, Iwasaki S, Schmidt C, Udagawa T, Sato F, Tsuchiya H, Becker T, Tanaka K et al (2017) Ubiquitination of stalled ribosome triggers ribosome-associated quality control. Nature communications 8: 159

Mezghrani A, Fassio A, Benham A, Simmen T, Braakman I, Sitia R (2001) Manipulation of oxidative protein folding and PDI redox state in mammalian cells. The EMBO journal 20: 6288-6296

Molinari M, Helenius A (1999) Glycoproteins form mixed disulphides with oxidoreductases during folding in living cells. Nature 402: 90-93

Nguyen VD, Saaranen MJ, Karala AR, Lappi AK, Wang L, Raykhel IB, Alanen HI, Salo KE, Wang CC, Ruddock LW (2011) Two endoplasmic reticulum PDI peroxidases increase the efficiency of the use of peroxide during disulfide bond formation.
Journal of molecular biology 406: 503-515

Nguyen VD, Wallis K, Howard MJ, Haapalainen AM, Salo KE, Saaranen MJ, Sidhu A, Wierenga RK, Freedman RB, Ruddock LW et al (2008) Alternative conformations of the x region of human protein disulphide-isomerase modulate exposure of the substrate binding b' domain. Journal of molecular biology 383: 1144-1155

Noi K, Yamamoto D, Nishikori S, Arita-Morioka K, Kato T, Ando T, Ogura T (2013) High-speed atomic force microscopic observation of ATP-dependent rotation of the AAA+ chaperone p97. Structure (London, England : 1993) 21: 1992-2002

Okumura M, Kadokura H, Inaba K (2015) Structures and functions of protein disulfide isomerase family members involved in proteostasis in the endoplasmic reticulum. Free radical biology & medicine 83: 314-322

Okumura M, Noi K, Kanemura S, Kinoshita M, Saio T, Inoue Y, Hikima T, Akiyama S, Ogura T, Inaba K (2019) Dynamic assembly of protein disulfide isomerase in catalysis of oxidative folding. Nature chemical biology 15: 499-509

Robinson PJ, Bulleid NJ (2020) Mechanisms of Disulfide Bond Formation in Nascent Polypeptides Entering the Secretory Pathway. Cells 9

Robinson PJ, Kanemura S, Cao X, Bulleid NJ (2020) Protein secondary structure determines the temporal relationship between folding and disulfide formation. The Journal of biological chemistry 295: 2438-2448

Robinson PJ, Pringle MA, Woolhead CA, Bulleid NJ (2017) Folding of a single domain protein entering the endoplasmic reticulum precedes disulfide formation.
The Journal of biological chemistry 292: 6978-6986

Rutkevich LA, Cohen-Doyle MF, Brockmeier U, Williams DB (2010) Functional relationship between protein disulfide isomerase family members during the oxidative folding of human secretory proteins. Molecular biology of the cell 21: 3093-3105

Rutkevich LA, Williams DB (2012) Vitamin K epoxide reductase contributes to protein disulfide formation and redox homeostasis within the endoplasmic reticulum. Molecular biology of the cell 23: 2017-2027

Sato Y, Inaba K (2012) Disulfide bond formation network in the three biological kingdoms, bacteria, fungi and mammals. The FEBS journal 279: 2262-2271

Sato Y, Kojima R, Okumura M, Hagiwara M, Masui S, Maegawa K, Saiki M, Horibe T, Suzuki M, Inaba K (2013) Synergistic cooperation of PDI family members in peroxiredoxin 4-driven oxidative protein folding. Scientific reports 3: 2456

Schulman S, Wang B, Li W, Rapoport TA (2010) Vitamin K epoxide reductase prefers ER membrane-anchored thioredoxin-like redox partners. Proceedings of the National Academy of Sciences of the United States of America 107: 15027-15032

Sugio S, Kashima A, Mochizuki S, Noda M, Kobayashi K (1999) Crystal structure of human serum albumin at 2.5 A resolution. Protein engineering 12: 439-446

Tavender TJ, Bulleid NJ (2010) Molecular mechanisms regulating oxidative activity of the Ero1 family in the endoplasmic reticulum.
Antioxidants & redox signaling 13: 1177-1187

Tavender TJ, Springate JJ, Bulleid NJ (2010) Recycling of peroxiredoxin IV provides a novel pathway for disulphide formation in the endoplasmic reticulum. The EMBO journal 29: 4185-4197

Tian G, Xiang S, Noiva R, Lennarz WJ, Schindelin H (2006) The crystal structure of yeast protein disulfide isomerase suggests cooperativity between its active sites. Cell 124: 61-73

Uchihashi T, Watanabe YH, Nakazaki Y, Yamasaki T, Watanabe H, Maruno T, Ishii K, Uchiyama S, Song C, Murata K et al (2018) Dynamic structural states of ClpB involved in its disaggregation function. Nature communications 9: 2147

Wang C, Yu J, Huo L, Wang L, Feng W, Wang CC (2012) Human protein-disulfide isomerase is a redox-regulated chaperone activated by oxidation of domain a'. The Journal of biological chemistry 287: 1139-1149

Zhang Y, Wölfle T, Rospert S (2013) Interaction of nascent chains with the ribosomal tunnel proteins Rpl4, Rpl17, and Rpl39 of Saccharomyces cerevisiae. The Journal of biological chemistry 288: 33697-33707

Figure 1 - Disulfide bond introduction into RNC 69-aa and 82-aa by PDI and ERp46
A Schematic structure of plasmids constructed in this study. ‘uORF2’ is an arrest sequence that serves to stall translation of the upstream protein and thereby prepare stable ribosome-nascent chain complexes (RNCs). The bottom cartoon represents the location of cysteines and disulfide bonds in HSA domain I. HSA domain I consists of 195 amino acids and contains five disulfide bonds and one free cysteine at residue 34.
A green box indicates the pro-sequence. Orange circles and red lines indicate cysteines and native disulfide bonds, respectively. The region predicted to be buried in the ribosome exit tunnel is shown by a cyan box.
B Domain organization of PDI and ERp46. Redox-active Trx-like domains with a CGHC motif are indicated by cyan boxes, while redox-inactive ones in PDI are indicated by light-green boxes. Note that the PDI b’ domain contains a substrate-binding hydrophobic pocket.
C, E Time course of PDI-, ERp46-, and glutathione (no enzyme)-catalyzed disulfide bond introduction into RNC 69-aa (C) and 82-aa (E). ‘noSS’ and ‘1SS’ denote reduced and single-disulfide-bonded species of HSA nascent chains, respectively. Note that faint bands observed between “no SS” and “1SS” likely represent a species in which one of the cysteines is not subjected to mal-PEG modification due to glutathionylation. In support of this, these minor bands are even fainter under the conditions of no GSH/GSSG.
D, F Quantification of disulfide-bonded species for RNC 69-aa (D) and 82-aa (F) based on the results shown in (C) and (E), respectively (n = 3).

Figure 2 - Disulfide bond introduction into RNC 82-aa Cys mutants by PDI and ERp46
A Cartoon of RNC constructs used in this study. In each construct, a cysteine (represented by a black circle) was mutated to alanine. Note that RNC 82-aa C34A retains a native cysteine pairing (i.e., Cys53 and Cys62), while RNC 82-aa C53A and C62A retain a non-native pairing.
B and C Time course of PDI- and ERp46-catalyzed disulfide bond introduction into RNC 82-aa C34A (top), C53A (middle), and C62A (bottom) mutants. Note that faint bands observed between “no SS” and “1SS” likely represent a species in which one of the cysteines is not subjected to mal-PEG modification due to glutathionylation. Quantification of disulfide-bonded species of RNC 82-aa Cys mutants is based on the results shown for the upper raw data (n = 3).
D Formation of a mixed disulfide bond between RNC 82-aa mono-Cys mutants and PDI (upper)/ERp46 (lower). ‘Mixed’ and ‘No SS’ denote a mixed disulfide complex between PDI/ERp46 and RNC mono-Cys mutants and isolated RNC 82-aa, respectively. Note that faint bands observed between ‘Mixed’ and ‘no SS’ are likely non-specific bands, as they were seen at the same position regardless of which 82-aa mono-Cys mutant was tested or whether an RNC was reacted with PDI or ERp46.
E Quantification of mixed disulfide species based on the results shown in (D). n = 3.
F The cartoon on the left shows possible steric collisions between ribosomes and PDI when Cys62 attacks the mixed disulfide between Cys53 on RNC 82-aa and PDI. The cartoon on the right shows that ERp46 can avoid this steric collision due to its higher flexibility and domain arrangement.

Figure 3 - Correlation of the distance between Cys residues and the ribosome exit site with the efficiency of disulfide bond introduction by PDI/ERp46
A Cartoons of RNC constructs with [SG]-repeat insertions. A [SG]5 or [SG]10 repeat sequence was inserted into RNC 82-aa C34A immediately after Cys62.
B, D PDI- (B) and ERp46 (D)-mediated disulfide bond introduction into RNC 82-aa C34A with insertion of [SG]5 (upper) or [SG]10 (lower) repeats after Cys62.
C, E Quantification of disulfide-bonded species (1SS) based on the results shown in (B) and (D). n = 3 for PDI and 2 for ERp46.
F Formation of a mixed disulfide bond between the 82-aa mono-Cys mutant with a [SG]10 repeat and PDI (upper)/ERp46 (lower). Note that bands observed between ‘Mixed’ and ‘no SS’ are likely non-specific bands, as they were seen at the same position regardless of which 82-aa mono-Cys [SG]10 mutant was tested or whether an RNC was reacted with PDI or ERp46.
G Quantification of mixed disulfide species based on the results shown in (F). n = 3.

Figure 4 - Disulfide bond introduction into RNC 95-aa by PDI and ERp46
A Schematic structure of RNC 95-aa. Orange circles and red lines in the bottom cartoon indicate cysteines and native disulfides, respectively. The region predicted to be buried in the ribosome exit tunnel is shown by a cyan box.
B Time course of PDI (0.1 µM)-, ERp46 (0.1 µM)-, and their mixture (0.1 µM each)-catalyzed disulfide bond introduction into RNC 95-aa. ‘noSS’ and ‘1SS’ denote reduced and single-disulfide-bonded species of the HSA nascent chain, respectively.
C Quantification of the single-disulfide-bonded (1SS) species based on the result shown in (B) (n = 3).

Figure 5 - High-speed AFM analysis of ERp46
A AFM images (scan area, 200 Å × 200 Å; scale bar, 30 Å) for ERp46 V-shape (left) and O-shape (right) conformations.
B Left upper: Histograms of circularity calculated from AFM images of ERp46. Values represent the average circularity (mean ± s.d.) calculated from curve fitting with a single- (middle and right) or two- (left) Gaussian model. Left lower: Histograms of height calculated from AFM images of ERp46.
Values represent the average height (mean ± s.d.) calculated from curve fitting with a single-Gaussian model. Right: Two-dimensional scatterplots of the height versus circularity for ERp46 molecules observed by HS-AFM.
C Time-course snapshots of oxidized ERp46 captured by HS-AFM. The images were traced for 10 s. See also Movie EV1.
D Time trace of the circularity of an ERp46 molecule.
E Histogram of the circularity of ERp46 calculated from the time-course snapshots shown in (D).

Figure 6 - Single-molecule observation of PDI/ERp46 acting on 82-aa CA RNC by high-speed atomic force microscopy
A AFM images (scan area, 500 Å × 500 Å; scale bar, 100 Å) displaying 82-aa CA RNC in the absence of PDI family enzymes on a Ni2+-coated mica surface. The surface model on the right side of each AFM image illustrates a ribosome whose view angle is adjusted to approximately match that of the observed RNC particle. 40S and 60S ribosomal subunits are shown in red and blue, respectively.
B Upper AFM images (scan area, 500 Å × 500 Å; scale bar, 100 Å) displaying 82-aa CA RNC in the presence of oxidized PDI (1 nM). PDI molecules that appear to bind 82-aa CA RNC are marked by red squares. Lower images (scan area, 250 Å × 250 Å; scale bar, 50 Å) highlight the regions surrounded by red squares in the upper images.
C Upper AFM images (scan area, 500 Å × 500 Å; scale bar, 100 Å) displaying 82-aa CA RNC in the presence of oxidized ERp46 (1 nM). ERp46 molecules that appear to bind 82-aa CA RNC are marked by blue squares.
Lower images (scan area, 250 Å × 250 Å; scale bar, 50 Å) highlight the regions surrounded by blue squares in the upper images.
D Histograms of the RNC binding time of the PDI monomer (left), the PDI dimer (middle), and ERp46 (right), calculated from the observed AFM images.
E Histograms of the distance between the edge of the ribosome and the centers of RNC-neighboring PDI (left) and ERp46 (right) molecules, calculated from the observed AFM images. Values represent the average distance (mean ± s.d.) calculated from curve fitting with a single-Gaussian model.

Figure 7 - Role of the PDI hydrophobic pocket in PDI-mediated disulfide bond introduction into RNC 82-aa
A Disulfide bond introduction into RNC 82-aa by PDI I289A (upper) and ERp57 (lower). Note that faint bands observed between ‘no SS’ and ‘1SS’ likely represent a species in which one of the cysteines is not subjected to mal-PEG modification due to glutathionylation. In support of this, these minor bands are even fainter in the absence of GSH/GSSG.
B Quantification of disulfide-bonded species based on the results shown in (A). Quantifications for ERp46 and PDI are based on the results shown in Fig 1E and 1F. n = 3.
C HS-AFM analyses of the binding of PDI I289A to 82-aa CA RNC. Upper AFM images (scan area, 500 Å × 500 Å; scale bar, 100 Å) display the PDI I289A molecules that bind 82-aa CA RNC, as marked by red squares. Lower images (scan area, 250 Å × 250 Å; scale bar, 50 Å) highlight the regions surrounded by red squares in the upper images.
D Histograms show the distribution of the RNC binding time of the PDI I289A monomers (left) and dimers (right).
E Histogram shows the distribution of the distance between the edge of the ribosome and the centers of RNC-neighboring PDI I289A molecules, calculated from the observed AFM images. Values represent the average distance (mean ± s.d.) calculated from curve fitting with a single-Gaussian model.

Figure 8 - Proposed model of co-translational disulfide bond introduction into nascent chains by ERp46 and PDI
During the early stages of translation, ERp46 introduces disulfide bonds through transient binding to a nascent chain. For efficient disulfide introduction by ERp46, a pair of cysteines must be separated from the ribosome exit site by at least ~8 amino acids. By contrast, PDI introduces disulfide bonds by holding a nascent chain inside the central cavity of the PDI homodimer during the later stages of translation, where a pair of cysteines must be separated from the ribosome exit site by at least ~18 amino acids. However, when a longer polypeptide is exposed outside the ribosome, ERp46- or PDI-mediated disulfide bond formation can be slower, possibly due to the formation of higher-order structure in the nascent chain. Longer nascent chains may allow PDI family enzymes to compete with each other for binding to and acting on the RNC.

Table 1 - Primers used in this study
[Fig. 1 (image): schematics of the RNC constructs (uORF2 arrest sequence, FLAG tag, HSA domain I; 69-aa and 82-aa nascent chains) and of the PDI (a, b, b', a' domains; hydrophobic pocket) and ERp46 (Trx1, Trx2, Trx3) domain organization; anti-FLAG immunoblots of mal-PEG 5K gel-shift time courses (0 to 360 s) with glutathione alone, PDI, or ERp46; plots of disulfide bond introduction (%) versus time (s).]

[Fig. 2 (image): schematics of 82 aa C34A (native pair), 82 aa C62A and 82 aa C53A (non-native pairs); anti-FLAG immunoblots of time courses with PDI or ERp46; plots of disulfide bond introduction (%) versus time (s); non-reducing and reducing gels of 82 aa mono-Cys mutants with PDI or ERp46; bar chart of mixed disulfide bond formed (%) for Cys34, Cys53, and Cys62; cartoons contrasting PDI (low flexibility) with ERp46 (high flexibility).]

[Fig. 3 (image): schematics of 82 aa C34A [SG]5 and [SG]10; anti-FLAG immunoblots of time courses with PDI or ERp46; plots of disulfide bond introduction (%) versus time (s) for 0, 5, and 10 SG repeats; non-reducing and reducing gels of 82 aa mono-Cys [SG]10 mutants; bar chart of mixed disulfide bond formed (%) for Cys34, Cys53, and Cys62 (all comparisons n.s.).]

[Fig. 4 (image): schematic of the 95-aa nascent chain (pro 6 aa + HSA 89 aa); anti-FLAG immunoblots of PEG-PCMal gel-shift time courses with PDI, ERp46, or both; plot of disulfide bond introduction (%) versus time (s) for PDI, ERp46, and ERp46 + PDI.]

[Fig. 5 (image): HS-AFM images of O-shape and V-shape ERp46 molecules (scale bars, 30 Å); histograms of circularity and of height (nm) (total n = 200); scatterplot of height (nm) versus circularity; time-course snapshots; time trace of circularity versus time (s); histogram of number of frames versus circularity.]

[Fig. 6 (image): HS-AFM images of 82-aa CA RNC alone and with oxidized PDI (monomer, dimer) or oxidized ERp46, with close-ups; histograms of number of molecules versus binding time (s) for the PDI monomer, the PDI dimer, and ERp46; histograms of number of molecules versus distance (nm) for PDI and ERp46, annotated 16.9 ± 4.7 nm and 15.7 ± 4.1 nm.]

[Fig. 7 (image): anti-FLAG immunoblots of mal-PEG 5K gel-shift time courses with PDI I289A or ERp57; plot of disulfide bond introduction (%) versus time (s) for PDI, ERp46, ERp57, and PDI I289A; HS-AFM images of 82-aa CA RNC with oxidized PDI I289A (monomer, dimer), with close-ups; histograms of number of molecules versus binding time (s) for the PDI I289A monomer and dimer; histogram of number of molecules versus distance (nm) for PDI I289A, annotated 15.0 ± 3.5 nm.]

[Fig. 8 (image): schematic of the proposed model showing the ribosome between the cytosol and the ER lumen, ERp46 acting at ~8 aa and PDI at ~18 aa from the exit site, and PDI/ERp46 competition on longer nascent chains.]
Expanded View

Distinct roles and actions of PDI family enzymes in catalysis of nascent-chain disulfide formation

Chihiro Hirayama1, Kodai Machida2#, Kentaro Noi3#, Tadayoshi Murakawa4, Masaki Okumura1,5, Teru Ogura6,7, Hiroaki Imataka2, and Kenji Inaba1*

Figure EV1 - Redox states of PDI and ERp46 in glutathione redox buffer and disulfide bond introduction into 82 aa C34A, catalyzed by PDI a domain
A Redox states of PDI and ERp46 in the presence of 1 mM GSH and 0.2 mM GSSG. Purified PDI and ERp46 were incubated for 6 min at 30 °C in the above glutathione redox buffer and modified with 2 mM mal-PEG 5K for separation on SDS gels.
B Quantification based on the results shown in (A).

Figure EV2 - Statistical analysis of RNC molecules observed by HS-AFM in the presence or absence of PDI/ERp46
A Number of particles observed for NC-RNC or 82-aa CA RNC molecules present in isolation or bound to PDI/ERp46 molecules.
B Ratio of NC-RNC or 82-aa CA RNC molecules present in isolation or bound to PDI/ERp46, calculated based on the observed number of particles in (A).
Note that a minor portion of NC-RNC or 82-aa CA RNC molecules were bound to many ERp46/PDI molecules, possibly due to severe structural damage to the RNC molecules.

Figure EV3 - Representative time-course snapshots captured by HS-AFM for 82-aa CA RNC bound to the PDI monomer (A), the PDI dimer (B), and ERp46 (C).
A Time-course snapshots captured by HS-AFM for the PDI monomer binding to 82-aa CA RNC. The AFM images (scan area, 650 Å × 650 Å; scale bar, 130 Å) display 82-aa CA RNC in the presence of oxidized PDI (1 µM). White arrows indicate the monomeric PDI molecules that bind to 82-aa CA RNC. See also supplementary video 2.
B Time-course snapshots captured by HS-AFM for the PDI dimer binding to 82-aa CA RNC. The AFM images (scan area, 700 Å × 700 Å; scale bar, 140 Å) display 82-aa CA RNC in the presence of oxidized PDI (1 µM). White arrows indicate the dimeric PDI molecules that bind to 82-aa CA RNC. See also supplementary video 3.
C Time-course snapshots captured by HS-AFM for ERp46 binding to 82-aa CA RNC. The AFM images (scan area, 1,000 Å × 1,000 Å; scale bar, 200 Å) display 82-aa CA RNC in the presence of oxidized ERp46 (1 µM). White arrows indicate the ERp46 molecules that bind to 82-aa CA RNC. See also supplementary video 4.
Figure EV4 - Representative time-course snapshots captured by HS-AFM for 82-aa CA RNC bound to the PDI I289A monomer (A) and the PDI I289A dimer (B).
A Time-course snapshots captured by HS-AFM for the PDI I289A monomer binding to 82-aa CA RNC. The AFM images (scan area, 900 Å × 900 Å; scale bar, 200 Å) display 82-aa CA RNC in the presence of oxidized PDI I289A (1 µM). White arrows indicate the monomeric PDI I289A molecules that bind to 82-aa CA RNC. See also supplementary video 5.
B Time-course snapshots captured by HS-AFM for the PDI I289A dimer binding to 82-aa CA RNC. The AFM images (scan area, 800 Å × 800 Å; scale bar, 200 Å) display 82-aa CA RNC in the presence of oxidized PDI I289A (1 µM). White arrows indicate the dimeric PDI I289A molecules that bind to 82-aa CA RNC. See also supplementary video 6.

Movie EV1 - HS-AFM movies showing structural dynamics of oxidized ERp46. This movie is the source of the time-course snapshots shown in Fig 5C.

Movie EV2 - HS-AFM movies showing the binding of the PDI monomer to 82-aa CA RNC. This movie is the source of the time-course snapshots shown in supplementary Fig EV3A.

Movie EV3 - HS-AFM movies showing the binding of the PDI dimer to 82-aa CA RNC. This movie is the source of the time-course snapshots shown in supplementary Fig EV3B.
Movie EV4 - HS-AFM movies showing the binding of ERp46 to 82-aa CA RNC. This movie is the source of the time-course snapshots shown in supplementary Fig EV3C.

Movie EV5 - HS-AFM movies showing the binding of the PDI I289A monomer to 82-aa CA RNC. This movie is the source of the time-course snapshots shown in supplementary Fig EV4A.

Movie EV6 - HS-AFM movies showing the binding of the PDI I289A dimer to 82-aa CA RNC. This movie is the source of the time-course snapshots shown in supplementary Fig EV4B.

10_1101-2021_01_05_425367 ----

Disorder is a critical component of lipoprotein sorting in Gram-negative bacteria

Jessica El Rayes1,2$, Joanna Szewczyk1,2$, Michael Deghelt1,2, André Matagne3, Bogdan I. Iorga4, Seung-Hyun Cho1,2, and Jean-François Collet1,2*

1WELBIO, Avenue Hippocrate 75, 1200 Brussels, Belgium.
2de Duve Institute, Université catholique de Louvain, Avenue Hippocrate 75, 1200 Brussels, Belgium.
3Centre d’ingéniérie des Protéines, Institut de Chimie B6, Université de Liège, Allée de la Chimie 3, 4000 Liège, Sart Tilman, Belgium.
4Université Paris-Saclay, CNRS UPR 2301, Institut de Chimie des Substances Naturelles, 91198 Gif-sur-Yvette, France.

$Both authors contributed equally to the work

*Correspondence: jfcollet@uclouvain.be

Abstract

Gram-negative bacteria express structurally diverse lipoproteins in their envelope.
Here we found that approximately half of the lipoproteins destined for the Escherichia coli outer membrane display an intrinsically disordered linker at their N-terminus. Intrinsically disordered regions are common in proteins, but establishing their importance in vivo has remained challenging. As we sought to unravel how lipoproteins mature, we discovered that unstructured linkers are required for optimal trafficking by the Lol lipoprotein sorting system: linker deletion re-routes three unrelated lipoproteins to the inner membrane. Focusing on the stress sensor RcsF, we found that replacing the linker with an artificial peptide restored normal outer membrane targeting only when the peptide was of similar length and disordered. Overall, this study reveals the role played by intrinsic disorder in lipoprotein sorting, providing mechanistic insight into the biogenesis of these proteins and suggesting that evolution can select for intrinsic disorder that supports protein function.

Introduction

The cell envelope is the morphological hallmark of Escherichia coli and other Gram-negative bacteria. It is composed of the inner membrane, a classical phospholipid bilayer, as well as the outer membrane, an asymmetric bilayer with phospholipids in the inner leaflet and lipopolysaccharides in the outer leaflet1. This lipid asymmetry enables the outer membrane to function as a barrier that effectively prevents the diffusion of toxic compounds from the environment into the cell. The inner and outer membranes are separated by the periplasm, a viscous compartment that contains a thin layer of peptidoglycan also known as the cell wall1. The cell envelope is essential for growth and survival, as illustrated by the fact that several antibiotics such as the β-lactams target mechanisms of envelope assembly.
Mechanisms involved in envelope biogenesis and maintenance are therefore attractive targets for novel antibacterial strategies.

Approximately one-third of E. coli proteins are targeted to the envelope, either as soluble proteins present in the periplasm or as proteins inserted in one of the two membranes2. While inner membrane proteins cross the lipid bilayer via one or more hydrophobic α-helices, proteins inserted in the outer membrane generally adopt a β-barrel conformation3. Another important group of envelope proteins is the lipoproteins, which are globular proteins anchored to one of the two membranes by a lipid moiety. Lipoproteins carry out a variety of important functions in the cell envelope: they participate in the biogenesis of the outer membrane by inserting lipopolysaccharide molecules4,5 and β-barrel proteins6, they function as stress sensors triggering signal transduction cascades when envelope integrity is altered7, and they control processes that are important for virulence8. The diverse roles played by lipoproteins in the cell envelope have drawn a lot of attention lately, revealing how crucial these proteins are in a wide range of vital processes and identifying them as attractive targets for antibiotic development. Yet, a detailed understanding of the mechanisms involved in lipoprotein maturation and trafficking is still missing.

Lipoproteins are synthesized in the cytoplasm as precursors with an N-terminal signal peptide9. The last four C-terminal residues of this signal peptide, known as the lipobox, function as a molecular determinant of lipid modification unique to bacteria; only the cysteine at the last position of the lipobox is strictly conserved10. After secretion of the lipoprotein into the periplasm, the thiol side-chain of the cysteine is first modified with a diacylglyceryl moiety by prolipoprotein diacylglyceryl transferase (Lgt)9 (Extended Data Fig.
1a, step 1). Then, signal peptidase II (LspA) catalyzes cleavage of the signal peptide N-terminally of the lipidated cysteine before apolipoprotein N-acyltransferase (Lnt) adds a third acyl group to the N-terminal amino group of the cysteine (Extended Data Fig. 1a, steps 2-3). Most mature lipoproteins are then transported to the outer membrane by the Lol system. Lol consists of LolCDE, an ABC transporter that extracts lipoproteins from the inner membrane and transfers them to the soluble periplasmic chaperone LolA (Extended Data Fig. 1a, steps 4-5)11. LolA escorts lipoproteins across the periplasm, binding their hydrophobic lipid tail, and delivers them to the outer membrane lipoprotein LolB (Extended Data Fig. 1a, step 6). LolB finally anchors lipoproteins to the inner leaflet of the outer membrane using a mechanism that remains poorly characterized (Extended Data Fig. 1a, step 7).

In most Gram-negative bacteria, a few lipoproteins remain in the inner membrane12,13. The current view is that inner membrane retention depends on the identity of the two residues located immediately downstream of the N-terminal cysteine to which the lipid moiety is attached14; this sequence, two amino acids in length, is known as the Lol sorting signal. When lipoproteins have an aspartate at position +2 and an aspartate, glutamate, or glutamine at position +3, they remain in the inner membrane15,16, possibly because strong electrostatic interactions between the +2 aspartate and membrane phospholipids prevent their interaction with LolCDE17. However, this model is largely based on data obtained in E. coli and variations have been described in other bacteria. For instance, in the pathogen Pseudomonas aeruginosa, an aspartate is rarely found at position +2 and inner membrane retention appears to be determined by residues +3 and +418,19. Surprisingly, lipoproteins are well sorted in P. aeruginosa cells expressing the E.
coli LolCDE complex20, despite their different Lol sorting signal. This result cannot be explained by the current model of lipoprotein sorting, underscoring that our comprehension of the precise mechanism that governs the triage of lipoproteins remains incomplete.

Excitingly, more unresolved questions regarding lipoprotein biogenesis have recently been raised. First, it was reported that a LolA-LolB-independent trafficking route to the outer membrane exists in E. coli21, but the factors involved have remained unknown. Second, although lipoproteins have traditionally been considered to be exposed to the periplasm in E. coli and many other bacterial models9, a series of investigations have started to challenge this view by identifying lipoproteins on the surface of E. coli, Vibrio cholerae, and Salmonella Typhimurium22-26. Overall, the field is beginning to explore a lipoprotein topological landscape that is more complex than previously assumed and raising intriguing questions about the signals that control surface targeting and exposure.

Here, stimulated by the hypothesis that crucial details of the mechanisms underlying lipoprotein maturation remain to be elucidated, we sought to identify novel molecular determinants controlling lipoprotein biogenesis. First, we systematically analyzed the sequence of the 66 lipoproteins with validated localization encoded by the E. coli K12 genome27 and found that half of the outer membrane lipoproteins display a long and intrinsically disordered linker at their N-terminus. Intrigued by these unstructured segments, we then probed their importance for the biogenesis of RcsF, NlpD, and Pal, three structurally and functionally unrelated outer membrane lipoproteins.
Unexpectedly, we found that deleting the linker, while keeping the Lol sorting signal intact, altered the targeting of all three lipoproteins to the outer membrane, with physiological consequences. Focusing on RcsF, we determined that both the length and disordered character of the linker were important. Remarkably, lowering the load of the Lol system by deleting lpp, which encodes the most abundant lipoprotein (~1 million copies per cell28), restored normal outer membrane targeting of linker-less RcsF, indicating that the N-terminal linker is required for optimal lipoprotein processing by Lol. Taken together, these observations reveal the unsuspected role played by protein intrinsic disorder in lipoprotein biogenesis.

Results

Half of E. coli lipoproteins present long disordered segments at their N-termini

In an attempt to discover novel molecular determinants controlling the biogenesis of lipoproteins, we decided to systematically analyze the sequence of the lipoproteins encoded by the E. coli genome (strain MG1655) in search of unidentified structural features. E. coli encodes ~80 validated lipoproteins29, of which 58 have been experimentally shown to localize in the outer membrane27. Comparative modeling of existing X-ray, cryogenic electron microscopy (cryo-EM), and nuclear magnetic resonance (NMR) structures revealed that approximately half of these outer membrane lipoproteins display a long segment (>22 residues) that is predicted to be disordered at the N-terminus (Fig. 1, Extended Data Fig. 2, Extended Data Table 1). In contrast, only one of the 8 lipoproteins that remain in the inner membrane (DcrB; Extended Data Fig. 2, Extended Data Table 1) had a long, disordered linker, suggesting that disordered peptides may be important for lipoprotein sorting.
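
The two sequence-level criteria mentioned so far, the E. coli +2/+3 Lol sorting signal and the ">22 residues" threshold for a long N-terminal disordered segment, can be written down compactly. The following is a minimal sketch for illustration only, not the analysis pipeline used in this study: it assumes that a per-residue disorder call (True/False) is already available from some structural model or predictor, and all function names and toy inputs are hypothetical.

```python
from typing import Sequence

# "Long" linker: more than 22 residues, the threshold quoted in the text.
LONG_LINKER_THRESHOLD = 22


def n_terminal_disordered_length(disordered: Sequence[bool]) -> int:
    """Count consecutive disordered residues starting at the N-terminus.

    `disordered` is an assumed per-residue call for the mature protein;
    counting stops at the first ordered residue.
    """
    length = 0
    for is_disordered in disordered:
        if not is_disordered:
            break
        length += 1
    return length


def has_long_disordered_linker(disordered: Sequence[bool]) -> bool:
    """True if the N-terminal disordered run exceeds 22 residues."""
    return n_terminal_disordered_length(disordered) > LONG_LINKER_THRESHOLD


def has_inner_membrane_signal(mature_seq: str) -> bool:
    """Apply the E. coli +2/+3 rule described in the Introduction.

    `mature_seq` must start at the lipidated cysteine (+1). Retention is
    predicted when +2 is Asp and +3 is Asp, Glu, or Gln.
    """
    if len(mature_seq) < 3 or mature_seq[0] != "C":
        raise ValueError("mature sequence must start with Cys and have >= 3 residues")
    return mature_seq[1] == "D" and mature_seq[2] in ("D", "E", "Q")


if __name__ == "__main__":
    # Toy inputs, not real lipoprotein data.
    toy_calls = [True] * 30 + [False] * 50
    print(n_terminal_disordered_length(toy_calls))
    print(has_long_disordered_linker(toy_calls))
    print(has_inner_membrane_signal("CDEAAA"))
```

The sorting-signal check encodes only the E. coli rule; as noted in the Introduction, other species such as P. aeruginosa appear to use different positions, so the function should not be applied outside that context.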

Deleting the N-terminal linker of RcsF, NlpD, and Pal perturbs their targeting to the outer membrane
Intrigued by the presence of these N-terminal disordered segments in so many outer membrane lipoproteins, we decided to investigate their functional importance. We selected three structurally unrelated lipoproteins whose function could easily be assessed: the stress sensor RcsF (which triggers the Rcs signaling cascade when damage occurs in the envelope30), NlpD (which activates the periplasmic N-acetylmuramyl-L-alanine amidase AmiC, which is involved in peptidoglycan cleavage during cell division31,32), and the peptidoglycan-binding lipoprotein Pal (which is important for outer membrane constriction during cell division33).

We began by preparing truncated versions of RcsF, NlpD, and Pal devoid of their N-terminal unstructured linkers (Extended Data Fig. 1b, Extended Data Fig. 2; RcsF∆19-47, Pal∆26-56, and NlpD∆29-64). Note that the lipidated cysteine residue (+1) and the Lol sorting signal (the amino acids at positions +2 and +3) were not altered in RcsF∆19-47, Pal∆26-56, and NlpD∆29-64, nor in any of the constructs discussed below (Extended Data Table 2). For Pal, although the unstructured linker spans residues 25-68 (Fig. 1), we used Pal∆26-56 because Pal∆25-68 was either degraded or not detected by the antibody (data not shown). We first tested whether the truncated lipoproteins were still correctly extracted from the inner membrane and transported to the outer membrane. The membrane fraction was prepared from cells expressing the three variants independently, and the outer and inner membranes were separated using sucrose density gradients (Methods). Whereas wild-type RcsF, NlpD, and Pal were mostly detected (>90%) in the outer membrane fraction, as expected, ~50% of RcsF∆19-47 and ~60% of NlpD∆29-64 were retained in the inner membrane (Fig. 2a, 2b).
The sorting of Pal was also affected, although to a lesser extent: 15% of Pal∆26-56 was retained in the inner membrane (Fig. 2c). Notably, the expression levels of the three linker-less variants were similar to (NlpD∆29-64) or lower than (RcsF∆19-47; Pal∆26-56) those of the wild-type proteins (Extended Data Fig. 3), indicating that accumulation in the inner membrane did not result from increased protein abundance.

We then tested the impact of linker deletion on the function of these three proteins. In cells expressing RcsF∆19-47, the Rcs system was constitutively turned on (Fig. 2d); when RcsF accumulates in the inner membrane, it becomes available for interaction with IgaA, its downstream Rcs partner in the inner membrane30,34. Likewise, expression of NlpD∆29-64 did not rescue the chaining phenotype (Fig. 2e)35 exhibited by cells lacking both nlpD and envC, an activator of the amidases AmiA and AmiB32. Finally, Pal∆26-56 partially rescued the sensitivity of the pal mutant to SDS-EDTA that results from increased membrane permeability36 (Fig. 2f). However, this observation needs to be considered with caution given that Pal∆26-56 seemed to be expressed at lower levels than wild-type Pal (Extended Data Fig. 3). Thus, preventing normal targeting of RcsF, NlpD, and Pal to the outer membrane had functional consequences.

RcsF variants with unstructured artificial linkers of similar lengths are normally targeted to the outer membrane
The results above were surprising because they revealed that the normal targeting of RcsF, NlpD, and Pal to the outer membrane requires not only an appropriate Lol sorting signal, as proposed by the current model for lipoprotein sorting9, but also the presence of an N-terminal linker.
We selected RcsF, whose accumulation in the inner membrane can be easily tracked by monitoring Rcs activity30,37, to investigate the structural features of the linker controlling lipoprotein maturation; keeping as little as 10% of the total pool of RcsF molecules in the inner membrane is sufficient to fully activate Rcs30.

We first tested whether changing the sequence of the N-terminal segment while preserving its disordered character still yielded normal targeting of the protein to the outer membrane. To that end, we prepared an RcsF variant in which the N-terminal linker was replaced by an artificial, unstructured sequence (Extended Data Table 2, Extended Data Fig. 2, Extended Data Fig. 4) of similar length and consisting mostly of GS repeats (RcsFGS). Substituting the wild-type linker with this artificial sequence was remarkably well tolerated by RcsF: RcsFGS was targeted normally to the outer membrane (Fig. 3a) and did not constitutively activate the stress system (Fig. 3b). Thus, although RcsFGS has an N-terminus with a completely different primary structure, it behaved like the wild-type protein.

We then investigated whether the N-terminal linker required a minimal length for proper targeting and function. We therefore constructed two RcsF variants with shorter, unstructured, artificial linkers (RcsFGS2 and RcsFGS3, with linkers of 18 and 10 residues, respectively; Extended Data Table 2, Extended Data Fig. 2, Extended Data Fig. 4). Importantly, RcsFGS2 and, to a greater extent, RcsFGS3 did not properly localize to the outer membrane: the shorter the linker, the more RcsF remained in the inner membrane (Fig. 3a). Consistent with the amount of RcsFGS2 and RcsFGS3 retained in the inner membrane, Rcs activation levels were inversely related to linker length (Fig. 3b).

The disordered character of the linker is required for normal targeting
Taken together, the results above demonstrated that the RcsF linker can be replaced with an artificial sequence lacking secondary structure, provided that it is of appropriate length. Next, we sought to directly probe the importance of having a disordered linker by replacing the RcsF linker with an alpha-helical segment 35 amino acids long from the periplasmic chaperone FkpA (RcsFFkpA; Extended Data Table 2, Extended Data Fig. 2, Extended Data Fig. 4). Introducing order at the N-terminus of RcsF dramatically impacted the protein distribution between the two membranes: RcsFFkpA was substantially retained in the inner membrane (Fig. 3c) and constitutively activated Rcs (Fig. 3d). As alpha-helical segments are considerably shorter than unstructured sequences containing a similar number of amino acids, we also prepared an RcsF variant (RcsFcol) with a longer alpha helix from the helical segment of colicin Ia, which is 73 amino acids in length and also predicted to remain folded in the RcsFcol construct (Extended Data Table 2, Extended Data Fig. 2, Extended Data Fig. 4). However, doubling the size of the helix had no impact, with RcsFcol behaving similarly to RcsFFkpA (Fig. 3c, 3d). Together, these data demonstrate that having an N-terminal disordered linker downstream of the Lol sorting signal is required to correctly target RcsF to the outer membrane. The length of the linker is important, but the sequence is not, on the condition that the linker does not fold into a defined secondary structure.

The disordered linker is required for optimal processing by Lol
Our finding that N-terminal disordered linkers function as molecular determinants of the targeting of lipoproteins to the outer membrane raised the question of whether these linkers work in a Lol-dependent or Lol-independent manner.
To address this mechanistic question, we tested the impact of deleting lpp on the targeting of RcsF∆19-47. The lipoprotein Lpp, also known as the Braun lipoprotein, covalently tethers the outer membrane to the peptidoglycan and controls the size of the periplasm38,39. Being expressed at ~1 million copies per cell28, Lpp is numerically the most abundant protein in E. coli. Thus, by deleting lpp, we considerably decreased the load on the Lol system by removing its most abundant substrate. Remarkably, lpp deletion fully rescued the targeting of RcsF∆19-47 to the outer membrane (Fig. 4a), indicating that the linker functions in a Lol-dependent manner and suggesting that accumulation of RcsF∆19-47 in the inner membrane results from a decreased ability of the Lol system to process the linker-less RcsF variant. Importantly, similar results were obtained with NlpD∆29-64, which was also correctly targeted to the outer membrane in cells lacking Lpp (Fig. 4a). Pal∆26-56 could not be tested because membrane fractionation failed with lpp pal double mutant cells whether or not they expressed Pal∆26-56 (data not shown).

To obtain further insights into the mechanism at play here, we next monitored whether linker deletion impacted the transfer of RcsF from LolA to LolB in vitro. LolA with a C-terminal His-tag was expressed in the periplasm of cells expressing wild-type RcsF or RcsF∆19-47 and purified to near homogeneity via affinity chromatography (Methods; Extended Data Fig. 5). Both RcsF and RcsF∆19-47 were detected in immunoblots of the fractions containing purified LolA (Extended Data Fig. 5), indicating that both proteins form a soluble complex with LolA and confirming that they use this chaperone for transport across the periplasm.
LolB was expressed as a soluble protein in the cytoplasm and purified by taking advantage of a C-terminal Strep-tag; LolB was then incubated with LolA-RcsF or LolA-RcsF∆19-47 and pulled down using Streptactin beads (Methods). As both RcsF and RcsF∆19-47 were detected in the LolB-containing pull-down fractions (Fig. 4b), we conclude that both proteins were transferred from LolA to LolB. Thus, the linker is not required for the transfer of RcsF from LolA to LolB.

Finally, we focused on the LolCDE ABC transporter in charge of extracting outer membrane lipoproteins and transferring them to LolA. Over-expression (Extended Data Fig. 6a) of all components of this complex failed to rescue normal targeting of RcsF∆19-47 to the outer membrane (Extended Data Fig. 6b). Likewise, over-expressing the enzymes involved in lipoprotein maturation (Lgt, LspA, and Lnt; Fig. 1) had no impact on membrane targeting (Extended Data Fig. 7a, 7b). Thus, taken together, our results suggest that retention of RcsF∆19-47 in the inner membrane does not result from the impairment of a specific step, but rather from less efficient processing of the truncated lipoprotein by the entire lipoprotein maturation pathway (see Discussion).

Discussion

Lipoproteins are crucial for essential cellular processes such as envelope assembly and virulence. However, despite their functional importance and their potential as targets for new antibacterial therapies, we only have a vague understanding of the molecular factors that control their biogenesis. By discovering the role played by N-terminal disordered linkers in lipoprotein sorting, this study adds an important new layer to our comprehension of lipoprotein biogenesis in Gram-negative bacteria.
Critically, it also indicates that the current model of lipoprotein sorting (that sorting between the two membranes is controlled by the 2 or 3 residues that are adjacent to the lipidated cysteine40) needs to be revised. Lipoproteins with unstructured linkers at their N-terminus are commonly found in Gram-negative bacteria including many pathogens (see below); further work will be required to determine whether these linkers control lipoprotein targeting in organisms other than E. coli, laying the foundation for designing new antibiotics.

It was previously shown that both lolA and lolB (but not lolCDE) can be deleted under specific conditions21, suggesting at least one alternate route for the transport of lipoproteins across the periplasm and their delivery to the outer membrane. During this investigation, we envisaged the possibility that the linker could be required to transport lipoproteins via a yet-to-be-identified pathway independent of LolA/LolB. However, our observations that both RcsF and RcsF∆19-47 were found in complex with LolA (Extended Data Fig. 5) and were transferred by LolA to LolB (Fig. 4b) do not support this hypothesis. Instead, our data clearly indicate that lipoproteins with N-terminal linkers still depend on the Lol system for extraction from the inner membrane and transport to the outer membrane (Extended Data Fig. 1a); they also suggest that N-terminal linkers improve lipoprotein processing by Lol (see below).

We note that two of the lipoproteins under investigation here, Pal and RcsF, have been reported to be surface-exposed30,41,42. A topology model has been proposed to explain how RcsF reaches the surface: the lipid moiety of RcsF is anchored in the outer leaflet of the outer membrane while the N-terminal linker is exposed on the cell surface before being threaded through the lumen of β-barrel proteins42.
Thus, in this topology, the linker allows RcsF to cross the outer membrane. It is therefore tempting to speculate that N-terminal disordered linkers may be used by lipoproteins as a structural device to cross the outer membrane and reach the cell surface. It is worth noting that N-terminal linkers are commonly found in lipoproteins expressed by the pathogens Borrelia burgdorferi and Neisseria meningitidis24,43,44; lipoprotein surface exposure is common in these pathogens. In addition, the accumulation of RcsF∆19-47 in the inner membrane (Fig. 2a) also suggests that Lol may be using N-terminal linkers to recognize lipoproteins destined for the cell surface before their extraction from the inner membrane in order to optimize their targeting to the machinery exporting them to their final destination (BAM in the case of RcsF30,42,45). Investigating whether a dedicated Lol-dependent route exists for surface-exposed lipoproteins will be the subject of future research.

Our work also delivers crucial insights into the functional importance of disordered segments in proteins in general. Most proteins are thought to present portions that are intrinsically disordered. For instance, it is estimated that 30-50% of eukaryotic proteins contain regions that do not adopt a defined secondary structure in vitro46. However, demonstrating that these unstructured regions are functionally important in vivo is challenging. By showing that an N-terminal disordered segment downstream of the Lol signal is required for the correct sorting of lipoproteins, our work provides direct evidence that evolution has selected intrinsic disorder by function.

In conclusion, the data reported here establish that the triage of lipoproteins between the inner and outer membranes is not solely controlled by the Lol sorting signal; additional molecular determinants, such as protein intrinsic disorder, are also involved. Our data further highlight the previously unrecognized heterogeneity of the important lipoprotein family and call for a careful evaluation of the maturation pathways of these lipoproteins.

DATA AVAILABILITY
All data generated or analysed during this study are included in this published article and its supplementary information file.

REFERENCES
1. Silhavy, T.J., Kahne, D. & Walker, S. The bacterial cell envelope. Cold Spring Harb Perspect Biol 2, a000414 (2010).
2. Weiner, J.H. & Li, L. Proteome of the Escherichia coli envelope and technological challenges in membrane proteome analysis. Biochim Biophys Acta 1778, 1698-713 (2008).
3. Ricci, D.P. & Silhavy, T.J. Outer Membrane Protein Insertion by the β-barrel Assembly Machine. EcoSal Plus 8 (2019).
4. Chimalakonda, G. et al. Lipoprotein LptE is required for the assembly of LptD by the beta-barrel assembly machine in the outer membrane of Escherichia coli. Proc Natl Acad Sci U S A 108, 2492-7 (2011).
5. Sherman, D.J. et al. Lipopolysaccharide is transported to the cell surface by a membrane-to-membrane protein bridge. Science 359, 798-801 (2018).
6. Malinverni, J.C. et al. YfiO stabilizes the YaeT complex and is essential for outer membrane protein assembly in Escherichia coli. Mol Microbiol 61, 151-64 (2006).
7. Laloux, G. & Collet, J.F. "Major Tom to ground control: how lipoproteins communicate extra-cytoplasmic stress to the decision center of the cell". J Bacteriol (2017).
8. Kovacs-Simon, A., Titball, R.W. & Michell, S.L. Lipoproteins of bacterial pathogens. Infect Immun 79, 548-61 (2011).
9. Szewczyk, J. & Collet, J.F.
The Journey of Lipoproteins Through the Cell: One Birthplace, Multiple Destinations. Adv Microb Physiol 69, 1-50 (2016).
10. Babu, M.M. et al. A database of bacterial lipoproteins (DOLOP) with functional assignments to predicted lipoproteins. J Bacteriol 188, 2761-73 (2006).
11. Narita, S.I. & Tokuda, H. Bacterial lipoproteins; biogenesis, sorting and quality control. Biochim Biophys Acta Mol Cell Biol Lipids 1862, 1414-1423 (2017).
12. Horler, R.S., Butcher, A., Papangelopoulos, N., Ashton, P.D. & Thomas, G.H. EchoLOCATION: an in silico analysis of the subcellular locations of Escherichia coli proteins and comparison with experimentally derived locations. Bioinformatics 25, 163-6 (2009).
13. Tokuda, H. Biogenesis of outer membranes in Gram-negative bacteria. Biosci Biotechnol Biochem 73, 465-73 (2009).
14. Tokuda, H. & Matsuyama, S. Sorting of lipoproteins to the outer membrane in E. coli. Biochim Biophys Acta 1694, IN1-9 (2004).
15. Gennity, J.M. & Inouye, M. The protein sequence responsible for lipoprotein membrane localization in Escherichia coli exhibits remarkable specificity. J Biol Chem 266, 16458-64 (1991).
16. Terada, M., Kuroda, T., Matsuyama, S.I. & Tokuda, H. Lipoprotein sorting signals evaluated as the LolA-dependent release of lipoproteins from the cytoplasmic membrane of Escherichia coli. J Biol Chem 276, 47690-4 (2001).
17. Hara, T., Matsuyama, S. & Tokuda, H. Mechanism underlying the inner membrane retention of Escherichia coli lipoproteins caused by Lol avoidance signals. J Biol Chem 278, 40408-14 (2003).
18. Narita, S. & Tokuda, H. Amino acids at positions 3 and 4 determine the membrane specificity of Pseudomonas aeruginosa lipoproteins. J Biol Chem 282, 13372-8 (2007).
19. Lewenza, S., Mhlanga, M.M. & Pugsley, A.P. Novel inner membrane retention signals in Pseudomonas aeruginosa lipoproteins. J Bacteriol 190, 6119-25 (2008).
20.
Lorenz, C., Dougherty, T.J. & Lory, S. Correct Sorting of Lipoproteins into the Inner and Outer Membranes of Pseudomonas aeruginosa by the Escherichia coli LolCDE Transport System. mBio 10 (2019).
21. Grabowicz, M. & Silhavy, T.J. Redefining the essential trafficking pathway for outer membrane lipoproteins. Proc Natl Acad Sci U S A 114, 4769-4774 (2017).
22. Konovalova, A. & Silhavy, T.J. Outer membrane lipoprotein biogenesis: Lol is not the end. Philos Trans R Soc Lond B Biol Sci 370 (2015).
23. Wilson, M.M. & Bernstein, H.D. Surface-Exposed Lipoproteins: An Emerging Secretion Phenomenon in Gram-Negative Bacteria. Trends Microbiol 24, 198-208 (2016).
24. Zuckert, W.R. Secretion of bacterial lipoproteins: through the cytoplasmic membrane, the periplasm and beyond. Biochim Biophys Acta 1843, 1509-16 (2014).
25. Pride, A.C., Herrera, C.M., Guan, Z., Giles, D.K. & Trent, M.S. The outer surface lipoprotein VolA mediates utilization of exogenous lipids by Vibrio cholerae. mBio 4, e00305-13 (2013).
26. Valguarnera, E., Scott, N.E., Azimzadeh, P. & Feldman, M.F. Surface Exposure and Packing of Lipoproteins into Outer Membrane Vesicles Are Coupled Processes in Bacteroides. mSphere 3 (2018).
27. Sueki, A., Stein, F., Savitski, M.M., Selkrig, J. & Typas, A. Systematic Localization of Escherichia coli Membrane Proteins. mSystems 5 (2020).
28. Li, G.W., Burkhardt, D., Gross, C. & Weissman, J.S. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624-35 (2014).
29. Gonnet, P., Rudd, K.E. & Lisacek, F. Fine-tuning the prediction of sequences cleaved by signal peptidase II: a curated set of proven and predicted lipoproteins of Escherichia coli K-12. Proteomics 4, 1597-613 (2004).
30. Cho, S.H. et al. Detecting Envelope Stress by Monitoring beta-Barrel Assembly. Cell 159, 1652-64 (2014).
31. Heidrich, C.
et al. Involvement of N-acetylmuramyl-L-alanine amidases in cell separation and antibiotic-induced autolysis of Escherichia coli. Mol Microbiol 41, 167-78 (2001).
32. Uehara, T., Parzych, K.R., Dinh, T. & Bernhardt, T.G. Daughter cell separation is controlled by cytokinetic ring-activated cell wall hydrolysis. EMBO J 29, 1412-22 (2010).
33. Gerding, M.A., Ogata, Y., Pecora, N.D., Niki, H. & de Boer, P.A. The trans-envelope Tol-Pal complex is part of the cell division machinery and required for proper outer-membrane invagination during cell constriction in E. coli. Mol Microbiol 63, 1008-25 (2007).
34. Hussein, N.A., Cho, S.H., Laloux, G., Siam, R. & Collet, J.F. Distinct domains of Escherichia coli IgaA connect envelope stress sensing and down-regulation of the Rcs phosphorelay across subcellular compartments. PLoS Genet 14, e1007398 (2018).
35. Tsang, M.J., Yakhnina, A.A. & Bernhardt, T.G. NlpD links cell wall remodeling and outer membrane invagination during cytokinesis in Escherichia coli. PLoS Genet 13, e1006888 (2017).
36. Shrivastava, R., Jiang, X. & Chng, S.S. Outer membrane lipid homeostasis via retrograde phospholipid transport in Escherichia coli. Mol Microbiol 106, 395-408 (2017).
37. Farris, C., Sanowar, S., Bader, M.W., Pfuetzner, R. & Miller, S.I. Antimicrobial peptides activate the Rcs regulon through the outer membrane lipoprotein RcsF. J Bacteriol 192, 4894-903 (2010).
38. Cohen, E.J., Ferreira, J.L., Ladinsky, M.S., Beeby, M. & Hughes, K.T. Nanoscale-length control of the flagellar driveshaft requires hitting the tethered outer membrane. Science 356, 197-200 (2017).
39. Asmar, A.T. et al. Communication across the bacterial cell envelope depends on the size of the periplasm. PLoS Biol 15, e2004303 (2017).
40. Grabowicz, M.
Lipoprotein Transport: Greasing the Machines of Outer Membrane Biogenesis: Re-Examining Lipoprotein Transport Mechanisms Among Diverse Gram-Negative Bacteria While Exploring New Discoveries and Questions. Bioessays 40, e1700187 (2018).
41. Michel, L.V. et al. Dual orientation of the outer membrane lipoprotein Pal in Escherichia coli. Microbiology 161, 1251-9 (2015).
42. Konovalova, A., Perlman, D.H., Cowles, C.E. & Silhavy, T.J. Transmembrane domain of surface-exposed outer membrane lipoprotein RcsF is threaded through the lumen of beta-barrel proteins. Proc Natl Acad Sci U S A 111, E4350-8 (2014).
43. Brooks, C.L., Arutyunova, E. & Lemieux, M.J. The structure of lactoferrin-binding protein B from Neisseria meningitidis suggests roles in iron acquisition and neutralization of host defences. Acta Crystallogr F Struct Biol Commun 70, 1312-7 (2014).
44. Noinaj, N. et al. Structural basis for iron piracy by pathogenic Neisseria. Nature 483, 53-8 (2012).
45. Rodriguez-Alonso, R. et al. Structural insight into the formation of lipoprotein-beta-barrel complexes. Nat Chem Biol 16, 1019-1025 (2020).
46. Bardwell, J.C. & Jakob, U. Conditional disorder in chaperone action. Trends Biochem Sci 37, 517-25 (2012).
47. Majdalani, N., Hernandez, D. & Gottesman, S. Regulation and mode of action of the second small RNA activator of RpoS translation, RprA. Mol Microbiol 46, 813-26 (2002).
48. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2, 2006.0008 (2006).
49. Cherepanov, P.P. & Wackernagel, W. Gene disruption in Escherichia coli: TcR and KmR cassettes with the option of Flp-catalyzed excision of the antibiotic-resistance determinant. Gene 158, 9-14 (1995).
50. Gil, D. & Bouche, J.P. ColE1-type vectors with fully repressible replication. Gene 105, 17-22 (1991).
51. Yu, D. et al.
An efficient recombination system for chromosome engineering in Escherichia coli. Proc Natl Acad Sci U S A 97, 5978-83 (2000).
52. Sklar, J.G. et al. Lipoprotein SmpA is a component of the YaeT complex that assembles outer membrane proteins in Escherichia coli. Proc Natl Acad Sci U S A 104, 6400-5 (2007).
53. Miller, J.H. Experiments in Molecular Genetics, (Cold Spring Harbor Laboratory Press, New York, 1972).
54. Šali, A. & Blundell, T.L. Comparative Protein Modelling by Satisfaction of Spatial Restraints. Journal of Molecular Biology 234, 779-815 (1993).
55. Pettersen, E.F. et al. UCSF Chimera - A visualization system for exploratory research and analysis. Journal of Computational Chemistry 25, 1605-1612 (2004).
56. Guzman, L.M., Belin, D., Carson, M.J. & Beckwith, J. Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J Bacteriol 177, 4121-30 (1995).

ACKNOWLEDGMENTS
We thank Asma Boujtat for technical help. We are indebted to the members of the Collet laboratory and to Nassos Typas (EMBL, Heidelberg) for helpful suggestions and discussions and to Tom Silhavy (Princeton) for providing bacterial strains. J.S. was a research fellow of the FRIA and J.F.C. is an Investigator of the FRFS-WELBIO. This work was funded by the WELBIO, by grants from the F.R.S.-FNRS, from the Fédération Wallonie-Bruxelles (ARC 17/22-087), from the European Commission via the International Training Network Train2Target (721484), and from the EOS Excellence in Research Program of the FWO and FRS-FNRS (G0G0818N).

AUTHOR CONTRIBUTIONS
J.-F.C., J.E.R., J.S., and S.H.C. designed and performed the experiments. J.E.R., J.S., and S.H.C. constructed the strains and cloned the constructs. J.-F.C., J.E.R., J.S., S.H.C., and A.M. analyzed and interpreted the data. B.I.I. performed the structural analysis.
J.-F.C., J.E.R., and J.S. wrote the manuscript. All authors discussed the results and commented on the manuscript.

FIGURE LEGENDS

Figure 1. Structural analysis of lipoproteins reveals that half of outer membrane lipoproteins display an intrinsically disordered linker at the N-terminus.
Structures were generated via comparative modeling (Methods). X-ray and cryo-EM structures are green, NMR structures are cyan, and structures built via comparative modeling from the closest analog in the same PFAM group are orange. In all cases, the N-terminal linker is magenta. Lipoproteins targeting the outer membrane: Pal, OsmE, NlpE, NlpC, MltB, NlpI, MltC, RcsF, YajI, YcfL, YbaY, RlpA, NlpD, YcaL. The 29 remaining lipoproteins are shown in Extended Data Figure 2.

Figure 2. The N-terminal linker displayed by lipoproteins is important for outer membrane targeting.
a, b, c. The outer membrane (OM) and inner membrane (IM) were separated via centrifugation in a three-step sucrose density gradient (Methods). While (a) RcsFWT, (b) NlpDWT, and (c) PalWT were found predominantly in the OM, RcsF∆19-47, NlpD∆29-64, and Pal∆26-56 were substantially retained in the IM. Data are presented as the ratio of signal intensity in a single fraction to the total intensity in all fractions. All variants were expressed from plasmids (Extended Data Table 4). Lpp and DsbD were used as controls for the OM and IM, respectively. d. The Rcs system is constitutively active when RcsF's linker is missing. Rcs activity was measured with a beta-galactosidase assay in a strain harboring a transcriptional rprA::lacZ fusion (Methods). Results were normalized to expression levels of RcsF variants (mean ± standard deviation; n = 6 biologically independent experiments). e. Phase-contrast images of the envC::kan ∆nlpD mutant complemented with NlpDWT or NlpD∆29-64.
NlpD∆29-64 only partially rescues the chaining phenotype of the envC::kan ∆nlpD double mutant. Scale bar, 5 µm. f. Expression of Pal∆26-56 does not rescue the sensitivity of the pal::kan mutant to SDS-EDTA. Cells were grown in LB medium at 37 °C until OD600 = 0.5. Tenfold serial dilutions were made in LB, plated onto LB agar or LB agar supplemented with 0.01% SDS and 0.5 mM EDTA, and incubated at 37 °C. Images in a, b, c, e, and f are representative of biological triplicates. Graphs in a, b, and c were created by spline analysis of curves representing a mean of three independent experiments.

Figure 3. The length and the disordered character of the RcsF linker play key roles in RcsF targeting to the outer membrane.
a. The outer membrane (OM) and inner membrane (IM) were separated via centrifugation in a three-step sucrose density gradient (Methods). Lpp and DsbD were used as controls for the OM and IM, respectively. The longer the linker, the more protein was correctly translocated to the OM. Bar graphs denote mean ± standard deviation of n = 3 biologically independent experiments. Images are representative of experiments and immunoblots performed in biological triplicate. b. Rcs activity was measured with a beta-galactosidase assay in a strain harboring a transcriptional rprA::lacZ fusion (Methods). Results were normalized to expression levels of RcsF variants (mean ± standard deviation of n = 6 biologically independent experiments). Rcs activity relates to the quantity of RcsF retained in the inner membrane. c. RcsF mutants harboring alpha helical linkers (RcsFFkpA and RcsFcol) were subjected to two consecutive centrifugations in sucrose density gradients (Methods). Both mutants were inefficiently translocated from the IM to the OM (mean ± standard deviation of n = 3 biologically independent experiments).
Images are representative of experiments and immunoblots performed in biological triplicate. d. The Rcs system was constitutively active in RcsFFkpA and RcsFcol strains; activation levels were comparable to those of RcsF∆19-47. Rcs activity was measured as in b. Results were normalized as in b.

Figure 4. N-terminal disordered linkers interact with the Lol system to target lipoproteins to the outer membrane.
a. Deleting Lpp rescues normal targeting of RcsF∆19-47 and NlpD∆29-64 to the outer membrane. The outer and inner membranes were separated via centrifugation in a sucrose density gradient (Methods). Whereas RcsF∆19-47 and NlpD∆29-64 accumulate in the inner membrane of cells expressing Lpp, the most abundant Lol substrate, they are normally targeted to the outer membrane in cells lacking Lpp (mean ± standard deviation of n = 3 biologically independent experiments). b. In vitro pull-down experiments show that RcsFWT and RcsF∆19-47 are transferred from LolA to LolB. LolA-RcsFWT and LolA-RcsF∆19-47 complexes were obtained by LolA-His affinity chromatography followed by size exclusion chromatography (Methods). Each complex was incubated with LolB-Strep that was previously purified via Strep-Tactin affinity chromatography (Methods). Both RcsF variants were eluted in complex with LolB-Strep, while LolA was only present in the flow through. I, input; FT, flow through; E, eluate.

FIGURES
[Figures 1-4: images not included]

METHODS

Bacterial growth conditions
Bacterial strains used in this study are listed in Extended Data Table 3. Bacterial cells were cultured in Luria broth (LB) at 37 °C unless stated otherwise. The following antibiotics were added when appropriate: spectinomycin (100 µg/mL), ampicillin (200 µg/mL), chloramphenicol (25 µg/mL), and kanamycin (50 µg/mL).
L-arabinose (0.2%) and isopropyl-β-D-thiogalactoside (IPTG) were used for induction when appropriate.

Bacterial strains and plasmids
DH300 (a derivative of Escherichia coli MG1655 carrying a chromosomal rprA::lacZ fusion at the λ attachment site⁴⁷) was used as wildtype throughout the study. All deletion mutants were obtained by transferring the corresponding alleles from the Keio collection⁴⁸ (kanR) into DH300⁴⁷ via P1 phage transduction. Deletions were verified by PCR and the absence of the protein was verified via immunoblotting (when possible). If necessary, the kanamycin cassette was removed via site-specific recombination mediated by the yeast Flp recombinase with pCP20 vector⁴⁹. All strains expressing the RcsF mutants used for subcellular fractionation lacked rcsB in order to prevent induction of Rcs.

The plasmids used in this study are listed in Extended Data Table 4 and the primers appear in Extended Data Table 5. RcsF, Pal, and NlpD were expressed from the low-copy vector pAM238⁵⁰ containing the SC101 origin of replication and the lac promoter. To produce pSC202 for RcsF expression, rcsF (including approximately 30 base pairs upstream of the coding sequence) was amplified by PCR from the chromosome of DH300 (primer pair SH_RcsF(PstI)-R and SH_RcsFU-R (kpnI)-F). The amplification product was digested with KpnI and PstI and inserted into pAM238, resulting in pSC202. nlpD was amplified using primers JR1 and JR2 and pal was amplified with primers JS145 and JS146. Amplification products were digested with PstI-XbaI and KpnI-XbaI, respectively, generating pJR8 (for NlpD expression) and pJS20 (for Pal expression).
To clone rcsFΔ19-47, the nucleotides encoding the RcsF signal sequence were amplified using primers SH_RcsFUR(kpnI)_F and SH_RcsFss-Fsg (NcoI)_R, and those encoding the RcsF signaling domain were amplified using primers SH_RcsFss-Fsg (NcoI)_F and SH_RcsF(PstI)_R. In both cases, pSC202 was used as template. Then, overlapping PCR was performed on the two PCR products using SH_RcsFUR(kpnI)_F and SH_RcsF(PstI)_R. The final product was digested with KpnI and PstI, and ligated with pAM238 pre-digested with the same enzymes, yielding pSC201. To add a GS linker (Ser-Gly-Ser-Gly-Ser-Gly-Ala-Met) into pSC201, the primers SH_GS linker_F and SH_GS linker_R were mixed, boiled, annealed at room temperature, and ligated with pSC201 pre-digested with NcoI, generating pSC198. pSC199 was generated similarly, but using primers SH_SG linker_F and SH_SG linker_R and plasmid pSC198. pSC200 was generated using primers SH_Da linker_F and SH_SG linker_R and plasmid pSC199. The pal allele lacking the linker region (palΔ26-56) was created via overlapping PCR. The pJS20 plasmid served as template for PCR with the M13R/M13F external primers and JS152/JS153 internal primers. The truncated allele was cloned into pAM238 at the same restriction sites as the full-length allele, producing pJS24. The nlpD allele lacking the linker region (nlpDΔ29-64) was created via overlapping PCR. E. coli chromosomal nlpD served as template for the PCR, with JR1/JR2 as external primers and JR7/JR8 as internal primers. The truncated allele was then cloned into pAM238 at the same restriction sites as the full-length allele, producing pJR10.

rcsFFkpA and rcsFcol were obtained by inserting DNA sequences corresponding to helical linker fragments (FkpA Ser94-Glu125 and colicin Ia Ile213-Lys282) into rcsFΔ19-47 at NcoI and RsrII restriction sites. The fkpA gene fragment was amplified from the E.
coli MC4100 chromosome (JS50/JS51 primers) and the cia gene fragment was chemically synthesized as a gene block by Integrated DNA Technologies (IDT). The resulting plasmids were pJS18 and pJS27, respectively. pAM238 does not contain the lacIq repressor. Therefore, to enable expression-level regulation by IPTG, strains containing the pAM238 plasmids expressing RcsF variants were co-transformed with pET22b, a high-copy plasmid from a different incompatibility group (pBR322 origin of replication; Novagen) containing the lacIq repressor. Chromosomal insertion of RcsFΔ19-47 was performed via λ-Red recombineering⁵¹ with pSIM5-Tet plasmid (a gift of D. Hughes). In the first step, the cat-sacB cassette was introduced; it was later replaced by mutant rcsF.

The chromosomal lolCDE operon was amplified via PCR using primers JS277 and JS278 (adding a C-terminal His-tag to LolE) and then inserted into pBAD33 using the restriction sites PstI and XbaI, resulting in pJR203. The expression level of LolE-His was verified via immunoblotting.

The sequence encoding lolB without its N-terminal cysteine was first amplified from the chromosome via PCR using primers JR50/PL387 (adding a C-terminal Strep-tag). It was then cloned into pET28a using the restriction sites XbaI and PstI. lolA was amplified using chromosomal lolA as PCR template for primers JR30/JR31 (JR31 contains the sequence of a His-tag) and then cloned into pBAD18 using KpnI and XbaI, resulting in pJR48.

The genes encoding Lgt and Lnt were amplified from the chromosome with PCR primers AG389/AG403 and AG393/JR74, respectively. AG403 and JR74 also encode a Myc-tag. PCR products were cloned into pAM238 using KpnI and PstI. Expression levels were verified via immunoblotting (data not shown). lspA was amplified with PCR primers JR77/JR78.
The PCR product was cloned into pSC213, a modified pAM238 with a ribosome binding site and a C-terminal Flag tag, using NcoI and BamHI. Expression of LspA-Flag was induced by adding 25 µM IPTG. Expression levels were verified with immunoblots (data not shown).

Cell fractionation and sucrose density gradients
Cell fractionation was performed as described previously⁵² with some modifications. Four hundred milliliters of cells were grown until the optical density at 600 nm (OD600) of the culture reached 0.7. Cells were harvested via centrifugation at 6,000 x g at 4 °C for 15 min, washed with TE buffer (50 mM Tris-HCl pH 8, 1 mM EDTA), and resuspended in 20 mL of the same buffer. The washing step was skipped with the Δlpp strains to prevent the loss of outer membrane vesicles. DNase I (1 mg; Roche), 1 mg RNase A (Thermo Scientific), and a half tablet of a protease inhibitor cocktail (cOmplete EDTA-free Protease Inhibitor Cocktail tablets; Roche) were added to cell suspensions, and cells were passed through a French pressure cell at 1,500 psi. After adding MgCl2 to a final concentration of 2 mM, the lysate was centrifuged at 5,000 x g at 4 °C for 15 min in order to remove cell debris. Then, 16 mL of supernatant were placed on top of a two-step sucrose gradient (2.3 mL of 2.02 M sucrose in 10 mM HEPES pH 7.5 and 6.6 mL of 0.77 M sucrose in 10 mM HEPES pH 7.5). The samples were centrifuged at 180,000 x g for 3 h at 4 °C in a 55.2 Ti Beckman rotor. After centrifugation, the soluble fraction and the membrane fraction were collected. The membrane fraction was diluted fourfold with 10 mM HEPES pH 7.5. To separate the inner and the outer membranes, 7 mL of the diluted membrane fraction were loaded on top of a second sucrose gradient (10.5 mL of 2.02 M sucrose, 12.5 mL of 1.44 M sucrose, 7 mL of 0.77 M sucrose, always in 10 mM HEPES pH 7.5).
The samples were then centrifuged at 112,000 x g for 16 h at 10 °C in a SW 28 Beckman rotor. Approximately 30 fractions of 1.5 mL were collected and odd-numbered fractions were subjected to SDS-PAGE, transferred onto a nitrocellulose membrane, and probed with specific antibodies. Graphs were created in GraphPad Prism 9 via spline analysis of the curves representing a mean of three independent experiments.

Immunoblotting
Protein samples were separated via 10% or 4-12% SDS-PAGE (Life Technologies) and transferred onto nitrocellulose membranes (GE Healthcare Life Sciences). The membranes were blocked with 5% skim milk in 50 mM Tris-HCl pH 7.6, 0.15 M NaCl, and 0.1% Tween 20 (TBS-T). TBS-T was used in all subsequent immunoblotting steps. The primary antibodies were diluted 5,000 to 20,000 times in 1% skim milk in TBS-T and incubated with the membrane for 1 h at room temperature. The anti-RcsF, anti-DsbD, anti-Lpp, anti-NlpD, anti-LolA, and anti-LolB antisera were generated by our lab. Anti-Pal was a gift from R. Lloubès, and anti-His is a peroxidase-conjugated antibody (Qiagen). The membranes were incubated for 1 h at room temperature with horseradish peroxidase-conjugated goat anti-rabbit IgG (Sigma) at a 1:10,000 dilution. Labelled proteins were detected via enhanced chemiluminescence (Pierce ECL Western Blotting Substrate, Thermo Scientific) and visualized using X-ray film (Fuji) or a camera (ImageQuant LAS 4000 and Vilber Fusion solo S). In order to quantify protein levels, band intensities were measured using ImageJ version 1.46r (National Institutes of Health).

β-galactosidase assay
β-galactosidase activity was measured as described previously⁵³. Graphs representing a mean of six experiments with standard deviation were prepared in GraphPad Prism. Expression-level estimations were performed as follows.
Cultures used for the β-galactosidase activity assay (0.5 mL per culture) were precipitated with 10% trichloroacetic acid, washed with ice-cold acetone, and resuspended in 0.2 mL Laemmli SDS sample buffer. Samples (5 µL) were subjected to SDS-PAGE and immunoblotted with anti-RcsF antibody.

SDS-EDTA sensitivity assay
Cells were grown in LB at 37 °C until they reached an OD600 of 0.7. Tenfold serial dilutions were made in LB and plated on LB agar supplemented with spectinomycin (100 µg/mL) when necessary. Plates were incubated at 37 °C. To evaluate the sensitivity of the pal mutant, plates were supplemented with 0.01% SDS and 0.5 mM EDTA.

Microscopy image acquisition
Cells were grown in LB at 37 °C until OD600 = 0.5. Cells growing in exponential phase were spotted onto a 1% agarose phosphate-buffered saline pad for imaging. Cells were imaged on a Nikon Eclipse Ti2-E inverted fluorescence microscope with a CFI Plan Apochromat DM Lambda 100X Oil, N.A. 1.45, W.D. 0.13 mm objective. Images were collected on a Prime 95B 25 mm camera (Photometrics). We used a Cy5-4050C (32 mm) filter cube (Nikon). Image acquisition was performed with NIS-Elements Advanced Research version 4.5.

Protein purification
JR90 cells were grown in LB supplemented with kanamycin (50 µg/mL) at 37 °C. When the culture reached OD600 = 0.5, the expression of cytoplasmic LolB-Strep was induced with 1 mM IPTG. Cells (1 L) were pelleted when they reached OD600 = 3 and resuspended in 25 mL of buffer A (200 mM NaCl and 50 mM NaPi, pH 8) containing one tablet of cOmplete EDTA-free Protease Inhibitor Cocktail (Roche). Cells were lysed via two passages through a French pressure cell at 1,500 psi. The lysate was centrifuged at 30,000 x g for 40 min at 4 °C in a JA 20 rotor and the supernatant was mixed with Strep-Tactin resin (IBA Lifesciences) previously equilibrated with buffer A.
After washing the resin with 10 column volumes of buffer A, LolB-Strep was eluted with 5 column volumes of buffer A supplemented with 5 mM desthiobiotin. LolB-Strep was finally desalted using a PD10 column (GE Healthcare).

Soluble LolA-RcsFWT and LolA-RcsFΔ19-47 complexes were purified via affinity chromatography as follows. Cells co-expressing LolA either with wild-type RcsF (JR47) or RcsFΔ19-47 (JR44) were grown in LB at 37 °C supplemented with 200 µg/mL ampicillin until OD600 = 0.5. Protein expression was then induced with 0.2% arabinose. Cells (1 L) were pelleted at OD600 = 3 and resuspended in 25 mL of buffer A containing one tablet of protease inhibitor cocktail. Cells were lysed via two passages through a French pressure cell at 1,500 psi. The lysate was centrifuged at 45,000 x g for 30 min at 4 °C using a 55.2 Ti Beckman rotor. To obtain the soluble fraction, the supernatant was centrifuged at 180,000 x g for 1 h at 4 °C using the same rotor. The supernatant was added to a HisTrap HP column (Merck) previously equilibrated with buffer A. The column was washed with 10 column volumes of buffer A supplemented with 20 mM imidazole and LolA-His was eluted using a gradient of imidazole (from 20 mM to 300 mM). The fractions obtained were analyzed via SDS-PAGE; LolA was detected around 25 kDa (data not shown). RcsF variants were detected via immunoblotting with an anti-RcsF antibody. Fractions containing LolA-RcsF variants were pooled, concentrated to 1 mL using a Vivaspin 4 Turbo concentrator (cut-off 5 kDa; Sartorius), and purified via size-exclusion chromatography with a Superdex S75-10/300 column (GE Healthcare).

Pull down and transfer of RcsF variants from LolA to LolB
LolB-Strep was incubated at 30 °C for 20 min under agitation with LolA-RcsFWT or with LolA-RcsFΔ19-47 (LolA-RcsFWT and LolA-RcsFΔ19-47 complexes were purified as described above).
The mixture was added to magnetic Strep beads (MagStrep type 3 beads, IBA Lifesciences) previously equilibrated with buffer A and incubated for 30 min at 4 °C on a roller. After washing the beads with the same buffer, LolB-Strep was eluted with buffer A supplemented with 50 mM biotin. Samples were analyzed via SDS-PAGE and LolA and LolB were detected with Coomassie Brilliant Blue (Bio-Rad). RcsF was detected via immunoblotting with an anti-RcsF antibody.

Structural analysis of lipoproteins
When X-ray, cryo-EM, or NMR structures were available, the missing residues were completed through comparative modeling using MODELLER version 9.22⁵⁴. If no structure of the lipoprotein was available, then the most pertinent analogous structure from proteins belonging to the same PFAM group was used as template for comparative modeling. The linker was defined as the unstructured fragment from the N-terminal Cys of the mature form until the first residue with well-defined secondary structure (α-helix or β-strand) belonging to a globular domain. Short, intermediate, and long linkers had lengths of <12, 12-22, and >22 residues, respectively. Images were generated using UCSF Chimera version 1.13.1⁵⁵.

LEGENDS FOR FIGURES IN THE EXTENDED DATA

Extended Data Figure 1. Lipoprotein maturation and sorting in the E. coli cell envelope.
a. After processing by Lgt (step 1), LspA (step 2), and Lnt (step 3), a new lipoprotein either remains in the inner membrane or is extracted by the LolCDE complex (step 4), depending on the residues at position +2 and +3. LolCDE transfers the lipoprotein to the periplasmic chaperone LolA (step 5), which delivers the lipoprotein to LolB (step 6). LolB, a lipoprotein itself, inserts the lipoprotein in the outer membrane using a poorly understood mechanism (step 7). b. Schematic of lipoprotein structural domains.
The N-terminal signal sequence targets the lipoprotein to the cell envelope; the last four amino acid residues of the signal sequence form the lipobox. The last residue of the lipobox is the invariant cysteine that undergoes lipidation. This cysteine, which is the first residue of the mature lipoprotein, is directly followed by the sorting signal, a sequence of 2 or 3 amino acids that controls the sorting of mature lipoproteins between the inner and outer membranes. The C-terminal portion of a mature lipoprotein is a globular domain. An intrinsically disordered linker separates the sorting signal from the globular domain in about half of E. coli lipoproteins (Fig. 1; Extended Data Fig. 2; Extended Data Table 1). The lengths of the deleted disordered linkers of the unrelated lipoproteins RcsF, Pal, and NlpD are indicated. LP, lipoprotein.

Extended Data Figure 2. Structural analysis of lipoproteins reveals that half of outer membrane lipoproteins display an intrinsically disordered linker at the N-terminus.
Structures were generated via comparative modeling. X-ray and cryo-EM structures are green, NMR structures are cyan, and structures built via comparative modeling from the closest analog in the same PFAM group are orange. In all cases, the N-terminal linker is magenta. Lipoproteins targeting the outer membrane: AmiD, BamB, BamC, HslJ, MltA, LoiP, LpoB, Blc, BamE, CsgG, EmtA, GfcE, BamD, LpoA, LolB, LptE, MlaA, MliC, YddW, YedD, YghG, YfeY, YbjP, YiaD, YbhC, PqiC, YgeR, YfiB, YraP. Lipoproteins targeting the IM: DcrB, MetQ, NlpA, YcjN, YehR, ApbE. Synthetic constructs: RcsFGS, RcsFGS2, RcsFGS3, RcsF∆19-47, RcsFFkpA, RcsFcol, NlpD∆29-64, Pal∆26-56.

Extended Data Figure 3. Expression levels of RcsF∆19-47, Pal∆26-56, and NlpD∆29-64.
Cells were grown at 37 °C in LB until OD600 = 0.5 and precipitated with trichloroacetic acid (Methods).
Immunoblots were performed with α-RcsF, α-NlpD, and α-Pal antibodies (Methods). All images are representative of three independent experiments.

Extended Data Figure 4. Schematic of RcsF variants used in this study and their distributions in the outer membrane (OM) and inner membrane (IM).
RcsFGS, RcsFGS2, and RcsFGS3 have linkers that are disordered and mostly consist of GS repeats. The linker of RcsFGS is the same length as the linker of RcsFWT. The linkers of RcsFGS2 and RcsFGS3 are shorter than that of RcsFWT. Regions of RcsFFkpA and RcsFcol fold into alpha helices borrowed from the sequences of FkpA and colicin Ia, respectively.

Extended Data Figure 5. Complexes between LolA and RcsFWT or RcsF∆19-47 can be purified.
Both RcsFWT (a) and RcsF∆19-47 (b) were eluted in complex with LolA-His via affinity chromatography followed by size exclusion chromatography. Gel filtration was performed with a Superdex S75-10/300 column. Samples were analyzed via SDS-PAGE and proteins, including LolA-His, were stained with Coomassie Brilliant Blue (Methods). RcsF variants were detected by immunoblotting fractions with α-RcsF antibodies. Images are representative of three independent experiments.

Extended Data Figure 6. Overexpression of LolCDE does not restore targeting of RcsF∆19-47.
a. Expression level of LolCDE-His. Cells were grown in LB plus 0.2% arabinose at 37 °C until OD600 = 0.7 (Methods). Membrane and soluble fractions were separated with a sucrose density gradient (Methods). LolE-His was detected in the membrane fraction by immunoblotting with α-His (Methods). Images are representative of three independent experiments. b. The outer membrane (OM) and inner membrane (IM) were separated with a sucrose density gradient. Expression of LolCDE did not rescue OM targeting of RcsF∆19-47. Images are representative of experiments performed in biological triplicate.

Extended Data Figure 7. Overexpressing Lgt, LspA, and Lnt does not rescue the targeting of RcsF∆19-47 to the outer membrane.
a. Expression levels of Lgt, LspA, and Lnt. Cells were grown in LB (plus 25 µM IPTG for cells expressing LspA) at 37 °C until OD600 = 0.7 (Methods). Outer membrane (OM) and inner membrane (IM) were separated with a sucrose density gradient (Methods). Lgt-Myc and Lnt-Myc were detected in the IM via immunoblotting with α-Myc. LspA-Flag was detected in the IM with α-Flag. b. Membranes from cells overexpressing Lgt, LspA, or Lnt were separated with a sucrose density gradient (Methods). RcsF∆19-47 was retained in the IM in all conditions. Images are representative of three independent experiments.

EXTENDED DATA FIGURES

Extended Data Figure 1
Extended Data Figure 2
Extended Data Figure 3
Extended Data Figure 4
Extended Data Figure 5
Extended Data Figure 6
Extended Data Figure 7

EXTENDED DATA TABLES

Extended Data Table 1: List of the verified lipoproteins of E. coli used for the structural analysis in this study.
Attached Excel sheet

Extended Data Table 2: RcsF mutants used in this study and the amino acid sequences of their corresponding N-terminal linkers. The acylated cysteine is the first residue listed.

RcsF linkers    Amino acid sequence
RcsFWT          CSMLSRSPVEPVQSTAPQPKAEPAKPKAPRATPV
RcsFΔ19-47      CSMGPV
RcsFGS          CSMSLFDAPAMSGSGSGAMSGSGSGAMPV
RcsFGS2         CSMSGSGSGAMSGSGSGAMPV
RcsFGS3         CSMSGSGSGAMPV
RcsFFkpA        CSMGSDQEIEQTLQAFEARVKSSAQAKMEKDAADNEPV
RcsFcol         CSMGILDTRLSELEKNGGAALAVLDAQQARLLGQQTRNDRAISEARNKLSSVTESLNTARNALTRAEQQLTQQKPV

Extended Data Table 3: E. coli strains used in this study.
860 Strains Genotype and description Source DH300 rprA-lacZ MG1655 (argF-lac) U169 47 Keio collection single mutants rcsF::kan, rcsB::kan, pal::kan, nlpD::kan, envC::kan 48 XL1-Blue endA1 gyrA96 (nalR) thi-1 recA1 relA1 lac glnV44F’ [::Tn10 proAB+ lacIq D(lacZ)M15] hsdR17 (rK- mK+) Stratagene BL21 F- ompT hsdSB (rB- mB-) gal dcm (DE3) Novagen JS41 DH300 DrcsF pAM238 This study JS265 DH300 DrcsF pJS18 This study JS346 DH300 DrcsF rcsB::kan pET22b This study JS267 JS346 pJS18 This study JS325 DH300 pal::kan This study JS331 JS325 pJS20 This study JS345 JS325 pJS24 This study JS360 DH300 DrcsF pJS27 This study JS363 JS346 pJS27 This study JS364 DH300 DrcsF pSC202 This study JS372 DH300 DrcsF pSC201 This study JS395 JS346 pSC198 This study JS396 JS346 pSC199 This study JS397 JS346 pSC200 This study JS398 JS346 pSC201 This study JS573 JS346 pSC202 This study JS574 DH300 DrcsF pSC198 This study JS575 DH300 DrcsF pSC199 This study 47 JS576 DH300 DrcsF pSC200 This study JS639 DrcsB lpp::kan rcsF::rcsFD19-47 This study JR30 nlpD::kan This study JR31 JR30 pJR8 This study JR32 JR30 pJR10 This study JR2 DH300 pAM238 This study JR88 BL21 rcsF::kan This study JR90 JR88 pET28-cytoplasmic LolB-Strep This study JR187 rcsB::kan rcsF::rcsFD19-47 This study JR149 DnlpD This study JR121 DnlpD envC::kan This study JR122 JR121 pJR8 This study JR123 JR121 pJR10 This study JR188 JR187 pAM238 This study JR191 JR187 pAG833 This study JR204 JR187 pJR203 This study JR194 JR187 pBAD33 This study JR211 JR187 pJR209 This study JR257 JR187 pJR239 This study JR274 JR149 lpp::kan This study JR279 JR274 pJR10 This study JR292 JS325 pAM238 This study JR293 JR187 pSC213 This study JR44 rcsB::kan rcsF::rcsFD19-47 pJR48 This study 48 JR47 rcsB::kan pJR48 This study JR77 rcsB::kan rcsF::rcsFD19-47 pBAD18 This study JR78 rcsB::kan pBAD18 This study 861 862 49 Extended Data Table 4: Plasmids used in this study. 
863 Plasmids Features Source pAM238 IPTG-regulated Plac, pSC101-based, spectinomycin (no lacIQ) 50 pBAD18 Arabinose inducible PBAD, ampicillin 56 pBAD33 Arabinose inducible PBAD, chloramphenicol 56 pET28a IPTG regulated T7 promoter, kanamycin Novagen pET22b IPTG regulated T7 promoter, ampicillin Novagen pCP20 FLP+, l cI857+, l PR Repts, ampicillin, chloramphenicol 49 pSIM5-Tet pSC101 plasmid, repAts, tetRA, l-Red (Gram-Beta-Exo), cI857, tetracycline Gift from D. Hughes pJS18 pAM238 RcsFFKpA FkpA linker (S94-E125) This study pJS20 pAM238 PalWT This study pJS24 pAM238 PalD26-56 This study pJS27 pAM238 RcsFcol Colicin Ia linker (I213-K282) This study pSC198 pAM238 RcsFGS3 (C16S17M18S19GSGSGAMG) This study pSC199 pAM238 RcsFGS2 (C16S17M18S19GSGSGAMSGSGSGAM G) This study pSC200 pAM238 RcsFGS (C16S17M18S19LFDAPAMSGSGSGAM SGSGSGAMG) This study pSC201 pAM238 RcsFD19-47 (C16S17M18G19P20) This study pSC202 pAM238 RcsFWT This study pJR8 pAM238 NlpDWT This study pJR10 pAM238 NlpDD29-64 (C26S27D28A29) This study pJR48 pBAD18 LolA-6xHis This study pJR90 pET28 Cytoplasmic LolB-Strep This study 50 pJR203 pBAD33 LolCDE-6xHis This study pJR209 pAM238 Lnt-Myc This study pJR239 pSC213 LspA-Flag This study pSC213 pAM238, IPTG-regulated Plac , lacIQ, triple Flag tag This study pAG833 pAM238 Lgt-Myc This study 864 865 866 867 Extended Data Table 5: Primers used in this study. 
868 Primer Sequence 5’ to 3’ JS50_FkpAlinker _fw acatccatggggtccgaccaagagatcgaac JS51_FkpAlinker _rv atgtcggaccggttcgttatcagccgcgtc JS143_Pal_-100b cgtcttccggcaactgatgg JS144_Pal_+100b ttggtgcctgagcaaaagcg JS145_Pal_fw ACATggtaccTTAATTGAATAGTAAAGGAATC JS146_Pal_rv ATGTtctagaTTAgtaaaccagtaccgcac JS152_PalNoLink er_overlapPCR_ fw tgttcttccaacCAGGCTCGTCTGCAAATG JS153_PalNoLink er_overlapPCR_ rv CAGACGAGCCTGgttggaagaacatgccgc JS277_LolCDEHi s_fw ACATtctagaTCTTTGCTACAGCAACCAGAC JS278_LolCDE_ His_rv ATGTctgcagTTAGTGATGGTGATGGTGATGACCctggccgctaaggactcg JS289_lred_catSa cBin_RcsF_fw tcctgattcaatattgacgttttgatcatacattgaggaaatactAAAATGAGACGTTGATCGG CACG 51 JS290_lred_catSa cBin_RcsF_rev tatagggcgagcgaataacgcctatttgctcgaactggaaactgcATCAAAGGGAAAACTGT CCA JS291_lred_RcsF _catSacBout_fw tcctgattcaatattgacgttttgatcatacattgaggaaatactATGCGTGCTTTACCGATCTG TT JS292_lred_RcsF _catSacBout_rv tatagggcgagcgaataacgcctatttgctcgaactggaaactgcTCATTTCGCCGTAATGTT AAGC JS293_junction1lr ed_RcsFup_fw gcggagctgttaaaggctg JS294_junction2lr ed_RcsFdown_rv gagcaatgagatgcagttcg JS295_junction1lr ed_cat-out_rv CGGGCAAGAATGTGAATAAAGG JS296_junction2lr ed_sacB-out_fw GCTGTACCTCAAGCGAAAGG M13R CAGGAAACAGCTATGACCATG M13F TGTAAAACGACGGCCAGT PL145_rcsF_- 100b cgctttttaccagacctggc PL146_rcsF_+10 0 atatcattcaggacgggcgcttgccc PL153_rcsB_- 100b acatctgattcgtgagaagg PL154_rcsB+100 b taatgggaatcgtaggccgg PL168_Fw_lpp_- 100 CAATTTTTTTATCTAAAACCCAGCG PL169_Rv_lpp_+ 100 CCAGAGCAAGGGAATATGTTACGCG SH_Da linker_F CATGaGcTTATTCGACGCGCCGGc SH_Da linker_R catggCCGGCGCGTCGAATAAgCt SH_RcsF(PstI)_R gagaCTGCAGtcaTTTCGCCGTAATGTTAAG SH_RcsFUR(kpn I)_F GAGGGTACCcgttttgatcatacattg RcsFss-Fsg (NcoI)_F GCGGCTGTTCCATGGggccggtccgaatttatac RcsFss-Fsg (NcoI)_R ggaccggccCCATGGAACAGCCGCTTAGCATGAG SH_GS linker_F CATGagtggctctggatctggtgc 52 SH_GS linker_R catggcaccagatccagagccact JR1_NlpD_fw GAGATCTAGATTATTAACCAATTTTTCCTGGGGGATAA JR2_NlpD_rv AGAGCTGCAGTTATCGCTGCGGCAAATAACGCA JR7_NlpDoverlap _fw GGCTGGCAGGCTGTTCTGACGCGCAGCAACCGCAAATTCA 
JR8_NlpDoverlap _rv TGAATTTGCGGTTGCTGCGCGTCAGAACAGCCTGCCAGCC JR23_Fw_NlpD- 98 CAGGTCAGCGTATCGTGAACATC JR24_Rv_NlpD+ 100 TCATTTAAATCATGAACTTTCAGCG JR30_Fw_LolA_- 28_pBAD18 ACATGGTACCCGGGAGTGACGTAATTTGAGGAAT JR31_Rev_LolA_ His_pBAD18 ATGTTCTAGAttaatgatgatgatgatgatgctcgaGCTTACGTTGATCATCTACC GTGAC JR50_Rev_cytopl asmic_LolB_nost op_StrepTag_stop CCAACTCGAGTCACTTTTCGAACTGCGGGTGGCTCCAGCTTGCTTT CACTATCCAGTTATCCAT JR56-Fw--100- envC GTTGTCGCTG ATGGGTA JR57-Rev- +100envC AATCATCAATGACGATGGCA JR74-Rev-Lnt- myctag-PstI AAAAACTGCAGctacaggtcttcttcgctaatcagtttctgttcgcttgcTTTACGTCGCTG ACGCAGAC JR77-Fw-NcoI- LspA gagaCCATGGgtAGTCAATCGATCTGTTCAAC JR78-Rev-LspA- no stop-BamHI gagaGGATCCTTGTTTTTTCGCTCTAG AG389_lgt_- 49_Fw_KpnI AAAAAggtaccTTCAATCGCTGTTCTCTTTC AG393_lnt_- 49_Fw_KpnI AAAAAggtaccACCCCAGCCGAAGCTGGATG AG403_lgt_myc CT_PstI AAAAACTGCAGctacaggtcttcttcgctaatcagtttctgttcgcttgcGGAAACGTGTT GCTGTGGGC PL387- LolBwoss-Fw- NcoI acacCCATGGccgttaccacgcccaaagg ColicinIalinker_ geneBLOCK acatccatggggATTCTGGACACGCGGTTGTCAGAGCTGGAAAAAAATG GCGGGGCAGCCCTTGCCGTTCTTGATGCACAACAGGCCCGTCTGC TCGGGCAGCAGACACGGAATGACAGGGCCATTTCAGAGGCACGG AATAAACTCAGTTCAGTGACGGAATCGCTTAACACGGCCCGTAAT 53 GCATTAACCAGAGCTGAACAACAGCTGACGCAACAGAAAgcggtccg acat 869 54 10_1101-2021_01_05_425432 ---- Microsoft Word - Urease_inhibitor_actanew2 1 High-throughput tandem-microwell assay for ammonia repositions 1 FDA-Approved drugs to Helicobacter pylori infection 2 Fan Liu,a,b,# Jing Yu,b,# Yan-Xia Zhang,c Fangzheng Li,a, d Qi Liu,e Yueyang Zhou,a 3 Shengshuo Huang,b Houqin Fang,f Zhuping Xiao,e Lujian Liao,f Jinyi Xu,d Xin-Yan Wu,c 4 Fang Wu a,* 5 6 7 aKey Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for 8 Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, 200240, China 9 bState Key Laboratory of Microbial Metabolism, Sheng Yushou Center of Cell Biology 10 and Immunology, School of Life Science and Biotechnology, Shanghai Jiao Tong 11 University, Shanghai, 200240, China 12 cSchool of 
Chemistry & Molecular Engineering, East China University of Science and Technology, Shanghai, 200237, China.
dState Key Laboratory of Natural Medicines and Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, 210009, China
eHunan Engineering Laboratory for Analyse and Drugs Development of Ethnomedicine in Wuling Mountains, Jishou University, Hunan, 416000, China
fShanghai Key Laboratory of Regulatory Biology, School of Life Sciences, East China Normal University, Shanghai, 200241, China.
#These authors contributed equally to this work.
*To whom correspondence may be addressed. Email: fang.wu@sjtu.edu.cn

Running title: Repositioning of old drugs to treat H. pylori infection

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.05.425432; this version posted January 5, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

ABSTRACT
To date, few attempts have been made to develop new treatments for Helicobacter pylori (H. pylori) infection, although the community is aware of the shortage of such treatments. In this study, we developed a 192-tandem-microwell-based high-throughput assay for ammonia, a known virulence factor of H. pylori and a product of urease. We identified a few drugs, i.e., panobinostat, dacinostat, ebselen, captan and disulfiram, that potently inhibit the activity of ureases from bacterial or plant species. These inhibitors suppress the activity of urease via substrate-competitive or covalent-allosteric mechanisms, and all except captan prevent the antibiotic-resistant H.
pylori strain from infecting human gastric cells, with a more pronounced effect than acetohydroxamic acid, a well-known urease inhibitor and clinically used drug for the treatment of bacterial infection. This study offers several starting points for developing new treatments for urease-containing pathogens and for studying the mechanisms that regulate urease activity.

Key Words: Ammonia, High-throughput screening, Antibiotic resistance, Enzyme inhibitor, Urease, Mechanism of action, Helicobacter pylori

INTRODUCTION
Bacteria, fungi and plants, but not animals, contain urease(1). Urease (EC 3.5.1.5) is a class of nickel metalloenzymes that hydrolyze urea to produce ammonia (NH3) and carbon dioxide(2,3). The active catalytic site of urease consists of two nickel ions, a carbamylated lysine residue, two histidines and an aspartic acid. In addition to the conserved catalytic mechanism, the amino acid sequence of urease has been reported to be highly conserved between different species(4).

Bacterial urease is known to be a key virulence factor of some pathogens for a number of diseases(5), e.g., Helicobacter pylori (H. pylori) for gastritis or gastric cancer, and Proteus mirabilis (P. mirabilis) for urinary tract infections and urinary stones(6). The pathogens can hydrolyze urea substrates to produce NH3. The released NH3 not only helps H.
pylori to survive in the low pH environment of the stomach but also causes damage to the gastric mucosa, triggering the infection(7). Additionally, NH3 generated by P. mirabilis urease has been demonstrated to form urinary stones and destroy the urinary epithelium in the urinary system(8). Because the human body does not contain urease, bacterial urease has been thought to be an important and specific drug target for combating these pathogens(9).

A number of studies have been performed to identify inhibitors of urease(10-12), but only one urease inhibitor, acetohydroxamic acid (AHA), was approved for the treatment of urinary infections and urinary stones in 1983 by the US Food and Drug Administration (FDA)(13,14). Severe side effects, low stability in gastric juice, and a lack of direct evidence for suppressing the growth of pathogens seem to be the limiting factors behind the low success rate of these urease inhibitors. Adverse side effects of AHA, including teratogenic effects(15), a low efficiency indicated by the high dose required for patients (~ 1000 mg/day for adults), and the assumed drug resistance of bacteria, further imply that potent and bioactive inhibitors with new chemical moieties are urgently needed to combat these pathogens. Indeed, the current clinical first-line regimen for the treatment of H. pylori [proton-pump inhibitor, clarithromycin, amoxicillin or metronidazole (sometimes tinidazole)](16,17) is unable to completely eradicate H.
pylori due to the increased antibiotic resistance(16,18).

To date, few validated high-throughput assays have been constructed to quantitatively analyze NH3 and the activity of the NH3-generating enzyme urease, and no high-throughput screening (HTS) approach has been employed to systematically extend the chemical moieties of urease inhibitors. The current assays to determine the activity of urease mainly rely on colorimetric reactions to determine the concentration of NH3 using the indophenol or Nessler's reaction(19). Recently, a microfluidic chip-based fluorometric assay has been developed to monitor the activity of urease(20,21). In addition, a cell-based assay for H. pylori urease has been reported lately and validated with known inhibitors of urease, but it has not yet been employed to screen new inhibitors for urease(22). Overall, the current assay settings and procedures are relatively time-consuming and vulnerable to interference.

In this study, we established and validated a new tandem-well-based HTS assay for NH3 and NH3-generating urease and performed an HTS campaign to identify druggable chemical entities from 3,904 FDA or Foreign Approved Drugs (FAD)-approved drugs for jack bean and bacterial ureases. Five clinically used drugs, i.e., panobinostat, dacinostat, ebselen (EBS), captan and disulfiram, were found to be submicromolar inhibitors of H. pylori urease (HPU), jack bean urease (JBU), or urease from Ochrobactrum anthropi (O.
anthropi), a newly identified pathogen with resistance to β-lactam antibiotics(23). Moreover, panobinostat, dacinostat, EBS and disulfiram potently inhibited the infection of H. pylori, suggesting that these pharmacologically active moieties or drugs could serve as bases for the development of new treatments for urease-positive pathogens.

RESULTS
Development of a high-throughput assay and identification of potent inhibitors for urease
To construct a high-throughput assay for NH3-generating urease and prevent detection interference from substances in the enzyme extract, we utilized a 192-tandem-well-based gas-detection method, which we previously developed to monitor the activity of H2S-generating enzymes(24,25). The tandem-well design physically separates the gas product from the enzymatic reaction and enables the specific and real-time detection of the gas-producing enzyme activity (Figure 1A).

To construct the HTS assay, we compared three reported protocols for determining the activity of JBU(20,26,27): salicylic acid-hypochlorite and Nessler's detection reagents, which undergo the indophenol and Nessler's reactions with NH3, respectively, and phenol red. Salicylic acid-hypochlorite and Nessler's reagents could dose-dependently and time-dependently monitor the activity of JBU at various concentrations (Figures S1A and B); however, phenol red failed to detect it (Figure S1C).
We chose salicylic acid-hypochlorite as the detection reagent for the HTS assay of JBU (Figure S1A) due to its lower toxicity than Nessler's reagent, which contains mercury(26). The absorbance (OD) at 697 nm of the blue indophenol complex generated from salicylic acid correlated linearly with the concentration of NH4Cl (19.5 - 625 μM), thus validating the analytic setup for NH3 quantification (Figure S1D). Moreover, the optimal assay buffer for JBU was found to be phosphate buffer at pH 7.4 (Figure S1E). In contrast, we employed Nessler's reagent to detect the activity of HPU and Ochrobactrum anthropi urease (OAU) in subsequent studies since it showed a better limit of detection for the activity of HPU than salicylic acid-hypochlorite (Figures S1F and 1G). Collectively, we chose 50 nM JBU and 25 mM urea substrate in the phosphate buffer to perform the assay.

Under the assay conditions, AHA showed an IC50 of ~ 160 μM (Figure 1B), which was very similar to the previously reported value (IC50 of ~ 140 μM; ref. (13)), indicating that the newly developed assay for urease was accurate and reliable. However, the IC50 of AHA was found to decrease to 33.7 μM when using 50 mM Tris buffer instead of the phosphate buffer in our assay (Table 1). To determine the well-to-well reproducibility, the assay was validated with 200 μM AHA (~ IC50) or 800 μM AHA (~ 5-fold IC50).
The tandem-well plate consistently showed distinct differences among the control, the 200 μM-AHA-treated and the 800 μM-AHA-treated groups (Figure 1C). The average Z' values of the assay were found to be ~ 0.9 when they were calculated with the 800 μM AHA positive control.

To identify novel and potent inhibitors for urease, we screened 3,904 FDA or FAD-approved drugs at 100 μM. Five potent hits, i.e., panobinostat, dacinostat, EBS, captan and disulfiram, were found to dose-dependently inhibit the activity of JBU with IC50 values of 0.2, 1.1, 0.4, 2.3 and 38.9 μM, which are ~ 800-, 146-, 400-, 70- and 4-fold more potent than AHA, respectively (Figure 1E and Table 1). Intriguingly, the first two drugs are analogs of AHA. Importantly, all of them seemed to be substantially selective for urease since they did not substantially inhibit other gas-producing enzymes, i.e., cystathionine beta-synthase (CBS) and cystathionine γ-lyase (CSE), two H2S-generating enzymes (Figure 1F). Moreover, the potent inhibitory effects of these inhibitors were likely due to on-target inhibition of JBU rather than a nonspecific reaction with NH3 or aggregate formation, since they did not react with NH3 and their inhibition was not attenuated by the detergent (Figures S2A and S2B).
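The Z' value quoted above is a standard screening-window statistic. As a minimal sketch, assuming the usual definition Z' = 1 - 3(SD_pos + SD_neg)/|mean_pos - mean_neg|, it could be computed as follows. The function name and the absorbance readings are hypothetical and serve only as an illustration; they are not data from this study.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z'-factor: 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    window = abs(mean(positive) - mean(negative))
    if window == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / window


# Hypothetical OD697 readings, for illustration only (not measured values).
inhibited_wells = [0.10, 0.11, 0.09, 0.10]    # e.g. wells with a saturating inhibitor
uninhibited_wells = [1.00, 1.02, 0.98, 1.00]  # e.g. vehicle-only wells

print(round(z_prime(inhibited_wells, uninhibited_wells), 2))
```

Values close to 1 indicate a wide, low-noise separation between the two control groups, which is the sense in which a Z' of ~ 0.9 supports well-to-well reproducibility.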
Corroborating these findings, EBS and disulfiram have recently been reported to be specific inhibitors of bacterial and plant urease(11,12), respectively, although their modes of action for inhibiting urease and their effects on the proliferation or infection of urease-containing pathogens remain little explored.

The mode of action study for urease inhibitors
To determine the reversibility of the inhibition of JBU by panobinostat, dacinostat, EBS, captan and disulfiram, various concentrations of the inhibitors and JBU were incubated together for 60 min (Figure 2A). After a 200-fold dilution, the inhibitory effects of panobinostat and dacinostat as well as disulfiram were found to be reversible (Figures 2A and S3C). In contrast, EBS or captan at 100 nM was found to completely block the activity of JBU; this concentration did not affect the activity without the pre-incubation with enzyme (Figure 1E). Additionally, the inhibition exerted by EBS or captan was not fully reversed (Figure 2A), indicating that both of them were likely to be covalent or slow-dissociation inhibitors for JBU.

Surprisingly, the inhibitory effect of disulfiram was found to be dependent on the concentration of Ni2+ ion, the catalytic cofactor for urease (Figure S3C), indicating that it likely inhibits JBU via formation of a complex with the catalytic Ni2+ ion and subsequent occupation of the active site of JBU.
This explanation seems plausible since recent findings have revealed that disulfiram inhibits the proliferation of tumor cells by forming a complex with Cu2+(28).

Moreover, the inhibitory potencies of panobinostat and dacinostat were found to increase with the pre-incubation time of the compound with urease (Figure 2B). After 2 h of pre-incubation, the IC50 values of panobinostat and dacinostat decreased ~ 7.5-fold and ~ 18.8-fold, respectively (Figure 2B). In enzyme kinetics studies for JBU, panobinostat and dacinostat were found to be competitive inhibitors with respect to the urea substrate, with Ki values of 0.02 and 0.07 μM (Figure 2C and Table 1), which are ~ 105-fold and 30-fold more potent than AHA (Ki ~ 2.1 μM; Table 1). Consistent with this observation, the inhibition by these two inhibitors was not affected by Ni2+ (Figure S3A). Also, the addition of histidine or cysteine had no effect on the inhibition by panobinostat or dacinostat (Figure S3B). Importantly, the surface plasmon resonance assay demonstrated that these two compounds could physically bind to JBU (Figure 2D; Table 1). The drastic effect seems to rely not only on the hydroxamic acid motif, the known pharmacophore of AHA-derivative inhibitors, but also on the hydrophobic ring and secondary amine group: in the modeled inhibitor-JBU complex structure, the benzene ring interacts favorably with the His492 residue and/or the nitrogen atom forms an additional hydrogen bond with Asp494 (Figure 2E).

In contrast, the inhibition caused by both EBS and captan was found to be prevented by the addition of dithiothreitol (DTT) or free cysteine to the enzymatic reaction, but not by that of histidine or Ni2+ (Figures S4A-C). Furthermore, the IC50 values of the two inhibitors were linear with the concentration of the enzyme (Figure S4D), an inhibitory feature of covalent inhibitors(29), confirming that they targeted the enzyme covalently. The inhibition constants for these irreversible inhibitors, i.e., the maximal rate of enzyme inactivation (kinact) and the inactivation constant (KI), were also determined by nonlinear regression of the time-dependent IC50 values (Figure S4E)(29). The kinact and KI for EBS were found to be 2.79 × 10⁻³ s⁻¹ and 0.73 μM, which were 4.4- and 2.4-fold better than those of captan (kinact, 0.63 × 10⁻³ s⁻¹; KI, 1.76 μM), respectively. Taken together, the results demonstrated that EBS and captan inhibited JBU by covalently modifying Cys rather than His residues, the latter of which are known to be part of the active site of urease(2,3).

Interestingly, we observed a synergistic inhibitory effect from the combination of EBS and AHA (Figure 3A), a substrate-competitive inhibitor for urease, implying that EBS targeted Cys residue(s) of another site rather than the active site. Similar experimental results were also obtained for captan. Moreover, the combination of EBS with 2 μM captan also significantly increased the potency of EBS by 6-fold (right panel, Figure 3A), implying distinct binding sites of the two covalent inhibitors.

To corroborate this finding, we performed enzyme kinetics, mass spectrometry and surface plasmon resonance studies (Figures 3B-D).
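The fold differences quoted above for the two covalent inhibitors follow directly from the reported constants. The sketch below reproduces that arithmetic. It also derives the second-order efficiency kinact/KI, a commonly used summary for covalent inhibitors that is not reported in the text; it is included here only as an illustrative assumption, and all function names are ours.

```python
# Constants as reported in the text (k_inact in s^-1, K_I in micromolar).
EBS = {"k_inact": 2.79e-3, "K_I": 0.73}
CAPTAN = {"k_inact": 0.63e-3, "K_I": 1.76}


def fold_faster_inactivation(a, b):
    """How many times larger k_inact is for inhibitor a than for inhibitor b."""
    return a["k_inact"] / b["k_inact"]


def fold_lower_ki(a, b):
    """How many times smaller K_I is for inhibitor a than for inhibitor b."""
    return b["K_I"] / a["K_I"]


def efficiency(inhibitor):
    """k_inact / K_I in M^-1 s^-1 (K_I converted from micromolar to molar)."""
    return inhibitor["k_inact"] / (inhibitor["K_I"] * 1e-6)


print(f"k_inact ratio (EBS/captan): {fold_faster_inactivation(EBS, CAPTAN):.1f}")
print(f"K_I ratio (captan/EBS): {fold_lower_ki(EBS, CAPTAN):.1f}")
print(f"k_inact/K_I, EBS: {efficiency(EBS):.0f} M^-1 s^-1")
print(f"k_inact/K_I, captan: {efficiency(CAPTAN):.0f} M^-1 s^-1")
```

The first two ratios correspond to the 4.4- and 2.4-fold values stated in the text; the efficiency figures are derived quantities, not measurements from the study.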
Consistently, EBS or captan displayed a noncompetitive mode with respect to the urea substrate (Figure 3B). Furthermore, tandem mass spectrometry analysis revealed that Cys313 and Cys406, which were not adjacent to the active site, appeared to be modified by EBS and captan, respectively (Figure 3C). A mass addition of 274.18 daltons was observed for EBS, demonstrating the breakage of the Se-N bond and formation of a Se-S bond with the Cys residue, a phenomenon that has been reported previously for EBS(30). However, the increase of 150.15 daltons suggested that only the isoindole dione moiety of captan modified the Cys residue, accompanied by the release of the trichloromethyl thio moiety [-SC(Cl)3]. This observation provides a new perspective on the unexplored covalent molecular mechanism of captan.

Additionally, a potent and physical interaction between EBS or captan and JBU was observed in the surface plasmon resonance study (Figure 3D). The equilibrium dissociation constants (KD) for EBS and captan were found to be 89 and 96 nM, respectively.

To illustrate the binding mode of EBS or captan, we modeled them into the respective allosteric Cys-containing pockets (Cys313 for EBS, Cys406 for captan) in JBU by using molecular dynamics simulations (Figure 3E). The carbonyl group of EBS was found to form a hydrogen bond with Lys369, and the phenyl ring interacted with the hydrophobic side chain of Leu308.
Additionally, the two carbonyl groups of captan formed four hydrogen bonds with the side chains of Asn517, His542, Tyr544 and Asn688. Taken together, these results implied that these weak intermolecular interactions also substantially contributed to the binding of the covalent inhibitors to the protein, in addition to the covalent interaction.

The inhibitory effect of inhibitors on bacterial ureases
Next, we investigated the effects of panobinostat, dacinostat, EBS and captan as well as disulfiram on the activity of HPU and OAU, two bacterial ureases from H. pylori and O. anthropi, respectively. As expected, these drugs inhibited the activity of HPU in the crude extracts and showed IC50 values of 0.1, 0.2, 2.8, 3.4 and 8.9 μM, which indicated that they were ~ 259-, 130-, 10-, 8- and 3-fold more potent than AHA (IC50 ~ 25.9 μM; Figure 4A and Table 1), respectively. Moreover, panobinostat, dacinostat, EBS, captan and disulfiram were also found to inhibit the partially purified HPU, which was isolated by size-exclusion chromatography (Figures 4B and S5). Consistently, they also suppressed the activity of OAU at a similar potency to HPU (Figure 4A and Table 1).
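The fold-potency figures above are ratios of IC50 values relative to AHA. A minimal sketch of that calculation, using the HPU values quoted in the text (the helper name `fold_vs_reference` is ours, introduced only for illustration):

```python
AHA_IC50_UM = 25.9  # IC50 of AHA against HPU in micromolar, as quoted in the text

# IC50 values against HPU in crude extracts (micromolar), as quoted in the text.
HPU_IC50_UM = {
    "panobinostat": 0.1,
    "dacinostat": 0.2,
    "EBS": 2.8,
    "captan": 3.4,
    "disulfiram": 8.9,
}


def fold_vs_reference(ic50_um, reference_ic50_um):
    """Potency gain over a reference inhibitor: reference IC50 / compound IC50."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return reference_ic50_um / ic50_um


for name, ic50 in HPU_IC50_UM.items():
    print(f"{name}: ~{fold_vs_reference(ic50, AHA_IC50_UM):.1f}-fold vs AHA")
```

With these rounded inputs, the ratios for EBS and captan come out slightly below the 10- and 8-fold values given in the text, which may simply reflect rounding of the published IC50 values.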
Compounds 1, 4 and 6, which were synthesized in house (Scheme S1), as well as commercially available EBS oxide, also showed a better efficiency than EBS (IC50 ~ 2.8 μM) in the in vitro HPU-based enzyme assay (Table S1), and 4 displayed a maximum three-fold increase in potency (IC50 ~ 1.1 μM; Table S1). Moreover, we could confirm that panobinostat, dacinostat and EBS as well as EBS oxide, 1, 4 or 6, could largely suppress the activity of HPU in culture (Figure 4C). The IC50 values of these inhibitors for inhibiting the urease of the cultured H. pylori strain ranged from 5.7 to 23.2 μM (Figure 4C and Table S2).

Further, we investigated the effects on the growth of H. pylori of panobinostat, dacinostat and EBS, which are the most potent inhibitors for HPU (Figure 4A). The results showed that EBS, but not panobinostat, dacinostat or their analog AHA, substantially suppressed the growth of H. pylori (Figure 4D). The inability of AHA and its derivatives, i.e., panobinostat and dacinostat, to suppress the growth of H. pylori seems to be consistent with the previous finding that AHA does not inhibit the growth of H. pylori(31). Interestingly, EBS and EBS analogs, as well as disulfiram, could dose-dependently suppress the growth of H. pylori and showed minimum inhibitory concentrations (MIC) in a range between 2 and 4 μg/ml (right panel of Figure 4D, Figure S6A and Table S2).
Importantly, the inhibitory effect of this type of covalent inhibitor lasted for a long period in culture, as indicated by EBS and 1, which could substantially inhibit HPU even 6 h after removal of the inhibitor (Figure S6B).

Urease inhibitors prevent H. pylori infection in a gastric cell-based bacterial infection model
To evaluate the ability of these urease inhibitors to prevent H. pylori infection, we constructed a gastric cell-based bacterial infection model using the remaining viable cell number of SGC-7901 gastric adenocarcinoma cells to reflect the virulence of H. pylori(15). Our results showed that treatment with 30 μM panobinostat, 30 μM dacinostat, 20 μM EBS or 20 μM disulfiram could prevent the cell death triggered by H. pylori (Figures 5A-B). In sharp contrast, the cells that lacked such treatments were largely destroyed. Panobinostat and EBS were found to be the most potent agents and almost completely protected the cells from H. pylori infection. The effects of these drugs seemed to be much stronger than the effects of 20 μM AHA or 50 μM tinidazole, an analog of metronidazole and one of the two antibiotics in the triple regimens for the treatment of H. pylori(16,17). In support of this observation, tinidazole as well as metronidazole hardly suppressed the growth of our H.
pylori strain, with an MIC value of more than 512 μg/ml in culture (Figure S7A and Table S2), indicating that this strain is resistant to treatment with nitroimidazole-type antibiotics.

Since panobinostat, dacinostat, EBS and disulfiram at a concentration up to 100 μM or 25 μM did not interfere with the proliferation of SGC-7901 gastric cells (Figure S7B), the protective effects in the gastric-cell-based H. pylori infection model seemed to be attributable to on-target inhibition of the infection caused by H. pylori. Moreover, all four drugs potently reduced the level of ammonia in the cell medium (Figure 5C), indicating that they efficiently suppressed the endogenous urease activity of H. pylori in the infection model.

The structural basis and inhibitory mechanisms of three newly identified classes of urease inhibitors
To identify the active chemical moiety of panobinostat, dacinostat, EBS or captan required for inhibition of urease, we analyzed their structure-activity relationships (Figure 6; Tables S1 and S3). The first two inhibitors are hydroxamic acid-based urease inhibitors; not only do their hydroxyamino heads form hydrogen bonds with the catalytic Ni2+ and residues in JBU or HPU (Asp633 or Ala636 for JBU; Asp362 or Ala365 for HPU), but the acetyl group also forms one hydrogen bond (His492 for JBU and His221 for HPU; Figures 2E and S8A).
Consistent with this observation, the hydroxyamino and acetyl groups of AHA interact with Asp362 or Ala365 and His221, respectively, in a co-crystal structure of AHA and HPU(2) (Figure S8A). Removing this acetyl group, as in hydroxylamine, totally abolished the inhibitory effect of this type of inhibitor (Figure 6B and Table S3). Apart from these interactions, the hydrophobic benzene ring and secondary amine group of panobinostat were found to be additional pharmacophores (upper panel, Figure 2E), which interact favorably with His492 (JBU) or His221 (HPU) and form an extra hydrogen bond with Asp494 (JBU) or Asp223 (HPU). In support of this finding, the hydroxamic acid analogs that lack the benzene ring, i.e., ricolinostat, ilomastat and pracinostat, are inactive toward JBU and HPU (Figure 6B and Table S3). Strikingly, the replacement of benzene with benzimidazole (pracinostat) totally abolishes the inhibition, suggesting that the benzene is critical for maintaining the inhibition. Moreover, the secondary amine group also seems to be important for enhancing the potency of this type of inhibitor, since its modification or replacement with a hydroxyl group or sulfonyl group (dacinostat or belinostat) weakens the potency ~ 5-fold or 24-fold in IC50 values.

For EBS analogs, compounds (2-3) lacking the Se atom largely lost inhibitory activities toward JBU and HPU (Figure 6B and Table S1). Furthermore, dibenzyl diselenide was also inactive toward both ureases, indicating that the Se-containing benzisoxazole moiety rather than the Se atom alone might be essential for the inhibition. Indeed, Se-containing benzisoxazole (4) showed potent inhibition (IC50 ~ 0.8 and 1.1 μM for JBU and HPU, respectively). The introduction of an electron-donating group to the benzisoxazole moiety apparently strongly reduced the potency (5; IC50 ~ 1.4 μM for JBU and more than 10 μM for HPU; Figure 6B). In contrast, the provision of electron-withdrawing groups to the nitrogen or Se atom of the benzisoxazole moiety, i.e., 6 or EBS oxide, seemed to enhance the potency toward JBU by a maximum of three-fold (6). Similarly, weakening the electron-withdrawing effect in the substitution group of the isoindole dione core of captan, the active moiety (Figure 3C), was also found to lead to a decreased potency (Figure 6; Table S1). Taken together, these data indicate that the Se-containing benzisoxazole or the isoindole dione moiety played crucial roles in the potency of these kinds of inhibitors, the Se or N atom of which was subjected to nucleophilic attack by the thiol group of Cys and formed the Se-S or N-S bond.

DISCUSSION
In the present study, we identified four clinically used drugs, i.e., panobinostat, dacinostat, EBS and disulfiram (two anti-cancer drugs, an anti-stroke or anti-bipolar drug, and an alcohol-deterrent drug, respectively), that could protect gastric cells from the infection at submicromolar concentrations (Table 1 and Figure 5). The efficacy of these drugs substantially exceeded that of AHA, a well-known urease inhibitor and clinically used drug for bacterial infections. They also seemed to be more effective than tinidazole, a metronidazole-type antibiotic in the classic triple regimen for H. pylori (Figure 5). Moreover, panobinostat, EBS and disulfiram have been administered to humans and do not incur severe side effects(28,32,33). Additionally, these drugs did not affect the viability of mammalian cells at a concentration up to 100 μM or 25 μM (Figure S7B), suggesting that they had a rather safe profile in cells and in vivo. Taken together, our study, armed with the newly developed HTS assay for urease, repositions four clinically used drugs as new advanced leads for the treatment of H. pylori infection.

The mode of action of panobinostat, dacinostat, EBS or disulfiram was found to be inhibition of H. pylori urease and reduction of the production of NH3 in culture (Table 1; Figures S6A, 4B, 4C and 5C), which are well-known bacterial virulence factors(15). Panobinostat and dacinostat are reversible hydroxamic acid-type inhibitors for urease, and displayed more than 250- or 130-fold higher potencies than their analog AHA (Table 1). These largely improved inhibitors indeed enhanced the protective effects against the infection of H.
pylori in the cell-based infection model (Figures 5A and 5C), demonstrating that pharmacologically targeting urease could offer an effective treatment for H. pylori and that HPU is a validated pharmacological drug target. However, suppression of the urease activity with these potent inhibitors of HPU could not retard the growth of H. pylori in culture, indicating that urease is not crucial for bacterial growth.

Moreover, EBS was found to irreversibly inhibit urease by covalently modifying an allosteric Cys residue outside of the active site (Figures 2A and 3). The newly identified covalent allosteric regulation of the activity and stability of urease by EBS and captan may explain why these inhibitors could potently and persistently inhibit urease activity and the growth of H. pylori even in the presence of high concentrations of urea substrate (Figure S6B), two merits that are observed for covalent allosteric drugs(34). Indeed, when compared with the reversible inhibitor AHA, EBS displayed an ~ 400- and 10-fold improved potency for JBU and HPU, respectively, and a long-acting inhibitory effect on the endogenous activity of urease and the growth and infection of H. pylori in culture (Figures 4C-D, 5B-C and S6B). Importantly, the anti-H. pylori MIC values of EBS and its analogs, i.e., EBS oxide, 1, 4 and 6, seem to be much better than or at least comparable to those of metronidazole or clarithromycin, which are the two antibiotics in the classic triple regimen for H.
pylori (Table S1)(35), indicating these newly-validated chemical moieties for 356 inhibiting the growth of H. pylori are promising antibiotics for developing new 357 treatments for urease-containing pathogens. Since the urease activity is dispensable for 358 the growth of H. pylori (see our discussions with the mode of action of panobinostat and 359 dacinostat), this finding indicates the effect of EBS-type inhibitor on the growth of H. 360 .CC-BY-NC-ND 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.05.425432doi: bioRxiv preprint https://doi.org/10.1101/2021.01.05.425432 http://creativecommons.org/licenses/by-nc-nd/4.0/ 19 pylori is beyond the solo inhibition of urease activity. 361 In summary, we identified five clinical drugs as submicromolar inhibitors for plant or 362 bacterial urease by performing the first HTS campaign of urease. These clinically used 363 drugs panobinostat, dacinostat, EBS and disulfiram inhibit the virulence of H. pylori in a 364 gastric-cell-based infection model. This study provides a new HTS assay, drug leads and 365 a regulatory mechanism to develop bioactive urease inhibitors for the treatment of H. 366 pylori infection, especially antibiotic-resistant strains. 367 368 .CC-BY-NC-ND 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 5, 2021. 

EXPERIMENTAL PROCEDURES

Materials

Jack bean urease (JBU), DMSO, and dithiothreitol (DTT) were purchased from Sigma (Steinheim, Germany). Hypochlorous acid, sodium nitroprusside, salicylate, potassium sodium tartrate, urea, sodium hydroxide, bovine serum albumin, Triton X-100, L-histidine and L-cysteine were purchased from Sangon (Shanghai, China). Nessler's reagent was purchased from Jiumu company (Tianjin, China). Acetohydroxamic acid was purchased from Medchemexpress (Monmouth Junction, NJ). Columbia blood agar plates, liquid medium powder for H. pylori, bacteriostatic agent and polymyxin B were purchased from Hopebio company (Shandong, China). RPMI 1640 medium and fetal bovine serum (FBS) were purchased from Gibco (Invitrogen, Gaithersburg, MD). The other materials were purchased from the indicated commercial sources or were from Sigma.

Construction of the high-throughput screening assay for urease

The assay was constructed to measure the activity of urease based on a 192-tandem microwell plate, which we had previously developed to detect the H2S gas generated by H2S-generating enzymes(24,25). Phosphate or Tris buffers at various pH values were used to determine the optimal pH for JBU in the presence of 25 mM urea substrate (Figure S1E). The optimal condition was found to be 50 mM phosphate buffer at pH 7.4. Moreover, the suitable detection reagent and enzyme concentrations were determined by testing three types of NH3 detection reagents with various concentrations of JBU or HPU,
i.e., salicylic acid-hypochlorite, Nessler’s reagent and phenol red detection reagent (Figures S1A-C). The optimized conditions for the standard assay used salicylic acid-hypochlorite and commercial Nessler’s detection reagents (Jiumu, Tianjin, China) for JBU and HPU, respectively, in the presence of 50 nM JBU or 200-400 nM HPU, 25 mM urea, 100 μM NiCl2, and 50 mM phosphate buffer at pH 7.4 (final concentrations). The salicylic acid-hypochlorite detection reagent contained 1.6 mM hypochlorite, 400 mM sodium hydroxide, 36 mM salicylic acid, 18 mM potassium sodium tartrate and 1.6 mM sodium nitroprusside. The assay was performed using multichannel pipettes to add 1 μl of each compound (solubilized in DMSO or H2O) and 24 μl of the enzyme mix (100 nM, 100 M Tris, pH 7.9) into the reaction well (Figure 1A), followed by a 30-min incubation. After addition of 50 μl of salicylic acid-hypochlorite or Nessler’s detection reagent to the detection well, 25 μl of substrate solution (50 mM urea, 200 μM NiCl2, 0.04% bovine serum albumin (w/v)) was mixed with the enzyme in the reaction well. The reaction was monitored at 37 °C, and the absorbance at 697 nm or 420 nm, as appropriate for the detection reagent, was measured at the appropriate time points in a microplate reader (Synergy2 from BioTek, Winooski, VT).

Primary screening of urease inhibitors using a high-throughput assay

We screened 3,904 compounds of FDA or FAD-approved drugs from Johns Hopkins Clinical Compound Library (JHCCL, Baltimore, MD) or from TopScience Biotech Co. Ltd.
(Shanghai, China) at 100 μM for the inhibition of JBU under standard assay conditions with the salicylic acid-hypochlorite detection reagent as described above. The Z’ value of the screening assay was calculated from 60 negative samples (2% DMSO) and 60 positive samples (800 μM AHA) and found to be more than 0.9 (36), indicating that the assay is excellent. Routinely, 16 negative samples and 8 positive samples were used to determine the assay performance, and screening data with a minimum Z’ value of 0.5 were accepted.

Compounds that showed more than 50% inhibition were selected for further validation. Primary hits were defined as compounds that are free of heavy metal atoms and show more than 50% inhibition at 50 μM.

Compounds used for follow-up studies

All hits identified from the primary screening and their analogs were reordered as powders of the highest available purity from commercial sources or synthesized in-house for the following studies: dose-dependence and kinetic studies, biophysical assays, LC-MS/MS analysis, and cell- or bacteria-based studies. Panobinostat and dacinostat were bought from AdooQ (catalog number: A10518 for panobinostat, A10516 for dacinostat). EBS and captan were purchased from Sigma (catalog number: E3520 for EBS, 32054 for captan). Disulfiram (tetraethylthiuram disulfide) was purchased from TCI Chemicals (B0479).
Captafol (1ST21228) was purchased from Alta Scientific Co., Ltd (Tianjin, China), and dibenzyl diselenide (catalog number: B21278) was purchased from Alfa Aesar (Ward Hill, MA). Abexinostat (catalog number: HY-10990), belinostat (HY-10225), vorinostat (HY-10221), ricolinostat (HY-16026), ilomastat (HY-15768) and pracinostat (HY-13322) were bought from Medchemexpress. The purities of these commercially available primary leads or analogs of leads, as well as of the in-house synthesized EBS derivatives, were confirmed to be at least 95% by HPLC (for details, see below), with the exception of EBS, the purity of which was determined by combustion analysis by the supplier. All the HPLC spectra as well as the combustion analysis data for these inhibitors, obtained either from the commercial supplier or by ourselves, are included in the Supporting Information (see below).

Determination of IC50 values

The IC50 values of all the hits or their analogs, as well as AHA, on the activity of JBU, HPU or OAU were determined according to the above-described standard assay conditions. Compounds were incubated with the enzyme and assayed at a series of concentrations (at least 7 steps of doubling dilution). Similarly, the IC50 values of these inhibitors for hCBS or hCSE were determined accordingly(24). Sigmoidal curves were fitted using the standard protocol provided in GraphPad Prism 5 (GraphPad Software, San Diego, CA).
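
As a rough cross-check on a fitted IC50, the 50% crossing point of a dose-response series can be estimated by linear interpolation on a logarithmic concentration axis. The sketch below is not the GraphPad Prism sigmoidal fit used in this study; the function name `ic50_semilog` and the seven-point data set are hypothetical and serve only to illustrate the idea.

```python
import math


def ic50_semilog(concs, inhibition):
    """Estimate IC50 by interpolating % inhibition linearly against log10(concentration).

    Assumption: this is a simple semilogarithmic interpolation between the two
    data points that bracket 50% inhibition, not a four-parameter sigmoidal fit.
    """
    points = sorted(zip(concs, inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi and y_hi != y_lo:
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("dose-response data do not bracket 50% inhibition")


# Hypothetical seven-step doubling dilution (in uM) with made-up % inhibition values.
concs = [1.5625, 3.125, 6.25, 12.5, 25.0, 50.0, 100.0]
inhibition = [4.0, 10.0, 22.0, 40.0, 60.0, 80.0, 92.0]

ic50 = ic50_semilog(concs, inhibition)
print(f"Interpolated IC50: {ic50:.2f} uM")
```

Because the interpolation uses only the two points around 50% inhibition, it is sensitive to noise in those wells; a full sigmoidal fit uses every point of the dilution series and is the better estimate when the curve is well defined.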
IC50 was calculated by semilogarithmic graphing of the dose-response curves.

Aggregation-based assay

To exclude the possibility that the inhibitors suppress the activity of urease via colloidal aggregation, we performed an aggregation-based assay in the presence of a nonionic detergent(37). Freshly prepared Triton X-100 (Sangon, Shanghai, China) at concentrations of 0.1%, 0.05%, 0.01%, 0.005%, and 0.001% was first tested for its effects on the activity of JBU under standard assay conditions. Subsequently, the inhibitory effects of panobinostat, dacinostat, EBS, captan and disulfiram, as well as the analogs of EBS, in the in vitro JBU activity assay were determined in the presence of 0.01% Triton X-100, a concentration that alone has no inhibitory effect on the activity of JBU.

Reversibility assay

To illustrate the mode of action of the urease inhibitors, we performed a rapid-dilution experiment. After incubation with panobinostat at 4 μM, dacinostat at 10 μM, or EBS or captan at 200, 100, 50 or 20 μM for 60 min, JBU (10 μM) was diluted 200-fold in the assay buffer. After a further incubation of 0, 1, 1.5, 2, 3, 4 or 5 h, the remaining activity of JBU was measured (METHODS). The inhibitor concentrations after dilution are indicated in the figure.
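
The concentrations after the 200-fold dilution follow directly from the preincubation values given above. The short sketch below works them out; the helper name `after_dilution` is hypothetical, and all concentrations are expressed in nM for convenience.

```python
DILUTION_FOLD = 200  # rapid-dilution factor stated in the reversibility assay


def after_dilution(conc_nm, fold=DILUTION_FOLD):
    """Return the concentration (nM) remaining after a single-step dilution."""
    return conc_nm / fold


# Preincubation concentrations from the text, converted from uM to nM.
jbu_nm = 10_000
preincubation_nm = {
    "panobinostat": [4_000],
    "dacinostat": [10_000],
    "EBS or captan": [200_000, 100_000, 50_000, 20_000],
}

print(f"JBU after dilution: {after_dilution(jbu_nm):g} nM")
for name, levels in preincubation_nm.items():
    diluted = ", ".join(f"{after_dilution(c):g}" for c in levels)
    print(f"{name} after dilution: {diluted} nM")
```

Under these numbers the diluted enzyme ends up at 50 nM, which appears to match the JBU concentration of the standard assay, while each inhibitor falls to 0.5% of its preincubation concentration.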

Determination of kinact or KI parameters for irreversible inhibitors

The IC50 values of EBS or captan for JBU were measured after different preincubation periods with the enzyme, i.e., 5, 10, 20, 30, 40, 45, 60, 70 or 90 min. The kinact and KI values for EBS or captan were obtained by nonlinear regression of the time-dependent IC50 data as previously reported(29).

Enzyme kinetics

The reaction rate was determined with JBU at the indicated concentrations of panobinostat, dacinostat, EBS or captan against increasing concentrations of urea substrate (15.625, 31.25, 62.5, 125, 250, 500, 1000 mM for panobinostat and dacinostat; 12.5, 25, 50, 100, 200 mM for EBS and captan). The data were fitted to the Michaelis-Menten inhibition equation to determine the competitive or noncompetitive inhibition constant Ki using GraphPad Prism 5 (Table 1, Figures 2C and 3B)(24). To illustrate the inhibition type, Lineweaver-Burk plots of these inhibitors were drawn and analyzed.

LC-MS/MS analysis

JBU at a concentration of 12.5 μM was incubated with DMSO, 200 μM EBS or 200 μM captan for 120 min at room temperature. Then, three aliquots of 25 μg from the inhibitor-treated JBU or purified HPU (fraction 3 in Figure 4B) were digested separately overnight with three proteases: 0.5 μl trypsin (1 μg/μl), 0.5 μl GluC (1 μg/μl) or 0.5 μl subtilisin (1 μg/μl).
The proteolytic peptides were combined, desalted on C18 spin columns and dissolved in buffer A (0.1% formic acid in water) for LC-MS/MS analysis. The peptides were separated on a 15-cm C18 reverse-phase column (75 μm × 360 μm) at a flow rate of 300 nl/min, with a 75-min linear gradient of buffer B (0.1% formic acid in acetonitrile) from 2% to 60%. The MS/MS analysis was performed on a Q-Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific, San Jose, CA) using standard data acquisition parameters as described previously(38). The mass spectral raw files were searched against the protein database derived from the standard sequence of JBU, HPU or the proteome of H. pylori using Proteome Discoverer 1.4 software (Thermo Fisher Scientific, San Jose, CA), with a differential modification of 274.18 m/z in the case of EBS and 150.15 m/z in the case of captan.

Surface plasmon resonance assays

The direct interactions between panobinostat, dacinostat, EBS or captan and JBU were observed by surface plasmon resonance (SPR) with a BIAcore T200 (GE Healthcare, Uppsala, Sweden). JBU was immobilized on the surface of the CM5 sensor chip using the amine-coupling kit. The working solution used for the SPR assay was PBS-P (10 mM Na2HPO4, 1.8 mM KH2PO4, 2.7 mM KCl, and 140 mM NaCl in the presence of 5% DMSO, pH 7.4).
To determine the affinity of the inhibitors toward JBU, panobinostat, dacinostat, EBS or captan were diluted to specific concentrations with PBS-P buffer (panobinostat: 25, 12.5, 6.25, 3.125 or 1.56 μM; dacinostat: 100, 50, 25, 12.5, 6.25, 3.125 or 1.56 μM; EBS: 1000, 500, 250, 125, 62.5 or 31.25 nM; captan: 390.6, 195.3, 97.6, 48.8, 24.4, 12.2 or 6.1 nM) and applied to the JBU-coated chips. The KD values were calculated with BIAcore evaluation software (version 3.1).

Molecular modeling

The crystal structures of the ureases were obtained from the Protein Data Bank (PDB code: 4GOA for JBU; 1E9Y for HPU). The binding modes of panobinostat or dacinostat were obtained using the CDOCKER module of the Discovery Studio software (version 3.5; Accelrys, San Diego, CA). Alternatively, AutoDock Vina was initially used to dock EBS or captan to the respective Cys-containing allosteric site of JBU to obtain appropriate configurations, in which the reactive motifs of the compounds (the Se-containing benzisoxazole of EBS and the isoindole dione moiety of captan) fall within the distance restraint of one covalent bond to the sulfur atom of the reactive Cys residue. The Se-S bond, or the N-S bond for the isoindole dione, was then manually incorporated using the Discovery Studio 3.5 software (Accelrys, San Diego, CA). Subsequently, molecular dynamics simulation was performed with AMBER14 software and the ff03.r1 force field(39).
To relieve any steric clash in the solvated system, initial minimization with the macromolecule frozen was performed using 500-step steepest descent minimization and 2,000-step conjugate gradient minimization. Next, the whole system was subjected to 1,000-step steepest descent minimization and 19,000-step conjugate gradient minimization. After these minimizations, 400-ps heating and 200-ps equilibration periods were performed in the NVT ensemble at 310 K. Finally, the 100-ns production runs were simulated in the NPT ensemble at 310 K. The binding modes for these inhibitors were visually inspected and the best docking mode was selected.

Bacterial strains and culture conditions

Bacterial strains of H. pylori or O. anthropi were obtained from BeiNuo Life Science (Shanghai, China). The strains were maintained on Columbia blood agar plates (Hopebio, Shandong, China) containing 5% defibrinated sheep blood at 37 °C under microaerobic conditions (5% O2, 10% CO2 and 85% N2), which were supplied by an AnaeroPack-MicroAero gas generator (Mitsubishi Gas Chemical Company, Japan). After 3-5 days of culture on the plate, the bacterial colonies were scraped into the liquid medium for H. pylori, containing 10% or 7% fetal bovine serum and an antibacterial cocktail (composed of 10 mg/l nalidixic acid, 3 mg/l vancomycin, 2 mg/l amphotericin B, 5 mg/l trimethoprim and 2.5 mg/l polymyxin B sulfate; BeiNuo, Shanghai, China), and microaerobically incubated for another 3 or 5 days. Then, the medium or bacterial cells
were collected for subsequent experiments.

A single colony of O. anthropi was inoculated into Luria-Bertani liquid medium (LB), which was supplemented with 50 mg/l ampicillin, 30 mg/l kanamycin and 10% FBS (Invitrogen), and cultured at 37 °C. After the bacterial culture reached an O.D. of 0.8 at 600 nm, the bacterial cells were collected by centrifugation for future experiments.

The identity of the H. pylori and O. anthropi strains was confirmed by PCR amplification of the urease gene or 16S rRNA with known primers (Table S4), LC-MS/MS analysis of proteins in the extracts, the bacterial urease activity assay or Gram staining.

16S rRNA sequencing

One colony from the H. pylori or O. anthropi culture plate was suspended in 50 μl of sterile water, and the DNA was liberated by a boiling-freezing method. The 16S rRNA gene was selectively amplified from this crude lysate by PCR using the universal primers 27f and 1492r, which have been previously described (Table S4). The PCR products at ~1400 bp were sequenced. The resultant 16S rRNA sequences were compared with the standard nucleotide sequences deposited in GenBank with the BLAST program (http://www.ncbi.nlm.nih.gov/blast/). The DNA sequences of 16S rRNA extracted from these strains were confirmed to be from H. pylori or O. anthropi.

Preparation of crude extracts from the H. pylori and O. anthropi strains for the urease activity assay

For the urease activity assay, H. pylori or O. anthropi was cultured accordingly in 100
ml of broth medium as described above. Bacteria were centrifuged at 5,000 rpm for 30 min, and the pellet was washed with phosphate-buffered saline (PBS, pH = 7.4). The pellet was resuspended in 7 ml of PBS in the presence of protease inhibitors (Sigma-Aldrich, Steinheim, Germany) and then sonicated for 30 min in 30 cycles (30 s run and 30 s rest) using a noncontact ultrasonic rupture device (Diagenode, Liege, Belgium). The resultant bacterial lysate was centrifuged twice at 12,000 rpm for 30 min; the supernatant was collected and desalted using a Sephadex G-25 desalting column (Yeli, Shanghai, China). The protein in the fractions was separated by 10% SDS-PAGE, and the protein band corresponding to urease was quantified by Coomassie blue R-250 (Sinopharm, Shanghai, China) staining, using bovine serum albumin as a standard, to determine the urease concentration. The desalted fractions were stored at -80 °C in the presence of 15% glycerol until use in the activity assay.

Size-exclusion chromatography for the purification of urease from H. pylori

The crude extract from H. pylori was first centrifuged at 12,000 rpm for 30 min. One milliliter of supernatant was loaded onto a gel filtration column (10 mm × 30 cm; GE Healthcare) and eluted with PBS at a rate of 0.5 ml/min on an AKTA Explorer 100 FPLC Workstation (GE Healthcare). The protein peaks observed were collected in Eppendorf tubes in volumes between 0.5 and 1 ml. The collected fractions were separated by PAGE on a 10% Tris-glycine SDS-gel and stained with Coomassie Brilliant Blue R-250 to identify H. pylori urease.

Determination of the minimal inhibition concentration and dose-dependent growth-inhibition curve for urease inhibitors

The minimal inhibition concentration (MIC) and dose-dependent growth-inhibition curve for the inhibitors on H. pylori were determined using the broth dilution method(40). Briefly, H. pylori was grown to an OD600 nm of 1.0 in liquid medium supplemented with 7% FBS under standard culture conditions. Then, 150 μl of H. pylori in the diluted culture (OD of 0.1) was incubated with the inhibitors at final concentrations of 1, 2, 4, 16, 32, 64, 128, 256, 512 μg/ml or at the indicated concentrations for 72 h. The OD600 nm was measured to calculate the percentage of growth inhibition. The DMSO (1% final concentration)-treated H. pylori cultures and culture medium in the absence of bacteria served as the negative control (0%) and positive control (100%), respectively. The MIC was defined as the lowest concentration of inhibitor that inhibited 100% of bacterial growth. The H. pylori strain was found to be resistant to tinidazole and metronidazole, with an MIC of greater than 512 μg/ml.

Bacterial-cell-based assay for measuring the activity of urease in culture

The endogenous activity of HPU in bacterial cultures was determined using the tandem-well-based plate. Briefly, 300 μl of H.
pylori culture (OD600 nm ~1.0) was treated with panobinostat, dacinostat or EBS as well as EBS analogs for 6 or 24 h at different concentrations (0, 3.125, 6.25, 12.5, 25, 50, 100 or 200 μM). Then, the bacterial cells were centrifuged, washed and resuspended in assay buffer containing 25 mM urea. Finally, ~100 μl of the suspension was added to the reaction well of the tandem-well plate and assessed for urease activity with Nessler’s reagent under standard assay conditions.

Gastric cell infection model of H. pylori

The cell infection model of H. pylori was constructed using the SGC-7901 gastric adenocarcinoma cell line, following an established protocol(15). Briefly, H. pylori was cultured in liquid medium for H. pylori at 37 °C for 3-5 days under standard culture conditions (see above). Then, H. pylori at a concentration of 1.5 × 10^6 CFU/ml was treated with the indicated inhibitors for 24 h in culture. The bacterial suspension together with 10 mM urea was subsequently added to the culture medium of SGC-7901 cells (MOI = 30), which had been cultured in RPMI 1640 medium plus 10% FBS in a 96-well plate for one day, and coincubated with the cells for an additional 24 h. Cell images were obtained at specific time points prior to and one day after addition of the bacterial culture using an Image Xpress Micro® XLS (Molecular Devices, Sunnyvale, CA) under a 20× objective lens. The cell numbers in the images were quantified using Image Xpress Software.
The protective effects of the inhibitors were calculated by dividing the number of SGC-7901 cells after the 24-h treatment by that prior to the treatment (100%) in the same well.

DATA AVAILABILITY

All data are contained within the manuscript.

CONFLICT OF INTEREST

The authors declare no conflicts of interest.

ACKNOWLEDGEMENTS

We thank David Sullivan, Jun Liu and Curtis Chong of Johns Hopkins University for providing the Johns Hopkins Clinical Compound Library. We thank Prof. S.C. Tao (Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China) for kindly providing the SGC-7901 cell line. We thank Dr. J.R. Xu (Department of Radiology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China) for assisting with the surface plasmon resonance assay experiment.

Funding

This work was supported by the National Natural Science Foundation of China (31870763, 21834005), the Natural Science Foundation of Shanghai (18ZR1419500), the Shanghai Foundation for the Development of Science and Technology (19JC1413000), and the Research Fund of Medicine and Engineering of Shanghai Jiao Tong University (YG2019QNB27).

AUTHOR CONTRIBUTIONS

F.L., J.Y., J.Y.X., X.Y.W. and F.W. designed the study and analyzed the data. F.Z.L. and Y.X.Z. synthesized analogs of the EBS lead. Y.Y.Z. constructed the assay and performed the high-throughput screening. H.Q.F. and L.J.L. performed the LC-MS/MS analysis. Q.L.
and Z.P.X. confirmed the inhibitory activity of compounds. S.S.H. performed the molecular simulation. F.L., X.Y.W. and F.W. wrote the paper. All authors reviewed the results and approved the final version of the manuscript.

REFERENCES

1. Maroney, M. J., and Ciurli, S. (2014) Nonredox nickel enzymes. Chemical Reviews 114, 4206-4228
2. Ha, N. C., Oh, S. T., Sung, J. Y., Cha, K. A., Lee, M. H., and Oh, B. H. (2001) Supramolecular assembly and acid resistance of Helicobacter pylori urease. Nature Structural Biology 8, 505-509
3. Mazzei, L., Cianci, M., Benini, S., and Ciurli, S. (2019) The Structure of the Elusive Urease-Urea Complex Unveils the Mechanism of a Paradigmatic Nickel-Dependent Enzyme. Angewandte Chemie 58, 7415-7419
4. Mobley, H. L., and Hausinger, R. P. (1989) Microbial ureases: significance, regulation, and molecular characterization. Microbiological Reviews 53, 85-108
5. Debowski, A. W., Walton, S. M., Chua, E. G., Tay, A. C., Liao, T., Lamichhane, B., Himbeck, R., Stubbs, K. A., Marshall, B.
J., Fulurija, A., and Benghezal, M. (2017) Helicobacter pylori gene silencing in vivo demonstrates urease is essential for chronic infection. PLoS Pathogens 13, e1006464
6. Armbruster, C. E., Forsyth-DeOrnellas, V., Johnson, A. O., Smith, S. N., Zhao, L., Wu, W., and Mobley, H. L. T. (2017) Genome-wide transposon mutagenesis of Proteus mirabilis: Essential genes, fitness factors for catheter-associated urinary tract infection, and the impact of polymicrobial infection on fitness requirements. PLoS Pathogens 13, e1006434
7. Dunn, B. E., Campbell, G. P., Perez-Perez, G. I., and Blaser, M. J. (1990) Purification and characterization of urease from Helicobacter pylori. The Journal of Biological Chemistry 265, 9464-9469
8. Norsworthy, A. N., and Pearson, M. M. (2017) From Catheter to Kidney Stone: The Uropathogenic Lifestyle of Proteus mirabilis. Trends in Microbiology 25, 304-315
9. Mora, D., and Arioli, S. (2014) Microbial urease in health and disease. PLoS Pathogens 10, e1004472
10. Debraekeleer, A., and Remaut, H. (2018) Future perspective for potential Helicobacter pylori eradication therapies. Future Microbiol 13, 671-687
11. Macegoniuk, K., Grela, E., Palus, J., Rudzinska-Szostak, E., Grabowiecka, A., Biernat, M., and Berlicki, L. (2016) 1,2-Benzisoselenazol-3(2H)-one Derivatives As a New Class of Bacterial Urease Inhibitors. Journal of Medicinal Chemistry 59, 8125-8133
12. Diaz-Sanchez, A. G., Alvarez-Parrilla, E., Martinez-Martinez, A., Aguirre-Reyes, L., Orozpe-Olvera, J. A., Ramos-Soto, M. A., Nunez-Gastelum, J. A., Alvarado-Tenorio, B., and de la Rosa, L. A. (2016) Inhibition of Urease by Disulfiram, an FDA-Approved Thiol Reagent Used in Humans. Molecules 21, E1628
13. Yu, X. D., Zheng, R. B., Xie, J. H., Su, J. Y., Huang, X. Q., Wang, Y. H., Zheng, Y. F., Mo, Z. Z., Wu, X. L., Wu, D. W., Liang, Y. E., Zeng, H. F., Su, Z. R., and Huang, P.
(2015) Biological evaluation and molecular docking of baicalin and scutellarin as Helicobacter pylori urease inhibitors. Journal of Ethnopharmacology 162, 69-78
14. Xiao, Z. P., Peng, Z. Y., Dong, J. J., Deng, R. C., Wang, X. D., Ouyang, H., Yang, P., He, J., Wang, Y. F., Zhu, M., Peng, X. C., Peng, W. X., and Zhu, H. L. (2013) Synthesis, molecular docking and kinetic properties of beta-hydroxy-beta-phenylpropionyl-hydroxamic acids as Helicobacter pylori urease inhibitors. European Journal of Medicinal Chemistry 68, 212-221
15. Yang, X., Koohi-Moghadam, M., Wang, R., Chang, Y. Y., Woo, P. C. Y., Wang, J., Li, H., and Sun, H. (2018) Metallochaperone UreG serves as a new target for design of urease inhibitor: A novel strategy for development of antimicrobials. PLoS Biology 16, e2003887
16. Malfertheiner, P., Megraud, F., O'Morain, C. A., Atherton, J., Axon, A. T., Bazzoli, F., Gensini, G. F., Gisbert, J. P., Graham, D. Y., Rokkas, T., El-Omar, E. M., and Kuipers, E. J. (2012) Management of Helicobacter pylori infection--the Maastricht IV/ Florence Consensus Report. Gut 61, 646-664
17. Malfertheiner, P., Megraud, F., O'Morain, C. A., Gisbert, J. P., Kuipers, E. J., Axon, A. T., Bazzoli, F., Gasbarrini, A., Atherton, J., Graham, D. Y., Hunt, R., Moayyedi, P., Rokkas, T., Rugge, M., Selgrad, M., Suerbaum, S., Sugano, K., and El-Omar, E. M. (2017) Management of Helicobacter pylori infection-the Maastricht V/Florence Consensus Report. Gut 66, 6-30
18. Graham, D.
Y., and Shiotani, A. (2008) New concepts of resistance in the treatment of Helicobacter pylori infections. Nature Clinical Practice. Gastroenterology & Hepatology 5, 321-331
19. Pierce, C. W. H., E. L.; Sawyer, D. T. (1958) Quantitative Analysis, John Wiley & Sons, New York
20. Zhang, Q., Tang, X., Hou, F., Yang, J., Xie, Z., and Cheng, Z. (2013) Fluorimetric urease inhibition assay on a multilayer microfluidic chip with immunoaffinity immobilized enzyme reactors. Analytical Biochemistry 441, 51-57
21. T. T. Ngo, A. P. H. P., C. F. Yam, and Lenhoff. (1982) Interference in Determination of Ammonia with the Hypochlorite-Alkaline Phenol Method of Berthelot. Anal Chem 54, 46-49
22. Tarsia, C., Danielli, A., Florini, F., Cinelli, P., Ciurli, S., and Zambelli, B. (2018) Targeting Helicobacter pylori urease activity and maturation: In-cell high-throughput approach for drug discovery. Biochimica et Biophysica Acta. General subjects 1862, 2245-2253
23. Alonso, C. A., Kwabugge, Y. A., Anyanwu, M. U., Torres, C., and Chah, K. F. (2017) Diversity of Ochrobactrum species in food animals, antibiotic resistance phenotypes and polymorphisms in the blaOCH gene. FEMS Microbiology Letters 364
24. Zhou, Y., Yu, J., Lei, X., Wu, J., Niu, Q., Zhang, Y., Liu, H., Christen, P., Gehring, H., and Wu, F. (2013) High-throughput tandem-microwell assay identifies inhibitors of the hydrogen sulfide signaling pathway. Chemical Communications 49, 11782-11784
25. Croppi, G., Zhou, Y., Yang, R., Bian, Y., Zhao, M., Hu, Y., Ruan, B. H., Yu, J., and Wu, F. (2020) Discovery of an Inhibitor for Bacterial 3-Mercaptopyruvate Sulfurtransferase that Synergistically Controls Bacterial Survival. Cell Chem Biol 27, 1483-1499
26. Upvan Narang, P. N. P., and Frank V. Bright. (1994) A Novel Protocol To Entrap Active Urease in a Tetraethoxysilane-Derived Sol-Gel Thin-Film Architecture. Chem. Mater. 6, 1596-1598
27.
Bloomster, T. G., and Lynn, R. J. (1981) Effect of antibiotics on the dynamics of color change in Ureaplasma urealyticum cultures. Journal of Clinical Microbiology 13, 598-600
28. Skrott, Z., Mistrik, M., Andersen, K. K., Friis, S., Majera, D., Gursky, J., Ozdian, T., Bartkova, J., Turi, Z., Moudry, P., Kraus, M., Michalova, M., Vaclavkova, J., Dzubak, P., Vrobel, I., Pouckova, P., Sedlacek, J., Miklovicova, A., Kutt, A., Li, J., Mattova, J., Driessen, C., Dou, Q. P., Olsen, J., Hajduch, M., Cvek, B., Deshaies, R. J., and Bartek, J. (2017) Alcohol-abuse drug disulfiram targets cancer via p97 segregase adaptor NPL4. Nature 552, 194-199
29. Krippendorff, B. F., Neuhaus, R., Lienau, P., Reichel, A., and Huisinga, W. (2009) Mechanism-based inhibition: deriving K(I) and k(inact) directly from time-dependent IC(50) values. Journal of Biomolecular Screening 14, 913-923
30. Lieberman, O. J., Orr, M. W., Wang, Y., and Lee, V. T. (2014) High-throughput screening using the differential radial capillary action of ligand assay identifies ebselen as an inhibitor of diguanylate cyclases. ACS Chemical Biology 9, 183-192
31. Goldie, J., Veldhuyzen van Zanten, S. J., Jalali, S., Richardson, H., and Hunt, R. H. (1991) Inhibition of urease activity but not growth of Helicobacter pylori by acetohydroxamic acid. Journal of Clinical Pathology 44, 695-697
32. Singh, N., Halliday, A. C., Thomas, J. M., Kuznetsova, O. V., Baldwin, R., Woon, E. C., Aley, P.
K., Antoniadou, I., Sharp, T., Vasudevan, S. R., and Churchill, G. C. (2013) A safe lithium mimetic for bipolar disorder. Nature Communications 4, 1332
33. Chari, A., Cho, H. J., Dhadwal, A., Morgan, G., La, L., Zarychta, K., Catamero, D., Florendo, E., Stevens, N., Verina, D., Chan, E., Leshchenko, V., Lagana, A., Perumal, D., Mei, A. H., Tung, K., Fukui, J., Jagannath, S., and Parekh, S. (2017) A phase 2 study of panobinostat with lenalidomide and weekly dexamethasone in myeloma. Blood Advances 1, 1575-1583
34. Nussinov, R., and Tsai, C. J. (2015) The design of covalent allosteric drugs. Annual Review of Pharmacology and Toxicology 55, 249-267
35. Hancock, R. E. (1997) Peptide antibiotics. Lancet 349, 418-422
36. Zhang, J. H., Chung, T. D., and Oldenburg, K. R. (1999) A Simple Statistical Parameter for Use in Evaluation and Validation of High Throughput Screening Assays. Journal of Biomolecular Screening 4, 67-73
37. Irwin, J. J., and Shoichet, B. K. (2016) Docking Screens for Novel Ligands Conferring New Biology. Journal of Medicinal Chemistry 59, 4103-4120
38. Wei, W., Mao, A., Tang, B., Zeng, Q., Gao, S., Liu, X., Lu, L., Li, W., Du, J. X., Li, J., Wong, J., and Liao, L. (2017) Large-Scale Identification of Protein Crotonylation Reveals Its Role in Multiple Cellular Functions. Journal of Proteome Research 16, 1743-1752
39. Maier, J. A., Martinez, C., Kasavajhala, K., Wickstrom, L., Hauser, K. E., and Simmerling, C. (2015) ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J Chem Theory Comput 11, 3696-3713
40. Palacios-Espinosa, J. F., Arroyo-Garcia, O., Garcia-Valencia, G., Linares, E., Bye, R., and Romero, I. (2014) Evidence of the anti-Helicobacter pylori, gastroprotective and anti-inflammatory activities of Cuphea aequipetala infusion.
Journal of Ethnopharmacology 151, 990-998

Figure 1 Development of a new high-throughput assay for urease and the discovery of new urease inhibitors. (A) Diagram of the tandem-well-based assay for the NH3-producing enzyme. The procedures for the assays and the cross-section of a tandem-well are shown. Blue, the reaction reagent; red, the detection reagent for NH3. (B) Validation of the urease assay with the known inhibitor AHA. (C) Well-to-well reproducibility of the 192-tandem-well-based assay for urease. ●, 2% DMSO (control, 100%); ■, 200 μM AHA; ▲, 800 μM AHA (n = 60). (D) High-throughput inhibitor screening for JBU with 192-tandem-well plates. Compound concentration: 100 μM. (E-F) Dose-dependent effects of panobinostat, dacinostat, EBS, captan and disulfiram on the activity of JBU (E), human CBS (F) or human CSE (F). Means ± SDs (n = 3). All experiments except the primary screening (D) were independently repeated at least twice, and one representative result is presented.
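Panel C of Figure 1 compares uninhibited (DMSO) wells with AHA-inhibited wells across a plate. One common way to summarize that kind of control data is the Z'-factor of Zhang et al. (reference 36). The caption does not say which statistic was used, so the following is only a minimal sketch of that standard formula; the function name and the well readings are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def z_prime(uninhibited, inhibited):
    """Z'-factor: 1 - 3 * (sd_a + sd_b) / |mean_a - mean_b|.

    uninhibited / inhibited are lists of per-well signals (e.g. % activity)
    for the two control populations on one plate.
    """
    separation = abs(mean(uninhibited) - mean(inhibited))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (stdev(uninhibited) + stdev(inhibited)) / separation


if __name__ == "__main__":
    # Hypothetical readings, for illustration only (not from the paper).
    dmso_wells = [100.0, 102.0, 98.0, 101.0, 99.0]
    aha_wells = [10.0, 12.0, 8.0, 11.0, 9.0]
    print(f"Z' = {z_prime(dmso_wells, aha_wells):.3f}")
```

Values approaching 1 indicate well-separated, low-variance controls; Zhang et al. treat values above roughly 0.5 as suitable for screening.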
Figure 2 Panobinostat, dacinostat, EBS and captan inhibit the activity of JBU. (A) Panobinostat and dacinostat are reversible inhibitors, whereas EBS and captan are covalent inhibitors or slow-binding inhibitors toward JBU. Means ± SDs (n = 3). (B) Effects of the incubation period on the IC50 values of panobinostat and dacinostat toward JBU. Panobinostat and dacinostat were preincubated with JBU for the indicated times before performing the standard assay to analyze their inhibitory effects. Means ± SDs (n = 3). (C) Inhibition of JBU by panobinostat or dacinostat as a function of urea concentration. Ki values for panobinostat and dacinostat, 0.02 μM and 0.07 μM, respectively. Means ± SDs (n = 3). (D) Surface plasmon resonance assay analysis of the binding of panobinostat or dacinostat to JBU. KD values were calculated using Biacore evaluation software and are listed in Table 1. (E) The putative binding mode of panobinostat or dacinostat in the JBU active site.
Panobinostat and dacinostat were docked into the JBU crystal structure (PDB code: 4GOA) using the Discovery Studio software. Residues surrounding the inhibitor within a distance of 3.5 Å are shown in gray, and hydrogen bonds are represented as green dotted lines. The experiments were independently repeated at least twice, and one representative result is presented.

Figure 3 EBS or captan allosterically inhibits the activity of urease by covalently modifying a non-active-site Cys residue. (A) The synergistic inhibitory effects of the combinations of EBS, captan or AHA. A dose-dependent synergistic effect of the combination of EBS at the indicated concentrations with 2 μM captan was observed (right panel). Data are presented as percentages of the controls (DMSO and 2 μM captan alone in the left panel and right panel, respectively, 100%). Means ± SDs (n = 3). (B) Inhibition of JBU by EBS or captan as a function of the urea concentration. αKi for EBS and captan, 0.8 μM and 1.1 μM, respectively. Means ± SDs (n = 3).
(C) Tandem mass spectrometry analysis of the modification sites of EBS and captan on JBU. The Cys modifications by EBS and captan on JBU are illustrated in the right panels. (D) Surface plasmon resonance assay analysis of the binding of EBS or captan to JBU. (E) The potential binding modes of EBS and captan in JBU. EBS and captan were modeled into the respective allosteric sites presented in the crystal structure of JBU (PDB code: 4GOA; METHODS). The residues within 3.5 Å surrounding EBS and captan are shown. Hydrogen bonds are indicated as dashed green lines. The experiments were independently repeated at least twice, and one representative result is presented.

Figure 4 Urease inhibitors suppress bacterial ureases or the growth of urease-containing bacteria. (A) Dose-dependent effects of panobinostat, dacinostat, EBS, captan, disulfiram and AHA on the activity of H. pylori urease (HPU, upper panel) or O. anthropi urease (OAU, lower panel) in vitro. (B) Panobinostat, dacinostat, EBS, captan and disulfiram inhibit the activity of purified HPU from size-exclusion chromatography. The chromatogram of the purification is shown in the left panel. The collected fractions (numbers 1-12) of the peaks (left panel), as well as the crude extract (number 0), were separated by 10% SDS-PAGE and stained with Coomassie Brilliant Blue R-250 (middle panel). The arrows indicate the peak of H. pylori urease (left panel) or subunit A or B of H. pylori urease (middle panel).
The collected sample containing the urease (number 3) was tested to evaluate the inhibitory effects of the indicated compounds (right panel). The protein identity of fraction 3 was analyzed by LC-MS/MS (METHODS and Figure S5). (C) The inhibitory effects of panobinostat, dacinostat and newly synthesized EBS analogs (1, 4 and 6) on the activity of HPU in culture. Inhibitors were incubated with the H. pylori bacteria for 6 h. (D) The effects of panobinostat, dacinostat, EBS and its derivatives on the growth of H. pylori. Mean ± SD (n = 3). All experiments were independently repeated at least twice, and one representative result is presented.

Figure 5 Panobinostat, dacinostat and EBS inhibit the virulence of H. pylori in cultured gastric
cells. SGC-7901 cells were infected with HP in the presence of 30 μM panobinostat (A), 30 μM dacinostat (A), 30 μM AHA (A), 20 μM EBS (B), 20 μM disulfiram or 50 μM tinidazole (B) for 24 h before capturing the images in bright field by ImageXpress Micro® XLS (Molecular Devices, Sunnyvale, CA) under a 20× objective lens. A representative image for each treatment condition is shown (n = 3). Scale bars, 100 μm. The cell numbers before treatment (100%) or after 24 h of treatment were quantified. (C) The effects of urease inhibitors on the amount of NH3 in the cell culture medium. After the treatment, the amount of NH3 in the cell medium of the corresponding samples was quantified with Nessler's reagent, and the data are shown as percentages of the control (DMSO, 100%). Means ± SDs (n = 3). Statistical analyses were performed using the raw data by one-way ANOVA with Bonferroni posttests. n.s., not significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001. All experiments were independently repeated twice, and one representative result is presented.

Figure 6. Structure-activity relationships of panobinostat, dacinostat, EBS and captan.
(A) The effects of commercially available analogs of panobinostat and dacinostat, newly synthesized EBS derivatives and commercially available EBS or captan analogs on the activity of JBU. DMSO, 100%. Mean ± SD (n = 3). The experiments were independently repeated at least twice, and one representative result is presented. (B) The illustration charts for the structure-activity relationships of hydroxamic acid analogs, EBS or captan.

Table 1 Indication, chemical structure, IC50, αKi, or KD values of urease inhibitors.
aFrom the enzyme kinetic study
bAssay was performed in 50 mM Tris buffer (pH = 7.4).
Name | Application | Structure | IC50 (μM); JBU | IC50 (μM); HPU | IC50 (μM); OAU | αKi or Ki (μM)a | IC50 (μM); hCBS | IC50 (μM); hCSE | KD (μM)
Panobinostat | Anticancer | [structure] | 0.2 ± 0.006 | 0.1 ± 0.01 | 0.07 ± 0.006 | 0.02 ± 0.01 | > 200.0 | > 200.0 | 8.9 ± 0.4
Dacinostat | Anticancer | [structure] | 1.1 ± 0.005 | 0.2 ± 0.009 | 0.1 ± 0.01 | 0.07 ± 0.02 | > 200.0 | > 200.0 | 5.3 ± 0.2
Ebselen | Anti-stroke; Anti-bipolar | [structure] | 0.4 ± 0.07 | 2.8 ± 0.5 | 3.0 ± 1.0 | 0.8 ± 0.2 | > 200.0 | 44.3 ± 1.3 | 0.089 ± 0.005
Captan | Pharmaceutical excipient; Fungicide | [structure] | 2.3 ± 0.2 | 3.4 ± 0.5 | 5.8 ± 1.6 | 1.1 ± 0.2 | > 200.0 | > 200.0 | 0.096 ± 0.006
Disulfiram | Alcohol deterrent | [structure] | 38.9 ± 2.7 | 8.9 ± 1.5 | 35.0 ± 0.1 | - | > 200.0 | > 200.0 | -
Acetohydroxamic acid | Urinary tract infections | [structure] | 161.8 ± 13.4; 33.7 ± 1.0b | 25.9 ± 1.2 | 2.8 ± 0.9 | 2.1 ± 0.8 | > 200.0 | > 200.0 | -
Supplementary Information

High-throughput Tandem-microwell Assay for Ammonia Repositions FDA-Approved Drugs to Helicobacter Pylori Infection

Fan Liu,a,b,# Jing Yu,b,# Yan-Xia Zhang,c Fangzheng Li,a,d Qi Liu,e Yueyang Zhou,a Shengshuo Huang,b Houqin Fang,f Zhuping Xiao,e Lujian Liao,f Jinyi Xu,d Xin-Yan Wu,c Fang Wu a,*

aKey Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, 200240, China
bState Key Laboratory of Microbial Metabolism, Sheng Yushou Center of Cell Biology and Immunology, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
cSchool of Chemistry & Molecular Engineering, East China University of Science and Technology, Shanghai, 200237, China
dState Key Laboratory of Natural Medicines and Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, 210009, China
eHunan Engineering Laboratory for Analyse and Drugs Development of Ethnomedicine in Wuling Mountains, Jishou University, Hunan, 416000, China
fShanghai Key Laboratory of Regulatory Biology, School of Life Sciences, East China Normal University, Shanghai, 200241, China
#These authors contributed equally to this work.
*To whom correspondence may be addressed. Email: fang.wu@sjtu.edu.cn
Table of Contents
EXPERIMENTAL PROCEDURES
Figure S1. Development and optimization of the high-throughput assay for urease.
Figure S2. Validation of on-target inhibition of panobinostat, dacinostat, EBS, captan and disulfiram on JBU.
Figure S3. The mode of action of panobinostat, dacinostat and disulfiram in vitro.
Figure S4. The mode of action of EBS and captan in vitro.
Figure S5. The identification of HPU from extracts of H. pylori by LC-MS/MS.
Figure S6. EBS and 1 is a long-acting inhibitor for HPU in culture.
Figure S7. The effects of inhibitors on the cell viability of gastric SGC-7901 cells and antibiotic resistance of the H. pylori strain.
Figure S8. The binding modes of inhibitors in ureases.
Table S1. Chemical structures and IC50 values of EBS or captan analogs for ureases
Table S2. The minimal inhibitory concentration of urease inhibitors or known antibiotics for inhibiting H. pylori and their IC50 values in the in cellulo urease assay
Table S3. Chemical structures and IC50 values of hydroxamic acid-based analogs for ureases
Table S4. Primer sequences.
Reference

EXPERIMENTAL PROCEDURES

Synthesis of EBS analogs 1-6
Compounds 1-6 were synthesized according to literature procedures (1-3), as shown in Scheme S1. The chemical reagents and solvents were purchased from commercial sources and used without further purification, unless stated otherwise. 1H NMR spectra for these compounds were recorded with a Bruker 400 spectrometer. The chemical shifts of the 1H NMR spectra were referenced to tetramethylsilane (δ 0.00 ppm).

Scheme S1. Synthesis of compounds 1-6.
Reagents and conditions: (a) HCl, NaNO2, 0 °C, 0.5 h; (b) Na2Se2, 60 °C, 3 h; (c) SOCl2, 85 °C, 3 h; (d) R2NH2, Et3N, CH2Cl2, rt, 4.5 h; (e) Br2, CH2Cl2, reflux, overnight; (f) Cu(NO3)2·xH2O, Et3N, toluene, reflux.

General procedure for synthesis of Compounds 1, 4, 5 and 6 (Route A).
The 2-aminobenzoic acid or its derivative was treated with hydrochloric acid (2.50 equiv.) and sodium nitrite (1.06 equiv.) in water (0.7 M) at 0 °C to form the corresponding diazonium salt. Then, the diazonium salt solution was added dropwise to a solution of Na2Se2 (0.87 equiv., freshly prepared from selenium powder and NaBH4 in water) at 0 °C under argon. Stirring was continued at 60 °C for 3 h. After work-up, crude 2,2'-diseleno-dibenzoic acid was obtained. The acid was then converted to 2-(chloroseleno)benzoyl chloride with excess SOCl2 and one drop of DMF at 85 °C for 3 h. After removal of the thionyl chloride, the crude compound was obtained and treated with different amines (1.2 equiv.) and Et3N (2.0 equiv.) in CH2Cl2 (0.1 M) under argon to afford products 1 and 4-6, respectively. Silica gel column chromatography was used to purify these compounds, and their HPLC purity was more than 99%.

2-Phenyl-6-methoxybenzoisoselen-3-one (1)
4-Methoxy-2-aminobenzoic acid and aniline were used to give the compound. 1H NMR (400 MHz, CDCl3): δ 8.01 (d, J = 8.8 Hz, 1H), 7.62 (dd, J = 7.6, 0.8 Hz, 2H), 7.43 (t, J = 8.0 Hz, 2H), 7.29-7.24 (m, 1H), 7.11 (d, J = 2.0 Hz, 1H), 7.01 (dd, J = 8.4, 2.0 Hz, 1H), 3.92 (s, 3H). MS (m/z): 305.0 [M+H]+.

Benzisoselenol-3-one (4)
o-Aminobenzoic acid and ammonia were used to give the product.
1H NMR (400 MHz, d6-DMSO): δ 9.17 (br, 1H), 8.06 (d, J = 8.1 Hz, 1H), 7.81 (dd, J = 8.0, 0.8 Hz, 1H), 7.61 (td, J = 7.6, 1.2 Hz, 1H), 7.42 (td, J = 7.6, 0.8 Hz, 1H). MS (m/z): 198.9 [M+H]+.

2-Propyl-benzisoselenol-3-one (5)
o-Aminobenzoic acid and n-propylamine were used to give the product. 1H NMR (400 MHz, CDCl3): δ 8.05 (d, J = 8.0 Hz, 1H), 7.63 (d, J = 7.6 Hz, 1H), 7.58 (td, J = 7.6, 1.2 Hz, 1H), 7.45-7.40 (m, 1H), 3.83 (t, J = 7.2 Hz, 2H), 1.76 (hex, J = 7.2 Hz, 2H), 1.00 (t, J = 7.2 Hz, 3H). MS (m/z): 242.0 [M+H]+.

2-Methylthio-benzisoseleno-3-one (6)
o-Aminobenzoic acid and thiourea were used to give the product. 1H NMR (400 MHz, d6-DMSO): δ 10.21 (d, J = 0.8 Hz, 1H), 9.98 (d, J = 1.2 Hz, 1H), 8.00 (d, J = 8.4 Hz, 1H), 7.88 (d, J = 8.0 Hz, 1H), 7.71 (td, J = 8.0, 1.2 Hz, 1H), 7.45 (t, J = 7.6 Hz, 1H). MS (m/z): 240.0 [M-NH3]-.

Synthesis of compound 2.
Compound 2 was prepared according to route B (Scheme S1). 2,2'-Dithiobis-benzoic acid was reacted with bromine in CH2Cl2 under reflux and argon, and then treated with aniline and Et3N in CH2Cl2 at room temperature. After purification of the crude product by column chromatography, compound 2 was obtained. 1H NMR (400 MHz, CDCl3): δ 8.11 (d, J = 7.6 Hz, 1H), 7.73-7.69 (m, 2H), 7.68-7.65 (m, 1H), 7.51-7.43 (m, 3H), 7.59 (d, J = 8.0 Hz, 1H), 7.33 (t, J = 7.6 Hz, 1H). MS (m/z): 227.0 [M]+.

Synthesis of compound 3.
Compound 3 was synthesized according to route C (Scheme S1).
A Schlenk tube equipped with a stirrer bar was charged with isoindoline-1,3-dione, diphenyliodonium salt (2.05 equiv.) and Cu(NO3)2·xH2O (0.1 equiv.) in dry toluene (0.1 M) under argon. The mixture was heated to 70 °C, followed by the addition of Et3N (1.5 equiv.). After stirring at 70 °C for 8.5 h (monitored by TLC), stirring of the resulting mixture was continued at room temperature overnight. Then, the mixture was concentrated and the residue was purified by column chromatography. 1H NMR (400 MHz, CDCl3): δ 7.99-7.94 (m, 2H), 7.80 (dd, J = 5.6, 3.2 Hz, 2H), 7.55-7.49 (m, 2H), 7.47-7.39 (m, 3H). MS (m/z): 223.1 [M]+.

HPLC method and purity analysis
The purity of compounds 1-5, ebselen oxide and dibenzyl diselenide was analyzed on a Waters SunFire silica column (4.6 × 250 mm; Waters, Milford, MA) coupled to a Waters HPLC system (e2695). A 3 μl sample of compound was injected onto the column and separated by gradient elution [0 min: 95% phase A (hexane), 5% phase B (isopropyl alcohol); 15 min: 60% phase A (hexane), 40% phase B (isopropyl alcohol)] at a flow rate of 0.7 ml/min at room temperature.
Similarly, the purity of compound 6 was determined on a Waters Spherisorb CN column (4.6 × 250 mm, Waters). A 5 μl sample of compound 6 was injected onto the column and analyzed at a flow rate of 0.7 ml/min with an isocratic elution of solvent composed of 75% hexane and 25% isopropyl alcohol.
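The two elution programs above can be written as a small function that returns the mobile-phase composition at a given run time. This is a minimal sketch that assumes the gradient for compounds 1-5 ramps linearly between the two stated points (the method gives only the 0 min and 15 min compositions) and that ignores instrument dwell volume. The function and constant names are illustrative and not part of the original method.

```python
# Stated gradient end points for compounds 1-5: (time in min, % phase B).
GRADIENT_START = (0.0, 5.0)    # 95% hexane / 5% isopropyl alcohol
GRADIENT_END = (15.0, 40.0)    # 60% hexane / 40% isopropyl alcohol

# Isocratic program stated for compound 6.
ISOCRATIC_B_COMPOUND_6 = 25.0  # 75% hexane / 25% isopropyl alcohol


def gradient_percent_b(t_min, start=GRADIENT_START, end=GRADIENT_END):
    """Return % phase B (isopropyl alcohol) at time t_min.

    Assumption: a linear ramp between the two programmed points, with the
    composition held constant before the first and after the last point.
    """
    (t0, b0), (t1, b1) = start, end
    if t_min <= t0:
        return b0
    if t_min >= t1:
        return b1
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0.0, 5.0, 10.0, 15.0):
        b = gradient_percent_b(t)
        print(f"t = {t:4.1f} min: {100.0 - b:.1f}% hexane, {b:.1f}% isopropyl alcohol")
```

The value returned is the programmed composition at the pump, not necessarily the composition reaching the column at that instant.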
The absorbance of the compounds was monitored at a wavelength of 230 nm, and the corresponding spectra were recorded and analyzed for the determination of the purity.

The purity of the EBS analogs, which were newly synthesized in house (Compounds 1-6) or obtained from commercial sources (ebselen oxide and dibenzyl diselenide), was analyzed by HPLC (for details, see above).

Compound 1
Determined Purity: > 99%; Retention time: 11.30 min

Compound 2
Determined Purity: > 99%; Retention time: 7.72 min

Compound 3
Determined Purity: > 99%; Retention time: 6.85 min

Compound 4
Determined Purity: > 99%; Retention time: 13.55 min

Compound 5
Determined Purity: > 99%; Retention time: 10.85 min
Compound 6
Determined Purity: > 97%; Retention time: 7.71 min

Ebselen oxide (Cayman)
Determined Purity: 95%; Retention time: 8.62 min

Dibenzyl diselenide (Cayman)
Determined Purity: > 99%; Retention time: 4.78 min

Figure S1. Development and optimization of the high-throughput assay for urease. Three types of detection reagents, i.e., salicylic acid-hypochlorite (A), Nessler’s reagent (B), and phenol red (C), were used to detect the released NH3 generated by JBU. The
assay was monitored in the presence of various concentrations of JBU and 25 mM urea. The absorbance (O.D.) values at 697 nm, 420 nm or 435 nm were recorded accordingly. (D) Standard curve of the absorbance of indophenol blue at 697 nm versus the NH4Cl concentration. Various concentrations of NH4Cl were mixed with the detection reagent salicylic acid-hypochlorite before measurement of the absorbance at 697 nm in a microplate reader. (E) The pH profile of the activity of JBU. The 50 mM phosphate buffer (■) was used to maintain the pH between 6 and 8, and 50 mM Tris-HCl (●) was used for pH 7 to 9. JBU was dissolved in the respective buffers and assayed at a final concentration of 50 nM. (F-G) The comparison between salicylic acid-hypochlorite and Nessler’s detection reagent for the detection of HPU activity. The assay was performed to detect the urease activity in the extract from H. pylori with salicylic acid-hypochlorite (left panel) and Nessler’s detection reagent (right panel) in the presence of 25 mM urea. Data are presented as the mean ± SD (n=3). The curves were fitted to the data points with GraphPad Prism 5. All the experiments were independently repeated twice, and one representative result is presented.

Figure S2.
Validation of on-target inhibition of JBU by panobinostat, dacinostat, EBS, captan and disulfiram. (A) NH3 did not interfere with the inhibitors. 5 mM NH3·H2O was incubated with various concentrations of panobinostat, dacinostat, EBS, captan or disulfiram in assay buffer. The volatile NH3 was analyzed with salicylic acid-hypochlorite detection reagent (OD697 nm). (B) Triton X-100 did not affect either the activity of JBU or the inhibitory potency of panobinostat, dacinostat, EBS, captan, disulfiram or the EBS analogs. Various concentrations of Triton X-100 were tested for their effects on the activity of JBU. Additionally, the indicated concentrations of panobinostat, dacinostat, EBS, EBS oxide, captan, 1, 4, 6 or disulfiram were assayed in the presence or absence of 1/10000 Triton X-100 (v/v) to determine whether their inhibitory mechanisms occurred via colloidal aggregation (METHODS)(4). The results are shown as percentages of the respective control (DMSO or H2O, 100%). Mean ± SD (n=3). All experiments were independently repeated twice, and one representative result is presented.
Figure S3. The mode of action of panobinostat, dacinostat and disulfiram in vitro. (A) The effect of NiCl2 on the inhibition of JBU by panobinostat or dacinostat. NiCl2 at a concentration of 25, 50 or 100 μM was added to assays containing various concentrations of panobinostat or dacinostat under standard assay conditions. (B) Effects of cysteine and histidine on the inhibition of JBU by panobinostat and dacinostat. The assay samples were incubated with the indicated concentrations of panobinostat or dacinostat in the presence or absence of 100 μM Cys or 100 μM His. The results are shown as percentages of the control (DMSO, 100%). (C) Reversibility of the inhibition of JBU by disulfiram. After incubation with JBU at 200 or 100 μM for 60 min, disulfiram was diluted 200-fold in assay buffer. The resulting concentrations of disulfiram are 1 μM and 0.5 μM, respectively, which do not inhibit JBU (Fig. 1E). After a further incubation for 0.5 h, the remaining activity of JBU was measured accordingly (METHODS). The effect of NiCl2 on the inhibition of JBU by disulfiram is shown in the right panel. The results are shown as percentages of the respective control (DMSO, 100%). Mean ± SD (n=3). All experiments were independently repeated twice, and one representative result is presented.
Figure S4. The mode of action of EBS and captan in vitro. (A) Effects of dithiothreitol on the inhibition of JBU caused by EBS and captan. The assay was incubated with 4 μM EBS or 10 μM captan in the presence or absence of 5 mM DTT. (B) Effects of cysteine and histidine on the inhibition of JBU by EBS and captan. The samples were incubated with the indicated concentrations of EBS or captan in the presence or absence of 100 μM Cys or 100 μM His. (C) The effect of NiCl2 on the inhibition of JBU by EBS. NiCl2 at a concentration of 12.5, 25, 50 or 100 μM was incubated with various concentrations of EBS under standard assay conditions. (D) The IC50 values of EBS and captan toward JBU were linearly correlated with the concentrations of JBU. EBS and captan were incubated with various concentrations of JBU, and the IC50 values were determined accordingly. (E) The inhibition constants KI and kinact for irreversible inhibitors were determined according to the methods described in ref. (5). Means ± SDs (n=3).
All experiments were independently repeated at least twice, and one representative result is presented.

Figure S5. The identification of HPU from extracts of H. pylori by LC-MS/MS. Fraction 3 collected by size-exclusion chromatography (Figure 4B) was digested with trypsin, GluC and subtilisin, separated on a C18 reverse-phase column and subjected to analysis with a Thermo Q Exactive Orbitrap (Thermo Fisher Scientific). The peptides in red were identified by LC-MS/MS as subunit A or B of HPU. The overall coverage of UreB and UreA identified in the LC-MS/MS analysis was 80.1% and 76.9%, respectively.

Figure S6. EBS and 1 are long-acting inhibitors of HPU in culture. (A) Disulfiram dose-dependently and selectively inhibits the growth of H. pylori. Various concentrations of disulfiram were incubated at 37 °C with H. pylori. (B) The inhibitory effects of EBS and 1 on the activity of HPU in cellulo. EBS, 1 or AHA at a concentration of 100 μM was incubated with H. pylori bacteria for 24 h.
Additionally, one batch of the treated bacteria was washed, diluted into freshly prepared medium without the addition of the inhibitors, and cultured for an additional 6 h. The in cellulo urease activities of the cultured cells under the two treatment conditions were determined accordingly (METHODS). The results are shown as percentages of the control (DMSO, 100%). Mean ± SD (n=3). All experiments were independently repeated at least twice, and one representative result is presented.

Figure S7. The effects of inhibitors on the cell viability of gastric SGC-7901 cells and antibiotic resistance of the H. pylori strain. (A) The H. pylori strain is resistant to treatment with tinidazole or metronidazole. Various concentrations of tinidazole or metronidazole were incubated at 37 °C with H. pylori for 72 h under standard culture conditions, and the OD at 600 nm was recorded using a spectrophotometer to determine the cell growth of H. pylori (METHODS). (B) The effects of urease inhibitors on the viability of mammalian cells.
SGC-7901 cells were incubated with DMSO or the indicated concentrations of panobinostat, dacinostat, EBS or disulfiram for 24 h in a 96-well plate before measurement of cell viability using the CellTiter96® Aqueous One Solution Cell Proliferation Assay (Promega, Madison, WI). The results are shown as percentages of the control (DMSO, 100%). Means ± SDs (n=3). All experiments were independently repeated at least twice, and one representative result is presented.

Figure S8. The binding modes of inhibitors in ureases. (A) The putative binding mode of panobinostat (black) or dacinostat (black) in the HPU active site. Panobinostat and dacinostat were docked into the HPU crystal structure (PDB code 1E9Y; ref. (6)) using the Discovery Studio software. Residues surrounding the inhibitor within a distance of 3.5 Å are shown in gray or in the default atom color. (B) Global view of the binding region of EBS (upper panel) and captan (lower panel) in JBU.
In the modeled EBS or captan and protein complex structure (METHODS and Figure 3E), the protein is shown in black, the key residues (His492 and His519) in the active site of JBU in cyan and the inhibitors as well as their attached Cys residue (Cys313 for EBS, Cys406 for captan; Figure 3E) in red. Hydrogen bonds are represented as green dotted lines.

Table S1. Chemical structures and IC50 values of EBS or captan analogs for ureases.
(Chemical structure drawings are not reproduced in this text version.)

Name                   IC50 (μM); HPU   IC50 (μM); JBU   IC50 (μM); OAU
1                      2.0 ± 0.9        0.3 ± 0.007      7.5 ± 0.6
2                      > 10.0           1.0 ± 0.002      4.9 ± 1.1
3                      > 10.0           > 10.0           > 10.0
4                      1.1 ± 0.08       0.8 ± 0.008      2.2 ± 0.1
5                      > 10.0           1.4 ± 0.03       5.3 ± 0.9
6                      1.3 ± 0.4        0.3 ± 0.04       1.7 ± 0.1
Ebselen oxide          1.5 ± 0.2        0.4 ± 0.005      3.3 ± 0.1
Dibenzyl diselenide    > 10.0           > 10.0           > 10.0
Captafol               > 10.0           8.8 ± 0.2        9.1 ± 1.1

Table S2. The minimal inhibitory concentration of urease inhibitors or known antibiotics for inhibiting H. pylori and their IC50 values in the in cellulo urease assay.
Compound               MIC (μg/ml)   MIC (μM)   IC50 in the in cellulo urease assay (μM)
EBS                    4             12.5       5.7 ± 1.3
1                      2             6.25       4.7 ± 1.1
4                      2             12.5       18.5 ± 1.2
6                      4             12.5       21.8 ± 1.0
EBS oxide              4             12.5       23.2 ± 1.1
Captan                 32            100        29.5 ± 1.2
Disulfiram             4             12.5       36.3 ± 1.0
Dibenzyl diselenide    > 64          > 200      > 200.0
AHA                    > 16          > 100      -
Tinidazole             > 512         > 2000     -
Metronidazole          > 512         > 3000     -
MIC: minimal inhibitory concentration (H. pylori)

Table S3. Chemical structures and IC50 values of hydroxamic acid-based analogs for ureases.
(Chemical structure drawings are not reproduced in this text version.)

Name             IC50 (μM); HPU   IC50 (μM); JBU
Abexinostat      1.3 ± 0.2        1.4 ± 0.3
Belinostat       3.2 ± 0.2        4.7 ± 0.5
Vorinostat       14.0 ± 3.9       4.1 ± 1.9
Ricolinostat     > 20.0           > 20.0
Ilomastat        > 20.0           > 20.0
Pracinostat      > 10.0           > 20.0
Hydroxylamine    > 20.0           > 20.0

Table S4. Primer sequences.

No.   Primer                                      Usage
1     5'-AGAGTTTGATCCTGGCTCAG-3'                  5' primer for 16S rRNA
2     5'-AAGGAGGTGATCCAGCCGCA-3'                  3' primer for 16S rRNA
3     5'-ATTAATCATTAGATGTATGGCCCTACTACAGGCG-3'    5' primer for UreB
4     5'-AATATACTCGAGCTAGAAAATGCTAAAGAGTTG-3'     3' primer for UreB

References
1. Pacula, A. J., Obieziurska, M., Scianowski, J., Kaczor, K. B., and Antosiewicz, J. (2018) Water-dependent synthesis of biologically active diaryl diselenides. Arkivoc, 153-164
2. Ngo, H. X., Shrestha, S. K., Green, K. D., and Garneau-Tsodikova, S. (2016) Development of ebsulfur analogues as potent antibacterials against methicillin-resistant Staphylococcus aureus. Bioorgan Med Chem 24, 6298-6306
3. Lucchetti, N., Scalone, M., Fantasia, S., and Muniz, K. (2016) Sterically Congested 2,6-Disubstituted Anilines from Direct C-N Bond Formation at an Iodine(III) Center. Angew Chem Int Edit 55, 13335-13339
4. Irwin, J. J., and Shoichet, B. K. (2016) Docking Screens for Novel Ligands Conferring New Biology. Journal of Medicinal Chemistry 59, 4103-4120
5. Krippendorff, B. F., Neuhaus, R., Lienau, P., Reichel, A., and Huisinga, W. (2009) Mechanism-based inhibition: deriving K(I) and k(inact) directly from time-dependent IC(50) values. Journal of Biomolecular Screening 14, 913-923
6. Ha, N. C., Oh, S. T., Sung, J. Y., Cha, K. A., Lee, M. H., and Oh, B. H. (2001) Supramolecular assembly and acid resistance of Helicobacter pylori urease.
Nature Structural Biology 8, 505-509

10_1101-2021_01_05_425440 ----

Thermal proteome profiling reveals distinct target selectivity for differentially oxidized oxysterols

Cecilia Rossetti,1 Luca Laraia1#

1Department of Chemistry, Technical University of Denmark, Kemitorvet 207, 2800, Kgs. Lyngby, Denmark.
#Correspondence to luclar@kemi.dtu.dk

Abstract
Oxysterols are produced physiologically by many species; however, their distinct roles in regulating human (patho)physiology have not been studied systematically. The role of differing oxidation states and sites in mediating their biological functions is also unclear. As individual oxysterols have been associated with atherosclerosis, neurodegeneration and cancer, a better understanding of their protein targets would be highly valuable. To address this, we profiled three A- and B-ring oxidized sterols as well as 25-hydroxycholesterol using thermal proteome profiling (TPP), validating selected targets with the cellular thermal shift assay (CETSA) and isothermal dose response fingerprinting (ITDRF). This revealed that the site of oxidation has a profound impact on target selectivity, with each oxysterol possessing an almost unique set of target proteins.
However, overall targets clustered in pathways relating to vesicular transport and lipid metabolism and trafficking, suggesting that while individual oxysterols bind to a unique set of proteins, the processes they modulate are highly interconnected.

Introduction
Dysregulation of cholesterol homeostasis is a severe condition leading to inadequate or excessive tissue cholesterol levels. Hypercholesterolemia has been identified as a common risk factor for diverse disorders, including breast, colorectal, prostatic and testicular cancer[1] together with coronary artery disease and Alzheimer's disease.[2],[3] Oxidative metabolites of cholesterol, termed oxysterols, contribute to the regulation of cholesterol homeostasis through different transcriptional and non-genomic mechanisms, which are still incompletely understood.[4],[5],[6] Additionally, recent research suggests that they may play distinct roles not directly connected to the regulation of cholesterol homeostasis, including mediating membrane contact sites and trafficking. Evidence has also associated increased oxysterol levels with cancer progression, the mechanisms of which remain to be elucidated.[7]

Of the over twenty oxysterols identified, side-chain oxidized sterols and particularly 25-hydroxycholesterol (25-HC) have been the most widely studied. They have been shown to modulate the activity of cholesterol transport proteins and transcription factors involved in regulating cholesterol homeostasis. However, A- and B-ring oxidized sterols have been less well studied, in particular in relation to their target profile. Those oxidized at the C7 position, such as 7-ketocholesterol (7-KC), are most frequently detected at high levels in atherosclerotic plaques[8] and in the plasma of patients with high cardiovascular risk factors.[9] Furthermore, 7-KC displays toxicity at higher concentrations, accompanied by a pronounced effect on lysosomal activity.[10] The precise mechanisms by which this occurs are still unknown.
For oxysterols oxidized at the 4-, 5- and 6-positions virtually no targets have been annotated, with the partial exception of the liver X receptor (LXR). Crucially, the effect on biological activity of different oxidative modifications on the sterol backbone has not been explored. For all of the reasons above, the systematic discovery of oxysterol target proteins will be of profound importance in determining their (patho)physiological roles.

Herein we describe the systematic identification of oxysterol target proteins using thermal proteome profiling (TPP). The oxidation site and state significantly affected the target profile for each oxysterol tested, with only two proteins identified as targets for more than one oxysterol. Of these, the vacuolar protein sorting associated protein 51 (VPS51) was validated more comprehensively as a protein that binds oxysterols. Though different, most oxysterol targets clustered in pathways and processes related to vesicular transport as well as lipid metabolism and transfer, and most targets were localized at intracellular membranes. These results suggest specific but different roles for individual oxysterols and provide a blueprint for further studies on these important metabolites.

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.05.425440; this version posted January 6, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Results and Discussion
Identification of oxysterol target proteins using thermal proteome profiling
To identify potential oxysterol target proteins, TPP was selected as the method of choice[11][12] (Figure 1A).
This method is advantageous over other target identification methods as it does not require pre-functionalization, immobilization or modification of the compound of interest and has been shown to offer excellent proteome coverage. Furthermore, the use of selected detergents, including NP-40, has successfully enabled the identification of a large proportion of membrane proteins, which is particularly relevant as this is where a large proportion of known sterol targets are located.[13] We opted to carry out experiments in cell lysates for the primary screening efforts, for increased reproducibility[14],[15] and simpler data interpretation.[16] The use of cell lysates enables the evaluation of direct oxysterol target engagement without additional sources of variability deriving from factors such as membrane transport, accumulation and cell metabolism, which are prominent in experiments with intact cells.

We selected 4β-hydroxycholesterol (4β-HC), cholestane-3β,5α,6β-triol (CT) and 7-ketocholesterol (7-KC) as representative A/B-ring oxidized sterols which cover all four known oxidation sites and arise through enzymatic, but also spontaneous, oxidation (Figure 1B). The oxysterols were also selected with the aim of elucidating how the different oxidation patterns on the sterol core determine the selectivity of these important metabolites. Furthermore, we also included 25-hydroxycholesterol (25-HC) as an oxysterol that has been more widely studied and applied, but whose complete target profile had also not been elucidated.

Figure 1. Target identification of oxysterols using thermal proteome profiling.
A) Workflow of the thermal proteome profiling experiments and criteria for the identification of putative oxysterol targets; B) Structures of the tested oxysterols: 4β-hydroxycholesterol (4β-HC), cholestane-3β,5α,6β-triol (CT), 25-hydroxycholesterol (25-HC) and 7-ketocholesterol (7-KC); C) Summary table of the identified proteins from the HeLa proteome analysis and setting of the threshold limits for the identification of putative hits.

TPP enabled the identification and monitoring of changes in thermal stability of up to 7000 proteins upon the incubation of HeLa cell lysates with the different oxysterols (Figure 1C). From these, it was possible to calculate thermal shifts for about 85% of the identified proteins. To define which shifts in melting temperature were significant, a threshold of two standard deviations from the median of all the calculated shifts was deemed appropriate, in line with previous reports.[17] In the screening process, proteins with a significant change in melting temperature following oxysterol exposure were filtered according to their melting curves normalized to the lowest temperature. Proteins displaying a shift in the same direction (positive or negative ΔTm) in all three replicates and with a curve plateau corresponding to a fraction of soluble protein less than or equal to 0.5 were selected as potential targets (Figure 1A and 1C).

The entire screening set produced a list of 77 hits considered as putative targets for at least one of the tested oxysterols (Figure 2A). Overall, the re-identification of 10 known cholesterol binding proteins as determined by affinity-based probes[18] (Supporting Information (SI) Table S1) both validates the use of TPP for identifying novel oxysterol target proteins and highlights the wealth of previously unidentified sterol interactors.

Interestingly, the overlap of the candidate targets between the different oxysterols was remarkably low. Only 7-KC and 4β-HC shared two putative interacting proteins (Figure 2B). While this result may appear unexpected, it is in fact
consistent with previously reported studies describing the binding of oxysterols to the cholesterol transport proteins Niemann Pick Class 1 (NPC1) and Aster-A (also known as GRAMD1A). Both were shown to bind 25-HC, but displayed no binding of A- or B-ring oxidized sterols.[19][20][21] These examples and our data suggest that these particular oxysterols do not exert their function by modulating traditional (oxy)sterol-associated proteins. Crucially, our data suggest that different oxysterols show distinct target profiles that are dependent on the position and level of oxidation.

Despite marked differences in their individual target profiles, some general trends could be observed clearly. The functional enrichment analysis of the identified candidate targets (Figure 2A) showed an enrichment of the intracellular membrane compartments (highlighted in red and listed in SI Table S2). Of these, several proteins associated with clathrin coated vesicle (CCV) transport were identified (Figure 2C). CCV transport is known to require cholesterol;[22] however, except for OSBP, the mediators of this effect were unknown, and oxysterols had not been suggested to modulate this process. Trans-Golgi network (TGN) membrane associated proteins were also significantly targeted. As CCVs are known to also form at the TGN, this could suggest an overall link between oxysterols and CCV transport. Unexpectedly, a large proportion of the RNA polymerase III transcription complex was identified as putative oxysterol targets (Figure 2C).
In particular, constituents of the super elongation complex (SEC) were highly enriched, with 4β-HC targeting several of the components (vide infra). Perhaps unsurprisingly, among the Reactome pathways enriched in the STRING analysis, the metabolism of lipids was identified (Figure 2A, highlighted in blue, and SI Table S3). This was due to the presence of known sterol biosynthetic and metabolic proteins, but also to a large number of lipid kinases of different classes. In particular, phosphatidylinositol kinases (PIKs), which were targeted by multiple oxysterols, contributed to the enrichment of phosphatidylinositol metabolism in both Reactome and KEGG pathways (SI Table S3 and Figure 2D, respectively) and to the enrichment of phosphatidylinositol (phosphate) kinase activity as the most significant molecular function in the GO analysis (Figure 2E). Proteins that regulate the mechanistic target of rapamycin complex 1 (mTORC1), either directly or indirectly, were also abundant, confirming its essential role in regulating lipid metabolism.

Figure 2. Analysis of all the putative targets for the tested oxysterols 4β-HC, 25-HC, CT and 7-KC. A) STRING functional analysis with proteins from intracellular membrane-bounded organelle highlighted in red (GO:0043231; FDR: 5.20e-06), and proteins involved in the metabolism of lipids in blue (HSA-556833; FDR: 0.027); B) Venn diagram of the putative targets for each oxysterol and overlap among common targets; C) GO Cellular Components enriched from the analysis of all the putative targets; D) KEGG Pathway analysis and target contribution for each pathway.
Pathways are colored according to their significance, from orange to white, to indicate p-values from 0.002 to 0.1; E) GO Molecular Functions enriched from the analysis of all the putative targets.

Proteome-wide profiling of 7-KC

We focused our initial analysis of specific oxysterol target proteins on 7-KC, as it is the most prominent and toxic of the non-enzymatically produced oxysterols. Several known and novel 7-KC targets with significant ΔTm (Figure 3A and 3B) were identified. For example, squalene monooxygenase (SQLE) is a key cholesterol biosynthetic enzyme.[5] 7-KC has previously been shown to lead to SQLE degradation, analogously to cholesterol and other sterols oxidized at the 7-position. The identification of a known 7-KC target protein further confirms that TPP is a suitable approach for oxysterol target protein identification. Importantly, destabilized proteins were also considered as putative targets (SI Table S4). Among them, BRISC and BRCA1-A complex member 2 (BRE) is known to be involved in the defective synthesis of steroid hormones and the accumulation of large quantities of cholesterol under stress or under the influence of steroid hormones.[23][24] Among other stabilized proteins, several are involved in PI metabolism, including PIP5K1A, which is the main source of cellular PI4,5P2. Nuclear receptor-binding factor 2 (NRBF2) is known to modulate PI3K-III activity by stabilization of the VPS34 complex I, a key autophagy-related kinase.[25]

Figure 3. TPP analysis of 7-KC. A) Melting temperature shifts of the entire HeLa proteome. Significant shifts lie outside the two-standard-deviation interval marked with dotted lines.
B) STRING functional analysis of the putative targets selected from the TPP screening assay. C) Melting curves of squalene monooxygenase (SQLE), E3 ubiquitin-protein ligase RNF167 (RNF167), vacuolar protein sorting-associated protein 51 homolog (VPS51) and the Ragulator complex protein LAMTOR4. Data are mean ± SEM of three independent experiments.

The two targets most stabilized by 7-KC were associated with lysosomal functions (SI Table S4). E3 ubiquitin-protein ligase RNF167 and Ragulator complex protein LAMTOR4 are both localized in lysosomes, where they perform ubiquitin protein ligase activity and regulation of TOR signaling activity, respectively (see Figure 3C for associated melting curves). RNF167 has been found to regulate AMPA receptor-mediated synaptic transmission,[26] while the LAMTOR complex regulates mTOR signaling and thus cellular lipid metabolism more generally. The modulation of these targets may begin to explain the phenotypic effects elicited by 7-KC.[10] Accumulation of 7-KC in the lysosomes is thought to alter pH maintenance, reducing their ability to hydrolyze and process cellular debris, as has already been described for lysosomal accumulation of cholesterol.[27]

The presence of VPS51 among the most stabilized proteins was intriguing, as it was one of only two proteins (the other being AAR2 splicing factor homolog) identified as putative new targets for more than one oxysterol (7-KC and 4β-HC). VPS51 is part of the Golgi-associated retrograde protein (GARP) complex, which is known to regulate cholesterol transport between early and late endosomes and the trans-Golgi network (TGN) via lysosomal NPC2.[28]
However, a direct interaction of VPS51 with cholesterol, or indeed any sterol, had not been reported. Thus, we selected VPS51 for further validation. The TPP results were initially validated by means of a cellular thermal shift assay (CETSA) with Western blot read-out. For both 7-KC and 4β-HC, we were able to reproduce the stabilization observed in the TPP experiment (Figure 4A and 4B, respectively), although the thermal shift was less pronounced for 4β-HC. To address this discrepancy, we carried out an isothermal dose-response fingerprinting (ITDRF) experiment, which showed that 4β-HC stabilized VPS51 in a dose-dependent manner at 51 °C, confirming their putative interaction (Figure 4C).

Figure 4. Target validation of VPS51. A) CETSA experiments for the validation of VPS51 with 7-KC; B) CETSA experiments for the validation of VPS51 with 4β-HC; C) ITDRF experiment for the validation of VPS51 with 4β-HC, with related dose-response curve. Both reported isoforms of VPS51 are visible. Data are the mean of two independent experiments; representative blots are shown.

Putative targets of 4β-hydroxycholesterol

In addition to VPS51, 4β-HC appeared to target cholesterol transport in other ways (SI Table S5 and Figure S2). Targets were clearly enriched in vacuolar transport processes, including VPS51, VPS4A and PIK3R4 (VPS15) (Figure S2, green). VPS15 is a component of the class III PI3K complex, which is a key component of autophagy initiation, strengthening the link observed with other oxysterols.
VPS4A has been extensively associated with cholesterol transport, in a function not directly governed by its role in disassembling the endosomal sorting complex required for transport (ESCRT-III) polymer.[29][30] The stabilization of the translation initiation factors EIF3A and EIF2B may be connected to the more general targeting of other mTOR regulators, including LAMTOR3 and LAMTOR4, by oxysterols, since it has been shown that the mTOR complex mediates assembly of the translation preinitiation complex (PIC), modulating the function of EIF3 in the translation of mRNAs encoding proteins.[31] Very recently, 4β-HC has been shown to act as a pro-lipogenic factor by enhancing Sterol Regulatory Element Binding Protein 1c (SREBP1c) expression in an LXR-dependent manner.[32] In this context, we found that 4β-HC (de)stabilized a series of transcriptional regulators, including the general transcription factor 3C (GTF3C) and two components of the super elongation complex, cyclin-dependent kinase 9 (CDK9) and AF4/FMR2 family member 4 (AFF4). This raises the possibility that transcriptional elongation of SREBP1c may require 4β-HC’s ability to interact with the SEC.

Putative targets of 25-hydroxycholesterol

Putative targets of 25-HC were strongly enriched in PI metabolism (SI Figure S3). Stabilization of the phosphatidylinositol 4-phosphate 3-kinase C2 domain-containing subunit alpha (PIK3C2A) and destabilization of the related subunit beta (PIK3C2B) allowed the identification of two of the three isoforms of the class II PI3Ks. These are known to play key roles in clathrin-mediated endocytosis.[33] The ability of 25-HC to modulate PIK3C2A was tested in an enzymatic kinase profiling assay; however, no change in kinase activity was observed (SI Table S8). This does not necessarily invalidate the target, as binding in an allosteric pocket may modulate protein-protein interactions rather than enzymatic activity.
Similarly, all other putative kinase targets of oxysterols (CDK9, PHKG2 and PIP5K1A) were tested in the related assays without showing a significant increase or decrease in enzymatic activity (SI Tables S9-S11). Unsurprisingly, 25-HC also stabilized regulators of cholesterol biosynthesis and metabolism, including 7-dehydrocholesterol reductase (DHCR7). DHCR7 catalyzes the last step in the biosynthesis of cholesterol and, when mutated, has been associated with the developmental disease Smith-Lemli-Opitz syndrome.[34] Binding to oxysterols such as 7-KC is known to induce its proteasomal degradation; however, interestingly, this effect was not reported for 25-HC.[35] The stabilization of Host cell factor 1 (HCFC1) by 25-HC could be linked to the regulation of an intragenic region in the HCFC1 gene by Sterol regulatory element-binding protein 1 (SREBP1),[36] which is itself regulated by cholesterol and 25-HC.[37] Finally, 25-HC targeted the Ragulator complex protein LAMTOR3 and the Transmembrane 9 superfamily member 1 (ENSG00000254692), a protein involved in autophagy,[38] which is known to be regulated by cholesterol metabolism.[39]

Putative targets of cholestane triol

CT targets were significantly enriched in autophagic and Golgi-associated proteins (SI Figure S4). The most stabilized protein was Paxillin (PXN), an autophagy substrate that interacts with LC3 during focal adhesion (FA) disassembly in highly metastatic tumor cells.[40] FA turnover is reportedly influenced by ORP3-mediated lipid exchange,[41] which may explain the association of oxysterols with PXN.
A significantly destabilized target was the microtubule-associated protein 1S (MAP1S), whose deficiency causes impaired autophagic degradation of lipid droplets, which then accumulate in normal renal epithelial cells, initiating the development of renal cell carcinomas.[42] Golgi phosphoprotein 3-like (GOLPH3L) was the most destabilized putative target. Interestingly, this protein is also associated with the AKT/mTOR pathway, since it contributes to the tumorigenesis of hepatocellular carcinoma by increasing cell proliferation through the activation of mTOR signaling via overexpression of mTORC1.[43] Adaptin ear-binding coat-associated protein 2 (NECAP2) promotes fast endocytic recycling of the epidermal growth factor receptor (EGFR) and of the transferrin receptor (TfnR) through the recruitment of AP-1–clathrin machinery to early endosomes. To facilitate receptor recycling, early endosomes receive endocytosed material from clathrin-dependent and -independent pathways and sort cargo for recycling to the cell surface, retrograde transport to the Golgi or degradation in lysosomes.[44] NECAP2 sits at a node in the overall oxysterol target interaction map, and would thus be an intriguing target for further study.

Conclusion

In summary, we have carried out the first systematic exploration of oxysterol target proteins using thermal proteome profiling as the enabling technology. TPP proved convenient for screening small compound sets such as the four oxysterols we selected, as it does not require compound modification or functionalization. Furthermore, previously identified sterol-binding proteins were re-identified here, validating the approach.[18] Strikingly, our results demonstrate that oxysterols, which differ from cholesterol by the addition of just one or two oxygen atoms, display distinct target profiles, with only two proteins identified as targets of more than one oxysterol.
To the best of our knowledge, this has never been conclusively shown or systematically studied. Although virtually no overlap between the oxysterol targets was present, targets were enriched in lipid metabolism, mTOR signaling, vesicle trafficking and transcriptional regulation. The intracellular membrane localization of most target proteins is also consistent with the lipophilic nature of the compounds and their reported membrane association. Of the two proteins identified as putative targets of two oxysterols, VPS51 was further validated using CETSA and ITDRF experiments. Although its role in mediating cholesterol transport by targeting NPC2 to the lysosomes as part of the GARP complex is known, our data raise the intriguing possibility that this event is regulated by (oxy)sterols themselves. The specific target profiles of the individual oxysterols studied may also begin to explain the phenotypes they induce. In particular, 7-KC has previously been shown to affect lysosomal integrity and activity. The fact that several of the putative targets identified are lysosomal membrane proteins may begin to offer an explanation for this observed effect. Importantly, future work will be necessary to determine whether target (de)stabilization by oxysterols occurs through direct binding or is mediated by a complex. To conclude, TPP is a robust technology to identify new oxysterol target proteins, and the data provided herein constitute an extensive resource as well as a wealth of testable hypotheses linking oxysterols to lipid metabolism and transport, vesicle trafficking and transcription.
Despite the exciting developments achievable with this technique, it is important to note that, like all target identification methods, TPP also has its caveats. False negatives are more common with this technique, as target proteins may not be (de)stabilized by the small molecules they interact with, or very high compound concentrations may be required to observe a meaningful effect. This was particularly apparent for known 25-HC protein targets including OSBP, NPC1 and certain STARDs, which were not identified as putative targets although they are present in our MS data and more generally in meltome analyses.[45] OSBP, STARD2, NPC1 and NPC2 proteins were identified in the HeLa cell proteome, but their thermal shift was not considered significant according to the chosen criteria or was not determined in all three replicates. In this regard, the arbitrary exclusion of proteins with shifts lower than two standard deviations from the median might particularly affect the recognition of protein targets of compounds whose meltome is more generally altered relative to the DMSO control. While the use of NP-40 facilitates recovery of membrane proteins, it has recently been shown that different detergent types and concentrations can affect which proteins are recovered in the final analysis, introducing a slight bias.[13] Despite this, we believe that TPP and its variants, including ITDRF, will be applied increasingly for (off-)target identification and validation.

Acknowledgements

We would like to thank Assoc. Prof. Erwin Schoof from DTU Proteomics Core for excellent advice and support and Prof. Ulrich auf dem Keller for access to cell culture and reagents at DTU Bioengineering. We would also like to thank Dr. Petra Janning and Malte Metz for invaluable advice regarding the data analysis. We would also like to acknowledge the Novo Nordisk Foundation (NNF17OC0028366) and DTU for funding.

References

[1] X. Ding, W. Zhang, S. Li, H. Yang, Am. J. Cancer Res. 2019, 9, 219–227.
[2] S. MacMahon, S. Duffy, A. Rodgers, S. Tominaga, L. Chambless, G. De Backer, D. De Bacquer, M. Kornitzer, P. Whincup, S. G. Wannamethee, R. Morris, N. Wald, J. Morris, M. Law, M. Knuiman, H. Bartholomew, G. Davey Smith, P. Sweetnam, P. Elwood, J. Yarnell, R. Kronmal, D. Kromhout, S. Sutherland, J. Keil, G. Jensen, P. Schnohr, C. Hames, A. Tyroler, A. Aromaa, P. Knekt, A. Reunanen, J. Tuomilehto, P. Jousilahti, E. Vartiainen, P. Puska, T. Kuznetsova, T. Richart, J. Staessen, L. Thijs, T. Jorgensen, T. Thomsen, D. Sharp, J. D. Curb, N. Qizilbash, H. Iso, S. Sato, A. Kitamura, Y. Naito, A. Benetos, L. Guize, U. Goldbourt, M. Tomita, Y. Nishimoto, T. Murayama, M. Criqui, C. Davis, C. Hart, D. Hole, C. Gillis, D. Jacobs, H. Blackburn, R. Luepker, J. Neaton, L. Eberly, C. Cox, D. Levy, R. D’Agostino, H. Silbershatz, A. Tverdal, R. Selmer, T. Meade, K. Garrow, J. Cooper, F. Speizer, M. Stampfer, A. Menotti, A. Spagnolo, I. Tsuji, Y. Imai, T. Ohkubo, S. Hisamichi, L. Haheim, I. Holme, I. Hjermann, P. Leren, P. Ducimetiere, J. Empana, K. Jamrozik, R. Broadhurst, G. Assmann, H. Schulte, C. Bengtsson, C. Björkelund, L. Lissner, P. Sorlie, M. Garcia- Palmieri, E. Barrett-Connor, R. Langer, K. Nakachi, K. Imai, X. Fang, S. Li, R. Buzina, A. Nissinen, C. Aravanis, A. Dontas, A. Kafatos, H. Adachi, H. Toshima, T. Imaizumi, S. Nedeljkovic, M. Ostojic, Z. Chen, H. Tunstall-Pedoe, T. Nakayama, N. Yoshiike, T. Yokoyama, C. Date, H. Tanaka, J. Keller, K. Bonaa, E. Arnesen, E. Rimm, M. Gaziano, J. E. Buring, C. Hennekens, S. Törnberg, J. Carstensen, M. Shipley, D. Leon, M. Marmot, J. Armitage, C. Baigent, R. Clarke, R. Collins, J. Emberson, J. Halsey, M. Landray, S. Lewington, A. Palmer, S. Parish, R. Peto, P. Sherliker, G. Whitlock, Lancet 2007, 370, 1829–1839. [3] J. E. Vance, Dis. Model. Mech. 2012, 5, 746–755. [4] V. Mutemberezi, O. Guillemot-Legris, G. G. Muccioli, Prog. Lipid Res. 2016, 64, 152–169. [5] S. Gill, J. Stevenson, I. Kristiana, A. J. Brown, Cell Metab. 
2011, 13, 260–273. [6] A. A. Bielska, P. Schlesinger, D. F. Covey, D. S. Ory, Trends Endocrinol. Metab. 2012, 23, 99–106. [7] A. Kloudova, F. P. Guengerich, P. Soucek, Trends Endocrinol. Metab. 2017, 28, 485–496. [8] A. J. Brown, W. Jessup, Atherosclerosis 1999, 142, 1–28. [9] Q. Zhou, E. Wasowicz, B. Handler, L. Fleischer, F. A. Kummerow, Atherosclerosis 2000, 149, 191–197. [10] A. Anderson, A. Campo, E. Fulton, A. Corwin, W. G. Jerome 3rd, M. S. O’Connor, Redox Biol. 2020, 29, 101380. [11] L. Dai, N. Prabhu, L. Y. Yu, S. Bacanu, A. D. Ramos, P. Nordlund, Annu. Rev. Biochem. 2019, 88, 383–408. [12] T. Friman, Bioorg. Med. Chem. 2020, 28, 115174. [13] A. Kawatkar, M. Schefter, N.-O. Hermansson, A. Snijder, N. Dekker, D. G. Brown, T. Lundbäck, A. X. Zhang, M. P. Castaldi, ACS Chem. Biol. 2019, 14, 1913–1920. [14] L. Dai, T. Zhao, X. Bisteau, W. Sun, N. Prabhu, Y. T. Lim, R. M. Sobota, P. Kaldis, P. Nordlund, Cell 2018, 173, 1481-1494.e13. [15] I. Becher, A. Andrés-Pons, N. Romanov, F. Stein, M. Schramm, F. Baudin, D. Helm, N. Kurzawa, A. Mateus, M.-T. Mackmull, A. Typas, C. W. Müller, P. Bork, M. Beck, M. M. Savitski, Cell 2018, 173, 1495-1507.e18. [16] K. A. Ball, K. J. Webb, S. J. Coleman, K. A. Cozzolino, J. Jacobsen, K. R. Jones, M. H. B. Stowell, W. M. Old, Commun. Biol. 2020, 3, 75. [17] S. A. Peck Justice, M. P. Barron, G. D. Qi, H. R. S. Wijeratne, J. F. Victorino, E. R. Simpson, J. Z. Vilseck, A. B. Wijeratne, A. L. Mosley, J. Biol. Chem. 2020, jbc.RA120.014576. [18] J. J. Hulce, A. B. Cognetta, M. J. Niphakis, S. E. Tully, B. F. Cravatt, Nat. Methods 2013, 10, 259–264. [19] R. E. Infante, L. Abi-Mosleh, A. Radhakrishnan, J. D. Dale, M. S. Brown, J. L.
Goldstein, J. Biol. Chem. 2008, 283, 1052–1063. [20] R. E. Infante, A. Radhakrishnan, L. Abi-Mosleh, L. N. Kinch, M. L. Wang, N. V Grishin, J. L. Goldstein, M. S. Brown, J. Biol. Chem. 2008, 283, 1064—1075. [21] L. Laraia, A. Friese, D. P. Corkery, G. Konstantinidis, N. Erwin, W. Hofer, H. Karatas, L. Klewer, A. Brockmeyer, M. Metz, B. Schölermann, M. Dwivedi, L. Li, P. Rios-Munoz, M. Köhn, R. Winter, I. R. Vetter, S. Ziegler, P. Janning, Y.-W. Wu, H. Waldmann, Nat. Chem. Biol. 2019, 15, 710–720. [22] S. K. Rodal, G. Skretting, Ø. Garred, F. Vilhardt, B. van Deurs, K. Sandvig, Mol. Biol. Cell 1999, 10, 961–974. [23] J. Miao, N. S. Panesar, K.-T. Chan, F. M. M. Lai, N. Xia, Y. Wang, P. J. Johnson, J. Y. H. Chan, J. Histochem. Cytochem. 2001, 49, 491–499. [24] J. Miao, K. W. Chan, G. G. Chen, S. Y. Chun, N. S. Xia, J. Y. H. Chan, N. S. Panesar, J. Endocrinol. 2005, 185, 507–517. [25] J. Lu, L. He, C. Behrends, M. Araki, K. Araki, Q. Jun Wang, J. M. Catanzaro, S. L. Friedman, W.-X. Zong, M. I. Fiel, M. Li, Z. Yue, Nat. Commun. 2014, 5, 3920. [26] M. P. Lussier, B. E. Herring, Y. Nasu-Nishimura, A. Neutzner, M. Karbowski, R. J. Youle, R. A. Nicoll, K. W. Roche, Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 19426–19431. [27] W. G. Jerome, B. E. Cox, E. E. Griffin, J. C. Ullery, Microsc. Microanal. 2008, 14, 138–149. [28] J. Wei, Y.-Y. Zhang, J. Luo, J.-Q. Wang, Y.-X. Zhou, H.-H. Miao, X.-J. Shi, Y.-X. Qu, J. Xu, B.-L. Li, B.-L. Song, Cell Rep. 2017, 19, 2823–2835. [29] X. Du, A. S. Kazim, I. W. Dawes, A. J. Brown, H. Yang, Traffic 2013, 14, 107–119. [30] N. Bishop, P. Woodman, Mol. Biol. Cell 2000, 11, 227–239. [31] R. Marchione, S. A. Leibovitch, J.-L. Lenormand, Cell. Mol. Life Sci. 2013, 70, 3603–3616. [32] O. Moldavski, P.-J. H. Zushin, C. A. Berdan, R. J. Van Eijkeren, X. Jiang, M. Qian, D. S. Ory, D. F. Covey, D. K. Nomura, A. Stahl, E. J. Weiss, R. Zoncu, bioRxiv 2020, 2020.08.20.256487. [33] Y. Posor, M. Eichhorn-Gruenig, D. Puchkov, J. Schöneberg, A. Ullrich, A. 
Lampe, R. Müller, S. Zarbakhsh, F. Gulluni, E. Hirsch, M. Krauss, C. Schultz, J. Schmoranzer, F. Noé, V. Haucke, Nature 2013, 499, 233–237. [34] B. U. Fitzky, M. Witsch-Baumgartner, M. Erdel, J. N. Lee, Y.-K. Paik, H. Glossmann, G. Utermann, F. F. Moebius, Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 8181–8186. [35] A. V Prabhu, W. Luu, L. J. Sharpe, A. J. Brown, J. Biol. Chem. 2016, 291, 8363–8373. [36] M. Motallebipour, S. Enroth, T. Punga, A. Ameur, C. Koch, I. Dunham, J. Komorowski, J. Ericsson, C. Wadelius, FEBS J. 2009, 276, 1878–1890. [37] C. M. Adams, J. Reitz, J. K. De Brabander, J. D. Feramisco, L. Li, M. S. Brown, J. L. Goldstein, J. Biol. Chem. 2004, 279, 52772–52780. [38] P. He, Z. Peng, Y. Luo, L. Wang, P. Yu, W. Deng, Y. An, T. Shi, D. Ma, Autophagy 2009, 5, 52–60. [39] E. Piscianz, L. Vecchi Brumatti, A. Tommasini, A. Marcuzzi, Neural Regen. Res. 2019, 14, 582–587. [40] M. N. Sharifi, E. E. Mowers, L. E. Drake, C. Collier, H. Chen, M. Zamora, S. Mui, K. F. Macleod, Cell Rep. 2016, 15, 1660–1672. [41] R. S. D’souza, J. Y. Lim, A. Turgut, K. Servage, J. Zhang, K. Orth, N. G. Sosale, M. J. Lazzara, J. Allegood, J. E. Casanova, Elife 2020, 9, DOI 10.7554/eLife.54113. [42] G. Xu, Y. Jiang, Y. Xiao, X. D. Liu, F. Yue, W. Li, X. Li, Y. He, X. Jiang, H. Huang, Q. Chen, E. Jonasch, L. Liu, Oncotarget 2016, 7, 6255–6265. [43] H. Liu, X. Wang, B. Feng, L. Tang, W. Li, X. Zheng, Y. Liu, Y. Peng, G. Zheng, Q. He, BMC Cancer 2018, 18, 661. [44] J. P. Chamberland, L. T. Antonow, M. Dias Santos, B. Ritter, J. Cell Sci. 2016, 129, 2625–2637. [45] A. Jarzab, N. Kurzawa, T. Hopf, M. Moerch, J. Zecha, N. Leijten, Y. Bian, E. Musiol, M. Maschberger, G. Stoehr, I.
Becher, C. Daly, P. Samaras, J. Mergner, B. Spanier, A. Angelov, T. Werner, M. Bantscheff, M. Wilhelm, M. Klingenspor, S. Lemeer, W. Liebl, H. Hahne, M. M. Savitski, B. Kuster, Nat. Methods 2020, 17, 495–503.

10_1101-2021_01_05_425448 ----

Distribution and diversity of dimetal-carboxylate halogenases in cyanobacteria

Nadia Eusebio1, Adriana Rego1, Nathaniel R. Glasser2, Raquel Castelo-Branco1, Emily P. Balskus2* and Pedro N. Leão1*

1Interdisciplinary Centre of Marine and Environmental Research (CIIMAR/CIMAR), University of Porto, Matosinhos, Portugal
2Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA

*Corresponding authors, E-mail: pleao@ciimar.up.pt, balskus@chemistry.harvard.edu

Keywords: halogenases, cyanobacteria, natural products, biocatalysis

Repositories: The draft genomes generated in this study are available in GenBank under BioProject SUB8150995.

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.05.425448; this version posted January 5, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

Halogenation is a recurring feature in natural products, especially those from marine organisms.
The selectivity with which halogenating enzymes act on their substrates renders halogenases interesting targets for biocatalyst development. Recently, CylC – the first predicted dimetal-carboxylate halogenase to be characterized – was shown to regio- and stereoselectively install a chlorine atom onto an unactivated carbon center during cylindrocyclophane biosynthesis. Homologs of CylC are also found in other characterized cyanobacterial secondary metabolite biosynthetic gene clusters. Due to its novelty in biological catalysis, selectivity and ability to perform C-H activation, this halogenase class is of considerable fundamental and applied interest. However, little is known regarding the diversity and distribution of these enzymes in bacteria. In this study, we used both genome mining and PCR-based screening to explore the genetic diversity and distribution of CylC homologs. While we found non-cyanobacterial homologs of these enzymes to be rare, we identified a large number of genes encoding CylC-like enzymes in publicly available cyanobacterial genomes and in our in-house culture collection of cyanobacteria. Genes encoding CylC homologs are widely distributed throughout the cyanobacterial tree of life, within biosynthetic gene clusters of distinct architectures. Their genomic contexts feature a variety of biosynthetic partners, including fatty-acid activation enzymes, type I or type III polyketide synthases, dialkylresorcinol-generating enzymes, monooxygenases or Rieske proteins. Our study also reveals that dimetal-carboxylate halogenases are among the most abundant types of halogenating enzymes in the phylum Cyanobacteria. This work will help to guide the search for new halogenating biocatalysts and natural product scaffolds.

Data statement: All supporting data and methods have been provided within the article or through a Supplementary Material file, which includes 14 supplementary figures and 4 supplementary tables.

Introduction

Nature is a rich source of new compounds that fuel innovation in the pharmaceutical and agriculture sectors [1]. The remarkable diversity of natural products (NPs) results from a similarly diverse pool of biosynthetic enzymes [2]. These are often highly selective and efficient, carrying out demanding reactions in aqueous media, and are therefore interesting starting points for the development of industrially relevant biocatalysts [2]. Faster and more accessible DNA sequencing technologies have enabled, in the past decade, a large number of genomics and metagenomics projects focused on the microbial world [3]. The resulting sequence data hold immense opportunities for the discovery of new microbial enzymes and their associated NPs [4].

Halogenation is a widely used and well-established reaction in synthetic and industrial chemistry [5], which can have significant consequences for the bioactivity, bioavailability and metabolic activity of a compound [5-7]. Halogenating biocatalysts are thus highly desirable for biotechnological purposes [6, 8]. The mechanistic aspects of biological halogenation can also inspire the development of organometallic catalysts [9].
Nature has evolved multiple strategies to incorporate halogen atoms into small molecules [6], as illustrated by the structural diversity of thousands of currently known halogenated NPs, which include drugs and agrochemicals [10, 11]. Until the early 1990s, haloperoxidases were the only known halogenating enzymes. Research on the biosynthesis of halogenated metabolites eventually revealed a more diverse range of halogenases with different mechanisms. Currently, biological halogenation is known to proceed by distinct electrophilic, nucleophilic or radical mechanisms [6]. Electrophilic halogenation is characteristic of the flavin-dependent halogenases and the heme- and vanadium-dependent haloperoxidases, which catalyze the installation of C-I, C-Br or C-Cl bonds onto electron-rich substrates. Two families of nucleophilic halogenases are known, the halide methyltransferases and SAM halogenases. Both utilize S-adenosylmethionine (SAM) as an electrophilic co-factor or as a co-substrate and halide anions as nucleophiles. Notably, these are the only halogenases capable of generating C-F bonds. Finally, radical halogenation has only been described for nonheme iron/2-oxoglutarate (2OG)-dependent enzymes. This type of halogenation allows the selective insertion of a halogen into a non-activated, aliphatic C-H bond. A recent review by Agarwal et al. (2017) thoroughly covers the topic of enzymatic halogenation.
Cyanobacteria are a rich source of halogenases among bacteria, in particular of nonheme iron/2OG-dependent and flavin-dependent halogenases (Fig. 1). AmbO5 and WelO5 are cyanobacterial enzymes that belong to the nonheme iron/2OG-dependent halogenase family [12-14]. AmbO5 is an aliphatic halogenase capable of site-selectively modifying ambiguine, fischerindole and hapalindole alkaloids [12, 13]. WelO5, a close homolog (79% sequence identity), is capable of performing analogous halogenations in hapalindole-type alkaloids and is involved in the biosynthesis of welwitindolinone [13, 15]. BarB1 and BarB2 are also nonheme iron/2OG-dependent halogenases; they catalyze trichlorination of a methyl group of a leucine substrate attached to the peptidyl carrier protein BarA in the biosynthesis of barbamide [16-18]. Other halogenases from this enzyme family include JamE, CurA, and HctB. JamE and CurA catalyze halogenations in intermediate steps of the biosynthesis of jamaicamide and curacin A, respectively [19, 20], while HctB is a fatty acid halogenase responsible for chlorination in hectochlorin assembly [21]. ApdC and McnD are FAD-dependent halogenases responsible for the modification of cyanopeptolin-type peptides (also known as (3S)-amino-(6R)-hydroxy piperidone (Ahp)-cyclodepsipeptides). These enzymes halogenate, respectively, anabaenopeptilides in Anabaena and micropeptins in Microcystis strains [22-25]. AerJ is another example of a FAD-dependent halogenase, which acts during aeruginosin biosynthesis in Planktothrix and Microcystis strains [24].
Recent efforts to characterize the biosynthesis of structurally unusual cyanobacterial natural products have uncovered a distinct class of halogenating enzymes. Using a genome mining approach, Nakamura et al. (2012) discovered the cylindrocyclophane biosynthetic gene cluster (BGC) in the cyanobacterium Cylindrospermum licheniforme ATCC 29412 [26].
The paracyclophane natural products were found to be assembled from two chlorinated alkylresorcinol units [27]. The paracyclophane macrocycle is created by forming two C-C bonds in a Friedel–Crafts-like alkylation reaction catalyzed by the enzyme CylK [27] (Fig. 1). Therefore, although many cylindrocyclophanes are not halogenated, their biosynthesis involves a halogenated intermediate [26, 27], a process termed cryptic halogenation [28]. Nakamura et al. (2017) showed that the CylC enzyme was responsible for regio- and stereoselectively installing a chlorine atom onto the fatty acid-derived sp³ carbon center of a biosynthetic intermediate that is subsequently elaborated to the key alkylresorcinol monomer (Fig. 1). To date, CylC is the only characterized dimetal-carboxylate halogenase (this classification is based on both biochemical evidence and similarity to other diiron-carboxylate proteins) [27]. Homologs of CylC have been found in the BGCs of the columbamides [29], bartolosides [30], microginin [27], puwainaphycins/minutissamides [31], and chlorosphaerolactylates [32], all of which produce halogenated metabolites. CylC-type enzymes bear low sequence homology to dimetal desaturases and N-oxygenases [27], functionalize C-H bonds in aliphatic moieties at either terminal or mid-chain positions, and are likely able to carry out gem-dichlorination (Kleigrewe 2015, Leão 2015).
The reactivity displayed by CylC and its homologs is of interest for biocatalysis, in particular because this type of carbon center activation is often inaccessible to organic synthesis [15, 33]. An understanding of the molecular basis for the halogenation of different positions and for chain-length preference will also be of value for biocatalytic applications. Hence, accessing novel variants of CylC enzymes will facilitate the functional characterization of this class of halogenases, mechanistic studies, and biocatalyst development.
Here, we provide an in-depth analysis of the diversity, distribution and context of CylC homologs in microbial genomes. Using both publicly available genomes and our in-house culture collection of cyanobacteria (LEGEcc), we report that CylC enzymes are common in cyanobacterial genomes, found in numbers comparable to those of flavin-dependent or nonheme iron/2OG-dependent halogenases. We additionally show that CylC homologs are distributed throughout the cyanobacterial phylogeny and are, to a great extent, part of cryptic BGCs with diverse architectures, underlining the potential for NP discovery associated with this new halogenase class.

Figure 1. Selected examples of halogenation reactions catalyzed by different classes of microbial enzymes, with a focus on cyanobacterial halogenases. An asterisk denotes that the enzyme has been biochemically characterized. ACP – acyl carrier protein.
[Figure 1 image: chemical structures not reproduced. Enzymes shown, by class. Flavin-dependent halogenases: Bmp5* (Marinomonas mediterranea MMB-1), PrnA* (Pseudomonas fluorescens BL915), McnD (Microcystis cf. wesenbergii NIVA-CYA 172/5). Nonheme iron/2OG-dependent halogenases: CurA* (Moorea producens 3L), WelO5* (Hapalosiphon welwitschii UTEX B1830). Dimetal-carboxylate halogenases: CylC* (Cylindrospermum licheniforme ATCC 29412), BrtJ (Synechocystis salina LEGE 06099; unknown substrate; bartoloside I shown), ColD/ColE (Moorea bouillonii PNG), ClyC/ClyD (Sphaerospermopsis sp. LEGE 00249).]

Methods

Sequence similarity networks and Genomic Neighborhood Diagrams
Sequence similarity networks (SSNs) were generated using the EFI-EST server, with a "Sequence BLAST" of CylC (AFV96137) as input [34] and negative log e-values of 2 and 40 for UniProt BLAST retrieval and SSN edge calculation, respectively. This SSN edge calculation cutoff was found to segregate the homologs into different SSN clusters; less stringent cutoff values resulted in a single SSN cluster. The 153 retrieved sequences and the query sequence were then used to generate the SSNs with an alignment score threshold of 42 and a minimum length of 90. The networks were visualized in Cytoscape (v3.80). The full SSN obtained in the previous step was used to generate Genomic Neighborhood Diagrams (GNDs) using the EFI-GNT tool [34].
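The effect of the edge cutoff on SSN clustering described above can be illustrated with a minimal sketch. It assumes that every pair of sequences has a similarity score and that an edge is kept only when the score reaches the threshold; clusters are then the connected components of the remaining graph. The sequence names and scores are hypothetical, and the function `ssn_clusters` is an illustration only, not part of EFI-EST.

```python
from collections import defaultdict

def ssn_clusters(pair_scores, threshold):
    """Return clusters (connected components) after dropping edges
    whose similarity score is below the threshold."""
    parent = {}

    def find(x):
        # Union-find lookup; also registers nodes seen for the first time.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    # Largest cluster first, then alphabetical, for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))

# Hypothetical pairwise scores (not taken from the study).
pair_scores = {
    ("seqA", "seqB"): 85.0,
    ("seqB", "seqC"): 47.0,
    ("seqC", "seqD"): 12.0,
    ("seqD", "seqE"): 60.0,
}

strict = ssn_clusters(pair_scores, threshold=42)  # stringent cutoff
loose = ssn_clusters(pair_scores, threshold=10)   # permissive cutoff
print(strict)
print(loose)
```

With the stringent cutoff the weak seqC/seqD edge is dropped and the toy network splits into two clusters, whereas the permissive cutoff leaves a single cluster. This mirrors the behavior reported in the text for less stringent cutoff values.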
A Neighborhood Size of 10 was used and the Lower Limit for Co-occurrence was 20%. The resulting GNDs were visualized in Cytoscape (Fig. 2).

Cyanobacterial strains and growth conditions
Freshwater and marine cyanobacterial strains from the Blue Biotechnology and Ecotoxicology Culture Collection (LEGEcc) (CIIMAR, University of Porto) were grown in 50 mL Z8 medium [35] or 50 mL Z8 medium with 25‰ sea salts (Tropic Marine) and vitamin B12, with orbital shaking (~200 rpm) under a regimen of 16 h light (25 μmol photons m⁻² s⁻¹)/8 h dark at 25 °C.

Genomic DNA extraction
Fifty milliliters of each cyanobacterial culture were centrifuged at 7000 ×g for 10 min. The cell pellets were used for genomic DNA (gDNA) extraction using the PureLink® Genomic DNA Mini Kit (Thermo Fisher Scientific®) or the NZY Plant/Fungi gDNA Isolation kit (NZYTech), according to the manufacturer's instructions.

Primer design
Basic local alignment search tool (BLAST) searches using CylC [Cylindrospermum licheniforme UTEX B 2014] as query identified related genes (for tBLASTn: 31-93% amino acid identity). We discarded nucleotide hits with a length <210 and e-values <1×10⁻¹⁰. The complete sequences (56 cylC homolog sequences, Table S1) were collected from NCBI and aligned using MUltiple Sequence Comparison by Log-Expectation (MUSCLE) [36]. Phylogenetic analysis of the hits was performed using FastTree GTR with a rate of 100.
Streptomyces thioluteus aurF, encoding a distant dimetal-carboxylate protein [27], was used as an outgroup (AJ575648.1:4858-5868). We divided the phylogeny of cylC homologs into five groups with moderate similarity (Fig. S1). The regions of higher similarity within each group were selected for degenerate primer design (Table 1).

Table 1. Degenerate primers

Code  Sequence                Expected amplicon size (bp)  Tm (°C)
AF    CAAAAAATHGCDCTYAAYC     788-986                      55
AR    TGDAADCCTTCRTGTTC
BF    CACAAAAAHTWGCTCTYAAYC   673-715                      57
BR    GTKGTRTGGWARGATTCATC
CF    AATCAWCTTTAYTGGGTRGC    506-509                      55
CR    AARAARTGAAARCTYTCRTC
DF    AATCAAACYAGYGCWGC       299                          51
DR    GTRAAATAYTGACAAGC
XF    ATCWRGAAACCARTSAAGA     449-591                      51
XR    CATCAAAAACTTTYYGTARRC

PCR conditions
The PCRs to detect cylC homologs were conducted in a final volume of 20 µL, containing 6.9 µL of ultrapure water, 4.0 µL of 5× GoTaq Buffer (Promega), 2.0 µL of MgCl2, 1.0 µL of dNTPs, 2.0 µL of reverse and 2.0 µL of forward primer (each at 10 µM), 0.1 µL of GoTaq and 2.0 µL of cyanobacterial gDNA. PCR thermocycling conditions were: denaturation for 5 min at 95 °C; 35 cycles of denaturation for 1 min at 95 °C, primer annealing for 30 s at a group-specific temperature (55 °C for group A; 57 °C for group B; 55 °C for group C; 51 °C for group D; 51 °C for group X) and extension for 1 min at 72 °C; and a final extension for 10 min at 72 °C.
When not already available, the 16S rRNA gene of a tested strain was amplified by PCR using standard primers (CYA106F 5' CGG ACG GGT GAG TAA CGC GTG A 3' and CYA785R 5' GAC TAC WGG GGT ATC TAA TCC 3'). The PCR reactions were conducted in a final volume of 20 µL, containing 6.9 µL of ultrapure water, 4.0 µL of 5× GoTaq Buffer, 2.0 µL of MgCl2, 1.0 µL of dNTPs, 2.0 µL of reverse primer and 2.0 µL of forward primer (each at 10 µM), 0.1 µL of GoTaq and 2.0 µL of cyanobacterial DNA. PCR thermocycling conditions were: denaturation for 5 min at 95 °C; 35 cycles of denaturation for 1 min at 95 °C, primer annealing for 30 s at 52 °C and extension for 1 min at 72 °C; and a final extension for 10 min at 72 °C. Amplicon sizes were confirmed after separation in a 1.0% agarose gel.

Cloning and sequencing
The cylC homolog and 16S rRNA gene sequences were obtained either directly from the NCBI or through sequencing. To obtain high-quality sequences, TOPO PCR cloning (Invitrogen) was used. The TOPO cloning reaction was conducted in a final volume of 3 µL, containing 1 µL of fresh PCR product, 1 µL of salt solution, 0.5 µL of TOPO vector and 0.5 µL of water. The reaction was incubated for 20 min at room temperature. Three microliters of the TOPO reaction were added to a tube containing chemically competent E. coli (Top10, Life Technologies) cells. After 30 min of incubation on ice, the cells were placed for 30 s at 42 °C without shaking and were then immediately transferred to ice. Room-temperature SOC medium (250 µL) was added to the mixture and the tube was shaken horizontally at 37 °C for 1 h (180 rpm). From each cloning reaction, 60 µL were spread onto LB ampicillin/X-gal plates and incubated overnight at 37 °C.
Two or three positive colonies from each reaction were tested by colony-PCR.
The PCR was conducted in a final volume of 20 µL, containing 10.9 µL of ultrapure water, 4.0 µL of 5× GoTaq Buffer, 2.0 µL of MgCl2, 1.0 µL of dNTPs, 1.0 µL of reverse pUCR and 1.0 µL of forward pUCF primers (each at 20 µM), 0.1 µL of GoTaq and the target colony. PCR thermocycling conditions were: denaturation for 5 min at 95 °C; 35 cycles of denaturation for 1 min at 95 °C, primer annealing for 30 s at 50 °C and extension for 1 min at 72 °C; and a final extension for 10 min at 72 °C. Amplicon sizes were confirmed after separation in a 1.0% agarose gel. Selected colonies were incubated overnight at 37 °C (180 rpm) in 5 mL of LB supplemented with 100 µg mL⁻¹ ampicillin. The plasmids containing the amplified PCR products were extracted (NZYMiniprep kits) and Sanger sequenced using pUC primers.

Cyanobacteria genome sequencing
Many of the LEGEcc strains are non-axenic, so before extraction of gDNA for genome sequencing, the amount of heterotrophic contaminant bacteria in cyanobacterial cultures was evaluated by plating onto agar medium prepared with Z8 or with Z8 with added 2.5% sea salts (Tropic Marine) and vitamin B12 (10 µg/L) (depending on the original environment), supplemented with casamino acids (0.02% wt/vol) and glucose (0.2% wt/vol) [37]. The plates were incubated for 2-4 days at 25 °C in the dark and examined for bacterial growth. Those cultures with minimal contamination were used for DNA extraction for genome sequencing.
The DNA extraction methodology was selected based on the morphological features of each strain. Total genomic DNA was isolated from a fresh or frozen pellet of 50 mL culture using a CTAB-chloroform/isoamyl alcohol-based protocol [38], the commercial PureLink Genomic DNA Mini Kit (Thermo Fisher Scientific®) or the NZY Plant/Fungi gDNA Isolation kit (NZYTech). The latter included a homogenization step (grinding cells using a mortar and pestle with liquid nitrogen) before extraction using the standard kit protocol. The quality of the gDNA was evaluated in a DS-11 FX Spectrophotometer (DeNovix) and by 1% agarose gel electrophoresis before genome sequencing, which was performed elsewhere (Era7, Spain and MicrobesNG, UK) using 2 × 250 bp paired-end libraries and the Illumina platform (except for Synechocystis sp. LEGE 06099, whose genome was sequenced using the Ion Torrent PGM platform). A standard pipeline was carried out, including identification of the closest reference genomes for read mapping using Kraken 2 [39] and a read-quality check using BWA-MEM [40]; de novo assembly was performed using SPAdes [41]. The genomic data obtained for each strain were treated as a metagenome. The resulting contigs were analyzed using the binning tool MaxBin 2.0 [42] and checked manually in order to retain only cyanobacterial contigs. The draft genomes were annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) [43] and submitted to GenBank under the BioProject number SUB8150995. In the case of Hyella patelloides LEGE 07179 and Sphaerospermopsis sp. LEGE 00249, the assemblies had been previously deposited in NCBI under the BioSample numbers SAMEA4964519 and SAMN15758549, respectively.
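As a cross-check of the reaction mixes listed under "PCR conditions" and "Cloning and sequencing", the following sketch sums the component volumes and computes the final primer concentrations. The volumes and stock concentrations are copied from the Methods text; the helper names (`total_volume`, `final_conc`, `scale`) and the idea of scaling to a shared master mix are illustrative assumptions, not a procedure stated by the authors.

```python
# Component volumes in µL, as given in the Methods text.
CYLC_PCR = {
    "ultrapure water": 6.9, "5x GoTaq Buffer": 4.0, "MgCl2": 2.0,
    "dNTPs": 1.0, "reverse primer": 2.0, "forward primer": 2.0,
    "GoTaq": 0.1, "gDNA": 2.0,
}
COLONY_PCR = {  # the picked colony itself adds no measured volume
    "ultrapure water": 10.9, "5x GoTaq Buffer": 4.0, "MgCl2": 2.0,
    "dNTPs": 1.0, "pUCR primer": 1.0, "pUCF primer": 1.0, "GoTaq": 0.1,
}
TOPO = {"PCR product": 1.0, "salt solution": 1.0,
        "TOPO vector": 0.5, "water": 0.5}

def total_volume(mix):
    """Sum of all component volumes, rounded to 0.1 µL."""
    return round(sum(mix.values()), 1)

def final_conc(stock, volume, total):
    """Dilution C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock * volume / total

def scale(mix, n_reactions, per_tube=("gDNA",)):
    """Hypothetical helper: volumes for a shared master mix, leaving out
    components that are added to each tube individually."""
    return {name: round(vol * n_reactions, 1)
            for name, vol in mix.items() if name not in per_tube}

for name, mix in [("cylC/16S PCR", CYLC_PCR),
                  ("colony PCR", COLONY_PCR),
                  ("TOPO reaction", TOPO)]:
    print(name, total_volume(mix), "µL")

# Primers: 2.0 µL of a 10 µM stock vs 1.0 µL of a 20 µM stock, both in 20 µL.
print(final_conc(10, 2.0, 20.0), "µM;", final_conc(20, 1.0, 20.0), "µM")
print(scale(CYLC_PCR, 10))
```

Under these numbers each PCR mix adds up to the stated 20 µL, the TOPO reaction to 3 µL, and both primer schemes give the same final primer concentration of 1 µM.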

Genomic context of CylC homologs
BLASTp searches using CylC [Cylindrospermum licheniforme UTEX B 2014] as query identified related CylC homologs within the publicly available cyanobacterial genomes and in the genomes of LEGEcc strains. We annotated the genomic context of each CylC homolog using antiSMASH v5.0 [44] and manual annotation through BLASTp of selected proteins. Some BGCs were not identified by antiSMASH and were manually annotated using BLASTp searches.

Phylogenetic analysis
Nucleotide sequences of cylC homologs obtained from the NCBI and from genome sequencing in this study were aligned using MUSCLE from within the Geneious R11.0 software package (Biomatters). The nucleotide sequence of the distantly related dimetal-carboxylate protein AurF [27] from Streptomyces thioluteus (AJ575648.1:4858-5868) was used as an outgroup. The alignments, trimmed to their core 788, 673, 506, 299 and 499 positions (for groups A, B, C, D and X, respectively), were used for phylogenetic analysis, which was performed using FastTree 2 (from within Geneious) with a GTR substitution model (from jmodeltest [45]) and a rate of 100 (Fig. S2).
For the phylogenetic analysis based on the 16S rRNA gene (Fig. 3, Fig. S3), the corresponding nucleotide sequences were retrieved from the NCBI (from publicly available genomes until March 16, 2020) or from sequence data (amplicon or genome) obtained in this study.
The sequences were aligned as detailed for cylC homologs and trimmed to the core shared positions (663). Phylogenetic tree inference was performed with RAxML-HPC2 on XSEDE (8.2.12), using maximum likelihood/rapid bootstrapping with 1000 bootstrap iterations, on the Cipres platform [46].
The amino acid sequences of CylC homologs were aligned using MUSCLE from within the Geneious software package (Biomatters). The alignments were trimmed to their core 333 residues and used for phylogenetic analysis, which was performed with RAxML-HPC2 on XSEDE (8.2.12), using maximum likelihood/rapid bootstrapping with 1000 bootstrap iterations, on the Cipres platform [46] (Fig. 4c).

CORASON analysis
CORASON, a bioinformatic tool that computes multi-locus phylogenies of BGCs within and across gene cluster families [47], was used to analyze cyanobacterial genomes collected from the NCBI and the LEGEcc genomes (Table S2). In total, 2059 cyanobacterial genomes recovered from NCBI and 56 additional LEGE genomes were used in the analysis. The amino acid sequences of CurA (AAT70096.1), WelO5 (AHI58816.1), McnD (CCI20780.1), Bmp5 (WP_008184789.1), PrnA (WP_044451271.1) and CylC (ARU81117.1) were used as queries and, for each enzyme, a reference genome was selected (Table S2). To increase the phylogenetic resolution, selected genomes were removed from the analysis of enzymes CylC, PrnA, CurA, McnD and Bmp5 (Table S2).
Additionally, for the CylC analysis, a few BGCs were manually extracted and included in the analysis (Table S2) since they were not detected by CORASON.

Prevalence of halogenases in cyanobacterial genomes
Representative proteins of each class were used as query in each search. For dimetal-carboxylate halogenases: CylC (ARU81117.1), BrtJ (AKV71855.1), "Mic" (WP_002752271.1; the halogenase in the putative microginin gene cluster), ColD (AKQ09581.1), ColE (AKQ09582.1), NocO (AKL71648.1) and NocN (AKL71647.1). For flavin-dependent halogenases: PrnA (WP_044451271.1), Bmp5 (WP_008184789.1) and McnD (CCI20780.1). For nonheme iron-dependent halogenases: the halogenase domains from CurA (AAT70096.1), and the halogenases BarB1 (AAN32975.1), HctB (AAY42394.1), WelO5 (AHI58816.1) and AmbO5 (AKP23998.1). Non-redundant sequences obtained from these searches using a 1×10⁻²⁰ e-value cutoff, which corresponds to a percentage identity between the query and target protein greater than 30%, were considered to share the same function as the query.

Results and Discussion

CylC-like halogenases are mostly found in cyanobacteria
To investigate the distribution of CylC homologs encoded in microbial genomes, we first searched the reference protein (RefSeq) or non-redundant protein sequences (nr) databases (NCBI) for homologs of CylC or BrtJ, using the Basic Local Alignment Search Tool, BLASTp (minimum 25% identity and 50% coverage, E-value cutoff of 9.9×10⁻²⁰).
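A minimal sketch of how such thresholds could be applied to BLASTp hits is given below. It assumes the hits were exported in the standard 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that coverage means the fraction of the query spanned by the alignment; the text does not state how the authors applied the filters. The hit lines and the query length are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple

class Hit(NamedTuple):
    subject: str
    pident: float   # percent identity
    qstart: int
    qend: int
    evalue: float

def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST hits in the 12-column tabular layout."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[1], float(f[2]), int(f[6]), int(f[7]), float(f[10]))

def passes(hit: Hit, query_len: int, min_ident: float = 25.0,
           max_evalue: float = 9.9e-20, min_cov: float = 50.0) -> bool:
    """Apply the identity, E-value and query-coverage thresholds."""
    coverage = 100.0 * (hit.qend - hit.qstart + 1) / query_len
    return (hit.pident >= min_ident
            and hit.evalue <= max_evalue
            and coverage >= min_cov)

# Hypothetical hits and query length (not data from the study).
QUERY_LEN = 400
lines = [
    "q\thit_good\t62.5\t380\t140\t2\t10\t389\t5\t384\t1e-150\t520",
    "q\thit_low_identity\t21.0\t300\t230\t7\t50\t349\t40\t339\t1e-25\t110",
    "q\thit_short\t70.0\t120\t36\t0\t1\t120\t1\t120\t1e-40\t160",
    "q\thit_weak\t35.0\t260\t160\t9\t30\t289\t20\t279\t1e-12\t70",
]

kept = [h.subject for h in parse_hits(lines) if passes(h, QUERY_LEN)]
print(kept)
```

Only the first hypothetical hit satisfies all three criteria; the others fail on identity, coverage and E-value, respectively.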
A total of 128 and 246 homologous unique protein sequences were retrieved using the RefSeq or nr databases, respectively; in both cases, sequences were primarily from cyanobacteria (96 and 88%, respectively) (Fig. 2a). We then used the Enzyme Similarity Tool of the Enzyme Function Initiative (EFI-EST) [34] to evaluate the sequence landscape of dimetal-carboxylate halogenases. Using CylC as query, we obtained an SSN (sequence similarity network) composed of 154 sequences retrieved from the UniProt database [48] (Fig. 2b). The SSN featured two major clusters, one containing homologs from diverse cyanobacterial genera, the other composed of homologs from several cyanobacteria, with a few from proteobacteria (mostly deltaproteobacteria) and two from the cyanobacteria sister-phylum Melainabacteria. A third SSN cluster was composed only of the previously reported BrtJ enzymes and, finally, a homolog from the cyanobacterial genus Hormoscilla remained unclustered. We were unable to recover any SSN that included clusters containing other characterized enzyme functions, which attests to the uniqueness of the dimetal-carboxylate halogenases in the current protein-sequence landscape.

Figure 2. Abundance of CylC homologs in bacteria. a) BLASTp using CylC (GenBank accession no: ARU81117) as query against different databases shows that these dimetal-carboxylate enzymes are found almost exclusively in cyanobacteria.
b) Sequence Similarity Network (SSN) of CylC depicting the similarity-based clustering of UniProt-derived protein sequences with homology (BLAST e-value cutoff 1×10⁻², edge e-value cutoff 1×10⁻⁴⁰) to CylC (GenBank accession no: ARU81117). In each node, the bacterial genus for the corresponding UniProt entry is shown (NA – not attributed).

CylC homologs are widely distributed throughout the phylum Cyanobacteria
With the intent of accessing a wide diversity of CylC homolog sequences, we decided to use a degenerate-primer PCR strategy to discover additional homologs in cyanobacteria from the LEGEcc culture collection [49], because the phylum Cyanobacteria is diverse and still underrepresented in terms of genome data [50-55]. The LEGEcc culture collection maintains cultures isolated from diverse freshwater and marine environments, mostly in Portugal, and, for example, contains all known bartoloside-producing strains [30]. Primers were designed based on 54 nucleotide sequences retrieved from the NCBI that were selected to represent the phylogenetic diversity of CylC homologs (Fig. S1). Due to the lack of highly conserved nucleotide sequences among all homologs considered, we divided the nucleotide alignment into five groups and designed a degenerate primer pair for each. Upon screening 326 strains from LEGEcc using the five primer pairs, we retrieved 89 sequences encoding CylC homologs, confirmed through cloning and Sanger sequencing of the obtained amplicons.
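To make the notion of a degenerate primer concrete, the sketch below expands the standard IUPAC nucleotide ambiguity codes and counts how many distinct oligonucleotides a degenerate sequence represents. The primer sequences are taken from Table 1; the function names `degeneracy` and `expand` are illustrative and are not taken from any tool used in the study.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())

def expand(primer: str) -> list:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]

# Forward primers from Table 1.
PRIMERS = {
    "AF": "CAAAAAATHGCDCTYAAYC",
    "DF": "AATCAAACYAGYGCWGC",
}

for code, seq in PRIMERS.items():
    print(code, seq, "degeneracy:", degeneracy(seq))

variants = expand(PRIMERS["DF"])
print(variants)
```

For example, primer AF contains one H, one D and two Y positions, so it stands for 3 × 3 × 2 × 2 = 36 distinct sequences, while DF (two Y and one W) stands for 8.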
We were unable to directly analyze the diversity of the entire set of LEGEcc-derived cylC amplicons due to low overlap between sequences obtained with different primers. As such, we performed a phylogenetic analysis of the diversity retrieved with each primer pair (Fig. S2), by aligning the PCR-derived sequences with a set of diverse cylC genes retrieved from the NCBI. For some strains, our PCR screen retrieved more than one homolog using different primer pairs (e.g. Nostoc sp. LEGE 12451 or Planktothrix mougeotii LEGE 07231). In general, and for each primer pair, the PCR screen retrieved mostly sequences that were closely related and associated with one or two phylogenetic clades. This can likely be explained by the geographical bias that might exist in the LEGEcc culture collection [49] and/or by primer design and PCR efficiency issues, which might have favored certain phylogenetic clades.
To access full-length sequences of the CylC homologs identified among LEGEcc strains, as well as their genomic context, we undertook a genome-sequencing effort informed by our PCR screen. We selected 21 strains for genome sequencing, which represent the diversity of CylC homologs observed in the different PCR screening groups. The resulting genome data were used to generate a local BLAST database and the homologs were located within the genomes. In some cases, additional homologs that were not detected in the PCR screen were identified.
Overall, 33 full-length genes encoding CylC homologs were retrieved from LEGEcc strains.
To explore the phylogenetic distribution of CylC homologs encoded in publicly available reference genomes and the herein sequenced LEGEcc genomes, we aligned the 16S rRNA genes from 648 strains with RefSeq genomes and the LEGEcc strains that were screened by PCR in this study. Using this dataset, we performed a phylogenetic analysis which indicated that CylC homologs are broadly distributed through five cyanobacterial orders: Nostocales, Oscillatoriales, Chroococcales, Synechococcales and Pleurocapsales (Fig. 3, Fig. S3). It is noteworthy that the cyanobacterial orders for which we did not find CylC homologs (Chroococcidiopsidales, Spirulinales, Gloeomargaritales and Gloeobacterales) are poorly represented in our dataset (Fig. 3, Fig. S3). However, our previous BLASTp search against the nr database did retrieve two close homologs in two Chroococcidiopsidales strains (genera Aliterella and Chroococcidiopsis) and a more distant homolog in a Gloeobacter strain (Gloeobacterales) (Table S3). Given the wide but punctuated presence of CylC homologs among the cyanobacterial diversity considered in this study, it is unclear how much of the current CylC homolog distribution reflects vertical inheritance or horizontal gene transfer events.

Figure 3. RAxML cladogram of the 16S rRNA gene of LEGEcc strains (grey squares) and of cyanobacterial strains with NCBI-deposited reference genomes, screened in this study. Taxonomy is presented at the order level (colored rectangles). Strains whose genomes encode CylC homologs are denoted by black squares. Green squares indicate that at least one homolog was detected by PCR screening and verified by retrieving the sequence of the corresponding amplicon by cloning followed by Sanger sequencing. Gloeobacter violaceus PCC 7421 served as an outgroup. A version of this cladogram including the bootstrap values for 1000 replications is provided as Supplementary Material.
[Figure 3 image: cladogram tip labels (strain names) not reproduced.]

Diversity of BGCs encoding CylC homologs
To characterize the biosynthetic diversity of BGCs encoding CylC homologs, which were found in 78 cyanobacterial genomes (21 from LEGEcc and 57 from RefSeq) from different orders, we first submitted these
G E 1 24 5 0 C aloth rix sp N IE S 4071 Syn ech o co ccu s cf n id ulan s L EGE 06 322 S yn ec ho cy st is s al in a LE G E 0 61 55 Fisch erella t herm alis WC2 46 Nosto c lin c kia N IES 2 5 M icrocystis aeruginosa LE G E 11464 Cyano bium sp LEGE 0 0035 Gloeo bacter kilauee nsis JS1 Fisch erella t herm alis PCC 7 521 Vulc anoc o ccu s lim net ic u s L L Cylindro spe rm um liche niform e UTE X B 2014 Apha n izo meno n flos a qu ae N I ES 81 Halom icr onem a ho ngdech loris C220 6 Phorm idium sp LEGE 07215 N os to c sp L E G E 1 24 4 7 Cya no b ium sp LE GE 0 60 26 S yn ec ho cy st is s al in a LE G E 0 00 31 M icrocystis aeruginosa PC C 7 806S L M icrocystis aeruginosa LE G E 91352 Nostoc li n ckia z4 Cyano thece sp PCC 7 822 Limn othr ix rosea NIES 20 8 Sy nec ho coc cus nid u la ns LE GE 07 1 7 1 P lanktoth rix pau civesiculata P C C 963 1 P lanktoth rix sp P C C 1120 1 Fisch erella t herm alis WC4 39 P lanktoth rix m o ugeot ii L EG E 06 223 N os to c sp L E G E 0 73 6 5 Cya no b ium a ff gra cile LE GE 073 66 M icrocystis aeruginosa PC C 9 717 A ff R oh ol tie lla s p LE G E 1 24 11 Fisch erella sp PCC 9 605 Nostoc lin ckia z 3 un id en tifi ed N o s toc ale s L EG E XX 27 6 Do lic ho sp erm u m sp L EG E 00 26 3 M icrocystis aeruginosa LE G E 91341 C yl in dr o sp er m op si s ra ci bo rs ki i L EG E 9 9 04 3 Tycho nem a sp LE G E 072 14 Anaba ena sp 4 3 Lep tolyngb ya oha dii IS1 Lep toly ngb ya cf h a lo phi la L EG E 0 61 02 No do sili ne a s p L EG E 06 12 4 M icrocystis aeruginosa LE G E 08354 N os to c ca rn e um N IE S 2 10 7 Le p t oly ng b y a s p L EG E 06 3 0 8 N os to c sp L E G E 1 24 4 9 Cyano thece sp PCC 8 802 Le p t oly ng b y a s p L EG E 07 2 9 8 Cya no b ium sp LE GE 0 63 07 N os to c sp L E G E 1 24 5 4 P lanktoth rix prolifica N IVA C YA 98 Tycho nema sp LEGE 072 03 R ap h id io ps is b ro ok ii D 9 D 9 2 3 M icrocystis viridis N IE S 102 Xenoco ccus sp PCC 7305 S yn ec ho cy st is s p LE G E 0 60 0 5 C yl in dr o sp er m op si s ra ci bo 
rs ki i C 03 Hyella pa telloides L EGE 07 179 Nostoc sp 333 5mG No do sili ne a s p L EG E 06 02 0 Cyano thece sp PCC 8 801 S cytonem a sp N IE S 407 3 M icrocystis aeruginosa LE G E 91351 Lep tolyngb ya sp LEGE 134 18 R iv ul ar ia s p LE G E 0 71 5 9 S ynecho cystis sp IP PA S B 1 465 No st oc a le s cy an o ba ct er iu m L E GE 1 13 8 6 Micr ocoleus sp LEGE 07081 M icrocystis aeruginosa N IES 44 S yn ec ho cy st is s al in a LE G E 0 00 38 Rome ria sp LEG E 0 6013 Do lic ho sp erm u m sp L EG E 00 24 8 To ly po th rix te nu is P C C 71 01 Cylindro sperm opsis r aciborskii L EGE 99 046 M icrocystis aeruginosa PC C 7 005 Cyano bium sp LEGE 11 437 C yl in dr o sp er m op si s ra ci bo rs ki i S 14 M icrocystis sp M C1 9 No sto c p isc ina le C EN A2 1 Af f N od os ilin e a sp L EG E 06 14 8 Fisch erella t herm alis CCMEE 5 273 Anaba en a sp PCC 71 08 Doli chos p erm u m plan cton icum NIE S 80 No sto c sp N IE S 3 75 6 Cyano bium s p LEG E 0 61 37 Cyano bium sp LEGE XX442 S ynecho cystis sp P C C 6714 M icr ocystis aer uginosa PC C 9 807 Deser tifilum sp IPPAS B 122 0 un id en tifi ed N o s toc ale s L EG E XX 25 4 M icrocystis aeruginosa LE G E 12461 Geitler inema sp LEGE 1139 0 Sy nec ho coc cus sp L E GE 11 3 9 4 No du la ri a s p L EG E 0 428 8 Tycho nem a sp LE G E 071 96 P lanktoth rix ru bescens strain 7 821 Synecho cystis sp LEGE 0601 7 un id en tif ie d fila m en t o us S yn ec ho co cc al es L EG E 0 7 08 9 Mo orea prod ucens PAL 8 15 08 1 Chro ococcidiop sis sp TS 82 1 S yn ec ho cy st is s al in a LE G E 0 00 30 Do lic ho sp erm u m sp L EG E 00 24 6 No do sili ne a n od ulo sa P CC 71 0 4 P lanktoth rix m o ugeot ii L EG E 06 226 Doli chos p erm u m com pact um NIE S 8 0 6 No sto c sp N IE S 2 111 Cyan o bium sp L EGE 1 037 5 Croco sphae ra watso nii W H 0005 Cyano bium sp LEGE 0 6184 To xi fil um m ys id oc id a L E G E 06 10 8 C aloth rix rhizo soleniae SC 01 Aph an iz o me no n flos a qu ae 2 012 KM 1 D3 Filam ento us cyano bacter ium LEGE 000 52 Cu 
sp ido thr ix sp LE GE 0 32 84 Tycho nem a sp LE G E 072 21 A rthrospira sp TJS D 091 S ph ae ro sp e rm op si s sp L E G E 0 22 6 6 P lanktoth rix m o ugeot ii L EG E 06 225 Lep tolyngb ya bor yana NIES 213 5 Le p t oly ng b y a s p L EG E 07 3 1 1 A rthrospira sp O 9 1 3F Do lic ho sp erm u m sp L EG E 00 23 4 M icrocystis aeruginosa KW M icrocystis aeruginosa TA IH U9 8 Fisch erella m ajor NI ES 592 S yn ec ho cy st is s p LE G E 0 60 2 5 Li m n ot hr ix sp P R1 52 9 Gloeo capsop sis crepidin um LEGE 061 23 M icrocystis sp LE G E X X4 08 M icr ocystis aer uginosa SP C 777 Nos toc sp U IC 1 011 0 Chro ococcales cyanoba cterium LEGE 11438 C yl in dr o sp er m op si s ra ci bo rs ki i L EG E 9 9 04 8 Lep tolyngb ya sp PCC 6406 Phorm idium sp LEGE 00064 Le p to ly ng b ya s p LE G E 0 60 70 Oscillator ia sp PCC 1080 2 A naba ena cylind rica P C C 7122 Pseuda nabae na sp PCC 68 02 Pse680 2 R ap h id io ps is c ur va ta N IE S 9 32 Gloeo capsa sp PCC 7 428 S cytonem a tolyp othrich oides V B 6127 8 C yl in dr o sp er m op si s ra ci bo rs ki i G IH E 2 0 18 S ph ae ro sp e rm op si s sp L E G E 0 83 3 4 M icrocystis aeruginosa PC C 9 806 S yn ec ho cy st is s al in a LE G E 0 00 29 Fisch erella t herm alis WC2 45 Fisch erella m usc icola PCC 731 03 Dactyloco ccopsis salina PCC 83 05 C yl in dr o sp er m op si s ra ci bo rs ki i C 04 Cyano bium sp LEGE 0 6138 Oscillator iales cyano bacter ium M TP1 S yn ec ho cy st is s al in a LE G E 0 00 32 A rthrospira platen sis NI ES 39 Lep tolyngb ya sp PCC 7376 M icrocystis aeruginosa LE G E 12460 S yn ec ho co cc al es c ya n ob ac te riu m L EG E 1 04 05 S yn ec ho co cc al es c ya n ob ac te riu m L EG E 11 39 2 Cya no b ium sp P CC 70 0 1 M icrocystis aeruginosa LE G E 91342 Coleof asciculus chth onop la stes PCC 7420 M icrocystis aeruginosa LE G E 91095 No do sili ne a s p L EG E 07 09 1 S yn ec ho co cc al es c ya n ob ac te riu m L EG E 0 00 53 Chro ococcidiop sis cubana SAG 3 9 79 A rthrospira platen sis YZ A rthrospira 
sp TJS D 092 Fi la m en to us c ya no ba ct er iu m L E G E 0 00 33 C aloth rix dese rtica P C C 7102 No sto c cy ca da e WK 1 Le p to ly ng b ya s p B C 13 07 Lep tolyngb ya sp LEGE 063 61 Fisch erella t herm alis WC3 41 Syne cho c occu s sp UW1 4 0 No do sili ne a s p L EG E 06 12 9 P lanktoth rix aff m oug eotii LE G E 0722 7 M icrocystis aeruginosa PC C 9 808 Micr ocoleus va ginatu s FGP 2 Lep tolyngb ya sp he nsonii28 Phorm idium sp LEGE 06204 M icrocystis sp LE G E 083 31 Le p t oly ng b y a s p L EG E 07 3 1 4 N os to c sp L E G E 1 24 4 8 Nos toc sp 5 18 3 Cyano bium s p LEG E 0 60 16 Cya no b ium g rac ile L EGE 093 99 Aphan othece sacrum FPU3 S yn ec ho cy st is s al in a LE G E 0 00 28 Gem inocystis sp NIES 370 8 No do silin e a no d u los a L EG E 0 6 1 04 Sy ne ch o c oc ca les cy an ob ac te riu m LE G E 1 1 3 95 C yl in dr o sp er m op si s ra ci bo rs ki i C 07 No sto c sp P CC 7 12 0 Nostoc lin ckia z9 Alkalinema aff pa ntana lense L EGE 15 481 Doli chos p erm u m circi na le AW QC3 10 F 31 0 F Tycho nem a sp LE G E 062 20 A rthrospira sp str PC C 80 05 Lyng bya ae stuarii B L J lae st3 N os to c sp L E G E 1 24 5 6 S yn ec ho cy st is s al in a LE G E 0 00 40 Tycho nem a sp LE G E 062 06 Nos toc sp 2 32 A literella a tlantica C E N A 595 M icr ocystis aer uginosa PC C 9 432 Planktoth rix mo ugeot ii L EGE 07 229 No do silin e a sp L E GE 06 010 C yl in dr o sp er m op si s ra ci bo rs ki i M VC C 14 Cyano b ium sp L EGE 0 7 175 Tycho nema bour rellyi F EM GT70 3 Crina liu m epip sammu m PCC 9333 M icrocystis sp LE G E 083 55 C yl in dr o sp er m op si s ra ci bo rs ki i C S 5 05 Cyano thece sp PCC 7 425 M icrocystis aeruginosa PC C 9 809 S ynecho cystis sp P C C 6803 No sto c c om mu ne HK 02 Cyano bium sp LEGE 0 6109 Syn ech o co ccu s sp L E GE 06 306 Phorm idesmis p riestleyi BC140 1 Chro ococcales cyanoba cterium IPPAS B 1 203 S cy to ne m a ho fm an n i U TE X 2 34 9 Lusita niella cor iacea L EGE 07 157 Rubidib acter la cunae KORDI 5 1 2 KR51 
Cyano biu m sp LEG E 0 6143 M icrocystis sp LE G E 002 58 Ana ba e na s p 90 M icrocystis aeruginosa Sj C yl in dr o sp er m op si s ra ci bo rs ki i C r2 01 0 Cya no b ium sp LE GE 0 63 16 Do lic ho sp erm u m sp L EG E 00 24 0 Fisch erella m uscicola CCMEE 532 3 Cyano bium sp LEGE 0 7183 Spirulina major PCC 6 313 Fisch erella t herm alis WC4 42 M icrocystis pan niform is FA CH B 17 57 Fisch erella sp PCC 9 339 M icrocystis aeruginosa N IES 87 No du la ri a s pu mig en a C CY 9 4 14 Geitler inema sp PCC 9228 Fisch erella sp PCC 9 431 Gloeo capsop sis sp LEGE 1342 0 An aba ena va riab ilis AT CC 29 413 Chro ococcidiop sid ales cyan obacte rium L EGE 13 419 Nos toc sp A TCC 5 3 789 Fisch erella t herm alis WC3 44 M a stigoclado psis rep ens P C C 1091 4 un id en tif ie d fila m en t o us S yn ec ho co cc al es L EG E 0 7 16 3 To ly po th rix s p NI ES 4 07 5 Chro ococcidiop sid ales cyan obacte rium L EGE 13 423 Cyano bium sp LEGE 0 6 068 Westiellopsis p rolifica IICB1 Lim n orap his rob usta L EG E X X3 58 S yn ec ho cy st is s p LE G E 0 60 7 9 P lanktoth rix ru bescens NIVA CY A 4 07 Pseuda nabae na cf cu rta L EGE 10 371 M icr ocystis aer uginosa D IAN C H I90 5 C yl in dr o sp er m op si s ra ci bo rs ki i C E N A3 02 un id en tif ie d fila m en t o us S yn ec ho co cc al es L EG E 0 6 14 4 M icrocystis aeruginosa LE G E XX 359 Cyano bium sp LEGE 0 0034 M icrocystis aeruginosa PC C 9 701 Fisch erella t herm alis CCMEE 5 282 Fisch erella t herm alis BR2 B Synec ho coc cus sp L EGE 113 8 1 No do silin e a sp L E GE 10 376 Lyng bya sp P C C 810 6 Chro ococcidiop sis therm alis PCC 7 203 Planktoth rix mo ugeot ii L EGE 07 230 Lep tolyngb ya sp NIES 2104 M icrocystis aeruginosa LE G E 08328 Pleuro capsales cya noba cterium LEGE 10410 Cand id atus Atelocyan obacte rium thalassa iso late ALOHA N os to c sp L E G E 0 43 5 7 Syn ech o co cca les cya n ob acte rium LE G E 0 8 333 Lep tolyngb ya bor yana d g5 C al ot h rix s p P C C 7 10 3 Cyano bacter ium ap oninum IPPAS 
B 1 201 Cyano thece sp PCC 7 424 Cyano thece sp BG 0011 S ph ae ro sp e rm op si s re ni fo rm is N IE S 1 94 9 Ph orm idi um t e nu e NI ES 30 R ichelia int racellular is H M 01 Fisch erella t herm alis CCMEE 5 205 Cyano bium s p LEG E 0 613 9 Cyano bium sp LEGE 0 6008 M icrocystis w esenb ergii L EG E 08 368 Chro ogloeo cystis sidero phila NIES 1031 Fisch erella t herm alis CCMEE 5 198 Fisch erella t herm alis WC11 9 C aloth rix sp N IE S 4105 Spirulina subsalsa PCC 94 45 No do sili ne a s p L EG E 06 13 3 Trichodesm ium e rythraeum IM S 101 S ph ae ro sp e rm op si s ki ss el ev ia na N IE S 7 3 Fi la m en to us c ya no ba ct er iu m L E G E 0 00 60 Filam ento us cyano bacter ium ESFC 1 A3MYDRAF T S yn ec ho co cc al es c ya n ob ac te riu m L EG E 0 61 18 Hapa lo siphon sp MRB220 Croco sphae ra watso nii W H 0402 Cyano bium sp LEG E 0 6002 Phorm idium sp LEGE 00065 Cyano bium sp LEGE 0 6127 Le p to ly ng b ya e ct oc a rp i L E G E 1 14 2 5 Do lic ho sp erm u m sp L EG E 00 24 1 S yn ec ho cy st is s p LE G E 0 73 6 7 Lep tolyngb ya bor yana PCC 630 6 Phorm idesmis p riestleyi ULC0 07 No do sili ne a s p L EG E 07 08 8 Cyano bium s p LEGE 0 61 42 Fisch erella t herm alis WC11 4 C aloth rix sp N IE S 3974 No do silin e a sp L E GE 06 014 Nos toc sp 2 13 S yn ec ho cy st is s al in a LE G E 0 00 27 Croco sphae ra watso nii W H 0401 No sto c f lag elli for me CC NU N1 A rthrospira platen sis C1 To lyp o t hri x s p PC C 76 01 S yn ec ho co cc al es c ya n ob ac te riu m L EG E 0 00 39 No do sili ne a s p L EG E 06 11 9 C yl in dr o sp er m op si s ra ci bo rs ki i S 07 No do sili ne a s p L EG E 06 11 5 Cyano bacter ium sp IPPAS B 120 0 Tycho nem a sp LE G E 062 07 M icr ocystis aer uginosa N IES 84 3 N os to c sp L E G E 0 60 7 7 Do lic ho sp erm u m flo s- aq ua e L EG E 02 26 8 Chro ococcidiop sis sp LEGE 0617 4 Ca lot h r ix bre vis sim a N IE S 22 Cyan o biu m u s it atum C3 Le p t oly ng b y a s p L EG E 06 0 6 9 Oscillator ia nigr o viridis PCC 7112 No du 
la ri a s p L EG E 0 607 1 An aba ena va riab ilis NI ES 2 3 Croco sphae ra watso nii W H 0003 To ly po th rix s p LE G E 1 44 45 S yn ec ho cy st is s al in a LE G E 0 00 37 S yn ec ho cy st is s p LE G E 0 70 7 3 Cyano bacter ium isolat e RgSB Tolypo thrix cam pylone moides VB5112 88 Croco sphae ra watso nii W H 8501 M icrocystis aeruginosa LE G E 91344 Pseuda nabae na sp ABRG5 3 No do sili ne a-l ike sp LE GE 11 42 4 Le p to ly ng b ya -li ke s p LE G E 1 34 12 Tycho nem a sp LE G E 071 99 S cytonem a sp H K 05 Le p to ly ng b ya m in u ta L E G E 0 71 2 8 Cu sp ido thr ix iss at sc he nk oi L E GE 03 28 2 M icrocystis aeruginosa LE G E 08329 Tycho nem a sp LE G E 072 02 Cyano bium sp LEGE 0 7313 Gloeo capsop sis sp LEGE 1341 4 No do silin e a cf no dul osa LE G E 1 0 377 Fisch erella sp NIES 37 54 S yn ec ho co cc al es c ya n ob ac te riu m L EG E 0 71 60 Le p to ly ng b ya c f e ct o ca rp i L EG E 1 14 79 Cyano bium sp LEGE 0 60 12 No do silin e a sp L E GE 06 149 Nostoc lin ckia z18 Acaryoch loris sp CCM EE 541 0 Geitler inema sp PCC 7407 Chro ococcales cyanoba cterium LEGE 0601 9 Phorm idium sp LEGE 06363 Chro ococcales cyanoba cterium LEGE 0745 9 Do lic ho sp erm u m sp L EG E 03 27 8 P le ct on e m a cf r ad io su m L E G E 0 61 0 5 C aloth rix sp 33 6 3 Aphan othece sacrum FPU1 Le p t oly ng b y a s p K IO ST 1 LS S ynecho cystis sp P C C 7509 Acaryoch loris sp RCC17 74 RCC1774 Nostoc lin ckia z 2 P lanktoth rix aga rdhii N IV A C Y A 15 unid entified fila ment ous cyan obacte rium L EGE 114 80 Limn othr ix sp LEGE 0023 7 P lanktoth rix m o ugeot ii L EG E 07 231 C aloth rix sp P C C 6303 Sy nec ho coc cus sp L E GE 06 324 Phorm idium cf irrigu um LEGE 000 55 Gloeo capsop sis sp LEGE 1342 1 Nosto c lin c kia z1 4 Do lic ho sp erm u m sp L EG E 00 25 9 C yl in dr o sp er m op si s ra ci bo rs ki i L EG E 9 9 04 4 Cya no b ium sp L EGE 0 60 24 Nos toc sp N 6 No sto c sp C EN A5 43 Cyano thece sp ATCC 5 11 42 Chro ococcales cyanoba cterium LEGE 11426 Le p 
to ly ng b ya a ff e ct oc ar pi L E G E 11 38 9 S yn ec ho cy st is s p LE G E 0 60 8 3 Geitler inema sp PCC 7105 Lep to lyngb ya sp LEG E 061 1 7 S ph ae ro sp e rm op si s sp L E G E 0 02 4 9 No sto c s pha er o ide s K utz in g En N odo siline a sp L EG E 06 009 Fil am en to us cy an o b ac ter ium C CT 1 Fi la m en to us c ya no ba ct er iu m L E G E 0 71 70 No do silin e a sp L E GE 06 022 Lep tolyngb ya bor yana I AM M 10 1 M icrocystis aeruginosa LE G E 00239 C yl in dr o sp er m op si s ra ci bo rs ki i C S 5 08 No do sili ne a s p L EG E 06 14 5 S yn ec ho cy st is s al in a LE G E 0 00 41 Pseuda nabae na sp 59 Fisch erella sp N IES 41 06 No do sili ne a s p L EG E 06 19 3 Myxo sarcina sp LEGE 0614 6 Syne cho c occus nidu lans L EGE 061 5 6 Nostoc lin ckia z8 Cylin dro sp erm u m sp NIES 40 74 C yl in dr o sp er m op si s ra ci bo rs ki i S 10 M icrocystis aeruginosa LE G E 91347 Ma stigocladu s lami nosu s UU774 A rthrospira platen sis str P ara ca isolate UA S W S Nos toc pun ctifo rme PC C 7 3 10 2 No du la ri a s pu mig en a C EN A5 96 P lanktoth rix tep id a P C C 9214 P ho rm id iu m s p L E G E 11 38 4 N os to c sp L E G E 1 24 5 1 M icr ocystis aer uginosa PC C 7 941 No do sili ne a s p L EG E 03 28 3 Phorm idium sp LEGE 06072 Nosto c lin c kia z1 6 Cya no b ium g rac ile L EGE 124 31 Nos toc sp K VJ2 0 Fisch erella t herm alis CCMEE 5 201 R iv ul ar ia s p P C C 71 16 Gem inocystis he rdma nii PCC 6 308 Cham aesipho n polym orph us CCAL A S ph ae ro sp e rm op si s sp L E G E 0 83 3 5 S cytonem a ho fm ann ii PC C 711 0 Cham aesipho n minu tus PCC 6605 Haloth ece sp PCC 741 8 N ostoca les cyano bacter ium H T 58 2 Cyano b ium sp LEGE 0 7293 S yn ec ho cy st is s al in a LE G E 0 60 99 S yn ec ho co cc al es c ya n ob ac te riu m L EG E 1 04 00 R om e ria a ff g ra ci lis L E G E 0 7 31 0 M icrocystis aeruginosa N IES 42 85 Nostoc sp P CC 7 524 C hlorogloea sp C C AL A 69 5 Oscillator iales cyano bacter ium JSC 12 Do lic ho sp erm u m flo s- aq ua 
e L EG E 04 28 9 Fi la m en to us c ya no ba ct er iu m L E G E X X0 61 M icrocystis aeruginosa LE G E 12462 Fisch erella t herm alis WC1 57 Unicellular cyanob acter iu m SU3 M icrocystis aeruginosa LE G E 12463 S yn ec ho co cc al es c ya n ob ac te riu m L EG E 0 71 85 Mo orea prod ucens 3 L P le ct on e m a cf r ad io su m L E G E 0 61 14 Cyano biu m sp LEG E 0 6135 Nostoc lin ckia z 1 Cya no b ium sp L EGE 0 60 23 Cya no b ium sp LE GE 0 71 53 C yl in dr o sp er m op si s ra ci bo rs ki i C E N A3 03 M icrocystis aeruginosa N IES 24 81 M icr ocystis aer uginosa N IES 88 M icrocystis aeruginosa LE G E 11465 Cyano bium sp LEGE 0 7318 Tycho nem a sp LE G E 072 13 Phorm idium la etevire ns LEGE 0610 3 Cyano b ium sp L EGE 0 6 140 Anab a ena sp W A102 No sto c sp P CC 7 10 7 Lep toly ngb ya sp LE GE 07 0 8 4 No du la ri a s pu mig en a U HC C 0 039 Ca lot h r ix sp N IE S 21 00 Chlo r oglo eo ps is frit schii PCC 691 2 Le p t oly ng b y a s p L EG E 07 0 8 5 Cyano bium sp LEGE 0 7186 Nos toc sp D B3 9 92 M icrocystis aeruginosa PC C 9 443 Fisch erella t herm alis strain JSC 11 Cyano bium sp LEGE 0 6011 S yn ec ho co cc al es c ya n ob ac te riu m L EG E 0 61 16 Pleuro capsa sp PCC 73 19 Planktoth ricoides sp SR0 01 Phorm idium sp LEGE 06078 Trichorm us sp N M C 1 No do silin e a sp L E GE 06 006 C al ot h rix s p LE G E 0 61 00 Pseuda nabae na bice ps PCC 7 429 Gloeo bacter violaceu s PCC 74 21 An ab a e no ps is cir cu lar is NI ES 21 Chro ococcidiop sid ales cyan obacte rium L EGE 13 417 H al om icr on em a ex ce nt ric um s tr L ak sh ad w e ep 2 A rthrospira m axim a C S 3 28 Pleuro capsa sp PCC 73 27 A ul os ira la xa N IE S 5 0 O cu la te lla s p LE G E 0 61 41 Le p to ly ng b ya s p LE G E 0 70 75 Phorm idium sp HE1 0JO Filam ento us cyano bacter ium LEGE 071 67 C yl in dr o sp er m op si s ra ci bo rs ki i I T E P A 1 Cya no b ium sp N IES 98 1 Fisch erella t herm alis WC111 0 No sto c c om mu ne NIE S 4 072 Lep tolyngb ya sp NIES 3755 Do lic ho sp erm u m 
sp L EG E 03 27 7 Sy nec ho coc cal es cya n o bac te r ium LE G E 1 3 422 M icr ocystis aer uginosa N IES 29 8 C al en e m a si ng ul ar is L EG E 0 6 18 8 No do silin e a sp L E GE 06 110 Lep tolyngb ya sp O 7 7 Cyano bium s p LEG E 0 613 4 Oscillator ia acum inata PCC 630 4 Tycho nem a sp LE G E 072 17 cf P ho rm id es m is s p LE G E 1 14 77 N os to c sp N IE S 4 10 3P lanktoth rix prolifica N IVA C YA 406 un id en tif ie d co lo ni al S yn ec h oc oc ca le s L EG E 0 6 19 2 No du la ri a c f h arv eya na HB U2 6 Croco sphae ra watso nii W H 8502 Le p to ly ng b ya s ax ic ol a L EG E 0 6 13 1 Chro ococcop sis sp LEGE 0716 8 unid entified Oscilla toriales LEG E 11 385 S yn ec ho co cc al es c ya n ob ac te riu m L EG E 0 60 21 Myxo sarcina sp GI1 co ntig 13 O sc ill at or ia le s cy an o ba ct er iu m L EG E 1 0 37 0 Nosto c lin c kia z1 5 P se ud a na ba e na s p LE G E 0 71 90 Kampt onem a for mosum PCC 6 407 An ab a e na m inu tis sim a UT EX B 16 13 S ynecho cystis sp LE G E 07211 Cyano bium sp LEGE 0 6097 Tycho nem a sp LE G E 071 97 Pleuro capsales cya noba cterium LEGE 06147 Syn ech o co cca les cya n ob acte rium LE G E 0 9 398 M icrocystis aeruginosa LE G E 08330 Le p to ly ng b ya s p H er on Is la n d J 11 Fi la m en to us c ya no ba ct er iu m L E G E 0 72 12 Fisch erella t herm alis CCMEE 5 268 Gloeo mar garita lit hopho ra Alchichica D10 Le p to ly ng b ya s p P C C 7 37 5 Le p t oly ng b y a s p L EG E 07 1 5 4 Cya no b ium g ra cile LE GE 000 54 Geitler inema sp LEGE 1139 3 unid entified Oscilla toriales LEG E 0 0049 Pseuda nabae na sp BC14 03 S cy to ne m a sp L EG E 0 7 18 9 P lanktoth rix aga rdhii N IV A C Y A 12 6 8 Le p t oly ng b y a s p L EG E 07 3 0 9 Pseuda nabae na sp PCC 73 67 N os to c sp L E G E 0 61 5 8 No do silin e a sp L E GE 06 120 Filam ento us cyano bacter ium LEGE 124 32 Nostoc lin ckia z7 Fr e my e lla dip los iph o n NI ES 3 2 75 Cyano bacter ium isolat e EtSB Nosto c sp PA 18 2 419 Oscillator ia sp PCC 6506 D es m o no 
st oc m us co ru m L EG E 1 2 44 6 Micr ocoleus sp LEGE 07092 Cyan o bium sp L EGE 1 037 4 M icrocystis sp T 1 4 Tycho nem a sp LE G E 071 98 M icrocystis sp 08 24 No sto c s p R F31 Ym G Tycho nema sp LEGE 072 16 C yl in dr o sp er m op si s ra ci bo rs ki i S 06 Nos toc sp 2 10 A No du la ri a s p N I ES 35 8 5 Cyan o biu m g r acile PCC 6307 Mi cr oc ys tis ae r u gin os a LE GE 9 13 38 1 7 4 2 Colored ranges Nostocales Oscillatoriales Chroococcales Synechococcales Pleurocapsales Chroococcidiopsidales Spirulinales Gloeomargaritales Gloeobacterales CylC homologs CylC homologs identified by screen LEGEcc strains .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.05.425448doi: bioRxiv preprint https://doi.org/10.1101/2021.01.05.425448 http://creativecommons.org/licenses/by-nc-nd/4.0/ 18 genome sequences for antiSMASH [44] analysis. 55 CylC-encoding BGCs were detected, which were classified 336 as resorcinol, NRPS, PKS, or hybrid NRPS-PKS. Given the number of CylC homolog-encoding genes detected 337 in these genomes (105), we considered that several BGCs might have not been identified with antiSMASH. 338 Therefore, we performed manual annotation of the genomic contexts of the CylC homologs and were able to 339 identify 20 additional BGCs. 
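The reasoning above, that more CylC homolog genes were found than the antiSMASH-predicted clusters could account for, can be illustrated by flagging homologs that lie outside every predicted region. The following is a minimal sketch under stated assumptions, not the authors' workflow: the contig names, coordinates, gene identifiers and the `uncovered_homologs` helper are all hypothetical, and real input would come from parsed antiSMASH output and homolog search hits.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def uncovered_homologs(homologs: dict[str, Interval],
                       regions: list[Interval]) -> list[str]:
    """Return IDs of homolog genes not fully contained in any predicted region."""
    flagged = []
    for gene_id, gene in homologs.items():
        covered = any(
            r.contig == gene.contig and r.start <= gene.start and gene.end <= r.end
            for r in regions
        )
        if not covered:
            flagged.append(gene_id)
    return sorted(flagged)


# Hypothetical example data (not taken from the study).
regions = [Interval("contig_1", 10_000, 60_000)]
homologs = {
    "cylC_like_A": Interval("contig_1", 25_000, 26_400),  # inside the region
    "cylC_like_B": Interval("contig_1", 90_000, 91_350),  # outside any region
    "cylC_like_C": Interval("contig_2", 5_000, 6_300),    # contig without regions
}

# Homologs printed here would be candidates for manual annotation.
print(uncovered_homologs(homologs, regions))
```

Genes flagged this way are only candidates; whether they sit in a genuine BGC still requires inspecting the surrounding genes, as done manually in the study.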
Upon analysis of the entire set of CylC-encoding BGCs, we classified the BGCs into seven major categories, based on their overall architecture, which we designated as follows (listed in decreasing abundance): Rieske-containing (n = 36), type I PKS (chlorosphaerolactylate/columbamide/microginin/puwainaphycin-like, n = 29), type III PKS (n = 13), dialkylresorcinol (n = 8), PriA-containing (n = 5), nitronate monooxygenase-containing (n = 3) and cytochrome P450/sulfotransferase-containing (n = 1) (Fig. 4a, Figs. S4-S10). Three BGCs were excluded from our classification since they were only partially sequenced (Fig. S11). Examples of each of the cluster architectures are presented in Fig. 4a and schematic representations of each of the 98 classified BGCs are presented in Supplementary Figures S4-S10. It should be stressed that, within several of these seven major categories, there is still considerable diversity in BGC architecture, notably within the dialkylresorcinol, type I PKS and type III PKS BGCs. Rieske-containing BGCs are not associated with any known NP and encode between two and four proteins with Rieske domains. Most contain a sterol desaturase family protein, feature a single CylC homolog and are chiefly found among Nostocales and Oscillatoriales (Fig. S4). PriA-containing BGCs encode, apart from the primosomal protein N' (PriA), a diguanylate cyclase/phosphodiesterase, an aromatic ring-hydroxylating dioxygenase subunit alpha and a ferritin-like protein, and were only detected in Synechocystis spp. (Fig. S5). These are similar to the Rieske-containing BGCs; however, in strains harboring PriA-containing BGCs, the additional functionalities found in the Rieske-containing BGCs are dispersed throughout the genome (Table S4). In our dataset, a single sulfotransferase/P450-containing BGC was detected in Stanieria sp. and was unrelated to the above-mentioned architectures (Fig. S6). Type I PKS BGCs resemble the clusters of the chlorosphaerolactylates, columbamides, microginins and puwainaphycins and typically feature a fatty acyl-AMP ligase (FAAL) and an acyl carrier protein upstream of one or two CylC homologs and a type I PKS downstream of the CylC homolog(s). These were found in Nostocales and Oscillatoriales strains (Fig. S7). Considering the known NP structures associated with these BGCs [29, 56, 57], we expect the encoded metabolites to feature fatty acids halogenated at terminal or mid-chain positions. BGCs of the dialkylresorcinol type, which contain DarA and DarB homologs (Bode 2013, Leão 2015), including several bartoloside-like clusters (found only in LEGEcc strains), were detected in Nostocales, Pleurocapsales and Chroococcales (Fig. S8). Type III PKS BGCs encoding CylC homologs, which include a variety of cyclophane BGCs, were detected in the Nostocales, Oscillatoriales and Pleurocapsales (Fig. S9). Finally, nitronate monooxygenase-containing BGCs, which are not associated with any known NP, were only found in Nostocales strains from the LEGEcc and also featured genes encoding PKSI, ferredoxin, ACP or glycosyl transferase (Fig. S10).

A less BGC-centric perspective of the genomic context of CylC homologs could be obtained through the Genome Neighborhood Tool of the EFI (EFI-GNT, [58]).
Using the previously generated SSN as input, we analyzed the resulting Genomic Neighborhood Diagrams (Fig. 4b), which indicated that the three SSN clusters had entirely different genomic contexts (herein defined as the 10 genes upstream and the 10 genes downstream of the cylC homolog). In the SSN cluster that encompasses CylC and its closest homologs, these enzymes associate most often with PP-binding (ACP/PCP) and AMP-binding (such as FAAL) proteins. In the SSN cluster that includes both cyanobacterial and non-cyanobacterial CylC homologs, the genomic contexts most prominently feature Rieske/[2Fe-2S] cluster proteins as well as fatty acid hydroxylase family enzymes. The cyanobacterial homologs are exclusively encoded in the Rieske- and PriA-containing BGCs. Homologs from this particular SSN cluster may not require a phosphopantetheine-tethered substrate, as no substrate activation or carrier proteins/domains were found in their genomic neighborhoods, or may act on central fatty acid metabolism intermediates. The BrtJ SSN cluster, composed only of the two reported BrtJ enzymes, shows entirely different surrounding genes, which correspond to the brt genes. Also noteworthy is the considerable number of proteins of unknown function found in the vicinity of dimetal-carboxylate halogenases, suggesting that uncharted biochemistry is associated with these enzymes.
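The genomic context used above (10 genes upstream and 10 genes downstream of each cylC homolog) amounts to a fixed window over an ordered gene list, followed by a tally of the Pfam domains found inside it. The sketch below illustrates only that idea; it is not the EFI-GNT implementation, and the gene records, locus tags, domain assignments and the `neighborhood` and `pfam_prevalence` helpers are hypothetical.

```python
from collections import Counter


def neighborhood(genes, anchor_index, flank=10):
    """Return up to `flank` genes on each side of the anchor gene (anchor excluded).

    `genes` is a list ordered by position on one contig; strand is ignored here,
    so "upstream" and "downstream" simply follow list order.
    """
    start = max(0, anchor_index - flank)
    upstream = genes[start:anchor_index]
    downstream = genes[anchor_index + 1:anchor_index + 1 + flank]
    return upstream + downstream


def pfam_prevalence(contexts):
    """Fraction of neighborhoods in which each Pfam family occurs at least once."""
    if not contexts:
        return {}
    counts = Counter()
    for context in contexts:
        # Use a set so a family is counted once per neighborhood.
        counts.update({pfam for gene in context for pfam in gene["pfams"]})
    return {family: n / len(contexts) for family, n in counts.items()}


# Hypothetical contigs; the anchor (cylC homolog) is g3 in contig_a and h2 in contig_b.
contig_a = [
    {"locus": "g1", "pfams": ["AMP-binding"]},
    {"locus": "g2", "pfams": ["PP-binding"]},
    {"locus": "g3", "pfams": []},
    {"locus": "g4", "pfams": ["PP-binding"]},
]
contig_b = [
    {"locus": "h1", "pfams": ["Rieske"]},
    {"locus": "h2", "pfams": []},
    {"locus": "h3", "pfams": ["FA_hydroxylase", "PP-binding"]},
]

contexts = [neighborhood(contig_a, 2), neighborhood(contig_b, 1)]
prevalence = pfam_prevalence(contexts)
for family in sorted(prevalence):
    print(family, prevalence[family])
```

Counting each family once per neighborhood mirrors the notion of prevalence across genomic contexts, rather than the raw number of domain copies.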
Since SSN analysis generated only three clusters of CylC homologs, we next investigated the genetic relatedness among these enzymes and how it correlates with BGC architecture. We performed a phylogenetic analysis of the CylC homologs from the 98 classified and 3 unclassified BGCs (Fig. 4c). Our analysis indicated that homologs from PriA-containing and Rieske-containing BGCs formed a well-supported clade. Its sister clade contained homologs from the remaining BGCs. Within this larger clade, homologs associated with the type I PKS, dialkylresorcinol or type III PKS BGCs were found to be polyphyletic. In some cases, the same BGC contained distantly related CylC homologs (e.g. Hyella patelloides LEGE 07179, Anabaena cylindrica PCC 7122) (Fig. 4c). This analysis also revealed that several strains (Fig. 5c) encode two or three phylogenetically distant CylC homologs in different BGCs. Overall, our data show that CylC homologs have evolved to interact with different partner enzymes to generate chemical diversity, but that their phylogeny is, in some cases, not entirely consistent with BGC architecture. These observations suggest that functionally convergent associations between CylC homologs and other proteins have emerged multiple times during evolution. Examples include the CylC/CylK and BrtJ/BrtB associations, which use cryptic halogenation to achieve C-C and C-O bond formation, respectively [27, 59]. However, the role of the CylC homolog-mediated halogenation of fatty acyl moieties observed for other cyanobacterial metabolites is not currently understood.
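The statement that homologs from some BGC categories are polyphyletic can be made concrete by asking whether all homologs of one category form a single clade of the rooted tree. The sketch below is an assumption-laden illustration, not the analysis performed in the study: the tree is a hand-written nested tuple, the leaf names are invented, and `leaves` and `is_monophyletic` are hypothetical helpers.

```python
def leaves(node):
    """Return the set of leaf names under a node (leaf = str, clade = tuple)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the leaves in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree; leaf names encode an invented BGC category.
tree = (
    ("rieske_1", "priA_1"),
    (("pks1_a", "pks3_a"), ("pks1_b", "dar_a")),
)

print(is_monophyletic(tree, {"rieske_1", "priA_1"}))  # these two leaves share a clade
print(is_monophyletic(tree, {"pks1_a", "pks1_b"}))    # these two are split across clades
```

A group that fails this test is not monophyletic; deciding between polyphyly and paraphyly would additionally require looking at where the group's members sit relative to their common ancestor.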
Interestingly, while a number of CylC homologs, including those that are part of characterized BGCs, likely act on ACP-tethered fatty acyl substrates [27, 59], those from the PriA-, Rieske- and cytochrome P450/sulfotransferase-containing categories do not have a neighboring carrier protein and therefore might not require a tethered substrate. This would be an important property for a CylC-like biocatalyst [15].

Figure 4. Diversity and genomic context of BGCs encoding CylC-like enzymes. a) Examples of the different BGC architectures found among the clusters encoding CylC homologs. b) Genome Neighborhood Diagram (GND) depicting the Pfam domains associated with each cluster from the initial SSN of CylC homologs.
The size of each node is proportional to the prevalence of the Pfam domain within the genomic context of the CylC homologs from each SSN cluster. c) RAxML cladogram (1000 replicates, shown are bootstrap values > 70%) of CylC homologs. The different colors represent a categorization based on common genes found within the associated biosynthetic gene clusters (see legend). Circles of the same color depict CylC homologs encoded by the same BGC. AurF (Streptomyces thioluteus HKI-22) was used as an outgroup.

[Figure 4 image not reproduced; the recoverable panel text is listed below.]

Panel a, example BGCs (category; strain; product):
PriA-containing; Synechocystis sp. PCC 6803; unknown product
Rieske-containing; Calothrix brevissima NIES-22; unknown product
type III PKS; Cylindrospermum licheniforme UTEX B 2014; cylindrocyclophanes
dialkylresorcinol; Synechocystis salina LEGE 06099; bartolosides
type I PKS (chlorosphaerolactylates/columbamides/microginin/puwainaphycin-like); Moorea bouillonii PNG05-198; columbamides
nitronate monooxygenase-containing; Nostoc sp. LEGE 12447; unknown product
sulfotransferase/P450-containing; Stranieria sp. NIES-3757; unknown product

Panel b, SSN clusters: cluster 1 (n = 73); cluster 2 (n = 67); cluster 3 (BrtJ) (n = 2).

Panel b, Pfam legend (Pfam: description):
ABC_tran: ABC transporter
ACP_syn_III_C: 3-Oxoacyl-[acyl-carrier-protein (ACP)] synthase III C terminal
AMP-binding: AMP-binding enzyme
Beta_helix: Right handed beta helix region
Biotin_lipoyl_2-HlyD_D23: Biotin-lipoyl like-Barrel-sandwich domain of CusB or HlyD membrane-fusion
DUF5122: Domain of unknown function (DUF5122) beta-propeller
DUF559: Protein of unknown function (DUF559)
DUF962: Protein of unknown function (DUF962)
FA_hydroxylase: Fatty acid hydroxylase superfamily
Fer2: 2Fe-2S iron-sulfur cluster binding domain
FtsX: FtsX-like permease family
GH3: GH3 auxin-responsive promoter
Glycos_transf_1: Glycosyl transferases group 1
Hexapep: Bacterial transferase hexapeptide (six repeats)
HlyD_D23: Barrel-sandwich domain of CusB or HlyD membrane-fusion
PP-binding: Phosphopantetheine attachment site
Rieske: Rieske [2Fe-2S] domain
SBBP: Beta-propeller repeat
TubC_N: TubC N-terminal docking domain
UDPGT: UDP-glucoronosyl and UDP-glucosyl transferase

Panel c, colored ranges: nitronate monooxygenase-containing; PriA-containing; Rieske-containing; PKSIII; dialkylresorcinol; chlorosphaerolactylate/columbamides/microginin/puwainaphycin-like; sulfotransferase/P450-containing; others.

CylC enzymes and other cyanobacterial halogenases

We sought to understand how CylC-type halogenases compare to other halogenating enzyme classes found in cyanobacteria in terms of prevalence and association with BGCs. To this end, we carried out a CORASON [47] analysis of publicly available cyanobacterial genomes (including non-reference genomes) and the herein acquired genome data from LEGEcc strains (a total of 2,115 cyanobacterial genomes).
We used different cyanobacterial halogenases as input, namely CylC, McnD, PrnA, Bmp5, and the 2OG-Fe(II) oxygenase domains from CurA and BarB1. CORASON attempts to retrieve genome context by exploring gene cluster diversity linked to enzyme phylogenies [47]. The CORASON analysis retrieved 117 (5.6%) dimetal-carboxylate halogenases, 61 (2.9%) nonheme iron-dependent halogenases and 226 (10.7%) flavin-dependent halogenases from the cyanobacterial genomes (Fig. 5a). Using the protein homologs detected in BGCs by CORASON, a sequence alignment was performed for dimetal-carboxylate, nonheme iron/2OG-dependent and flavin-dependent halogenases. For nonheme iron/2OG-dependent halogenases, we excised the halogenase domain from multi-domain enzyme sequences. After removing repeated sequences and trimming the alignments to their core shared positions, maximum-likelihood phylogenetic trees were constructed for each halogenase class and BGCs were annotated manually (Figs. S12-S14). Flavin-dependent halogenases were commonly associated with cyanopeptolin, 2,4-dibromophenol and pyrrolnitrin BGCs and with orphan BGCs of distinct architectures (Fig. S12). Regarding nonheme iron/2OG-dependent halogenases, we identified barbamide, curacin, hectochlorin and terpene/indole [60] BGCs and several distinct orphan BGCs (Fig. S13). For dimetal-carboxylate halogenases, columbamide, microginin, chlorosphaerolactylate, bartoloside and cyclophane BGCs were identified (Fig. S14). However, while some of the CylC homolog-encoding orphan BGCs previously identified by antiSMASH and manual searches were detected by CORASON, the Rieske- and the PriA-containing BGCs were not. Hence, several CylC homologs were not accounted for in this analysis. For the same reasons, the CORASON-derived datasets for the other two halogenase types could also be missing some of their members. To circumvent this limitation and obtain a more comprehensive picture of the abundance of the three types of halogenase in cyanobacterial genomes, we used BLASTp searches against available cyanobacterial genomes in the NCBI database (including non-reference genomes). Several representatives of each halogenase class were used as queries in each search (CylC, BrtJ, “Mic” – the halogenase in the putative microginin gene cluster – ColD, ColE, NocO and NocN for dimetal-carboxylate halogenases; PrnA, Bmp5 and McnD for flavin-dependent halogenases; the halogenase domain from CurA and the halogenases BarB1, HctB, WelO5 and AmbO5 for nonheme iron-dependent halogenases). Non-redundant sequences obtained from these searches using an e-value cutoff of 1 × 10^-20 (corresponding to >30% sequence identity) were considered to share the same function as the query. It is worth mentioning that, for nonheme iron/2OG-dependent enzymes, a single amino acid difference can convert hydroxylation activity into halogenation [61], so it is possible that, at least for this class, the sequence space considered does not correspond exclusively to halogenation activity. Dimetal-carboxylate and flavin-dependent halogenase homologs were found to be the most abundant in cyanobacteria, each with roughly 0.2 homologs per genome, while nonheme iron/2OG-dependent halogenase homologs are less common (~0.05 per genome) (Fig. 5b).
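The BLASTp-based counting described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes hits exported in the standard 12-column BLAST tabular format, treats the stated cutoff as inclusive, approximates "non-redundant" by unique subject identifiers, and pools hits from all queries of a class before dividing by the number of genomes. The function names and the query-to-class mapping are hypothetical.

```python
from collections import defaultdict

# E-value cutoff stated in the text (treated here as inclusive; an assumption).
E_VALUE_CUTOFF = 1e-20


def parse_hits(lines):
    """Yield (query, subject, evalue) from BLAST tabular (12-column) lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Standard columns: qseqid, sseqid, pident, length, mismatch, gapopen,
        # qstart, qend, sstart, send, evalue, bitscore
        yield fields[0], fields[1], float(fields[10])


def homologs_per_class(hit_lines, query_to_class, cutoff=E_VALUE_CUTOFF):
    """Return {halogenase class: set of unique subject IDs passing the cutoff}.

    Hits from every query of the same class are pooled, so a subject found
    by two queries (e.g. CylC and BrtJ) is counted once.
    """
    kept = defaultdict(set)
    for query, subject, evalue in parse_hits(hit_lines):
        if evalue <= cutoff and query in query_to_class:
            kept[query_to_class[query]].add(subject)
    return dict(kept)


def per_genome_frequency(homologs, n_genomes):
    """Divide the number of unique homologs in each class by the genome count."""
    return {cls: len(subjects) / n_genomes for cls, subjects in homologs.items()}
```

Pooling by class before deduplication is a design choice of this sketch; counting per query and averaging afterwards would give different numbers when queries share hits.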
Overall, our analyses indicate that homologs of each of the three halogenase classes are associated with a large number of orphan BGCs and represent opportunities for NP discovery. Particularly noteworthy, CylC-like enzymes are clearly a major group of halogenases in cyanobacteria, despite having been the latest to be discovered [27].

Figure 5. Prevalence of cyanobacterial halogenases. Frequency of halogenases in Cyanobacteria from CORASON analysis (A) and NCBI BLASTp analysis (B). (A) Dimetal-carboxylate halogenases: CylC - NCBI reference genomes, n = 2054 and LEGEcc genomes, n = 41 CylC-containing BGCs and 56 genomes; Flavin-dependent halogenases: PrnA - NCBI reference genomes, n = 2051 and LEGEcc genomes, n = 56 genomes; Bmp5 - NCBI reference genomes, n = 2050 and LEGEcc genomes, n = 56 genomes; McnD - NCBI reference genomes, n = 2052 and LEGEcc genomes, n = 54 genomes; Nonheme iron/2OG-dependent halogenases: halogenase domain from CurA - NCBI reference genomes, n = 2052 and LEGEcc genomes, n = 56 genomes. (B) Average of the total number of homologs per dimetal-carboxylate halogenases (CylC, BrtJ, “Mic”, ColD, ColE, NocO, NocN), flavin-dependent halogenases (tryptophan 7-halogenase PrnA, Bmp5 and McnD) and nonheme iron/2OG-dependent halogenases (BarB1, HctB, WelO5, AmbO5 and the halogenase domain from CurA).

[Figure 5 image not reproduced. Panel a) y-axis: % of halogenases (CORASON); panel b) y-axis: number of homologs (BLAST); x-axis categories in both panels: dimetal, non-heme iron, flavin-dependent.]

Conclusion

The discovery of a new biosynthetic enzyme class brings with it tremendous possibilities for biochemistry and catalysis research, both fundamental and applied. The functional characterization of such enzymes can also be used as a handle to identify and deorphanize BGCs that encode their homologs. CylC typifies an unprecedented halogenase class, which is almost exclusively found in cyanobacteria. By searching for CylC homologs in both public databases and our in-house culture collection, we report here more than 100 new cyanobacterial CylC homologs. We found that dimetal-carboxylate halogenases are widely distributed throughout the phylum. The genomic neighborhoods of these halogenases are diverse, and we identify a number of different BGC architectures associated with either one or two CylC homologs that can serve as starting points for the discovery of new NP scaffolds.
In addition, the herein reported diversity and biosynthetic contexts of these enzymes will serve as a roadmap to further explore their biocatalysis-relevant activities. Finally, bartoloside-like BGCs and a CylC-associated BGC architecture (nitronate monooxygenase-containing) were found only in the LEGEcc, reinforcing the importance of geographically focused strain isolation and maintenance efforts for the Cyanobacteria phylum.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Funding information

This work was funded by Fundação para a Ciência e a Tecnologia (FCT) through grant PTDC/BIA-BQM/29710/2017 to PNL and through strategic funding UID/Multi/04423/2013 and by the National Science Foundation (NSF) through grant CAREER-1454007 to EPB. AR and RCB are supported by doctoral grants from FCT (SFRH/BD/140567/2018 and SFRH/BD/136367/2018, respectively). This material is based upon work supported by an NSF Postdoctoral Research Fellowship in Biology (Grant No 1907240 to NRG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

Acknowledgments

We thank Hitomi Nakamura, Samantha Cassell, Diana Sousa and João Reis for technical assistance during this study, and the Blue Biotechnology and Ecotoxicology Culture Collection (LEGEcc) for the genomic DNA used for the PCR screening.
References

1. Pham JV, Yilma MA, Feliz A, Majid MT, Maffetone N et al. A Review of the Microbial Production of Bioactive Natural Products and Biologics. Front Microbiol 2019;10(1404).
2. Noda-Garcia L, Tawfik DS. Enzyme evolution in natural products biosynthesis: target- or diversity-oriented? Curr Opin Chem Biol 2020;59:147-154.
3. Giani AM, Gallo GR, Gianfranceschi L, Formenti G. Long walk to genomics: History and current approaches to genome sequencing and assembly. Comput Struct Biotechnol J 2020;18:9-19.
4. Zhang MM, Qiao Y, Ang EL, Zhao H. Using natural products for drug discovery: the impact of the genomics era. Expert Opin Drug Discov 2017;12(5):475-487.
5. Gkotsi DS, Dhaliwal J, McLachlan MMW, Mulholand KR, Goss RJM. Halogenases: powerful tools for biocatalysis (mechanisms applications and scope). Curr Opin Chem Biol 2018;43:119-126.
6. Agarwal V, Miles ZD, Winter JM, Eustáquio AS, El Gamal AA et al. Enzymatic Halogenation and Dehalogenation Reactions: Pervasive and Mechanistically Diverse. Chem Rev 2017;117(8):5619-5674.
7. Weichold V, Milbredt D, van Pée K-H. Specific Enzymatic Halogenation—From the Discovery of Halogenated Enzymes to Their Applications In Vitro and In Vivo. Angew Chem Int Ed 2016;55(22):6374-6389.
8. Schnepel C, Sewald N. Enzymatic Halogenation: A Timely Strategy for Regioselective C−H Activation. Chem Eur J 2017;23(50):12064-12086.
9. Petrone DA, Ye J, Lautens M. Modern Transition-Metal-Catalyzed Carbon–Halogen Bond Formation. Chem Rev 2016;116(14):8003-8104.
10. Jeschke P. The unique role of halogen substituents in the design of modern agrochemicals. Pest Manag Sci 2010;66(1):10-27.
11. Xu Z, Yang Z, Liu Y, Lu Y, Chen K et al. Halogen Bond: Its Role beyond Drug–Target Binding Affinity for Drug Discovery and Development. J Chem Inf Model 2014;54(1):69-78.
12. Hillwig ML, Zhu Q, Ittiamornkul K, Liu X. Discovery of a Promiscuous Non-Heme Iron Halogenase in Ambiguine Alkaloid Biogenesis: Implication for an Evolvable Enzyme Family for Late-Stage Halogenation of Aliphatic Carbons in Small Molecules. Angew Chem Int Ed 2016;55(19):5780-5784.
13. Liu X. In Vitro Analysis of Cyanobacterial Nonheme Iron-Dependent Aliphatic Halogenases WelO5 and AmbO5. Methods Enzymol 2018;604:389-404.
14. Pratter SM, Ivkovic J, Birner-Gruenberger R, Breinbauer R, Zangger K et al. More than just a halogenase: modification of fatty acyl moieties by a trifunctional metal enzyme. Chembiochem 2014;15(4):567-574.
15. Hillwig ML, Liu X. A new family of iron-dependent halogenases acts on freestanding substrates. Nat Chem Biol 2014;10(11):921-923.
16. Chang Z, Flatt P, Gerwick WH, Nguyen VA, Willis CL et al. The barbamide biosynthetic gene cluster: a novel marine cyanobacterial system of mixed polyketide synthase (PKS)-non-ribosomal peptide synthetase (NRPS) origin involving an unusual trichloroleucyl starter unit. Gene 2002;296(1-2):235-247.
17. Flatt PM, O'Connell SJ, McPhail KL, Zeller G, Willis CL et al. Characterization of the Initial Enzymatic Steps of Barbamide Biosynthesis. J Nat Prod 2006;69(6):938-944.
18. Galonić DP, Vaillancourt FH, Walsh CT. Halogenation of unactivated carbon centers in natural product biosynthesis: trichlorination of leucine during barbamide biosynthesis. J Am Chem Soc 2006;128(12):3900-3901.
19. Chang Z, Sitachitta N, Rossi JV, Roberts MA, Flatt PM et al. Biosynthetic pathway and gene cluster analysis of curacin A, an antitubulin natural product from the tropical marine cyanobacterium Lyngbya majuscula. J Nat Prod 2004;67(8):1356-1367.
20. Edwards DJ, Marquez BL, Nogle LM, McPhail K, Goeger DE et al. Structure and Biosynthesis of the Jamaicamides, New Mixed Polyketide-Peptide Neurotoxins from the Marine Cyanobacterium Lyngbya majuscula. Chem Biol 2004;11(6):817-833.
21. Ramaswamy AV, Sorrels CM, Gerwick WH. Cloning and biochemical characterization of the hectochlorin biosynthetic gene cluster from the marine cyanobacterium Lyngbya majuscula. J Nat Prod 2007;70(12):1977-1986.
22. Kocher S, Resch S, Kessenbrock T, Schrapp L, Ehrmann M et al. From dolastatin 13 to cyanopeptolins, micropeptins, and lyngbyastatins: the chemical biology of Ahp-cyclodepsipeptides. Nat Prod Rep 2020;37(2):163-174.
23. Rouhiainen L, Paulin L, Suomalainen S, Hyytiainen H, Buikema W et al. Genes encoding synthetases of cyclic depsipeptides, anabaenopeptilides, in Anabaena strain 90. Mol Microbiol 2000;37(1):156-167.
24. Cadel-Six S, Dauga C, Castets AM, Rippka R, Bouchier C et al. Halogenase genes in nonribosomal peptide synthetase gene clusters of Microcystis (cyanobacteria): sporadic distribution and evolution. Mol Biol Evol 2008;25(9):2031-2041.
25. Nishizawa T, Ueda A, Nakano T, Nishizawa A, Miura T et al. Characterization of the locus of genes encoding enzymes producing heptadepsipeptide micropeptin in the unicellular cyanobacterium Microcystis. J Biochem 2011;149(4):475-485.
26. Nakamura H, Hamer HA, Sirasani G, Balskus EP. Cylindrocyclophane Biosynthesis Involves Functionalization of an Unactivated Carbon Center. J Am Chem Soc 2012;134(45):18518-18521.
27. Nakamura H, Schultz EE, Balskus EP. A new strategy for aromatic ring alkylation in cylindrocyclophane biosynthesis. Nat Chem Biol 2017;13(8):916-921.
28. Vaillancourt FH, Yeh E, Vosburg DA, O'Connor SE, Walsh CT. Cryptic chlorination by a non-haem iron enzyme during cyclopropyl amino acid biosynthesis. Nature 2005;436(7054):1191-1194.
29. Kleigrewe K, Almaliti J, Tian IY, Kinnel RB, Korobeynikov A et al. Combining Mass Spectrometric Metabolic Profiling with Genomic Analysis: A Powerful Approach for Discovering Natural Products from Cyanobacteria. J Nat Prod 2015;78(7):1671-1682.
30. Leão PN, Nakamura H, Costa M, Pereira AR, Martins R et al. Biosynthesis-assisted structural elucidation of the bartolosides, chlorinated aromatic glycolipids from cyanobacteria. Angew Chem Int Ed 2015;54(38):11063-11067.
31. Mareš J, Hájek J, Urajová P, Kust A, Jokela J et al. Alternative Biosynthetic Starter Units Enhance the Structural Diversity of Cyanobacterial Lipopeptides. Appl Environ Microbiol 2019;85(4):e02675-02618.
32. Abt K, Castelo-Branco R, Leao PNC. Biosynthesis of Chlorinated Lactylates in Sphaerospermopsis sp. LEGE 00249. Chemrxiv 2020. Preprint. https://doi.org/10.26434/chemrxiv.12885476.v2
33. Latham J, Brandenburger E, Shepherd SA, Menon BRK, Micklefield J. Development of Halogenase Enzymes for Use in Synthesis. Chem Rev 2018;118(1):232-269.
34. Zallot R, Oberg N, Gerlt JA. The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019;58(41):4169-4182.
35. Kotai J. Instructions for preparation of modified nutrient solution Z8 for algae. Norwegian Institute for Water Res 1972;11:5.
36. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004;32(5):1792-1797.
37. Rippka R, Waterbury JB, Stanier RY. Isolation and Purification of Cyanobacteria: Some General Principles. In: Starr MP, Stolp H, Trüper HG, Balows A, Schlegel HG (editors). The Prokaryotes: A Handbook on Habitats, Isolation, and Identification of Bacteria. Berlin, Heidelberg: Springer Berlin Heidelberg; 1981. pp. 212-220.
38. Singh SP, Rastogi RP, Häder D-P, Sinha RP. An improved method for genomic DNA extraction from cyanobacteria. World J Microbiol Biotechnol 2011;27(5):1225-1230.
39. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 2014;15(3):R46.
40. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 2009;25(14):1754-1760.
41. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012;19(5):455-477.
42. Wu YW, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 2016;32(4):605-607.
43. Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP et al. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res 2016;44(14):6614-6624.
44. Blin K, Shaw S, Steinke K, Villebro R, Ziemert N et al. antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline. Nucleic Acids Res 2019;47(W1):W81-W87.
45. Posada D. jModelTest: Phylogenetic Model Averaging. Mol Biol Evol 2008;25(7):1253-1256.
46. Miller MA, Pfeiffer W, Schwartz T, editors. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. 2010 Gateway Computing Environments Workshop (GCE); 2010 14-14 Nov. 2010.
47. Navarro-Muñoz JC, Selem-Mojica N, Mullowney MW, Kautsar SA, Tryon JH et al. A computational framework to explore large-scale biosynthetic diversity. Nat Chem Biol 2020;16(1):60-68.
48. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res 2016;45(D1):D158-D169.
49. Ramos V, Morais J, Castelo-Branco R, Pinheiro Â, Martins J et al. Cyanobacterial diversity held in microbial biological resource centers as a biotechnological asset: the case study of the newly established LEGE culture collection. J Appl Phycol 2018;30(3):1437-1451.
50. Dittmann E, Gugger M, Sivonen K, Fewer DP. Natural Product Biosynthetic Diversity and Comparative Genomics of the Cyanobacteria. Trends Microbiol 2015;23(10):642-652.
51. D'Agostino PM, Woodhouse JN, Makower AK, Yeung AC, Ongley SE et al. Advances in genomics, transcriptomics and proteomics of toxin-producing cyanobacteria. Environ Microbiol Rep 2016;8(1):3-13.
52. Calteau A, Fewer DP, Latifi A, Coursin T, Laurent T et al. Phylum-wide comparative genomics unravel the diversity of secondary metabolism in Cyanobacteria. BMC Genomics 2014;15(1):977.
53. Baran R, Ivanova NN, Jose N, Garcia-Pichel F, Kyrpides NC et al. Functional genomics of novel secondary metabolites from diverse cyanobacteria using untargeted metabolomics. Mar Drugs 2013;11(10):3617-3631.
54. Alvarenga DO, Fiore MF, Varani AM. A Metagenomic Approach to Cyanobacterial Genomics. Front Microbiol 2017;8:809-809.
55. Beck C, Knoop H, Axmann IM, Steuer R. The diversity of cyanobacterial metabolism: genome analysis of multiple phototrophic microorganisms. BMC Genomics 2012;13(1):56.
56. Okino T, Matsuda H, Murakami M, Yamaguchi K. Microginin, an angiotensin-converting enzyme inhibitor from the blue-green alga Microcystis aeruginosa. Tetrahedron Lett 1993;34(3):501-504.
57. Voráčová K, Hájek J, Mareš J, Urajová P, Kuzma M et al. The cyanobacterial metabolite nocuolin A is a natural oxadiazine that triggers apoptosis in human cancer cells. PLOS ONE 2017;12(3):e0172850.
58. Zallot R, Oberg NO, Gerlt JA. ‘Democratized’ genomic enzymology web tools for functional assignment. Curr Opin Chem Biol 2018;47:77-85.
59. Reis JPA, Figueiredo SAC, Sousa ML, Leão PN. BrtB is an O-alkylating enzyme that generates fatty acid-bartoloside esters. Nat Commun 2020;11(1):1458-1458.
60. Liu Y, Klet RC, Hupp JT, Farha O. Probing the correlations between the defects in metal-organic frameworks and their catalytic activity by an epoxide ring-opening reaction. Chem Commun (Camb) 2016;52(50):7806-7809.
61. Mitchell AJ, Dunham NP, Bergman JA, Wang B, Zhu Q et al. Structure-Guided Reprogramming of a Hydroxylase To Halogenate Its Small Molecule Substrate. Biochemistry 2017;56(3):441-444.
10_1101-2021_01_06_425392 ----

SARS-CoV-2 RBD in vitro evolution follows contagious mutation spread, yet generates an able infection inhibitor

Jiří Zahradník1, Shir Marciano1, Maya Shemesh1, Eyal Zoler1, Jeanne Chiaravalli2, Björn Meyer3, Orly Dym4, Nadav Elad5 and Gideon Schreiber1,6

1 Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
2 Chemogenomic and Biological Screening Core Facility, Institut Pasteur, 75724 Paris, France
3 Viral Populations and Pathogenesis Unit, CNRS UMR 3569, Institut Pasteur, 75724 Paris, France
4 Department of Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 7610001, Israel
5 Department of Chemical Research Support, Weizmann Institute of Science, Rehovot 7610001, Israel
6 Corresponding author: gideon.schreiber@weizmann.ac.il

Short Title: RBD in vitro evolution

Abstract

SARS-CoV-2 is constantly evolving, with more contagious mutations spreading rapidly. Using in vitro evolution to affinity-mature the receptor-binding domain (RBD) of the spike protein towards ACE2 resulted in the more contagious mutations S477N, E484K, and N501Y being among the first selected. This includes the British and South African variants. Plotting the binding affinity to ACE2 of all RBD mutations against their incidence in the population shows a strong correlation between the two. Further in vitro evolution, enhancing binding by 600-fold, provides guidelines towards potentially new evolving mutations with even higher infectivity, for example Q498R in combination with N501Y. This said, the high-affinity RBD is also an efficient drug, inhibiting SARS-CoV-2 infection.
The 2.9 Å cryo-EM structure of the high-affinity complex, including all rapidly spreading mutations, provides a structural basis for future drug and vaccine development and for in silico evaluation of known antibodies.

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.06.425392; this version posted January 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

SARS-CoV-2, which causes COVID-19, resulted in an epidemic of global reach. It infects people through inhalation of viral particles, whether airborne or in droplets, or through contact with contaminated surfaces. Structural and functional studies have shown that a single receptor-binding domain (RBD) of the SARS-CoV-2 homotrimer spike glycoprotein interacts with ACE2, which serves as its receptor (1, 2). Its binding and subsequent cleavage by the host protease TMPRSS2 result in the fusion between cell and viral membranes and cell entry (1). Blocking the ACE2 receptor with specific antibodies prevents viral entry (1, 3, 4). In vitro binding measurements have shown that the SARS-CoV-2 S-protein binds ACE2 with ~10 nM affinity, which is about 10-fold tighter than the binding of the SARS-CoV S-protein (2, 3, 5). It has been suggested that the higher affinity of SARS-CoV-2 is, at least partially, responsible for its higher infectivity (6). Recently evolved SARS-CoV-2 mutations in the spike protein's RBD have further strengthened this hypothesis. The “British” mutation (N501Y; variant B.1.1.7) was suggested from deep sequencing mutation analysis to enhance binding to ACE2 (6). The “South African” variant (501.V2), which includes three altered residues in the ACE2 binding site (K417N, E484K, and N501Y), is spreading extremely rapidly, becoming the dominant lineage in the Eastern Cape and Western Cape Provinces (7).
Another variant that seems to enhance SARS-CoV-2 infectivity is S477N, which became dominant in many regions (8). ACE2 and TMPRSS2 are expressed in lung, trachea, and nasal tissue (9, 10). The inhaled virus likely binds to epithelial cells in the nasal cavity and starts replicating. The virus propagates and migrates down the respiratory tract along the conducting airways, and a more robust innate immune response is triggered, which in some cases leads to severe disease. Recently, a number of efficient vaccines, based on presenting the spike protein or on administering an inactivated virus, were approved for clinical use (11). Still, because protection is less than 100%, particularly for high-risk populations, and because the virus is continuously mutating, the development of drugs should continue. Potential therapeutic targets for blocking viral entry into cells include molecules blocking the spike protein, the TMPRSS2 protease, or the ACE2 receptor (12). Most prominently, multiple high-affinity neutralizing antibodies have been developed (13). As alternatives to antibodies, soluble forms of the ACE2 protein (14) and engineered parts or mimics have also been shown to work (15, 16). TMPRSS2 inhibitors had been developed previously and are being repurposed for COVID-19 (1). The development of molecules blocking the ACE2 protein did not receive as much attention as the other targets. One potential caveat with this approach is the importance of the ACE2 activity in humans, which could be hampered by an inhibitor. ACE2 functions as a carboxypeptidase, removing a single C-terminal amino acid from Ang II to generate Ang-(1-7), which is important in blood pressure regulation.
In addition, ACE2 is fused to a collectrin-like domain, regulating amino acid transport and pancreatic insulin secretion (17, 18). Through these processes, ACE2 also appears to regulate inflammation, and its downregulation is associated with increased COVID-19 severity. Dalbavancin is one drug that has been shown to block the spike protein–ACE2 interaction, although with low affinity (~130 nM) (19). Notably, the RBD itself can be used as a competitive inhibitor of the ACE2 receptor binding site. However, for this to work, its affinity has to be significantly optimized, to reach pM affinity. We have recently developed an enhanced strategy for yeast display, based on C- and N-terminal fusions of extremely bright fluorescent reporters that can monitor expression at minute levels, allowing selection to proceed down to pM bait concentrations (20). Here, we demonstrate how this enhanced method allowed us to reach pM affinity between a mutant RBD and ACE2, based on multiple steps of selection that combine enhanced binding with increased RBD protein thermostability. Fig. 1 shows the selection process step by step. We took advantage of two different detection strategies, eUnaG2 and DnbALFA, and eliminated the DNA purification step, which can be tedious (20, 21) (Fig. 1, steps 2 and 4). Preceding library construction, we tested varied sizes of the RBD for optimal surface expression (Table S1), and decided to continue using RBDcon2 for selection and RBDcon3 for protein expression (Supplementary Material Text).

RBD domain affinity maturation recapitulates multiple steps in the virus evolution

Fig. 1 Enhanced yeast display benefits over the traditional method. The use of enhanced yeast display enables elimination of DNA purification procedures between libraries (step II) and exclusion of the antibody-based expression labeling procedure (step VI), and the bright reporters eUnaG2 (orange points, step 4) or DnbALFA (green points, step 4) allow for ultra-tight binding selection, with reduced background and increased sensitivity in a reduced time frame (20).

Fig. 2 In vitro evolution of Spike protein RBD using yeast display and the emergence of mutations in SARS-CoV-2 over time. (A) An overview of mutations identified during the yeast display affinity maturation process. The red and grey colored amino acids are dominant (>50 %) or minor (<50 %) at a given position. Red and orange backgrounds highlight the mutations emerging both in clinical samples and in yeast display, with a high and low impact on binding affinity, respectively. The bottom of the table shows the naturally evolved mutations at the same positions. (B) The relation between inferred affinity changes and occurrence. Red for prevalent mutations, black for others, and empty squares for occurrence <5 sequences (6, 22). Blue dots are values from the binding titration curves shown in (C), for mutations that were also selected by yeast display affinity maturation. (D) Affinity changes and occurrence in the population (as in (B)) for different mutations at positions 477, 484 and 501. (E) Binding titration curves for the best binding variant in each successive yeast library, bound to ACE2-CF640R at the given concentration. Binding of additional clones from each library is shown in Fig. S3. (F, G) Octet RED96 System binding sensorgrams for RBD-WT (F) and RBD-62 (G).
Multiple consecutive libraries were constructed: S (stability-enhancing), B (ACE2 binding), and FA (fast association). The whole RBD library (S1, nucleotides -152 – 621) was constructed by random mutagenesis, introducing 1-5 mutations per clone. The best expressing clones were selected after expression at 30 °C. Subsequently, library S2 was selected after expression at 37 °C. The most significant mutation, which dominated the second library, was I358F, which fits nicely inside the hydrophobic pocket formed in the RBD (Fig. S1). This mutation nearly doubled the fluorescence signal intensity and was used for the construction of the subsequent affinity selection library (B3). The B3 library was constructed by three-component homologous recombination to preferentially incorporate mutations in the binding interface area. The random mutagenesis was limited to nucleotides 260 – 621. The library was expressed at 37 °C to keep the pressure on protein stability, and selected by FACS against decreasing concentrations of ACE2 labeled with CF®640R succinimidyl ester (1000, 800, and 600 pM; 4 h of incubation). To isolate a low number of RBD variants with the strongest phenotypic effect, library enrichment was done by selecting the top 3% of binding cells and, in subsequent rounds, the top 0.1 – 1% of yeast cells (Fig. S2). Plasmid DNA was isolated from the growing selected yeasts of the sorted library and used for E. coli cell transformation and the preparation of a new library (B4). This approach enriched the subsequent library with multiple selected mutations and enabled the screening of a wider sequence space and of cooperative mutations, as multiple trajectories are sampled. Thirty single colony isolates (SCI) of transformed bacteria were used for sequencing to monitor the enrichment process and subsequently for binding affinity screening (Fig. S3).
The library B4 was selected with the same scheme using 600, 400, and 200 pM ACE2 receptor. Analysis of the selected B3 library yielded two dominant mutations appearing in >70 % of clones: E484K and N501Y. In addition, multiple minor mutations were found: V483E, N481Y, I468T, S477N, N448S, and F490S (Fig. 2A). The analysis of library B4 showed the absolute domination of E484K and N501Y. Besides the dominant clones, the N460K, Q498R, and S477N mutations rose to frequencies >20 %, and new low-frequency mutations were identified: G446R, I468V, T478S, F490Y, and S494P. To validate our results, we chose clones with different mutation profiles, expressed them in Expi293F™ cells, and subjected them to further analyses (Figs. 2A, C, E, S4, Table 1 and Table S2). We noted that among the mutations selected and fixed in the yeast population during these initial steps of affinity maturation were three mutations that strongly emerged in clinical samples of SARS-CoV-2: S477N, E484K, and N501Y (6, 7, 8). It was already shown that an increase in RBD binding affinity increases pseudovirus entry (6). To validate the relation between binding affinity and occurrence of specific mutations in the population, we combined our data with those obtained by deep mutational scanning of the RBD (6) and with the GISAID database (22). Fig. 2A shows the mutations selected by us and the evolving, circulating SARS-CoV-2 variants at the same positions. The most common mutations emerging in both (S477N, E484K, and N501Y) are shown in red.
In addition, the yeast selection probed the most abundant naturally occurring variants in positions 490 and 493, which were lost during further rounds of yeast selection. S494P, which also occurs in nature but did not spread rapidly (Fig. 2B), was selected in round 4. Further analysis of this clone (Table 1, compare RBD-52 to RBD-521) shows that it increases the thermostability but decreases the association rate constant of the RBD to ACE2. Finally, some mutations were found in SARS-CoV-2 (albeit at low frequency, Fig. 2A and B) and not in the yeast selection (445, 446, 478, and 498). To evaluate why some mutations were prevalent in SARS-CoV-2 and in yeast display selection while others were not, we plotted the occurrence of all mutations in the GISAID database (22) against the apparent change in the RBD-ACE2 binding affinity (KD,app), as estimated from the frequency of given amino acids within a mutant library at the given concentration (the so-called deep mutational scanning approach (6)). Fig. 2B (red and black dots) shows that the more prevalent mutations have a higher binding affinity. To quantify these results, we measured the binding of re-cloned isogenic variants of the most prevalent mutations (Fig. 2C). The KD values calculated here are shown as blue dots in Fig. 2B. The highest binding affinity was measured for the South-African variant (E484K, N501Y), which is the tightest binding clone of library B3 (Fig. 2C and Table S2), followed by the "British" (N501Y) and the European emerging S477N mutations (Figs. 2B, C and Table 1). The KD of the South-African variant is 126 pM, that of the British variant is 455 pM, and for S477N a KD of 710 pM was measured (compared to 1.6 nM for the WT). The affinity data measured here show an even stronger relation between binding affinity and spread in the population.
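The fold changes implied by these KD values can be checked with a few lines of Python. The sketch below is illustrative only: the KD numbers are the ones quoted above, whereas the temperature (298.15 K) used to convert each ratio into a binding free-energy difference, ΔΔG = RT ln(KD,variant / KD,WT), and all function names are assumptions introduced here and are not part of the original analysis.

```python
import math

# KD values in pM, as quoted in the text for the re-cloned isogenic variants.
KD_PM = {
    "WT": 1600.0,
    "S477N": 710.0,
    "N501Y (British)": 455.0,
    "E484K+N501Y (South-African)": 126.0,
}

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15    # ASSUMPTION: temperature is not stated in the text


def fold_tighter(kd_variant_pm, kd_wt_pm):
    """How many times tighter the variant binds than WT (ratio of KDs)."""
    return kd_wt_pm / kd_variant_pm


def ddg_kcal(kd_variant_pm, kd_wt_pm, temp_k=TEMP_K):
    """Binding free-energy change relative to WT; negative means tighter."""
    return R_KCAL * temp_k * math.log(kd_variant_pm / kd_wt_pm)


def summarize(kd_table=KD_PM, reference="WT"):
    """Return (name, fold_tighter, ddG) tuples, tightest binder first."""
    kd_wt = kd_table[reference]
    rows = [
        (name, fold_tighter(kd, kd_wt), ddg_kcal(kd, kd_wt))
        for name, kd in kd_table.items()
        if name != reference
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, fold, ddg in summarize():
        print(f"{name:30s} {fold:5.1f}-fold tighter, ddG = {ddg:+.2f} kcal/mol")
```

Under these assumptions the E484K, N501Y combination comes out roughly 13-fold tighter than WT, N501Y alone about 3.5-fold, and S477N a little over 2-fold.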
To further test the lack of randomness in the selection of these mutations, we compared, for each of these three residues, the occurrence of every amino-acid substitution in the population with its apparent binding affinity. Fig. 2D shows that indeed, in all cases (except E484R), the most abundant variant in the population has the highest binding affinity at the given position. With respect to E484R, the mutation of Glu to Arg requires two nucleotide changes in the same codon, making this mutation reachable only by multiple rounds of random mutagenesis, which will delay its occurrence and may explain its low frequency (although it will not stop its spread). Next, we monitored whether the spread of mutations in the population also relates to the protein stability of the RBD. Here, we used the level of yeast surface expression as a proxy to estimate protein stability (20). Fig. S5 shows that mutations whose occurrence is increasing in the population did not affect protein stability, corroborating that maintaining protein stability is an important evolutionary constraint. The most abundant naturally occurring mutations in the RBD have been selected by yeast display already in the first affinity maturation library (B3) (7, 8, 23). Next, we aimed to explore whether much higher affinity binding can be achieved.

Exploring the affinity limits for ACE2-RBD interaction

A further selection of better binders can demonstrate the future path of SARS-CoV-2 evolution. In parallel, an ultra-tight binder can be used as an effective ACE2 blocker for inhibiting SARS-CoV-2 infection. We used the same approach as for B3 and B4 and created the subsequent library B5.
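The codon argument made above for E484R (two nucleotide changes within one codon, versus one for E484K) can be reproduced from the standard genetic code alone. The following is a minimal sketch, not the authors' analysis: the helper names are hypothetical, and when no source codon is supplied the function simply reports the minimum over all synonymous codons of the starting amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a, b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def codons_for(aa):
    """All codons encoding the one-letter amino acid `aa`."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def min_changes(source_aa, target_aa, source_codon=None):
    """Minimum nucleotide substitutions needed to turn source_aa into target_aa.

    If `source_codon` is given it is used as the starting codon; otherwise the
    minimum over all codons of `source_aa` is returned.
    """
    if source_codon is not None:
        if CODON_TABLE.get(source_codon) != source_aa:
            raise ValueError(f"{source_codon} does not encode {source_aa}")
        sources = [source_codon]
    else:
        sources = codons_for(source_aa)
    return min(hamming(s, t) for s in sources for t in codons_for(target_aa))


if __name__ == "__main__":
    print("E -> K:", min_changes("E", "K"))  # single-nucleotide change
    print("E -> R:", min_changes("E", "R"))  # needs two changes in one codon
```

For some substitutions the answer depends on which synonymous codon is actually present (for example, Thr to Met is one change from ACG but two from the other Thr codons), which is why the starting codon can be passed in explicitly when it is known.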
The library B5 was enriched by using 200 pM ACE2 as bait, followed by 50 pM, and finally 30 pM. Sorting with less than 100 pM bait was done after overnight incubation in 50 ml of solution to prevent the ligand depletion effect (as the number of ACE2 molecules becomes much lower than the number of RBD molecules). Round 5 resulted in the fixation of mutations N460K, E484K, Q498R, and N501Y in all sequenced clones. Mutations S477N and S494P were present with frequencies >20 %. Additional mutations identified were G446R, I468V, and F490Y. Representative clones with different mutational profiles were subjected to detailed analyses (Table 1). In the next selection step, we targeted faster association rates by using pre-equilibrium selection (24). The new library FA (fast association) was created by randomization of the whole RBD gene population from the enriched B5 library. The library was pre-selected with 30 pM ACE2 for 8 h (equilibrium is reached after overnight incubation), followed by 1 h and 30 min incubation before selection. This resulted in the accumulation of additional mutations (V445K, I468T, T470M) and also in the fixation of the previously observed mutation S477N in all sequenced clones. Four minor mutations, N354E, K417F, V367W, and S494P, were each identified in only a single sequence. One should note that V445K and T470M require two nucleotide mutations to be reached, demonstrating the efficiency of using multiple rounds of library creation on top of previous libraries (and not single clones). Interestingly, these mutations were not located at the binding interface but rather at its periphery, which is in line with previously described computational fast association design, where periphery mutations were central (25). From the FA library, we determined the isogenic binding for 5 different clones, with clone RBD-62 being the best.

Fig. 3 Cryo-EM structure of the ACE2-RBD-62 complex at 2.9 Å resolution. (A) The cryo-EM electron density map with ACE2 (cyan), RBD-62 RBM (magenta), and RBD core (pink). (B) Cartoon representation of the ACE2-RBD-62 model with eight mutations resolved in the electron density map (orange). (C) The S477N, Q498R and N501Y mutations depicted in the RBM (orange spheres), interacting with S19, Q42 and K353 of ACE2, respectively (cyan spheres), are situated at the two extremes of the RBD-ACE2 interface and are suggested to stabilize the complex. (D) The interaction network formed between RBD-62 mutations and ACE2. RBD-WT residues are in white (heteroatom coloring scheme). (E) Electrostatic complementarity between RBD and ACE2 is strengthened in RBD-62 by positive charges at positions N460K, E484K, and Q498R. The black line on ACE2 indicates the RBD binding site.

Yeast display titration showed an affinity of 2.5 ± 0.2 pM (Fig. 2E and Table 1). The other clones tested from the FA library had affinities between 5 and 10 pM (Fig. S3 and Table S2). The ACE2 receptor and clones RBD-52, RBD-521, and RBD-62 were expressed and purified (Fig. S4). Measuring the binding affinity to ACE2 using the Octet RED96 System showed a systematically lower binding affinity in comparison to yeast titration (Table 1). For WT, the measured KD rose from 1.6 nM (yeast titration) to 38 nM (Octet), and for RBD-62 from 2.5 to 60 pM. However, the improvement in affinity is similar for both methods (~600-fold).
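As a rough consistency check on the ~600-fold figure, the sketch below recomputes the WT to RBD-62 improvement from the Table 1 values for both methods. It also derives koff as KD × kon for the Octet data, which assumes simple 1:1 binding; these koff values are computed here for illustration and are not reported in the source. All names in the code are hypothetical.

```python
# Values taken from Table 1: KD in pM, kon in units of 1e5 M^-1 s^-1.
TABLE1 = {
    "RBD-WT": {"kd_yeast_pm": 1600.0, "kd_octet_pm": 38000.0, "kon_e5": 1.7},
    "RBD-62": {"kd_yeast_pm": 2.5, "kd_octet_pm": 60.0, "kon_e5": 13.0},
}


def koff_per_s(kd_pm, kon_e5):
    """Dissociation rate constant from KD = koff / kon.

    ASSUMPTION: a simple 1:1 binding model; koff is derived, not measured.
    """
    kd_molar = kd_pm * 1e-12
    kon = kon_e5 * 1e5
    return kd_molar * kon


def improvement(wt, mutant):
    """Fold improvements of the mutant over WT for each reported quantity."""
    koff_wt = koff_per_s(wt["kd_octet_pm"], wt["kon_e5"])
    koff_mut = koff_per_s(mutant["kd_octet_pm"], mutant["kon_e5"])
    return {
        "fold_yeast": wt["kd_yeast_pm"] / mutant["kd_yeast_pm"],
        "fold_octet": wt["kd_octet_pm"] / mutant["kd_octet_pm"],
        "fold_kon": mutant["kon_e5"] / wt["kon_e5"],  # faster association
        "fold_koff": koff_wt / koff_mut,              # slower dissociation
    }


if __name__ == "__main__":
    result = improvement(TABLE1["RBD-WT"], TABLE1["RBD-62"])
    for key, value in result.items():
        print(f"{key:11s} {value:7.1f}")
```

With the Table 1 values, both methods give an improvement of a little over 600-fold; under the 1:1 assumption the derived koff falls by roughly 80-fold while kon rises by about 8-fold.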
While most of the improvement came from reduced koff (Fig. 2F and G), kon increased 8-fold, from 1.7 × 10^5 to 13 × 10^5 M^-1 s^-1 for RBD-62 (Table 1). In addition, RBD-62 is 4 °C more stable than WT, probably due to the introduction of the I358F stabilizing mutation (Fig. S1). To further increase the RBD-62 affinity, we prepared a site-directed mutational library on top of RBD-62, including the 15 mutations suggested from deep mutational scanning (6) that require more than one nucleotide change to be reached (Fig. S6). Surprisingly, these mutations did not significantly increase the affinity of RBD-62 as they did for wild-type (6). Yet, a combination of three of them stabilized RBD-62 by 5 °C, creating RBD-71, but at the cost of decreased binding affinity (Table 1, Fig. S6). This demonstrates the limitation of using single amino-acid changes from deep mutational scanning to obtain high-affinity binders.

Table 1. Biophysical parameters of the mutant clones selected by yeast display. For more details see Table S2.

Clone | Library | Plasmid (a) | Mutations | Tm (b) [°C] | Yeast display (c) KD,app (pM) | Octet RED (d) KD (pM) | kon (e) (×10^5 M^-1 s^-1)
RBD-WT | - | pJYDC1 | AA 336 – 528 | 53.5 | 1600 ± 200 | 38000 ± 10000 | 1.7 ± 0.06
RBD-32 | B3 | pJYDC1 | I358F, S477N, N501Y | ND | 936 ± 60 | ND |
RBD-33 | B3 | pJYDC1 | I358F, E484K, N501Y | ND | 126 ± 1.5 | ND |
RBD-36 | B3 | pJYDC1 | I358F, I468T, N481Y, N501Y | 54.6 | 184.4 ± 1.9 | ND |
RBD-48 | B4 | pJYDC3 | I358F, S477N, Q498R, N501Y | | 65.3 ± 2.1 | ND |
RBD-52 | B5 | pJYDC1 | I358F, N460K, E484K, S494P, Q498R, N501Y | 61.9 | 59 ± 6.2 | 3000 ± 1000 | 0.52 ± 0.07
RBD-521 | B5 | pJYDC1 | I358F, N460K, E484K, Q498R, N501Y | 58.2 | 11.96 ± 2.4 | 340 ± 40 | 5.7 ± 0.03
RBD-55 | B5 | pJYDC1 | I358F, E484K, Q498R, N501Y | 54.8 | 18.5 ± 0.6 | ND |
RBD-62 | FA | pJYDC3 | I358F, V445K, N460K, I468T, T470M, S477N, E484K, Q498R, N501Y | 57.9 | 2.5 ± 0.2 | 60 ± 16 | 13 ± 1
RBD-71 | - | pJYDC3 | I358F, V367W, R408D, K417V, V445K, N460K, I468T, T470M, S477N, E484K, Q498R, N501Y | 63 | 8.5 ± 1.5 | 200 ± 100 | 16 ± 1

(a) The pJYDC1 plasmid uses the intrinsic eUnaG2 reporter; the pJYDC3 plasmid contains the DnbALFA reporter; see (20).
(b) Melting temperature as measured by the differential scanning fluorimeter Tycho NT.6 (NanoTemper Technologies GmbH).
(c) KD values measured between yeast surface-exposed RBD variants and the monomeric extracellular portion of the ACE2 receptor.
(d, e) Measured by the Octet RED96 system (ForteBio) using AR2G biosensors. For details see Materials and Methods.
ND, not determined.

RBD-62-ACE2 structure

We determined the cryo-EM structure of the N-terminal peptidase domain of the ACE2 receptor (G17-Y613) bound to RBD-62 (T333-K528) (Fig. 3A), including nine mutations (I358F, V445K, N460K, I468T, T470M, S477N, E484K, Q498R, N501Y; Table 1, Figs. 2A and 3B). Structure comparison of the ACE2-RBD-62 complex and the WT complex (PDB ID: 6M0J) revealed their overall similarity, with an rmsd of 0.97 Å across 586 amino acids of ACE2 and 0.66 Å among 143 amino acids of the RBD (Fig. S10A). Three segments, R357-S371 (β2, α2), G381-V395 (α3), and F515-H534 (β11), are disordered in RBD-62 and thus not visible in the electron density map (Fig. S9 and blue cartoon in Fig. S10B). These segments are situated opposite to the ACE2 binding interface and are therefore not stabilized and rigidified by ACE2 contacts. All mutations, except I358F, are present in the electron density map. Details of cryo-sample preparation, data acquisition, and structural determination are given in the Supplementary Materials Methods. The cryo-EM data collection and refinement statistics are summarized in Figs. S7, S8, S9, and Table S3.
Mutations V445K, N460K, I468T, T470M, S477N, E484K, Q498R, and N501Y are part of the receptor-binding motif (RBM) that interacts directly with ACE2 (orange spheres in Fig. 3B and C) (3). The RBM, comprising residues S438-Q506, shows the most pronounced conformational differences in comparison to RBD-WT (the black circle in Fig. S10A). Out of the eight mutations in the RBM, four involve intramolecular interactions that stabilize the RBD-62 structure, including hydrogen contacts between K460 and D420, T468 and R466, and M470 and Y351. The mutations S477N, Q498R, and N501Y form new contacts with ACE2. The Arg at position 498 makes a salt bridge to Q42 and a hydrogen contact to Y41 of ACE2, forming, together with mutation N501Y (Y has contact with K353), a strong network of new interactions that supports the impact of these two residues (Fig. 3D). Calculating the electrostatic potential of RBD-62 in comparison to RBD-WT shows a much more positive surface for the former, which is complementary to the negatively charged RBD binding surface on ACE2 (Fig. 3E). In addition, the mutated residue N477 interacts with S19 of ACE2 (Fig. 3C). Interestingly, the ACE2-RBD interface involves the interaction of amino acid residues from the N-terminal segment Q24-Q42, K353, and D355 of the ACE2 domain with residues from the RBM of the RBD. The S477N, Q498R, and N501Y mutations in RBD-62 are situated at the two extremes of the RBD-ACE2 interface, therefore stabilizing the complex (Fig. 3C).
RBD-62 inhibits SARS-CoV-2 infection without affecting ACE2 enzymatic activity

The main driver of this study was to generate a tight inhibitor of ACE2 for medicinal purposes, to be administered to the nose and lungs through inhalation. Therefore, we had to verify that the evolved RBD does not interfere with the ACE2 enzymatic activity, which is important in the Renin-Angiotensin-Aldosterone system (17, 18). We assayed the impact of RBD-WT and RBD-62 proteins on ACE2 activity. Neither the in vitro assay nor assays done on various cells expressing ACE2 showed much difference in ACE2 activity with and without RBD-WT or RBD-62 added (Figs. 4A, S5). Finally, we explored the inhibition of viral entry by RBD-WT and RBD-62. Initially, we used Lentivirus pseudotyped with the spike protein variant SΔC19 (26). This spike variant lacks the last 19 amino acids, which are responsible for its retention in the endoplasmic reticulum. The relative cellular entry was analyzed by flow cytometry of the GFP signal promoted by Lentivirus infection. HEK-293T cells stably expressing hACE2 were pre-incubated with serial dilutions of the two RBDs for 1 h, and then the pseudovirus was added for 48 h. Results in Fig. 4B show that the EC50 was reduced from 88 nM for RBD-WT to 5.1 nM for RBD-62. Next, RBD-WT and RBD-62 were evaluated for their potency in inhibiting SARS-CoV-2 infection of VeroE6 cells (Fig. 4C). Similar to the pseudovirus, here too the EC50 was reduced, from 90 nM for RBD-WT to 6.8 nM for RBD-62. More significantly, RBD-62 blocked >99% of viral entry and replication, while RBD-WT blocked only ~75% of viral replication. The complete blockage of viral replication using a low nM concentration of RBD-62 makes it a promising drug candidate.

Fig. 4 Effect of RBD-WT and RBD-62 on ACE2 activity and their potential to inhibit viral entry and infection. (A) ACE2 activity (in vitro or on cells) assayed using the SensoLyte® 390 ACE2 Activity Assay Kit. Fluorogenic peptide cleavage by ACE2 was measured at 10-second intervals over 30 minutes. The activity rate is indicated by the slope of the plot [product/time]. An ACE2 inhibitor (Inh.), provided with the kit, was used as the negative control. Upper panel: ACE2 activity measured in vitro after the addition of 100 nM of RBD-WT or RBD-62 to purified ACE2. Bottom panel: ACE2 activity measured following incubation with RBD-WT or RBD-62 on HeLa cells transiently transfected with human full-length ACE2. (B) Inhibition of infection of HEK-293T cells stably expressing ACE2 by Lentivirus pseudotyped with SARS-CoV-2 spike protein. (C) Inhibition of SARS-CoV-2 infection by RBD-WT and RBD-62 proteins.

Discussion

The SARS-CoV-2 pandemic is an ongoing event, with the virus constantly acquiring new mutations. Intriguingly, the naturally selected mutations S477N, E484K, and N501Y of the Spike protein RBD, which show higher infectivity, were selected by yeast surface display affinity maturation already in the first round, giving rise to the South-African (E484K, N501Y) and British (N501Y) variants, which bind ACE2 13- and 3.5-fold tighter than RBD-WT, respectively. Three additional rounds of yeast display selection resulted in 600-fold tighter binding in comparison to RBD-WT. The selection process took advantage of combinatorial selection, without compromising protein stability. The high-affinity binder, RBD-62, was evaluated as a potential drug and was shown to efficiently block ACE2 without affecting its important enzymatic activity.
While natural virus selection is not as efficient as in vitro selection, the information gained on the more critical mutations can be used as a tool to identify emerging mutations. We hypothesize that E484R will continue to spread and will become more dominant, especially in combination with N501Y. In contrast, we do not expect the rapid spread of S494P. Importantly, the mutation Q498R appeared in the library B4 after the incorporation of Tyr at position 501. This combination dramatically improved the affinity, to below 100 pM, as shown by the difference between RBD-32 and RBD-44 (Table 1). Notably, the wild-type RBD codon at position 498 is CAA, allowing for a direct change to the arginine codon CGA. R498 has not yet been sampled by the virus (Fig. 2A), but its appearance should be carefully monitored. Moreover, R498 is located in a hypervariable location of the RBD (Fig. S11), which makes its appearance more plausible. We successfully solved the cryo-EM structure of RBD-62 to high resolution. The structure shows that RBD-62 has much improved electrostatic complementarity with ACE2 in relation to RBD-WT (Fig. 3). This can be attributed to the use of the Fast Association protocol. The structure contains many of the currently evolving mutations (S477N, E484K, N501Y) and can serve not only as a valuable source of information but also as a "crystal ball" to predict future virus evolution steps. To evaluate the effect of mutations in the RBD on antibody binding, we manually inspected 92 antibody-RBD (nanobody, Spike) structures for clashes. Of these, 28 antibodies bind outside the RBM, and for 8 the interactions are similar with RBD-62 and RBD-WT.
However, for 56 antibodies, a decrease in the number of contacts was observed, and in 9 cases major clashes with RBD-62 are observed (Fig. S12). Notably, E484R and Q498R caused most of the observed effects. These findings suggest the need for close monitoring of the efficiency of drugs and vaccines against current and future mutations. An intriguing question is whether the spreading of the tighter binding SARS-CoV-2 variants in humans is accidental. From the similarity to yeast display selection, where stringent conditions are used, one may hypothesize that stringent selection is also driving the rapid spread of these mutations. Face masks of low quality (which are by far the most abundant) would provide such selection conditions, as they reduce exhaled viral titers, giving tighter binding variants an advantage over WT to spread rapidly in the population (as a result of R0 of mutated viruses being >1, while <1 for WT viruses). This should be urgently investigated, as one may consider the mandatory use of higher quality face masks, which will reduce the viral titer to below infection levels (as indeed seen with medical personnel who use such masks) and stop the spread of these tighter binding virus mutations.

Acknowledgments

Funding: This research was supported by the Israel Science Foundation (grants No. 3814/19 and 1268/18) within the KillCorona – Curbing Coronavirus Research Program and by the Ben B. and Joyce E. Eisenberg Foundation.

Authors contribution: J.Z. and G.S. conceived the project; J.Z., S.M., M.S., E.Z., J.C., B.M. and G.S. performed experiments; N.E. prepared cryo-EM samples, built atomic models, and refined structures with O.D.; J.Z., N.E., O.D. and G.S. wrote the manuscript.

Competing interests: The authors J.Z. and G.S. declare the US Provisional Patent Application No. 63/125,984 (Yeda Ref.: 2020-091).
Data and materials availability: maps and atomic coordinates have been deposited in the Protein Data Bank (www.rcsb.org) and the (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 8, 2021. ; https://doi.org/10.1101/2021.01.06.425392doi: bioRxiv preprint https://doi.org/10.1101/2021.01.06.425392 14 Electron Microscopy Data Bank (www.ebi.ac.uk/pdbe/emdb with accession codes: XXX, XXX, respectively. Supplementary Materials Materials and Methods Supplementary text Table S1 – S3 Figs. S1 – S12 References (27 – 41) References 1. M. Hoffmann et al., SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell 181, 271-280.e278 (2020). 2. J. Lan et al., Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature 581, 215-220 (2020). 3. D. Wrapp et al., Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260-1263 (2020). 4. W. Tai et al., Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine. Cellular & Molecular Immunology 17, 613-620 (2020). 5. A. C. Walls et al., Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell 181, 281-292.e286 (2020). 6. T. N. Starr et al., Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell 182, 1295-1310.e1220 (2020). 7. H. Tegally et al., Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. medRxiv, 2020.2012.2021.20248640 (2020). 8. J. Chen, R. Wang, M. Wang, G.-W. Wei, Mutations Strengthened SARS-CoV-2 Infectivity. Journal of Molecular Biology 432, 5212-5226 (2020). 9. S. 
Lukassen et al., SARS-CoV-2 receptor ACE2 and TMPRSS2 are primarily expressed in bronchial transient secretory cells. EMBO J 39, e105114 (2020).
10. C. G. K. Ziegler et al., SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues. Cell 181, 1016-1035.e1019 (2020).
11. L. Dai, G. F. Gao, Viral targets for vaccines against COVID-19. Nature reviews. Immunology, 1-10 (2020).
12. S. H. Nile et al., COVID-19: Pathogenesis, cytokine storm and therapeutic potential of interferons. Cytokine & Growth Factor Reviews 53, 66-70 (2020).
13. C. O. Barnes et al., SARS-CoV-2 neutralizing antibody structures inform therapeutic strategies. Nature 588, 682-687 (2020).
14. T. M. Abd El-Aziz, A. Al-Sabi, J. D. Stockand, Human recombinant soluble ACE2 (hrsACE2) shows promise for treating severe COVID-19. Signal Transduction and Targeted Therapy 5, 258 (2020).
15. D. Schütz et al., Peptide and peptide-based inhibitors of SARS-CoV-2 entry. Adv Drug Deliv Rev 167, 47-65 (2020).
16. L. Cao et al., De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science 370, 426-431 (2020).
17. D. d. F. Lelis, D. F. d. Freitas, A. S. Machado, T. S. Crespo, S. H. S. Santos, Angiotensin-(1-7), Adipokines and Inflammation. Metabolism 95, 36-45 (2019).
18. H. Zhang, J. M. Penninger, Y. Li, N. Zhong, A. S. Slutsky, Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor: molecular mechanisms and potential therapeutic target. Intensive Care Medicine 46, 586-590 (2020).
19. G.
Wang et al., Dalbavancin binds ACE2 to block its interaction with SARS-CoV-2 spike protein and is effective in inhibiting SARS-CoV-2 infection in animal models. Cell Research, (2020).
20. J. Zahradník, D. Dey, S. Marciano, G. Schreiber, An enhanced yeast display platform demonstrates the binding plasticity under various selection pressures. bioRxiv, 2020.2012.2016.423176 (2020).
21. G. Chao et al., Isolating and engineering human antibodies using yeast surface display. Nature Protocols 1, 755-768 (2006).
22. S. Elbe, G. Buckland-Merrett, Data, disease and diplomacy: GISAID's innovative contribution to global health. Global Challenges 1, 33-46 (2017).
23. S. Kemp et al., Recurrent emergence and transmission of a SARS-CoV-2 Spike deletion ΔH69/ΔV70. bioRxiv, 2020.2012.2014.422555 (2020).
24. R. Cohen-Khait, G. Schreiber, Selecting for Fast Protein–Protein Association As Demonstrated on a Random TEM1 Yeast Library Binding BLIP. Biochemistry 57, 4644-4650 (2018).
25. T. Selzer, S. Albeck, G. Schreiber, Rational design of faster associating and tighter binding protein complexes. Nature structural biology 7, 537-541 (2000).
26. H. Cohen-Dvashi et al., Coronacept – a potent immunoadhesin against SARS-CoV-2. bioRxiv, 2020.2008.2012.247940 (2020).
27. L. Benatuil, J. M. Perez, J. Belk, C.-M. Hsieh, An improved yeast transformation method for the generation of very large human antibody libraries. Protein Engineering, Design and Selection 23, 155-159 (2010).
28. A. R. Aricescu, W. Lu, E. Y. Jones, A time- and cost-efficient system for high-level protein production in mammalian cells. Acta crystallographica. Section D, Biological crystallography 62, 1243-1250 (2006).
29. Y. Peleg, T. Unger, Application of the Restriction-Free (RF) cloning for multicomponents assembly. Methods in molecular biology (Clifton, N.J.) 1116, 73-87 (2014).
30. D. S. Wilson, A. D. Keefe, Random mutagenesis by PCR. Current protocols in molecular biology Chapter 8, Unit8.3 (2001).
31. R. D.
Gietz, Yeast transformation by the LiAc/SS carrier DNA/PEG method. Methods in molecular biology (Clifton, N.J.) 1163, 33-44 (2014).
32. D. N. Mastronarde, Automated electron microscope tomography using robust prediction of specimen movements. Journal of structural biology 152, 36-51 (2005).
33. A. Punjani, J. L. Rubinstein, D. J. Fleet, M. A. Brubaker, cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nature Methods 14, 290-296 (2017).
34. A. Punjani, H. Zhang, D. J. Fleet, Non-uniform refinement: adaptive regularization improves single-particle cryo-EM reconstruction. Nature Methods 17, 1214-1221 (2020).
35. A. Punjani, D. J. Fleet, 3D Variability Analysis: Resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM. bioRxiv, 2020.2004.2008.032466 (2020).
36. P. D. Adams et al., PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta crystallographica. Section D, Biological crystallography 66, 213-221 (2010).
37. B. P. Klaholz, Deriving and refining atomic models in crystallography and cryo-EM: the latest Phenix tools to facilitate structure analysis. Acta crystallographica. Section D, Structural biology 75, 878-881 (2019).
38. P. Emsley, K. Cowtan, Coot: model-building tools for molecular graphics. Acta crystallographica. Section D, Biological crystallography 60, 2126-2132 (2004).
39. V. B. Chen et al., MolProbity: all-atom structure validation for macromolecular crystallography. Acta crystallographica. Section D, Biological crystallography 66, 12-21 (2010).
40. E. F. Pettersen et al., UCSF Chimera—A visualization system for exploratory research and analysis.
Journal of Computational Chemistry 25, 1605-1612 (2004).
41. F. Amanat et al., A serological assay to detect SARS-CoV-2 seroconversion in humans. Nature Medicine 26, 1033-1036 (2020).

Supplementary Materials

Materials and methods

Cloning and DNA manipulations
The RBD domain variants (see Table S1) were PCR amplified (KAPA HiFi HotStart ReadyMix, Roche, Switzerland) from the codon-optimized SARS-CoV-2 Spike protein gene (Sino Biological, SARS-CoV-2 (2019-nCoV) Cat: VG40589-UT, GenBank: QHD43416.1) using appropriate primers. Amplicons were purified using the NucleoSpin® Gel and PCR Clean-up kit (Macherey-Nagel, Germany) and eluted in DDW. Yeast surface display plasmids pJYDC1 (Addgene ID: 162458) and pJYDC3 (162460) were cleaved by NdeI and BamHI (NEB, USA) restriction enzymes, purified, and tested for non-cleaved plasmids via transformation into E. coli Cloni® 10G cells (Lucigen, USA). Each amplicon was mixed with cleaved plasmid in the ratio 4 µg insert : 1 µg plasmid per construct, electroporated into S. cerevisiae EBY100 (27), and selected by growth on SD-W plates. Cloning of the ACE2 extracellular domain (AA G17-Y613) gene and RBDs into the vector pHL-sec (28) was done in two steps. Initially, the RBD gene was inserted into the helper vector pCA by restriction-free cloning (29). pCA is a pHL-sec derivative lacking 862 bp in the GC-rich region (nt 672 - 1534). In the second step, the correctly inserted RBDs, verified by sequencing, were cleaved together with their flanking sequences using restriction enzymes XbaI and XhoI (NEB, USA) and ligated (T4 DNA ligase, NEB, USA) into the cleaved full-length plasmid pHL-sec.
Site-directed mutagenesis of RBDs was performed by a restriction-free cloning procedure (29). Megaprimers were amplified by KAPA HiFi HotStart ReadyMix (Roche, Switzerland), purified with the NucleoSpin™ Gel and PCR Clean-up Kit (Macherey-Nagel, Germany), and subsequently inserted by PCR into the destination vector using high-fidelity Phusion® (NEB, USA) or KAPA polymerases. The parental plasmid molecules were inactivated by DpnI treatment (1 h, NEB, USA) and the crude reaction mixture was transformed into electrocompetent E. coli Cloni® 10G cells (Lucigen, USA). The clones were screened by colony PCR and their correctness was verified by sequencing.

DNA library preparation
SARS-CoV-2 RBD gene (RBD) libraries were prepared by MnCl2 error-prone mutagenesis (30) using Taq Ready-mix (Hylabs, Israel). The mutagenic PCR reactions (50 µl) were supplemented with increasing MnCl2 concentrations: 0.05, 0.1, 0.2, 0.4, 0.6, 0.8 and 1.0 nM. Template DNA concentration ranged between 100 and 400 ng per reaction and 20 – 30 reaction cycles were applied. The amplified DNA was purified, pooled, and used directly for yeast transformation via electroporation. The whole-gene randomization amplicon comprised the RBD and the linker between it and the Aga2p protein (nucleotides -152 – 621, pJYDC1 vector). Libraries B3, B4, and B5 were prepared by homologous recombination of an invariant fragment of RBD with the necessary overlaps (1 – 321) and the mutagenized library fragment (260 – 621). The mutagenic fragments were prepared by the same error-prone PCR procedure (20 cycles).
Yeast transformation, cultivation, and expression procedures
All procedures and our enhanced yeast display platform are described in detail elsewhere (20). Briefly, plasmids were transformed into the EBY100 Saccharomyces cerevisiae strain (27, 31). Single colonies were inoculated into 1.0 ml liquid SD-CAA media (20) and grown overnight at 30 °C (220 rpm). The overnight cultures were spun down (3000 g, 3 min) and the exhausted culture media was removed before dilution in the expression media 1/9 (20) to OD ~ 1. The expression cultures were grown at 20, 30, or 37 °C for 8 – 24 h at 220 rpm, depending on the experimental setup. Co-cultivation labeling during expression was achieved by the addition of 1 nM DMSO-solubilized bilirubin (pJYDC1, eUnaG2 reporter holo-form formation, green/yellow fluorescence (Ex. 498 nm, Em. 527 nm)) or 5 nM ALFA-tagged mNeonGreen (pJYDC3, DnbALFA). Aliquots of cells (100 µl) were collected by centrifugation (3000 g, 3 min), resuspended in ice-cold PBSB buffer (PBS with 1 g/L BSA), passed through a cell strainer nylon membrane (40 µm, SPL Life Sciences, Korea), and analyzed.

Binding assays and affinity determination using yeast surface display
Aliquots of yeast-expressed and labeled cells ready for flow-cytometry analysis were resuspended in analysis solution with a series of labeled ACE2 concentrations. The concentration range of CF®640R succinimidyl ester-labeled (Biotium, USA) ACE2 extracellular domain (AA Q18 – S740) depended on the protein analyzed (0.1 pM – 50 nM). The analysis solution volume (1 – 100 ml) was adjusted to avoid ligand depletion greater than 10%, and the incubation time (1 h – 12 h, 5 rpm, 4 °C) was adjusted to reach equilibrium (21). After the incubation, samples were collected (3000 g, 3 min), resuspended in 200 µl of ice-cold PBSB buffer, passed through a cell strainer, and analyzed.
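As a rough illustration of the ligand-depletion criterion above, the following sketch estimates the worst-case fraction of labeled ACE2 that yeast-displayed RBD could sequester, and the smallest incubation volume that keeps it at or below 10%. The cell number and the number of displayed RBD copies per cell are hypothetical placeholders, not values from this study, and the function names are illustrative only.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def worst_case_depletion(ligand_molar, volume_l, n_cells, copies_per_cell):
    """Upper bound on the fraction of ligand bound.

    Assumes every displayed RBD molecule captures one ACE2 molecule,
    which overestimates depletion at sub-saturating concentrations.
    """
    receptor_mol = n_cells * copies_per_cell / AVOGADRO
    ligand_mol = ligand_molar * volume_l
    return min(1.0, receptor_mol / ligand_mol)


def min_volume_l(ligand_molar, n_cells, copies_per_cell, max_depletion=0.10):
    """Smallest volume (litres) keeping worst-case depletion <= max_depletion."""
    receptor_mol = n_cells * copies_per_cell / AVOGADRO
    return receptor_mol / (max_depletion * ligand_molar)


if __name__ == "__main__":
    # Hypothetical inputs: 1e4 cells, 5e4 displayed copies per cell.
    for conc in (1e-13, 1e-11, 5e-8):  # 0.1 pM, 10 pM, 50 nM
        vol_ml = 1e3 * min_volume_l(conc, 1e4, 5e4)
        print(f"{conc:.1e} M -> at least {vol_ml:.3g} ml")
```

With these placeholder inputs, the lowest concentration (0.1 pM) requires roughly 83 ml, which shows why the volume has to grow as the ligand concentration falls.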
The expression and binding signals were determined by flow cytometry using a BD Accuri™ C6 Flow Cytometer (BD Biosciences, USA). The cell analysis and sorting were done on an S3e Cell Sorter (BioRad, USA). The analysis was done by single-cell event gating (Fig. S2): the green fluorescence channel (FL1-A) was used to detect RBD expression-positive cells (RBD+) via eUnaG2 or DnbALFA, and the far-red fluorescence channel (FL4-A) recorded CF®640R-labeled ACE2 binding signals (CF640+). The eUnaG2 signals were automatically compensated by the ProSort™ Software using the pJYDNp positive control plasmid (Addgene ID 162451 (20)). The mean FL4-A fluorescence signal values of RBD+ cells, after subtraction of the RBD- signal, were used for determination of the binding constant KD. The standard non-cooperative Hill equation was fitted by nonlinear least-squares regression using Python 3.7. The total concentration of yeast-exposed protein was fitted together with two additional parameters describing the given titration curve (6).

Production and purification of RBD and ACE2 proteins
The extracellular part of ACE2 (Q18 – S740) and RBD protein variants (Table S1) were produced in Expi293F cells (ThermoFisher). Pure DNA was transfected using the ExpiFectamine 293 Transfection Kit (ThermoFisher) according to the manufacturer's protocol. 72 hours post-transfection, the cells were centrifuged at 1500 rpm for 15 minutes. The supernatant was filtered using a 0.45 µm Nalgene filter (ThermoFisher) and the pellet was discarded. The filtered supernatant was loaded onto a 5 ml HisTrap Fast Flow column (Cytiva (GE, USA), cat 17-5255-01). An ÄKTA pure system (Cytiva, USA) was used to purify the protein.
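The affinity fit described above under "Binding assays and affinity determination using yeast surface display" could be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' script. It fits a non-cooperative Hill curve with a baseline and a maximum, omits the fitted total concentration of yeast-exposed protein, and uses a grid scan with golden-section refinement in place of whichever least-squares optimizer was actually used. All function names and the synthetic titration data are hypothetical.

```python
import math


def hill(conc, kd, f_min, f_max):
    """Non-cooperative (n = 1) Hill binding curve."""
    return f_min + (f_max - f_min) * conc / (kd + conc)


def _linear_fit(concs, signals, kd):
    """For a fixed KD the model is linear in f_min and the amplitude,
    so both are obtained by ordinary least squares in closed form."""
    xs = [c / (kd + c) for c in concs]  # fractional occupancy
    n = len(xs)
    mx = sum(xs) / n
    my = sum(signals) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, signals))
    amp = sxy / sxx
    f_min = my - amp * mx
    sse = sum((f_min + amp * x - y) ** 2 for x, y in zip(xs, signals))
    return sse, f_min, f_min + amp


def fit_kd(concs, signals, log_lo=-14.0, log_hi=-6.0, steps=400):
    """Least-squares estimate of KD (molar), f_min and f_max."""

    def sse_at(log_kd):
        return _linear_fit(concs, signals, 10.0 ** log_kd)[0]

    # Coarse scan over log10(KD) to locate the best grid point.
    step = (log_hi - log_lo) / steps
    grid = [log_lo + i * step for i in range(steps + 1)]
    best = min(grid, key=sse_at)

    # Golden-section refinement in the bracket around that point.
    a, b = best - step, best + step
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse_at(c) < sse_at(d):
            b = d
        else:
            a = c
    kd = 10.0 ** ((a + b) / 2.0)
    _, f_min, f_max = _linear_fit(concs, signals, kd)
    return kd, f_min, f_max


if __name__ == "__main__":
    # Synthetic, noise-free titration generated with KD = 1 nM.
    concs = [1e-12, 1e-11, 1e-10, 3e-10, 1e-9, 3e-9, 1e-8, 5e-8]
    signals = [hill(c, 1e-9, 200.0, 12000.0) for c in concs]
    kd, f_min, f_max = fit_kd(concs, signals)
    print(f"KD = {kd:.3e} M, f_min = {f_min:.1f}, f_max = {f_max:.1f}")
```

Separating the linear parameters from KD keeps the search one-dimensional, which makes the sketch robust without an external optimizer. A real analysis would also need to handle measurement noise and the depletion term mentioned in the text.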
The column was washed in 25 mM Tris, 200 mM NaCl, 20 mM imidazole; the protein was then eluted by gradient elution with elution buffer containing 25 mM Tris, 200 mM NaCl, 1 M imidazole. Buffer exchange to PBS and concentration of the protein were done using Amicon® filters (Merck Millipore Ltd, cat: UFC900324).

Cryo-Electron Microscopy
Sample preparation: 2.5 µl of ACE2-RBD-62 complex at 3.5 mg/ml concentration was transferred to glow-discharged UltrAuFoil R 1.2/1.3 300 mesh grids (Quantifoil), blotted for 2.5 seconds at 4 °C, 100% humidity, and plunge frozen in liquid ethane cooled by liquid nitrogen using a Vitrobot plunger (Thermo Fisher Scientific).

Cryo-EM image acquisition: Cryo-EM data were collected on a Titan Krios G3i transmission electron microscope (Thermo Fisher Scientific) operated at 300 kV. Movies were recorded on a K3 direct detector (Gatan) installed behind a BioQuantum energy filter (Gatan), using a slit of 20 eV. Movies were recorded in counting mode at a nominal magnification of 165,000x, corresponding to a physical pixel size of 0.53 Å. The dose rate was set to 16.2 e-/pixel/sec, and the total exposure time was 1.214 sec, resulting in an accumulated dose of 70 e-/Å2. Each movie was split into 57 frames of 0.021 sec. The nominal defocus range was -0.7 to -1.1 µm; however, the actual defocus range was larger. Imaging was done using an automated low-dose procedure implemented in SerialEM (32). A single image was collected from the center of each hole using image shift to navigate within hole arrays and stage shift to move between arrays.
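The acquisition parameters quoted above are mutually consistent, which can be checked with a short calculation. The sketch below converts the dose rate per pixel into an accumulated dose per Å2 and derives the per-frame exposure. The function name is illustrative, and the numerical inputs are the ones stated in the text.

```python
def accumulated_dose(dose_rate_e_px_s, exposure_s, pixel_size_a):
    """Total dose in e-/A^2 from a dose rate given in e-/pixel/s.

    One pixel covers pixel_size_a ** 2 square angstroms, so the
    per-pixel dose is divided by that area.
    """
    return dose_rate_e_px_s * exposure_s / pixel_size_a ** 2


N_FRAMES = 57
dose = accumulated_dose(16.2, 1.214, 0.53)  # e-/A^2, should be close to 70
frame_time = 1.214 / N_FRAMES               # seconds per frame
dose_per_frame = dose / N_FRAMES            # e-/A^2 per frame

if __name__ == "__main__":
    print(f"accumulated dose: {dose:.2f} e-/A^2")
    print(f"frame time:       {frame_time:.4f} s")
    print(f"dose per frame:   {dose_per_frame:.3f} e-/A^2")
```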
The ‘Multiple Record Setup’ together with the ‘Multiple Hole Combiner’ dialogs were used to map hole arrays of up to 3x3 holes. Beam tilt was adjusted to achieve coma-free alignment when applying image shift.

Cryo-EM image processing: Image processing was performed using CryoSPARC software v3.0.1 (33). The processing scheme is outlined in Fig. S7. A total of 4470 acquired movies were subjected to patch motion correction, followed by patch CTF estimation. Of these, 3357 micrographs having CTF fit resolution better than 5 Å and relative ice thickness lower than 1.07 were selected for further processing. Initial particle picking was done using the ‘Blob Picker’ job on a subset of 100 micrographs. Extracted particles were iteratively classified in 2D and their class averages were used as templates for automated particle picking from all selected micrographs, resulting in 2,419,995 picked particles. Particles were extracted, binned 6x6 (60-pixel box size, 3.18 Å/pixel), and cleaned by multiple rounds of 2D classification, resulting in 1,649,355 particles. These particles were used for ab initio 3D reconstruction with 5 classes. Out of the 5 classes only one, containing 552,575 particles, refined to high resolution. Two additional classes may show ACE2 in a closed conformation (containing 249,841 and 503,670 particles); however, they did not refine, partially because of preferred orientation. The 3D class containing 552,575 particles was refined as follows: particles were re-extracted only from micrographs with defocus lower than 1.7 µm, binned 2x2, and subjected to homogeneous refinement (355,891 particles, 200-pixel box size, 1.06 Å/pixel). The particles were then sub-classified into 2 classes, and particles from the higher-resolution class were re-extracted without binning in 680-pixel boxes and subjected to per-particle motion correction, followed by non-uniform refinement (34) with per-particle defocus optimization. The final map, at a resolution of 2.9 Å (Fig.
S8), was sharpened with a B-factor of -83 before atomic model building. In the final map, the RBD is only partially resolved at the region distal from the ACE2 interface. To better understand the reason for the missing density, we subjected the particles from the well-refined 3D class (355,891 particles) to variability analysis (35), with a binary mask imposed on the RBD region (Fig. S9). Classification into 5 distinct classes based on 3 eigenvectors revealed variable density at the RBD distal region, which could not be modeled reliably. The cryo-EM data collection process and refinement statistics are summarized and visualized in Figs. S7, S8, S9, and Table S3.

Model building: The atomic model of ACE2-RBD-62 was solved by docking the homologous refined structure of the SARS-CoV-2 spike receptor-binding domain bound with ACE2 (PDB ID 6M0J) into the cryo-EM maps, using the Dock-in-Map program in PHENIX (36). All steps of atomic refinement were carried out with real-space refinement in PHENIX (37). The model was built into the cryo-EM map using the COOT program (38). The ACE2-RBD-62 model was evaluated with the MolProbity program (39). The ACE2 (G17-Y613) contains one zinc ion linked to H374, H378, and E402 and three N-acetyl-β-glucosaminide (NAG) glycans linked to N53, N90, and N546. In the RBD-62 structure (T333-K528), three fragments, R357-S371 (β2, α2), G381-V395 (α3), and F515-H534 (β11), are disordered and thus not visible in the electron density map. Details of the refinement statistics of the ACE2-RBD-62 structure are described in Table S3.
3D visualization and analyses were performed using UCSF Chimera (40) and PyMOL (Schrödinger, Inc.; 2.4.0).

Analysis of RBD circulating virus variants
All amino acid substitutions in the RBD (116) were downloaded from the GISAID database (23 December 2020) (22) with the corresponding numbers of sequences and regions and plotted against the binding (ΔLog10(KD,App)) or expression (ΔLog10MFI) extracted from the RBD deep mutational scanning dataset (6). We gratefully acknowledge all GISAID contributors and Starr et al. for sharing their data.

Octet RED binding analysis
The Octet RED96 System (fortéBIO, Pall Corp., USA) was used for real-time binding determination. Briefly, 10 µg/ml of ACE2 diluted in 10 mM NaAcetate pH 5.5 was immobilized on an amine-reactive 2G biosensor using the standard procedure. The purified RBD was diluted in a sample buffer (PBS + 0.1% BSA + 0.02% Tween20). Analyte concentrations, association, and dissociation times were adjusted per sample. Data Analysis v10 software (fortéBIO, Pall Corp., USA) was used for data fitting, with the mathematical model assuming a simple 1:1 stoichiometry.

Pseudo-virus production and inhibition of infection by RBD
Pseudo-virus production: SARS-CoV-2-Spike pseudotyped Lentivirus was produced by co-transfection of HEK-293T cells with pCMV ΔR8.2, pGIPZ-GFP (26), and pCMV3 SΔC19 at a ratio of 1:1:1. 24 hours before the transfection, 1 × 10^6 cells were seeded into a 10 cm plate. On the day of the transfection, cells were washed with Dulbecco's Modified Eagle's Medium (DMEM) (Gibco 11965092) and 5 ml of Opti-MEM (Gibco 11058021) was added to the plate.
10 µg of plasmid mix was transfected using Lipofectamine 2000 transfection reagent (Thermo Fisher 11668027) according to the manufacturer’s instructions. After 4 hours, the media was replaced by 9 ml of fresh media. The supernatant was harvested 72 h post-transfection, centrifuged (1000 g, 5 min), and filtered to remove all residual debris (Millex-HV Syringe Filter Unit, 0.45 µm).

RBD inhibition assay: HEK-293T cells stably expressing hACE2 (GenScript M00770) were seeded into a 24-well plate at an initial density of 6 × 10^4 cells per well. The following day, cells were pre-incubated with serial dilutions of RBDs (1 h) and then the pseudotyped Lentivirus was added. After 24 h, the cell medium was replaced with fresh DMEM, and cells were grown for an additional 24 h. After this procedure, cells were harvested and the GFP signal was analyzed by flow cytometry (BD Accuri™ C6 Plus Flow Cytometer, BD Biosciences, USA).

Inhibition of SARS-CoV-2 infection
The strain 2019-nCoV/IDF0372/2020 was supplied by the National Reference Centre for Respiratory Viruses hosted by Institut Pasteur (Paris, France) and headed by Dr. Sylvie van der Werf. The human sample from which strain 2019-nCoV/IDF0372/2020 was isolated was provided by Dr. X. Lescure and Pr. Y. Yazdanpanah from the Bichat Hospital. The experiments were done by Institut Pasteur. VeroE6 (C1008) cells were grown in DMEM with 10% serum and 1% penicillin to 50% confluence in 384-well format and incubated with RBDs at the given concentration for 2 hrs before 0.1 MOI of SARS-CoV-2 was added for one hour. The inoculum was subsequently removed and a medium with the RBD was added. After 48 hrs of incubation, the supernatant was recovered and viral load was measured using RT-PCR with forward primer: TAATCAGACAAGGAACTGATTA, reverse primer: CGAAGGTGTGACTTCCATG. In
parallel, cell viability was assessed after 48 hrs of incubation using the CellTiter Glo kit from Promega. Raw data were normalized against appropriate negative and positive controls and are expressed as the fraction of virus inhibition. The curve fit was performed using the variable Hill slope model of the four-parameter logistic curve:

Response = Baseline + (Max – Baseline) / (1 + 10^((logEC50 – log(C)) × Hill))

ACE2 activity assay
Human ACE2 activity was evaluated using the SensoLyte® 390 ACE2 Activity Assay Kit (ANASPEC; cat# 72086) according to the manufacturer's protocol, with the following changes: the assay was performed in 384-well plates with a ratio of 1:5 of the recommended volume of buffer, substrate, and inhibitor. The activity was measured on either purified ACE2 (0.75 ng; Abcam, ab151852) or on the following cell lines: HeLa transiently transfected with ACE2 (6000 cells per assay), HEK-293T stably transfected with ACE2 (GenScript M00770, 8000 cells per assay), and Caco2 cells (40,000 cells per assay). To assess the effect of RBD on ACE2 activity, 10 nM WT RBD or RBD-B62 was added before activity measurement. The activity rate is indicated by the slope of the plot [product/time].

Supplementary text

Optimizing the RBD domain length for yeast display and protein expression
To optimize the RBD for yeast display, we screened multiple different constructs for yeast surface expression.
RBDs with different starting and termination positions were cloned in a pJYDC1 vector and their impact on expression, stability, and ACE2 binding was determined (Table S1). RBDcon1 was the shortest construct, lacking the last C-terminal loop of the RBD domain (516 – 528) and including one unpaired cysteine. This resulted in poor expression and binding. RBDcon2 and RBDcon3 included this loop, resulting in domain stabilization and an increase in both binding and expression. Although the RBDcon4 construct (41) demonstrated high expression yields both in yeast and Expi293F™ cells, as well as good thermo-stability, we decided not to use it in yeast display since one unpaired cysteine (C538) is close to its C-terminus and the construct contains part of the neighboring domain. We continued with the RBDcon2 and RBDcon3 constructs for yeast display and protein expression in Expi293F™ cells, respectively.

Supplementary tables

Table S1 – Comparison of different RBD domains for yeast display and protein expression.

Construct | Position (a) | Number of AA | Size [kDa] | Yeast expression [mean FL1 × 10^3] (b) | Yeast display estimated KD [nM] (c) | Melting temperature [°C]
RBDcon1 | 336-516 | 181 | 20.5 | 16.4 | 3.2 ± 0.1 | -
RBDcon2 | 336-528 | 193 | 21.7 | 32.9 | 1.6 ± 0.3 | 53.5 ± 0.3
RBDcon2b | 333-528 | 196 | 22.1 | 37.9 | 1.1 ± 0.3 | -
RBDcon3 | 330-528 | 199 | 22.3 | 38.2 | 1.0 ± 0.1 | -
RBDcon4 | 319-541 | 223 | 25.1 | 67.6 | 1.2 ± 0.2 | 53.8 ± 0.2

(a) Numbers are according to UniProtKB P0DTC2.
(b) Measured in pJYDC1 (eUnaG2 fluorescence signal).
(c) Binding affinity against ACE2 was determined by FACS, with the relevant construct expressed on the yeast surface.

Table S2 Analysis of mutant clones selected by yeast display.
* - The yeast display affinity was determined using 4 different concentrations of ACE2 (scr – see Fig. S3) or by full titration curve (full).

Table S3: Cryo-EM data collection and refinement statistics of ACE2-RBD62

Data collection
EM equipment | Titan Krios (Thermo Fisher Scientific)
Voltage (kV) | 300
Detector | K3 (Gatan)
Energy filter | BioQuantum (Gatan), 20 eV slit
Pixel size (Å) | 0.53
Electron dose (e-/Å2) | 70
Defocus range (µm) | -0.4 to -1.7
Number of collected micrographs | 4,470
Number of selected micrographs | 2,535

3D Reconstruction
Software | CryoSPARC
Number of used particles | 164,636
Resolution (Å) | 2.9
Symmetry | C1
Map sharpening B factor (Å2) | -83
PDB code | XXX

Refinement
Software | Phenix
Cell dimensions (Å) | 313.056

Model composition
Protein residues | 729
Atoms | 5,873
Sugar | 3
Zn | 1

RMSD
Bond lengths (Å) | 0.007
Bond angles (°) | 0.687

Ramachandran plot statistics (%)
Preferred | 95.12
Allowed | 4.88
Outlier | 0.1

Supplementary Material Figures:

Fig. S1 The I358F mutation, selected by yeast surface display, increases protein stability and expression. A) The position of the I358F mutation (bright yellow) in the RBD structure (PDB ID 6M17) and the neighboring residues within 5 Å distance (pale yellow). B) The residues involved in the formation of the hydrophobic cavity around the I358F mutation, predicted from the X-ray structure.
Additional residues that are involved: K356, R357, S359, V395, Y396. Inset – the wild-type residue (isoleucine in magenta) overlaid with the phenylalanine mutant.

Fig. S2 Gating and selection strategies for in vitro evolution of SARS-CoV-2 RBD domain. A, B) Gating strategy for FACS sorting. In the first step, yeast cells are isolated by their FSC-A and SSC-A properties (A). In the second step (B), single cells are isolated by their FSC properties (area and height) on the diagonal plot. The green area represents the gated region. C) Selection strategy for affinity maturation. The library was titrated with a range of ACE2 concentrations to select the concentration with limited signal (inset 1). Under such conditions, the tighter binding clones gain the highest advantage over the parental population. Using less stringent selection (insets 2 – 4) reduces the advantage of the tighter binders. Using too low concentrations of ACE2 protein will also result in loss of selectivity. D) Affinity maturation library after 3 sorts, where the separation between the parental and tighter binding populations is well defined. The top 0.1 – 0.3 % of cells were sorted (green region). E) Fast association selection strategy. The library was incubated with a constant concentration (30 pM) of ACE2 for different times. The time with minimal signal was determined and used for the selection of clones with faster association. The same shape of the sorting region as in (D) was applied.
Fig. S3 Evaluating the binding affinity of 5 individual clones from libraries B3 and FA. Five single clones from each library were evaluated for binding to ACE2, to determine the range of affinity maturation after FACS selection. Each clone was incubated with four (library B3) or six (FA) different concentrations of ACE2. The binding curve was fitted using additional parameters describing the curve minimum and maximum as determined from the RBD-WT titration curve. Calculated affinities are in Table S2.

Fig. S4 Protein purification of ACE2, RBD-62 and the complex between the two. Both proteins were expressed in Expi293F cells and secreted to cell culture media. A) SDS-PAGE analysis after NiNTA agarose purification of the ACE2 receptor extracellular portion (AA Q18 – S740). B) SDS-PAGE analysis from NiNTA agarose purification of RBD-62 (AA 333-528). C) The ACE2 + RBD-62 complex was purified on a gel filtration chromatography column prior to cryo-EM. ACE2 protein was mixed with an excess of RBD-62 (1:1.5), incubated 1 h on ice, and applied on the chromatography column using the ÄKTA pure FPLC system. The first peak corresponds to the complex (SDS-gel inset) and the second peak represents excess RBD-62.

Fig.
S5 SARS-CoV-2 RBD mutations in the population and their expression level. A) Relation between the impact of mutations on yeast surface expression and their occurrence in the population. Expression was measured as the mean fluorescence intensity (MFI) of the specific clone expressed on the yeast surface by Starr et al. (6) (black and red) or by us (blue, inset). Empty squares and black dots show data with < 5 or > 5 sequences recorded, respectively. The emerging mutations in the population are shown in red. The graph shows that the variance in expression decreases with higher occurrence in the population. B) Relation between the affinity (x-axis), expression (y-axis), and the occurrence in the population: empty squares, < 5 sequences; black dots, > 5 sequences; red dots represent four emerging mutants (all with more than 100 sequences). Based on A and B, rapidly spreading mutations have increased ACE2 binding affinity without compromising protein stability.

Fig. S6 Site-directed mutagenesis of RBD-62 using affinity-enhancing mutations. Twenty-three mutations were predicted to enhance RBD-ACE2 binding (6). These mutations were evaluated for enhancing the affinity of RBD-62 towards ACE2. A) Impact of mutations, introduced on top of RBD-62, on ACE2 binding (y-axis) and yeast surface expression. Three mutations (orange circles), which have the highest impact on expression, were combined in RBD-71 (red triangle). B) Localization of stabilizing (yellow) and binding-enhancing mutations depicted in the RBD structure (PDB ID 6M17; the best rotamer is shown). C) Binding curve of RBD-71 with RBD-62 for comparison.
D) Normalized protein melting curves for RBD-WT, RBD-62, RBD-52, and RBD-71 measured using the Tycho NT.6 (NanoTemper).

Fig. S7 Single-particle cryo-EM processing scheme. The details of the process are described in the Methods section under “Cryo-EM image processing”. The number of particles in each map is indicated under the map’s image, along with the map’s resolution where relevant.

Fig. S8 Resolution estimate and angular distribution for the ACE2-RBD-62 cryo-EM map. (A) Fourier Shell Correlation (FSC) curves. (B) Angular distribution plot. (C) An alpha-helical segment showing the map density and fitted atomic coordinates. (D) Cryo-EM map colored according to local resolution estimate. The inset shows a slice through the RBD-ACE2 interface.

Fig. S9 Variability analysis of the RBD. (A) Particle images from the well-resolved 3D class were subjected to 3D variability analysis. (B) Central slices through the three eigenimages calculated with a binary mask around the RBD region. (C) Five 3D classes, which were calculated based on the eigenimages.
The maps show variable density for the RBD.

Fig. S10 Global comparison between RBD-WT and RBD-62. A) RBD-62 preserves its typical twisted five-stranded antiparallel β sheet (β1, β3-β5, and β10) with an extended insertion containing the short β5-β9 strands, α4, and η3 helices and loops. The biggest differences are pronounced between M470 and F490 (black circle). B) The upper part, composed of three segments, R357-S371 (β2, α2), G381-V395 (α3), and F515-H534 (β11), is not resolved in the electron density map (blue ribbon, added from PDB ID: 6M0J).

Fig. S11 An analysis of conserved positions computed by the ConSurf server, depicted on the RBD-62 structure. The amino acids are colored by their conservation grades, with turquoise-through-maroon indicating variable-through-conserved.

Fig. S12 RBD-62 mutations interfere with binding to multiple antibodies. RBD-62 (magenta) was structurally overlaid with RBD-WT (white).
The S477N, E484K, Q498R, and N501Y mutated RBD residues were analyzed for disruptive contacts/clashes with the corresponding binding antibody/nanobody (green) in relation to RBD-WT. Four examples are shown, A) PDB ID: 6YZ5, B) PDB ID: 7CAN, C) PDB ID: 7JVB, D) PDB ID: 7CHE, in which RBD-62 (but not RBD-WT) forms serious clashes with the second chain. Further experimental evaluation is needed to support our observation.

10_1101-2021_01_06_425465 ---- 74855466

Structural Basis of KAI2 Divergence in Legume

Angelica M. Guercio1, François-Didier Boyer2, Catherine Rameau3, Alexandre de Saint Germain3†, Nitzan Shabek1†

1 Department of Plant Biology, University of California – Davis, Davis, CA 95616
2 Université Paris-Saclay, CNRS, Institut de Chimie des Substances Naturelles, UPR 2301, 91198, Gif-sur-Yvette, France
3 Institut Jean-Pierre Bourgin, INRAE, AgroParisTech, Université Paris-Saclay, 78000, Versailles, France

†Correspondence should be addressed to: nshabek@ucdavis.edu, Alexandre.De-Saint-Germain@inrae.fr

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.06.425465; this version posted January 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract

The α/β hydrolase KARRIKIN INSENSITIVE-2 (KAI2) mediates the perception of smoke-derived butenolides (karrikins) and an elusive endogenous hormone (KAI2-ligand, KL) found in all land plants. It has been suggested that KAI2 gene duplication and sub-functionalization events play an adaptive role for diverse environments by altering the receptor responsiveness to specific KLs.
These diversification occurrences are exemplified by the variable number of functional KAI2 receptors among different plant species. Legumes represent one of the largest families of flowering plants and contain many essential agronomic crops. Along the legume lineage, the KAI2 gene underwent a duplication event resulting in KAI2A and KAI2B. Here we show that the model legume, Pisum sativum (Ps), expresses three distinct KAI2 homologues, two of which, KAI2A and KAI2B, have uniquely sub-functionalized. We biochemically characterize the distinct ligand sensitivities of these divergent receptors and report the first crystal structure of PsKAI2 in apo and butenolide-bound states. Our study provides a comprehensive examination of the specialized ligand binding ability of legume KAI2A and KAI2B and sheds light on the perception and enzymatic mechanism of the KAI2-butenolide complex.

Introduction

Karrikins (KARs) are a family of butenolide small molecules produced from the combustion of vegetation and are a bio-active component of smoke [1–4]. These molecules are capable of inducing germination of numerous species of plants, even those not associated with fire or fire-prone environments, such as Arabidopsis [1,5–9]. Through studies in Arabidopsis, KAR sensitivity was shown to be dependent on three key proteins: the KAR receptor, an α/β hydrolase named KARRIKIN INSENSITIVE2 (KAI2); the F-box protein MORE AXILLARY GROWTH 2 (MAX2), a component of the Skp1-Cullin-F-box (SCF) E3 ubiquitin ligase; and the proposed target of ubiquitination and degradation, the transcriptional corepressor SMAX1/SMXL2 [4,10–13].
An increasing number of studies have shown that KAI2 and KAR signaling components are involved in the regulation of many plant developmental processes, including seedling development, leaf shape, cuticle formation, and root development, and also play roles in AM fungi symbiosis and abiotic stress response [2–4,14–16]. The striking similarities between the KAR and strigolactone (SL) signaling pathways have been the focus of an increasing number of studies. SLs and KARs share a similar butenolide ring structure, but in SLs, instead of the KAR pyran moiety, the butenolide is connected via an enol ether bridge to either a tricyclic lactone (ABC rings) in canonical SLs, or to a structural variety in non-canonical SLs [17,18]. The receptor for SL, DWARF14 (D14), shares a similar α/β hydrolase fold with KAI2 and a parallel signaling cascade requiring the function of the MAX2 ubiquitin ligase and downregulation of SMXLs, corepressors which also share some structural elements with SMAX1/SMXL2 [4,10,13,19]. Unlike KARs, SLs are plant hormones that act endogenously, but were also found to be exuded by plant roots. SLs regulate diverse physiological responses, such as promoting hyphal branching of arbuscular mycorrhizal (AM) fungi to enhance the efficiency of AM symbiosis, stimulating germination of root parasitic plant species, repressing shoot branching, and affecting lateral root formation, primary root growth, root hair elongation, secondary growth in the stem, leaf senescence, and adventitious root formation [20–32]. Notably, KAI2 family receptors have undergone numerous duplication events within various land plant lineages.
D14 was found to have arisen from an ancient duplication of the KAI2 receptor in the seed plant lineage, followed by sub-functionalization of the receptor, making it uniquely implicated in SL signaling [33–36]. The age-old question in receptor diversity has been the evolutionary purpose and functional significance of KAI2 duplication events. It has been shown that D14 and KAI2 are not able to complement each other's functions in planta [37–41]. To this end, within the ligand binding site of KAI2 receptors, the substitution of a few amino acids can alter ligand specificity between KAI2 duplicated copies [39,42]. While the role of the D14 receptor in SL signaling is well established, KAI2 receptors and KAR signaling are less understood. Furthermore, given that KAI2 is ancestral to D14 and that KAR signaling controls diverse developmental processes, including those unrelated to fire, it has been suggested that KAI2s are able to perceive an endogenous ligand(s), which is currently unknown and tentatively named KAI2-Ligand (KL) [33,34,41,43]. Thus far, several crystal structures of KAI2/D14 receptors have been reported and have led to a greater understanding of receptor-ligand perception and the hydrolytic activity of the receptor towards certain ligands [11,12,19,31,36,38,44–51]. The divergence between duplications of KAI2 receptors to confer altered ligand specificity has been partially addressed at the physiological and biochemical level for only a few plant species, and structural examination has been limited [38–40,42]. Legumes represent one of the largest families of flowering plants and contain many essential crops. Beyond their agronomic value, most legume species are unique among plants because of
their ability to fix nitrogen by utilizing symbiosis with rhizobia, in addition to AM fungi symbiosis. Because of the potential functional diversification and specialization of KAI2-ligand, we characterized and examined the KAI2 receptor mechanism in legumes, using Pisum sativum (Ps) as a model. In this study, we examined the implications of the pre-legume KAI2 duplication event that resulted in the legume KAI2A and KAI2B clades [42]. We found that Pisum sativum expresses three distinct KAI2 homologues, two of which, KAI2A and KAI2B, have uniquely sub-functionalized. We biochemically characterize the distinct ligand sensitivities of these divergent receptors and further report the first crystal structure of PsKAI2B in apo and a unique butenolide-bound state at high resolution (1.6 Å and 2.0 Å, respectively). Altogether, our findings provide a comprehensive examination of the specialized ligand binding ability of legume KAI2A and KAI2B and shed light on the perception and enzymatic mechanism of KAI2 receptors.

Results

Genetic identification and characterization of the legume Pisum sativum KAI2 genes

To characterize the karrikin pathway sensing mechanisms in legumes, we examined the evolutionary context of representative legume KAI2s. We focused on the Pisum sativum genome, which encodes distinct KAI2 gene copies and represents the diversity of legume KAI2 duplication events (Figure 1a and Figure S1). Notably, the legume lineage has undergone an independent duplication event resulting in distinct KAI2A and KAI2B protein receptors. We identified three KAI2 homologs in the pea genome [52] that clearly group within the core KAI2 clade by phylogenetic analysis. One (Psat4g083040), renamed PsKAI2B, grouped in the same subclade as the legume KAI2Bs (including Lotus japonicus, Lj, KAI2B [42]) and two (Psat2g169960, termed
PsKAI2A and Psat3g014200) in the same subclade as the legume KAI2As (including LjKAI2A [42]) (Figure 1a and Figure S1). Psat3g014200 is very likely a pseudogene, as the putative encoded protein lacks 82 amino acids (aa) in the middle of the protein in comparison to PsKAI2A and PsKAI2B. By cloning the PsKAI2A coding sequence (CDS), we identified two transcripts for this gene, corresponding to two splicing forms (Figure 1b). The transcript PsKAI2A.1 results from intron splicing and produces a protein of 305 aa. This protein thus shows a C-terminal extension of 33 aa similar to LjKAI2A (Figure S2), which is missing in other KAI2 proteins. The PsKAI2A.2 transcript arises from intron retention and carries a premature STOP codon 2 nucleotides after the end of the first exon. This leads to a 272 aa protein of similar size to other described KAI2 proteins (Figure 1b and Figure S2). From this analysis, it is clear that the KAI2 clade has undergone an independent duplication event in the legume lineage resulting in these KAI2A and KAI2B forms (Figure S1a-b). To examine potential functional divergence between the PsKAI2A and PsKAI2B forms, we first analyzed the aa sequences and identified notable alterations in key residues, many of which are likely to be functional changes, as indicated in later analyses (Figure S2). To further characterize the divergence of these genes, we studied the expression patterns of the two PsKAI2 forms in various tissues of the Pisum plant (Figure 1c-d). Interestingly, the expression of PsKAI2s revealed a ten-fold higher expression of PsKAI2A in comparison to PsKAI2B and distinct patterns between the two forms in the roots, suggesting sub-functionalization between PsKAI2A and PsKAI2B.
PsKAI2 genes can rescue inhibition of hypocotyl elongation of kai2-2 Arabidopsis mutant

To test the function of PsKAI2 proteins in planta, a cross-species complementation was performed by transforming the Arabidopsis kai2-2 mutant with the two splicing forms of PsKAI2A (PsKAI2A.1 and PsKAI2A.2) and with PsKAI2B (Figure 1e). The proteins were expressed as fusion proteins with mCitrine or an HA epitope, driven by the native AtKAI2 promoter (pAtKAI2). The widely described hypocotyl elongation assay [37,43] was performed under low light conditions, which cause an elongated hypocotyl phenotype in the kai2-2 mutant when compared to Ler. All constructs completely restored the phenotype of the kai2-2 mutant to the WT phenotype, except the pAtKAI2::PsKAI2A.1-6xHA construct, which only partially restored it (Figure 1e). Because in Arabidopsis the stereoisomer of the synthetic strigolactone (−)-GR24 may act as a KL mimic compound by triggering developmental responses via AtKAI2 [11,43,53], we investigated hypocotyl elongation through the PsKAI2 proteins by quantifying hypocotyl length after (−)-GR24 treatment. Only the lines expressing the AtKAI2 control protein were able to respond to the treatment, whereas none of the lines complemented with PsKAI2s responded significantly to (−)-GR24 (Figure 1e). These results suggest that PsKAI2 proteins are the functional orthologues of the Arabidopsis KAI2; however, the differences in ligand sensitivity between all expressed KAI2s were more elusive compared to the recently reported study in lotus [42] and as suggested by our subsequent biochemical results.
Biochemical data reveal altered ligand specificity and activity between PsKAI2s

To investigate the functional specificity between Pisum KAI2 receptors, we purified PsKAI2 recombinant proteins and investigated various ligand-interaction and ligand-enzymatic activities of the receptors (Figures 2-3 and Figures S3-S5). We first examined KAI2A and KAI2B ligand interactions via a thermal shift assay (differential scanning fluorimetry, DSF) with various KAI2/D14 family ligands, including the (+)- and (−)-GR24 enantiomers (also known as GR24^5DS and GR24^ent-5DS, respectively [54]) and (+)- and (−)-2’-epi-GR24 (also known as GR24^ent-5DO and GR24^5DO, respectively) (Figure 2a-i). DSF analyses revealed that PsKAI2B shows a greater change in stability in the presence of (−)-GR24 than PsKAI2A, which shows little to no alteration (Figure 2c-d). Thus, the PsKAI2B protein differs from its lotus ortholog, which is not destabilized by (−)-GR24 [42], suggesting different ligand specificities among legumes. The other ligands and enantiomers induce no detectable shift in stability for either PsKAI2 protein. In addition, an extensive interaction screen using intrinsic fluorescence further confirmed that only the (−)-GR24 stereoisomer interacts with PsKAI2 proteins (Figure 2j and Figure S4). The calculated Kd revealed that PsKAI2B has a better affinity for (−)-GR24 (Kd = 89.43 ± 12.13 μM) than PsKAI2A (Kd = 115.40 ± 9.87 μM), as also indicated by the DSF assay.

To further examine the catalytic activity of KAI2 enzymes, an enzymatic assay was performed by quantifying the hydrolytic activity of PsKAI2 towards distinct ligands.
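The two dissociation constants reported above for (−)-GR24 differ by roughly 30%. The following minimal sketch shows what such a difference would mean for receptor occupancy, assuming a simple one-site binding isotherm, θ = [L] / (Kd + [L]). The binding model is an assumption made here for illustration (the fitting model is not stated in this section), and the ligand concentrations are hypothetical; only the two Kd values are taken from the text.

```python
# Fractional occupancy under an ASSUMED one-site binding model:
#   theta = [L] / (Kd + [L])
# Only the Kd values below come from the text; everything else is illustrative.

KD_UM = {"PsKAI2A": 115.40, "PsKAI2B": 89.43}  # reported Kd for (-)-GR24, in µM


def fractional_occupancy(ligand_um: float, kd_um: float) -> float:
    """Return the fraction of receptor bound at a given free ligand concentration."""
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_um / (kd_um + ligand_um)


def occupancy_table(ligand_concs_um):
    """Map each receptor name to its occupancy at each ligand concentration."""
    return {
        name: [fractional_occupancy(c, kd) for c in ligand_concs_um]
        for name, kd in KD_UM.items()
    }


if __name__ == "__main__":
    concs = [10.0, 100.0, 1000.0]  # hypothetical (-)-GR24 concentrations, µM
    for name, row in occupancy_table(concs).items():
        print(name, [f"{theta:.2f}" for theta in row])
```

Under this assumed model, the occupancies of the two receptors stay fairly close at every concentration, which is in line with the modest difference in affinity described above.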
For this assay, KAI2 proteins were incubated with (+)-GR24, (−)-GR24, (+)-2’-epi-GR24, and (−)-2’-epi-GR24 in the presence of 1-indanol as an internal standard, followed by ultra-high-performance liquid chromatography (UHPLC)/UV DAD analysis (Figure 3). The activity of PsKAI2A and PsKAI2B was measured in comparison to AtD14, AtKAI2, and RMS3. These results show that PsKAI2A could only cleave (−)-GR24, whereas PsKAI2B is able to cleave the (+)-GR24, (−)-GR24, and (−)-2’-epi-GR24 stereoisomers. Unlike RMS3, AtD14 and AtKAI2 have no detectable cleavage for (+)-2’-epi-GR24, strongly indicating that PsKAI2s have different stereoselectivity. To further investigate the cleavage kinetics of PsKAI2 proteins, we performed an enzymatic assay with the pro-fluorescent probes that were previously designed for detecting the SL perception mechanism [55]. Here, the (±)-GC240 probe, bearing one methyl group on the D-ring, was used to measure hydrolysis activity by the PsKAI2s, RMS3, AtD14, and AtKAI2 enzymes (Figure S5a-b). As expected, PsKAI2A showed no activity, similar to AtKAI2, as previously reported [55]. Surprisingly, PsKAI2B is able to cleave the (±)-GC240 probe in a similar manner to AtD14 and RMS3. It has been previously demonstrated that probes without a methyl group, such as dYLG, can serve as the hydrolysis substrate for AtKAI2 [53]. To that end, we used the (±)-GC486 probe, bearing no methyl on the D-ring, and notably, PsKAI2B was able to hydrolyze the probe, whereas PsKAI2A shows little to no activity (Figure S5c-d). Furthermore, PsKAI2A and AtKAI2 exhibit a biphasic time course of fluorescence, consisting of an initial phase followed by a plateau phase.
By comparing the kinetics profiles, we noticed that with the PsKAI2B, RMS3, and AtD14 proteins, the plateau is higher (1 µM versus 0.3 µM of DiFMU), even if it takes PsKAI2B longer to reach this plateau (Figure S5c-d). Taken together with the comparative kinetic analysis, PsKAI2B hydrolysis activity is more similar to that of SL receptors, which further highlights its distinct function compared to PsKAI2A.

Structural insights into legume KAI2s divergence

To elucidate the differential ligand selectivity between KAI2A and KAI2B, we first determined the crystal structure of the legume Pisum sativum KAI2B at 1.6 Å resolution (Figure 4 and Table 1). The PsKAI2B structure shares the canonical α/β hydrolase fold and is comprised of base and lid domains (Figure 4a). The core domain contains a seven-stranded mixed β-sheet (β1–β7), five α-helices (αA, αB, αC, αE, and αF), and five 3₁₀ helices (η1, η2, η3, η4, and η5). The helical lid domain (residues 124–195, Figure S2) is positioned between strands β6 and β7 and forms two parallel layers of V-shaped helices (αD1-4) that create a deep pocket area adjoining the conserved catalytic Ser-His-Asp triad site (Figure 4a and Figure S2). Despite the sequence variation (77% similarity between PsKAI2B and AtKAI2, Figure S2), we did not observe major structural rearrangements between PsKAI2B and the previously determined Arabidopsis KAI2 structure [48], as shown by a root-mean-square deviation (RMSD) of 0.35 Å for superposition of backbone atoms (Figure 4b). Nonetheless, further structural comparative analyses identified two unique residue alterations at positions 129 and 147 within the lid domain.
These changes appear to marginally alter the backbone atoms and distinguish the legume KAI2 family from the KAI2s of other species (Figure S2 and Figure 4b). The asparagine residue in position 129 is more variable within legume KAI2s, and the alanine or serine in position 147 has diverged from the bulky polar residues found in other plant KAI2s. These amino acid alterations are likely to play a role in downstream events rather than directly modulate distinct ligand perception. To further determine the differential ligand specificity between PsKAI2A and PsKAI2B, we utilized the PsKAI2B crystal structure reported here to generate a 3D model for PsKAI2A. As expected, the PsKAI2A structure exhibits a similar backbone atom arrangement (RMSD of 0.34 Å) that parallels the PsKAI2B structure (Figure 5a). Nonetheless, we identified eight significantly divergent amino acids between the two structures, including residues involved in forming the ligand binding pocket as well as solvent-exposed surfaces (Figure 5b-d and Figure S6a-b). Because these variants are evolutionarily conserved across legumes, the analysis of these residues not only distinguishes between KAI2A and KAI2B in Pisum but can be extrapolated to all legume KAI2A/B diverged proteins. Structural comparative analysis within the ligand-binding pocket shows divergent solvent accessibility between PsKAI2A and PsKAI2B (Figure 5b). PsKAI2B exhibits a structural arrangement that results in a larger volume of the hydrophobic pocket (125.4 Å³) yet with a smaller entrance circumference (30.3 Å) than PsKAI2A (114.8 Å³ and 33.6 Å, respectively, Figure 5b). Further in silico docking experiments of (−)-GR24 with PsKAI2B resulted in a successful docking of the ligand, which is totally buried in the pocket and positioned in a pre-hydrolysis orientation near the catalytic triad. In contrast,
docking experiments of (−)-GR24 with PsKAI2A result in a more restricted interaction, in which the ligand is partially outside the pocket (Figure S6c). Notably, five key residues are found to directly alter the pocket morphology (Figure 5c and Figure S6a-b). Among these residues, L160/S190/M218 in PsKAI2A and the corresponding residues M160/L190/L218 in PsKAI2B are of particular interest because of their functional implications for the pocket volume and solvent accessibility (Figure 5d). Residue 160 is positioned at the entrance of the ligand-binding pocket in helix αD2; thus the substitution of leucine (L160 in KAI2A) with methionine (M160 in KAI2B) modifies the circumference of the PsKAI2B pocket entrance (Figure 5b-d). While both L160 and M160 are aliphatic non-polar residues, the relatively low hydrophobicity of methionine as well as its higher plasticity are likely to play a major role in modifying the ligand pocket. The conserved legume divergence in residue 190 (S190 in PsKAI2A and L190 in PsKAI2B, Figure 5d) is positioned in helix αD4 and represents a major structural arrangement at the back of the ligand envelope. Because leucine has moderate flexibility compared to serine and much higher local hydrophobicity, this variation largely contributes to the changes in the pocket volume and fine-tunes the available ligand orientations. Further sequence and structural analysis of the variant in position 218 (M218 in KAI2A and L218 in PsKAI2B) placed it in the center of the Asp loop [56] (D-loop, region between β7 and αE, Figure 5c-d).
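The hydrophobicity comparisons made above for positions 160, 190, and 218 can be made concrete with a standard hydropathy scale. The sketch below uses the Kyte-Doolittle indices for leucine, methionine, and serine; the choice of scale is an assumption made here, since the text does not say which measure of hydrophobicity the authors had in mind, and other scales give different absolute numbers. The residue identities at each position are taken from the text.

```python
# Kyte-Doolittle hydropathy indices for the three residue types discussed.
# Using this particular scale is an assumption; the source names no scale.
KD_HYDROPATHY = {"L": 3.8, "M": 1.9, "S": -0.8}

# (position, residue in PsKAI2A, residue in PsKAI2B), as listed in the text.
SUBSTITUTIONS = [(160, "L", "M"), (190, "S", "L"), (218, "M", "L")]


def hydropathy_shift(res_a: str, res_b: str) -> float:
    """Hydropathy of the PsKAI2B residue minus that of the PsKAI2A residue.

    A positive value means the PsKAI2B residue is more hydrophobic on this scale.
    """
    return round(KD_HYDROPATHY[res_b] - KD_HYDROPATHY[res_a], 1)


def summarize(substitutions):
    """Map each position to its A-to-B hydropathy shift."""
    return {pos: hydropathy_shift(a, b) for pos, a, b in substitutions}


if __name__ == "__main__":
    for pos, shift in summarize(SUBSTITUTIONS).items():
        direction = "more" if shift > 0 else "less"
        print(f"position {pos}: PsKAI2B residue is {direction} hydrophobic ({shift:+.1f})")
```

On this scale, methionine at position 160 is less hydrophobic than leucine, and leucine at position 190 is considerably more hydrophobic than serine, consistent with the qualitative statements above.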
In D14, the D-loop has been reported to affect SL perception and cleavage as well as to impact protein-protein interactions in SL signaling [51,56].

PsKAI2B forms a complex with the D-OH of (−)-GR24

To further examine the molecular interaction of PsKAI2B with the enantiomeric GR24, we co-crystallized and solved the structure of PsKAI2B-(−)-GR24 at 2.0 Å resolution (Figure 6a and Table 1). Electron density map analysis of the ligand-binding pocket revealed the existence of a unique ring-shaped occupancy that is contiguously linked to the catalytic serine (S95) (Figure 6a-b). The structural comparison of the backbone atoms between apo-PsKAI2B and PsKAI2B-(−)-GR24 did not reveal significant differences (Figure S7a), in agreement with previously reported apo and ligand-bound D14/KAI2 crystal structures [11,12,19,36,50]. This striking similarity suggests that a major conformational change, if it indeed occurs as suggested for D14 [51], may happen after the nucleophilic attack of the catalytic serine and the (−)-GR24 cleavage, which is likely to be a highly unstable state for crystal lattice formation. Further analysis suggests that 5-hydroxy-3-methylbutenolide (D-OH ring), resulting from the (−)-GR24 cleavage, is trapped in the catalytic site (Figure S7b-d). The lack of a defined electron density fitting the tricyclic lactone (ABC ring) may exclude the presence of the intact GR24 molecule. Other compounds present in the crystallization condition were tested for their ability to occupy the S95-contiguous density, and the D-OH group of (−)-GR24 demonstrated the highest calculated correlation coefficient and the best fit in the PsKAI2B co-crystal structure (Figure S7c).
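Ranking candidate ligands by a correlation coefficient, as described above, amounts to comparing the observed density with the density calculated from each fitted candidate. The sketch below assumes the common real-space form, a Pearson correlation over density values sampled at the same grid points; the source does not state which program or exact formula was used, and the function names and any density values passed in are hypothetical.

```python
import math

# ASSUMPTION: the "correlation coefficient" is a Pearson correlation between
# observed and model-calculated density sampled at matching grid points.


def density_correlation(observed, calculated):
    """Pearson correlation between two equally sized lists of density values."""
    if len(observed) != len(calculated) or len(observed) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    if var_o == 0 or var_c == 0:
        raise ValueError("density samples must not be constant")
    return cov / math.sqrt(var_o * var_c)


def best_candidate(observed, candidates):
    """Return (name, scores) for the candidate whose calculated density fits best."""
    scores = {
        name: density_correlation(observed, calc)
        for name, calc in candidates.items()
    }
    return max(scores, key=scores.get), scores
```

In use, `candidates` would map each tested compound to the density calculated from its fitted coordinates, and the compound with the highest score would be the one retained, subject to visual inspection of the fit.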
Additional tests of D-OH binding, including in silico docking simulations and analyses, revealed a high affinity for D-OH in a specific orientation, in agreement with the structure presented here (Figure S7d). The most probable orientation of the D-OH positions the methyl group (C4’) together with the hydroxyl group of D-OH towards the very bottom/back of the pocket near the catalytic serine, where the O5” atom is coordinated by the N atoms of both F26 and V96 (Figure 6b-c). The hemiacetal group (C2’) of D-OH is oriented towards the access groove of the pocket, with angles (between carbon and oxygen atoms) supporting the captured D-OH in an orientation in which cleavage of the intact (−)-GR24 may have taken place. The C5’ of D-OH appears to form a covalent bond with Oγ of S95 (dark gray line in Figure 6c) and generates a tetrahedral carbon atom. The overall positioning of this molecule is strictly coordinated by the F26, H246, G25, and I193 residues. Remarkably, the electron density around S95 does not display an open D-OH group (2,4,4-trihydroxy-3-methyl-3-butenal, as previously described for OsD14 [19]) that could directly result from the nucleophilic attack event, but rather corresponds to a cyclized D-OH ring linked to S95. This D-OH ring is likely to be formed by water addition to the carbonyl group at C2’ that is generated after cleavage of the enol function and cyclization to re-form the butenolide (Figure 6d). The formation of this adduct could also serve as an intermediate before the transfer to the histidine residue. Taken together, our crystal structure highlights a potential new intermediate in the ligand cleavage mechanism by KAI2 proteins.
Discussion

The emerging characterization of karrikin/KL signaling in non-fire ecology plant receptors has been of great interest in the plant signaling field. While there are many missing pieces in the karrikin signaling puzzle, it is clear that KAI2 serves as the key sensor in this pathway. Furthermore, the coevolution between receptors and ligands in diverse contexts throughout plant evolution is of great interest in many biological fields. The limited natural occurrence of karrikin molecules and the evolutionary conservation of KAI2 receptors throughout land plants suggest that the function of KAI2s is preserved to regulate plant development and responses to stresses by perceiving endogenous ligand(s) (KL). Here, we identified and characterized the first KAI2 receptors in pea (P. sativum) that serve as representatives of the independent duplication event and subsequent sub-functionalization in legumes. The identification of both PsKAI2A and PsKAI2B genes corroborates the recent finding that the KAI2 gene duplication event occurred in Papilionoideae before the diversification of legumes42. Interestingly, similarities in expression patterns are found between pea and lotus, with globally higher expression of the A clade in comparison to the B clade and specific expression in roots of the B clade in comparison to the A clade. Further studies with PsKAI2A/B mutants, for instance on the establishment of symbioses in Pisum roots, could explain this differential expression in roots, as no clear root phenotype has been observed in lotus.
The occurrences of molecular coevolution of ligands and their specialized receptors have been previously demonstrated for phytohormones such as SL57, ABA58, GA59, and more recently, karrikins39,42. Even though the exact identity of KL ligands remains to be revealed, it is likely that the ligands share a common chemical composition with SLs. It has been shown that the synthetic SL analogue, rac-GR24, can function by binding KAI2 in Arabidopsis11,43,53. In this work we carried out a comprehensive biochemical interrogation and found that PsKAI2B can form stronger interactions with the enantiomeric GR24, (−)-GR24, compared to PsKAI2A. Moreover, we found that while both KAI2s are active hydrolases, they have distinct binding affinities and stereoselectivity towards GR24 stereoisomers. These findings indicate, yet again, that sub-functionalization of KAI2s via substitutions in only a few amino acids can greatly alter ligand affinity, binding, enzymatic activity, and probably signaling with downstream partners38,42.

KAI2/D14 crystal structures have greatly impacted our understanding of these receptor ligand-binding pockets and their ability to not only accommodate, but also hydrolyze certain ligands11,12,19,31,36,38,44–51. The first crystal structure of legume PsKAI2B, together with the PsKAI2A homology model reported here, further substantiates the structural basis of this differential ligand selectivity. We identified conserved key amino acid changes that alter the shape of the pocket and confer altered ligand specificities.
These novel atomic structures of KAI2 enabled us to analyze the distinction between key residues L160/S190/M218 in PsKAI2A and the corresponding residues M160/L190/L218 in PsKAI2B. These findings further support recent in planta and biochemical studies demonstrating that residues 160 and 190 are required for differential ligand specificity between lotus KAI2A and KAI2B42. Furthermore, residue 190 was also identified in the parasitic plant Striga hermonthica as being involved in forming differential specificity pockets between the highly variable and functionally distinct ShKAI2s, referred to as HTLs35,36. While the changes in positions 160 and 190 directly reshape the pocket morphology, the variant in position 218 is located in the center of the D-loop56. The D-loop contains the aspartic acid of the catalytic triad (D217) and has been suggested to play an important role in SL perception and cleavage by D14 as well as in downstream protein-protein interactions51,56. Therefore, the conserved substitution between KAI2A and KAI2B at position 218 (M218 and L218, respectively) across legumes not only contributes to ligand selectivity and hydrolysis, but may also affect downstream interaction(s). Based on the analogy with the D14-MAX2 perception mechanism, the KAI2 receptor is likely to adopt different conformational states upon ligand binding and cleavage. As such, the unique residue variations in the lid identified here (between KAI2A and KAI2B, in positions 129 and 147) suggest a sub-functionalization in the receptor regions that are likely to be involved in downstream interactions with MAX2 and/or SMAX1 and/or SMXL2. It therefore remains to be elucidated whether these KAI2A/B distinctive residues play a role in fine-tuning the formation of the protein complex with MAX2-SMAX1/SMXL2.

The crystal structure of ligand-bound PsKAI2B provides a mechanistic view of perception and cleavage by KAI2s.
Based on the crystallization conditions and following a detailed investigation of the electron density, we were able to rule out common chemicals and place the (−)-GR24 D-OH ring with higher relative fitting values than other components. The absence of positive electron density peaks corresponding to the intact (−)-GR24, and thus the presence of only the D-OH, raise the question of whether the S95-D-OH adduct recapitulates a pre- or post-cleavage intermediate state of (−)-GR24. The possibility that the trapped molecule represents a post-cleavage state is intriguing and may provide a new intermediate state where S95 is covalently linked to the cleavage product. As such, the S95-D-OH adduct could explain the single turnover cycle that was observed for KAI2s in this study. Previous studies of the single turnover activity of D14 suggest that a covalent intermediate is formed between the catalytic histidine and serine51. The chemical similarity of the D-OH butenolide ring of karrikin and GR24 suggests that the KL signal may share a parallel structure and perhaps will be biochemically processed via multiple steps and intermediate adducts. Therefore, this study may also point to a similar mechanism for SL perception and cleavage by D14.

Our data in planta clearly demonstrate that PsKAI2A and PsKAI2B genes can replace the AtKAI2 ortholog, yet we were unable to determine KAI2A/B ligand binding specificity by using the KL mimic compound (−)-GR24. The ambiguity in detecting ligand specificity in vivo is likely to remain a challenge in the karrikin field until the identification of endogenous KL.
Once KL(s) are identified, it will be important to test the response of the Arabidopsis complementation lines to KL(s) and further validate the function of the key residues L160/S190/M218 in planta. Additionally, future studies with pea mutants will elucidate PsKAI2A and PsKAI2B functional divergence and reveal their distinct physiological functions, in particular in the symbiotic relationship with AM fungi, which could shed light on the differential expression patterns in the roots.

This study illuminates the complex evolution of KAI2s in plants and particularly in legumes. We provide comprehensive structural and biochemical evidence of the specialization and sub-functionalization of KAI2 receptors and their sensitivity to butenolide compounds. Because of their ability to fix atmospheric nitrogen through plant–rhizobium symbiosis, legume crops such as pea or fava bean are attracting increasing attention for their agroecological potential. Thus, better understanding of KAR/KL perception and signaling in these staple crops may have far-reaching impacts on agro-systems and food security.

Methods

Protein sequence alignment and phylogenetic tree analyses

Forty-one representative KAI2 amino acid sequences were downloaded from Phytozome and specific genome databases as shown in Figure S1. Alignment was performed in MEGA X60 using the MUSCLE multiple sequence alignment algorithm61. Sequence alignment graphics were generated using CLC Genomics Workbench v12. The evolutionary history was inferred by using the Maximum Likelihood method and JTT matrix-based model62.
Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model, and then selecting the topology with superior log likelihood value. The percentage of trees in which the associated taxa clustered together is shown next to the branches63. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 41 amino acid sequences with a total of 327 positions in the final dataset. Evolutionary analyses were conducted in MEGA X60.

Constructs and generation of transgenic lines

The expression vectors for transgenic Arabidopsis were constructed with the MultiSite Gateway Three-Fragment Vector Construction kit (Invitrogen). AtKAI2 and PsKAI2A.2 constructs were tagged with a 6xHA epitope tag or mCitrine protein at their C-terminus. Lines were resistant to hygromycin. The AtKAI2 native promoter (0.7 kb) was amplified by PCR with the primers AtKAI2_promo_attB4 (5’-ggggacaactttgtatagaaaagttgccTTCACGACCAGTATGGTTTACTCA-3’) and AtKAI2_promo_attB1R (5’-ggggactgcttttttgtacaaacttgcCTCTCTAAAGAAGATTCTTCTCTGGTT-3’) from Col-0 genomic DNA and cloned into the pDONR-P4P1R vector, using Gateway recombination (Invitrogen). The 6XHA with linker and mCitrine tags were cloned into pDONR-P2RP3 (Invitrogen) as described in de Saint Germain et al.55. PsKAI2A.1, PsKAI2A.2 and PsKAI2B CDS were PCR amplified from Pisum cv.
Térèse cDNA with the primers PsKAI2A_attB1 (5’-GGGGACAAGTTTGTACAAAAAAGCAGGCTtcATGGGGATAGTGGAAGAAGCA-3’); PsKAI2A.1_attB2_STOP (5’-ggggaccactttgtacaagaaagctgggtcCAAATCTGCCTCAAGTTTCA-3’); PsKAI2A.2_attB2_STOP (5’-ggggaccactttgtacaagaaagctgggtcCCTTATTGGCTCAATATTAA-3’); PsKAI2b_attB1 (5’-GGGGACAAGTTTGTACAAAAAAGCAGGCTtcATGGGAATAGTGGAAGAAGC-3’); PsKAI2B_attB2_STOP (5’-ggggaccactttgtacaagaaagctgggtcAGCTACAATATCATAACGAA-3’); and the AtKAI2 CDS was PCR amplified from Col-0 cDNA with the primers AtKAI2_attB1 (5’-ggggacaagtttgtacaaaaaagcaggcttcATGGGTGTGGTAGAAGAAGC-3’) and AtKAI2_attB2_ΔS (5’-ggggaccactttgtacaagaaagctgggtcCATAGCAATGTCATTACGAAT-3’) and then recombined into the pDONR221 vector (Invitrogen). The suitable combinations of AtKAI2 native promoter, AtKAI2, PsKAI2A.1, PsKAI2A.2 or PsKAI2B and 6XHA or mCitrine were cloned into the pH7m34GW final destination vectors by using the three-fragment recombination system64 and were named pAtKAI2::AtKAI2-6xHA, pAtKAI2::AtKAI2-mCitrine, pAtKAI2::PsKAI2A.1-6xHA, pAtKAI2::PsKAI2b-6xHA and pAtKAI2::PsKAI2A.2-mCitrine. Transformation of the Arabidopsis Atkai2-2 mutant was performed according to the conventional floral dipping method65, with Agrobacterium strain GV3101. For each construct, only a few independent T1 lines were isolated and all lines were selected in T2. Phenotypic analysis shown in Figure 1e was performed on the T3 homozygous lines.

Hypocotyl elongation assays

Arabidopsis seeds were surface sterilized by consecutive treatments of 5 min 70% (v/v) ethanol with 0.05% (w/v) sodium dodecyl sulfate (SDS) and 5 min 95% (v/v) ethanol.
Seeds were then sown on half-strength Murashige and Skoog (½ MS) media (Duchefa Biochemie) containing 1% agar, supplemented with 1 μM (−)-GR24 or with 0.01% DMSO (control). Seeds were stratified at 4 °C (2 days in dark), then transferred to the growth chamber at 22 °C, under 20-30 µE/m2/sec of white light in long day conditions (16 hr light/8 hr dark). Seedlings were photographed and hypocotyl lengths were quantified using ImageJ66. Two plates of 10-12 seeds were sown for each genotype × treatment. Using Student t-tests, no statistically significantly different means were detected between plates. The data from the 20-24 seedlings were then used for a one-way ANOVA.

Chemicals

Enantiopure GR24 isomers were obtained as described in de Saint Germain et al.55 or purchased from StrigoLab. Profluorescent probes (GC240, GC486) were obtained as described in de Saint Germain et al.55.

Protein preparation and purification

PsKAI2A.2 and PsKAI2B were independently cloned and expressed as 6×His-SUMO fusion proteins from the expression vector pAL (Addgene). These were cloned utilizing primers PsKAI2A_F (5’-aaaacctctacttccaatcgATGGGGATAGTGGAAGAAG-3’), PsKAI2A.1_R (5’-ccacactcatcctccggTTACAAATCTGCCTCAAGTTTC-3’), PsKAI2A.2_R (5’-ccacactcatcctccggTTACCTTATTGGCTCAATATTAAGTTG-3’), PsKAI2B_F (5’-aaaacctctacttccaatcgATGGGAATAGTGGAAGAAGC-3’), and PsKAI2B_R (5’-ccacactcatcctccggTCAAGCTACAATATCATAACGAATG-3’). BL21 (DE3) cells transformed with the expression plasmid were grown in LB broth at 16 °C to an OD600 of ∼0.8 and induced with 0.2 mM IPTG for 16 h. Cells were harvested, re-suspended and lysed in extract buffer (50 mM Tris, pH 8.0, 200 mM NaCl, 5 mM imidazole, 4% Glycerol).
All His-SUMO-PsKAI2s were isolated from soluble cell lysate by Ni-NTA resin. The His-SUMO-PsKAI2 was eluted with 250 mM imidazole and subjected to anion-exchange. The eluted protein was then cleaved with TEV (tobacco etch virus) protease overnight at 4 °C. The cleaved His-SUMO tag was removed by passing through Nickel Sepharose, and PsKAI2 was further purified by chromatography through a Superdex-200 gel filtration column in 20 mM HEPES, pH 7.2, 150 mM NaCl, 5 mM DTT, 1% Glycerol. All proteins were concentrated by ultrafiltration to 3–10 mg/mL. RMS3, AtD14, and AtKAI2 were expressed in bacteria with a TEV-cleavable GST tag, purified and used as described in de Saint Germain et al.55.

Enzymatic degradation of GR24 isomers by purified proteins

Ligands (10 µM) were incubated without and with purified proteins (5 µM) for 150 min at 25 °C in PBS (0.1 mL, pH 6.8) in the presence of (±)-1-indanol (100 µM) as the internal standard. The solutions were acidified to pH 1 with 10% trifluoroacetic acid in CH3CN (v/v) (2 µL) to quench the reaction and centrifuged (12 min, 12,000 rpm). Thereafter the samples were subjected to RP-UPLC-MS analyses using an Ultra Performance Liquid Chromatography system equipped with a PDA and a Triple Quadrupole mass spectrometer Detector (Acquity UPLC-TQD, Waters, USA). RP-UPLC (HSS C18 column, 1.8 μm, 2.1 mm × 50 mm) with 0.1% formic acid in CH3CN and 0.1% formic acid in water (aq. FA, 0.1%, v/v, pH 2.8) as eluents [10% CH3CN, followed by linear gradient from 10 to 100% of CH3CN (4 min)] was carried out at a flow rate of 0.6 mL/min.
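The incubation conditions quoted in this subsection (10 µM ligand, 5 µM protein, 0.1 mL) fix the molar amounts in each reaction, which is useful to keep in mind when reading the degradation results against the single-turnover behaviour raised in the Discussion. The following Python sketch works through that arithmetic. The function names are hypothetical, and the ceiling calculation assumes, purely for illustration, a strict 1:1 stoichiometry in which each protein molecule cleaves at most one ligand molecule; it is not a measured result.

```python
def nanomoles(conc_uM: float, volume_mL: float) -> float:
    """Return the amount in nmol (1 µM equals 1 nmol per mL)."""
    return conc_uM * volume_mL


def single_turnover_ceiling(protein_uM: float, ligand_uM: float) -> float:
    """Largest ligand fraction that can be cleaved if each protein acts once.

    Illustrative assumption only: strict 1:1 protein-to-ligand stoichiometry.
    """
    if ligand_uM <= 0:
        raise ValueError("ligand concentration must be positive")
    return min(1.0, protein_uM / ligand_uM)


# Values quoted in the text: 10 µM ligand, 5 µM protein, 0.1 mL PBS.
ligand_nmol = nanomoles(10.0, 0.1)
protein_nmol = nanomoles(5.0, 0.1)
ceiling = single_turnover_ceiling(5.0, 10.0)

print(f"ligand: {ligand_nmol:.2f} nmol, protein: {protein_nmol:.2f} nmol")
print(f"single-turnover ceiling: {ceiling:.0%} of ligand")
```

Under that assumption, consumption of more than half of the ligand in this setup would indicate more than one catalytic cycle per protein molecule.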
The detection was performed by PDA using the TQD mass spectrometer operated in electrospray ionization positive mode at 3.2 kV capillary voltage. The cone voltage and collision energy were optimized to maximize the signal and were 20 V and 12 eV, respectively; the collision gas was argon at a pressure maintained near 4.5 × 10−3 mbar.

Enzymatic assay with pro-fluorescent probes

Enzymatic assay and analysis were carried out as described in de Saint Germain et al.55, using a TriStar LB 941 Multimode Microplate Reader from Berthold Technologies. The experiments were repeated three times.

Protein melting temperatures

Differential Scanning Fluorimetry (DSF) experiments were performed on a CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad Laboratories, Inc., Hercules, California, USA) using excitation and emission wavelengths of 490 and 575 nm, respectively. Sypro Orange (λex/λem: 470/570 nm; Life Technologies Co., Carlsbad, California, USA) was used as the reporter dye. Samples were heat-denatured using a linear 25 to 95 °C gradient at a rate of 1.3 °C per minute after incubation at 25 °C for 30 min in the absence of light. The denaturation curve was obtained using CFX Manager™ software. Final reaction mixtures were prepared in triplicate in 96-well white microplates, and each reaction was carried out at 20 μL scale in phosphate-buffered saline (PBS) (100 mM Phosphate, pH 6.8, 150 mM NaCl) containing 6 μg protein (such that final reactions contained 10 μM protein), 0-1000 μM ligand (as shown in Figure 2a-h), 4% (v/v) DMSO, and 0.008 μL Sypro Orange. Plates were incubated in darkness for 30 minutes before analysis.
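The DSF reaction composition above can be cross-checked with a short calculation. The Python sketch below derives the molar mass implied by 6 μg of protein at 10 μM in a 20 μL reaction, and the duration of the 25 to 95 °C ramp at 1.3 °C per minute. The function names are hypothetical, and the sketch assumes that the quoted 6 μg is the total protein in each 20 μL reaction; the resulting molar mass is only what the quoted numbers imply, not a value stated in the text.

```python
def implied_molar_mass_kda(mass_ug: float, volume_uL: float, conc_uM: float) -> float:
    """Molar mass (kDa) implied by a protein mass, reaction volume and molarity."""
    grams_per_litre = mass_ug / volume_uL   # µg/µL is numerically equal to g/L
    mol_per_litre = conc_uM * 1e-6          # µM to mol/L
    return grams_per_litre / mol_per_litre / 1000.0  # g/mol to kDa


def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Duration of a linear temperature ramp in minutes."""
    return (end_c - start_c) / rate_c_per_min


# Values quoted in the text: 6 µg protein, 20 µL reaction, 10 µM protein.
mass_kda = implied_molar_mass_kda(6.0, 20.0, 10.0)
# Values quoted in the text: 25 to 95 °C at 1.3 °C per minute.
ramp_min = ramp_minutes(25.0, 95.0, 1.3)

print(f"implied molar mass: {mass_kda:.1f} kDa")
print(f"ramp duration: {ramp_min:.1f} min")
```

The quoted mass and concentration are mutually consistent for a protein of roughly 30 kDa, and the thermal ramp takes a little under an hour per plate.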
In the control reaction, DMSO was added instead of ligand. The experiments were repeated three times.

Intrinsic tryptophan fluorescence assays and kinetics

Intrinsic tryptophan fluorescence assays and determination of the dissociation constant KD were performed as described in de Saint Germain et al.55, using the Spark® Multimode Microplate Reader from Tecan.

Crystallization, data collection and structure determination

The crystals of PsKAI2B were grown at 25 °C by the hanging-drop vapor diffusion method with 1.0 μL purified protein sample mixed with an equal volume of reservoir solution containing 0.1 M HEPES pH 7.5, 2.75% v/v PEG 4000, 2.75% v/v PEG-ME 5000. The crystals of PsKAI2B in complex with (−)-GR24 were grown at 25 °C by the hanging-drop vapor diffusion method with 1.0 μL purified protein complex (preincubated with 1 mM (−)-GR24, StrigoLab) mixed with an equal volume of reservoir solution containing 0.1 M HEPES pH 7.5, 2.75% PEG 2000, 2.75% v/v PEG-ME 5000, 1 mM (−)-GR24. Crystals of maximum size were obtained and harvested after 2 weeks from the reservoir solution with additional 20% MPD serving as cryoprotectant. X-ray diffraction data were integrated and scaled with the HKL2000 package67. PsKAI2 crystal structures were determined by molecular replacement using the AtKAI2 model (PDB: 5Z9H)68 as the search model. All structural models were manually built, refined, and rebuilt with PHENIX69 and COOT70.

Structural biology modelling and analyses

Model structure illustrations were made with PyMOL71. The PsKAI2A model structure was generated using I-TASSER72–74.
Ligand identification, ligand-binding pocket analyses, and computation of solvent-accessible surface values were carried out using Phenix LigandFit69,75,76, CASTp software77,78, and AutoDock Vina79, respectively. The LigPlot+ program80 was used for 2-D representation of protein-ligand interactions from standard PDB data format.

Data Availability

The atomic coordinates of the apo and ligand-bound forms of PsKAI2 structures have been deposited in the Protein Data Bank with accession codes 7K2Z and 7K38, respectively. All relevant data are available from the corresponding authors upon request.

Acknowledgements

We thank the beamline staff at ALS for help with data collection. This work is supported by UC Davis new faculty start-up funds. The Shabek laboratory is supported by the National Science Foundation. This work is supported by the Institut Jean-Pierre Bourgin's Plant Observatory technological platforms. F.-D.B. is supported by the CHARM3AT Labex program (ANR-11-LABX-39). A.d.S.G. is supported by AgreenSkills from the European Union in the framework of the Marie-Curie FP7 COFUND People Programme and a fellowship from Saclay Plant Sciences (ANR-17-EUR-0007).

Author Contributions

AM.G., F.-D.B., C.R., A.dS.G., and N.S. conceived and designed the experiments. N.S., A.dS.G., and AM.G. conducted the protein purification, biochemical and crystallization experiments. N.S. and AM.G. determined and analyzed crystal structures and conducted in silico studies. AM.G., A.dS.G., and N.S. wrote the manuscript with help from all other co-authors.

Author Information

The authors declare no competing interests. Correspondence and requests for materials should be addressed to N.S.
(nshabek@ucdavis.edu) and A.dS.G. (Alexandre.De-Saint-Germain@inrae.fr).

References

1. Flematti, G. R., Ghisalberti, E. L., Dixon, K. W. & Trengove, R. D. A compound from smoke that promotes seed germination. Science 305, 977 (2004).
2. Nelson, D. C. et al. Karrikins enhance light responses during germination and seedling development in Arabidopsis thaliana. Proc. Natl. Acad. Sci. U. S. A. 107, 7095–7100 (2010).
3. Sun, X. D. & Ni, M. HYPOSENSITIVE to LIGHT, an alpha/beta fold protein, acts downstream of ELONGATED HYPOCOTYL 5 to regulate seedling de-etiolation. Mol. Plant 4, 116–126 (2011).
4. Waters, M. T. et al. Specialisation within the DWARF14 protein family confers distinct responses to karrikins and strigolactones in Arabidopsis. Development 139, 1285–1295 (2012).
5. Flematti, G. R. et al. Preparation of 2H-furo[2,3-c]pyran-2-one derivatives and evaluation of their germination-promoting activity. J. Agric. Food Chem. 55, 2189–2194 (2007).
6. Flematti, G. R., Scaffidi, A., Dixon, K. W., Smith, S. M. & Ghisalberti, E. L. Production of the seed germination stimulant karrikinolide from combustion of simple carbohydrates. J. Agric. Food Chem. 59, 1195–1198 (2011).
7. Dixon, K. W., Merritt, D. J., Flematti, G. R. & Ghisalberti, E. L. Karrikinolide - A phytoreactive compound derived from smoke with applications in horticulture, ecological restoration and agriculture. Acta Hortic. 813, (2009).
8. Stevens, J. C., Merritt, D. J., Flematti, G. R., Ghisalberti, E. L. & Dixon, K. W. Seed germination of agricultural weeds is promoted by the butenolide 3-methyl-2H-furo[2,3-c]pyran-2-one under laboratory and field conditions. Plant Soil 298, 113–124 (2007).
9.
Long, R. L. et al. Prior hydration of Brassica tournefortii seeds reduces the stimulatory effect of karrikinolide on germination and increases seed sensitivity to abscisic acid. Ann. Bot. 105, 1063–1070 (2010).
10. Nelson, D. C. et al. F-box protein MAX2 has dual roles in karrikin and strigolactone signaling in Arabidopsis thaliana. Proc. Natl. Acad. Sci. 108, 8897–8902 (2011).
11. Guo, Y., Zheng, Z., La Clair, J. J., Chory, J. & Noel, J. P. Smoke-derived karrikin perception by the α/β hydrolase KAI2 from Arabidopsis. Proc. Natl. Acad. Sci. 110, 8284–8289 (2013).
12. Kagiyama, M. et al. Structures of D14 and D14L in the strigolactone and karrikin signaling pathways. Genes to Cells 18, 147–160 (2013).
13. Stanga, J. P., Smith, S. M., Briggs, W. R. & Nelson, D. C. SUPPRESSOR OF MORE AXILLARY GROWTH2 1 Controls Seed Germination and Seedling Development in Arabidopsis. Plant Physiol. 163, 318–330 (2013).
14. Gutjahr, C. et al. Rice perception of symbiotic arbuscular mycorrhizal fungi requires the karrikin receptor complex. Science 350, 1521–1524 (2015).
15. Li, W. et al. The karrikin receptor KAI2 promotes drought resistance in Arabidopsis thaliana. PLoS Genet. 13, e1007076 (2017).
16. Wang, L., Waters, M. T. & Smith, S. M. Karrikin-KAI2 signalling provides Arabidopsis seeds with tolerance to abiotic stress and inhibits germination under conditions unfavourable to seedling establishment. New Phytol. 219, 605–618 (2018).
17. Scaffidi, A. et al. Exploring the molecular mechanism of karrikins and strigolactones. Bioorganic Med. Chem. Lett. 22, 3743–3746 (2012).
18. Yoneyama, K. Recent progress in the chemistry and biochemistry of strigolactones. J. Pestic. Sci.
45, 45–53 (2020).
19. Zhao, L. H. et al. Crystal structures of two phytohormone signal-transducing α/β hydrolases: Karrikin-signaling KAI2 and strigolactone-signaling DWARF14. Cell Res. 23, 436–439 (2013).
20. Cook, C. E., Whichard, L. P., Turner, B., Wall, M. E. & Egley, G. H. Germination of witchweed (Striga lutea Lour.): Isolation and properties of a potent stimulant. Science 154, 1189–1190 (1966).
21. Sorefan, K. et al. MAX4 and RMS1 are orthologous dioxygenase-like genes that regulate shoot branching in Arabidopsis and pea. Genes Dev. 17, 1469–1474 (2003).
22. Kapulnik, Y. et al. Strigolactones interact with ethylene and auxin in regulating root-hair elongation in Arabidopsis. J. Exp. Bot. (2011) doi:10.1093/jxb/erq464.
23. Rasmussen, A. et al. Strigolactones suppress adventitious rooting in arabidopsis and pea. Plant Physiol. 158, 1976–1987 (2012).
24. Lopez-Obando, M., Ligerot, Y., Bonhomme, S., Boyer, F. D. & Rameau, C. Strigolactone biosynthesis and signaling in plant development. Dev. (2015) doi:10.1242/dev.120006.
25. Akiyama, K., Matsuzaki, K. I. & Hayashi, H. Plant sesquiterpenes induce hyphal branching in arbuscular mycorrhizal fungi. Nature 435, 824–827 (2005).
26. Gomez-Roldan, V. et al. Strigolactone inhibition of shoot branching. Nature 455, 189–194 (2008).
27. Arite, T. et al. D14, a strigolactone-Insensitive mutant of rice, shows an accelerated outgrowth of tillers. Plant Cell Physiol. 50, 1416–1424 (2009).
28. Besserer, A. et al. Strigolactones stimulate arbuscular mycorrhizal fungi by activating mitochondria. PLoS Biol. 4, e226 (2006).
29. Li, S. W., Xue, L., Xu, S., Feng, H. & An, L.
Mediators, genes and signaling in adventitious rooting. Bot. Rev. 75, 230–247 (2009).
30. Agusti, J. et al. Strigolactone signaling is required for auxin-dependent stimulation of secondary growth in plants. Proc. Natl. Acad. Sci. U. S. A. 108, 20242–20247 (2011).
31. Hamiaux, C. et al. DAD2 is an α/β hydrolase likely to be involved in the perception of the plant branching hormone, strigolactone. Curr. Biol. 22, 2032–2036 (2012).
32. Kapulnik, Y. et al. Strigolactones affect lateral root formation and root-hair elongation in Arabidopsis. Planta 233, 209–216 (2011).
33. Bythell-Douglas, R. et al. Evolution of strigolactone receptors by gradual neo-functionalization of KAI2 paralogues. BMC Biol. 15, 1–21 (2017).
34. Swarbreck, S. M., Guerringue, Y., Matthus, E., Jamieson, F. J. C. & Davies, J. M. Impairment in karrikin but not strigolactone sensing enhances root skewing in Arabidopsis thaliana. Plant J. 98, 607–621 (2019).
35. Toh, S. et al. Structure-function analysis identifies highly sensitive strigolactone receptors in Striga. Science 350, 203–207 (2015).
36. Xu, Y. et al. Structural basis of unique ligand specificity of KAI2-like protein from parasitic weed Striga hermonthica. Sci. Rep. 6, 1–9 (2016).
37. Waters, M. T. et al. A selaginella moellendorffii ortholog of KARRIKIN INSENSITIVE2 functions in arabidopsis development but cannot mediate responses to karrikins or strigolactones. Plant Cell 27, 1925–1944 (2015).
38. Bürger, M. et al. Structural Basis of Karrikin and Non-natural Strigolactone Perception in Physcomitrella patens. Cell Rep. 26, 855–865 (2019).
39. Sun, Y. K. et al. Divergent receptor proteins confer responses to different karrikins in two ephemeral weeds. Nat. Commun. 11, (2020).
40. de Saint Germain, Alexandre, Jacobs, A., Brun, G. & Boyer, F.-D. A Phelipanche ramosa KAI2 Protein Perceives enzymatically Strigolactones and 2 Isothiocyanates. bioRxiv (2020) doi:10.1101/2020.06.09.136473.
41. Sun, Y. K., Flematti, G. R., Smith, S. M. & Waters, M. T. Reporter gene-facilitated detection of compounds in arabidopsis leaf extracts that activate the karrikin signaling pathway. Front. Plant Sci. 7, 1799 (2016).
42. Carbonnel, S. et al. Lotus japonicus karrikin receptors display divergent ligand-binding specificities and organ-dependent redundancy. bioRxiv 754937 (2020) doi:10.1101/754937.
43. Conn, C. E. & Nelson, D. C. Evidence that KARRIKIN-INSENSITIVE2 (KAI2) Receptors may Perceive an Unknown Signal that is not Karrikin or Strigolactone. Front. Plant Sci. 6, 1–7 (2016).
44. Shabek, N. et al. Structural plasticity of D3–D14 ubiquitin ligase in strigolactone signalling. Nature 563, 652–656 (2018).
45. Xu, Y. et al. Structural analysis of HTL and D14 proteins reveals the basis for ligand selectivity in Striga. Nat. Commun. 9, 3947 (2018).
46. Takeuchi, J. et al. Rationally designed strigolactone analogs as antagonists of the D14 receptor. Plant Cell Physiol. 59, 1545–1554 (2018).
47. Hamiaux, C. et al. Inhibition of strigolactone receptors by N-phenylanthranilic acid derivatives: Structural and functional insights. J. Biol. Chem. 293, 6530–6543 (2018).
48. Bythell-Douglas, R. et al. The Structure of the Karrikin-Insensitive Protein (KAI2) in Arabidopsis thaliana. PLoS One 8, e54758 (2013).
49. Nakamura, H. et al. Molecular mechanism of strigolactone perception by DWARF14. Nat. Commun. 4, (2013).
50. Zhao, L. H. et al. Destabilization of strigolactone receptor DWARF14 by binding of ligand and E3-ligase signaling effector DWARF3. Cell Res. 25, 1219–1236 (2015).
51. Yao, R. et al. DWARF14 is a non-canonical hormone receptor for strigolactone.
Nature 536, 469–473 (2016). 52. Kreplak, J. et al. A reference genome for pea provides insight into legume genome evolution. Nat. Genet. 51, 1411–1422 (2019). 53. Yao, J. et al. An allelic series at the KARRIKIN INSENSITIVE 2 locus of Arabidopsis (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 7, 2021. ; https://doi.org/10.1101/2021.01.06.425465doi: bioRxiv preprint https://doi.org/10.1101/2021.01.06.425465 thaliana decouples ligand hydrolysis and receptor degradation from downstream signalling. Plant J. 96, 75–89 (2018). 54. Scaffidi, A. et al. Strigolactone hormones and their stereoisomers signal through two related receptor proteins to induce different physiological responses in arabidopsis. Plant Physiol. 165, 1221–1232 (2014). 55. De Saint Germain, A. et al. An histidine covalent receptor and butenolide complex mediates strigolactone perception. Nat. Chem. Biol. 12, 787–794 (2016). 56. Seto, Y. et al. Strigolactone perception and deactivation by a hydrolase receptor DWARF14. Nat. Commun. 10, 191 (2019). 57. Conn, C. E. et al. Convergent evolution of strigolactone perception enabled host detection in parasitic plants. Science (80-. ). 349, 540–543 (2015). 58. Weng, J. K., Ye, M., Li, B. & Noel, J. P. Co-evolution of Hormone Metabolism and Signaling Networks Expands Plant Adaptive Plasticity. Cell 166, 881–893 (2016). 59. Yoshida, H. et al. Evolution and diversification of the plant gibberellin receptor GID1. Proc. Natl. Acad. Sci. U. S. A. 115, E7844–E7853 (2018). 60. Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547– 1549 (2018). 61. Edgar, R. C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004). 62. Jones, D. T., Taylor, W. R. & Thornton, J. M. 
The rapid generation of mutation data matrices from protein sequences. Bioinformatics 8, 275–282 (1992). 63. Felsenstein, J. Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution (N. Y). 39, 783–791 (1985). 64. Karimi, M., Bleys, A., Vanderhaeghen, R. & Hilson, P. Building blocks for plant gene assembly. Plant Physiol. 145, 1183–1191 (2007). 65. Clough, S. J. & Bent, A. F. Floral dip: A simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J. 16, 735–743 (1998). 66. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012). 67. Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in oscillation (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 7, 2021. ; https://doi.org/10.1101/2021.01.06.425465doi: bioRxiv preprint https://doi.org/10.1101/2021.01.06.425465 mode. Methods Enzymol. 276, 307–326 (1997). 68. Lee, I. et al. A missense allele of KARRIKIN-INSENSITIVE2 impairs ligand-binding and downstream signaling in Arabidopsis thaliana. J. Exp. Bot. 69, 3609–3623 (2018). 69. Adams, P. D. et al. PHENIX: A comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. Sect. D Biol. Crystallogr. 66, 213–221 (2010). 70. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. Sect. D Biol. Crystallogr. 66, 486–501 (2010). 71. DeLano, W. L. The PyMOL Molecular Graphics System, Version 2.3. Schrödinger LLC (2020). 72. Yang, J. & Zhang, Y. I-TASSER server: New development for protein structure and function predictions. Nucleic Acids Res. 43, W174–W181 (2015). 73. Roy, A., Kucukural, A. & Zhang, Y. I-TASSER: A unified platform for automated protein structure and function prediction. Nat. Protoc. 5, 725–738 (2010). 74. 
Yang, J. et al. The I-TASSER suite: Protein structure and function prediction. Nat. Methods 12, 7–8 (2014). 75. Moriarty, N. W., Grosse-Kunstleve, R. W. & Adams, P. D. Electronic ligand builder and optimization workbench (eLBOW): A tool for ligand coordinate and restraint generation. Acta Crystallogr. Sect. D Biol. Crystallogr. 65, 1074–1080 (2009). 76. Terwilliger, T. C., Klei, H., Adams, P. D., Moriarty, N. W. & Cohn, J. D. Automated ligand fitting by core-fragment fitting and extension into density. Acta Crystallogr. Sect. D Biol. Crystallogr. 62, 915–922 (2006). 77. Binkowski, T. A., Naghibzadeh, S. & Liang, J. CASTp: Computed Atlas of Surface Topography of proteins. Nucleic Acids Res. 31, 3352–3355 (2003). 78. Dundas, J. et al. CASTp: Computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res. 34, W116–W118 (2006). 79. Steffen, C. et al. AutoDock4 and AutoDockTools4: Automated Docking with Selective Receptor Flexibility. J. Comput. Chem. 30, 2785–2791 (2010). 80. Laskowski, R. A. & Swindells, M. B. LigPlot+: Multiple ligand-protein interaction diagrams for drug discovery. J. Chem. Inf. Model. 51, 2778–2786 (2011). (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 7, 2021. ; https://doi.org/10.1101/2021.01.06.425465doi: bioRxiv preprint https://doi.org/10.1101/2021.01.06.425465 Figure Legends Figure 1. Evolutionary analysis and differential expression of the legume Pisum sativum KAI2s. (a) Maximum likelihood phylogeny of 24 representative KAI2 amino acid sequences. Node values represent percentage of trees in which the associated taxa clustered together. Vertical rectangles highlight distinct KAI2 family clades. Black circle indicates legume duplication event. Pink and green circles mark the position of PsKAI2As and PsKAI2B respectively. 
The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. (b) PsKAI2A and PsKAI2B are homologues of AtKAI2 and encode α/β-hydrolases. Schematic representation of the PsKAI2A and PsKAI2B genes; exons are shown as thick pink and green lines, introns as thin gray lines and UTR regions as thick gray lines. Bases are numbered from the start codon. PsKAI2A shows 2 splicing variants. Spliced introns are shown as bent (“V”) lines. Bold lines represent intron retention. Inverted triangle (▼) indicates premature termination codons. (c-d) Differential expression pattern of PsKAI2A (c, pink) and PsKAI2B (d, green). Transcript levels in the different tissues of 21 old wild-type Pisum sativum plants (cv. Terese) were determined by real-time PCR, relative to PsEF1α. Data are means ± SE (n = 2 pools of 8 plants). Inset drawing of a node showing the different parts of the pea compound leaf. (e) Hypocotyl length of 7-day-old seedlings grown under low light at 21 °C. Data are means ± SE (n = 20-24; 2 plates of 10-12 seedlings per plate). Grey bars: Mock (DMSO), orange bars: (−)-GR24 (1 µM). Complementation assays using the AtKAI2 promoter to express AtKAI2 (control) or PsKAI2 genes as noted above the graph. Proteins were tagged with 6xHA epitope or mCitrine protein. Statistical differences were determined using a one-way ANOVA with a Tukey multiple comparison of means post-hoc test; statistical differences of P<0.05 are represented by different letters. Means with asterisks indicate significant inhibition compared to mock-treated seedlings, with *** corresponding to p ≤ 0.001 and * to p ≤ 0.01, as measured by t-test.

Figure 2.
Biochemical analysis of PsKAI2A and PsKAI2B interactions with different GR24 isomers. The melting temperature curves of 10 µM PsKAI2A (a, c, e, g) or PsKAI2B (b, d, f, h) with (+)-GR24 (a-b), (−)-GR24 (c-d), (+)-2’-epi-GR24 (e-f), or (−)-2’-epi-GR24 (g-h) at varying concentrations are shown as assessed by DSF. Each line represents the average protein melt curve for three technical replicates; the experiment was carried out twice. (i) Chemical structure of ligands used in DSF assay (a-h). (j) Plots of fluorescence intensity versus SL concentrations. The change in intrinsic fluorescence of AtKAI2, PsKAI2A and PsKAI2B was monitored (see Figure S4) and used to determine the apparent Kd values. The plots represent the mean of two replicates and the experiments were repeated at least three times. The analysis was performed with GraphPad Prism 7.05 Software.

Figure 3. Comparative enzymatic activity of AtD14, AtKAI2, RMS3, PsKAI2A and PsKAI2B proteins with GR24 isomers. UPLC-UV (260 nm) analysis showing the formation of the ABC tricycle from GR24 isomers. The hydrolysis activity of each enzyme (10 µM) was monitored after incubation with 10 µM (+)-GR24 (yellow), (−)-GR24 (orange), (+)-2’-epi-GR24 (blue), or (−)-2’-epi-GR24 (purple). The indicated percentage corresponds to the hydrolysis rate calculated from the remaining GR24 isomer, quantified in comparison with indanol as an internal standard. Data are means ± SE (n = 3). nd = no cleavage detected.

Figure 4. The crystal structure of legume KAI2. (a) Overview of PsKAI2B structure. Lid and base domains are colored in forest and light green respectively with secondary structure elements labeled.
(b) Structural alignment of PsKAI2B and AtKAI2 (PDB ID: 4HTA) shown in light green and wheat colors respectively. Root-mean-square deviation (RMSD) value of the aligned structures is shown. The location and conservation of legume KAI2 unique residues, alanine in position 147 (A147) and asparagine N129, are highlighted on the structure shown as sticks as well as in reduced Multiple Sequence Alignment from Figure S1.

Figure 5. Structural divergence analysis of legume KAI2A and KAI2B. (a) Structural alignment of PsKAI2A and PsKAI2B shown in pink and light green colors respectively. RMSD of aligned structures is shown. (b) Analysis of PsKAI2A and PsKAI2B pocket volume, area, and morphology is shown by solvent accessible surface presentation. Pocket size values were calculated via the CASTp server. (c) Residues involved in defining ligand-binding pocket are shown on each structure as sticks. Catalytic triad is shown in red. (d) Residues L/M160, S/L190, and M/L218 are highlighted as divergent legume KAI2 residues, conserved among all legume KAI2A or KAI2B sequences as shown in reduced Multiple Sequence Alignment from Figure S1.

Figure 6. Structural basis of PsKAI2B ligand interaction. (a) Surface (left) and cartoon (right) representations of PsKAI2B crystal structure in complex with (−)-GR24 D-OH ring. Protein structure is shown in blue/gray and ligand in orange. (b) Close-up view on ligand interactions and contiguous density with the catalytic serine S95. Electron density for the ligand is shown in navy blue and blue/gray mesh for the labeled catalytic triad. The contiguous density between S95 and the D-OH ring indicates a covalent bond.
The electron density is derived from 2mFo-DFc (2fofc) map contoured at 1.0σ. (c) Side view of PsKAI2B-D-OH structure shown in cartoon with the intact D-OH ring structure highlighted in orange. 2-D ligand interaction plot was generated using LigPlot+ software. Dark grey line represents S95-D-OH ring covalent bond. (d) Schematic diagram of the proposed mechanism for the formation of the D-ring intermediate covalently bound to S95.

[Figure 1 image. (a) Phylogeny with clades labeled D14, KAI2, Legume KAI2A and Legume KAI2B. (b) Gene models for PsKAI2A (Psat2g169960; variant transcripts PsKAI2A.1 and PsKAI2A.2) and PsKAI2B (Psat4g083040). (c, d) Relative PsKAI2A and PsKAI2B transcript levels in root, root apex, stipule, tendril, node, axillary bud, epicotyl, stem, shoot apex, flower and floral bud. (e) Hypocotyl length (mm) under mock and (−)-GR24 treatment for Ler, kai2-2, pAtKAI2::AtKAI2-6xHA #1, pAtKAI2::AtKAI2-mCitrine #1, pAtKAI2::PsKAI2A.1-6xHA #1 to #3, pAtKAI2::PsKAI2B-6xHA #1 and pAtKAI2::PsKAI2A.2-mCitrine #1.]

[Figure 2 image. (a-h) −d(RFU)/dt versus temperature (°C) for PsKAI2A and PsKAI2B with each GR24 isomer. (i) Structures of (+)-GR24, (−)-GR24, (+)-2’-epi-GR24 and (−)-2’-epi-GR24. (j) dF/dFmax versus µM (−)-GR24 for AtKAI2, PsKAI2A and PsKAI2B; Kd values printed on the panel: 115.4 ± 9.876 µM, 88.88 ± 19.17 µM and 89.43 ± 12.13 µM.]

[Figure 3 image. Cleavage (%) of (+)-GR24, (−)-GR24, (+)-2’-epi-GR24 and (−)-2’-epi-GR24 by AtD14, AtKAI2, PsKAI2A, PsKAI2B and RMS3; two bars are marked nd.]

[Figure 4 image. (a) PsKAI2B in two views related by a 180° rotation, with lid and base domains and secondary structure elements labeled. (b) PsKAI2B aligned with AtKAI2, RMSD ~0.35; positions 129 (N129/D129) and 147 (A147/R147) are marked.]

[Figure 5 image. (a) PsKAI2A aligned with PsKAI2B, RMSD = 0.34. (b) Ligand accessible surface of the pockets, columns A and B: SA vol. 114.8, 125.4; SA area 231.2, 234.4; SA circum. 33.6, 30.3. (c, d) Pocket residues, including positions 160, 190 and 218.]

[Figure 6 image. (a) PsKAI2B with (−)-GR24 (D-OH). (b) Catalytic triad S95, D217 and H246 with density contoured at σ = 1.0 (2fofc). (c) Ligand interaction plot with residues G25, F26, S95, V96, I193 and H246; distances 2.88 and 2.86 are marked. (d) Reaction scheme involving S95, H246, D217, the ABC tricycle and D-OH.]

Table 1.
Data collection, phasing and refinement statistics

                              PsKAI2B (apo form, with glycerol)   (−)-GR24 D-OH-bound PsKAI2B
Data collection
  Space group                 C2                                  C2
  Cell dimensions
    a, b, c (Å)               87.59, 71.14, 49.06                 87.08, 71.82, 48.79
    α, β, γ (°)               90, 117, 90                         90, 117.3, 90
  Resolution (Å)              43.47-1.61 (1.66-1.61)*             43.36-2.00 (2.07-2.00)
  Rsym                        0.080 (0.589)                       0.082 (0.316)
  I / σI                      31.01 (1.52)                        35.13 (4.11)
  Completeness (%)            99.2 (84.5)                         98.73 (87.53)
  Redundancy                  6.4 (3.2)                           6.1 (4.5)
Refinement
  Resolution (Å)              1.61                                2.00
  No. reflections             34306                               17837
  Rwork / Rfree (%)           15.9/17.7                           16.9/21.1
  No. atoms                   2395                                2298
    Protein                   2110                                2110
    Ligand/ion                6                                   8
    Water                     279                                 180
  B-factors
    Protein                   19.92                               26.5
    Ligand/ion                33.73                               24.60
    Water                     32.07                               32.22
  R.m.s. deviations
    Bond lengths (Å)          0.009                               0.013
    Bond angles (°)           0.88                                1.03
  Ramachandran favored (%)    98.51                               98.88
  Ramachandran allowed (%)    1.49                                1.12
  Ramachandran outliers (%)   0                                   0
  PDB ID                      7K2Z                                7K38

*Statistics for the highest-resolution shell are shown in parentheses.

10_1101-2021_01_06_425584 ---- 37271726

Comprehensive multi-omics study of the molecular perturbations induced by simulated diabetes on coronary artery endothelial cells

Aldo Moreno-Ulloa1,2*, Hilda Carolina Delgado-De la Herrán1,3, Carolina Álvarez-Delgado3, Omar Mendoza-Porras4, Rommel A.
Carballo-Castañeda1 and Francisco Villarreal5,6

1MS2 laboratory, Biomedical Innovation Department, Center for Scientific Research and Higher Education of Ensenada (CICESE), Baja California, México
2Specialized Laboratory in Metabolomics and Proteomics (MetPro), CICESE, México
3Mitochondrial Biology Laboratory, Biomedical Innovation Department, Center for Scientific Research and Higher Education of Ensenada (CICESE), Baja California, México
4CSIRO Livestock and Aquaculture, Queensland Bioscience Precinct, 306 Carmody Rd, St Lucia, QLD, Australia
5School of Medicine, University of California, San Diego, CA, USA
6San Diego VA Healthcare System

* To whom correspondence should be addressed: Biomedical Innovation Department, CICESE, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, CP. 22860, Ensenada, B.C. Mexico, Phone: +52(646)175-05-00 ext. 2721, E-mail: amoreno@cicese.mx

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.06.425584; this version posted January 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Abstract

Coronary artery endothelial cells (CAEC) play an important role in the development of cardiovascular disease.
Dysfunction of CAEC is associated with cardiovascular disease in subjects with type 2 diabetes mellitus (T2DM). However, comprehensive studies of the effects that a diabetic environment exerts on this cell type are scarce. The present study characterized the molecular perturbations occurring in cultured bovine CAEC subjected to a prolonged diabetic environment (high glucose [HG] and high insulin [HI]). Changes at the metabolite and peptide level were assessed by untargeted metabolomics and chemoinformatics, and the results were integrated with proteomics data using published SWATH-based proteomics on the same in vitro model. Our findings were consistent with reports on other endothelial cell types, but also identified novel signatures of DNA/RNA, amino acid, peptide, and lipid metabolism in cells under a diabetic environment. Manual data inspection revealed disturbances in tryptophan catabolism and in the biosynthesis of phenylalanine-based, glutathione-based, and proline-based peptide metabolites. Fluorescence microscopy detected an increase in binucleation in cells under treatment that also occurred when human CAEC were used. This multi-omics study identified particular molecular perturbations in an induced diabetic environment that could help unravel the mechanisms underlying the development of cardiovascular disease in subjects with T2DM.

Keywords: SWATH-Proteomics; Metabolomics; Type 2 Diabetes Mellitus; Endothelial cells; Feature-Based Molecular Networking

1. Introduction

Damage to coronary artery endothelial cells (CAEC) leads to coronary endothelial dysfunction, which is associated with the development of cardiac pathologies in subjects with and without coronary atherosclerosis (1). Subjects with type 2 diabetes mellitus (T2DM) are at particularly increased risk of myocardial infarction (2), and coronary endothelial dysfunction has been implicated in the prognosis (3). A high-glucose (HG) environment, a hallmark of T2DM, leads to impairment of nitric oxide signaling, cell cycle (4), apoptosis (5), angiogenesis (6), and DNA structure (7). However, given the intrinsic heterogeneity of the endothelium, the molecular perturbations caused by HG vary according to the type of endothelial cell studied (8, 9). For instance, human microvascular endothelial cells showed increased gene expression of endothelial nitric oxide synthase, superoxide dismutase 1, glutathione peroxidase 1, and thioredoxin reductase 1 and 2 compared to the regulation observed in human umbilical vein endothelial cells (HUVEC) when cultured in HG for 24 h. Furthermore, the response of endothelial cells to HG is influenced by the duration of exposure (10, 11), as demonstrated in bovine aortic and human microvascular endothelial cells, where cell proliferation and apoptosis were higher at <48 h compared to 8 weeks of exposure (10). In another example of time-dependent response, increased apoptosis (derived from DNA fragmentation) and tumor necrosis factor alpha protein levels were reported in human coronary artery endothelial cells (HCAEC) after only 24 h of incubation with HG (5). Hence, the molecular response to HG cannot be generalized among endothelial cell types. Previously, we reported impaired mitochondrial function/structure and nitric oxide signaling in HCAEC treated with HG for 48 h (12).
However, a 72 h study documented an increase in pro-inflammatory cytokines (13) and oxidative stress in HCAEC (14). The long-term (>72 h) effect of HG in CAEC has not been as extensively documented as in other endothelial cell types. Characterizing the effect of HG on CAEC may allow us to identify key signaling pathways (or specific biomolecules) associated with the development of endothelial dysfunction and cardiac pathologies.

Here, liquid chromatography coupled to mass spectrometry (LC-MS2)-based untargeted metabolomics and SWATH-based quantitative proteomics data, as well as bio- and chemoinformatics, were used to characterize the molecular perturbations occurring in bovine coronary artery endothelial cells (BCAEC) under a prolonged diabetic environment.

2. Methods

2.1 Chemicals and reagents

Recombinant human insulin was purchased from Sigma Aldrich (St. Louis, MO, USA). Antibiotic-antimycotic solution, trypsin-EDTA solution 0.25%, Hank’s Balanced Salt Solution (HBSS) without phenol red, Dulbecco’s Modified Eagle’s Media (DMEM) with glutamine, Fetal Bovine Serum (FBS), Hoechst 33258, Pentahydrate (bis-Benzimide)-FluoroPure™, and methanol-free formaldehyde (16% solution) were obtained from Thermo Fisher Scientific (Waltham, MA, USA). Methanol, acetonitrile, and water were Optima™ LC-MS Grade and obtained from Fisher Scientific (Hampton, NH, USA). Ethanol LiChrosolv® Grade was obtained from Merck KGaA (Darmstadt, Germany).
Rabbit anti-Von Willebrand factor (vWf) antibody and goat anti-rabbit IgG conjugated to Alexa Fluor 488 were obtained from Abcam (Cambridge, MA, USA).

2.2 Cell culture

BCAEC were purchased from Cell Applications, Inc. (San Diego, CA, USA) and grown as previously described (15). In brief, cells were grown with DMEM (5.5 mmol/L glucose, supplemented with 10% FBS and 1% antibiotic-antimycotic solution) at 37 °C in an incubator with a humidified atmosphere of 5% CO2. Before experiments, cells were switched to DMEM with 1% FBS for 12 h to maintain the cells in a quiescent state. The model to simulate diabetes is described in (15) (Figure 1). Endothelial cells were cultured for 12 days to determine the chronic molecular perturbations caused by simulated diabetes and to avoid the early (within 48 h) cell proliferation effects caused by HG (10, 16). In brief, cells were first treated with 100 nmol/L insulin (high-insulin, HI) in normal glucose (NG, 5.5 mmol/L in DMEM) for 3 days (17) and then maintained in high-glucose (HG, 20 mmol/L in DMEM) and constant HI for 9 days. This sequential scheme was intended to mimic the pathophysiological conditions that occur in T2DM patients, wherein hyperinsulinemia precedes hyperglycemia (18). Cells were used between passages 6 and 12. The control group received neither HI nor HG treatment. For selected experiments (binucleation analysis), HCAEC (from a 55-year-old Caucasian male with a history of T2DM for >5 years) were purchased from Cell Applications, Inc. and subjected to the same conditions as BCAEC, but using MesoEndo Growth Medium (Cell Applications, Inc.) to induce proliferation. For simulated diabetes, HCAEC were treated with HI and HG as with BCAEC, but MesoEndo Growth Medium was used instead. For consistency, the group that underwent simulated diabetes (HG + HI) will be referred to as the “experimental group”. All experiments were carried out in triplicate.
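The culture schedule in Section 2.2 can be written down as a small data structure to check that the two treatment phases add up to the stated 12 days. This is an illustrative Python sketch only and not part of the study's methods. The `Phase` class, its field names, and the helper functions are hypothetical, and the control schedule assumes (the text does not state this explicitly) that control cells were kept in NG medium without added insulin for the same 12 days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One culture phase; names and fields are illustrative, not from the paper."""
    name: str
    days: int
    glucose_mmol_per_l: float
    added_insulin_nmol_per_l: float


# Concentrations and durations are taken from Section 2.2.
EXPERIMENTAL = [
    Phase("HI in NG", 3, 5.5, 100.0),   # high insulin, normal glucose
    Phase("HG + HI", 9, 20.0, 100.0),   # high glucose, constant high insulin
]

# Assumption: control cells stay in NG without added insulin for 12 days.
CONTROL = [Phase("NG, no added insulin", 12, 5.5, 0.0)]


def total_days(schedule):
    """Sum the duration of all phases in a schedule."""
    return sum(phase.days for phase in schedule)


def glucose_fold_change(schedule):
    """Ratio of the final phase's glucose concentration to the first phase's."""
    return schedule[-1].glucose_mmol_per_l / schedule[0].glucose_mmol_per_l


if __name__ == "__main__":
    print(total_days(EXPERIMENTAL))                      # total treatment days
    print(round(glucose_fold_change(EXPERIMENTAL), 2))   # HG relative to NG
```

Under these assumptions, the experimental schedule sums to 12 days (3 days of HI followed by 9 days of HG with HI), and the HG medium holds roughly 3.6 times the glucose of the NG medium.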

2.3 Immunofluorescence

As previously described (15), 100,000 cells per well were seeded onto 12-well plates (Corning® CellBIND®) and exposed to simulated diabetes. Thereafter, BCAEC and HCAEC were washed with PBS to remove dead cells and debris. Cells were fixed, permeabilized, and blocked as described before (19). Cells were then incubated with a polyclonal antibody against vWF (1:400, 3% BSA in PBS) overnight at 4 °C and thereafter washed 3x with PBS. Alexa Fluor 488-labeled anti-rabbit antibody (1:400 in PBS) was then applied as a secondary antibody for 1 h at RT, followed by 3 washes with PBS. As a negative control, cells were incubated with the secondary antibody only, to assess non-specific binding. Cell nuclei were stained with Hoechst 33258 (2 µg/ml in HBSS) for 30 min and washed 3x with PBS. Fluorescent images were taken in at least three random fields per condition using an EVOS® FLoid® Cell Imaging Station with a fixed 20x air objective. Image analysis was performed with ImageJ software (version 2.0.0).

2.4 Metabolite extraction

Cells were seeded at 300,000 cells per well in 6-well plates (Corning® CellBIND®) and treated as above. After the HG and HI treatment, metabolites were extracted following a published protocol for adherent cells with some modifications (20) (Figure 1). In brief, after washing the cells 3x with PBS, 500 µL of a cold mixture of methanol:ethanol (50:50, v:v) was added to each well, and the plates were covered with aluminum foil and incubated at -80 °C for 4 h.
Cells were then scraped using a lifter (Fisher Scientific, Hampton, NH, USA), and the supernatant was transferred to Eppendorf tubes before centrifugation for 10 min at 14,000 rpm at 4 °C. The supernatant was transferred to another tube and dried down in a SpeedVac™ System (Thermo Fisher Scientific, Waltham, MA, USA). Samples were reconstituted in water/acetonitrile 95:5 v/v with 0.1% formic acid and centrifuged at 14,000 rpm for 10 min at 4 °C. The particle-free supernatant was recovered for LC-MS2 analysis.

2.5 LC-MS2 data acquisition for metabolomics

Metabolites were loaded into an Eksigent nanoLC 400 system (AB Sciex, Foster City, CA, USA) with a HALO Phenyl-Hexyl column (0.5 x 50 mm, 2.7 µm, 90 Å pore size, Eksigent AB Sciex, Foster City, CA, USA) for data acquisition using the LC-MS parameters previously described with some modifications (21). In brief, metabolites were separated by gradient elution with 0.1% formic acid in water (A) and 0.1% formic acid in ACN (B) as mobile phases at a constant flow rate of 5 µL/min. The gradient started with 5% B for 1 min, followed by a stepped increase to 100% B over 26 min, which was held constant for 4 min. Solvent composition was returned to 5% B over 0.1 min. Column re-equilibration was carried out with 5% mobile phase B for 4 min. Potential carryover was minimized with a blank run (1 µL buffer A) between experimental samples. The eluate from the LC was delivered directly to the TurboV source of a TripleTOF 5600+ mass spectrometer (AB Sciex, Foster City, CA, USA) using electrospray ionization (ESI) under positive mode.
ESI source conditions were set as follows: IonSpray Voltage Floating, 5500 V; source temperature, 350 °C; curtain gas, 20 psi; ion source gases 1 and 2, 40 and 45 psi; declustering potential, 100 V. Data were acquired using information-dependent acquisition (IDA) with high sensitivity mode selected, automatically switching between full-scan MS and MS/MS. The accumulation time for TOF MS was 0.25 s/spectrum over the m/z range 100-1500 Da and for the MS/MS scan was 0.05 s/spectrum over the m/z range 50-1500 Da. The IDA settings were as follows: charge state +1 to +2, intensity 125 cps, exclude isotopes within 6 Da, mass tolerance 50 mDa, and a maximum number of candidate ions of 20. Under IDA settings, “exclude former target ions” was set to 15 s after two occurrences and “dynamic background subtract” was selected. The manufacturer's rolling collision energy (CE) option was used, based on the size and charge of the precursor ion, with the formula CE = m/z × 0.0575 + 9. The instrument was automatically calibrated in batch mode using appropriate positive TOF MS and MS/MS calibration solutions before sample injection and after every two sample injections (<3.5 working hours) to ensure a mass accuracy of <5 ppm for both MS and MS/MS data. Instrument performance was monitored during data acquisition by including QC samples (pooled samples of equal volume) every 4 experimental samples. Data acquisition of experimental samples was also randomized.
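The two numerical acquisition criteria above, the rolling collision energy formula and the <5 ppm mass accuracy target, can be illustrated with a minimal sketch. The function names and the example m/z values are hypothetical and chosen only for illustration. The ppm expression is the conventional definition of relative mass error, which the text does not spell out.

```python
def rolling_collision_energy(mz: float, slope: float = 0.0575, intercept: float = 9.0) -> float:
    """Rolling CE from the formula quoted in the text: CE = m/z x 0.0575 + 9."""
    return mz * slope + intercept


def mass_error_ppm(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (conventional definition, assumed here)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Arbitrary example precursor m/z (illustrative only).
ce = rolling_collision_energy(308.0925)

# Hypothetical calibrant: theoretical m/z 500.000 measured at 500.002.
err = mass_error_ppm(500.002, 500.000)
within_tolerance = abs(err) < 5.0  # the <5 ppm accuracy target stated in the text

print(f"CE = {ce:.2f}, mass error = {err:.2f} ppm, within tolerance: {within_tolerance}")
```

Because the CE is linear in m/z, heavier precursors receive proportionally more collision energy.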

2.6 Metabolomics data processing

Mass detection, chromatogram building and deconvolution, isotopic assignment, feature alignment, and gap-filling (to detect features missed during the initial alignment) from LC-MS2 datasets were performed using XCMS (https://xcmsonline.scripps.edu) (22) and MZmine (23) software. The XCMS pipeline was used for normalization of feature area and statistical analysis. To identify or annotate the metabolites at the chemical structure and class level, the MS2-containing features extracted with MZmine were further analyzed using the Global Natural Products Social Molecular Networking (GNPS) (24), Network Annotation Propagation (NAP) (25), and MS2LDA (26) in silico annotation tools, and Classyfire automated chemical classification (27), as previously described (21) with some modifications. The confidence levels of these annotations are level 2 (probable structure by library spectrum match) and level 3 (tentative candidates), in agreement with the Metabolomics Standards Initiative (MSI) classification (28). Molecular networking, NAP, and Classyfire outputs were integrated using the MolNetEnhancer workflow (29). Molecular networks were visualized using Cytoscape version 3.8.2 (30). In addition, chemical substructures (co-occurring fragments and neutral losses referred to as “mass2motifs” [M2M]) were recognized using the MS2LDA web pipeline (http://www.ms2lda.org) to further annotate metabolites (level 3, MSI). The detailed processing parameters for the XCMS and MZmine pipelines are found in the supporting information.

2.7 Peptidomics data processing

For peptide identification, raw .wiff and .wiff.scan files (the same files used for MZmine and XCMS) from the experimental and control groups were analyzed separately using ProteinPilot software version 4.2 (AB Sciex, Foster City, CA, USA) with the Paragon algorithm. MS1 and MS2 data were searched against the Bos taurus SwissProt sequence database (6006 reviewed proteins plus common protein contaminants, February 2019 release). The input parameters were: sample type, identification; digestion, none; Cys alkylation, none; instrument, TripleTOF 5600; special factors, none; species, Bos taurus; ID focus, biological modifications and amino acid substitutions; search effort, thorough ID. False discovery rate analysis was also performed. All peptides were exported, and those with >90% confidence were linked to the corresponding feature extracted by the XCMS algorithm using their accurate mass and retention time information. For peptide quantification, we employed the normalized feature abundances (MS1 level) generated by XCMS. A significance threshold of p<0.05 (Welch’s t-test) was used.

2.8 Proteomics data reprocessing

The SWATH-based proteomics data (identifier PXD013643), hosted in the ProteomeXchange consortium via PRIDE (31), were reanalyzed with some modifications. The parameters used to build the spectral library remained the same (15), while the parameter for peptides per protein was set to 100 in the software SWATH® Acquisition MicroApp 2.0 in PeakView® version 1.2 (AB Sciex, Foster City, CA, USA).
The resulting protein peak areas were exported to Markerview™ version 1.3 (AB Sciex, Foster City, CA, USA) for further data refinement, including assignment of IDs to files and removal of reversed and common contaminants. Peak areas were exported to a .tsv file and normalized with NormalyzerDE online version 1.3.4 (32). The NormalyzerDE pipeline comprises 8 different normalization methods (Log2, variance stabilizing normalization, total intensity, median, mean, quantile, CycLoess, and robust linear regression). The results of qualitative (MA plots, scatter plots, box plots, density plots) and quantitative (pooled intragroup coefficient of variation [PCV], median absolute deviation [PMAD], estimate of variance [PEV]) parameters were compared between the normalization methods to select the most appropriate one.

2.9 Bioinformatic analysis of proteomics data

Proteins that passed the significance threshold were first converted to their corresponding Entrez Gene identifiers (GeneID) using https://www.uniprot.org/uploadlists/ and then mapped to their human equivalents using the ortholog conversion feature in https://biodbnet-abcc.ncifcrf.gov/db/dbOrtho.php. Bioinformatic analysis was done on the OmicsNet web platform (https://www.omicsnet.ca/) (33, 34).
First, a protein-protein interaction (PPI) molecular network (a first-order network containing the query or seed molecules and their immediate interacting partners) was built using the STRING PPI database (35), and then pathway enrichment analysis was performed using the built-in REACTOME and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. To visualize modules (functional units) contained in the molecular network, the WalkTrap algorithm (within the OmicsNet platform) was employed. The hypergeometric test was used to compute p-values.

2.10 Integrative analysis of proteomics and metabolomics data

The molecular interactions between the proteins and metabolites differentially abundant between HG + HI and NG were determined in OmicsNet (32, 33). The lists of proteins (EntrezGene ID) and metabolites (HMDB ID) were loaded to build a composite network using protein-protein (STRING database selected) and metabolite-protein (KEGG database selected) interaction types. The primary network relied on the metabolite input. Pathway enrichment analysis was performed using the built-in REACTOME and KEGG databases. The hypergeometric test was used to compute p-values.

2.11 Statistical analysis

All experiments were performed in triplicate. Based on the accuracy (determination of real fold-changes) of SWATH-based quantification (36), proteins with a fold change ≥ 1.2 or ≤ 1/1.2 and a p-value <0.05 (Welch’s t-test) were considered differentially abundant between NG and HG + HI conditions. For the metabolomics data, features with a fold change ≥ 1.3 or ≤ 1/1.3 and a p-value <0.05 (Welch’s t-test) were considered differentially abundant. We did not apply multiple-test corrections to calculate adjusted p-values, because this process could obscure proteins or metabolites with real changes (true positives) (37).
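The differential-abundance filter described in this section (a fold-change cutoff combined with Welch’s t-test at p<0.05) can be sketched as follows. This is an illustrative, standard-library-only sketch and not the authors’ pipeline. It assumes the fold change is the ratio of group means, and it obtains the two-sided p-value by numerically integrating the Student t density. All function names and example intensities are hypothetical.

```python
import math
from statistics import mean, variance


def _two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t via Simpson integration of the density."""
    x = abs(t)
    if x == 0.0:
        return 1.0
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda u: norm * (1 + u * u / df) ** (-(df + 1) / 2)
    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def welch_t_test(a, b):
    """Welch's unequal-variance t-test; returns (t, df, p). Assumes non-zero variance."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, _two_sided_p(t, df)


def is_differential(control, treated, fc_cutoff, alpha=0.05):
    """True if fold change >= cutoff or <= 1/cutoff, and Welch p < alpha."""
    fc = mean(treated) / mean(control)  # assumption: ratio of group means
    _, _, p = welch_t_test(treated, control)
    return (fc >= fc_cutoff or fc <= 1.0 / fc_cutoff) and p < alpha


# Hypothetical normalized intensities for one feature, in triplicate.
control = [1.0, 1.1, 0.9]
treated = [2.0, 2.1, 1.9]
print(is_differential(control, treated, fc_cutoff=1.3))  # metabolomics cutoff
print(is_differential(control, treated, fc_cutoff=1.2))  # proteomics cutoff
```

With triplicates, the Welch-Satterthwaite degrees of freedom lie between 2 and 4, so a feature needs a large standardized difference to pass p<0.05.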
Rather than correcting individual features, the analysis focused on top-enriched signaling pathways (adjusted p-value <0.01), which allowed us to determine a set of interacting proteins and metabolites with relevant biological information and helped reduce false positives. For multivariate statistical analysis and heatmap visualization, MetaboAnalyst 4.0 (https://www.metaboanalyst.ca) was used. Principal component analysis (PCA) was used to assess sample clustering behavior and inter-group variation. No scaling was used for PCA and heatmap analysis. PRISM 6.0 (GraphPad Software, San Diego, CA) was used to create volcano plots and column graphs.

2.12 Data availability

The raw datasets supporting the metabolomics results are available in the GNPS/MassIVE public repository (38) under the accession number MSV000084307. The specific parameters of the tools employed for metabolite annotation are available at the following links: for classical molecular networking, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=604b3d077e00430a9bc288eebf154b9b; for feature-based molecular networking (FBMN), https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=5e2839037969442e868d9df21309d561; for NAP, https://proteomics2.ucsd.edu/ProteoSAFe/status.jsp?task=96cda48c0df64d3398a8f9088907afb5; for MS2LDA, http://ms2lda.org/basicviz/summary/1197/ (log-in as a registered or guest user is required); for MolNetEnhancer, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=de80b9c765e042ffab7767a3101054fd.
The quantitative results generated using the XCMS platform can be accessed after logging in at https://xcmsonline.scripps.edu and searching for job number 1395724. SWATH data are accessible on ProteomeXchange with dataset identifier PXD013643.

3. Results

Untargeted metabolomics

Overall, 5571 features or potential metabolites were detected using XCMS and MZmine, of which 957 (~18%) were identified by both platforms (Figure 2A). Based on the relative quantification using XCMS, 140 and 82 features were detected with reduced and increased abundances, respectively, in the experimental group compared to the control group (Figure 2B). The effects of HG and HI in the experimental group are evident in the PCA, wherein the experimental samples clustered away from the control group (Figure 2C). The consistency of the LC-MS equipment is apparent from the clustering of the QC samples (Figure 2C). Further, the heatmap visualization of the top 100 modulated metabolites showed different distribution patterns among groups (Figure 2D). Using the GNPS platform for automatic metabolite annotation, 106 compounds (excluding duplicates and contaminants) were putatively annotated with level 2 confidence (MS2 spectral match) (Table S1), in agreement with the MSI classification (28). Some metabolites identified by the GNPS platform could not be quantified because they were not detected by the XCMS algorithm during feature area normalization and quantification. Moreover, GNPS Molecular Networking aligned the MS2-containing features (n=1,013) based on their structural similarity, creating 118 independent networks or clusters with at least two connected nodes (Figure 3A).
The MolNetEnhancer workflow allowed the putative identification of chemical classes (level 3, MSI) for 56 of the 118 independent networks. The 10 most abundant annotated chemical classes and associated metabolites are shown in Figure 3A. Three clusters from the network were further analyzed because they contained metabolites annotated by spectral matching, which facilitates the annotation of the other nodes in each cluster. Cluster 1 revealed two metabolites linked to the organonitrogen compounds class with reduced abundance in the experimental group (Figure 3B). Library spectral matching (level 2, MSI) suggested PC(16:0/18:1(9Z)) and PC(18:0/18:2(9Z,12Z)) as putative candidates, which was supported by MS2LDA phosphocholine-substructure recognition (Figure 3C). In cluster 2, glutathione-based metabolites (MSI level 3) were detected through fragments m/z 308.0925, 233.0575, 179.0475, and 162.0225 retrieved by the M2M_453 substructure and associated with the glutathione structure using mzCloud in silico predictions (Figure 4A). The precursor ion at m/z 713.1472 and glutathione (annotated at level 2, MSI) were detected with increased abundance in the experimental group. MS2LDA visualization, at the M2M level, correlated with the GNPS molecular networking clustering (Figure 4B). In cluster 3, various phenylalanine-based metabolites were putatively annotated, aided by MS2LDA substructure recognition (Figure 4C and 4D).
Within this cluster, glutamyl-phenylalanine (annotated at level 2, MSI) and the precursor ions at m/z 297.1802 and 487.1548 showed increased abundance in the experimental vs. control group. In addition, various amino acids were annotated (level 2, MSI) by GNPS spectral matching and manual inspection of data (Table S2). Threonine, valine, proline, leucine, serine, glutamic acid, methionine, and tyrosine showed increased abundance (fold change range 1.3-1.7, p<0.05) in the experimental vs. control group. In particular, metabolites linked to the catabolism of tryptophan via the serotonin and kynurenine pathways (39) were annotated (level 2, MSI), including melatonin, acetyl serotonin, and kynurenine (Table S1). However, only kynurenine was significantly elevated in the experimental group. The full list of annotated metabolites, differential abundances, and other relevant feature information is shown in Table S2.

Peptidomics

Experimental and control datasets were analyzed separately to identify the peptides and their biological modifications. The complete list of peptides identified by ProteinPilot in the experimental and control groups is provided in Table S3. Proline oxidation was the most frequent biological modification detected in the experimental group datasets. We identified 8 and 12 peptides with a confidence of >90% in the control and experimental groups, respectively. Differential abundance of 2 proline-rich peptides was observed in the experimental group
compared to the control group. An additional tripeptide was manually annotated with an LPP sequence (Table S4).

Proteomics

The re-analysis of the SWATH data (PXD013643 dataset) facilitated the identification of 952 quantifiable proteins (717 proteins with at least 2 unique peptides, 1% false discovery rate) with no missing values among technical and biological replicates (Table S5). Sample datasets were normalized using 8 different methods to select the most appropriate one for our dataset based on quantitative and qualitative parameters. Quantile normalization produced a better qualitative and quantitative profile and was selected to further process our data (Figure S1). PCA of the normalized data showed a clear separation of the groups, suggesting overall differences in their proteomes (Figure 5A). Differential abundance analysis revealed 32 and 33 proteins with increased and decreased abundance, respectively, in the experimental group (Figure 5B). Further, the heatmap visualization of the top 50 modulated proteins showed different distribution patterns between the experimental and control groups (Figure 5C). To obtain molecular insight, we performed a functional enrichment analysis using a network-based approach. First, we created a composite network comprising PPI between the proteins modulated by simulated diabetes (seed proteins) and their immediate interacting partners (highest confidence, >0.9) retrieved from the STRING database (incorporated in the OmicsNet platform). The principal network built from the up-modulated proteins consisted of 461 proteins, 709 edges, and 18 seed proteins (nodes with blue shadow) and is illustrated in Figure 5D.
Eight modules or clusters were generated, which may represent relevant complexes or functional units (40). The 5 most significant (adjusted p-value <0.05) REACTOME and KEGG pathways in the global network are shown in Table 1. Two modules contained multiple seed proteins and were linked to DNA/RNA and protein metabolism pathways using the WalkTrap algorithm (Figure 5D). On the other hand, the principal network built from the down-modulated proteins consisted of 488 proteins, 513 edges, and 18 seed proteins and contained eleven modules, of which one (with 2 seed proteins) indicated associations with mitochondrial function pathways (Figure 5E).

Integration of Metabolomics and Proteomics

The signaling pathways perturbed by simulated diabetes were identified from a composite network of interacting metabolites and proteins using OmicsNet built-in databases. Figure 6 illustrates the composite bi-layered metabolite-PPI network built from the molecules up-modulated under simulated diabetes, comprising 9 metabolites (seed metabolites), 177 edges, and 166 proteins (5 seed proteins). The 10 most enriched signaling pathways identified in the composite network are shown in Table 2. The two principal modules highlighted by the WalkTrap algorithm were linked to glutathione and amino acid metabolism.
We noted a smaller interaction between Acyl-protein thioesterase 1 (LYPLA1) and a phosphatidylcholine metabolite when simultaneously analyzing up- and down-modulated proteins and metabolites. No significant composite network was identified using the down-modulated proteins and metabolites.

Cellular morphology

To better understand the effects that simulated diabetes exerts on endothelial cells, changes in cellular structure were evaluated. The morphology of endothelial nuclei in the BCAEC control and experimental groups was evaluated using fluorescent staining and image analysis. We also evaluated the presence of vWF (a marker of endothelial cells) in BCAEC and HCAEC, to reveal the cellular boundary and to demonstrate their endothelial phenotype (41). We noted an increase in the percentage of binucleated BCAEC in the experimental group compared to the control group (top panel Figure 7A and 7B). A similar result, with larger nuclei, was observed when using HCAEC as a human in vitro model (bottom panel Figure 7A and 7B). Finally, as expected, we observed a typical intracellular localization of vWF and 100% positivity in endothelial cells.

4. Discussion

This study investigated the molecular perturbations occurring in coronary endothelial cells subjected to prolonged simulated diabetes, which facilitated the identification of signaling pathways and specific molecules that could be associated with the development of cardiovascular disease. To achieve this, we employed an MS-based multi-omics approach coupled to fluorescence microscopy to detect structural changes. Endothelial cells cover the inner surface of blood vessels and are distributed across the body. Their functions include acting as a mechanical barrier between the circulating blood and adjacent tissues as well as modulating multiple functions in distinct organs (42).
These regulatory functions vary according to localization and vascular bed of origin (43). HG levels in blood are detrimental to endothelial cell function in T2DM, leading to coronary endothelial dysfunction and development of CVD (44, 45). The molecular effects of HG on endothelial cells have been previously characterized (4, 6, 7, 10, 11); nevertheless, the endothelial cell types used in these studies are not intrinsically involved in CVD. The present study used an in vitro model involving endothelial cells that modulate heart function, CAEC (46). Our model not only used HG (20 mmol/L) to simulate diabetes (4, 6, 7, 10, 11) but first induced insulin resistance to mimic the pathophysiological conditions that occur in T2DM, wherein hyperinsulinemia precedes hyperglycemia (18). Diabetes was simulated for up to 12 days to mimic chronic HG exposure and to avoid measuring the cell proliferation known to occur in early HG (10, 16). Despite the lack of an apparent increase in cell proliferation in the experimental group compared to the control group after twelve days, an increase in overall protein abundance was detected by Bradford assay (data not shown) and inferred from the total ion chromatogram (TIC) of MS (Figure S1A). We suggest that protein synthesis is increased as a consequence of the higher proportion of binucleated CAEC (with increased DNA/RNA metabolism) under HG + HI compared to the control cohort (Figure 7A and 7B).
Previous studies have shown reduced endothelial cell proliferation (mostly in HUVEC) after long-term (7-14 days) HG exposure (4, 11, 47-53), accompanied by an increase in protein synthesis (53). This MS-based methodological pipeline, which included appropriate controls during data acquisition (QC) and processing (e.g., normalization, filtering, annotation, dereplication, etc.), allowed the identification of global changes in the metabolome of CAEC under HG + HI. Specifically, increased abundance of valine, leucine, tyrosine, serine, proline, methionine, and glutamic acid in cells under HG conditions was observed, which is consistent with reports on human aortic endothelial cells (54). Notably, several clinical studies have established a direct relationship between prevalence/incidence of T2DM and increased levels of valine, leucine, and tyrosine in serum and plasma (55-59). Our results support a role for CAEC in contributing to the elevated pool of amino acids seen in circulation under an HG environment. We speculate that increased levels of these amino acids could result from either increased production or reduced degradation, as suggested in endothelial cells (immortalized cell line, EA.hy 926) that transition from a glycolytic metabolism towards lipid and amino acid oxidation when challenged by HG (60). Furthermore, evidence of increased tryptophan catabolism through the kynurenine pathway was identified. In this regard, a non-significant decrease of ~40% in the abundance of tryptophan was detected. However, a significant increase of ~450% in kynurenine (tryptophan’s main metabolite) (61) between the HG + HI group and NG group was also observed, which is a key finding as elevated plasma levels of kynurenine are known to increase CVD risk (62, 63).
This novel finding contributes to expanding the understanding of amino acid metabolism in endothelial cells under simulated diabetes. Acetyl serotonin and melatonin, which are components of the serotonin pathway that degrades tryptophan (64), were also detected, with only minor increases in abundance (20-30%) in the HG + HI group compared to control. Differences in glutathione (cysteine-glutamic acid-glycine tripeptide) metabolism in CAEC were also found, suggesting an increased response to oxidative stress (65). In line with this observation, previous research reported a glutathione-dependent reaction to ambient HG in artery-derived endothelial cells (66, 67), but the same could not be observed in vein-derived endothelial cells (68, 69). This emphasizes the different responses to HG among endothelial phenotypes. Here, novel evidence is provided of the up-regulation of glutathione-based metabolites. The composite protein network suggested an increase in glutathione metabolism, supported by elevated levels of oxidized glutathione and one of its synthetic precursors, glutamic acid. At the protein level, peroxiredoxin (PRDX2 and PRDX6) and thioredoxin (TXN2, mitochondrial), which are part of the cell's natural enzymatic defense against oxidative stress (70), showed increased abundances in the experimental group.
The substructure analysis of metabolomics data facilitated the identification of glutamic acid- and phenylalanine-based metabolites, presumably di- or tri-peptides, including the annotated metabolite glutamyl-phenylalanine. Furthermore, the CAEC peptidome analysis suggested an increase in proline-containing peptides. This type of peptide is of particular interest because of its resistance to non-specific proteolytic degradation, body distribution, and remarkable biological effects (71-74). Yet, the precise function of such phenylalanine-, glutamine-, and proline-based peptides remains to be characterized in CAEC. We can only speculate that they are the result of a compensatory mechanism to reduce glucose-induced cellular damage. Also, increased protein abundance of core and regulatory subunits of the proteasome complex (PSMA4 and PSMD3) was found in cells under simulated diabetes. This suggests increased protein degradation and subsequent peptide formation in response to HG. Metabolomic profiling also revealed changes in the lipidome of CAEC challenged with HG + HI, wherein a reduction in phosphatidylcholine (PC) lipids and a subsequent increase in phosphocholine were noted. Changes in the phospholipidomic profile of bovine aortic endothelial cells treated with HG for 24 h have also been reported in a lipidome study (75). Here, proteomics and metabolomics data were manually integrated, which allowed us to identify critical roles for PAFAH1B2 and LYPLA1 in mediating the degradation of PC lipids (Figure 8). PAFAH1B2 was found to be up-regulated in this study and is known to be associated with inflammation and higher levels of lysoPC (76). As a result, PAFAH1B2 could increase the pool of lysoPC lipids, further exacerbating inflammation in the cardiovascular system (77).
On the other hand, LYPLA1 has a lysophospholipase activity that can hydrolyze a range of lysophospholipids, including lysoPC, thereby generating a fatty acid and glycerophosphocholine as products (78). Increased levels of phosphocholine (~460%) were detected in HG-treated cells compared to control, which could be associated with the degradation of lysoPC lipids. It should be noted that pathway databases such as KEGG and REACTOME have limitations when dealing with lipid metabolites because the chemical diversity of lipids is not well annotated or defined within them. For example, KEGG assigns lipids a chemical class identifier instead of an individual identity, which obscures their biological importance (79). Thus, based on our manual inspection of the metabolomics-proteomics data and in line with the evidence, we suggest that simulated diabetes evokes inflammation in BCAEC and that PAFAH1B2 and LYPLA1 play a role in modulating this process.

Previously, we reported the multinucleation of CAEC cultured under simulated diabetes (15). Such cells possess ≥2 nuclei. Here, we replicated our previous findings of increased binucleation in BCAEC. The same outcome was obtained when using HCAEC as a human in vitro model (Figure 7A and 7B), validating the binucleation process in other CAEC.
After refinement of LC-MS2 data and bioinformatics re-processing of published SWATH-based datasets of BCAEC under simulated diabetes (15), molecular signatures and pathways that could be linked to the binucleation process were found (Figure 8). For instance, under simulated diabetes we noted an increased abundance of proteins with reported nuclear localization and links to DNA metabolism, including the ribosomal proteins RPS7, RPS13, and RPL9 (80). Further, we observed an increased abundance of the proteasome proteins PSMA4 and PSMD3, which are linked to protein metabolism (81). Hence, we infer that CAEC binucleation occurs as a compensatory mechanism that increases the cell's capacity to metabolize excess ambient glucose by expanding its metabolic machinery (transcription/translation processes). Although an increase in cell proliferation could drive a coordinated increase in ribosomal and proteasome proteins, we do not believe this is the case here, as mentioned before. After 4-5 days of simulated diabetes, cells occupied 100% of the well surface, leaving no room for additional cells because endothelial cells grow as a monolayer. This is consistent with findings that highly confluent endothelial cells stop growing due to cell-cell contact, even in the presence of growth factors (82). In support of this, up-stream (CTGF and CD62) (83, 84) and down-stream (FABP4) (85) proteins involved in angiogenesis and proliferation were down-regulated by simulated diabetes (Table S5). Importantly, there is evidence (not in endothelial cells) of cellular processes that stimulate binucleation without increasing cell proliferation, including enhancement of antimicrobial defenses (86), senescence (87), and malignancy (88).
Various mechanisms have been linked to the binucleation process, such as cytokinesis failure, cellular fusion, mitotic slippage, and endoreduplication (89). The elucidation of the exact molecular mechanisms leading to the binucleation of CAEC is beyond the scope of our study.

In conclusion, this study applied an integrated multi-omics and bioinformatics/chemoinformatics approach to characterize the molecular perturbations that simulated diabetes exerts on CAEC. We confirmed the findings of several independent studies that reported alterations at the protein and metabolite levels in endothelial cells from sources other than coronary vessels. Metabolomics identified alterations in amino acid, peptide, and phospholipid metabolism. Notably, the chemoinformatic analysis identified previously unreported alterations of phenylalanine-, glutathione-, and proline-based peptides in coronary endothelium under simulated diabetes. Proteomics provided evidence of reduced mitochondrial mass and angiogenesis. The integration of proteomics and metabolomics identified increased glutamic acid metabolism and suggested that antioxidant enzymes are involved in protecting the cells from oxidative stress. Fluorescence microscopy revealed the appearance of non-proliferative binucleated CAEC, which we infer arise as a means to metabolize excess ambient glucose.
Overall, our study improved the understanding of the molecular disturbances caused by simulated diabetes that could mediate CAEC dysfunction and may be relevant in the context of CVD in subjects with T2DM.

5. Acknowledgements

This work was derived in part from the Thesis Project of H.C.D.H. at the Posgrado en Ciencias de la Vida, CICESE. We thank Alan G. Hernández-Melgar for his invaluable technical assistance with the NormalyzerDE software.

6. Funding

Part of this work was supported by CICESE (Grant No. 685109 to AMU and Internal Project No. 685-110 from CAD), NIH R01 DK98717 (to FV), and VA Merit-I01 BX3230 (to FV).
7. Conflict of interest

Dr. Villarreal is a co-founder and stockholder of Cardero Therapeutics, Inc.

8. Author contributions

A.M.U. contributed to the study conception and design, data acquisition, formal analysis, methodology, project administration, and funding acquisition. H.C.D.H., L.D.M., and R.A.C.C. contributed to the data acquisition, formal analysis, and interpretation of some experiments. C.A.D. and F.V. contributed to funding acquisition and resources. O.M.P. contributed to data interpretation and critical revision of the manuscript. All authors contributed to the drafting, revising, and approval of the final version of the manuscript.

References

1. Halcox, J. P.; Schenke, W. H.; Zalos, G.; Mincemoyer, R.; Prasad, A.; Waclawiw, M. A.; Nour, K. R.; Quyyumi, A. A., Prognostic value of coronary vascular endothelial dysfunction. Circulation 2002, 106, (6), 653-8.
2.
Lind, M.; Wedel, H.; Rosengren, A., Excess Mortality among Persons with Type 2 Diabetes. N Engl J Med 2016, 374, (8), 788-9.
3. Gutierrez, E.; Flammer, A. J.; Lerman, L. O.; Elizaga, J.; Lerman, A.; Fernandez-Aviles, F., Endothelial dysfunction over the course of coronary artery disease. Eur Heart J 2013, 34, (41), 3175-81.
4. Lorenzi, M.; Cagliero, E.; Toledo, S., Glucose toxicity for human endothelial cells in culture. Delayed replication, disturbed cell cycle, and accelerated death. Diabetes 1985, 34, (7), 621-7.
5. Kageyama, S.; Yokoo, H.; Tomita, K.; Kageyama-Yahara, N.; Uchimido, R.; Matsuda, N.; Yamamoto, S.; Hattori, Y., High glucose-induced apoptosis in human coronary artery endothelial cells involves up-regulation of death receptors. Cardiovasc Diabetol 2011, 10, 73.
6. Dubois, S.; Madec, A. M.; Mesnier, A.; Armanet, M.; Chikh, K.; Berney, T.; Thivolet, C., Glucose inhibits angiogenesis of isolated human pancreatic islets. J Mol Endocrinol 2010, 45, (2), 99-105.
7. Lorenzi, M.; Montisano, D. F.; Toledo, S.; Barrieux, A., High glucose induces DNA damage in cultured human endothelial cells. J Clin Invest 1986, 77, (1), 322-5.
8. Patel, H.; Chen, J.; Das, K. C.; Kavdia, M., Hyperglycemia induces differential change in oxidative stress at gene expression and functional levels in HUVEC and HMVEC. Cardiovasc Diabetol 2013, 12, 142.
9. Pala, L.; Pezzatini, A.; Dicembrini, I.; Ciani, S.; Gelmini, S.; Vannelli, B. G.; Cresci, B.; Mannucci, E.; Rotella, C. M., Different modulation of dipeptidyl peptidase-4 activity between microvascular and macrovascular human endothelial cells. Acta Diabetol 2012, 49 Suppl 1, S59-63.
10. Esposito, C.; Fasoli, G.; Plati, A. R.; Bellotti, N.; Conte, M. M.; Cornacchia, F.; Foschi, A.; Mazzullo, T.; Semeraro, L.; Dal Canton, A., Long-term exposure to high glucose up-regulates VCAM-induced endothelial cell adhesiveness to PBMC.
Kidney Int 2001, 59, (5), 1842-9.
11. Baumgartner-Parzer, S. M.; Wagner, L.; Pettermann, M.; Grillari, J.; Gessl, A.; Waldhausl, W., High-glucose--triggered apoptosis in cultured endothelial cells. Diabetes 1995, 44, (11), 1323-7.
12. Ramirez-Sanchez, I.; Rodriguez, A.; Moreno-Ulloa, A.; Ceballos, G.; Villarreal, F., (-)-Epicatechin-induced recovery of mitochondria from simulated diabetes: Potential role of endothelial nitric oxide synthase. Diab Vasc Dis Res 2016, 13, (3), 201-10.
13. Liu, T.; Gong, J.; Chen, Y.; Jiang, S., Periodic vs constant high glucose in inducing pro-inflammatory cytokine expression in human coronary artery endothelial cells. Inflamm Res 2013, 62, (7), 697-701.
14. Liu, T. S.; Pei, Y. H.; Peng, Y. P.; Chen, J.; Jiang, S. S.; Gong, J. B., Oscillating high glucose enhances oxidative stress and apoptosis in human coronary artery endothelial cells. J Endocrinol Invest 2014, 37, (7), 645-51.
15. Hilda Carolina Delgado De la Herrán, L. D.-M., Carolina Álvarez-Delgado, Francisco Villarreal, Aldo Moreno-Ulloa, Formation of multinucleated variant endothelial cells with altered mitochondrial function in cultured coronary endothelium under simulated diabetes. bioRxiv 2019.
16. Li, X. X.; Liu, Y. M.; Li, Y. J.; Xie, N.; Yan, Y. F.; Chi, Y. L.; Zhou, L.; Xie, S. Y.; Wang, P. Y., High glucose concentration induces endothelial cell proliferation by regulating cyclin-D2-related miR-98. J Cell Mol Med 2016, 20, (6), 1159-69.
17.
Madonna, R.; De Caterina, R., Prolonged exposure to high insulin impairs the endothelial PI3-kinase/Akt/nitric oxide signalling. Thromb Haemost 2009, 101, (2), 345-50.
18. Zaccardi, F.; Webb, D. R.; Yates, T.; Davies, M. J., Pathophysiology of type 1 and type 2 diabetes mellitus: a 90-year perspective. Postgrad Med J 2016, 92, (1084), 63-9.
19. Moreno-Ulloa, A.; Miranda-Cervantes, A.; Licea-Navarro, A.; Mansour, C.; Beltran-Partida, E.; Donis-Maturano, L.; Delgado De la Herran, H. C.; Villarreal, F.; Alvarez-Delgado, C., (-)-Epicatechin stimulates mitochondrial biogenesis and cell growth in C2C12 myotubes via the G-protein coupled estrogen receptor. Eur J Pharmacol 2018, 822, 95-107.
20. Kirkwood, J. S.; Maier, C.; Stevens, J. F., Simultaneous, untargeted metabolic profiling of polar and nonpolar metabolites by LC-Q-TOF mass spectrometry. Curr Protoc Toxicol 2013, Chapter 4, Unit4 39.
21. Moreno-Ulloa, A.; Sicairos Diaz, V.; Tejeda-Mora, J. A.; Macias Contreras, M. I.; Castillo, F. D.; Guerrero, A.; Gonzalez Sanchez, R.; Mendoza-Porras, O.; Vazquez Duhalt, R.; Licea-Navarro, A., Chemical Profiling Provides Insights into the Metabolic Machinery of Hydrocarbon-Degrading Deep-Sea Microbes. mSystems 2020, 5, (6).
22. Gowda, H.; Ivanisevic, J.; Johnson, C. H.; Kurczy, M. E.; Benton, H. P.; Rinehart, D.; Nguyen, T.; Ray, J.; Kuehl, J.; Arevalo, B.; Westenskow, P. D.; Wang, J.; Arkin, A. P.; Deutschbauer, A. M.; Patti, G. J.; Siuzdak, G., Interactive XCMS Online: simplifying advanced metabolomic data processing and subsequent statistical analyses. Anal Chem 2014, 86, (14), 6931-9.
23. Pluskal, T.; Castillo, S.; Villar-Briones, A.; Oresic, M., MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics 2010, 11, 395.
24. Aron, A. T.; Gentry, E. C.; McPhail, K. L.; Nothias, L.
F.; Nothias-Esposito, M.; Bouslimani, A.; Petras, D.; Gauglitz, J. M.; Sikora, N.; Vargas, F.; van der Hooft, J. J. J.; Ernst, M.; Kang, K. B.; Aceves, C. M.; Caraballo-Rodriguez, A. M.; Koester, I.; Weldon, K. C.; Bertrand, S.; Roullier, C.; Sun, K.; Tehan, R. M.; Boya, P. C.; Christian, M. H.; Gutierrez, M.; Ulloa, A. M.; Tejeda Mora, J. A.; Mojica-Flores, R.; Lakey-Beitia, J.; Vasquez-Chaves, V.; Zhang, Y.; Calderon, A. I.; Tayler, N.; Keyzers, R. A.; Tugizimana, F.; Ndlovu, N.; Aksenov, A. A.; Jarmusch, A. K.; Schmid, R.; Truman, A. W.; Bandeira, N.; Wang, M.; Dorrestein, P. C., Reproducible molecular networking of untargeted mass spectrometry data using GNPS. Nat Protoc 2020.
25. da Silva, R. R.; Wang, M.; Nothias, L. F.; van der Hooft, J. J. J.; Caraballo-Rodriguez, A. M.; Fox, E.; Balunas, M. J.; Klassen, J. L.; Lopes, N. P.; Dorrestein, P. C., Propagating annotations of molecular networks using in silico fragmentation. PLoS Comput Biol 2018, 14, (4), e1006089.
26. van der Hooft, J. J.; Wandy, J.; Barrett, M. P.; Burgess, K. E.; Rogers, S., Topic modeling for untargeted substructure exploration in metabolomics. Proc Natl Acad Sci U S A 2016, 113, (48), 13738-13743.
27. Djoumbou Feunang, Y.; Eisner, R.; Knox, C.; Chepelev, L.; Hastings, J.; Owen, G.; Fahy, E.; Steinbeck, C.; Subramanian, S.; Bolton, E.; Greiner, R.; Wishart, D. S., ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J Cheminform 2016, 8, 61.
28. Schymanski, E. L.; Jeon, J.; Gulde, R.; Fenner, K.; Ruff, M.; Singer, H. P.; Hollender, J., Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ Sci Technol 2014, 48, (4), 2097-8.
29. Ernst, M.; Kang, K. B.; Caraballo-Rodriguez, A. M.; Nothias, L. F.; Wandy, J.; Chen, C.; Wang, M.; Rogers, S.; Medema, M. H.; Dorrestein, P. C.; van der Hooft, J. J. J., MolNetEnhancer: Enhanced Molecular Networks by Integrating Metabolome Mining and Annotation Tools. Metabolites 2019, 9, (7).
30. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13, (11), 2498-504.
31. Perez-Riverol, Y.; Csordas, A.; Bai, J.; Bernal-Llinares, M.; Hewapathirana, S.; Kundu, D. J.; Inuganti, A.; Griss, J.; Mayer, G.; Eisenacher, M.; Perez, E.; Uszkoreit, J.; Pfeuffer, J.; Sachsenberg, T.; Yilmaz, S.; Tiwary, S.; Cox, J.; Audain, E.; Walzer, M.; Jarnuczak, A. F.; Ternent, T.; Brazma, A.; Vizcaino, J. A., The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res 2019, 47, (D1), D442-D450.
32. Willforss, J.; Chawade, A.; Levander, F., NormalyzerDE: Online Tool for Improved Normalization of Omics Expression Data and High-Sensitivity Differential Expression Analysis. J Proteome Res 2019, 18, (2), 732-740.
33. Zhou, G.; Xia, J., Using OmicsNet for Network Integration and 3D Visualization. Curr Protoc Bioinformatics 2019, 65, (1), e69.
34. Zhou, G.; Xia, J., OmicsNet: a web-based tool for creation and visual analysis of biological networks in 3D space. Nucleic Acids Res 2018, 46, (W1), W514-W522.
35.
Szklarczyk, D.; Franceschini, A.; Wyder, S.; Forslund, K.; Heller, D.; Huerta-Cepas, J.; Simonovic, M.; Roth, A.; Santos, A.; Tsafou, K. P.; Kuhn, M.; Bork, P.; Jensen, L. J.; von Mering, C., STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 2015, 43, (Database issue), D447-52.
36. Muntel, J.; Kirkpatrick, J.; Bruderer, R.; Huang, T.; Vitek, O.; Ori, A.; Reiter, L., Comparison of Protein Quantification in a Complex Background by DIA and TMT Workflows with Fixed Instrument Time. J Proteome Res 2019, 18, (3), 1340-1351.
37. Pascovici, D.; Handler, D. C.; Wu, J. X.; Haynes, P. A., Multiple testing corrections in quantitative proteomics: A useful but blunt tool. Proteomics 2016, 16, (18), 2448-53.
38. Wang, M.; Carver, J. J.; Phelan, V. V.; Sanchez, L. M.; Garg, N.; Peng, Y.; Nguyen, D. D.; Watrous, J.; Kapono, C. A.; Luzzatto-Knaan, T.; Porto, C.; Bouslimani, A.; Melnik, A. V.; Meehan, M. J.; Liu, W. T.; Crusemann, M.; Boudreau, P. D.; Esquenazi, E.; Sandoval-Calderon, M.; Kersten, R. D.; Pace, L. A.; Quinn, R. A.; Duncan, K. R.; Hsu, C. C.; Floros, D. J.; Gavilan, R. G.; Kleigrewe, K.; Northen, T.; Dutton, R. J.; Parrot, D.; Carlson, E. E.; Aigle, B.; Michelsen, C. F.; Jelsbak, L.; Sohlenkamp, C.; Pevzner, P.; Edlund, A.; McLean, J.; Piel, J.; Murphy, B. T.; Gerwick, L.; Liaw, C. C.; Yang, Y. L.; Humpf, H. U.; Maansson, M.; Keyzers, R. A.; Sims, A. C.; Johnson, A. R.; Sidebottom, A. M.; Sedio, B. E.; Klitgaard, A.; Larson, C. B.; P, C. A. B.; Torres-Mendoza, D.; Gonzalez, D. J.; Silva, D. B.; Marques, L. M.; Demarque, D. P.; Pociute, E.; O'Neill, E. C.; Briand, E.; Helfrich, E. J. N.; Granatosky, E. A.; Glukhov, E.; Ryffel, F.; Houson, H.; Mohimani, H.; Kharbush, J. J.; Zeng, Y.; Vorholt, J. A.; Kurita, K. L.; Charusanti, P.; McPhail, K. L.; Nielsen, K. F.; Vuong, L.; Elfeki, M.; Traxler, M. F.; Engene, N.; Koyama, N.; Vining, O.
B.; Baric, R.; Silva, R. R.; Mascuch, S. J.; Tomasi, S.; Jenkins, S.; Macherla, V.; Hoffman, T.; Agarwal, V.; Williams, P. G.; Dai, J.; Neupane, R.; Gurr, J.; Rodriguez, A. M. C.; Lamsa, A.; Zhang, C.; Dorrestein, K.; Duggan, B. M.; Almaliti, J.; Allard, P. M.; Phapale, P.; Nothias, L. F.; Alexandrov, T.; Litaudon, M.; Wolfender, J. L.; Kyle, J. E.; Metz, T. O.; Peryea, T.; Nguyen, D. T.; VanLeer, D.; Shinn, P.; Jadhav, A.; Muller, R.; Waters, K. M.; Shi, W.; Liu, X.; Zhang, L.; Knight, R.; Jensen, P. R.; Palsson, B. O.; Pogliano, K.; Linington, R. G.; Gutierrez, M.; Lopes, N. P.; Gerwick, W. H.; Moore, B. S.; Dorrestein, P. C.; Bandeira, N., Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat Biotechnol 2016, 34, (8), 828-837.
39. Bender, D. A., Biochemistry of tryptophan in health and disease. Mol Aspects Med 1983, 6, (2), 101-97.
40. Poyatos, J. F.; Hurst, L. D., How biologically relevant are interaction-based modules in protein networks? Genome Biol 2004, 5, (11), R93.
41. Muller, A. M.; Hermanns, M. I.; Skrzynski, C.; Nesslinger, M.; Muller, K. M.; Kirkpatrick, C. J., Expression of the endothelial markers PECAM-1, vWf, and CD34 in vivo and in vitro. Exp Mol Pathol 2002, 72, (3), 221-9.
42. Aird, W. C., Phenotypic heterogeneity of the endothelium: II. Representative vascular beds. Circ Res 2007, 100, (2), 174-90.
43. Aird, W. C., Endothelial cell heterogeneity.
Cold Spring Harb Perspect Med 2012, 2, (1), a006429.
44. Widlansky, M. E.; Gokce, N.; Keaney, J. F., Jr.; Vita, J. A., The clinical implications of endothelial dysfunction. J Am Coll Cardiol 2003, 42, (7), 1149-60.
45. Ganz, P.; Vita, J. A., Testing endothelial vasomotor function: nitric oxide, a multipotent molecule. Circulation 2003, 108, (17), 2049-53.
46. Paulus, W. J.; Vantrimpont, P. J.; Shah, A. M., Paracrine coronary endothelial control of left ventricular function in humans. Circulation 1995, 92, (8), 2119-26.
47. Abe, M.; Ono, J.; Sato, Y.; Okeda, T.; Takaki, R., Effects of glucose and insulin on cultured human microvascular endothelial cells. Diabetes Res Clin Pract 1990, 9, (3), 287-95.
48. Du, X. L.; Sui, G. Z.; Stockklauser-Farber, K.; Weiss, J.; Zink, S.; Schwippert, B.; Wu, Q. X.; Tschope, D.; Rosen, P., Introduction of apoptosis by high proinsulin and glucose in cultured human umbilical vein endothelial cells is mediated by reactive oxygen species. Diabetologia 1998, 41, (3), 249-56.
49. Graier, W. F.; Grubenthal, I.; Dittrich, P.; Wascher, T. C.; Kostner, G. M., Intracellular mechanism of high D-glucose-induced modulation of vascular cell proliferation. Eur J Pharmacol 1995, 294, (1), 221-9.
50. Kamal, K.; Du, W.; Mills, I.; Sumpio, B. E., Antiproliferative effect of elevated glucose in human microvascular endothelial cells. J Cell Biochem 1998, 71, (4), 491-501.
51. Lorenzi, M.; Nordberg, J. A.; Toledo, S., High glucose prolongs cell-cycle traversal of cultured human endothelial cells. Diabetes 1987, 36, (11), 1261-7.
52. Quagliaro, L.; Piconi, L.; Assaloni, R.; Martinelli, L.; Motz, E.; Ceriello, A., Intermittent high glucose enhances apoptosis related to oxidative stress in human umbilical vein endothelial cells: the role of protein kinase C and NAD(P)H-oxidase activation. Diabetes 2003, 52, (11), 2795-804.
53.
McGinn, S.; Poronnik, P.; King, M.; Gallery, E. D.; Pollock, C. A., High glucose and endothelial cell growth: novel effects independent of autocrine TGF-beta 1 and hyperosmolarity. Am J Physiol Cell Physiol 2003, 284, (6), C1374-86.
54. Yuan, W.; Zhang, J.; Li, S.; Edwards, J. L., Amine metabolomics of hyperglycemic endothelial cells using capillary LC-MS with isobaric tagging. J Proteome Res 2011, 10, (11), 5242-50.
55. Chen, S.; Akter, S.; Kuwahara, K.; Matsushita, Y.; Nakagawa, T.; Konishi, M.; Honda, T.; Yamamoto, S.; Hayashi, T.; Noda, M.; Mizoue, T., Serum amino acid profiles and risk of type 2 diabetes among Japanese adults in the Hitachi Health Study. Sci Rep 2019, 9, (1), 7010.
56. Lai, M.; Liu, Y.; Ronnett, G. V.; Wu, A.; Cox, B. J.; Dai, F. F.; Rost, H. L.; Gunderson, E. P.; Wheeler, M. B., Amino acid and lipid metabolism in post-gestational diabetes and progression to type 2 diabetes: A metabolic profiling study. PLoS Med 2020, 17, (5), e1003112.
57. Lu, Y.; Wang, Y.; Liang, X.; Zou, L.; Ong, C. N.; Yuan, J. M.; Koh, W. P.; Pan, A., Serum Amino Acids in Association with Prevalent and Incident Type 2 Diabetes in A Chinese Population. Metabolites 2019, 9, (1).
58. Menni, C.; Fauman, E.; Erte, I.; Perry, J. R.; Kastenmuller, G.; Shin, S. Y.; Petersen, A. K.; Hyde, C.; Psatha, M.; Ward, K. J.; Yuan, W.; Milburn, M.; Palmer, C. N.; Frayling, T. M.; Trimmer, J.; Bell, J. T.; Gieger, C.; Mohney, R. P.; Brosnan, M. J.; Suhre, K.; Soranzo, N.; Spector, T.
D., Biomarkers for type 2 diabetes and impaired fasting glucose using a nontargeted metabolomics approach. Diabetes 2013, 62, (12), 4270-6.
59. Wang, T. J.; Larson, M. G.; Vasan, R. S.; Cheng, S.; Rhee, E. P.; McCabe, E.; Lewis, G. D.; Fox, C. S.; Jacques, P. F.; Fernandez, C.; O'Donnell, C. J.; Carr, S. A.; Mootha, V. K.; Florez, J. C.; Souza, A.; Melander, O.; Clish, C. B.; Gerszten, R. E., Metabolite profiles and the risk of developing diabetes. Nat Med 2011, 17, (4), 448-53.
60. Koziel, A.; Woyda-Ploszczyca, A.; Kicinska, A.; Jarmuszkiewicz, W., The influence of high glucose on the aerobic metabolism of endothelial EA.hy926 cells. Pflugers Arch 2012, 464, (6), 657-69.
61. Badawy, A. A., Kynurenine Pathway of Tryptophan Metabolism: Regulatory and Functional Aspects. Int J Tryptophan Res 2017, 10, 1178646917691938.
62. Pedersen, E. R.; Tuseth, N.; Eussen, S. J.; Ueland, P. M.; Strand, E.; Svingen, G. F.; Midttun, O.; Meyer, K.; Mellgren, G.; Ulvik, A.; Nordrehaug, J. E.; Nilsen, D. W.; Nygard, O., Associations of plasma kynurenines with risk of acute myocardial infarction in patients with stable angina pectoris. Arterioscler Thromb Vasc Biol 2015, 35, (2), 455-62.
63. Sulo, G.; Vollset, S. E.; Nygard, O.; Midttun, O.; Ueland, P. M.; Eussen, S. J.; Pedersen, E. R.; Tell, G. S., Neopterin and kynurenine-tryptophan ratio as predictors of coronary events in older adults, the Hordaland Health Study. Int J Cardiol 2013, 168, (2), 1435-40.
64. Polyzos, K. A.; Ketelhuth, D. F., The role of the kynurenine pathway of tryptophan metabolism in cardiovascular disease. An emerging field. Hamostaseologie 2015, 35, (2), 128-36.
65. Aquilano, K.; Baldelli, S.; Ciriolo, M. R., Glutathione: new roles in redox signaling for an old antioxidant. Front Pharmacol 2014, 5, 196.
66. Yuan, W.; Edwards, J.
L., Thiol metabolomics of endothelial cells using capillary liquid chromatography mass spectrometry with isotope coded affinity tags. J Chromatogr A 2011, 1218, (18), 2561-8.
67. Weidig, P.; McMaster, D.; Bayraktutan, U., High glucose mediates pro-oxidant and antioxidant enzyme activities in coronary endothelial cells. Diabetes Obes Metab 2004, 6, (6), 432-41.
68. Felice, F.; Lucchesi, D.; di Stefano, R.; Barsotti, M. C.; Storti, E.; Penno, G.; Balbarini, A.; Del Prato, S.; Pucci, L., Oxidative stress in response to high glucose levels in endothelial cells and in endothelial progenitor cells: evidence for differential glutathione peroxidase-1 expression. Microvasc Res 2010, 80, (3), 332-8.
69. Kashiwagi, A.; Asahina, T.; Ikebuchi, M.; Tanaka, Y.; Takagi, Y.; Nishio, Y.; Kikkawa, R.; Shigeta, Y., Abnormal glutathione metabolism and increased cytotoxicity caused by H2O2 in human umbilical vein endothelial cells cultured in high glucose medium. Diabetologia 1994, 37, (3), 264-9.
70. Hanschmann, E. M.; Godoy, J. R.; Berndt, C.; Hudemann, C.; Lillig, C. H., Thioredoxins, glutaredoxins, and peroxiredoxins--molecular mechanisms and health significance: from cofactors to antioxidants to redox signaling. Antioxid Redox Signal 2013, 19, (13), 1539-605.
71. Scocchi, M.; Tossi, A.; Gennaro, R., Proline-rich antimicrobial peptides: converging to a non-lytic mechanism of action. Cell Mol Life Sci 2011, 68, (13), 2317-30.
72.
Migliaccio, A.; Castoria, G.; de Falco, A.; Bilancio, A.; Giovannelli, P.; Di Donato, M.; Marino, I.; Yamaguchi, H.; Appella, E.; Auricchio, F., Polyproline and Tat transduction peptides in the study of the rapid actions of steroid receptors. Steroids 2012, 77, (10), 974-8.
73. Radicioni, G.; Stringaro, A.; Molinari, A.; Nocca, G.; Longhi, R.; Pirolli, D.; Scarano, E.; Iavarone, F.; Manconi, B.; Cabras, T.; Messana, I.; Castagnola, M.; Vitali, A., Characterization of the cell penetrating properties of a human salivary proline-rich peptide. Biochim Biophys Acta 2015, 1848, (11 Pt A), 2868-77.
74. Vanhoof, G.; Goossens, F.; De Meester, I.; Hendriks, D.; Scharpe, S., Proline motifs in peptides and their biological processing. FASEB J 1995, 9, (9), 736-44.
75. Colombo, S.; Melo, T.; Martinez-Lopez, M.; Carrasco, M. J.; Domingues, M. R.; Perez-Sala, D.; Domingues, P., Phospholipidome of endothelial cells shows a different adaptation response upon oxidative, glycative and lipoxidative stress. Sci Rep 2018, 8, (1), 12365.
76. De Keyzer, D.; Karabina, S. A.; Wei, W.; Geeraert, B.; Stengel, D.; Marsillach, J.; Camps, J.; Holvoet, P.; Ninio, E., Increased PAFAH and oxidized lipids are associated with inflammation and atherosclerosis in hypercholesterolemic pigs. Arterioscler Thromb Vasc Biol 2009, 29, (12), 2041-6.
77. Tselepis, A. D.; John Chapman, M., Inflammation, bioactive lipids and atherosclerosis: potential roles of a lipoprotein-associated phospholipase A2, platelet activating factor-acetylhydrolase. Atheroscler Suppl 2002, 3, (4), 57-68.
78. Wang, A.; Dennis, E. A., Mammalian lysophospholipases. Biochim Biophys Acta 1999, 1439, (1), 1-16.
79. Marco-Ramell, A.; Palau-Rodriguez, M.; Alay, A.; Tulipani, S.; Urpi-Sarda, M.; Sanchez-Pla, A.; Andres-Lacueva, C., Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data.
BMC Bioinformatics 2018, 19, (1), 1. 797 80. Zhou, X.; Liao, W. J.; Liao, J. M.; Liao, P.; Lu, H., Ribosomal proteins: functions beyond 798 the ribosome. J Mol Cell Biol 2015, 7, (2), 92-104. 799 81. Goldberg, A. L., Protein degradation and protection against misfolded or damaged 800 proteins. Nature 2003, 426, (6968), 895-9. 801 82. Vinals, F.; Pouyssegur, J., Confluence of vascular endothelial cells induces cell cycle 802 exit by inhibiting p42/p44 mitogen-activated protein kinase activity. Mol Cell Biol 1999, 19, (4), 803 2763-72. 804 83. Yu, Y.; Moulton, K. S.; Khan, M. K.; Vineberg, S.; Boye, E.; Davis, V. M.; O'Donnell, P. 805 E.; Bischoff, J.; Milstone, D. S., E-selectin is required for the antiangiogenic activity of 806 endostatin. Proc Natl Acad Sci U S A 2004, 101, (21), 8005-10. 807 84. Brigstock, D. R., Regulation of angiogenesis and endothelial cell function by connective 808 tissue growth factor (CTGF) and cysteine-rich 61 (CYR61). Angiogenesis 2002, 5, (3), 153-65. 809 85. Elmasri, H.; Ghelfi, E.; Yu, C. W.; Traphagen, S.; Cernadas, M.; Cao, H.; Shi, G. P.; 810 Plutzky, J.; Sahin, M.; Hotamisligil, G.; Cataltepe, S., Endothelial cell-fatty acid binding protein 4 811 promotes angiogenesis: role of stem cell factor/c-kit pathway. Angiogenesis 2012, 15, (3), 457-812 68. 813 86. Quinn, M. T.; Schepetkin, I. A., Role of NADPH oxidase in formation and function of 814 multinucleated giant cells. J Innate Immun 2009, 1, (6), 509-26. 815 87. Holt, D. J.; Grainger, D. W., Multinucleated giant cells from fibroblast cultures. 816 Biomaterials 2011, 32, (16), 3977-87. 817 88. Tse, G. M.; Law, B. K.; Chan, K. F.; Mas, T. K., Multinucleated stromal giant cells in 818 mammary phyllodes tumours. Pathology 2001, 33, (2), 153-6. 819 89. Celton-Morizur, S.; Merlen, G.; Couton, D.; Desdouets, C., Polyploidy and liver 820 proliferation: central role of insulin signaling. Cell Cycle 2010, 9, (3), 460-6. 
Figure legends

Figure 1. Illustration of the methodology followed in this study.

Figure 2. Simulated diabetes induced changes in the metabolome of bovine coronary artery endothelial cells (BCAEC). (A) Venn diagram of features identified by MZmine and XCMS software (thresholds: 0.01 Da and 1 min retention time) on LC-MS2 datasets. (B) Volcano plot of all quantified metabolites displaying differences in relative abundance (> +/-30% change, <0.05 p-value cut-offs) between BCAEC cultured in control (NG) media and simulated diabetes (HG+HI) for twelve days. Values (dots) represent the HG+HI/NG ratio for all metabolites. Red and blue dots denote downregulated and upregulated metabolites in the HG+HI group vs. the NG group, respectively. (C) Principal Component Analysis (PCA) of LC-MS2 datasets. Data were log transformed without scaling. Shaded areas depict the 95% confidence intervals. (D) HeatMap of the top 100 metabolites ranked by t-test. Abbreviations: NG, normal glucose; HG, high glucose; HI, high insulin; QC, quality control.

Figure 3. Bovine coronary artery endothelial cells (BCAEC) metabolite molecular network. (A) Molecular classes (according to Classyfire) of the metabolome identified by the MolNetEnhancer workflow and visualized by Cytoscape version 3.8.2. Each node represents a unique feature and the color of the node denotes the associated chemical class.
The thickness of the edge (connectivity) indicates the MS2 similarity (cosine score) among features. The m/z value of the feature is shown inside the node and is proportional to the size of the node. Three relevant clusters of connected features are shown. (B) Inset of cluster 1 denoting the presence of phosphocholine (PC)-containing lipids. Features with significantly different abundance between the simulated diabetes (HG+HI) and control (NG) groups are indicated with an asterisk (p-value <0.05). (C) Characterization of features in (B) aided by substructure recognition by MS2LDA software using MS1 visualization in www.ms2lda.org. Fragment at m/z 184.0725 linked to a PC head group by mzCloud in silico prediction (www.mzCloud.org). Abbreviations: M2M, mass2motif; FC, fold change; NG, normal glucose; HG, high glucose; HI, high insulin. Chemical structures were drawn by ChemDraw Professional version 16.0.1.4.

Figure 4. Peptide metabolites modulated by simulated diabetes in bovine coronary artery endothelial cells (BCAEC). (A) Cluster 2 retrieved from the main molecular network linked to glutathione and derivatives. The fragments of mass2motif M2M_453, colored in red, are characteristic of a glutathione core. (B) Features associated with M2M_453 using MS1 visualization in www.ms2lda.org. (C) Cluster 3 retrieved from the main molecular network linked to phenylalanine-based metabolites. A singular node at m/z 487.1548 is also shown.
The fragments of M2M_59 colored in red are characteristic of a phenylalanine core (Heuristic and Quantum Chemical predictions by www.mzCloud.org). (D) Features associated with M2M_59 using MS1 visualization in www.ms2lda.org. In the GNPS clusters (A and C), the color of the node denotes the chemical class assigned to the cluster. The thickness of the edge (connectivity) indicates the cosine score (MS2 similarity). The m/z value of the feature is shown inside the node and is proportional to the size of the node. Features with significantly different abundance between the simulated diabetes (HG+HI) and control (NG) groups are indicated with an asterisk (p-value <0.05). In the MS2LDA networks (B and D), the green node represents the M2M and squares indicate individual features. Edges represent connections to the M2M. Features with significantly different abundance between groups are indicated with an asterisk (p-value <0.05). Abbreviations: M2M, mass2motif; FC, fold change; NG, normal glucose; HG, high glucose; HI, high insulin. Chemical structures were drawn by ChemDraw Professional version 16.0.1.4.

Figure 5. Simulated diabetes induced changes in the proteome of bovine coronary artery endothelial cells (BCAEC). (A) Principal Component Analysis (PCA) of LC-SWATH-MS2 datasets. Data were log transformed without scaling. Shaded areas depict the 95% confidence intervals. (B) Volcano plot of all quantified proteins (quantile normalization) displaying differences in relative abundance (> +/-20% change, <0.05 p-value cut-offs) between BCAEC cultured in control (NG) media and simulated diabetes (HG+HI) for twelve days. Values (dots) represent the HG+HI/NG ratio for all proteins. Red and blue dots denote downregulated and upregulated proteins in the HG+HI group vs. the NG group, respectively. (C) HeatMap of the top 50 proteins ranked by t-test.
Protein-protein interactome (>0.9 confidence) using the list of proteins with increased abundance (D) and reduced abundance (E) in the HG+HI group. Colored circles denote modules or clusters which may represent relevant complexes or functional units. The input proteins are illustrated with a blue shade and the gene ID is also shown. The most representative pathway (containing more input proteins) for all modules is indicated in blue letters. Abbreviations: NG, normal glucose; HG, high glucose; HI, high insulin.

Figure 6. 3D Integrative network of the proteomic and metabolomic perturbations caused by simulated diabetes in bovine coronary artery endothelial cells (BCAEC). Composite protein-metabolite network created by OmicsNet using the up-regulated proteins (red nodes) and metabolites (magenta nodes) in the HG+HI group (simulated diabetes). Interacting proteins (<0.9 confidence) were retrieved from STRING Database and are shown as gray nodes. Abbreviations: NG, normal glucose; HG, high glucose; HI, high insulin.

Figure 7. Increased cellular binucleation by simulated diabetes in bovine coronary artery endothelial cells (BCAEC) and human coronary artery endothelial cells (HCAEC). (A) Representative immunofluorescence micrographs showing the localization of the von Willebrand factor (vWf, 1:400, 3% BSA in PBS) in fixed and permeabilized cells. The nuclei were stained using the dye Hoechst 33258 (2 µg/ml in HBSS). White arrows indicate binucleated cells.
(B) Quantification of binucleated cells in HCAEC and BCAEC under simulated diabetes (HG+HI) vs. control (NG) group. Fluorescence images were taken in at least three random fields per condition using an EVOS® FLoid® Cell Imaging Station with a fixed 20x air objective. Image analysis was performed by ImageJ software (version 2.0.0). Abbreviations: NG, normal glucose; HG, high glucose; HI, high insulin.

Figure 8. Summary illustration of study findings. Cellular structures were created using Servier Medical Art templates, which are licensed under a Creative Commons Attribution 3.0 Unported License; https://smart.servier.com. Chemical structures were drawn by ChemDraw Professional version 16.0.1.4.

Supporting information

Table S1. List of all the putatively annotated metabolites by MS2 spectral matching against GNPS public spectral libraries.

Table S2. List of putatively annotated (MS2 spectral matching) metabolites modulated by simulated diabetes.

Table S3. List of all detected peptides by ProteinPilot Software using the metabolomics datasets.

Table S4. Putatively annotated proline peptides altered by simulated diabetes in Bovine Coronary Artery Endothelial Cells by ProteinPilot Software and manual inspection.

Table S5. List of the detected peptides and proteins in all conditions for SWATH-based quantification.

Figure S1. Proteomics data normalization results using NormalyzerDE.
(A) Total intensity of raw data before normalization. (B) Quantitative parameters of normalization algorithms (pooled intragroup coefficient of variation [PCV], median absolute deviation [PMAD], estimate of variance [PEV]). Qualitative parameters of normalization algorithms: (C) Box plots, (D) MA plots, and (E) Density plots.

Figure S2. Cellular confluence in control and experimental group. Representative micrographs of Bovine Coronary Artery Endothelial Cells (BCAEC) cultured for 9 days with 5.5 mmol/L glucose (control group) and 20 mmol/L glucose + 100 nmol/L insulin (simulated diabetes or experimental group). Images were taken using an EVOS® FLoid® Cell Imaging Station with a fixed 20x air objective. Abbreviations: NG, normal glucose; HG, high glucose; HI, high insulin.
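The volcano-plot cut-offs quoted in the legends of Figures 2 and 5 (relative change beyond +/-30% for metabolites or +/-20% for proteins, p-value <0.05) can be illustrated with a minimal Python sketch. The function name, the example values, and the reading of "change" as the HG+HI/NG ratio minus one are assumptions made for illustration only; the sketch does not reproduce the software or settings used in the study.

```python
def classify_feature(ratio, p_value, min_change=0.30, alpha=0.05):
    """Label one feature from its HG+HI/NG abundance ratio and p-value.

    Assumption: 'change' is taken as (ratio - 1), so 0.30 means a 30% shift.
    """
    if p_value >= alpha:
        return "unchanged"  # not significant at the chosen p-value cut-off
    change = ratio - 1.0
    if change > min_change:
        return "up"
    if change < -min_change:
        return "down"
    return "unchanged"  # significant, but below the change cut-off


# Hypothetical example values, not data from the study.
examples = {
    "feature_a": (1.80, 0.010),
    "feature_b": (0.55, 0.003),
    "feature_c": (1.10, 0.001),
    "feature_d": (2.50, 0.200),
}
labels = {name: classify_feature(r, p) for name, (r, p) in examples.items()}
```

Passing `min_change=0.20` gives the protein cut-off quoted for Figure 5; the metabolite cut-off of 0.30 is the default here.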
[Figure 1 graphic. Study workflow: BCAEC cultured for 12 days with glucose 5.5 mmol/L (control) or glucose 20 mmol/L plus insulin 100 nmol/L; untargeted metabolomics (DDA; MeOH/EtOH 50:50, v:v) and SWATH-based proteomics by LC-MS/MS on a SCIEX TripleTOF 5600+; metabolite annotation, substructure annotation, GO analysis, omics integration; ProteomeXchange PXD013643.]

[Figure 2 graphic. Panels A-D. Total features detected: 5571. PCA axes: PC1, PC2. Groups: NG, HG+HI, QC. Heat map of normalized metabolite abundance.]

[Figure 3 graphic. Molecular network with Clusters 1, 2 and 3. Chemical classes: glycerophospholipids, organooxygen compounds, fatty acyls, steroids and derivatives, glycerolipids, indoles and derivatives, organonitrogen compounds, coumarins and derivatives, carboxylic acids and derivatives, benzene and substituted derivatives, unknown. Edge: connectivity (cosine score). Node label: precursor ion m/z value. Cluster 1 annotations: PC(18:0/18:2(9Z,12Z)) and PC(16:0/18:1(9Z)). MS2LDA: M2M_526, phosphocholine-based substructure, fragment at m/z 184.0725. Color scale: Log2 FC.]

[Figure 4 graphic. Peptide metabolites. M2M_453, glutathione-based substructure; M2M_59, phenylalanine-based substructure; glutamyl-phenylalanine [M+H]+. Color scale: Log2 FC.]

[Figure 5 graphic. Panels A-E. PCA scores plot with axes PC4 (14.8%) and PC1 (36.6%); groups NG and HG+HI. Heat map of normalized protein abundance. Protein-protein interaction networks (seed/input protein, interacting protein, connectivity) with modules labeled mitochondrial function, DNA/RNA metabolism and protein metabolism.]

[Figure 6 graphic. Integrated protein-metabolite network. Proteins: PRDX6, TXN2, APRT, PRDX2, OAT. Metabolites: glutamate, glutathione, proline, leucine, tyrosine, 2-aminoadipate, kynurenine, serine, methionine, threonine.]

[Figure 7 graphic. Panels A-B. Nuclei and vWF staining of HCAEC and BCAEC in NG and HG+HI groups; binucleation values of ≈ 30% and ≈ 58% are shown.]

[Figure 8 graphic. Summary scheme of the integrated analysis: binucleation, translation, mitochondria, caveolae, DNA and RNA metabolism, angiogenesis or cell proliferation, phenylalanine-based metabolites, tryptophan catabolism, PC and LysoPC lipids, glutathione-based metabolites, oxidative stress, inflammation.]

Table 1.
Pathway enrichment analysis of up-regulated and down-regulated proteins in HG+HI group

REACTOME Database, up-regulated
Pathway | Total | Hits | FDR
Metabolism of RNA | 339 | 142 | 1.26E-100
Metabolism of mRNA | 317 | 136 | 1.04E-97
Synthesis of DNA | 95 | 75 | 2.37E-79
DNA Replication | 102 | 77 | 5.67E-79
DNA Replication Pre-Initiation | 80 | 68 | 3.49E-76
M/G1 Transition | 80 | 68 | 3.49E-76
S Phase | 122 | 78 | 4.62E-71
G1/S Transition | 113 | 75 | 4.37E-70
Assembly of the pre-replicative complex | 63 | 57 | 5.61E-67
Metabolism of RNA | 339 | 142 | 1.26E-100

REACTOME Database, down-regulated
Pathway | Total | Hits | FDR
Peptide chain elongation | 178 | 77 | 5.81E-48
Influenza Infection | 185 | 78 | 6.52E-48
Nonsense Mediated Decay Independent of the Exon Junction Complex | 184 | 77 | 4.04E-47
Influenza Life Cycle | 180 | 76 | 4.63E-47
Eukaryotic Translation Elongation | 186 | 77 | 4.63E-47
Nonsense Mediated Decay Enhanced by the Exon Junction Complex | 203 | 80 | 4.63E-47
Nonsense-Mediated Decay | 203 | 80 | 4.63E-47
Influenza Viral RNA Transcription and Replication | 176 | 75 | 5.77E-47
Viral mRNA Translation | 176 | 75 | 5.77E-47
Eukaryotic Translation Termination | 178 | 75 | 1.42E-46

KEGG Database, up-regulated
Pathway | Total | Hits | FDR
Basal transcription factors | 153 | 111 | 2.17E-115
Mismatch repair | 45 | 43 | 1.68E-53
SNARE interactions in vesicular transport | 124 | 41 | 2.28E-22
Base excision repair | 36 | 18 | 3.46E-13
Human papillomavirus infection | 155 | 34 | 1.62E-12
Chemical carcinogenesis | 201 | 36 | 1.50E-10
Hepatocellular carcinoma | 76 | 18 | 4.74E-07
Human T-cell leukemia virus 1 infection | 162 | 26 | 1.49E-06
Chronic myeloid leukemia | 97 | 19 | 3.79E-06
Notch signaling pathway | 160 | 25 | 3.79E-06

KEGG Database, down-regulated
Pathway | Total | Hits | FDR
Basal transcription factors | 153 | 88 | 8.15E-73
Nucleotide excision repair | 135 | 46 | 3.62E-24
Renal cell carcinoma | 201 | 46 | 2.26E-16
Endometrial cancer | 204 | 45 | 1.88E-15
Peroxisome | 137 | 35 | 5.32E-14
Nicotine addiction | 193 | 41 | 1.48E-13
Ribosome biogenesis in eukaryotes | 79 | 26 | 2.92E-13
Gap junction | 199 | 41 | 3.40E-13
Herpes simplex virus 1 infection | 225 | 42 | 5.11E-12
Glutamatergic synapse | 231 | 36 | 1.02E-14

Table 2.
Integrative pathway enrichment analysis of up-regulated proteins and metabolites in HG+HI group

REACTOME Database
Pathway | Total | Hits | FDR
Metabolism of amino acids and derivatives | 190 | 44 | 8.74E-39
Metabolism | 1490 | 85 | 1.33E-35
Glutathione conjugation | 25 | 21 | 7.33E-33
Phase II conjugation | 74 | 25 | 2.04E-25
Amino acid synthesis and interconversion (transamination) | 18 | 15 | 5.97E-23
Biological oxidations | 142 | 25 | 7.73E-18
tRNA Aminoacylation | 42 | 13 | 5.19E-12
Glutathione synthesis and recycling | 10 | 7 | 3.67E-09
Sulfur amino acid metabolism | 25 | 9 | 9.84E-09
Tryptophan catabolism | 11 | 6 | 7.51E-07

KEGG Database
Pathway | Total | Hits | FDR
EGFR tyrosine kinase inhibitor resistance | 1490 | 129 | 1.01E-72
Glutathione metabolism | 56 | 39 | 1.63E-54
Alanine, aspartate and glutamate metabolism | 36 | 24 | 4.57E-32
ABC transporters | 75 | 27 | 1.09E-26
Cysteine and methionine metabolism | 49 | 22 | 3.48E-24
Pancreatic cancer | 82 | 23 | 7.55E-20
Drug metabolism - cytochrome P450 | 72 | 21 | 1.67E-18
Metabolism of xenobiotics by cytochrome P450 | 76 | 21 | 5.18E-18
Drug metabolism - other enzymes | 79 | 20 | 2.45E-16
mRNA surveillance pathway | 73 | 19 | 8.78E-16

10_1101-2021_01_06_425536 ---- Recognition of a Tandem Lesion by DNA Glycosylases Explored Combining Molecular Dynamics and Machine Learning

Emmanuelle Bignon1,*, Natacha Gillet1, Chen-Hui Chan1, Tao Jiang1, Antonio Monari2, and Elise Dumont1,3

1Univ. Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F69342, Lyon, France
2Université de Lorraine and CNRS, LPCT UMR 7019, 54000 Nancy, France
3Institut Universitaire de France, 5 rue Descartes, 75005 Paris
*emmanuelle.bignon@univ-cotedazur.fr

ABSTRACT

The combination of several closely spaced DNA lesions, which can be induced by a single radical hit, constitutes a hallmark in the DNA damage landscape and radiation chemistry.
The occurrence of such tandem base lesions gives rise to a strong coupling with the double helix degrees of freedom and induces important structural deformations, in contrast to DNA strands containing a single oxidized nucleobase. Although such complex lesions are known to be refractory to repair by DNA glycosylases, there is still a lack of structural evidence to rationalize these phenomena. In this contribution, we explore, by numerical modeling and molecular simulations, the behavior of the bacterial glycosylase responsible for base excision repair (MutM), specialized in excising oxidatively damaged bases such as 7,8-dihydro-8-oxoguanine (8-oxoG). The difference in lesion recognition between a single lesion and a tandem lesion featuring an additional abasic site is assessed at atomistic resolution using microsecond molecular dynamics simulations and machine learning postprocessing, which allows us to pinpoint crucial differences in the interaction patterns of the damaged bases. This work advocates for the use of such high-throughput numerical simulations for exploring the complex combinatorial chemistry of tandem DNA lesion repair and, more generally, of multiple damaged sites, which are of the utmost significance in radiation chemistry.

Keywords: MutM, DNA repair glycosylase, tandem lesion, molecular dynamics simulations

Introduction

The chemical stability of DNA components is fundamental to maintaining genome stability, hence preventing unwanted mutations or cell death. Indeed, the accumulation of DNA lesions has been recognized as one of the principal causes of cancer development1. Although DNA maximizes its stability through its helical structure, its constituent nucleotides are constantly exposed to damaging agents, either endogenous or exogenous, that inevitably lead to the production of lesions.
Among the different sources of DNA lesions, we may briefly mention oxidative agents, such as free radicals or reactive oxygen species (ROS), UV light, and ionizing radiation. As a consequence, specific and highly efficient repair machineries exist that are able to recognize the presence of lesions in the genome and remove them to reinstate undamaged DNA strands2, 3. Specific DNA repair pathways may depend on the organism and are also related to the kind of lesion: for localized oxidatively-induced damage the base excision repair (BER) pathway is preferred4, 5, while for more extended and bulky lesions, such as base dimerization, the nucleotide excision repair (NER) mechanism is favored. Yet this sophisticated repair mechanism has been reported to be strongly impaired when not only one but two adjacent DNA lesions are located on the same strand, the so-called tandem lesions. The formation of tandem lesions can derive from a single radical hit, and their biological impact is now well established. While their formation mechanism has been delineated6, the reasons underlying their resistance to repair are more elusive and should be analyzed taking specific structural modifications firmly into account. The most common oxidative tandem lesions feature two adjacent oxidized nucleobases. In the following we will specifically consider 8-oxoguanine (8-oxoG) and an abasic apurinic/apyrimidinic site (Ap), as shown in Figure 1-C. This arrangement is particularly relevant because Ap sites are also the most common outcome of ionizing radiation after excision of an entire nucleobase7. Interestingly, Ap sites also represent key intermediates of the BER machinery and result from the action of DNA glycosylases before being further processed and removed by endonucleases.
bioRxiv preprint, this version posted January 6, 2021; doi: https://doi.org/10.1101/2021.01.06.425536. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

The presence of a tandem lesion, or more generally of multiple damaged sites (MDS), which are the hallmarks of radiation chemistry, induces strong coupling between the lesions, which in turn translates into important structural deformations of the nucleic acid as compared to its ideal structure, i.e. either an undamaged strand or a sequence containing an isolated lesion. The unusual structural deformations induced by tandem lesions or MDS may also well justify their globally lower repair rate as compared to other lesions8–11. To cope with their frequency, canonical DNA lesions benefit from highly efficient repair. For instance, 8-oxoG, which is well known to mismatch with adenine and hence is potentially mutagenic12, is repaired by formamidopyrimidine DNA glycosylase, an enzyme that is referred to as Fpg in eukaryotes, while its bacterial counterpart is called MutM. The latter recognizes the presence of 8-oxoG in the genome and specifically binds at the damaged site13. Many studies have contributed to dissecting the mode of action of MutM/Fpg in the presence of a single 8-oxoG, in particular concerning the recognition of the lesion14–16. Fpg17, 18 has been shown to recognize 8-oxoG among other oxidatively-induced lesions and to subsequently proceed to its extrusion, initiating the base excision process4, 5. The mechanisms of recognition19 and extrusion17, 20, 21 of 8-oxoG have been scrutinized through a series of techniques, including molecular modeling and simulations, and are now relatively well characterized.
Recently, Simmerling et al.22, while recognizing the role of damaged-base flipping in favoring recognition, have also pinpointed the existence of preliminary recognition steps. These steps correlate with the rapid sliding of Fpg along the DNA strand, which is incompatible with a recognition mechanism based on the systematic flipping of all the bases. In addition, the same authors have identified that 8-oxoG flips preferably through the major groove. The free energy required for the extrusion of 8-oxoG to an extrahelical position has also been estimated by La Rosa and Zacharias16, taking into account the contributions due to the global bending and twisting of the DNA. A most important feature of MutM/Fpg efficiency has been traced back to the crucial amino acid triad M77, R112, and F114. Indeed, this triad disrupts the 8-oxoG interactions within the DNA helix by intercalating above the 8-oxoG position, thereby facilitating its extrusion towards the active site. Besides, several other important MutM/Fpg residues (K60, H74, Y242, K258, and R264) are known to stabilize the DNA helix by interacting with its backbone20.

[Figure 1, panel labels. Strand 1: dG1 dT2 dA3 dG4 dA5 dT6 dC7 dC8 dG9 dG10 dA11 dC12 dG13. Strand 2: dC14 dG15 dT16 dC17 dC18 OG19 dG20 dA21 dT22 dC23 dT24 dA25 dC26. Labeled residues: K60, H74, M77, R112, F114, Y242, K258, R264.]

Figure 1. (A) Cartoon representation of the bacterial MutM in interaction with a 13-bp double stranded DNA helix harboring 8-oxoguanine (OG19) as the 19th nucleobase (PDB ID code 3GO8)21. The magnified section highlights the position of the catalytic triad (M77, R112, and F114 in green) and the residues interacting with the DNA backbone (in orange) around the damage. (B) Sequence of the 13-bp oligonucleotide, showing the position of the 8-oxoguanine (in red). In simulations with tandem lesions, dG20 is mutated in silico into an abasic site (Ap). (C) Chemical structure of the 8-oxoguanine and the abasic site lesions.
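The residue numbering of Figure 1-B is used throughout the Results (for example "dC8-OG19" or "dC7 facing dG20"), so it can help to make the pairing rule explicit. The following is a minimal sketch, assuming only the numbering shown in the figure (strand 1 numbered 1 to 13, strand 2 numbered 14 to 26, paired positions summing to 27); the names `STRAND_1`, `STRAND_2`, `partner` and `label` are hypothetical helpers introduced here for illustration and are not part of the original work.

```python
# Numbering as in Figure 1-B: strand 1 carries positions 1-13 and
# strand 2 carries positions 14-26, so paired positions sum to 27.
N_BP = 13

# Base names read off the Figure 1-B labels, in order of increasing position.
STRAND_1 = ["dG", "dT", "dA", "dG", "dA", "dT", "dC",
            "dC", "dG", "dG", "dA", "dC", "dG"]
# "OG" marks 8-oxoG at position 19.
STRAND_2 = ["dC", "dG", "dT", "dC", "dC", "OG", "dG",
            "dA", "dT", "dC", "dT", "dA", "dC"]


def partner(position: int) -> int:
    """Return the position paired with `position` in the 13-bp duplex."""
    if not 1 <= position <= 2 * N_BP:
        raise ValueError(f"position {position} is outside 1..{2 * N_BP}")
    return 2 * N_BP + 1 - position


def label(position: int) -> str:
    """Return the residue label used in the text, e.g. 'OG19' or 'dC8'."""
    if not 1 <= position <= 2 * N_BP:
        raise ValueError(f"position {position} is outside 1..{2 * N_BP}")
    if position <= N_BP:
        name = STRAND_1[position - 1]
    else:
        name = STRAND_2[position - N_BP - 1]
    return f"{name}{position}"


# The lesion and the base that becomes the Ap site in the tandem system.
lesion_pair = (label(partner(19)), label(19))
ap_pair = (label(partner(20)), label(20))
```

With this convention, position 19 pairs with position 8 and position 20 pairs with position 7, which matches the base pairs discussed in the text.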
On the other hand, several studies have addressed the behavior of tandem-containing oligonucleotides, either from a biochemical and repair perspective9 or from a structural point of view23, also relying on molecular modeling and simulations11, 24, 25. Globally, the different approaches agree in pointing out a strong effect of closely spaced lesions in modifying the structure and dynamics of the oligonucleotide. In addition, strong sequence effects, depending both on the relative position of the clustered lesions and on the nearby undamaged bases, contribute to the complexity of the global landscape. The interaction of MDS-containing oligonucleotides with repair enzymes, and in particular both E. coli and human endonucleases10, 26, has been reported. The perturbations exerted by the secondary lesion on the protein/DNA contact regions, and the consequent decrease in repair efficiency observed for some particular tandem lesions, have also been highlighted. However, no analysis of the structural behavior of Fpg and MutM in the presence of tandem DNA lesions has been reported, despite the relevance that such lesions may assume under conditions of strong oxidative stress or ionizing radiation. In this work, we take advantage of the existing knowledge of 8-oxoG recognition by MutM to investigate the structural and dynamic impact of the presence of a second, adjacent lesion, namely an Ap site. Relying on all-atom, explicit-solvent molecular dynamics10, 27, we simulate the structural and dynamical behavior of a MutM:DNA complex. We consider the situation in which only a single lesion (8-oxoG) is present and compare it to the one in which the adjacent guanine base dG20, present in the X-ray structure (PDB ID 3GO8), has been mutated in silico to an Ap site (see Figure 1-B). We clearly show that the presence of the tandem lesion induces important structural deformations of the DNA that significantly perturb the protein/nucleic acid interaction pattern and are hence liable to alter 8-oxoG extrusion. We extensively describe the changes in the structural signature of the 8-oxoG lesion in the presence of an adjacent Ap site and the perturbation of the interaction network with MutM (Fpg), which ultimately contribute to diminishing the recognition efficiency.

Results
We report the structural and dynamical properties of the bacterial MutM interacting with a 13-bp DNA sequence harboring either a single 8-oxoG at position 19 (as found in the crystal structure PDB ID 3GO8) or 8-oxoG coupled with an Ap site at position 20, using two replicas of 1 µs MD simulation each. The numbering of the nucleic acids used hereafter corresponds to the one in Figure 1-B; the numbering of MutM residues refers to the crystallographic structure (PDB ID 3GO8).

Tandem lesions impact the interaction network around 8-oxoG
The interaction network found in the MutM:DNA crystal containing a single 8-oxoG lesion is conserved and stable along our MD simulations. A most important structural feature of MutM is its intercalation triad, consisting of the M77, R112 and F114 residues. Those three amino acids are located around the 8-oxoG in the minor groove, weakening the stabilizing interactions of the lesion within the double helix to facilitate its extrusion. R112 interacts with the complementary dC7, while M77 and F114 intercalate directly above 8-oxoG and disrupt the stable π-stacking with the adjacent base pair (see Figure 2-A). These interactions persist along the entire MD simulations of the singly-damaged system.
F114 is involved in π-stacking with dG20 during 91.8% of the time series, with the distance between heavy atoms of their aromatic rings averaging 5.35±0.5 Å. The rest of the time, F114 stacks transiently with dC7, which faces dG20, and their aromatic rings maintain a distance of 6.7±0.5 Å along the simulations (see Figure 3). M77 intercalates between OG19 and dG20, as its terminal methyl group remains at 4.8±0.4 Å from the N9 atom involved in the N-glycosidic bond, and is ideally positioned to act on the OG19 deoxyribose moiety to drag it outwards20.

Figure 2. Cartoon representation of MutM interacting with the DNA helix harboring a single 8-oxoG lesion (OG19, A) or tandem lesions 8-oxoG + Ap (OG19 and Ap20, B). H-bonds are depicted as dashed pink lines and the DNA structure is rendered transparent for the sake of clarity. Upon multiple lesions, the interaction pattern around 8-oxoG (OG19) is perturbed. The intercalation triad M77/R112/F114 is shifted down by R76, which comes to interact between Ap20 and the facing dC7, preventing M77 and F114 intercalation above 8-oxoG. R264, normally interacting with the DNA backbone phosphates between positions 19 and 20, is now involved in hydrogen bonding with the OG19 carbonyl.

The R112 side chain amino groups form H-bonds with the nitrogen and carbonyl of dC8 over 76.8% of the simulation time, the distance between these two atom groups being 5.1±0.6 Å. Several other amino acids have been identified as stabilizing the MutM:DNA complex by interacting with the negatively-charged phosphate groups of the backbone, namely K60, H74, Y242, K258, and R264. These interactions are stable in our simulations and the highly conserved R264 forms strong H-bonds between the OG19 and dG20 phosphate groups, as shown in Figure S1. Notably, R264 is known to play a role in 8-oxoG extrusion22.
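Figures such as "5.35±0.5 Å" and "91.8% of the time series" are summaries of per-frame distance series. The sketch below shows one way such summaries could be computed, as an illustration only: the function name `summarize_distance`, the use of the population standard deviation, the strict "less than" cutoff criterion and the toy values are all assumptions made here, since the text does not state the exact criteria used by the authors.

```python
from statistics import mean, pstdev


def summarize_distance(distances, cutoff):
    """Summarize a per-frame distance series (in angstroms).

    Returns the mean, the population standard deviation, and the
    percentage of frames in which the distance is below `cutoff`.
    """
    if not distances:
        raise ValueError("the distance series is empty")
    frames_within = sum(1 for d in distances if d < cutoff)
    return {
        "mean": mean(distances),
        "std": pstdev(distances),
        "occupancy_pct": 100.0 * frames_within / len(distances),
    }


# Toy series of five frames, invented for illustration (not simulation data).
toy_series = [3.5, 3.8, 4.2, 3.9, 6.0]
stats = summarize_distance(toy_series, cutoff=4.0)
print(f"{stats['mean']:.2f} +/- {stats['std']:.2f} A, "
      f"{stats['occupancy_pct']:.0f}% of frames below 4.0 A")
```

Reporting both numbers is useful because a mean distance alone hides transient contacts: a contact present in a minority of frames barely shifts the mean but shows up clearly in the occupancy.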
The structural behavior of MutM:DNA(8-oxoG) observed here corroborates the hypothesis of a highly dynamic system, whose functional flexibility is known to be central to its biological role of recognizing and extruding 8-oxoG28, 29. The dG20 → Ap20 mutation induces a clear perturbation of this well-characterized interaction network. A first consequence is the perturbation of the dynamics of the intercalation triad. In the first 100 ns of simulation, the presence of the abasic site triggers a rapid reorientation of R76, situated just above the intercalation triad. The R76 side chain turns towards the damaged site and is found closer to OG19, at 7.8±0.8 Å vs. 11.8±0.7 Å in the duplex containing only 8-oxoG. R76 does not interact directly with OG19 but rather positions itself in the gap between the Ap site and the facing dC7, bridging the two residues through stable H-bonds as reported in Figure 2-B. The distance between the R76 guanidinium nitrogens and the dC7/Ap20 H-bond acceptor atoms lies at 2.6±1.1 Å and 3.0±1.8 Å, respectively, in the tandem-damaged MutM:DNA complex. By comparison, the dC7-R76 distance is 8.3±0.7 Å in the singly-damaged system (see Figure 3-B). The reorientation of R76 reshapes the canonical interaction network of the intercalation triad, which is globally shifted down the duplex. F114 is pushed away from position 20 and comes closer to the opposite strand: the dC7-F114 distance drops to 5.7±0.6 Å, although its strong cation-π interaction with R76 prevents direct stacking with dC7. Additionally, the distance between the R76 guanidinium extremity and the F114 aromatic ring is 3.9±0.5 Å vs.
8.0±0.9 Å in the singly-damaged system, while the interaction of R112 with the estranged dC8 is destabilized. In the presence of tandem lesions, R112 lies further from dC8 (5.9±1.0 Å) than in the singly-damaged complex (5.1±0.6 Å). The intercalation of M77 is prevented in the presence of the tandem lesion since its terminal methyl group rotates away from OG19:N9 (5.3±0.6 Å), while the interaction with M77 corresponds to a more rigid binding mode, with the formation of an H-bond between the sulfur atom and one hydrogen of OG19. The corresponding distance is reduced to 2.7±0.7 Å vs. 3.5±0.8 Å in the singly-damaged (OG19) duplex.

Figure 3. Distribution of relevant distances involving the intercalation triad (A) and R76 (B) with a single 8-oxoG lesion (single) or Ap + 8-oxoG lesions (tandem). The presence of the Ap site at position 20 makes M77 and R112 move away from 8-oxoG and the facing dC8. F114 π-stacks with dC7 because the nucleobase at position 20 is now absent. R76 comes closer to 8-oxoG and intercalates in the gap left by the abasic site, with formation of very stable H-bonds bridging Ap20 and the facing dC7. It also interacts with F114, preventing it from stacking within the double helix.

Interactions between the DNA backbone and MutM tend to be more rigid with tandem damage than in the singly-damaged duplex. The H-bond between K60 and the phosphate at position 20 is stronger, as witnessed by the –NH3+...P distance, which is reduced to 5.3±1.1 Å vs. 8.2±2.6 Å in the singly-damaged system. The same holds for the interaction of H74 with dA21 (NH - P distance of 4.3±0.6 Å vs. 5.4±2.0 Å with the isolated 8-oxoG) and for the Y242 H-bond with OG19 (OH - P distance of 5.1±1.7 Å vs. 7.1±2.3 Å).
However, the interactions of R264 with the DNA helix are strongly perturbed: in the singly-damaged system, R264 forms stable H-bonds with the OG19 and dG20 phosphates (CZ - P distance of 4.9±1.9 Å and 4.9±1.3 Å, respectively), which are disrupted in the presence of the additional Ap site (CZ - P distance of 8.1±3.2 Å and 8.8±2.4 Å, respectively). The R264 position experiences important fluctuations in the tandem-damaged complex, and R264 can form a stable H-bond with OG19:O8 (see Figure 2-B). The R264:CZ - OG19:O8 distance is below 4 Å for 42% of the simulation time in the tandem-damaged complex, while in the singly-damaged complex such a short distance is found only 11% of the time (see Figure S2). This first local analysis suggests that the singly- and tandem-damaged 13-bp duplexes present different interaction patterns, with non-trivial changes in the binding mode and its dynamics. In order to probe more extensively the structural and dynamic consequences of the dG20 → Ap20 substitution, we have relied on a recently proposed machine-learning protocol30 to identify other residues possibly involved in the recognition mechanism.

Systematic assessment of interacting residues through machine-learning protocol
In order to probe the residues that exhibit important interactions with the DNA duplex, a machine-learning protocol based on a multilayer perceptron (MLP) classifier was set up. The latter generates a "footprint" of the residues that are particularly involved in MutM:DNA bonding (see Figure 4). A score function, referred to in the following as 'importance', is attributed to each residue: the higher the score, the higher the contribution to the MutM:DNA complex stabilization.
Using an importance threshold of 0.04, 47 and 61 residues out of 273 stand out in the singly- and tandem-damaged systems, respectively. The three residues of the intercalation triad (i.e. M77, R112 and F114) show a slightly higher contribution in the tandem- (0.043, 0.042, 0.045) than in the singly-damaged system (0.041, 0.038, 0.037). R76 and Q78, adjacent to M77 in the MutM sequence, also present high values. As highlighted by the visual inspection of our MD trajectories, in the tandem-damaged system R76 flips towards the lesion site to compensate for the nucleobase removal at position 20 by bridging Ap20 to the facing dC7 through strong H-bonds. The importance score for R76 is 0.045 with tandem lesions vs. 0.037 with isolated 8-oxoG, corroborating the significant role of this residue in MutM:DNA binding in the presence of Ap20, in line with the newly formed and very stable H-bonds with the lesion site. The importance of Q78 is higher than the threshold in both the tandem- (0.042) and singly- (0.043) damaged systems. This residue interacts with R112, contributing to the H-bond network in the vicinity of the lesion. The importance of G115, adjacent to F114, in the stabilization of the complex is also enhanced upon dG20 → Ap20 mutation (0.042 in tandem vs. 0.033 with isolated 8-oxoG). Additional visual inspection of the MD trajectories reveals that G115 forms a strong H-bond with R76, helping to keep the latter intercalated between Ap20 and the facing dC7.

Figure 4. Importance of the contribution of residues to the MutM:DNA complex bonding for the singly-damaged (blue) and the tandem-damaged (orange) systems. The threshold value above which the importance of a residue for the stabilization of the complex is considered significant is 0.04. Some of the key residues (P2, R35, R76, M77, R112, F114, N174, Y242, R264) as well as the flexible loop region are indicated by arrows.
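The thresholding of importance scores described above can be illustrated with the six scores quoted in this paragraph. This is a minimal sketch, not the authors' analysis code: the dictionary layout and the function name `above_threshold` are hypothetical, and treating a score equal to the threshold as significant (`>=`) is an assumption, since the scores are reported to three decimals only.

```python
THRESHOLD = 0.04  # importance threshold quoted in the text

# Importance scores quoted in the text: residue -> (single, tandem).
IMPORTANCE = {
    "R76": (0.037, 0.045),
    "M77": (0.041, 0.043),
    "Q78": (0.043, 0.042),
    "R112": (0.038, 0.042),
    "F114": (0.037, 0.045),
    "G115": (0.033, 0.042),
}

SINGLE, TANDEM = 0, 1  # index of each system in the score tuples


def above_threshold(scores, system, threshold=THRESHOLD):
    """Return the sorted residue names whose score reaches the threshold."""
    return sorted(name for name, values in scores.items()
                  if values[system] >= threshold)


single_hits = above_threshold(IMPORTANCE, SINGLE)
tandem_hits = above_threshold(IMPORTANCE, TANDEM)
# Residues that become significant only when the Ap site is present.
gained = sorted(set(tandem_hits) - set(single_hits))
print("single:", single_hits)
print("tandem:", tandem_hits)
print("gained in tandem:", gained)
```

For this subset, only M77 and Q78 pass the threshold with the isolated 8-oxoG, whereas all six residues pass it in the tandem-damaged system, in line with the trend described in the text.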
Contributions of amino acids to the bonding are mostly higher for the 8-oxoG/Ap combination, suggesting a more rigid complex with multiple damage sites than with an isolated 8-oxoG. Of the five key residues reported to anchor the DNA phosphate backbone (K60, H74, Y242, K258, and R264)31, only the closest to OG19 are associated with importance scores above the threshold of 0.04: Y242 (0.040 and 0.052 for the singly- and tandem-damaged systems), K258 (0.037 and 0.040) and R264 (0.051 and 0.061). The G173 and Y176 residues also contribute to the H-bonding with the DNA backbone. N174 shows high importance values, 0.040 and 0.053 for the singly- and tandem-damaged systems; this is due either to its interaction with the backbone at the damaged site or to indirect coupling with R264, as previously described in the literature17. I173 is involved in hydrophobic interactions with Y242, which in turn interacts with the OG19 backbone. Among other residues whose contribution is above the threshold, R35 forms H-bonds with either the dC8 or the dC23 backbone, P130 and M166 interact with dT22:P, D165 and R150 hold the 5'-terminus backbone of DNA strand 1 (in the dG1 and dT2 surroundings), L164 stabilizes the position of the key residue R264, G263 and G265 form H-bonds with R264 or directly with the DNA backbone, and K258 interacts with the dC17 phosphate. Globally, the MLP analysis clearly reveals that the protein residues between positions 210 and 237 exhibit the highest values of importance.
They correspond mostly to a large, flexible loop, comprising residues 221–234 at the C-terminus, that is prone to disorder but also known to be implicated in DNA recognition despite being spatially far from the double helix21. Amino acids at the N-terminus also show significant contributions to the MutM:DNA bonding. The proline located at the very end of the N-terminal region has an important catalytic role since it reacts with the C1' atom of the deoxyribose sugar moiety of the 8-oxoguanine to form a Schiff base, and hence induces the cleavage of the N-glycosidic bond, which constitutes the first step of the repair process. Adjacent to P2, E3 is also known to play a role in MutM catalytic efficiency. Interestingly, the contribution of these two residues to the MutM:DNA stability decreases from 0.054 in the singly-damaged to 0.044 in the tandem-damaged complex, corroborating a subtle reduction of the excision efficiency. Other residues of the N-terminus (L4, P5, E6) also show a drop in their contribution upon dG20 → Ap20 mutation. The residues that stand out in this MLP analysis match very well those evidenced by previous works on MutM and Fpg14, 15, 17, 19, 20, 28. Our machine-learning post-processing makes it possible to disentangle a complex interaction pathway, which is already well established for 8-oxoG-containing DNA29 but is perturbed in the presence of tandem lesions, as revealed by the present simulations. It generates an exhaustive map of the residues that are important for the protein-DNA interactions, going beyond a simple visual investigation based on data from the literature. Notably, the importance score of the nucleic acids in the MutM:DNA bonding is enhanced in the presence of Ap20, again denoting a more constrained oligonucleotide (see Figure S3).
Mechanical and dynamic properties of the DNA strand
In order to assess the mechanical and dynamic properties of the DNA strand, the MD trajectories were post-processed with the Curves+ program32 to evaluate the structural parameters of the double helix. The first signature of the B-helix is often the bend angle, which reaches typical values around 51±11° upon interaction with MutM for the singly-damaged oligonucleotide. Such extreme values for bending are typical33, 34 and necessary to facilitate the extrusion of the lesion towards the enzyme active site. The presence of the Ap site at position 20 is not sufficient to perturb the global bending of the 13-bp oligonucleotide (49±12°), but rather induces local deformations.

dC8-OG19 parameter: Single; Tandem
Local bending (°): 8.6 ±1.6; 7.6 ±1.9
Tip (°): 14.0 ±5.8; 8.1 ±6.1
Inclination (°): 19.5 ±4.3; 18.3 ±7.0
Buckle (°): -16.1 ±8.4; -8.8 ±10.7
Propel (°): 5.6 ±7.5; 0.0 ±9.3
Opening (°): 0.1 ±3.9; 1.0 ±3.6
Shear (Å): -0.14 ±0.33; -0.21 ±0.39
Stretch (Å): 0.05 ±0.12; 0.02 ±0.12
Stagger (Å): 0.54 ±0.33; 0.49 ±0.43

Table 1. Averaged values of the dC8-OG19 base-pair structural parameters, for the single 8-oxoG (Single, left) and the tandem 8-oxoG+Ap (Tandem, right).

Structural parameters of the dC8-OG19 base pair are particularly impacted, with values lower for the tandem- than for the singly-damaged system (see Table 1). Importantly, the backbone parameters 'Bend', 'Tip' and 'Inclination' are lower when the Ap site is present at position 20, denoting a straighter portion of DNA helix than what is normally found in the canonical singly-damaged MutM:DNA complex (see Figure 5 and Figure S4). The values monitored for these parameters are 8.6±1.6°, 14.0±5.8°, and 19.5±4.3°, respectively, in the singly-damaged system, vs. 7.6±1.9°, 8.1±6.1°, and 18.3±7.0° in the tandem-damaged complex. Besides, several intra base-pair structural parameters are also found closer to canonical B-DNA for the dC8-OG19 base pair.
In particular, the 'Buckle' and 'Propeller' change from -16.1±8.4° to -8.8±10.7° and from 5.6±7.5° to 0.0±9.3°, respectively, upon dG20 → Ap20 mutation. Other parameters (Opening, Shear, Stretch, and Stagger) show less significant deviations (see Table 1 and Figure S5). Notably, Qi et al.21 reported a change in puckering values upon oxidation of a guanine residue: 8-oxoG would exhibit a C4'-exo puckering while a canonical dG ribose moiety would harbor a C2'-endo conformation. This would promote the recognition of 8-oxoG by MutM. In our simulations, the frequency of the C2'-endo conformation of OG19 is increased by the presence of the tandem lesion compared to a single 8-oxoG (42.4% and 21.3%, respectively). However, rather than the C4'-exo puckering (2.9% and 7.6% for the tandem- and singly-damaged systems), C1'-exo is the main or second most populated conformation (42.5% and 61.5%) (see Figure S6). Concerning the inter base-pair parameters, DNA structural values are comparable for the singly- and tandem-damaged systems and in agreement with previous works16. As could be expected, though, the absence of the nucleobase at position 20 upon mutation to an Ap site influences the stability of the canonical stacking that is usually conserved in the singly-damaged complex. This is reflected in the distribution of the parameter values, which is much broader in the presence of Ap at position 20 (see Figure S7). This highlights the blurrier structural signature exhibited by the tandem-damaged DNA helix, which is another criterion that might affect the interaction with the surrounding amino acids, and hence the efficiency of 8-oxoG extrusion by MutM.
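The sugar pucker populations quoted above (C2'-endo, C1'-exo, C4'-exo) are labels assigned to ranges of the pseudorotation phase angle. The sketch below shows how such populations could be tallied from a series of phase angles, assuming the conventional division of the pseudorotation cycle into ten 36° sectors; whether the authors' analysis used exactly this binning is an assumption, and the function names and toy angles are hypothetical.

```python
from collections import Counter

# Conventional pseudorotation sectors, each 36 degrees wide, starting at 0.
PUCKER_SECTORS = [
    "C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
    "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo",
]


def classify_pucker(phase_deg):
    """Map a pseudorotation phase angle (degrees) to its pucker label."""
    sector = int((phase_deg % 360.0) // 36.0)
    return PUCKER_SECTORS[sector]


def pucker_populations(phases):
    """Return the percentage of frames found in each pucker family."""
    if not phases:
        raise ValueError("the phase series is empty")
    counts = Counter(classify_pucker(p) for p in phases)
    return {name: 100.0 * n / len(phases) for name, n in counts.items()}


# Toy phase angles for five frames, invented for illustration only.
toy_phases = [160.0, 150.0, 130.0, 120.0, 50.0]
populations = pucker_populations(toy_phases)
for name, pct in sorted(populations.items()):
    print(f"{name}: {pct:.0f}%")
```

Because C1'-exo (108° to 144°) and C2'-endo (144° to 180°) are adjacent sectors, a modest shift of the phase distribution is enough to exchange population between the two, which is worth keeping in mind when comparing the percentages of the two systems.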
[Figure 5 panels: Inclination (°), Propeller (°), Tip (°); rows: Single, Tandem.]

Figure 5. Distribution of three characteristic DNA helix intra base-pair parameters for dC8-OG19 over 2 µs MD simulation, for a single 8-oxoguanine (Single, red, top) and both 8-oxoguanine and Ap site (Tandem, blue, bottom). The structural deformation with respect to canonical B-DNA is globally less pronounced for the tandem-damaged than for the singly-damaged complex.

Discussion
MutM, the bacterial analog of the human Fpg, is responsible for the recognition and repair of the very common 8-oxoG lesion. The Fpg(MutM):DNA interface has been investigated by NMR, X-ray and molecular dynamics simulations, probing the key residues that play a crucial role in the highly specific recognition of 8-oxoG, but also of other DNA lesions15, 17, 19–22, 28, 34–37. An intercalation triad (M77, R112, F114) has been characterized, and several other residues are known to be essential in MutM:DNA interactions and 8-oxoG extrusion, guiding the lesion towards the N-terminal proline responsible for the Schiff base formation. Intrahelical insertion of a single F114 wedge residue22, 36, 37 is marked and allows a slow scanning of the double helix by MutM and analogous enzymes. Among MutM key residues, R264, located in the Zn-finger domain, is highly conserved and important for 8-oxoG extrusion20, 21. N174 also plays a key role and its mutation leads to the perturbation of the R264 contacts17, 22. Besides, the C-terminal flexible loop is known to be essential for 8-oxoG recognition by folding over the lesion in a capping process19, 21. While the recognition and repair of a single 8-oxoG by MutM are well documented, their perturbation in the presence of tandem lesions is very poorly understood. However, it has been shown that ionizing radiation can lead to the formation of tandem lesions38, rendering 8-oxoG refractory to excision by glycosylases8, 39.
Such multiple damaged sites are highly mutagenic and increase the risk of cancer development40, 41. They can also be cytotoxic, as their error-prone repair can result in the formation of deleterious double-strand breaks42, 43. Notably, the high toxicity of the DNA lesions induced by ionizing radiation is also exploited for the development of cancer (radio)therapies44. In this context, we investigated the structural impact of tandem lesions on the interactions between MutM and a 13-bp oligonucleotide harboring the 8-oxoG lesion at position 19. Using molecular modeling and machine-learning analysis, we highlighted a structural reorganization of the canonical MutM interaction network around 8-oxoG in the presence of an adjacent abasic site at position 20. The interaction network involving the intercalation triad and the damage is perturbed by the dG20 → Ap20 mutation. The MutM:DNA interactions are more pronounced, leading to a more rigid system, which could explain the difficulty MutM has in processing such multiple damaged sites. While the classical interaction patterns are observed in the simulation of the singly-damaged system, the presence of an additional Ap site results in the rotation of R76, which provokes a shift of the intercalation triad. Notably, as R76 is poorly conserved in MutM sequences from different organisms20, 29, one cannot rule out the possibility of a different reorganization around the intercalation triad. First observations of our MD trajectories allowed us to describe the reshaping of the MutM:DNA interaction patterns (see Figures 2 and 3).
The structural analysis of the DNA oligonucleotide also reveals changes in the local conformation of the lesion site (see Figure 5 and Table 1), which might jeopardize the efficiency of 8-oxoG recognition by the enzyme. In order to go beyond the visual observation of MutM:DNA interactions, we applied machine learning (ML) techniques to provide an extensive map of these contacts. ML methods have gained an enormous amount of attention in recent years. Their power to extract important information from large amounts of data has been exploited by the biochemistry community, and many interesting applications have been showcased in the literature. Recently, Fleetwood et al.30 have demonstrated the capability of ML to learn ensemble properties from molecular simulations and to provide easily interpretable metrics describing important structural or chemical features. The machine-learning analysis of our trajectories is based on the demystifying package from Fleetwood et al.30. Residues highlighted by the MLP analysis as providing a significant contribution to the MutM:DNA bonding are in agreement with data from the literature. Comparing the importance of residues in MutM:DNA interactions with single or tandem lesions allowed us to pinpoint the changes in the interaction patterns, which concern the most important features of MutM (see Figure 4). In apparent contradiction with chemical intuition, the MLP analysis revealed that the dG20 → Ap20 mutation leads to stronger, more stable interactions between the two macromolecules. The contribution of the residues involved in DNA anchoring is almost systematically increased in the tandem-damaged system. Nucleic acids also exhibit stronger interactions with MutM in the case of the tandem lesion, which overall suggests that the presence of a second damage results in a more rigid complex than when an isolated 8-oxoG is present.
However, the global rigidity of the tandem-damaged MutM:DNA complex can actually be counterproductive for repair, since flexibility of the DNA strand has been shown to be a key feature correlating with 8-oxoG removal29. This consideration is further reinforced by the fact that, conversely, the catalytic N-terminal residues are less involved in the stability of the MutM:DNA complex in the case of tandem-damaged nucleotides. This is also the case for the 211-234 loop region, which is known to play a key role in 8-oxoG extrusion. Hence, the presence of the Ap site alongside the 8-oxoG lesion impacts the canonical structural behavior of these two important MutM regions, which might also contribute to the lower repair efficiency. Our study provides an example of the predictive power of all-atom MD simulations coupled to machine-learning analysis, applied to a very challenging test case. Indeed, the combination of oxidatively generated DNA lesions embraces a combinatorial chemistry, with contrasting structural, mechanical and dynamic properties. Additionally, MutM/Fpg are very flexible proteins29 and are certainly difficult to sample properly. The efficiency of our protocol opens perspectives for its extension towards other tandem systems and the investigation of sequence effects12, 45, 46. Furthermore, the biological significance of rationalizing this complex scenario is clear. Indeed, ionizing radiation can be exploited in cancer therapy, and the inhibition of repair enzymes by combined chemotherapy can provide a valuable synergy in ensuring the accumulation of lesions necessary to reach the apoptosis threshold. Understanding the molecular mechanisms underlying DNA repair is thus crucial for offering novel perspectives for cancer research.

Materials and Methods

All-atom molecular dynamics simulations

All MD simulations were performed with the Amber and Ambertools 2018 packages47.
The starting X-ray structure of Bacillus stearothermophilus MutM was taken from the structure obtained by Verdine and coworkers21 (PDB ID code 3GO8). The crystallographic self-complementary ds-DNA is a 13-bp sequence d(GTAGATCCGACG). (CGTCCGGATCT) featuring 8-oxoG as the 19th nucleobase (in bold). It should be noted that the βF-α10 loop 217–237 of MutM, absent from the crystal structure, was reconstructed using Modeller. The zinc atom present in the zinc-finger motif of MutM was kept and described with parameters taken from the Zinc AMBER Force Field (ZAFF) developed by Merz and coworkers48. Nineteen potassium ions were added to neutralize the MutM:DNA complex, which was embedded in a 92 × 97 × 91 Å³ bath of TIP3P water molecules. The Amber ff14SB force field49 was used throughout, including the bsc1 corrections for the DNA duplex50. The parameters for 8-oxoG and the Ap site were generated with a standard antechamber procedure embedded in Amber 1847, as described in previous references51–53 and in agreement with the literature. Four 10,000-step minimization runs were carried out on the initial MutM:DNA complex, imposing restraints on the amino and nucleic acids that were gradually decreased from 20 to 0 kcal/mol/Å² along the four runs. The temperature was then raised from 0 to 300 K in a 50 ps thermalization step, and afterwards kept constant using the Langevin thermostat with a collision frequency γln of 1 ps⁻¹. The system was subjected to a 1 ns equilibration run in the NPT ensemble. Finally, two replicas of 1 µs production runs were performed to sample the conformational ensemble of the system.
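The staged minimization described above can be written down as a simple schedule. The sketch below is only an illustration: the text gives the end points (20 and 0 kcal/mol/Å²) and the number of runs (four, of 10,000 steps each), while the evenly spaced intermediate values and the function name `restraint_schedule` are assumptions, not the authors' actual input files.

```python
def restraint_schedule(start=20.0, stop=0.0, n_runs=4):
    """Return one restraint force constant (kcal/mol/A^2) per minimization run.

    Linear spacing between the end points is an assumption; the text only
    states that the restraints decrease gradually from 20 to 0.
    """
    if n_runs < 2:
        raise ValueError("need at least two runs to go from start to stop")
    step = (start - stop) / (n_runs - 1)
    return [start - i * step for i in range(n_runs)]


STEPS_PER_RUN = 10_000  # from the text: four 10,000-step minimization runs

schedule = restraint_schedule()
for run, k in enumerate(schedule, start=1):
    print(f"minimization {run}: {STEPS_PER_RUN} steps, "
          f"restraint = {k:.2f} kcal/mol/A^2")
```

Each value would then be used as the restraint weight of the corresponding minimization input; any other monotonically decreasing schedule with the same end points is equally compatible with the description.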
The Particle Mesh Ewald method was used to treat electrostatic interactions, with a 9.0 Å cutoff. The structural descriptors of the DNA helix were evaluated by post-processing with Curves+32, and other distance and RMSD values were monitored using Ambertools.

Multilayer Perceptrons analysis

The multilayer perceptron (MLP) is a fully connected artificial neural network (ANN) with one input layer, one output layer and at least one hidden layer. After tests, the architecture of the MLP was chosen to contain a single hidden layer of 200 neurons, which provided good accuracy. The rectified linear unit function (ReLU)54 was used for the activation of neurons, and the Adam algorithm was used for optimization. The inverses of the distances between the geometric centers of the residues were used as the input features for the MLP, owing to their better overall performance compared with Cartesian coordinates, according to Fleetwood et al.30. These internal coordinates were computed for all residue pairs and all frames. Each frame of the trajectories was labelled as either 1 or 0 according to whether the distance between the DNA lesion(s) and the protein is lower (bound) or higher (non-bound) than 10 Å. These sets of input features and labels were fed to the MLP classifier for training. Upon completion of the training, layerwise relevance propagation (LRP) was performed to identify the important features of the DNA/MutM interface.

Acknowledgements

Support from ENS de Lyon is gratefully acknowledged. This work was performed within the framework of the LABEX PRIMES (ANR-11-LABX-0063) of Université de Lyon, within the program "Investissements d’Avenir" (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).

References

1. Basu, A. K. DNA damage, mutagenesis and cancer. Int. J. Mol. Sci. 19, 970, DOI: 10.3390/ijms19040970 (2018). 2. Cadet, J. & Davies, K. J. A. Oxidative DNA damage & repair: An introduction. Free. Radic. Biol.
Medicine 106, 100–110, DOI: 10.1016/j.freeradbiomed.2017.02.017 (2017). 3. Chatterjee, N. & Walker, G. C. Mechanisms of DNA damage, repair, and mutagenesis. Environ. Mol. Mutagen. 58(5), 235–263, DOI: 10.1002/em.22087 (2017). 4. David, S. S., O’Shea, V. L. & Kundu, S. Base-excision repair of oxidative DNA damage. Nature 447, 941–950, DOI: 10.1038/nature05978 (2007). 5. Fortini, P. et al. 8-oxoguanine DNA damage: at the crossroad of alternative repair pathways. Mutat. Res. Mol. Mech. Mutagen. 531, 127 – 139, DOI: 10.1016/j.mrfmmm.2003.07.004 (2003). 6. Hong, I. S., Carter, K. N., Sato, K. & Greenberg, M. M. Characterization and mechanism of formation of tandem lesions in dna by a nucleobase peroxyl radical. J. Am. Chem. Soc. 129, 4089–4098, DOI: 10.1021/ja0692276 (2007). 7. Cadet, J. & Wagner, J. R. DNA base damage by reactive oxygen species, oxidizing agents, and uv radiation. Cold Spring Harb. Perspectives Biol. 5, DOI: 10.1101/cshperspect.a012559 (2013). 8. Bergeron, F., Auvré, F., Radicella, J. P. & Ravanat, J.-L. Ho• radicals induce an unexpected high proportion of tandem base lesions refractory to repair by DNA glycosylases. Proc. Natl. Acad. Sci. 107, 5528–5533, DOI: 10.1073/pnas.1000193107 (2010). 9. Georgakilas, A. G., O’Neill, P. & Stewart, R. D. Induction and repair of clustered DNA lesions: What do we know so far? Radiat. Res. 180, 100–109, DOI: 10.1667/RR3041.1 (2013). 10. Gattuso, H. et al. Repair rate of clustered abasic DNA lesions by human endonuclease: Molecular bases of sequence specificity. The J. Phys. Chem. Lett. 7, 3760–3765, DOI: 10.1021/acs.jpclett.6b01692 (2016). 11. Bignon, E. et al. Correlation of bistranded clustered abasic DNA lesion processing with structural and dynamic DNA helix distortion. Nucleic Acids Res. 44, 8588–8599, DOI: 10.1093/nar/gkw773 (2016). 12. Noguchi, M., Urushibara, A., Yokoya, A., O’Neill, P. & Shikazono, N. 
The mutagenic potential of 8-oxog/single strand break-containing clusters depends on their relative positions. Mutat. Res. Mol. Mech. Mutagen. 732, 34–42, DOI: 10.1016/j.mrfmmm.2011.12.009 (2012). 13. Morland, I. et al. Human DNA glycosylases of the bacterial fpg/mutm superfamily: an alternative pathway for the repair of 8-oxoguanine and other oxidation products in DNA. Nucleic acids research 30, 4926–4936, DOI: 10.1093/nar/gkf618 (2002). 14. Serre, L., Pereira de Jésus, K., Boiteux, S., Zelwer, C. & Castaing, B. Crystal structure of the lactococcus lactis formamidopyrimidine-DNA glycosylase bound to an abasic site analogue-containing DNA. The EMBO J. 21, 2854–2865, DOI: 10.1093/emboj/cdf304 (2002). 15. Amara, P., Serre, L., Castaing, B. & Thomas, A. Insights into the DNA repair process by the formamidopyrimidine-DNA glycosylase investigated by molecular dynamics. Protein Sci. 13, 2009–2021, DOI: 10.1110/ps.04772404 (2004). 16. La Rosa, G. & Zacharias, M. Global deformation facilitates flipping of damaged 8-oxo-guanine and guanine in DNA. Nucleic Acids Res. 44, 9591–9599, DOI: 10.1093/nar/gkw827 (2016). 17. Qi, Y., Spong, M. C., Nam, K., Karplus, M. & Verdine, G. L. Entrapment and structure of an extrahelical guanine attempting to enter the active site of a bacterial DNA glycosylase, mutm. J. Biol. Chem. 285, 1468–1478, DOI: 10.1074/jbc.M109.069799 (2010). 18. Michaels, M.
L., Pham, L., Cruz, C. & Miller, J. H. Mutm, a protein that prevents g c→t a transversions, is formamidopyrimidine-DNA glycosylase. Nucleic Acids Res. 19, 3629–3632, DOI: 10.1093/nar/19.13.3629 (1991). 19. Fromme, J. C. & Verdine, G. L. DNA lesion recognition by the bacterial repair enzyme mutm. J. Biol. Chem. 278, 51543–51548, DOI: 10.1074/jbc.M307768200 (2003). 20. Fromme, J. C. & Verdine, G. L. Structural insights into lesion recognition and repair by the bacterial 8-oxoguanine DNA glycosylase mutm. Nat Struct Mol Biol 9, 544–552, DOI: 10.1038/nsb809 (2002). 21. Qi, Y. et al. Encounter and extrusion of an intrahelical lesion by a DNA repair enzyme. Nature 462, 762–766, DOI: 10.1038/nature08561 (2009). 22. Li, H. et al. A dynamic checkpoint in oxidative lesion discrimination by formamidopyrimidine–DNA glycosylase. Nucleic Acids Res. 44, 683, DOI: 10.1093/nar/gkv1092 (2015). 23. Hazel, R. D., Tian, K. & de los Santos, C. Nmr solution structures of bistranded abasic site lesions in DNA. Biochemistry 47, 11909–11919, DOI: 10.1021/bi800950t (2008). 10.1021/bi800950t. 24. Fujimoto, H. et al. Molecular dynamics simulation of clustered DNA damage sites containing 8-oxoguanine and abasic site. J. Comput. Chem. 26, 788–798, DOI: 10.1002/jcc.20184 (2005). 25. Cleri, F., Landuzzi, F. & Blossey, R. Mechanical evolution of DNA double-strand breaks in the nucleosome. PLOS Comput. Biol. 14, 1–24, DOI: 10.1371/journal.pcbi.1006224 (2018). 26. Harrison, L., Hatahet, Z., Purmal, A. A. & Wallace, S. S. Multiply damaged sites in DNA: Interactions with Escherichia coli endonucleases III and VIII. Nucleic Acids Res. 26, 932–941, DOI: 10.1093/nar/26.4.932 (1998). 27. Pérez, A., Luque, F. J. & Orozco, M. Frontiers in molecular dynamics simulations of DNA. Accounts Chem. Res. 45, 196–205, DOI: 10.1021/ar2001217 (2012). 28. Amara, P. & Serre, L. Functional flexibility of bacillus stearothermophilus formamidopyrimidine DNA-glycosylase. 
DNA Repair 5, 947–958, DOI: 10.1016/j.dnarep.2006.05.042 (2006). 29. Landová, B. & Šilhán, J. Conformational changes of DNA repair glycosylase mutm triggered by DNA binding. FEBS Lett. 594, 3032–3044, DOI: 10.1002/1873-3468.13876 (2020). 30. Fleetwood, O., Kasimova, M. A., Westerlund, A. M. & Delemotte, L. Molecular insights from conformational ensembles via machine learning. Biophys. J. 118, 765–780, DOI: 10.1016/j.bpj.2019.12.016 (2020). 31. Gilboa, R. et al. Structure of formamidopyrimidine-DNA glycosylase covalently complexed to DNA. J. Biol. Chem. 277, 19811–19816, DOI: 10.1074/jbc.M202058200 (2002). 32. Lavery, R., Moakher, M., Maddocks, J. H., Petkeviciute, D. & Zakrzewska, K. Conformational analysis of nucleic acids revisited: Curves+. Nucleic Acids Res. 37, 5917–5929, DOI: 10.1093/nar/gkp608 (2009). 33. Friedman, J. I. & Stivers, J. T. Detection of damaged DNA bases by DNA glycosylase enzymes. Biochemistry 49, 4957–4967, DOI: 10.1021/bi100593a (2010). 34. Sugahara, M. et al. Crystal structure of a repair enzyme of oxidatively damaged DNA, mutm (fpg), from an extreme thermophile, thermus thermophilus hb8. The EMBO J. 19, 3857–3869, DOI: 10.1093/emboj/19.15.3857 (2000). 35. Buchko, G. W., McAteer, K., Wallace, S. S. & Kennedy, M. A. Solution-state nmr investigation of DNA binding interactions in escherichia coli formamidopyrimidine-DNA glycosylase (fpg): a dynamic description of the DNA/protein interface. DNA Repair 4, 327–339, DOI: 10.1016/j.dnarep.2004.09.012 (2005). 36. Brooks, S. C., Adhikary, S., Rubinson, E. H. & Eichman, B. F. Recent advances in the structural mechanisms of DNA glycosylases. Biochimica et Biophys. Acta (BBA) - Proteins Proteomics 1834, 247–271, DOI: 10.1016/j.bbapap.2012.10.005 (2013). 37. Nelson, S. R., Dunn, A. R., Kathe, S. D., Warshaw, D. M. & Wallace, S. S. Two glycosylase families diffusively scan DNA using a wedge residue to probe for and identify oxidatively damaged bases. Proc. Natl. Acad. Sci. 111, E2091–E2099, DOI: 10.1073/pnas.1400386111 (2014). 38. Watanabe, R., Rahmanian, S. & Nikjoo, H. Spectrum of Radiation-Induced Clustered Non-DSB Damage – A Monte Carlo Track Structure Modeling and Calculations. Radiat. Res. 183, 525–540, DOI: 10.1667/RR13902.1 (2015). 39. Lomax, M. E., Cunniffe, S. & O’Neill, P. 8-OxoG retards the activity of the ligase III/XRCC1 complex during the repair of a single-strand break, when present within a clustered DNA damage site. DNA repair 3, 289–299, DOI: 10.1016/j.dnarep.2003.11.006 (2004). 40. Wood, M. L., Dizdaroglu, M., Gajewski, E.
& Essigmann, J. M. Mechanistic studies of ionizing radiation and oxidative mutagenesis: genetic effects of a single 8-hydroxyguanine (7-hydro-8-oxoguanine) residue inserted at a unique site in a viral genome. Biochemistry 29, 7024–7032, DOI: 10.1021/bi00482a011 (1990). 41. Moriya, M. Single-stranded shuttle phagemid for mutagenesis studies in mammalian cells: 8-oxoguanine in DNA induces targeted GC –> TA transversions in simian kidney cells. Proc. Natl. Acad. Sci. 90, 1122–1126, DOI: 10.1073/pnas.90.3.1122 (1993). 42. Vignard, J., Mirey, G. & Salles, B. Ionizing-radiation induced DNA double-strand breaks: a direct and indirect lighting up. Radiother. Oncol. 108, 362–369, DOI: 10.1016/j.radonc.2013.06.013 (2013). 43. Thompson, L. H. Recognition, signaling, and repair of DNA double-strand breaks produced by ionizing radiation in mammalian cells: the molecular choreography. Mutat. Res. Mutat. Res. 751, 158–246, DOI: 10.1016/j.mrrev.2012.06.002 (2012). 44. Baskar, R., Lee, K. A., Yeo, R. & Yeoh, K.-W. Cancer and radiation therapy: current advances and future directions. Int. journal medical sciences 9, 193, DOI: 10.7150/ijms.3635 (2012). 45. Sassa, A., Beard, W. A., Prasad, R. & Wilson, S. H. DNA sequence context effects on the glycosylase activity of human 8-oxoguanine DNA glycosylase. J. Biol. Chem. 287, 36702–36710, DOI: 10.1074/jbc.M112.397786 (2012). 46. Sassa, A. & Odagiri, M. Understanding the sequence and structural context effects in oxidative DNA damage repair. DNA repair 93, 102906, DOI: 10.1016/j.dnarep.2020.102906 (2020). 47. Case, D. et al. Amber 2018: San francisco (2018). 48. Peters, M. B. et al. Structural survey of zinc-containing proteins and development of the zinc amber force field (zaff). J. Chem. Theory Comput. 6, 2935–2947, DOI: 10.1021/ct1002626 (2010). 49. Maier, J. A. et al. ff14sb: improving the accuracy of protein side chain and backbone parameters from ff99sb. J. 
chemical theory computation 11, 3696–3713, DOI: 10.1021/acs.jctc.5b00255 (2015). 50. Ivani, I. et al. Parmbsc1: a refined force field for DNA simulations. Nat. Methods 38, 55–58, DOI: 10.1038/nmeth.3658 (2016). 51. Bignon, E., Dršata, T., Morell, C., Lankaš, F. & Dumont, E. Interstrand cross-linking implies contrasting structural consequences for DNA: insights from molecular dynamics. Nucleic acids research 45, 2188–2195, DOI: 10.1093/nar/gkw1253 (2017). 52. Bignon, E., Claerbout, V. E. P., Jiang, T. & Dumont, E. Nucleosomal embedding reshapes the dynamics of abasic sites. Sci. Reports 10, 17314, DOI: 10.1038/s41598-020-73997-y (2020). 53. Dumont, E. et al. Singlet oxygen attack on guanine: Reactivity and structural signature within the B-DNA helix. Chem. Eur. J. 22, 12358–12362, DOI: 10.1002/chem.201601287 (2016). 54. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. vol. 15 of Proceedings of Machine Learning Research, 315–323 (JMLR Workshop and Conference Proceedings, Fort Lauderdale, FL, USA, 2011).
10_1101-2021_01_06_425610 ---- Coordination of phage genome degradation versus host genome protection by a bifunctional restriction-modification enzyme visualized by CryoEM Betty W. Shen1, Joel D. Quispe2, Yvette Luyten3, Benjamin E. McGough4, Richard D. Morgan3 and Barry L. Stoddard1,* 1 Division of Basic Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. Seattle WA 98109 USA 2 Department of Biochemistry University of Washington Seattle WA 98195 USA 3 New England Biolabs 240 County Road Ipswich, MA 01938 USA 4 Scientific Computing Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. Seattle, WA 98109 USA * Corresponding author: bstoddar@fredhutch.org 206-667-4031 (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 6, 2021.
; https://doi.org/10.1101/2021.01.06.425610doi: bioRxiv preprint https://doi.org/10.1101/2021.01.06.425610
ABSTRACT Restriction enzymes that combine DNA methylation and cleavage activities into a single polypeptide or protein assemblage and that modify just one DNA strand for host protection are capable of more efficient adaptation towards novel target sites. However, they must solve the problem of discrimination between newly replicated, unmodified host sites (which need methylation) and invasive foreign sites (which need to be cleaved). One solution to this problem might be that the activity that occurs at any given site is dictated by the oligomeric state of the bound enzyme. Methylation requires just a single bound site and is relatively slow, while cleavage requires that multiple unmethylated target sites (often found in incoming, foreign DNA) be brought together into an enzyme-DNA complex to license rapid cleavage. To validate and visualize the basis for such a mechanism, we have determined the catalytic behavior of a bifunctional Type IIL restriction-modification (‘RM’) enzyme (DrdV) and determined its high-resolution structure at several different stages of assembly and coordination with multiple bound DNA targets using CryoEM. The structures demonstrate a mechanism of cleavage by which an initial dimer is formed between two DNA-bound enzyme molecules, positioning the single endonuclease domain from each enzyme against the other’s DNA and requiring further oligomerization through differing protein-protein contacts of additional DNA-bound enzyme molecules to enable cleavage. The analysis explains how endonuclease activity is licensed by the presence of multiple target-containing DNA duplexes and provides a clear view of the assembly through 3D space of a DNA-bound RM enzyme ‘synapse’ that leads to rapid cleavage of foreign DNA.
Bacterial restriction-modification (RM) systems are ubiquitous and highly diverse defense mechanisms that guard host cells against invasive DNA elements, particularly phage genomes (Halford, 2009; Loenen et al., 2014b; Roberts, 2005). RM systems pair two competing enzymatic activities: methylation of adenine or cytosine bases within a target site (which protects the host genome from degradation) versus cleavage of DNA within or at some distance from unmethylated copies of the same target site (which leads to degradation of foreign DNA). In combination with additional innate or ‘preprogrammed’ restriction mechanisms that also act on both host and invader genomes (such as the Pgl (Sumby and Smith, 2002), BREX (Goldfarb et al., 2015), DND (Xu et al., 2010) and Ssp (Xiong et al., 2020) defense systems) and complementary ‘adaptive’ nuclease systems (typified by reprogrammable CRISPR-associated nucleases (Koonin and Makarova, 2019)) RM systems represent an important form of antiviral defense in bacteria. RM systems are loosely divided into at least four major classes, based on their structural composition, biochemical activities and the relationship between their bound DNA targets and subsequent cleavage patterns (Loenen et al., 2014b). Type I and III RM systems contain ATP-dependent translocase domains or subunits that bring together multiple subunits into a DNA-bound protein collision complex or synapse, resulting in cleavage either near (Type III) or at some random distance (Type I) from their target sites (Loenen et al., 2014a; Rao et al., 2014). In contrast, Type II systems do not contain or utilize ATP-dependent motors for motion and activity (Pingoud et al., 2014).
Instead, they rely either on the parallel activities of stand-alone methyltransferase (MTase) and endonuclease (Endo) enzymes that independently recognize the same DNA target, or on the physical coupling of methylation and endonuclease domains within a single protein chain or a larger multimeric assemblage, so that both functions are simultaneously targeted by a single DNA recognition module. Type IV endonucleases behave similarly to Type II enzymes, but cleave methylated, rather than unmethylated DNA targets (enabling a bacterial response against phage that methylate their own DNA target to evade restriction endonuclease activity) (Loenen and Raleigh, 2014). RM systems that use a common DNA recognition module to simultaneously target their competing DNA methylation and cleavage activities have the advantage of facilitating the evolution of new DNA specificity (Morgan and Luyten, 2009), since any alteration in DNA targeting will concurrently alter the specificity of host protective methylation and restrictive cleavage of invading DNAs. Many such RM systems modify just one DNA strand within their asymmetric recognition motif, allowing these systems to employ a single DNA recognition module and MTase domain. This presents a distinct challenge, as DNA replication produces one daughter DNA with no protective methylation. Such systems must therefore solve the problem of discrimination between self (which should be methylated and protected from cleavage) versus non-self (which should be rapidly cleaved and degraded). Systems that communicate between two or more sites within a DNA molecule through 1D translocation, such as the Type III and Type ISP systems, solve the problem by requiring sites be in a head to head orientation to license cleavage, since this effectively places methylation in both strands. However, there are numerous Type IIL systems that do not have a translocase function and cut sites without regard to their orientation. 
How these avoid self-cutting while maintaining sufficient restriction of invading DNAs to provide a selective advantage to the host has been an open question. One reasonable (and frequently postulated) solution to that challenge is to (1) ensure that cleavage is significantly faster than methylation, while also requiring that (2) multiple unmethylated DNA target sites be brought together into an enzyme-DNA complex before cleavage is licensed to occur. As a result, whereas an encounter with foreign DNA (typically harboring multiple unprotected sites) would lead to rapid cutting at multiple positions, an encounter between an RM enzyme and one or two unmethylated target site(s) on the host would result in eventual DNA methylation and release of bound enzyme. A variety of structures of RM enzymes that combine their methylation and cleavage domains and activities into single protein chains or complexes have been solved in the presence and absence of bound DNA. These include: (1) Two single-chain Type IIL enzymes (MmeI (Callahan et al., 2016) and BpuSI (Shen et al., 2011), that each contain an N-terminal nuclease domain followed by methyltransferase (MTase) and target recognition domains (TRDs)). (2) A pair of single-chain Type ISP enzymes (LlaGI and LlaBIII, that each incorporate an additional RecA-like ATPase domain into their structures (Chand et al., 2015; Šišáková et al., 2013)). (3) A crystal structure for a complex of EcoP15I (a Type III multichain complex containing two MTase subunits and an Endonuclease subunit) bound to DNA (Gupta et al., 2015).
(4) CryoEM structures of the Type I enzyme EcoR124I (a multichain complex containing multiple nuclease-translocase, methyltransferase and specificity subunits) bound to DNA (Gao et al., 2020). That recent analysis built upon lower-resolution models of DNA-bound ‘M2S’ subcomplexes of that same enzyme, as well as those of EcoKI and TteI (Kennaway et al., 2009; Kennaway et al., 2012). (5) A Type IV methyl-dependent restriction endonuclease, MspJI, in a tetrameric complex bound to DNA (Horton et al., 2012) (although Type IV enzymes do not have an MTase domain, the MspJI tetrameric complex is relevant to this study). Collectively, these analyses have provided considerable insight into the domain organizations, structural dynamics, DNA recognition specificity, and (for the Type ISP LlaGI and LlaBIII enzymes) a unique mechanism of translocation and subsequent cleavage of DNA (Chand et al., 2015; Šišáková et al., 2013). However, a high-resolution structure of a multimeric RM enzyme system engaged in simultaneous recognition complexes with multiple DNA targets (with the methyltransferase and nuclease domains each properly positioned for competing reaction outcomes) has not yet been described. DrdV is a single-chain, Type IIL restriction-modification enzyme of length 1029 residues that recognizes the asymmetric DNA target site 5’ CATGNAC 3’ and methylates an adenine (bold and underlined) in one strand, leading to host protection. It contains an N-terminal nuclease domain, a helical connector region followed by a methyltransferase domain, and a C-terminal target recognition domain (TRD). When bound to its DNA target, it either methylates the underlined adenine within the target or cleaves the top and bottom strands of foreign DNA precisely 10 and 8 base pairs, respectively, downstream of the target site. In this study, we use CryoEM analysis and supporting biochemical experiments to visualize the stepwise formation of a tetrameric assemblage of DrdV in complex with independently bound DNA target sites. The analysis illustrates the structural basis for generation of an active endonuclease complex of bound enzymes and the basis of crosstalk and cooperativity between multiple copies of the enzyme and bound DNA. METHODS Protein expression and purification. The gene encoding the DrdV RM system (AXG99744.1) was PCR amplified from Deinococcus wulumuqiensis 479 genomic DNA using Q5 hot start high-fidelity DNA polymerase and cloned into the T7 expression-based vector pSAPV6 (Samuelson et al., 2004) using the NEBuilder HiFi DNA Assembly master mix reaction protocol (New England Biolabs, Ipswich, MA). The plasmid construct was confirmed by DNA sequencing of the DrdV gene and flanking vector sequence. The verified plasmid construct was transformed and expressed in the E. coli host ER3081 (F- λ- fhuA2 lacZ::T7 gene1 [lon] ompT gal attB::(pCD13-lysY, lacIq) sulA11 R(mcr-73::miniTn10–TetS)2 [dcm] R(zgb-210::Tn10–TetS) endA1 Δ(mcrC-mrr)114::IS10). DrdV endonuclease was purified from 425 g of cells grown at 30°C in Rich media supplemented with 2% glycerol and 0.2% glucose and containing 30 µg/ml chloramphenicol. Cells were induced at a final concentration of 0.4 mM IPTG and grown for an additional 3 hours at 30°C before harvest. Cells were resuspended in 3 volumes of DEAE buffer (300 mM NaCl, 50 mM Tris pH 8, 0.1 mM EDTA, 1 mM DTT, 5% glycerol) and lysed using a Microfluidics M110EH microfluidizer (Microfluidics, Westwood, MA), and cell debris was removed by centrifugation at 15,000 x g for 40 min.
DrdV endonuclease was purified to near homogeneity via four sequential chromatographic steps: DEAE anion exchange, Heparin HyperD, Source 15Q, and Source 15S (Supplemental Figure S1, panel a). The clarified lysate (1375 mL) was first applied to DEAE (200 mL column bed volume, pH 8.0) and then washed with 2 column volumes (400 mL) of DEAE buffer. The flow-through and wash were pooled (2145 mL), diluted with no-salt DEAE buffer (50 mM Tris pH 8, 0.1 mM EDTA, 1 mM DTT, 5% glycerol) to a final NaCl concentration of 150 mM, and applied to a Heparin HyperD column (pH 8.0). A salt gradient was run from 150 mM to 1000 mM NaCl and 25 mL fractions were collected. DrdV eluted across fractions 40 to 50 (275 mL total volume). Those fractions were diluted to 55 mM NaCl and applied to a Source Q column, and the protein was eluted via a salt gradient while collecting 22 mL fractions. DrdV eluted across fractions 17 to 22 (132 mL total volume; 2700 mg total protein). The fractions were pooled, diluted to 50 mM NaCl, applied to a Source S column at pH 8.0, and eluted via a salt gradient into 20 mL fractions. DrdV eluted across fractions 13 to 22 (120 mL total volume). They were pooled and dialyzed into storage buffer (250 mM NaCl, 10 mM Tris pH 9.0, 0.1 mM EDTA, 1 mM DTT, 50% glycerol). A total of 1200 milligrams (1.2 grams) of purified protein was stored at a final concentration of 20 mg/mL. Analytical size exclusion chromatography (SEC) demonstrated that the purified DrdV protein eluted at a volume corresponding to a molecular weight of approximately 100 kilodaltons, consistent with a monomer in solution.
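As a rough plausibility check on the SEC interpretation, the apparent mass can be compared with a sequence-based estimate. The sketch below assumes average masses of about 110 Da per amino-acid residue and about 650 Da per DNA base pair (common rules of thumb, not values from this study). It takes from the text the 1029-residue length of DrdV, the 27-bp length of the target duplex used in the SEC experiment, and the approximate 100 kDa (free protein) and 540 kDa (DNA-bound complex) apparent masses. The function names are illustrative only.

```python
AVG_RESIDUE_MASS_KDA = 0.110   # assumed average mass per amino-acid residue
AVG_BASEPAIR_MASS_KDA = 0.650  # assumed average mass per DNA base pair


def protomer_mass_kda(n_residues, n_basepairs=0):
    """Estimated mass of one protein chain plus an optional bound DNA duplex."""
    return (n_residues * AVG_RESIDUE_MASS_KDA
            + n_basepairs * AVG_BASEPAIR_MASS_KDA)


def estimated_copies(apparent_mass_kda, n_residues, n_basepairs=0):
    """Nearest whole number of protomers matching an apparent SEC mass."""
    ratio = apparent_mass_kda / protomer_mass_kda(n_residues, n_basepairs)
    return max(1, round(ratio))


# Free DrdV (1029 residues) eluting at roughly 100 kDa.
free_copies = estimated_copies(100, 1029)

# DrdV bound to the 27-bp duplex, complex eluting at roughly 540 kDa.
bound_copies = estimated_copies(540, 1029, n_basepairs=27)

print(f"free enzyme: {free_copies} copy, DNA-bound complex: {bound_copies} copies")
```

SEC elution depends on molecular shape as well as mass, so this only shows that the monomer and tetramer assignments are numerically reasonable under the stated assumptions.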
Upon incubation, in the presence of calcium, with an equimolar amount of a double stranded DNA (dsDNA) duplex containing a single enzyme target site (consisting of a top strand with sequence 5’-CAGCCCATGGACCCAGAACCAC/CCACC-3’ (underline = target site; “/” = cut site) and its complementary bottom strand with sequence 3’-GTCGGGTACCTGGGTCTTGG/TGGGTGG-5’), the protein co-eluted with the DNA at a volume corresponding to an approximate molecular weight of 540 kilodaltons, suggesting the formation of a tetrameric enzyme-DNA complex (Supplemental Figure S1, panel b).

DrdV endonuclease and methyltransferase assays. Endonuclease activity was assayed in NEBuffer 4 (20 mM Tris-acetate, pH 7.9, 10 mM magnesium acetate, 50 mM potassium acetate, 1 mM DTT) supplemented with 80 µM S-adenosyl-methionine (AdoMet), typically using 1 µg DNA substrate per 50 µl reaction volume at 37°C. Reactions were terminated by adding stop solution containing 0.08% SDS (NEB Gel Loading Dye, Purple) and DNA fragments were analyzed by electrophoresis in agarose gels. Methyltransferase activity was assayed in the same buffer, supplemented with 12.5 mM EDTA (to remove Mg++) and 80 µM AdoMet. Cleavage assays that illustrated the trans-activation of the endonuclease via the addition of dsDNA harboring the enzyme’s target site were performed in the presence of an added oligonucleotide (sequence 5’-GTGCTCAGGTCCATGAGCGAGTCTTTTGACTCGCTCATGGACCTGAGCACTC-3’) that forms a short hairpin double-stranded DNA duplex (Figure 1) containing the CATGNAC recognition site (top and bottom strands of the target corresponding to the underlined bases in the sequence shown) with 8 basepairs upstream (5’) and 8 basepairs downstream (3’) of the target, terminating immediately prior to the site of DNA cutting.

Structural visualization via electron microscopy.
The protein-DNA complexes were initially evaluated by negative-stained TEM (Supplemental Figure S1, panels c, d, e) followed by screening of cooling and vitrification conditions and initial data collection using a GLACIOS 200 kV electron microscope (Supplemental Figure S2). A subsequent data set was collected on a KRIOS electron microscope (Supplemental Figure S3). All data preprocessing (including motion correction, CTF estimation, and exposure curation), as well as 2D particle curation, 3D model generation/refinement, and post refinement, was performed using the software package cryoSPARC (Punjani et al., 2017). For each movie stack, the frames were aligned for beam-induced motion correction using Patch-motion-correction. Patch-CTF was used to determine the contrast transfer function parameters. Bad movies were eliminated based on a CTF-fit resolution cutoff of 5 Å and a relative ice thickness of 1.2 estimated from the CTF function by cryoSPARC2. Different particle picking algorithms, including manual picking, template-based picking and blob picking, were applied to the same dataset and the resulting model distributions were compared. The evaluation of the density map at all stages and initial fitting of the Phyre2 (Kelley et al., 2015) predicted model to the final density map were accomplished in Chimera (Pettersen et al., 2004). The final structures were built and refined with the program COOT (Emsley et al., 2010).

I. Negative stain transmission electron microscopy (TEM). Negative-stain grids (Supplemental Figure S1, panel c) were prepared by the application of 4 µl of SEC purified samples to a glow discharged uniform carbon film coated grid.
The particles were allowed to adsorb to the surface for 30 to 60 seconds. Excess solution was wicked away by briefly touching the edge of a filter paper. The grid was quickly washed three times with 20 µl drops of water and once with a 20 µl drop of 0.5% uranyl formate (UF), followed by staining for ~20 seconds with a 40 µl drop of UF. The grids were air-dried for at least 2 hours prior to inspection on an in-house JEOL JM1400 microscope (operating at 120 kV) equipped with a Gatan Rio 4k x 4k CMOS detector. Both DNA-free DrdV and the DrdV/DNA complex were distributed homogeneously in random orientations over the surface of the carbon film. A small dataset of 126 micrographs was collected using the automated data collection package Leginon (Suloway et al., 2005) from the negative-stained specimen at a pixel size of 1.6 Å on a FEI Tecnai Spirit electron microscope (operating at 120 kV) equipped with a Gatan 4k x 4k CCD detector. Initially 2800 particles were hand-picked from all 126 micrographs of the negative stained dataset and subjected to reference-free 2D classification. Six out of ten 2D class averages (Supplemental Figure S1, panel d) from 2D classification were used to reconstruct a four-lobed volume with imperfect two-fold symmetry. Homogeneous refinement with C1 or C2 symmetry yielded envelopes at approximately 17 Å and 14 Å resolution, respectively, at a Gold Standard Fourier Shell Correlation (GSFSC) of 0.143 (Supplemental Figure S1, panel e).

II. Initial CryoEM Screening and analyses (see Supplemental Figure S2). Using the same protein-DNA complex, CryoEM grids were prepared by applying 3 μL of a DrdV-DNA complex with an absorbance of 0.55 OD at 280 nm (approximately 0.25 mg/ml protein based on quantitative SDS-PAGE analysis) to a glow-discharged Quantifoil 1.2/1.3 holey carbon film coated copper grid, which was blotted for 5.0 s and plunge-frozen in liquid ethane using an FEI Vitrobot Mark IV.
Screening datasets with a total of 1410 movies were collected from two separate grids on a GLACIOS electron microscope (operating at 200 kV) equipped with a Gatan K2-Summit direct electron detector at a pixel size of 1.16 Å. The same six selected 2D classes of the negative stained particles (Supplemental Figure S1, panel d) were used as templates for automatic templated particle picking from a total of 1220 movies after frame alignment and manual exposure curation. After “inspect particle picks” and “local motion correction”, 354364 out of 703210 particles were accepted for 2D classification. After 3 rounds of particle curation, 82768 particles from 26 selected classes (out of 100) were used for ab-initio reconstruction of one unique 3D model. This initial 3D cryoEM reconstruction of the DrdV-DNA complex showed an asymmetric particle with three lobes rather than the four seen for the negative-stained particles, which ruled out C2 symmetry. Hence all further refinement processes were performed with C1 symmetry to avoid biased interpretation of the resulting maps, resulting in a trimer map at 3.25 Å at a GSFSC of 0.143 between the two half maps. 3D variability analysis (Punjani and Fleet, 2020) of this map showed that the most prominent component arises from the association/dissociation of a third protomer to a dimer core (Supplemental Movie 1). Ab-initio 3D reconstruction with four models revealed the presence of a dimer, a partial trimer with an ill-defined dimer core, and a full trimer (at 28.4%, 23.8% and 41.8%, respectively) plus a small percentage of smaller fragments (6.0%).
Homogeneous and non-uniform refinement followed by local refinement resulted in a map at 3.3 Å for a dimer and a map at 3.4 Å for a full trimer (Supplemental Figure S2 and Figure 2, panel a).

III. Final CryoEM data collection (see Supplemental Figure S3). A final dataset was collected at the Pacific Northwest Center for CryoEM (PNCC) using a vitrified grid prepared with the complex at a final concentration of ~0.4 mg/ml protein (diluted immediately before application to the grid from a stock solution at ~1.6 mg/ml protein) on a Quantifoil 1.2/1.3, 200 mesh copper grid, using a Titan KRIOS electron microscope (FEI) operating at 300 kV, equipped with a Gatan K3 direct electron detector and an energy filter (operated with a slit width of 20 eV) at a super resolution pixel size of 0.5318 Å. The data were binned by a factor of two to a pixel size of 1.064 Å. After preprocessing (motion correction, CTF estimation and manual exposure curation), 3927 micrographs were accepted from a total of 4300 movies. An automated ‘Blob Picker’ algorithm with maximum and minimum diameters of 240 Å and 110 Å was used for particle picking. After inspection and three rounds of particle curation, 376599 particles were selected for 3D reconstruction and refinements. Ab-initio 3D reconstruction with three models showed that the dataset contained three different classes: trimers (~34%), tetramers (~50%), and a higher molecular weight aggregate (~15%) that could be the result of the addition of extra monomers to the tetramers or the result of close contact of neighboring particles. Dimers were absent in the PNCC dataset, which was prepared from a stock solution at much higher concentration and diluted immediately prior to the preparation of the grids. After homogeneous and non-uniform refinement, the tetramer class was further refined after Local- and Global-CTF refinement of the particles, which led to a final resolution of 2.73 Å at a GSFSC of 0.143 between the two half maps.
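The pixel-size arithmetic in the data collection description can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the Nyquist limit (twice the pixel size) is a general sampling property, not a value reported in the text. It shows that binning the 0.5318 Å super resolution pixel by two gives the quoted 1.064 Å, and that the reported 2.73 Å map resolution lies above the sampling limit of the binned data.

```python
# Minimal sketch: pixel size after binning and the corresponding Nyquist limit.
# Function names are hypothetical; input values are taken from the text.

def binned_pixel_size(super_res_pixel_A: float, bin_factor: int) -> float:
    """Pixel size (in Angstrom) after binning by an integer factor."""
    return super_res_pixel_A * bin_factor


def nyquist_limit(pixel_size_A: float) -> float:
    """Finest resolution (in Angstrom) that a given pixel size can sample."""
    return 2.0 * pixel_size_A


SUPER_RES_PIXEL_A = 0.5318   # super resolution pixel size from the text
REPORTED_RESOLUTION_A = 2.73  # tetramer map resolution from the text

pixel = binned_pixel_size(SUPER_RES_PIXEL_A, 2)
nyquist = nyquist_limit(pixel)

print(f"binned pixel size: {pixel:.4f} A")
print(f"Nyquist limit:     {nyquist:.4f} A")
print(f"reported resolution is within the sampling limit: "
      f"{REPORTED_RESOLUTION_A > nyquist}")
```

A larger number in Å means coarser detail, so a reported resolution numerically greater than the Nyquist limit is consistent with the sampling of the binned micrographs.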
3D variability display showed that the association-dissociation of a fourth component to the trimer is the most prominent contribution to the 3D variability; thus the particles under the tetramer class (Supplemental Figure S3, tetramer I) were further classified via a second round of ab-initio 3D reconstruction with three models, resulting in a full trimer (45.2%), a tetramer (50.2%) and a class of small fragments (4.6%). Final refinements of the trimer and tetramer led to a resolution of better than 2.9 Å at a GSFSC of 0.143 for both 3D classes (Supplemental Figure S3). Even though the nominal resolution of Tetramer I is higher than that of Tetramer II, the quality of the latter is actually superior, especially for the MTase and TRD domains and the DNA of the fourth protomer.

The density maps of the enzyme-DNA complexes, visualized at three unique stages of assembly (dimers, trimers and full tetramers), each allowed unambiguous placement of individual protein chains (each containing all 1029 residues with the exception of a short surface-exposed loop in the methyltransferase domain, residues 412-421; Supplemental Figure S4), as well as bound copies of the DNA duplex, a bound SAM cofactor, a base-flipped adenine nucleotide in each methyltransferase active site, and calcium ions associated with each endonuclease domain in contact with a DNA strand and scissile phosphate. The map of the tetramers indicated a reduced occupancy for one of the subunits and its bound DNA duplex.

Results and Discussion

Biochemical activity assays.
A series of in vitro biochemical analyses (Figure 1) of DrdV activity demonstrates that the DrdV enzyme displays the mechanistic behavior described above, which is believed to lead to different reaction outcomes against ‘self’ versus ‘foreign’ DNA targets: much slower methylation than cleavage, strong activation of cleavage via binding of multiple DNA targets, and coordinated, near-simultaneous cleavage of both strands in multiple target sites within a DNA substrate.

(i) Cleavage of unmethylated targets by DrdV is significantly faster than host-protective methylation (Figure 1a). In a series of in vitro incubations with a standard multisite substrate (lambda DNA), DNA cleavage is nearly complete within 1 to 5 minutes, whereas complete methylation of the same substrate under similar conditions (except for the absence of Mg++ to prevent DNA cleavage) requires up to 16 hours.

(ii) DrdV requires multiple sites for efficient, high fidelity cleavage. DrdV cleaves a DNA substrate containing a single target site (a pUC19 plasmid with a DrdV target site added at position 1680) incompletely, cutting only around 20% of the DNA even with an 8-fold excess of enzyme (Figure 1b, left). At higher enzyme excess, star activity (cleavage at closely related non-cognate sites) appears as DrdV begins to make additional, though very partial, cuts at near-cognate sites. In contrast, cleavage activity towards the same plasmid substrate is significantly increased (and off-target star activity is reduced) by supplying a short dsDNA hairpin oligonucleotide that contains the DrdV recognition site in trans (Figure 1b, right). Maximum stimulation is achieved when the oligo is supplied at a ratio between 1:1 and approximately 6:1 to DrdV enzyme molecules, with enzyme molecules in excess over the substrate target sites to be cut.

(iii) DrdV simultaneously cleaves both DNA strands, downstream of the target site, in a coordinated manner.
DrdV digestion of a circular DNA substrate (pBR322 plasmid) containing multiple target sites showed little accumulation of the nicked open circular DNA form (Figure 1c), indicating that both strands are cleaved in a nearly concerted event. The plasmids were initially converted both to linear fragments cut at one site only and to fragments cut at two sites, indicating that cleavage can occur at just one site or at two sites in a coordinated manner.

CryoEM Structural analysis. The purified enzyme eluted with an apparent mass of 118 kD from a final size exclusion column. Upon incubation with an equimolar ratio of a DNA duplex containing a single copy of its DNA target site, the protein co-eluted with the DNA over a sharp peak centered at an estimated molecular weight of approximately 540 kD. That sample was used for negative stain EM studies and single particle reconstruction, resulting in a molecular envelope corresponding to an asymmetric tetrameric assemblage with a pair of pseudo-orthogonal dyad symmetry axes, of approximate size 200 x 220 x 100 Å (Supplemental Figure S1e). In addition to these largest particles, smaller particles corresponding to intermediate bi- and tri-lobed assemblages were also observed. We interpreted this result as potentially representing a population of enzyme tetramers bound to multiple DNA targets, interspersed with smaller dimeric and trimeric enzyme-DNA assemblages.
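The inference from a ~540 kD elution peak to a tetrameric enzyme-DNA assemblage can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' analysis: it assumes each complex unit is one enzyme subunit (118 kD apparent mass, from the text) plus one 27-bp duplex (the length of the oligonucleotide given in the Methods), and it uses roughly 0.65 kD per basepair as an approximate average mass for double-stranded DNA. The function names are hypothetical.

```python
# Minimal sketch: which subunit count best explains the observed SEC mass?
# Assumes one DNA duplex per enzyme subunit; function names are hypothetical.

PROTEIN_KDA = 118.0   # apparent SEC mass of free DrdV (from the text)
DUPLEX_BP = 27        # duplex length, counted from the Methods oligonucleotide
KDA_PER_BP = 0.65     # ASSUMPTION: approximate average mass of one DNA basepair


def complex_mass_kda(n_subunits: int) -> float:
    """Expected mass of n enzyme subunits, each with one bound duplex."""
    return n_subunits * (PROTEIN_KDA + DUPLEX_BP * KDA_PER_BP)


def best_stoichiometry(observed_kda: float, max_subunits: int = 6) -> int:
    """Subunit count whose expected mass is closest to the observed mass."""
    return min(
        range(1, max_subunits + 1),
        key=lambda n: abs(complex_mass_kda(n) - observed_kda),
    )


for n in range(1, 6):
    print(f"{n} subunit(s): {complex_mass_kda(n):6.1f} kD")

print("closest match to 540 kD:", best_stoichiometry(540.0), "subunits")
```

SEC masses are shape dependent and only approximate, so this kind of estimate is a consistency check on the tetramer interpretation and not a substitute for the reconstructions.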
The subsequent CryoEM single particle reconstructions showed that DrdV-DNA complexes undergo a concentration-dependent oligomerization, producing density maps with two, three and four lobes corresponding to dimer, trimer and tetramer. These maps allowed unambiguous placement and subsequent building and refinement of unique enzyme-DNA complexes containing two, three or four protein subunits. All particles contain a highly homologous dimeric core, with one or two extra protomers at either side in the trimer and tetramer (Figure 2, Supplemental Figures S2 and S3, Supplemental Movies). The final maps (individually corresponding to 3.5 to 2.8 Å resolution) provided well-resolved features that allowed unambiguous modeling of secondary structure elements and corresponding side chain positions across four sequential functional regions and folded domains within each protein subunit (an N-terminal endonuclease domain, an alpha-helical connector, a methyltransferase domain and a C-terminal target recognition domain, or ‘TRD’) of the enzyme (Figure 2, and Supplemental Movie 3). The relative local resolution distribution of all three density maps on the same scale and the sequence of a DrdV subunit with corresponding secondary structures are shown in Supplemental Figure S4.

The description of the enzyme-DNA complex features provided below is derived from the density map of the largest observed enzyme assemblage. Those features are also observed in the structures of the dimeric and trimeric species (solved and refined independently), with the exception of small conformational changes that appear to accompany the stepwise addition of the third and fourth enzyme subunits. The relative domain orientations within a single DNA-bound enzyme subunit and its interactions with its target site (observed in all of the structures) are illustrated in Figure 3. The DNA target site (with bases numbered according to their position in the target site, i.e.
5’-C1A2T3G4G5A6C7-3’; Figure 3a) is bound in a cleft between the methyltransferase domain (MTase, residues 295-635) and target recognition domain (TRD, residues 636-1029) (Figure 3b). The adenine base at position 6 (‘A6’) is flipped into the methyltransferase active site and positioned near a bound molecule of S-adenosyl-methionine (‘SAM’ or ‘AdoMet’) (Figure 3d,e). Sequence-specific base contacts by enzyme side chains are observed to six (out of the seven) base pairs within the target site, with only the guanine at the fifth position in the target site (where any basepair is tolerated by the enzyme) excluded from direct readout (Figure 3 and Supplemental Figure S5). Individual side chain contacts to the target bases are formed both by the MTase domain (N448, Q485, K486, K488, N548, R554, D564) and by the TRD (N673, R721, Y764, D803, K807). The flipped adenine base is bracketed by π-stacking with F304 and Y451 and forms additional contacts with F562 and N448. The space vacated by the flipped adenine A6 is occupied by N548 and K567 from the side of the major groove.

Within the DNA-enzyme complex formed by a single DrdV subunit as described above, the endonuclease domain is not in contact with the bound DNA duplex (Figure 3b,c). Instead, it contacts the top strand and the corresponding scissile phosphate in the DNA duplex bound by the opposing subunit within a central dimeric enzyme assemblage (Figure 4a,b). The endonuclease domain from the opposing enzyme subunit is similarly domain swapped; both endonuclease domains are properly positioned to cleave the top strand of the DNA duplex that is engaged by the opposing enzyme subunit (Figure 4c).
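The relationship between the seven-basepair CATGNAC target and the downstream cut positions can be made concrete with a small sketch. It is illustrative only: the function name and the returned field names are hypothetical, only the top-strand orientation of the site is searched, and the offsets of 10 (top strand) and 8 (bottom strand) basepairs downstream of the site are the values stated in the text. Applied to the top strand of the duplex used for complex formation, it reproduces the cut position marked with “/” in the Methods.

```python
# Minimal sketch: locate CATGNAC sites on a top strand and report the expected
# cut positions. Names are hypothetical; offsets are taken from the text.
import re

SITE = re.compile(r"CATG[ACGT]AC")  # CATGNAC, N = any base
TOP_OFFSET = 10     # top strand is cut 10 bases 3' of the site
BOTTOM_OFFSET = 8   # bottom strand is cut 8 basepairs downstream of the site


def find_cut_sites(top_strand: str) -> list[dict]:
    """Return site position and cut positions (0-based, in top-strand
    coordinates; a cut at index i falls between bases i-1 and i)."""
    hits = []
    for match in SITE.finditer(top_strand.upper()):
        end = match.end()
        hits.append(
            {
                "site": match.group(),
                "site_start": match.start(),
                "top_cut": end + TOP_OFFSET,
                "bottom_cut": end + BOTTOM_OFFSET,
            }
        )
    return hits


# Top strand of the duplex described in the Methods (cut-site mark removed).
top = "CAGCCCATGGACCCAGAACCACCCACC"
for hit in find_cut_sites(top):
    upstream = top[: hit["top_cut"]]
    downstream = top[hit["top_cut"]:]
    overhang = hit["top_cut"] - hit["bottom_cut"]
    print(hit["site"], "at index", hit["site_start"])
    print(f"top strand: 5'-{upstream}/{downstream}-3'")
    print(f"3' overhang length: {overhang} bases")
```

The two-base difference between the top and bottom cut positions corresponds to the 2-base 3’ overhang of the double strand-break product.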
The enzyme dimer displays an extensive buried interface between the two helical connector domains, largely composed of a pair of buried, symmetrically equivalent electrostatic networks and surrounding hydrophobic and hydrogen-bonded contacts with neighboring residues (Figure 4d). Within each network, a cluster of three acidic residues from one subunit (D224, E229 and D230) is engaged with a corresponding cluster of three basic residues from the opposing subunit (K251, R252 and K259), thereby bringing together at least 12 opposing charged residues. This electrostatic network is augmented by two patches of electrostatic π-stacking interactions between R252 of one subunit and Q27 and Y396 of a second subunit, and vice versa. In fact, Y396 of the MTase domain is the only residue outside of the helical connector domains that is involved in the interactions between the two subunits in the core dimer.

In each endonuclease-DNA interface (Figure 4c) the scissile phosphate is engaged in contacts with a divalent metal ion (a calcium, which was present in the enzyme buffer to prevent cleavage) complexed by a pair of conserved acidic residues (D64 and E79). A neighboring lysine residue (K94) and a nearby additional glutamic acid (E25) complete the nuclease active site. The conserved lysine of the canonical PD-ExK endonuclease motif (K81, mutation of which abolishes catalysis) is also positioned near the scissile phosphate, where it could participate in catalysis upon adopting a different rotamer than that observed with calcium present in the structure.

The dimeric assemblage of enzyme-DNA complexes described above is further augmented by additional bound DrdV subunits, forming DNA-bound trimeric and tetrameric complexes (Figure 1 and Figure 5). (In the tetrameric particles, the fourth and final subunit displays partial, sub-stoichiometric occupancy.)
In those structures, the additional enzyme subunits are positioned on either side of the dimer described above, via an additional protein-protein interface between two endonuclease domains (Figure 5a). This interface is again composed primarily of a pair of symmetry-related clusters of opposing charged residues (Figure 5b), each of which consists of K8 and D15 from one subunit forming a pair of buried electrostatic contacts with E41 and R46 from the opposing subunit. The additional endonuclease domain also contacts an extension from the TRD domain of the enzyme bound to the DNA strand being cleaved. The dimerization of the two endonuclease domains places a pair of active sites in alignment with the appropriate scissile phosphates of each strand in a DNA duplex, which in turn allows the enzyme to generate a double strand-break (corresponding to a 2-base 3’ product overhang) downstream of the bound target site (Figure 5a, box and inset). As a result of the assembly and coordination of the cleavage complex described above, no fewer than three individual protein subunits and two bound DNA targets are required to form all contacts necessary to cleave a single bound DNA duplex (Figure 5c), and four protein subunits are required to simultaneously cleave two DNA duplexes.

Discussion and Conclusions.
The biochemical activities and corresponding structures presented in this study reinforce (and illustrate a structural basis for) the concept that an invading DNA such as a phage genome, that presents multiple unmodified sites in a single construct, will be rapidly cleaved whereas the generation of individual unmodified sites in one daughter chromosome following replication would favor modification, as there is less chance to assemble multiple site-bound molecules into cleavage competent complexes. The inefficient cleavage of single site substrates, and activation by specific DNA target sites in trans, indicates DrdV must interact with multiple sites to achieve rapid and efficient DNA cleavage. The structures of DrdV described here (in several stages of assembly with multiple bound DNA targets) offer a sequential view of such an enzyme before and during each stage of DNA search, encounter, coordination and cleavage. When considered alongside previous crystallographic structures of two related Type IIL enzymes at earlier points in their action (BpuSI, solved in the absence of a bound DNA target (Shen et al., 2011) and MmeI, bound to a single copy of its DNA target, in a monomeric complex (Callahan et al., 2016)), a rather complete picture of the functional cycle and mechanism for such bifunctional R-M enzymes seems to emerge. In those earlier crystal structures, the N-terminal endonuclease domain in unbound BpuSI was found to be well-resolved and packed against the interface between its downstream MTase and TRD domains, in a manner that would require its release in order to bind DNA, effectively sequestering the endonuclease catalytic site to prevent any DNA cutting. In contrast, the endonuclease domain in the DNA-bound MmeI enzyme was unobservable (and presumably displaying considerable motion and flexibility), suggesting release from the sequestered position and search for a partner upon initial recognition and engagement of its specific target site. 
Like DrdV, MmeI requires multiple sites for cutting and is stimulated by in trans DNA containing a recognition site. A simple model of Type IIL enzyme function in a cell would be one in which the apo enzyme is in an inactive (endonuclease-sequestered) state that scans DNA. Upon encounter of its specific recognition motif, the enzyme engages in a tight, long-lived complex that releases the endonuclease domain, but the endonuclease domain is not in contact with the DNA bound by the enzyme. Target recognition and binding would then be followed by a kinetic competition between two outcomes: eventual methylation and enzyme release or encounter and capture of an additional target-bound enzyme subunit to form a dimer with exchange of DNA helices to the partner's endonuclease domain. Formation of the central dimer is again followed by a kinetic competition between two outcomes: eventual methylation and enzyme release if no additional DNA-bound partners are encountered, or encounter and capture of an additional target-bound enzyme subunit or two to form a catalytically competent trimer or tetramer, leading to rapid cleavage of the two DNAs bound by the central dimer subunits. The structural analyses presented here also demonstrate that relatively little conformational difference exists between the two core DNA-bound DrdV subunits present in the central dimer particles, and the two additional DNA-bound enzyme subunits that bind to the dimeric assemblage through their endonuclease domains to form the cleavage competent complex.
However, examination of differences between those structures does indicate that the formation of endonuclease dimers at each DNA cleavage site (corresponding to the conversion from DNA-bound dimers to larger trimeric and/or tetrameric complexes) is accompanied by observable deformation of the DNA substrates as part of the cleavage mechanism within each bound DNA duplex, and a hinged rigid body rotation of the endonuclease domain by approximately 30° in the catalytic partner subunits relative to that in the central dimer. This rotation highlights the importance of dynamic flexibility of the endonuclease domain relative to the MTase and TRD portion of the protein. The MTase and DNA recognition domains of DrdV and those of the Type ISP enzymes LlaGI and LlaBIII (Kulkarni et al., 2016) are highly similar, indicating evolution from a common origin, yet the way restrictive DNA cleavage is achieved and controlled in these Type IIL and Type ISP RM systems is quite different. The Type ISP enzymes license their endonuclease for cutting through collision between enzymes translocating on the DNA in opposite directions from inverted recognition sites. Their endonuclease domains never actually encounter one another, but simply nick one strand of the DNA multiple times on either side of the collision complex, eventually leading to double strand breaks when the nicks occur close together as the enzymes move against one another.
In stark contrast, DrdV remains bound to its recognition site and recruits additional DNA-bound enzyme molecules, first to form a non-catalytic dimer using one set of protein contacts between their linker and methylase domains that positions the endonuclease of each subunit against the DNA of the other, and then to form catalytic complexes through a different set of protein contacts, largely between endonuclease domains, to bring two endonuclease catalytic centers together for double strand cleavage at a fixed distance from the bound recognition site. This implies the endonuclease domains of Type IIL enzymes are under greater evolutionary pressure and corresponding sequence constraint as they form multiple protein-protein contacts as well as positioning the catalytic center for cutting. Restriction-modification systems such as those that rely on recognition and cleavage of specific target sequences in foreign DNA are complemented by additional phage restriction mechanisms and systems (such as the Pgl (Sumby and Smith, 2002), BREX (Goldfarb et al., 2015), DND (Xu et al., 2010) and Ssp (Xiong et al., 2020) systems) that also utilize a site-specific protective activity (usually a methyltransferase) to again protect the bacterial genome from self-destruction. The exact manner in which the self-modifying protective activity is employed differs between systems (for example, the methyltransferase activity in the Pgl and BREX systems requires the presence of one or more additional protein factors in order to methylate host DNA).
Regardless, the observations described here, which demonstrate the basis of at least one mechanism by which protective versus destructive activities in a restriction system can be biased towards self and foreign, respectively, may be reflected (with many possible variations on a theme) within a wide range of alternative forms of cellular defense.

Acknowledgements

This work was supported by NIH grant R01 GM105691 to BLS, by an Amazon Cloud Credit to BWS, by the Fred Hutchinson Cancer Research Center, and by New England Biolabs. A portion of this research was supported by NIH grant U24GM129547 and performed at the PNCC at OHSU and accessed through EMSL (grid.436923.9), a DOE Office of Science User Facility sponsored by the Office of Biological and Environmental Research. We thank Justin Kollman and David Veesler at the University of Washington for advice and assistance, Janette Myer for Krios data collection, Jeff Tucker and Dan Tenenbaum for assistance in AWS EC2 setup, Melody Campbell for critical reading of the manuscript, and Sue Biggins and Richard Roberts for support, encouragement and advice.

Competing Interests Statement

YL and RDM are employees of New England Biolabs, a manufacturer of reagents, enzymes and tools for molecular biology. The enzyme described in this study, and/or ones similar to it, are commercial products produced by NEB.

REFERENCES

Callahan, S.J., Luyten, Y.A., Gupta, Y.K., Wilson, G.G., Roberts, R.J., Morgan, R.D., and Aggarwal, A.K. (2016). Structure of Type IIL Restriction-Modification Enzyme MmeI in Complex with DNA Has Implications for Engineering New Specificities. PLoS biology 14, e1002442.
Chand, M.K., Nirwan, N., Diffin, F.M., van Aelst, K., Kulkarni, M., Pernstich, C., Szczelkun, M.D., and Saikrishnan, K. (2015). Translocation-coupled DNA cleavage by the Type ISP restriction-modification enzymes. Nature chemical biology 11, 870-877. Emsley, P., Lohkamp, B., Scott, W.G., and Cowtan, K. (2010). Features and development of Coot. Acta Crystallogr D Biol Crystallogr 66, 486-501. Gao, Y., Cao, D., Zhu, J., Feng, H., Luo, X., Liu, S., Yan, X.X., Zhang, X., and Gao, P. (2020). Structural insights into assembly, operation and inhibition of a type I restriction-modification system. Nature microbiology. Goldfarb, T., Sberro, H., Weinstock, E., Cohen, O., Doron, S., Charpak-Amikam, Y., Afik, S., Ofir, G., and Sorek, R. (2015). BREX is a novel phage resistance system widespread in microbial genomes. Embo J 34, 169-183. Gupta, Y.K., Chan, S.H., Xu, S.Y., and Aggarwal, A.K. (2015). Structural basis of asymmetric DNA methylation and ATP-triggered long-range diffusion by EcoP15I. Nat Commun 6, 7363. Halford, S.E. (2009). Restriction enzymes - The (billion dollar) consequences of studying why certain isolates of phage infect only certain strains of E. coli. Biochemist 31, 10 - 13. Horton, J.R., Mabuchi, M.Y., Cohen-Karni, D., Zhang, X., Griggs, R.M., Samaranayake, M., Roberts, R.J., Zheng, Y., and Cheng, X. (2012). Structure and cleavage activity of the tetrameric MspJI DNA modification-dependent restriction endonuclease. Nucleic acids research 40, 9763-9773. Kelley, L.A., Mezulis, S., Yates, C.M., Wass, M.N., and Sternberg, M.J. (2015). The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 10, 845-858. Kennaway, C.K., Obarska-Kosinska, A., White, J.H., Tuszynska, I., Cooper, L.P., Bujnicki, J.M., Trinick, J., and Dryden, D.T. (2009). The structure of M.EcoKI Type I DNA methyltransferase with a DNA mimic antirestriction protein. Nucleic acids research 37, 762-770. 
Kennaway, C.K., Taylor, J.E., Song, C.F., Potrzebowski, W., Nicholson, W., White, J.H., Swiderska, A., Obarska-Kosinska, A., Callow, P., Cooper, L.P., et al. (2012). Structure and operation of the DNA- translocating type I DNA restriction enzymes. Genes & development 26, 92-104. Koonin, E.V., and Makarova, K.S. (2019). Origins and evolution of CRISPR-Cas systems. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 374, 20180087. Kulkarni, M., Nirwan, N., van Aelst, K., Szczelkun, M.D., and Saikrishnan, K. (2016). Structural insights into DNA sequence recognition by Type ISP restriction-modification enzymes. Nucleic acids research 44, 4396-4408. Loenen, W.A., Dryden, D.T., Raleigh, E.A., and Wilson, G.G. (2014a). Type I restriction enzymes and their relatives. Nucleic acids research 42, 20-44. Loenen, W.A., Dryden, D.T., Raleigh, E.A., Wilson, G.G., and Murray, N.E. (2014b). Highlights of the DNA cutters: a short history of the restriction enzymes. Nucleic acids research 42, 3-19. Loenen, W.A., and Raleigh, E.A. (2014). The other face of restriction: modification-dependent enzymes. Nucleic acids research 42, 56-69. Morgan, R.D., and Luyten, Y.A. (2009). Rational engineering of type II restriction endonuclease DNA binding and cleavage specificity. Nucleic acids research 37, 5222-5233. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 6, 2021. ; https://doi.org/10.1101/2021.01.06.425610doi: bioRxiv preprint https://doi.org/10.1101/2021.01.06.425610 16 Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E. (2004). UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 25, 1605-1612. Pingoud, A., Wilson, G.G., and Wende, W. (2014). Type II restriction endonucleases--a historical perspective and more. 
Nucleic acids research 42, 7489-7527. Punjani, A., and Fleet, D.J. (2020). 3D Variability Analysis: Directly resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM images. bioRxiv https://doi.org/10.1101/2020.04.08.032466. Punjani, A., Rubinstein, J.L., Fleet, D.J., and Brubaker, M.A. (2017). cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat Methods 14, 290-296. Rao, D.N., Dryden, D.T., and Bheemanaik, S. (2014). Type III restriction-modification enzymes: a historical perspective. Nucleic acids research 42, 45-55. Roberts, R.J. (2005). How restriction enzymes became the workhorses of molecular biology. Proc Natl Acad Sci U S A 102, 5905-5908. Samuelson, J.C., Zhu, Z., and Xu, S.Y. (2004). The isolation of strand-specific nicking endonucleases from a randomized SapI expression library. Nucleic acids research 32, 3661-3671. Shen, B.W., Xu, D., Chan, S.-H., Zheng, Y., Zhu, Z., Xu, S.-y., and Stoddard, B.L. (2011). Characterization and crystal structure of the type IIG restriction endonuclease RM.BpuSI. Nucleic acids research 39, 8223-8236. Šišáková, E., van Aelst, K., Diffin, F.M., and Szczelkun, M.D. (2013). The Type ISP Restriction- Modification enzymes LlaBIII and LlaGI use a translocation-collision mechanism to cleave non-specific DNA distant from their recognition sites. Nucleic acids research 41, 1071-1080. Suloway, C., Pulokas, J., Fellmann, D., Cheng, A., Guerra, F., Quispe, J., Stagg, S., Potter, C.S., and Carragher, B. (2005). Automated molecular microscopy: the new Leginon system. J Struct Biol 151, 41- 60. Sumby, P., and Smith, M.C. (2002). Genetics of the phage growth limitation (Pgl) system of Streptomyces coelicolor A3(2). Molecular microbiology 44, 489-500. Xiong, X., Wu, G., Wei, Y., Liu, L., Zhang, Y., Su, R., Jiang, X., Li, M., Gao, H., Tian, X., et al. (2020). SspABCD-SspE is a phosphorothioation-sensing bacterial defence system with broad anti-phage activities. Nature microbiology. 
Xu, T., Yao, F., Zhou, X., Deng, Z., and You, D. (2010). A novel host-specific restriction system associated with DNA backbone S-modification in Salmonella. Nucleic acids research 38, 7133-7141. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 6, 2021. ; https://doi.org/10.1101/2021.01.06.425610doi: bioRxiv preprint https://doi.org/10.1101/2021.01.06.425610 17 Figure Legends. Figure 1. In vitro biochemical analyses of DrdV activities. See Methods for details of all reaction conditions. Panel a: Methylation by DrdV is much slower than endonuclease cleavage. Left: Time course of DrdV incubation in buffer with SAM (AdoMet) but without Mg++ (to prevent cleavage). The DNA substrate (pAd2BsaBI plasmid containing 19 DrdV sites) was incubated with 1 unit DrdV for the indicated time, then immediately purified using a spin column. The purified DNA was then challenged by cutting with DrdV (now in the presence of Mg++) to assess methylation status. Some partial methylation is observed starting at 15 minutes, but full methylation requires between 4 and 16 hrs. Right: Same time course in buffer containing Mg++. Cleavage is 90% complete within 5 minutes and fully complete in 1 hour. Panel b: Cleavage is activated by presence of DNA target site added in trans. Left: DrdV cleavage of a pUC19 plasmid substrate, (linearized with PstI) that harbors a single DrdV site: 2-fold serial dilution of DrdV from 8 to 0.25 units. The extra bands indicate star activity at near-cognate DrdV sites in the presence of the highest amounts (8 and 4 units) of DrdV. Right: DrdV cleavage of the same substrate, at the same enzyme concentrations, each in the presence of 100 mM of a short DNA hairpin duplex containing the DrdV target site. Cleavage goes to completion and displays greatly reduced off-target cutting. 
Panel c: DrdV cleaves both DNA strands at multiple target sites in a coordinated manner. Left: Time course of DrdV digestion of supercoiled pBR322 DNA (2 units/µg) for 15 s, 30 s, 1, 3, 5, 10, 30 and 60 min. Supercoiled plasmid is converted directly to linear plasmid cut at one site, or to fragments representing cutting at two sites, with very little appearance of open circle (OC) DNA nicked in one strand only. Subsequently the DNA is cut at all three sites. Right: 2-fold serial dilution of DrdV from 8 to 0.125 units/µg pBR322 substrate. At limiting enzyme, the majority of the cut DNA represents cutting at one site to linearize the plasmid.

Figure 2. CryoEM analysis of DrdV-DNA complex. Panel a: Density maps of dimeric, trimeric and tetrameric DNA-bound enzyme complexes. The fourth and final subunit in the tetrameric enzyme assemblage displays sub-stoichiometric partial occupancy. See also Supplementary Movie 1. Panel b: Superposition of all three density maps. Panel c: Front and back views of the DrdV tetramer density maps with individually colored enzyme subunits and bound DNA duplexes. Panel d: Atomic model of the DrdV tetrameric assembly. Panel e: Model of the central DNA-bound enzyme dimer (subunits A and B) extracted from the tetrameric assemblage, overlaid with boxes corresponding to representative regions of density and respective models shown in Panel f. Panel f: Close-up views of CryoEM map corresponding to (i) the N-terminal endonuclease domain; (ii and iii) the central methyltransferase domain and (iv) the C-terminal target recognition domain (TRD) as indicated by boxes in Panel e.

Figure 3. Conformation of an individual DrdV subunit bound to a DNA target. The model and maps shown are extracted from the tetrameric enzyme assemblage. Panel a: DNA construct used for CryoEM analyses.
The duplex consists of 28 complementary basepairs and spans both the enzyme’s seven basepair target site (red bases and blue underlined site of adenine methylation) and its downstream cleavage sites on the top and bottom strands (blue bases flanking the scissile phosphates, 10 and 8 basepairs downstream from the final basepair of the target site). Panel b: Density map of a single DNA-bound DrdV protomer with color-coded domains: DrdV is a single chain protein spanning 1029 residues, corresponding to an N-terminal endonuclease domain (light blue), a subsequent helical connector region (yellow), a central methyltransferase (‘MTase’) domain (purple) and a C-terminal target recognition domain (‘TRD’) (pink). Panel c: Each enzyme subunit contains an S-adenosyl-methionine (‘SAM’) cofactor (magenta) bound in its active site. The adenine base at position 6 in the target is flipped into the MTase active site and is unmethylated. The base is contacted by three aromatic residues (Y451, F304 and F562) and a neighboring asparagine (N448) from the MTase domain. N458 and K567 occupy the space vacated by the flipped-out adenine. Panels d and e: Views of molecular model and corresponding electron density map in the enzyme-DNA interface, with several residues that form additional sequence-specific contacts to the DNA target site shown. Several basic and polar residues from the methyltransferase (including K486 and K488) and the TRD (including D803 and K807) contribute additional base-specific contacts in the target site.
The adenine that is targeted for methylation (underlined ‘A’) is clearly flipped out of the DNA duplex and positioned proximal to the bound SAM cofactor; both moieties are clearly visible in the CryoEM density maps. Additional details of basepair-specific contacts are illustrated in Supplemental Figure S5.

Figure 4. Formation of a DrdV dimer bound to two individual DNA targets positions their endonuclease domains near one strand of their partner’s bound DNA duplex. The map shown is extracted from the larger tetrameric assemblage. Panel a: Two different views of the map. The map corresponding to one enzyme subunit is colored solid blue, while the second is colored to indicate the enzyme’s individual domains. The primary interface between individual protein subunits, and the interface between the DNA duplex bound by subunit B and the endonuclease domain of subunit A, are indicated by boxes labeled ‘c’ and ‘d’ and correspond to further detail illustrated in Panels c and d below. Panel b: Ribbon diagram of the DrdV core dimer, again indicating the location of interfaces between the protein subunits (largely via their helical connector regions) and between the nuclease domain of subunit A with DNA target from subunit B. As illustrated in the adjacent cartoon, the endonuclease domains are swapped between subunits, such that each enzyme subunit positions its endonuclease domain in contact with the DNA duplex bound by the opposite enzyme subunit. Panel c: Contacts between the endonuclease active site of subunit A and the DNA duplex bound by subunit B, illustrating the coordination of a bound calcium ion by residues of the active site and by the scissile phosphate on the corresponding strand of DNA. Panel d: Ribbon model and density illustrating the interface between the enzyme subunits.
The interface is largely composed of two buried, symmetry-related clusters of opposing charged residues (3 acidic side chains (D224, E230 and D231) from one subunit and 3 basic residues (K251*, R252* and K259*) from the opposite subunit, and vice versa), augmented by similarly duplicated cation-pi interactions between R252 from one subunit and Y396* from the other, as well as an additional contact between R252 and Q225*.

Figure 5. Association of endonuclease domains in the enzyme trimer and tetramer assemblages results in three enzyme subunits being jointly involved in the cleavage of a single bound DNA duplex. Panel a: Density map of the DNA-bound enzyme tetramer, colored by subunit. The organization of the assemblage and corresponding coloring is also indicated in the cartoon schematic adjacent to the map. The endonuclease domains from subunits B and C (boxed and colored in shades of green) are jointly positioned directly adjacent to the scissile phosphates of the DNA duplex bound by subunit A, with their active sites appropriately arranged to cleave the DNA duplex downstream from the bound target site. The endonuclease domains from subunits A and D are similarly positioned to cleave the DNA duplex bound by subunit B. Panel b: Orthogonal view of the packing and interactions between endonuclease domains shown in Panel a, and corresponding density map overlaid with atomic model, showing buried, symmetry-related clusters of opposing charged residues (E41 and R46 from one subunit, versus R12 and D15 from the other, and vice versa) interacting in a pairwise manner within a helical interface (located on the opposite side of the domains from their active site).
Panel c: The recognition, binding and cleavage of a single DNA duplex involves contacts and interfaces formed between three separate DNA-bound enzyme subunits.

Supplemental Figure S1. Enzyme purification and initial negative stain electron microscopy. Panel a: SDS-PAGE of DrdV at different stages of purification. See Methods for full details of enzyme production. Panel b: SEC elution profiles of free and DNA-bound DrdV, red and blue curves, respectively (left) and elution profile of DNA-bound DrdV overlaid with absorbance ratio at 260 nm and 280 nm, indicating formation and elution of DNA-bound enzyme complex. Panel c: Negative stain electron microscopy of DrdV apoenzyme and DNA-bound complex. A: DrdV in the absence of DNA. B: Negative-stained image of DrdV in the presence of an equimolar amount of DNA duplex (sequence provided in Methods) containing the enzyme target recognition site (CATGGAC) and 16 basepairs downstream of the target. C: Selected panels of the DrdV-DNA complex. Panel d: Selected 2D particle classes. Panel e: Reconstructed low-resolution 3D model of negative-stained DrdV-DNA particles.

Supplemental Figure S2. Flow chart for CryoEM analyses using a GLACIOS microscope operating at 200 kV. Data were collected at a pixel size of 1.16 Å and processed using the package cryoSPARCv2. For full details of data collection and processing approach, see Methods.

Supplemental Figure S3. Flow chart for CryoEM analyses using a KRIOS microscope operating at 300 kV. Computational processing, 3D reconstruction and refinement correspond to data collected at a pixel size of 0.537 Å. For full details of data collection and processing approach, see Methods.

Supplemental Figure S4. EM resolution and enzyme sequence versus structure. Panel a. Local resolution distribution of the density maps for the dimer, trimer and tetramer on the same scale. Panel b.
Amino acid sequence and visualized secondary structure of a DrdV subunit. Panel c. Pairwise secondary structure superposition of the two protomers of the core dimer (I), of the two protomers on the side (II) and one each of the protomer in the center and on the side (III), showing the conservation of the folding of all domains and their relative disposition in all the particles. The only observable difference in the disposition is a hinged rigid body rotation of the nuclease domain of the protomer on the side with respect to that of the protomer in the core dimers.

Supplemental Figure S5. Contacts between DrdV and base pairs in the enzyme’s target site.

Supplemental Movie M1. Rotation and visualization of CryoEM map and corresponding molecular model of the DNA-bound DrdV tetrameric assemblage.

Supplemental Movie M2. Morph from CryoEM map of DNA-bound DrdV dimeric assemblage to DNA-bound DrdV trimeric assemblage.

Supplemental Movie M3. Morph from CryoEM map of DNA-bound DrdV trimeric assemblage to DNA-bound DrdV tetrameric assemblage.

[Figure 1: gel images, panels a-c; lanes labeled Time Course and [Enzyme], with (-) controls, without and with trans target; trans target hairpin strands as labeled: GACTCGCTCATGGACCTGAGCACTC-3’ and CTGAGCGAGTACCTGGACTCGTG-5’.]
[Figure 2: cryoEM density maps and atomic models of the DrdV-DNA assemblages, panels a-f; subunits labeled A-D; close-up views labeled (i) Endonuclease, (ii and iii) Methyltransferase and (iv) TRD.]

[Figure 3: panels a-e; DNA construct 5’-CAGCCCATGGACCCAGAACCACCCACC-3’ / 3’-GTCGGGTACCTGGGTCTTGGTGGGTGG-5’, with target positions 1 to 7 and downstream positions +1 to +11; domains labeled Endonuclease, Helix connector, MTase and TRD; SAM and contacting residues labeled.]

[Figure 4: panels a-d; maps and models of the DrdV core dimer, with close-ups of the endonuclease active site (bound Ca++) and of the interface between subunits A and B.]

[Figure 5: panels a-c; DNA-bound tetramer (subunits A-D), the interface between endonuclease domains of subunits B and C, and a schematic of contacts made to a single DNA duplex by subunits A, B and C.]

[Supplemental Figure S1: panels a-e; purification gel lanes labeled Heparin, Source Q, Source S and Final.]

[Supplemental Figure S2: cryoEM processing flow chart. Labeled classes: full trimer 50,820 particles (41.8%), dimers 34,554 particles (28.4%), partial trimer 28,925 particles (23.8%), fragments 7,087 particles (6.0%); GSFSC resolutions 3.25 Å, 3.32 Å and 3.43 Å.]

[Supplemental Figure S3: cryoEM processing flow chart. 4300 movies, 3927 micrographs, 376,559 particles; labeled classes: tetramer 189,153 particles (50.2%), trimer 129,458 particles (34.4%), larger aggregates 57,948 particles (15.4%); GSFSC resolution 2.73 Å.]

[Supplemental Figure S4: panels a-c; hinge and superpositions I, II and III labeled.]
[Supplemental Figure S5: schematics of contacts at target site positions (5'-CATGNAC-3'), with labeled residues: position 1, Met762, Gln485 and Arg721; position 2, Tyr764 and Lys486; position 3, Leu668, Asn673 and Lys488; position 4, Asp803 and Lys807; position 7, Arg554 and Asp564.]

10_1101-2021_01_06_425657 ----

The SCFMet30 ubiquitin ligase senses cellular redox state to regulate the transcription of sulfur metabolism genes

Zane Johnson1, Yun Wang1, Benjamin M. Sutter1, Benjamin P. Tu1*

1 Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390-9038

*Correspondence and Lead Contact: benjamin.tu@utsouthwestern.edu

.CC-BY 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 7, 2021.
; https://doi.org/10.1101/2021.01.06.425657doi: bioRxiv preprint https://doi.org/10.1101/2021.01.06.425657 http://creativecommons.org/licenses/by/4.0/

SUMMARY

In yeast, control of sulfur amino acid metabolism relies upon Met4, a transcription factor which activates the expression of a network of enzymes responsible for the biosynthesis of cysteine and methionine. In times of sulfur abundance, the activity of Met4 is repressed via ubiquitination by the SCFMet30 E3 ubiquitin ligase, but the mechanism by which the F-box protein Met30 senses sulfur status to tune its E3 ligase activity remains unresolved. Here, using a combination of genetics and biochemistry, we show that Met30 utilizes exquisitely redox-sensitive cysteine residues in its WD-40 repeat region to sense the availability of sulfur metabolites in the cell. Oxidation of these cysteine residues in response to sulfur starvation inhibits binding and ubiquitination of Met4, leading to induction of sulfur metabolism genes. Our findings reveal how SCFMet30 dynamically senses redox cues to regulate synthesis of these special amino acids, and further highlight the mechanistic diversity in E3 ligase-substrate relationships.

INTRODUCTION

The biosynthesis of sulfur-containing amino acids supplies cells with increased levels of cysteine and methionine, as well as their downstream metabolites glutathione and S-adenosylmethionine (SAM). Glutathione serves as a redox buffer to maintain the reducing environment of the cell and provide protection against oxidative stress, while SAM serves as the methyl donor for nearly all methyltransferase enzymes (Ljungdahl and Daignan-Fornier, 2012, Cantoni, 1975). In the yeast Saccharomyces cerevisiae, biosynthesis of all sulfur metabolites can be performed de novo via enzymes encoded in the gene transcriptional network known as the MET regulon.
Activation of the MET gene transcriptional program under conditions of sulfur starvation relies on the transcription factor Met4 and additional transcriptional co-activators that allow Met4 to be recruited to the MET genes (Kuras et al., 1996, Blaiseau and Thomas, 1998).

When yeast cells sense sufficiently high levels of sulfur in the environment, the MET gene transcriptional program is negatively regulated by the activity of the SCF E3 ligase Met30 (SCFMet30) through ubiquitination of the master transcription factor Met4 (Kaiser et al., 2000). Met4 is unique as an E3 ligase substrate as it contains an internal ubiquitin interacting motif (UIM) which folds in and caps the growing ubiquitin chain generated by SCFMet30, resulting in a proteolytically stable but transcriptionally inactive oligo-ubiquitinated state (Flick et al., 2006). Upon sulfur starvation, SCFMet30 ceases to ubiquitinate Met4, allowing Met4 to become deubiquitinated and transcriptionally active.

Since the discovery of Met30, much effort has gone into understanding how it senses the sulfur status of the cell. Several mechanisms have been attributed to Met30 to describe how it and Met4 work together to regulate levels of MET gene transcripts in response to the availability of sulfur or the presence of toxic heavy metals (Thomas et al., 1995). After the discovery that Met30 is an E3
ligase that negatively regulates Met4 through ubiquitin-dependent mechanisms that are both proteolysis-dependent and proteolysis-independent (Rouillon et al., 2000, Flick et al., 2004, Kuras et al., 2002), it was found that Met30 dissociates from SCF complexes upon cadmium addition, resulting in the disruption of the aforementioned ubiquitin-dependent regulatory mechanisms (Barbey et al., 2005). It was later reported that this cadmium-specific dissociation of Met30 from SCF complexes is mediated by the Cdc48/p97 AAA+ ATPase complex, and that Met30 ubiquitination is required for Cdc48 to strip Met30 from these complexes (Yen et al., 2012). In parallel, attempts to identify the sulfur metabolic cue sensed by Met30 suggested that cysteine, or possibly some downstream metabolite, was required for the degradation of Met4 by SCFMet30, although glutathione was reportedly not involved in this mechanism (Hansen and Johannesen, 2000, Menant et al., 2006). A genetic screen for mutants that fail to repress MET gene expression found that cho2Δ cells, which are defective in the synthesis of phosphatidylcholine (PC) from phosphatidylethanolamine (PE), exhibit elevated SAM levels and a deficiency in cysteine levels (Sadhu et al., 2014). However, while Met30 and Met4 have been studied extensively for over two decades, the biochemical mechanisms by which Met30 senses and responds to the presence or absence of sulfur remain incompletely understood (Sadhu et al., 2014).

Herein, we utilize prototrophic yeast strains grown in sulfur-rich and sulfur-free respiratory conditions to elucidate the mechanism by which Met30 senses sulfur.
Using a combination of in vivo and in vitro experiments, we find that instead of sensing any single sulfur-containing metabolite, Met30 indirectly senses the levels of sulfur metabolites in the cell by acting as a sensor of redox state. We describe a novel mechanism by which an F-box protein can be regulated through the use of multiple cysteine residues as redox sensors that, upon oxidation, disrupt binding of the E3 ligase to its target to enable the activation of a coordinated transcriptional response.

RESULTS

SYNTHESIS OF CYSTEINE IS MORE IMPORTANT THAN METHIONINE FOR MET4 UBIQUITINATION

Previous work in our lab has characterized the metabolic and cellular response of yeast cells following a switch from rich lactate media (YPL) to minimal lactate media (SL) (Wu and Tu, 2011, Sutter et al., 2013, Laxman et al., 2013, Kato et al., 2019, Yang et al., 2019, Ye et al., 2017, Ye et al., 2019). Under such respiratory conditions, yeast cells engage regulatory mechanisms that might otherwise be subject to glucose repression. Among other phenotypes, this switch results in the acute depletion of sulfur metabolites and the activation of the MET gene regulon (Sutter et al., 2013, Ye et al., 2019). To better study the response of yeast cells to sulfur starvation, we reformulated our minimal lactate media to contain no sulfate, as prototrophic yeast can assimilate sulfur in the form of inorganic sulfate into reduced sulfur metabolites. After switching cells from YP lactate media (Rich) to the new minimal sulfur-free lactate media (−Sulfur), we found that
Met30 and Met4 quickly respond to sulfur starvation through the extensively studied ubiquitin-dependent mechanisms regulating Met4 activity (Figure 1A) (Yen et al., 2005, Flick et al., 2006, Barbey et al., 2005, Kaiser et al., 2000, Flick et al., 2004). As previously observed, the deubiquitination of Met4 resulted in the activation of the MET genes (Figure 1B) and corresponded well with changes in observed sulfur metabolite levels (Figure 1C). Addition of sulfur metabolites quickly rescued Met30 activity and resulted in the re-ubiquitination of Met4 and the repression of the MET genes.

As previously noted, Met4 activation in response to sulfur starvation results in the emergence of a second, faster-migrating proteoform of Met30, which disappears after rescuing yeast cells with sulfur metabolites (Sadhu et al., 2014). We found that the appearance of this proteoform is dependent on both MET4 and new translation, as it was not observed in either met4Δ cells or cells treated with cycloheximide during sulfur starvation (Figure S1A and C). Additionally, this proteoform persists after rescue with a sulfur source in the presence of a proteasome inhibitor (Figure S1B).

We hypothesized that this faster-migrating proteoform of Met30 might be the result of translation initiation at an internal methionine residue. In support of this possibility, mutation of methionine residues 30, 35, and 36 to alanine blocked the appearance of a lower form during sulfur starvation (Figure S1D).
Conversely, deletion of the first 20 amino acids containing the first three methionine residues of Met30 resulted in expression of a Met30 proteoform that migrated at the apparent molecular weight of the wild type short form and did not generate a new, even faster-migrating proteoform under sulfur starvation (Figure S1D). Moreover, the Met30M30/35/36A and Met30Δ1-20 strains, which express solely the long or the short form of the Met30 protein, respectively, had no obvious phenotype with respect to Met4 ubiquitination or growth in high or low sulfur media (Figure S1E). We conclude that the faster-migrating proteoform of Met30 that is produced during sulfur starvation has no discernible effect on sulfur metabolic regulation under these conditions.

The sulfur amino acid biosynthetic pathway is bifurcated into two branches at the central metabolite homocysteine, where this precursor metabolite commits either to the production of cysteine or methionine (Figure 1E). After confirming Met30 and Met4 were responding to sulfur starvation as expected, we sought to determine whether the cysteine or methionine branch of the sulfur metabolic pathway was sufficient to rescue Met30 E3 ligase activity and re-ubiquitinate Met4 after sulfur starvation. To determine whether the synthesis of methionine is necessary to rescue Met30 activity, cells lacking methionine synthase (met6Δ) were fed either homocysteine or methionine after switching to sulfur-free lactate (−Sulfur) media. Interestingly, cells fed homocysteine were still able to ubiquitinate and degrade Met4, while methionine-fed cells appeared to oligo-ubiquitinate and stabilize Met4 (Figure 1D).
These observations are consistent with previous reports and suggest Met30 and Met4 interpret sulfur sufficiency through both branches of sulfur metabolism to a degree (Hansen and Johannesen, 2000, Kaiser et al., 2000, Kuras et al., 2002, Flick et al., 2004, Menant et al., 2006, Sadhu et al., 2014), with the stability of Met4, but not the E3 ligase activity of Met30, apparently dependent on the methionine branch.

To determine whether Met30 specifically responds to cysteine, cells lacking cystathionine beta-lyase (str3Δ), the enzyme responsible for the conversion of cystathionine to homocysteine, were starved of sulfur and fed either cysteine or methionine. This mutant is incapable of synthesizing methionine from cysteine via the two-step conversion of cysteine into the common precursor metabolite homocysteine. Our results show cysteine was able to rescue Met30 activity even in a str3Δ mutant, further suggesting cysteine or a downstream metabolite, and not methionine, as the signal of sulfur sufficiency for Met30 (Figure 1D).

CYSTEINE RESIDUES IN MET30 ARE OXIDIZED DURING SULFUR STARVATION

The synthesis of cysteine from homocysteine contributes to the production of the downstream tripeptide metabolite glutathione (GSH), which exists at millimolar concentrations in cells and is the major cellular reductant for buffering against oxidative stress (Cuozzo and Kaiser, 1999, Wu et al., 2004).
Specifically, glutathione serves to neutralize reactive oxygen species such as peroxides and free radicals, detoxify heavy metals, and preserve the reduced state of protein thiols (Pompella et al., 2003, Penninckx, 2000). Considering the relatively high number of cysteine residues in Met30 (Figure 2A), we sought to determine if these residues might become oxidized during acute sulfur starvation. Utilizing the thiol-modifying agent methoxy-PEG-maleimide (mPEG2K-mal), which adds ~2 kDa per reduced cysteine residue, we assessed Met30 cysteine oxidation in vivo by Western blot. Theoretically, full modification of the 23 cysteines in Met30 by mPEG2K-mal should significantly shift the apparent molecular weight of Met30 by ~45-50 kDa. As expected, Met30 in sulfur-replete rich media migrates at ~140 kDa (Figure 2B, first lane), nicely corresponding to the modification of most if not all of its 23 cysteine residues, suggesting they are all in the reduced state while sulfur levels are high and Met4 is being negatively regulated. However, after shifting into sulfur-free minimal lactate media, Met30 migrates at ~80 kDa, suggesting the majority of its cysteine residues are rapidly becoming oxidized in vivo following acute sulfur starvation (Figure 2B, second and third lane). In contrast, the loading control Rpn10 contains a single cysteine residue, and did not exhibit significant oxidation within the same time period of sulfur starvation. As expected, repletion of sulfur metabolites led to the reduction and modification of Met30’s cysteine residues by mPEG2K-mal to the extent seen in the rich media condition. Such oxidation and re-reduction of Met30 cysteines corresponds well with Met4 ubiquitination status (Figure 2B).
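The expected mPEG2K-mal shift quoted above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes each reduced cysteine contributes exactly 2 kDa (the text gives ~2 kDa) and that the adducts add linearly, and the function name is hypothetical. Apparent weights read off a gel need not match this estimate exactly.

```python
def expected_peg_shift_kda(n_reduced_cysteines, kda_per_adduct=2.0):
    """Approximate mass added by mPEG2K-mal, assuming one adduct per reduced cysteine.

    kda_per_adduct=2.0 is an assumption standing in for the "~2 kDa" in the text.
    """
    if n_reduced_cysteines < 0:
        raise ValueError("cysteine count must be non-negative")
    return n_reduced_cysteines * kda_per_adduct


MET30_CYSTEINES = 23  # number of cysteines in Met30, as stated in the text

full_shift = expected_peg_shift_kda(MET30_CYSTEINES)
print(f"Fully reduced Met30: ~{full_shift:.0f} kDa added by mPEG2K-mal")
```

Under these assumptions the fully reduced protein gains about 46 kDa, which falls within the ~45-50 kDa range given in the text.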
Additionally, when cells were grown in sulfur-free media containing glucose (SFD) as the carbon source, Met30 also becomes oxidized, although on a slower timescale, suggesting this mechanism is not specific to yeast grown under non-fermentable conditions (Figure 2C).

Considering the link between sulfur starvation and oxidative stress, we next assessed whether simply changing the redox state of sulfur-starved cells could mimic sulfur repletion with respect to Met30 E3 ligase activity. Addition of the potent, membrane-permeable reducing agent DTT to yeast cells starved of sulfur readily reversed Met30 cysteine oxidation. DTT also resulted in the partial re-ubiquitination of Met4, suggesting that Met30 cysteine redox status influences its ubiquitination activity against Met4 (Figure 2D). Taken together, these data strongly suggest cysteine residues within Met30 are poised to become rapidly oxidized in response to sulfur starvation, which is correlated with the deubiquitination of its substrate Met4.

MET30 CYSTEINE POINT MUTANTS EXHIBIT DYSREGULATED SULFUR SENSING IN VIVO

After establishing Met30 cysteine redox status as an important factor in sensing sulfur starvation, we sought to determine whether specific residues played key roles in the sensing mechanism.
Through site-directed mutagenesis of Met30 cysteines individually and in clusters (Figure S2A and B), we observed that mutation of cysteines in the WD-40 repeat regions of Met30 with the highest concentration of cysteine residues (WD-40 repeat regions 4 and 8) resulted in dysregulated Met4 ubiquitination status (Figure 3A) and MET gene expression (Figure 3B). Specifically, conservatively mutating these cysteines to serine residues mimics the reduced state of the Met30 protein, resulting in constitutive ubiquitination of Met4 by Met30 even when cells are starved of sulfur. The mixed population of ubiquitinated and deubiquitinated Met4 in the mutant strains resulted in reduced induction of SAM1 and GSH1, while MET17 appears to be upregulated in the mutants but is largely insensitive to the changes in the sulfur status of the cell. Interestingly, a single cysteine to serine mutant, C414S, phenocopies the grouped cysteine to serine mutants C414/426/436/439S (data not shown) and C614/616/622/630S. These mutants also exhibit slight growth phenotypes when cultured in both rich and −sulfur lactate media supplemented with homocysteine (Figure 3C). Furthermore, these point mutants only affect Met4 ubiquitination in the context of sulfur starvation, as strains expressing these mutants exhibited a normal response to cadmium as evidenced by rapid deubiquitination of Met4 (Figure S2C).

MET30 CYSTEINE OXIDATION DISRUPTS UBIQUITINATION AND BINDING OF MET4 IN VITRO

Having observed that Met30 cysteine redox status is correlated with Met4 ubiquitination status in vivo, we next sought to determine whether the sulfur/redox-sensing ability of SCFMet30 E3 ligase activity could be reconstituted in vitro.
To this end, we performed large scale immuno-purifications of SCFMet30-Flag to pull down Met30 and its interacting partners in both high and low sulfur conditions for in vitro ubiquitination assays with recombinantly purified E1, E2, and Met4 (Figure 4A). Initial in vitro ubiquitination experiments showed little difference in activity between the two conditions, mirroring prior efforts to demonstrate differential activity of the Met30 E3 ligase in response to stimuli that affect its activity in vivo (Figure S3A) (Barbey et al., 2005).

Since the cysteine residues within Met30 became rapidly oxidized in sulfur-free conditions, the addition of DTT as a standard component in our IP buffer and in in vitro ubiquitination reactions could potentially reduce oxidized Met30 cysteines and alter its ubiquitination activity towards Met4. To test this possibility, we next performed the Met30 IP and in vitro assay in the complete absence of reducing agent. Strikingly, we observed little to no ubiquitination activity in these conditions (Fig. S3B), suggesting that oxidized Met30 exhibits significantly reduced ubiquitination activity.

To more rigorously test the effect of reducing agents on the activity of immunopurified SCFMet30, we performed in parallel the Met30-Flag IP with cells grown in both high and low sulfur conditions, with and without reducing agent in the IP.
Silver stains of the eluted co-IP Met30 complexes showed similar levels of total protein overall and little difference in the abundance of major binding partners between the four conditions (Figure S3C). Western blots of the co-IP samples for the Cdc53/cullin scaffold showed similar binding between the samples with the exception of the −sulfur, −DTT sample which had approximately a third of the amount of Cdc53 bound to Met30 (Figure S3D). We suspect this difference is due to the canonical regulation of SCF E3 ligases, which uses cyclic changes in the affinity of Skp1/F-box protein heterodimers to the cullin scaffold based on binding between the F-box protein and its substrate (Reitsma et al., 2017). After performing the initial IP and washing the beads in buffer with and without reducing agent, the final wash step and Flag peptide elution were done without reducing agent in the buffer for all four IP conditions in order to remove any residual reducing agent from the final ubiquitination reaction, which was also performed without reducing agent. A small aliquot of the rich and −sulfur “−DTT” immunopurified SCFMet30 was transferred to a new tube and treated with 5 mM TCEP, a non-thiol, phosphine-based reducing agent, for approximately 30 min while the in vitro ubiquitination assays were set up to test if the low activity of the oxidized SCFMet30 complex could be rescued by treating with another reducing agent before addition to the final reaction. The data clearly demonstrate that the presence of reducing agent in the IP and wash buffer, but not in the elution or final reaction, significantly increased the E3 ligase activity of SCFMet30 in vitro regardless of whether the cells were grown in high (Figure 4C) or low sulfur media (Figure 4D).
Further supporting our hypothesis, brief treatment of the oxidized −DTT IP complex with TCEP (−DTT/+TCEP) rescued the activity of the E3 complex in vitro (Figures 4B and C). The same +/−DTT in vitro ubiquitination experiment done with the C414S and C614/616/622/630S Met30 mutants showed lower E3 ligase activity overall relative to wild type Met30, but smaller differences between the plus and minus reducing agent condition (Figure S4A).

As SCFMet30 E3 ligase activity in vitro is independent of the sulfur-replete or -starved state of the cells from which the co-IP concentrate is produced, and as the activity of the SCFMet30 co-IP concentrate purified in the absence of reducing agent can be rescued by treatment with another reducing agent, we hypothesized that the low E3 ligase activity of SCFMet30 purified in the absence of reducing agent is due to decreased binding between Met30 and Met4, and not decreased binding between Met30 and the other core SCF components. To test this possibility, lysate for “rich” and “−sulfur” cells was prepared and each was split into three groups, with either reducing agent (+DTT), the thiol-specific oxidizing agent tetramethylazodicarboxamide (+Diamide), or control (−DTT) (Figure 4A). Met30-Flag IPs were performed as previously described for the in vitro ubiquitination assay, except instead of eluting Met30 off of the beads, the +DTT, −DTT, and +Diamide beads were each split into two tubes containing IP buffer ±DTT and bacterially purified Met4.
The beads were incubated with purified Met4 prior to washing with IP buffer with or without DTT. We observed a clear, DTT-dependent increase in the fraction of Met4 bound to the Met30-Flag beads, with the “+DTT” Met30 IP showing a larger initial amount of bound Met4 compared to the “−DTT” Met30 IP, with even less Met4 bound to the “+Diamide” Met30-Flag beads. Consistent with our hypothesis, the addition of DTT to the Met4 co-IP with “−DTT” or “+Diamide” Met30-Flag beads restored the Met30/Met4 interaction to the degree seen in the “+DTT” Met30-Flag beads. We then performed the same experiment with our Met30 cysteine point mutants. The amount of Met4 bound to these mutants was less sensitive to the presence or absence of reducing agent (Figure S4B). Collectively, these data suggest that the reduced form of key cysteine residues in Met30 enables it to engage its Met4 substrate and facilitate ubiquitination.

DISCUSSION

The unique redox chemistry offered by sulfur and sulfur-containing metabolites renders many of the biochemical reactions required for life possible. The ability to carefully regulate the levels of these sulfur-containing metabolites is of critical importance to cells as evidenced by an exquisite sulfur-sparing response. Sulfur starvation induces the transcription of MET genes and specific isozymes, which themselves contain few methionine and cysteine residues (Fauchon et al., 2002). Furthermore, along with the dedicated cell cycle F-box protein Cdc4, Met30 is the only other essential F-box protein in yeast, linking sulfur metabolite levels to cell cycle progression (Su et al., 2005, Su et al., 2008). Our findings highlight the intimate relationship between sulfur metabolism and redox chemistry in cellular biology, revealing that the key sensor of sulfur metabolite levels in yeast, Met30, is regulated by reversible cysteine oxidation.
Such oxidation of Met30 cysteines in turn influences the ubiquitination status and transcriptional activity of the master sulfur metabolism transcription factor Met4. While much work has been done to characterize the molecular basis of sulfur metabolic regulation in yeast between Met30 and Met4, this work describes the biochemical basis for sulfur sensing by the Met30 E3 ligase (Figure 5).

The ability of Met30 to act as a cysteine redox-responsive E3 ligase is unique in Saccharomyces cerevisiae, but is reminiscent of the redox-responsive Keap1 E3 ligase in humans. In humans, Keap1 ubiquitinates and degrades its Nrf2 substrate to regulate the cellular response to oxidative stress. When cells are exposed to electrophilic metabolites or oxidative stress, key cysteine residues are either alkylated or oxidized into disulfides, resulting in conformational changes that, in turn, disrupt Keap1 association with either Cul3 or Nrf2, both leading to Nrf2 activation (Yamamoto et al., 2018). Our data suggest that in response to sulfur starvation, Met30 can still maintain its association with the SCF E3 ligase cullin scaffold, but that treatment of the oxidized complex with reducing agent is sufficient to stimulate ubiquitination of Met4 in vitro. This, along with the in vivo and in vitro Met30 cysteine point mutant data, leads us to conclude that it is the ability of Met30 to bind its substrate Met4 that is being disrupted by cysteine oxidation.

Previous work on the yeast response to cadmium toxicity demonstrated that Met30 is stripped from SCF complexes by the p97/Cdc48 segregase upon treatment with cadmium, suggesting that like Keap1, Met30 can utilize both dissociation from SCF complexes and disrupted interaction with Met4 to modulate Met4 transcriptional activation (Barbey et al., 2005, Yen et al., 2012). Recent work on the sensing of oxidative stress by Keap1 has found that multiple cysteines in Keap1 can act cooperatively to form disulfides, and that the use of multiple cysteines to form different disulfide bridges creates an “elaborate fail-safe mechanism” to sense oxidative stress (Suzuki et al., 2019). In light of our findings, we suspect Met30 might similarly use multiple cysteine residues in a cooperative disulfide formation mechanism to disrupt the binding interface between Met30 and Met4, but more work will be needed to demonstrate this definitively. It is worth noting the curious spacing and clustering of cysteine residues in Met30, with the highest density and closest spacing of cysteines found in two WD-40 repeats that are expected to be directly across from each other in the 3D structure (Figure 2A). That the mutation of these cysteine clusters to serine has the largest in vivo effect, but mutation of any one cysteine to serine (with the notable exception of Cys414) has no effect, implies some built-in redundancy in the cysteine-based redox-sensing mechanism (Figure S2B). We speculate that oxidation of the cysteines in the WD-40 repeat region of Met30 acts cooperatively to produce structural changes that position Cys414 to make a key disulfide linkage that disrupts the interaction with Met4.

It was previously hypothesized that an observed, faster-migrating proteoform of Met30 might be involved in the regulation of sulfur metabolism (Sadhu et al., 2014).
We deduced that the lower form of Met30 does appear to be the result of transcriptionally-guided, alternative translational initiation. However, this faster-migrating proteoform appears dispensable for sulfur metabolic regulation under the conditions we examined. It is curious that such an ostensibly obvious feedback loop between Met30 and Met4 would appear to have little to no effect on sulfur metabolic regulation. However, during sulfur starvation, a decrease in global translation coincides with an increase in ribosomes containing one, instead of two, methyl groups at universally conserved, tandem adenosines near the 3′ end of 18S rRNA (Liu et al.). We speculate that these ribosomes might preferentially translate MET gene mRNAs, as well as preferentially initiate translation at internal methionine residues 30, 35, and 36 of Met30.

The utilization of a redox mechanism for Met30 draws interesting comparisons to the regulation of Met4 via ubiquitination in that both mechanisms are rapid and readily reversible, require no new RNA or protein synthesis, and there is no requirement for the consumption of sulfur equivalents so as to spare them for use in MET gene translation under conditions of sulfur scarcity.
It is also striking that while Met30 contains many cysteine residues, Met4 contains none, which has the consequence that as Met30 cysteines are oxidized, there is no possibility that Met4 can make an intermolecular disulfide linkage that might interfere with its release and recruitment to the promoters of MET genes. Upon repletion of sulfur metabolites, cellular reducing capacity is restored, and Met30 cysteine reduction couples the regulation of MET gene activation to sulfur assimilation, both of which require significant reducing equivalents.

Lastly, we highlight the observation that nearly all of the Met30 protein becomes rapidly oxidized within 15 min of sulfur starvation, in contrast to other nucleocytosolic proteins (Fig. 2B). Bulk levels of oxidized versus reduced glutathione are also minimally changed within this timeframe. These considerations suggest that Met30 is either located in a redox-responsive microenvironment within cells, or that key cysteine residues such as Cys414 are predisposed to becoming oxidized to subsequently inhibit binding and ubiquitination of Met4. Future structural characterization of SCFMet30 in its reduced and oxidized states may reveal the underlying basis of its exquisite sensitivity to, and regulation by, oxidation. Nonetheless, along with SoxR and OxyR transcription factors in E. coli (Imlay, 2013), the Yap1 transcription factor in yeast (Herrero et al., 2008), and Keap1 in mammalian cells, our studies add the F-box protein Met30 to the exclusive list of bona fide cellular redox sensors that can initiate a transcriptional response.

ACKNOWLEDGMENTS

We thank members of the Tu lab, Deepak Nijhawan, Hongtao Yu, and George DeMartino for helpful discussions. This work was supported by NIH R01GM094314, R35GM136370, and an HHMI-Simons Faculty Scholars Award to B.P.T.

AUTHOR CONTRIBUTIONS

This study was conceived by Z.J.
and B.P.T. B.M.S. performed Met30 cysteine point mutant strain construction, Y.W. performed cysteine point mutant cloning and Cdc34 protein purification, and all remaining experiments were directed and performed by Z.J. The paper was written by Z.J. and B.P.T. and has been approved by all authors.

DECLARATION OF INTERESTS

The authors declare no competing interests.

EXPERIMENTAL PROCEDURES

Yeast strains, construction, and growth media
The prototrophic CEN.PK strain background (van Dijken et al., 2000) was used in all experiments. Strains used in this study are listed in Table S1. Gene deletions were carried out using either tetrad dissection or standard PCR-based strategies to amplify resistance cassettes with appropriate flanking sequences, and replacing the target gene by homologous recombination (Longtine et al., 1998). C-terminal epitope tagged strains were similarly made with the PCR-based method to amplify resistance cassettes with flanking sequences. Point mutations were made by cloning the gene into the tagging plasmids, making the specific point mutation(s) by PCR, and amplifying and transforming the entire gene locus and resistance markers with appropriate flanking sequences using the lithium acetate method.

Media used in this study: YPL (1% yeast extract, 2% peptone and 2% lactate); sulfur-free glucose and lactate media (SFD/L), whose composition is detailed in Table S2, with glucose or lactate diluted to 2% each; YPD (1% yeast extract, 2% peptone and 2% glucose).

Whole cell lysate Western blot preparation
Five OD600 units of yeast culture were quenched in 15% TCA for 15 min, pelleted, washed with 100% EtOH, and stored at −20°C. Cell pellets were resuspended in 325 µL EtOH containing 1 mM PMSF and lysed by bead beating. The lysate was separated from beads by inverting the screwcap tubes, puncturing the bottom with a 23G needle, and spinning the lysate at 2,500xg into an Eppendorf for 1 min. Beads were washed with 200 µL of EtOH and spun again before discarding the bead-containing screwcap tube and pelleting protein extract at 21,000xg for 10 min in the new Eppendorf tube. The EtOH was aspirated and EtOH precipitated protein pellets were resuspended in 150 µL of sample buffer (200 mM Tris pH 6.8, 4% SDS, 20% glycerol, 0.2 mg/ml bromophenol blue), heated at 42°C for 45 min, and debris was pelleted at 16,000xg for 3 min. DTT was added to a final concentration of 25 mM and incubated at RT for 30 min before equivalent amounts of protein were loaded onto NuPAGE 4-12% bis-tris or 3-8% tris-acetate gels. For protein samples modified with mPEG2K-mal, an aliquot of the sample buffer resuspended protein pellets was moved to a fresh Eppendorf and sample buffer containing 15 mM mPEG2K-mal was added for a final concentration of 5 mM mPEG2K-mal before heating at 42°C for 45 min, pelleting debris, and adding DTT.

Western blots
Western blots were carried out by transferring whole cell lysate extracts or in vitro ubiquitination or binding assay samples onto 0.45 micron nitrocellulose membranes by wet transfer at 300 mA constant for 90 min at 4°C.
Membranes were incubated with ponceau S, washed with TBST, blocked with 5% milk in TBST for 1 h, and incubated with 1:5000 Mouse anti-FLAG M2 antibody (Sigma, Cat#F3165), 1:5000 Mouse anti-HA(12CA5) (Roche, Ref#11583816001), 1:50,000 Rabbit anti-RPN10 (Abcam, ab98843), or 1:3000 Goat anti-Cdc53 (Santa Cruz, yC-17) in 5% milk in TBST overnight at 4°C. After discarding primary antibody, membranes were washed 3 times for 5 min each before incubation with appropriate HRP-conjugated secondary antibody for 1 h in 5% milk/TBST. Membranes were then washed 3 times for 5 min each before incubating with Pierce ECL western blotting substrate and exposing to film.

RNA Extraction and Real Time Quantitative PCR (RT-qPCR) Analysis
RNA isolation of five OD600 units of cells under different growth conditions was carried out following the manufacturer's manual using the MasterPure yeast RNA purification kit (Epicentre). RNA concentration was determined by absorption spectrometer. 5 μg RNA was reverse transcribed to cDNA using Superscript III Reverse Transcriptase from Invitrogen. cDNA was diluted 1:100 and real-time PCR was performed in triplicate with iQ SYBR Green Supermix from BioRad. Transcript levels of genes were normalized to ACT1. All the primers used in RT-qPCR have efficiency close to 100%, and their sequences are listed below.

ACT1_RT_F    TCCGGTGATGGTGTTACTCA
ACT1_RT_R    GGCCAAATCGATTCTCAAAA
MET17_RT_F   CGGTTTCGGTGGTGTCTTAT
MET17_RT_R   CAACAACTTGAGCACCAGAAAG
GSH1_RT_F    CACCGATGTGGAAACTGAAGA
GSH1_RT_R    GGCATAGGATTGGCGTAACA
SAM1_RT_F    CAGAGGGTTTGCCTTTGACTA
SAM1_RT_R    CTGGTCTCAACCACGCTAAA

Metabolite extraction and quantitation
Intracellular metabolites were extracted from yeast using a previously established method (Tu et al., 2007). Briefly, at each time point, ~12.5 OD600 units of cells were rapidly quenched to stop metabolism by addition into 37.5 mL quenching buffer containing 60% methanol and 10 mM Tricine, pH 7.4. After holding at −40°C for at least 3 min, cells were spun at 5,000xg for 2 min at 0°C, washed with 1 mL of the same buffer, and then resuspended in 1 mL extraction buffer containing 75% ethanol and 0.1% formic acid. Intracellular metabolites were extracted by incubating at 75°C for 3 min, followed by incubation at 4°C for 5 min. Samples were spun at 20,000xg for 1 min to pellet cell debris, and 0.9 mL of the supernatant was transferred to a new tube. After a second spin at 20,000xg for 10 min, 0.8 mL of the supernatant was transferred to a new tube. Metabolites in the extraction buffer were dried using SpeedVac and stored at −80°C until analysis. Methionine, SAM, SAH, cysteine, GSH and other cellular metabolites were quantitated by LC-MS/MS with a triple quadrupole mass spectrometer (3200 QTRAP, AB SCIEX) using previously established methods (Tu et al., 2007). Briefly, metabolites were separated chromatographically on a C18-based column with polar embedded groups (Synergi Fusion-RP, 150 × 2.00 mm, 4 micron, Phenomenex), using a Shimadzu Prominence LC20/SIL-20AC HPLC-autosampler coupled to the mass spectrometer.
Flow rate was 0.5 ml/min using the following method: Buffer A: 99.9% H2O/0.1% formic acid; Buffer B: 99.9% methanol/0.1% formic acid. T = 0 min, 0% B; T = 4 min, 0% B; T = 11 min, 50% B; T = 13 min, 100% B; T = 15 min, 100% B; T = 16 min, 0% B; T = 20 min, stop. For each metabolite, a 1 mM standard solution was infused into an Applied Biosystems 3200 QTRAP triple quadrupole-linear ion trap mass spectrometer for quantitative optimization of the detection of daughter ions upon collision-induced fragmentation of the parent ion [multiple reaction monitoring (MRM)]. The parent ion mass was scanned for first in positive mode (usually MW + 1). For each metabolite, the optimized parameters for quantitation of the two most abundant daughter ions (i.e., two MRMs per metabolite) were selected for inclusion in further method development. For running samples, dried extracts (typically 12.5 OD units) were resuspended in 150 µL 0.1% formic acid and spun at 21,000xg for 5 min at 4°C, and 125 µL was moved to a fresh Eppendorf tube. The 125 µL was spun again at 21,000xg for 5 min at 4°C, and 100 µL was moved to mass-spec vials for injection (typically 50 µL injection volume). The retention time for each MRM peak was compared to an appropriate standard. The area under each peak was then quantitated using Analyst® 1.6.3, and peaks were re-inspected for accuracy. Metabolite levels were normalized by dividing the total spectral counts of a given metabolite by the OD600 units of the sample.
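The gradient program and the OD600 normalization described above are simple arithmetic and can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' analysis code (quantitation was done in Analyst). The function names are hypothetical, linear ramps between the listed time points are assumed, and holding 0% B from 16 min until the run stops at 20 min is also an assumption.

```python
from statistics import mean

# Gradient breakpoints (time in min, % Buffer B) as listed in the text.
# Assumptions: linear ramps between breakpoints, and 0% B held from
# 16 min until the run stops at 20 min.
GRADIENT = [(0, 0), (4, 0), (11, 50), (13, 100), (15, 100), (16, 0), (20, 0)]


def percent_b(t_min):
    """Return % Buffer B at time t_min by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-20 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def normalize_to_od(peak_area, od600_units):
    """Divide an integrated MRM peak area by the OD600 units extracted."""
    if od600_units <= 0:
        raise ValueError("OD600 units must be positive")
    return peak_area / od600_units


def replicate_mean(peak_areas, od600_units):
    """Average the OD-normalized signal across biological replicates."""
    return mean(normalize_to_od(a, od) for a, od in zip(peak_areas, od600_units))
```

For example, `percent_b(7.5)` falls halfway along the 4 to 11 min ramp and evaluates to 25% B under the linear-ramp assumption. The peak areas passed to `replicate_mean` would be whatever the integration software reports; any numbers used to try the function are placeholders, not data from this study.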
Data represent the average of two biological replicates.

Protein purification
6xHis-Uba1 (E1) was purified as previously described (Petroski and Deshaies, 2005), with the exception that the strain was made in the CEN.PK background and the His6-tag was appended to the N-terminus of Uba1. Additionally, lysis was performed by cryomilling: frozen yeast pellets were added to a pre-cooled 50 ml milling jar containing a 20 mm stainless steel ball, and cells were lysed in 3 cycles of milling at 25 Hz for 3 min and chilling in liquid nitrogen for 1 min. Lysate was made by adding 4 ml of buffer for every gram of cryomilled yeast powder, and clarification was performed at 35,000xg instead of 50,000xg.

Cdc34-6xHis (E2) was similarly purified according to previously described protocols (Petroski and Deshaies, 2005), with the following exceptions: the CDC34 ORF was cloned into the pHIS parallel vector such that the N-terminal His tag was eliminated from the vector while a C-terminal 6xHis tag was incorporated by PCR. BL21 transformants were grown in LB medium, and expression was induced by addition of 0.1 mM IPTG. Cells were lysed by sonication, and the lysate was clarified by spinning at 35,000xg for 20 min at 4°C before the Ni-NTA purification was performed as previously described (Petroski and Deshaies, 2005).

His-SUMO-Met4-Strep-tagII-HA was produced by cloning the MET4 ORF into the pET His6 Sumo vector while incorporating a C-terminal Strep-tagII and a single HA tag by PCR. BL21 transformants were grown in 2 liters of LB medium, and expression was induced by addition of 0.1 mM IPTG overnight at 16°C with shaking at 200 rpm.
Cell pellets were collected and lysed by sonication in buffer containing 50 mM Tris pH 7.5, 300 mM NaCl, 10% glycerol, 20 mM imidazole, 1 mM PMSF, 10 µM leupeptin, 50 mM NaF, 5 µM pepstatin, 0.5% NP-40, and 2X Roche EDTA-free protease inhibitor cocktail tablet. Lysate was clarified by centrifugation at 35,000xg for 20 min at 4°C, and the supernatant was transferred to a 50 ml conical tube. Met4 was batch purified by incubating with 1.5 ml of Ni-NTA agarose for 30 min at 4°C. After the Ni-NTA agarose was spun down, the supernatant was removed, and the agarose was resuspended in the same buffer, moved to a gravity flow column, and washed 3 times with 50 mM Tris pH 7.5, 300 mM NaCl, 10% glycerol, and 20 mM imidazole before elution with the same buffer containing 200 mM imidazole. Eluted Met4 was then run over 2 ml of Strep-Tactin Sepharose in a 10 ml gravity flow column, washed with 5 column volumes (CVs) of Strep-Tactin wash buffer (100 mM Tris pH 8.0, 150 mM NaCl), and eluted with 1 ml of 10X Strep-Tactin Elution buffer diluted in 9 ml of Strep-Tactin wash buffer, collecting 1.5 ml fractions. Fractions containing pure, full-length Met4 were pooled and concentrated while exchanging the buffer for one containing 30 mM Tris pH 7.6, 100 mM NaCl, 5 mM MgCl2, 15% glycerol, and 2 mM DTT. Protein concentration was measured, and 1 mg/ml aliquots were made and stored at −80°C.

SCFMet30-Flag IP and in vitro ubiquitination assay
Strains containing Flag-tagged Met30 were grown in rich YPL media overnight to mid-late log phase, diluted with more YPL, and grown for 3 h before half of the culture was separated and switched to −sulfur SFL media for 15 min. Subsequently, approximately 3000 OD600 units each of YPL and SFL cultured yeast were spun down and frozen in liquid nitrogen. Frozen yeast pellets were cryomilled by adding the pellet to a pre-cooled 50 ml milling jar containing a 20 mm stainless steel ball. Cells were lysed in 3 cycles of milling at 25 Hz for 3 min and chilling in liquid nitrogen for 1 min. Cryomilled yeast powder (~4 grams) was moved to a 50 ml conical tube and resuspended in 16 ml SCF IP buffer (50 mM Tris pH 7.5, 150 mM NaCl, 10 mM NaF, 1% NP-40, 1 mM EDTA, 5% glycerol) containing 10 µM leupeptin, 1 mM PMSF, 5 µM pepstatin, 100 µM sodium orthovanadate, 2 mM 1,10-phenanthroline, 1 µM MLN4924, 1X Roche EDTA-free protease inhibitor cocktail tablet, and 1 mM DTT when specified. Small molecule inhibitors of neddylation and deneddylation were included and, along with a short IP time, were intended to minimize exchange and preserve F-box protein/Skp1 substrate recognition modules (Reitsma et al., 2017). The lysate was then briefly sonicated to shear DNA and subsequently clarified at 35,000xg for 20 min, and the supernatant was incubated with 50 µL of Thermo Fisher protein G dynabeads (Cat# 10004D) DMP crosslinked to 25 µL of mouse anti-FLAG M2 antibody (Sigma, Cat#F3165) for 30 min at 4°C. The beads were pelleted at 500xg for 5 min, the supernatant was aspirated, and the magnetic beads were transferred to an Eppendorf tube. The beads were washed 5 times with 1 ml SCF IP buffer with or without DTT before elution with 1 mg/ml Flag peptide in PBS.
The eluent was concentrated in Amicon Ultra-0.5 centrifugal filter units with 10 kDa MW cutoffs to a final volume of ~40 µL. Silver stains of the IPs were carried out using the Pierce Silver Stain for Mass Spectrometry kit (Cat#24600) according to the manufacturer's protocol. The in vitro ubiquitination assay was performed by placing a PCR tube on ice, adding to it 29 µL of water, 8 µL of 5X ubiquitination assay buffer (250 mM Tris pH 7.5, 5 mM ATP, 25 mM MgCl2, 25% glycerol), 1.2 µL Uba1 (final concentration, FC = 220 nM), 1.2 µL Cdc34 (FC = 880 nM), and 0.5 µL yeast ubiquitin (Boston Biochem, FC = 15.5 µM), and incubating at RT for 20 min. The PCR tubes were then placed back on ice, and 20 µL of water, 8 µL of 5X ubiquitination assay buffer, 10 µL of concentrated SCFMet30-Flag IP, and 2 µL of purified Met4 (FC = 200 nM) were added. The tubes were moved back to RT, and 20 µL aliquots of the reaction were removed, mixed with 2X sample buffer, and frozen in liquid nitrogen over the time course.

SCFMet30-Flag IP and in vitro Met4 binding assay
For the Met4 binding assay, yeast cell lysate was prepared as described for the ubiquitination experiment, except that the lysate was split three ways, with 1 mM DTT, 1 mM tetramethylazodicarboxamide (diamide) (Sigma, Cat#D3648), or nothing added to the lysate prior to centrifugation at 21,000xg for 30 min at 4°C.
The supernatant was transferred to new tubes, and 100 µL of Thermo Fisher protein G dynabeads (Cat# 10004D) DMP crosslinked to 50 µL of mouse anti-FLAG M2 antibody (Sigma, Cat#F3165) was divided evenly between the six Met30-Flag IP conditions and incubated for 2 h at 4°C while rotating end over end. After incubation, the beads were washed twice with IP buffer containing 1 mM DTT, 1 mM diamide, or no additive, followed by a final wash with plain IP buffer. Each set of Met30-Flag bound beads prepared in the different IP conditions was brought up to 80 µL with plain IP buffer, and 40 µL was dispensed to new tubes containing 1 mL of IP buffer ± 1 mM DTT and 1 µg of purified recombinant Met4. These were incubated for 2 h at 4°C while rotating end over end, for a total of twelve Met4 co-IP conditions. The beads were then collected, washed 3 times with IP buffer ± 1 mM DTT, resuspended in 60 µL 2X sample buffer, and heated at 70°C for 10 min before Western blotting for both Met4 and Met30.

REFERENCES

BARBEY, R., BAUDOUIN-CORNU, P., LEE, T. A., ROUILLON, A., ZARZOV, P., TYERS, M. & THOMAS, D. 2005. Inducible dissociation of SCF(Met30) ubiquitin ligase mediates a rapid transcriptional response to cadmium. EMBO J, 24, 521-32.
BLAISEAU, P. L. & THOMAS, D. 1998. Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J, 17, 6327-36.
CANTONI, G. L. 1975. Biological methylation: selected aspects. Annu Rev Biochem, 44, 435-51.
CUOZZO, J. W.
& KAISER, C. A. 1999. Competition between glutathione and protein thiols for disulphide-bond formation. Nature cell biology, 1, 130-135.
FAUCHON, M., LAGNIEL, G., AUDE, J.-C., LOMBARDIA, L., SOULARUE, P., PETAT, C., MARGUERIE, G., SENTENAC, A., WERNER, M. & LABARRE, J. 2002. Sulfur sparing in the yeast proteome in response to sulfur demand. Molecular cell, 9, 713-723.
FLICK, K., OUNI, I., WOHLSCHLEGEL, J. A., CAPATI, C., MCDONALD, W. H., YATES, J. R. & KAISER, P. 2004. Proteolysis-independent regulation of the transcription factor Met4 by a single Lys 48-linked ubiquitin chain. Nat Cell Biol, 6, 634-41.
FLICK, K., RAASI, S., ZHANG, H., YEN, J. L. & KAISER, P. 2006. A ubiquitin-interacting motif protects polyubiquitinated Met4 from degradation by the 26S proteasome. Nat Cell Biol, 8, 509-15.
HANSEN, J. & JOHANNESEN, P. F. 2000. Cysteine is essential for transcriptional regulation of the sulfur assimilation genes in Saccharomyces cerevisiae. Molecular and General Genetics MGG, 263, 535-542.
HERRERO, E., ROS, J., BELLÍ, G. & CABISCOL, E. 2008. Redox control and oxidative stress in yeast cells. Biochimica et Biophysica Acta (BBA)-General Subjects, 1780, 1217-1235.
IMLAY, J. A. 2013. The molecular mechanisms and physiological consequences of oxidative stress: lessons from a model bacterium. Nature Reviews Microbiology, 11, 443-454.
KAISER, P., FLICK, K., WITTENBERG, C. & REED, S. I. 2000. Regulation of transcription by ubiquitination without proteolysis: Cdc34/SCFMet30-mediated inactivation of the transcription factor Met4. Cell, 102, 303-314.
KATO, M., YANG, Y. S., SUTTER, B. M., WANG, Y., MCKNIGHT, S. L. & TU, B. P. 2019. Redox State Controls Phase Separation of the Yeast Ataxin-2 Protein via Reversible Oxidation of Its Methionine-Rich Low-Complexity Domain. Cell, 177, 711-721 e8.
KURAS, L., CHEREST, H., SURDIN-KERJAN, Y. & THOMAS, D. 1996.
A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J, 15, 2519-29.
KURAS, L., ROUILLON, A., LEE, T., BARBEY, R., TYERS, M. & THOMAS, D. 2002. Dual regulation of the met4 transcription factor by ubiquitin-dependent degradation and inhibition of promoter recruitment. Mol Cell, 10, 69-80.
LAXMAN, S., SUTTER, B. M., WU, X., KUMAR, S., GUO, X., TRUDGIAN, D. C., MIRZAEI, H. & TU, B. P. 2013. Sulfur amino acids regulate translational capacity and metabolic homeostasis through modulation of tRNA thiolation. Cell, 154, 416-29.
LIU, K., SANTOS, D. A., HUSSMANN, J. A., SUTTER, B. M., WANG, Y., WEISSMAN, J. S. & TU, B. P. Regulation of translation by 18S rRNA methylation multiplicity.
LJUNGDAHL, P. O. & DAIGNAN-FORNIER, B. 2012. Regulation of amino acid, nucleotide, and phosphate metabolism in Saccharomyces cerevisiae. Genetics, 190, 885-929.
LONGTINE, M. S., MCKENZIE, A., 3RD, DEMARINI, D. J., SHAH, N. G., WACH, A., BRACHAT, A., PHILIPPSEN, P. & PRINGLE, J. R. 1998. Additional modules for versatile and economical PCR-based gene deletion and modification in Saccharomyces cerevisiae. Yeast, 14, 953-61.
MENANT, A., BAUDOUIN-CORNU, P., PEYRAUD, C., TYERS, M. & THOMAS, D. 2006. Determinants of the ubiquitin-mediated degradation of the Met4 transcription factor. J Biol Chem, 281, 11744-54.
MILLER, A. W., BEFORT, C., KERR, E. O. & DUNHAM, M. J.
2013. Design and use of multiplexed chemostat arrays. JoVE (Journal of Visualized Experiments), e50262.
PENNINCKX, M. 2000. A short review on the role of glutathione in the response of yeasts to nutritional, environmental, and oxidative stresses. Enzyme Microb Technol, 26, 737-742.
PETROSKI, M. D. & DESHAIES, R. J. 2005. In vitro reconstitution of SCF substrate ubiquitination with purified proteins. Methods Enzymol, 398, 143-58.
POMPELLA, A., VISVIKIS, A., PAOLICCHI, A., DE TATA, V. & CASINI, A. F. 2003. The changing faces of glutathione, a cellular protagonist. Biochem Pharmacol, 66, 1499-503.
REITSMA, J. M., LIU, X., REICHERMEIER, K. M., MORADIAN, A., SWEREDOSKI, M. J., HESS, S. & DESHAIES, R. J. 2017. Composition and regulation of the cellular repertoire of SCF ubiquitin ligases. Cell, 171, 1326-1339.e14.
ROUILLON, A., BARBEY, R., PATTON, E. E., TYERS, M. & THOMAS, D. 2000. Feedback-regulated degradation of the transcriptional activator Met4 is triggered by the SCF(Met30) complex. EMBO J, 19, 282-94.
SADHU, M. J., MORESCO, J. J., ZIMMER, A. D., YATES, J. R., 3RD & RINE, J. 2014. Multiple inputs control sulfur-containing amino acid synthesis in Saccharomyces cerevisiae. Mol Biol Cell, 25, 1653-65.
SU, N. Y., FLICK, K. & KAISER, P. 2005. The F-box protein Met30 is required for multiple steps in the budding yeast cell cycle. Mol Cell Biol, 25, 3875-85.
SU, N. Y., OUNI, I., PAPAGIANNIS, C. V. & KAISER, P. 2008.
A dominant suppressor mutation of the met30 cell cycle defect suggests regulation of the Saccharomyces cerevisiae Met4-Cbf1 transcription complex by Met32. J Biol Chem, 283, 11615-24.
SUTTER, B. M., WU, X., LAXMAN, S. & TU, B. P. 2013. Methionine inhibits autophagy and promotes growth by inducing the SAM-responsive methylation of PP2A. Cell, 154, 403-15.
SUZUKI, T., MURAMATSU, A., SAITO, R., ISO, T., SHIBATA, T., KUWATA, K., KAWAGUCHI, S. I., IWAWAKI, T., ADACHI, S., SUDA, H., MORITA, M., UCHIDA, K., BAIRD, L. & YAMAMOTO, M. 2019. Molecular Mechanism of Cellular Oxidative Stress Sensing by Keap1. Cell Rep, 28, 746-758 e4.
THOMAS, D., KURAS, L., BARBEY, R., CHEREST, H., BLAISEAU, P. L. & SURDIN-KERJAN, Y. 1995. Met30p, a yeast transcriptional inhibitor that responds to S-adenosylmethionine, is an essential protein with WD40 repeats. Mol Cell Biol, 15, 6526-34.
TU, B. P., MOHLER, R. E., LIU, J. C., DOMBEK, K. M., YOUNG, E. T., SYNOVEC, R. E. & MCKNIGHT, S. L. 2007. Cyclic changes in metabolic state during the life of a yeast cell. Proc Natl Acad Sci U S A, 104, 16886-91.
VAN DIJKEN, J. P., BAUER, J., BRAMBILLA, L., DUBOC, P., FRANCOIS, J. M., GANCEDO, C., GIUSEPPIN, M. L., HEIJNEN, J. J., HOARE, M., LANGE, H. C., MADDEN, E. A., NIEDERBERGER, P., NIELSEN, J., PARROU, J. L., PETIT, T., PORRO, D., REUSS, M., VAN RIEL, N., RIZZI, M., STEENSMA, H. Y., VERRIPS, C. T., VINDELOV, J. & PRONK, J. T. 2000. An interlaboratory comparison of physiological and genetic properties of four Saccharomyces cerevisiae strains. Enzyme Microb Technol, 26, 706-714.
WU, G., FANG, Y. Z., YANG, S., LUPTON, J. R. & TURNER, N. D. 2004. Glutathione metabolism and its implications for health. J Nutr, 134, 489-92.
WU, X. & TU, B. P. 2011. Selective regulation of autophagy by the Iml1-Npr2-Npr3 complex in the absence of nitrogen starvation. Mol Biol Cell, 22, 4124-33.
YAMAMOTO, M., KENSLER, T. W. & MOTOHASHI, H. 2018. The KEAP1-NRF2 System: a Thiol-Based Sensor-Effector Apparatus for Maintaining Redox Homeostasis. Physiol Rev, 98, 1169-1203.
YANG, Y. S., KATO, M., WU, X., LITSIOS, A., SUTTER, B. M., WANG, Y., HSU, C. H., WOOD, N. E., LEMOFF, A., MIRZAEI, H., HEINEMANN, M. & TU, B. P. 2019. Yeast Ataxin-2 Forms an Intracellular Condensate Required for the Inhibition of TORC1 Signaling during Respiratory Growth. Cell, 177, 697-710 e17.
YE, C., SUTTER, B. M., WANG, Y., KUANG, Z. & TU, B. P. 2017. A Metabolic Function for Phospholipid and Histone Methylation. Mol Cell, 66, 180-193 e8.
YE, C., SUTTER, B. M., WANG, Y., KUANG, Z., ZHAO, X., YU, Y. & TU, B. P. 2019. Demethylation of the Protein Phosphatase PP2A Promotes Demethylation of Histones to Enable Their Function as a Methyl Group Sink. Mol Cell, 73, 1115-1126 e6.
YEN, J. L., FLICK, K., PAPAGIANNIS, C. V., MATHUR, R., TYRRELL, A., OUNI, I., KAAKE, R. M., HUANG, L. & KAISER, P. 2012. Signal-induced disassembly of the SCF ubiquitin ligase complex by Cdc48/p97. Mol Cell, 48, 288-97.
YEN, J. L., SU, N. Y. & KAISER, P. 2005. The yeast ubiquitin ligase SCFMet30 regulates heavy metal response. Mol Biol Cell, 16, 1872-82.

FIGURE LEGENDS

Figure 1. Met30 and Met4 response to sulfur starvation and repletion under respiratory growth conditions.
(A) Western blot analysis of a time course performed with yeast containing endogenously tagged Met30 and Met4 that were cultured in rich lactate media (Rich) overnight to mid log phase before switching cells to sulfur-free lactate media (−sulfur) for 1 h, followed by the addition of a mix of the sulfur containing metabolites methionine, homocysteine, and cysteine at 0.5 mM each (+Met/Cys/Hcy).
(B) Expression of MET gene transcript levels was assessed by qPCR over the time course shown in (A). Data are presented as mean and SEM of technical triplicates.
(C) Levels of key sulfur metabolites were measured over the same time course as in (A) and (B), as determined by LC-MS/MS. Data represent the mean and SD of two biological replicates.
(D) met6∆ or str3∆ strains were grown in “Rich” YPL and switched to “−sulfur” SFL for 1 h to induce sulfur starvation before the addition of either 0.5 mM homocysteine (+HCY), 0.5 mM methionine (+MET), or 0.5 mM cysteine (+CYS).
(E) Simplified diagram of the sulfur metabolic pathway in yeast.

Figure 2. Met30 cysteine residues become oxidized during sulfur starvation.
(A) Schematic of Met30 protein architecture and cysteine residue location.
(B) Western blot analysis of Met30 cysteine redox state in lactate media as determined by methoxy-PEG-maleimide (mPEG2K-mal) modification of reduced protein thiols. For every reduced cysteine thiol in a protein, mPEG2K-mal adds ~2 kDa in apparent molecular weight.
(C) Same Western blot analysis as in (B), except that yeast were cultured in sulfur-free glucose media (SFD) for 3 h before the addition of 0.5 mM each of the sulfur metabolites homocysteine, methionine, and cysteine (+Met/Cys/Hcy).
(D) Yeast were subjected to the same rich to −sulfur media switch as in (B), except that following the 15 min time point, 5 mM DTT was added to the culture for 15 min and Met30 cysteine residue redox state and Met4 ubiquitination were assessed by Western blot.

Figure 3. Met30 cysteine point mutants display dysregulated sulfur sensing.
(A) Western blot analysis of Met30 cysteine redox state and Met4 ubiquitination status in WT and two cysteine to serine mutants, C414S and C614/616/622/630S.
(B) MET gene transcript levels over the same time course as (A) for the three strains, as assessed by qPCR. Data are presented as mean and SEM of technical triplicates.
(C) Growth curves of the three yeast strains used in (A) and (B) in sulfur-rich YPL media or −sulfur SFL media supplemented with 0.2 mM homocysteine. Cells were grown to mid-log phase in YPL media before pelleting, washing with water, and back-diluting yeast into the two media conditions. Data represent mean and SD of technical triplicates.

Figure 4. Met30 cysteine oxidation disrupts ubiquitination and reduces binding to Met4 in vitro.
(A) Schematic for the large-scale SCFMet30-Flag immunopurification from rich high sulfur (YPL) and −sulfur (SFL) conditions for use in in vitro ubiquitination or binding assays with recombinant Met4 protein.
(B) Western blot analysis of Met4 in vitro ubiquitination by SCFMet30-Flag immunopurifications from cells cultured in sulfur-replete rich media. Cryomilled YPL yeast powder was divided evenly for two Flag IPs performed identically with the exception that one was done in the presence of 1 mM DTT (+DTT) and the other was performed without reducing agent present (−DTT). To test if the addition of reducing agent could rescue the activity of the “−DTT” IP, a small aliquot of the “−DTT” SCFMet30-Flag complex was transferred to a new tube and was treated briefly with 5 mM TCEP while the in vitro ubiquitination reaction was set up (−DTT/+TCEP). The first three lanes are negative control reactions performed either without SCFMet30-Flag IP, recombinant Met4, or ubiquitin.
(C) The same Western blot analysis of Met4 in vitro ubiquitination as in (B), except that the SCFMet30-Flag complex was produced from −sulfur SFL cells.
(D) Western blot analysis of the Met4 binding assay illustrated in (A). Rich and −sulfur lysate were both split three ways, and lysate with 1 mM DTT (+DTT), 1 mM diamide (+Diamide), or control (−DTT) were incubated with anti-Flag magnetic beads to isolate Met30-Flag complex. The Met30-Flag bound beads from each condition were then split in half and distributed into tubes containing IP buffer ± 1 mM DTT and purified recombinant Met4. The mixture was allowed to incubate for 2 h before the beads were washed, boiled in sample buffer, and bound proteins were separated on SDS-PAGE gels and Western blots were performed for both Met30 and Met4.

Figure 5. Model for sulfur-sensing and MET gene regulation by the SCFMet30 E3 ligase.
In conditions of high sulfur metabolite levels, cysteine residues in the WD-40 repeat region of Met30 are reduced, allowing Met30 to bind and facilitate ubiquitination of Met4 in order to negatively regulate the transcriptional activation of the MET regulon. Upon sulfur starvation, Met30 cysteine residues become oxidized, resulting in conformational changes in Met30 that allow Met4 to be released from the SCFMet30 complex, deubiquitinated, and thereby rendered transcriptionally active.

SUPPLEMENTAL FIGURE LEGENDS

Figure S1. Characterization of the faster-migrating proteoform of Met30.
(A) Western blot of yeast treated with 200 µg/ml cycloheximide during sulfur starvation demonstrates that production of the faster-migrating proteoform is dependent on new translation.
(B) The faster-migrating proteoform persists after rescue from sulfur starvation when treated with a proteasome inhibitor. Cells were starved of sulfur for 3 h to accumulate the faster-migrating proteoform, and then sulfur metabolites were added back concomitantly with MG132 (50 µM).
(C) The faster-migrating proteoform of Met30 is dependent on Met4. The met4∆ yeast strain does not produce the second proteoform of Met30 when starved of sulfur.
(D) Western blot analysis of strains expressing either wild type Met30, Met30 ∆1-20aa, or Met30 M30/35/36A.
Yeast cells that harbor the N-terminal deletion of the first twenty amino acids of Met30 (which contain the first three methionine residues) or that have the subsequent three methionine residues (M30/35/36) mutated to alanine do not create faster-migrating proteoforms.
(E) Met30(∆1-20aa) and Met30(M30/35/36A) strains do not exhibit any growth phenotypes in −sulfur glucose media with or without supplemented methionine. There are also no defects in growth rate following repletion of methionine. Data represent mean and SD of technical triplicates.

Figure S2. Identification of key cysteine residues in Met30 involved specifically in sulfur amino acid sensing.
(A) Schematic of Met30 protein architecture and cysteine residue location.
(B) Western blot analysis of various Met30 cysteine point mutants and Met4 ubiquitination status in rich and −sulfur media.
(C) Western blot analysis of Met30 cysteine redox state and Met4 ubiquitination status in WT and two cysteine to serine mutants, C414S and C614/616/622/630S, following treatment with 500 µM CdCl2.

Figure S3. SCFMet30-Flag IP/in vitro ubiquitination assay demonstrating the dependence of SCFMet30 E3 ligase activity on reducing agent in the IP.
(A) Initial IPs for SCFMet30-Flag complex were performed in the presence of 1 mM DTT prior to Flag peptide elution and concentration. No DTT was used in the in vitro ubiquitination assay itself, yet the E3 ligase activities for the E3 complex were indistinguishable between complex isolated from high sulfur versus low sulfur cells.
(B) The same IP/in vitro assay as in (A), with the sole exception that DTT was not included during the IP and wash steps.
(C) Silver stains of immunopurified SCFMet30-Flag complex isolated from rich and −sulfur cells prepared in the presence or absence of DTT used in Figures 4B and C.
(D) Western blot of Cdc53 amounts from immunopurified SCFMet30-Flag complex shown in S2C and used in Figures 4B and C. We speculate the reduced Cdc53 abundance in the −sulfur, −DTT IP is the result of the canonical regulation of SCF E3 ligases, which causes reduced association
Table S1. Strains used in this study.

BACKGROUND  GENOTYPE                                                          SOURCE
CEN.PK      MATa                                                              (van Dijken et al., 2000)
CEN.PK      MATa                                                              (van Dijken et al., 2000)
CEN.PK      MATa; MET30-FLAG::KanMX                                           This study
CEN.PK      MATa; MET30-FLAG::KanMX MET4-HA::Hyg                              This study
CEN.PK      MATa; MET30-FLAG::KanMX MET4-HA::Hyg met6Δ::Nat                   This study
CEN.PK      MATa; MET30-FLAG::KanMX MET4-HA::Hyg str3Δ::Nat                   This study
CEN.PK      MATa; met30::MET30-C414S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C614/616/622/630S-FLAG::KanMX MET4-HA::Hyg     This study
CEN.PK      MATa; met30Δ::Phleo HO::MET30-FLAG::Nat MET4-HA::Hyg              This study
CEN.PK      MATa; met30Δ::Phleo HO::MET30Δaa1-20-FLAG::Nat Met4-HA::Hyg       This study
CEN.PK      MATa; met30Δ::Phleo HO::MET30-M30/35/36A-FLAG::Nat Met4-HA::Hyg   This study
CEN.PK      MATa; MET30-FLAG::KanMX MET4-HA::Hyg pdr5Δ::Nat                   This study
CEN.PK      MATa; met4Δ::KanMX MET30-FLAG::Hyg                                This study
CEN.PK      MATa; cup1p-6xHis-TEV-UBA1::Hyg                                   This study
CEN.PK      MATa; met30::MET30-C201S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C374S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C426S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C436S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C439S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C455S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C528S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C544S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C584S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C614S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C616S-FLAG::KanMX MET4-HA::Hyg                 This study
CEN.PK      MATa; met30::MET30-C584/622S-FLAG::KanMX MET4-HA::Hyg             This study
CEN.PK      MATa; met30::MET30-C630S-FLAG::KanMX MET4-HA::Hyg                 This study

Table S2. Recipe for sulfur-free media.

salts (g L⁻¹)
CaCl2•2H2O             0.1
NaCl                   0.1
MgCl2•6H2O             0.412
NH4Cl                  4.05
KH2PO4                 1

metals (mg L⁻¹)
boric acid             0.5
CuCl2•2H2O             0.0273
KI                     0.1
FeCl3•6H2O             0.2
MnCl2•4H2O             0.4684
Na2MoO4•2H2O           0.2
ZnCl2•H2O              0.1895

vitamins (mg L⁻¹)
biotin                 0.002
calcium pantothenate   0.4
folic acid             0.002
inositol               2
niacin                 0.4
4-aminobenzoic acid    0.2
pyridoxine HCl         0.4
riboflavin             0.2
thiamine-HCl           0.4

Recipes are derived from (Miller et al., 2013).
Figure 1. [Figure image; panels A-E. Recoverable labels: Western blots of Met4-HA (including Ub-Met4-HA), Met30-Flag, and Rpn10 in lactate (respiratory) media under rich, −sulfur, and +Met/Cys/Hcy conditions, including met6∆ and str3∆ strains; relative abundance of methionine, cysteine, cystathionine, GSH, GSSG, SAM, and SAH; relative mRNA expression of MET17, SAM1, and GSH1; schematic of the sulfur metabolic pathway.]

Figure 2. [Figure image; panels A-D. Recoverable labels: schematic of Met30 architecture (F-box, WD 1 to WD 8, SCF-binding and Met4-binding regions, cysteine positions); Western blots of Met30-Flag with and without mPEG2K-mal (Red-Met30, Ox-Met30), Met4-HA (Ub-Met4-HA), and Rpn10 in lactate (respiratory) and glucose (glycolytic) media, including a +DTT treatment.]

Figure 3. [Figure image; panels A-C. Recoverable labels: Western blots for WT, C414S, and C614/616/622/630S strains in rich, −sulfur, and +Met/Cys/Hcy lactate media; relative mRNA expression of MET17, SAM1, and GSH1; growth curves (OD600 versus time in h) in YPL and in SFL + 0.2 mM Hcy after switch.]

Figure 4. [Figure image; panels A-D. Recoverable labels: workflow schematic for the Met30 IP with in vitro Met4 ubiquitination and Met4 binding assays; in vitro ubiquitination time courses (0 to 180 min) with SCFMet30-Flag purified from rich and −sulfur cells under +DTT, −DTT, and −DTT/+TCEP conditions; Met30-Flag IP and Met4-HA co-IP under +DTT, −DTT, and +Diamide conditions.]

Figure 5. [Figure image. Recoverable labels: model of the SCFMet30 complex (Cdc53, Hrt1, Skp1, N8, E2, Ub, Met30, Met4, Met31/32) with reduced (SH) or oxidized (S-S) Met30 cysteines under high and low sulfur metabolite levels, and Met genes OFF or ON.]

Figure S1. [Figure image; panels A-E. Recoverable labels: Western blots of Met4-HA (Ub-Met4-HA), Met30-Flag, and Rpn10 in glucose (glycolytic) media under +Met and −sulfur conditions, with CHX and MG132 treatments, in WT, met4∆, ∆1-20, and M30/35/36A strains; growth curves (OD600 versus time in h) in +Met glucose, in −sulfur glucose, and in +Met glucose after switch from −sulfur glucose (3 h).]

Figure S2. [Figure image; panels A-C. Recoverable labels: schematic of Met30 architecture and cysteine positions; Western blots of Met30-Flag, Met4-HA (Ub-Met4-HA), and Rpn10 for WT and single cysteine mutants (C201S through C630S, including C584/622S) in rich (R) and −sulfur (−S) lactate media; Western blots with mPEG2K-mal (Red-Met30, Ox-Met30) for WT, C414S, and C614/616/622/630S in rich media and after +Cd treatment (0 to 90 min).]

Figure S3. [Figure image; panels A-D. Recoverable labels: in vitro ubiquitination time courses (0 to 180 min) with SCFMet30-Flag from rich and −sulfur cells, purified +DTT (A) or −DTT (B); silver stain showing Cdc53, Met30, and Skp1 (C); Western blot of Cdc53 under +DTT and −DTT for rich and −sulfur cells (D).]
Figure S4. [Figure image; panels A-B. Recoverable labels: in vitro ubiquitination time courses (0, 60, 180 min) with SCFMet30-Flag purified +DTT from rich cells; Met30-Flag IP and Met4-HA co-IP for WT, C414S, and C614/616/622/630S under +DTT, −DTT, and +Diamide conditions; quantification of Met4 pulldown (+DTT/−DTT).]

10_1101-2021_01_07_425621 ----

Molecular dynamics simulations and functional studies reveal that hBD-2 binds SARS-CoV-2 spike RBD and blocks viral entry into ACE2 expressing cells

Liqun Zhang1,5, Santosh K. Ghosh2,5, Shrikanth C.
Basavarajappa3,5, Jeannine Muller-Greven4, Jackson Penfield1, Ann Brewer1, Parameswaran Ramakrishnan3,7, Matthias Buck4,7 and Aaron Weinberg2,6,7

1Chemical Engineering, Tennessee Technological University, Cookeville, TN 38505
2Biological Sciences, School of Dental Medicine, Case Western Reserve University, Cleveland, OH 44124
3Department of Pathology, School of Medicine, Case Western Reserve University, Cleveland, OH 44124
4Department of Physiology and Biophysics, School of Medicine, Case Western Reserve University, Cleveland, OH 44124
5contributed equally
6Lead contact
7Correspondence: pxr150@case.edu (PR); mxb150@case.edu (MB); axw47@case.edu (AW)

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.07.425621; this version posted January 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

ABSTRACT: New approaches to complement vaccination are needed to combat the spread of SARS-CoV-2 and stop COVID-19 related deaths and long-term medical complications. Human beta defensin 2 (hBD-2) is a naturally occurring epithelial cell derived host defense peptide that has antiviral properties. Our comprehensive in silico studies demonstrate that hBD-2 binds the site on the CoV-2-RBD that docks with the ACE2 receptor. Biophysical and biochemical assays confirm that hBD-2 indeed binds to the CoV-2 receptor binding domain (RBD) (KD ~ 300 nM), preventing it from binding to ACE2 expressing cells. Importantly, hBD-2 shows specificity by blocking CoV-2/spike pseudoviral infection, but not VSV-G mediated infection, of ACE2 expressing human cells with an IC50 of 2.4 ± 0.1 µM.
These promising findings offer opportunities to develop hBD-2 and/or its derivatives and mimetics for safe and effective use as novel agents to prevent SARS-CoV-2 infection.

Key words: Human beta defensin-2 (hBD-2), ACE2 receptor, receptor binding domain (RBD), SARS-CoV-2, COVID-19

INTRODUCTION

The ongoing COVID-19 pandemic, the result of infection by SARS-Coronavirus-2 (CoV-2), continues to spread worldwide, having claimed over 1.75 million lives (Johns Hopkins University) as of late December 2020. While the first vaccines are now being administered, albeit initially to a select population, the virus continues to evolve in significant ways. This situation requires the discovery of novel therapeutic approaches, possibly to be used independently or in conjunction with existing approved regimens, to impede the virus’ relentless spread. All coronaviruses, including CoV-2, express the all-important S (Spike) protein that gives these viruses the characteristic corona or crown appearance (Siu et al., 2008; Yoshimoto, 2020). The S protein is responsible for binding to the host cell receptor followed by fusion of the viral and cellular membranes (Walls et al., 2016). To engage a host cell receptor, the receptor-binding domain (RBD) of the S protein undergoes hinge-like conformational movements that transiently hide or expose its determinants for receptor binding (Wrapp et al., 2020).
Structural fluctuations of the RBD, relative to the entire S protein, enable exposure of the receptor-binding motif (RBM), which mediates interaction with the receptor angiotensin-converting enzyme 2 (ACE2) on the host cell (Lan et al., 2020; McCallum et al., 2020; Walls et al., 2020; Yan et al., 2020). Since this is believed to be the critical initial event in the infection cascade, the RBD has been proposed as a potential target for therapeutic strategies (Tai et al., 2020). The high degree of dynamics of the RBD:ACE2 complex (Brielle et al., 2020; Ghorbani et al., 2020; Spinello et al., 2020; Xiong et al., 2020) suggests that binding of small flexible proteins and peptides may inhibit Spike protein:host cell receptor interactions, a possibility that can be interrogated by computational modeling and simulations, which are well suited to exploring these interactions (Amaro and Mulholland, 2020).

Nature’s own antimicrobial peptides (AMPs) have been proposed as multifunctional defenses that participate in the elimination of pathogenic microorganisms, including bacteria, fungi, and viruses (Diamond et al., 2009). Exhibiting antimicrobial and immunomodulatory properties, AMPs have been intensively studied as alternatives and/or adjuncts to antibiotics in bacterial infections and have also gained substantial attention as anti-viral agents (Mulder et al., 2013). Human beta defensins (hBDs), the major AMP group expressed naturally in mucosal epithelium, provide a first line of defense against various infectious pathogens, including enveloped viruses (Leikina et al., 2005; Quiñones-Mateu et al., 2003; Ryan et al., 2011). The hBDs are cationic peptides that assume small β-sheet structures, vary in length from 33 to 47 amino acid residues, and are primarily expressed by epithelial cells (Bensch et al., 1995; Harder et al., 2001; Harder et al., 1997; Schibli et al., 2002).
HBD-2 has been shown to be expressed throughout the respiratory epithelium from the oral cavity to the lungs, and it is believed that this defensin plays a very important role in defense against respiratory infections (Diamond et al., 2008). Altered hBD-2 expression in the respiratory epithelium is known to be associated with the pathogenesis of several respiratory diseases such as asthma, pulmonary fibrosis, pneumonia, tuberculosis and rhinitis (Diamond et al., 2008; Doss et al., 2010; Ooi et al., 2015; Rivas-Santiago et al., 2005; Semple and Dorin, 2012). HBD-2 has been demonstrated to inhibit human respiratory syncytial virus (RSV) infection by blocking viral entry through destabilization/disintegration of the viral envelope (Kota et al., 2008). It might also have important immunomodulatory roles during coronavirus infection, as hBD-2 conjugated to the MERS receptor binding domain (RBD) has been reported in a mouse model to promote better protective antibodies to RBD than RBD alone (Kim et al., 2018).

In the present study, we examined the ability of hBD-2 to act as a blocking agent against CoV-2. HBD-2 is an amphipathic, beta-sheeted, highly cationic (+6 charge) molecule of 41 amino acids, and is stabilized by three intramolecular disulfide bonds that protect it from degradation by proteases (Sawai et al., 2001). The protein has been studied before with molecular dynamics simulations (Yeasmin et al., 2018) (Barros et al., 2020; Ghorbani et al., 2020; Spinello et al., 2020). Through extensive in silico docking
and molecular dynamics simulation analyses, we report herein that hBD-2 binds to the receptor binding motif (RBM) of the RBD of CoV-2 that associates with the ACE2 receptor. Biophysical and biochemical studies confirmed that hBD-2 binds the RBD and also prevents it from binding ACE2. Moreover, by utilizing a physiologically relevant platform, we revealed that hBD-2 effectively blocks CoV-2 spike expressing pseudovirions from entering ACE2 expressing human cells. Harnessing the utility of naturally occurring AMPs, such as hBD-2, and their derived smaller peptides, could be a viable approach to developing novel CoV-2 therapeutics.

RESULTS:

Interrogating the interaction of SARS-CoV-2 RBD with ACE2 and hBD-2 using in silico docking and molecular dynamics simulations

RBD:ACE2 complex: We began our in silico work by running, as a reference, a 50 ns all-atom molecular dynamics (MD) simulation of the ACE2:RBD complex. The final structure was compared with the initial experimental crystal structure (Lan et al., 2020), as shown in Figure S1A. Only small deviations are seen in some of the loop regions and at the N- and C-termini of both proteins; the overall root-mean-square deviation (RMSD) of the structure, calculated for backbone Cα atoms, is around 1.2 Å for ACE2, around 2.1 Å for the RBD, and around 2.4 Å for the complex (Figure S1B) [Supplementary information]. The result of calculating the root-mean-square fluctuation (RMSF) for the Cα atom of each residue in the RBD and in ACE2 is shown in Figure 1A (Left and Right). Overall, the main-chain fluctuations in the RBD and ACE2 are small, with a magnitude of around 0.6 Å for the most structured α-helical and β-sheet parts. As can be seen, the loop regions are more flexible, having a higher RMSF of up to 4 Å.
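For readers unfamiliar with the two quantities, the following is a minimal NumPy sketch of how RMSD and RMSF are defined for a set of Cα coordinates. It is not the analysis pipeline used in the study, which is not specified in this section. It assumes the trajectory frames have already been superimposed on the reference structure, and the function names and toy coordinates are hypothetical.

```python
import numpy as np

def rmsd_per_frame(traj, ref):
    """RMSD (in the units of the input) of every frame from a reference.

    traj: array of shape (n_frames, n_atoms, 3), assumed pre-aligned to ref.
    ref:  array of shape (n_atoms, 3).
    """
    diff = traj - ref                      # broadcast over frames
    sq_dist = (diff ** 2).sum(axis=2)      # squared displacement per atom
    return np.sqrt(sq_dist.mean(axis=1))   # average over atoms, per frame

def rmsf_per_atom(traj):
    """RMSF of every atom around its time-averaged position."""
    mean_pos = traj.mean(axis=0)           # average structure
    sq_dist = ((traj - mean_pos) ** 2).sum(axis=2)
    return np.sqrt(sq_dist.mean(axis=0))   # average over frames, per atom

# Toy example (hypothetical coordinates): two atoms, two frames.
ref = np.zeros((2, 3))
traj = np.array([
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
])
rmsd = rmsd_per_frame(traj, ref)
rmsf = rmsf_per_atom(traj)
print(rmsd, rmsf)
```

RMSD gives one number per frame (how far the whole structure has drifted), whereas RMSF gives one number per residue (how mobile that residue is), which is why the former is plotted against time and the latter against residue number.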
The difference in fluctuations of ACE2 and the RBD between their bound and free states in solvent is shown in Figure 1B. As is usually expected, most regions at the RBD:ACE2 interaction interface become less flexible (shaded in blue), while other changes, including increases in fluctuations (shaded in red), are seen further away from the interface, consistent with the recent description of allostery in the spike protein (Gross et al., 2020; Ray et al., 2020). Upon complex formation, the RBD and ACE2 proteins form intermolecular hydrogen bonds, which are one of the driving forces for their binding. These bonds, calculated over the course of the 50 ns simulation, are plotted in Figure 1C. In the first 15 ns, the number of hydrogen bonds fluctuates between 2 and 7, but settles at a slightly lower number, 2 to 5, at the end. Importantly, these bonds are highly dynamic, with occupancy between 20 and 40%. Hydrogen bonds with good persistency are listed in the table in Figure 1. The ACE2:RBD residue pairs Lys353:Gly502, Tyr83:Asn487, and Asp30:Lys417 formed hydrogen bonds with an occupancy of at least 34%. In total, 7 of 9 H-bonds of the RBD:ACE2 interface in the crystal structure (Lan et al., 2020) are populated with reasonable occupancy in the simulations. Similar behavior has been seen in other simulations (Ghorbani et al., 2020; Spinello et al., 2020), with the difference likely explained by solution vs. crystallization conditions. Water molecules were observed at the interface in other simulations and are likely bridging the interactions (Malik et al., 2020), also underscoring the dynamic nature of the interactions (see below). To further indicate the overall stability of the interface in the simulations, we calculated the solvent accessible surface area that is buried between the RBD and ACE2 proteins in the complex.
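A buried surface area of this kind is conventionally obtained from three solvent accessible surface area (SASA) values per frame: each partner taken alone and the complex. The sketch below illustrates only that bookkeeping. It assumes the per-frame SASA values have already been produced by an external tool, the function name and numbers are hypothetical, and this section does not state whether the reported values are the total buried area or the per-partner half of it, so both are computed.

```python
import numpy as np

def buried_surface_area(sasa_a, sasa_b, sasa_complex):
    """Total surface buried on complex formation, per frame.

    All inputs are per-frame SASA values (same units, same length):
    partner A alone, partner B alone, and the A:B complex.
    """
    sasa_a = np.asarray(sasa_a, dtype=float)
    sasa_b = np.asarray(sasa_b, dtype=float)
    sasa_complex = np.asarray(sasa_complex, dtype=float)
    return sasa_a + sasa_b - sasa_complex

# Toy per-frame values (hypothetical, arbitrary units).
sasa_a = [100.0, 102.0, 101.0]
sasa_b = [200.0, 198.0, 199.0]
sasa_ab = [280.0, 276.0, 282.0]

buried_total = buried_surface_area(sasa_a, sasa_b, sasa_ab)
buried_per_partner = buried_total / 2.0   # alternative convention
print(buried_total.mean(), buried_total.std())
```

Reporting the mean and spread over the trajectory, as in the last line, mirrors how a fluctuating buried area is usually summarized.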
During 50 ns, this buried surface area fluctuates between a minimum of 750 Å² and a maximum of 1000 Å², and is maintained at an average of ~900 Å² over the last 25 ns of the trajectory. We also calculated the distance map of ACE2 and RBD atoms that are closer than 5 Å on average, as a reference (see below).

RBD:hBD-2 (monomer): In order to explore the initial possible bound structures between the two proteins, we carried out docking with ClusPro and HADDOCK (see Methods in supplement). The best predicted models were used as starting structures for all-atom MD, as above; however, since the initial docked structures are not well converged, we carried out the simulations for up to 500 ns. We also ran repeat simulations with different starting seeds (initial velocity assignments). The simulations performed are summarized in Table S1 [Supplementary information]. In Figure 2A we present the most converged and apparently stable trajectory, showing the initial structure when compared to the last structure (after
Overall, the binding region becomes less flexible on the RBD in similar key loop regions whose dynamics are dampened by ACE2 binding, while on the side of hBD-2 a significant number of main-chain sites also see their fluctuations decreased. The results are mapped to the final structure of the trajectory in Figure 3B. As above, we calculated the number of intermolecular hydrogen bonds formed between hBD-2 and RBD over the course of the trajectory (Figure 3C). They are fewer, with an average 4 ± 1, compared to those bridging the RBD:ACE2 complex. Similarly, with the exception of the hBD-2 residue Arg23, which forms a hydrogen bond with the ACE2 residue Glu484 greater than 50% of the time, the occupancy of other hydrogen bonds is reduced compared to the reference complex. As before, the occupancy of these interactions is not 100%; i.e., more like 30%, suggesting that they are somewhat dynamic (see discussion below) and are accompanied by indirect H-bond interactions with water molecules near or at the interface bridging the interactions (Malik et al., 2020). Both of these features were also found in simulations of the RBD:ACE2 interaction, as already noted; however, the dynamics of these interactions appear to be more prevalent in the RBD:hBD-2 interaction. As might be expected for the cationic hBD-2, the positively charged sidechains are a prominent feature in the interactions, especially Arg22 and Arg23. The RBD residues most persistently involved in the interaction with hBD-2 are shown in the table of Figure 3. With the exception of Gln498, the interaction between hBD-2 and RBD involves amino acids that are within a few residues of those that are involved between ACE2 and RBD and cover a good proportion of the same interface area. The persistency of the complex is also confirmed in the changes in accessible surface area, which is buried between the two proteins, and fluctuates moderately around a value of 700 ± 150 Å2. 
The value is smaller than that of ACE2 (900 Å²), indicating that less area is covered. This is expected since the hBD-2 protein is considerably smaller than the RBD. A distance map comparing residues that are on average closer than 5 Å in the RBD:ACE2 and RBD:hBD-2 complexes is shown in Figure 4. For the RBD:ACE2 interaction (Figure 4A), residues 20 to 45 and 75 to 85, as well as short stretches of residues around 327, 355 and 387 on ACE2, bind with the RBD, whose binding interface ranges from residue 445 to 505. Some of the RBD residues are in loop regions, e.g., 404 and 417, which also come close to ACE2 over the course of the 50 ns simulation. The contact analysis for the RBD:hBD-2 complex over the course of the 500 ns simulation is shown in Figure 4B. Remarkably, in comparison with the RBD:ACE2 complex, essentially all residues of the RBD that contact ACE2, either the same ones or their close neighbors, are also in contact with hBD-2. However, there are some subtle shifts. For example, RBD residues 475-478 make contact with ACE2 but not with hBD-2, where these interactions may have shifted to residue 473. Also, a region comprising residues 438-444 contacts hBD-2, which is not seen with ACE2. These contacts may be absent in the RBD:ACE2 complex because it is less dynamic and was only sampled for 50 ns. Alternatively, they may provide a mechanistic entry for hBD-2 in replacing/competing away ACE2 from the spike trimer. In order to confirm consistent binding of hBD-2 to RBD, we started simulations from the same initial structure of Figure 2 and repeated the simulation three more times, each with a different random seed (simulation details are shown in Table S1, and results are shown in Figure S2). These simulations are consistent with the results above in terms of the RMSD and the average surface area buried. The change in fluctuations in forming the complex varied.
The average number of hydrogen bonds, around 2 ± 1 at any one time, is slightly lower; however, as stated above, Arg22 and Arg23 are the major residues on hBD-2 contributing to the formation of hydrogen bonds.

RBD:hBD-2 (dimer): Although the affinity of hBD-2 for dimerization is modest (Hoover et al., 2000), it is possible that binding to the RBD stabilizes the dimeric form. We, therefore, also docked the hBD-2 dimer to the RBD and carried out simulations. The initial and final structure comparison of one of the simulations is shown in Figure S3 A and the RMSD is plotted as a function of simulation time (Figure S3 B). The comparison of the RMSF of the hBD-2 dimer and RBD in their bound state with their fluctuations in their free state is shown as well (Figure S3). Intriguingly, when compared to hBD-2 monomer binding, a few regions of the RBD do not diminish as much in flexibility, while some actually become more flexible. The buried accessible surface area is slightly larger (about 10% larger) for the dimer compared to monomer binding, confirming that interactions to both units of the dimer from the RBD exist. The distance map is given in Figure S4. As shown, both units of the hBD-2 dimer can bind with the RBD in the residue range of 445 to 500. Mostly, residues 17 to 24 of the dimer are in tight and close contact with the RBD. The number of hydrogen bonds formed between the hBD-2 dimer and the RBD is similar to that formed by the monomer and the RBD (Figure 3C and Figure S4A).
Again the hBD-2 Arg23 is the most prominent interacting residue. In fact, unit 1 of the dimer can form more hydrogen bonds with the RBD, and also one hydrogen bond from unit 2 is prominent, again involving its Arg23, this time to Glu406 on RBD, which is outside the region typically interacting with ACE2. Remarkably, the persistency of the hydrogen bonds is increased from ~ 30% in the monomer to ~50% in the dimer (shown in Figure S5), suggesting overall that binding of a dimeric hBD-2 may be favorable. RBD:hBD-2-interaction energy calculation Due to the caveats associated with calculations of free energy estimations from trajectories such as the ones run for this study, we carried out the binding interaction energy calculation for RBD binding with ACE2 and hBD-2 monomer/dimer, respectively, using the popular GBSA method (see Materials & Methods section). We report the average energies and standard deviations as a histogram in Figure S6. These interaction energies have similar values and all are slightly negative. Comparing the binding energy between RBD with ACE2 and with hBD-2 monomer/dimer, the average binding energy of the RBD with ACE2 is -37 ± 8 kcal/mol whereas average binding energy of RBD with hBD-2 dimer is -34 ± 8 kcal/mol, and similarly for RBD binding with hBD-2 monomer. However, it is likely that the entropy change upon binding RBD is significantly more favorable for binding to hBD-2 than binding to ACE2 since the former is more dynamic in the bound state, giving less of an entropy penalty upon binding. In fact this latter indication suggests that peptides, which are initially unstructured in the unbound state could also maintain considerable flexibility in the bound state and may thus be powerful antagonists of the RBD:ACE2 interaction. Detailed thermodynamics analyses, both experimental and computational are needed to clarify this point. 
Irrespective of these estimated numerical values, the calculations suggest that hBD-2 at a sufficiently high concentration should be able to block the binding of RBD with ACE2. Our experimental analysis with RBD:hBD-2 interactions using purified proteins and the spike-pseudovirion assay suggests such a concentration is likely to be in the vicinity of the IC50 of 2.4 µM.

Experimental studies confirming the binding of hBD-2 with the RBD
We used multiple experimental approaches to confirm the in silico findings of hBD-2 and SARS-CoV-2 RBD binding. Microscale Thermophoresis (MST) showed that CoV-2 RBD interacts with recombinant hBD-2 (rhBD-2) with a dissociation constant of ~300 nM (Figure 5A). This interaction is weaker (> 3 μM) when hBD-2 loses its natural conformation under disulfide bond reducing conditions (Figure 5A). We then followed up using a functional ELISA assay, and found that rhBD-2 bound to immobilized RBD in a linear range (over concentrations of 1.5 to 100 nM), as detected by biotinylated anti-hBD-2 detection antibodies (Figure 5B). We then examined the binding of rhBD-2 and recombinant histidine tagged-RBD (His-RBD) derived from our expression system for codon optimized CoV-2 RBD (see Materials and Methods) by co-immunoprecipitation. By incubating rhBD-2 with His-RBD at a ratio of 1.5:1.0, followed by nickel bead immunoprecipitation of His-RBD and probing for hBD-2 in Western blots, we found significant binding of hBD-2 to His-RBD (Figure 5C). Control Western blots showed only modest background binding of hBD-2 in the absence of RBD, thereby confirming the specificity of the RBD:hBD-2 interaction (Figure 5C).

HBD-2 blocks the binding of RBD with cellular ACE2
Next, we examined whether rhBD-2 can interfere with the binding of RBD to the host ACE2 receptor. We utilized HEK 293T cells that overexpress the human ACE2 receptor in the assays and incubated these cells with FLAG-RBD containing culture supernatant with and without rhBD-2. We immunoprecipitated RBD through the FLAG tag and examined the co-precipitation of ACE2. We found that FLAG-RBD effectively precipitated ACE2 and the addition of hBD-2 competitively decreased RBD-ACE2 binding (Figure 5D). RBD levels were also decreased in the immunoprecipitate upon rhBD-2 addition, further suggesting a direct interaction of RBD with rhBD-2, thereby preventing RBD-ACE2 binding (Figure 5D).

HBD-2 specifically inhibits SARS-COV-2 spike-mediated pseudoviral infection
After discovering that rhBD-2 binds RBD and competitively inhibits RBD binding to ACE2, we investigated whether rhBD-2 can inhibit spike mediated pseudoviral entry into ACE2 expressing cells. A luciferase reporter expressing CoV-2 spike-dependent lentiviral system (Crawford et al., 2020) was used to study the competitive inhibitory effects of rhBD-2 on CoV-2 spike-mediated infection. We infected ACE2 expressing HEK 293T cells using the pseudotyped virus and found substantial luciferase activity in a viral dose dependent manner (Figure 6A). Next, we studied the effect of rhBD-2 on spike-dependent viral infection of ACE2/HEK 293T cells by luciferase activity and found that hBD-2 decreased the spike mediated pseudoviral infection (Figure 6B and 6C). To further validate that the inhibitory effect of hBD-2 is specific to a spike-mediated infection, we used a virus pseudotyped with vesicular stomatitis virus glycoprotein (VSVG) as an independent control.
Viruses pseudotyped with VSVG are pantropic; i.e., they can infect all cell types (Lever et al., 2004), and do not depend on ACE2 for entry. We obtained significant infection of ACE2/HEK 293T cells using VSVG pseudotyped virus without or with the addition of rhBD-2 (Figure 6D and 6E), thereby demonstrating the specificity of hBD-2 in blocking CoV-2 spike glycoprotein mediated infection of ACE2 expressing cells. We then inquired if increased inhibition of spike mediated pseudoviral entry was directly proportional to increased concentration of hBD-2. We discovered that indeed there was a clear hBD-2 dose response inhibition of pseudoviral entry (Figure 6F and 6G), and that the inhibitory concentration50 (IC50) was approximately 2.4 ± 0.1 µM (Figure 6H). At a concentration of 15 µg/ml rhBD-2 decreased the spike- mediated pseudoviral infection by over 80% (Figure 6H). DISCUSSION The human body expresses over a hundred AMPs that are found in either intracellular granules of professional phagocytes and/or in epithelial cells of mucosa lining our external and internal surfaces (Dawgul et al., 2016). Beta defensins and LL-37, the only member of the cathelin AMPs expressed in humans, are localized to the mucosa of the oral cavity, nares and upper airway (Diamond and Ryan, 2011; Ghosh et al., 2007; Khurshid et al., 2017; Lee et al., 2002; Mathews et al., 1999; Singh et al., 1998); i.e., sites deemed vulnerable to CoV-2 entry and initial infection. Indeed, these two types of AMPs, part of the epithelial cell’s arsenal of innate responses used to defend against viral challenges at mucosal sites, have been shown to interrupt viral infection of various viruses, including coronaviruses (Kim et al., 2018). 
However, when a mucosal site becomes overwhelmed by a microbial threat, replenishing the AMP armamentarium locally after initial release; i.e., time from transcriptional activation, translation, post-translational modification to rerelease, takes multiple hours and makes bystander cells more vulnerable to viral infection. Moreover, if a microbial threat can inhibit production or release of these AMPs, it renders this innate defense useless. To overcome this, the AMPs or their mimetics, if administered exogenously in high enough concentrations, could be a sound therapeutic strategy to protect the host at vulnerable mucosal sites without eliciting an unwanted immunological response against the agent. Interestingly, these same AMPs have been shown to be released by human mesenchymal stem cells (hMSCs) (Krasnodembskaya et al., 2010; Sutton et al., 2016), recently repurposed to treat COVID-19 patients. While hMSCs have been shown to contribute to the recovery of severely ill CoV-2 infected patients (Moll et al., 2020; Tsuchiya et al., 2020), the role that AMPs play and the mechanism by which hMSCs ameliorate symptoms of COVID-19 remains to be determined. However, the modulation of severe inflammation and microbicidal activity related to pulmonary disease are outcomes attributable to these AMPs (Alcayaga-Miranda et al., 2017; Chow et al., 2020; Krasnodembskaya et al., 2010; Sutton et al., 2016).
We chose to interrogate hBD-2 for its ability to block CoV-2 from infecting vulnerable cells because of its innate role in protecting the oral cavity and the upper airway, and because its mouse ortholog has been shown to inhibit other coronaviruses (Zhao et al., 2016). The computer simulations that we ran of hBD-2 and the RBD showed remarkable stability of the complex even after 500 ns. There was also a clear overlap of binding sites when compared to the RBD:ACE2 complex, as verified by analysis of protein- protein residue contact distance maps. Multiple methods involving MST, ELISA and immunoprecipitation followed by western blotting independently verified that hBD-2 binds to the RBD, thereby validating our in silico data. Competitive inhibition assays were able to show that hBD-2 reduced RBD:ACE2 binding by removing RBD from solution, which would otherwise be available for binding ACE2. Finally, by incorporating a luciferase reporter expressing CoV-2 spike-dependent lentiviral system (Lever et al., 2004), we demonstrated that hBD-2 inhibited viral entry into ACE2 expressing HEK 293T cells in a dose dependent manner, with an IC50 of ~2.4 µM. This concentration is much less than most other inhibitory concentrations attributed to hBD-2 antimicrobial activity (Joly et al., 2004) and points to a favorable affinity, and possibly also avidity, of the interaction between hBD-2 and the RBD. Interestingly, hBD-2 begins to show hemolytic activity at a concentration 30 times greater (70 µM) than our IC50 (Koeninger et al., 2020), and shows no signs of cytotoxic effects against various other human cells (Warnke et al., 2013) at over twice our IC50 (Herrera et al., 2016; Mi et al., 2018; Sakamoto et al., 2005). This suggests a favorable therapeutic window for hBD-2 before unacceptable toxicity becomes an issue. 
Clearly, next steps in conclusively showing the efficacy of hBD-2 against CoV-2 would be to conduct live viral in vitro infections of ACE2 expressing cells in a BSL3 facility followed by in vivo CoV-2 infection studies in appropriate animal models (Kim et al., 2020). In vivo application of hBD-2 has proven successful in addressing a number of diseases. This includes a recent study demonstrating efficacy in experimental colitis in a mouse model (Koeninger et al., 2020) and therapeutic intranasal application of hBD-2 to reduce the influx of inflammatory cells into bronchoalveolar lavage fluid (Pinkerton et al., 2020). Of relevance to our study is the use of smaller hBD fragments; i.e., mimetics, of mouse beta defensin 4 (mBD-4) (Zhao et al., 2016), the ortholog of hBD-2, that when administered intra-nasally, rescued 100% of mice from the lethal challenge of human and avian influenza A, SARS-CoV and MERS-CoV (LeMessurier et al., 2016). Therefore, should in vivo studies of hBD-2 prove to be successful in blocking live CoV-2 infection in an animal model, the fact that the peptide is endogenous to humans and would not elicit an immunogenic response, give it a high probability of being safe and a quicker route to human clinical trials. In fact, several AMPs, as well as AMP mimetics are currently undergoing clinical trials for multiple different diseases (Mookherjee et al., 2020). A recent in silico molecular docking study predicted a strong binding interaction between LL-37 and the RBD, demonstrating the blocking potential of LL-37 for ACE2 binding (Lokhande, 2020). This was followed up by a surface plasmon resonance study confirming the simulation results (Roth et al., 2020). Since LL-37 has also been shown to possess antiviral activity (Tripathi et al., 2015), these results support the idea that more than one AMP could be utilized, possibly in a “cocktail” to act as a potent viral blocking agent. 
Recent findings also highlight that neuropilin-1 (NRP1), a receptor involved in multiple physiological processes and expressed on many cell types (Roy et al., 2017), is being utilized by CoV-2 to facilitate entry and infection (Cantuti-Castelvetri et al., 2020; Daly et al., 2020). Time will tell if blocking ACE2 alone will be enough to reduce CoV-2 infection and/or reduce the severity of symptoms, or if an additional strategy of also blocking entry via NRP1 will be required.

Not unexpectedly, CoV-2 is mutating, albeit at a relatively slower rate than influenza viruses; i.e., two to six fold slower over a given time frame (Manzanares-Meza and Medina-Contreras, 2020). Ongoing studies indicate that it has developed a number of mutations of which 89 have been associated with the RBD (Chen et al., 2020; Wang et al., 2020). Furthermore, 52 out of 89 mutations are in the receptor-binding motif (RBM), i.e., the region of RBD that is in direct contact with ACE2, indicating that the virus may be accumulating mutations in that region to improve its interaction with ACE2 (Li et al., 2020). Fortunately, while these and other mutations appear to have evolved for greater transmissibility, they have not resulted in greater pathogenicity.
The variant that has recently received much attention is “VUI-202012/01,” the one first reported in southeast England that presents with multiple amino acid changes to the spike protein (https://virological.org/t/preliminary-genomic-characterisation-of-an-emergent-sars-cov-2-lineage-in-the-uk-defined-by-a-novel-set-of-spike-mutations/563). While not confirmed yet in animal experiments, early reports suggest that it may be >50% more infectious than the parent strain. Of particular importance to us is the asparagine to tyrosine conversion in position 501 (N501Y), as this is one of the contact residues within the RBM that plays a role in binding to ACE2. As shown in Figure S7, the bound hBD-2 monomer and dimer are on average not close to the sidechain site of residue 501 (> 8 Å between nearest atoms). Furthermore, a ring-ring (pi-pi) contact between residue side-chains is highly probable between the UK-mutant RBD and ACE2, stabilizing the interaction, as shown by a deep mutagenesis study in which the N501Y mutation enhanced binding (Starr et al., 2020). By contrast, neither a sidechain ring nor a positively charged sidechain of hBD-2 appears to come near residue 501 in our models of its complex with the (original) RBD. At the same time it should be noted that the interaction of the RBD with ACE2, and especially with hBD-2, is considerably dynamic (Zhang et al., 2016; Zhang and Buck, 2017). Although this has not yet been measured in the RBD:ACE2 or RBD:hBD-2 platforms, the entropy of the interaction is likely to be not as unfavorable as seen in complexes where one or both partner proteins have to become significantly rigid. It is now becoming clear that many protein-protein complexes are inherently dynamic (Zhang et al., 2016; Zhang and Buck, 2017), thus minimizing the unfavorable entropy change that would otherwise occur on binding.
This is especially important for the binding of peptides, which may be relatively unstructured in solution and suggests that design of hBD-2 and LL-37 derived peptides would be a fruitful endeavor. While vaccines against SARS-CoV-2 have recently been approved by the FDA and are planned for distribution and administration in a large scale to cover most of the American population over the next year, we see the AMP strategy as complementary to vaccines. While the CoV-2 vaccines appear to show >90% efficacy, there will certainly be some degree of morbidity and mortality, as seen in all vaccines (Kaselitz et al., 2019), many people will refuse vaccination (Pogue et al., 2020; Schwarzinger et al., 2010) and a significant number will either fail to mount effective neutralizing antibodies or high enough titers (Goodwin et al., 2006; Ndifon et al., 2009; Ovsyannikova et al., 2017). Many of these low or non- responders are predicted to be in the COVID-19 high-risk population. Additionally, vaccines more than likely will provide protection for a limited amount of time, as neutralizing antibodies wane, and many people could face reinfection. Because of the multiple advantages of using small peptides like hBD-2 and their derived smaller mimetics, such as high specificity, low toxicity, lack of immunogenicity, low cost of production and ease of administration, they possess the potential for both safety and efficacy. Molecules such as hBD-2 could be delivered, in the future, intra-orally and/or intra-nasally as prophylactic aerosols, in early stages of infection, when telltale symptoms appear and in combinatorial therapeutic approaches for more severe situations. ACKNOWLEDGEMENTS: We thank Energy Center (CESR) of Tennessee Technological University for partially supporting graduate student Jackson Penfield and the pilot fund from Drs. Weinberg and Buck for undergraduate student Ann Brewer. 
The simulations were mainly done on Ohio Supercomputer Center Pitzer machines, and partly on high performance computers in Tennessee Technological University. We thank Dr. Jesse Bloom, Fred Hutchinson Cancer Center, for kindly providing the plasmids to generate Spike pseudovirus and HEK 293T cells expressing ACE2 receptor, and Dr. Parvesh Shrestha of the Buck lab for help with MST experiments. Dr. Buck is currently funded by NIH R01 grant from the National Eye Institute R01EY029169 and his part of the project was also supported by a pilot grant from the Department of Physiology and Biophysics of Case Western Reserve University. Dr. Ramakrishnan is supported by NIH/NIAID grants R01AI116730 and R21AI144264, NIH/NCI grant R21 CA246194 and pilot funding from the NORD Family Foundation for COVID related research. Dr. Weinberg was supported by pilot funds from the Department of Biological Sciences of the School of Dental Medicine, CWRU.

AUTHOR CONTRIBUTIONS: Conceptualization, LZ, SKG, PR, MB, AW. Methodology, LZ, SKG, PR, MB. Investigation, LZ, SCB, JM, JP, AB, SKG, PR. Writing – Original Draft, LZ, SKG, PR, AW. Writing – Review & Editing, SKG, PR, LZ, MB, AW. Visualization, SKG, MB, PR, AW. Supervision, LZ, PR, MB, AW. Project Administration, AW. Funding Acquisition, LZ, MB, PR, AW.

DECLARATION OF INTERESTS: None to declare

METHODS:

Resource availability
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Aaron Weinberg (axw47@case.edu).
Materials Availability
This study did not generate new unique reagents.

Cells
HEK 293T and HEK 293T cells stably expressing ACE2 receptor (ACE2 HEK293T) were cultured in DMEM media containing 10% FBS, 100 U/ml penicillin/streptomycin and 4 mM L-Glutamine.

Plasmids
pHAGE-CMV-Luc2-IRES-ZSgreen-W, HDM-HgPM2, HDM-tat1b, pRC-CMV-Rev1b, and SARS-CoV-2 Spike-ALAYT plasmids were previously described (Crawford et al., 2020). FLAG and HIS tagged RBD were expressed from a pcDNA3 vector with leader sequence and leucine zipper as previously described (Ramakrishnan et al., 2004).

Structure information
The structure of human beta defensin 2 (hBD-2) in the monomer and dimer form is available in the PDB with ID 1FD3 (Hoover et al., 2000). The hBD-2 sequence is 41 residues long: GIGDPVTCLKSGAICHPVFCPRRYKQIGTCGLPGTKCCKKP. The five boldened residues were found to form hydrogen bonds with the RBD during the simulations (see below/main paper). The structure of the RBD domain of the Spike protein is also available in complex with ACE2 at 2.45 Å resolution in the PDB with ID 6M0J (Lan et al., 2020).

METHOD DETAILS

Docking and all-atom simulations
Two docking programs were applied; one was Cluspro (Kozakov et al., 2006; Kozakov et al., 2017; Porter et al., 2017; Vajda et al., 2017), while the other was HADDOCK (Dominguez et al., 2003; van Zundert et al., 2016). The x-ray structures of hBD-2 and of the SARS-CoV-2 S-protein RBD were uploaded to the Cluspro docking webserver without additional preparation.
The best docked structures were clustered, with most of them showing that the hBD-2 binds to the RBD at sites used for the association between ACE2 and RBD. The best structure was selected based on the docking programs’ score and the predicted binding sites between hBD-2 and RBD. Cluspro is a rigid body protein docking method. It is based on a Fast Fourier Transform correlation approach, which makes it feasible to generate and evaluate billions of docked conformations by simple scoring functions as shown in Equation (1). It is an implementation of a multistage protocol: rigid body docking (used PIPER), an energy based filtering, ranking the retained structures based on clustering properties, and finally, the refinement of a limited number of structures by energy minimization. In the Cluspro docking, the PIPER interaction energy is calculated using the following equation: E=0.40Erep-0.40Eatt+600Eelec+1.00EDARS (1) Here, Erep and Eatt are contributions of the van der Waals interaction energy, and Eelec is an electrostatic energy term. EDARS is a pairwise structure-based potential constructed by the Decoys as the Reference State (DARS) method (Chuang et al., 2008). It primarily represents a desolvation contribution, i.e., the free energy change due to the removal of the water molecules from the interface (Kozakov et al., 2006). Since in the PIPER calculation, the entropic term was not included in Cluspro docking, the PIPER energy result should not be used to rank clusters. Instead, the population of clusters was applied to rank the clusters. In our simulations, the RBD:hBD-2 complex structure from the top cluster was taken and continued with all-atom molecular dynamics simulations. In the HADDOCK docking, since the binding interface between the ACE2 receptor and RBD are known, residues from 400 to 520 on the RBD were selected as the target binding sites, while the entire hBD-2 peptide taken as a potential binding site. 
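The Cluspro scoring and ranking described above can be illustrated with a minimal Python sketch. It simply evaluates the weighted sum of Equation (1) with the coefficients as printed, and orders clusters by population rather than by energy, as the text recommends. The function names, argument names and the dictionary of cluster sizes are hypothetical and chosen only for illustration; they are not part of Cluspro or PIPER.

```python
def piper_energy(e_rep, e_att, e_elec, e_dars):
    """Weighted sum of Equation (1), using the coefficients as printed in the text."""
    return 0.40 * e_rep - 0.40 * e_att + 600.0 * e_elec + 1.00 * e_dars


def rank_clusters_by_population(cluster_sizes):
    """Return cluster labels ordered from most to least populated.

    cluster_sizes: dict mapping a (hypothetical) cluster label to its member count.
    The energy is deliberately not used for ranking, following the text.
    """
    return sorted(cluster_sizes, key=cluster_sizes.get, reverse=True)
```

In this sketch the first label returned by `rank_clusters_by_population` would correspond to the top cluster whose structure is carried forward into molecular dynamics.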
Default values for all other parameters were applied. After that, the best 5 structures, by HADDOCK scoring, were selected. Based on the best 6 structures predicted above (the 5 from HADDOCK and 1 from Cluspro docking), all-atom molecular dynamics simulations were set up using the CHARMM36m (Huang et al., 2017) forcefield and VMD program (Humphrey et al., 1996). One of the deprotonated states of histidine was used (denoted HSD), and the native disulfide bonding in the hBD-2 was set up. The protein was solvated with an equilibrated box of TIP3P water molecules such that the closest distance between atoms on the proteins and the edge of the simulation box was 12 Å. The equivalent of 0.15 M in Na and Cl ions was added into the box plus several ions to neutralize the net charge of the system. The target temperature was 310 K and the pressure 1 atm, using standard thermo- and barostats. After a brief energy minimization using the conjugate gradient and line search algorithm, 4 ps of dynamics was run at 50 K, and then the system was brought up to 310 K over an equilibration period of 1 ns using NAMD program version 2.12 (Phillips et al., 2005). This was followed by trajectories that continued for up to 200 or 500 ns at 1 atm and 310 K using the NPT ensemble. As a comparison, we also simulated the RBD bound with ACE2 using the structures from (Lan et al., 2020) and the same method as above. HBD-2 can also form a non-covalent dimer at high concentration in solution (Hoover et al., 2000) (with PDB ID of 1FD3). The initial bound structure of the hBD-2 dimer with the RBD was predicted using targeted HADDOCK docking. The best structure predicted was used in all-atom MD simulations as detailed above. The simulation systems, set up, the number of atoms and box size information are shown in Table S1.

To analyze the trajectories, the Root Mean Square Deviation (RMSD) and Fluctuations (RMSF) of the proteins were calculated using the VMD program and an in-house analysis script, based on the coordinates of the backbone Cα atoms after aligning the trajectories, respectively, to the original crystal structure of the RBD, hBD-2, and to the initial complex structure of the RBD and hBD-2 predicted from docking. The buried surface area (BSA) for the complex was calculated in two steps using the VMD program and a script using the Richards and Lee method with a water probe size of 1.4 Å (Lee and Richards, 1971). First, the total solvent accessible surface area of the complex (ASAcomplex) was calculated based on the complex’s trajectory. Second, the accessible surface area of each protein in the complex (ASArbd, ASAhbd2) was calculated for each protein individually. Then, the BSA is calculated using Equation (2):

BSA = 0.5*(ASArbd + ASAhbd2 – ASAcomplex)   (2)

The number of hydrogen bonds between the RBD and ACE2 or the RBD and hBD-2 was calculated using the VMD program with a heavy atom distance cutoff of 3.0 Å and an angle cutoff of 20 degrees deviation from H-bond linearity. The fraction of the simulation time during which a particular H-bond is formed is monitored and expressed as % occupancy. To identify the residues on the binding interface, the closest distance between every residue atom (including hydrogen) between the RBD and hBD-2 was calculated and averaged over the trajectory run.
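As a worked illustration of the analysis quantities above, the following minimal Python sketch evaluates Equation (2) and the H-bond occupancy from per-frame data. The function and argument names are hypothetical, the inputs are assumed to have been extracted from the trajectory beforehand (for example with VMD), and the use of inclusive cutoffs in the geometric test is an assumption rather than a statement about the VMD implementation.

```python
def buried_surface_area(asa_rbd, asa_hbd2, asa_complex):
    """Equation (2): half of the surface area lost when the two proteins associate."""
    return 0.5 * (asa_rbd + asa_hbd2 - asa_complex)


def is_hbond(heavy_atom_dist, angle_deviation_deg,
             dist_cutoff=3.0, angle_cutoff=20.0):
    """Geometric H-bond test with the cutoffs quoted in the text.

    heavy_atom_dist: donor-acceptor heavy atom distance in Angstrom.
    angle_deviation_deg: deviation from H-bond linearity in degrees.
    Inclusive comparisons are an assumption of this sketch.
    """
    return heavy_atom_dist <= dist_cutoff and angle_deviation_deg <= angle_cutoff


def hbond_occupancy(present_per_frame):
    """Percentage of trajectory frames in which a given H-bond is formed."""
    frames = list(present_per_frame)
    if not frames:
        raise ValueError("at least one frame is required")
    return 100.0 * sum(1 for present in frames if present) / len(frames)
```

For example, an H-bond present in one of every two frames would be reported with 50% occupancy.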
The average distances between each residue on RBD and on hBD-2 are shaded by proximity on a red to white color-scale and were used to build the distance maps. Furthermore, based on the long term simulation trajectories of the complexes of Supplementary Table S1, the total pairwise interaction energy was calculated using the MM-GBSA method (Genheden and Ryde, 2015) by applying NAMD and the NAMD energy plugin of the VMD program (Humphrey et al., 1996). This interaction energy (E_binding) is calculated using Equation (3):

E_binding = ⟨E_complex⟩ - ⟨E_protein⟩ - ⟨E_ligand⟩   (3)

E_complex is the potential energy of the protein-ligand complex, E_protein is the potential energy of the protein, and E_ligand is the potential energy of the ligand; ⟨ ⟩ denotes the ensemble average over simulation time. In the MM-GBSA method, the solvent effect was accounted for using the generalized Born implicit solvent model (GBIS) (Tanner et al., 2011).

Measurement RBD:ACE2 association in vitro
Untagged hBD-2 and N-terminally His-tagged RBD were purchased from Peprotech, Inc. and Raybiotech Inc., respectively. Binding experiments were carried out with a Monolith NT.115 Microscale Thermophoresis (MST) instrument (NanoTemper, Inc.) at room temperature in pH 7.1 phosphate buffer saline with 0.1% Tween-20 (PBS-T 0.1%). The RBD was labeled using the NanoTemper Monolith HIS-Tag Labeling Kit RED-trisNTA, which labels His-tags with a fluorescent group. 40 nM of this RBD was mixed with a serial dilution of unlabeled hBD-2 in 0.2 mL micro reaction tubes (NanoTemper, Inc.) and then transferred to premium capillaries (NanoTemper, Inc.). The experiment was done with a triplicate set of tubes. Microscale thermophoresis monitors the change of the diffusion of proteins/peptides in microscopic temperature gradients upon protein binding.
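MST binding curves of this kind are commonly analyzed with the quadratic solution of the law of mass action for a 1:1 complex. The sketch below, in Python, shows that expression for the fraction of labeled molecule A bound to titrant T. The function name and arguments are hypothetical, all concentrations must share one unit, and the sketch illustrates the binding model only, not the instrument software's fitting procedure.

```python
import math


def fraction_bound(a_total, t_total, kd):
    """Fraction of labeled molecule A in the complex AT for 1:1 binding.

    Solves KD = [A][T]/[AT] together with mass conservation
    (a_total = [A] + [AT], t_total = [T] + [AT]) for [AT],
    taking the physically meaningful root of the quadratic.
    """
    if a_total <= 0 or t_total < 0 or kd <= 0:
        raise ValueError("a_total and kd must be positive, t_total non-negative")
    s = a_total + t_total + kd
    complex_conc = 0.5 * (s - math.sqrt(s * s - 4.0 * a_total * t_total))
    return complex_conc / a_total
```

With 40 nM labeled RBD and a dissociation constant near 300 nM, as reported in the text, this model gives roughly half-saturation when the total hBD-2 concentration is somewhat above 300 nM, because part of the titrant is consumed in the complex.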
The dissociation constant KD was obtained by fitting the binding curve with the quadratic solution for the fraction of fluorescent molecules that formed the complex between proteins A and T, calculated from the law of mass action KD = [A]*[T]/[AT], where [A] is the concentration of free fluorescent molecule, [T] the concentration of free titrant, and [AT] the concentration of the complex of A and T. We also carried out the experiment with a labeled RBD as well as an hBD-2 sample which had its disulphide bonds reduced by addition of 2.0 mM DTT, showing that disulphide bonds are essential for maintaining the folded structures (Hati and Bhattacharyya, 2020) and that these are required for the reasonably strong protein-protein interactions (Figure 5A).

ELISA based assay
100 µl of rhBD-2 (Peprotech, Inc.) (concentration as indicated in Figure 5B) in assay diluent buffer 2 (R&D Systems, Inc.) was incubated in an RBD coated plate (Raybiotech, Inc.) at 4 °C for 18 hrs. Plates were then washed 4 times with 300 µl of wash buffer (R&D Systems, Inc.) followed by incubation with 100 µl of biotinylated anti hBD-2 (Peprotech, Inc.) (0.1 µg/ml) for 1 hr. Plates were then washed again as stated above and incubated with 100 µl of Streptavidin-HRP (R&D Systems, Inc.) for 20 minutes. Signal was developed using TMB substrate and measured at 450 nm using a microplate reader.

Immunoprecipitation and Western blotting
To study interaction between hBD-2 and His-RBD, recombinant hBD-2 (Peprotech, Inc.) with or without recombinant HIS-tagged-RBD (Sino Biologicals, Inc.)
were pre-incubated at room temperature for 1 h in binding buffer (30 mM HEPES pH 7.6, 5 mM MgCl2, 150 mM NaCl, 0.5 mM dithiothreitol, 1% Triton X-100 and 1 mM EDTA) and then incubated with washed Ni-NTA agarose resin beads (25 µl) overnight at 4°C. Beads were collected by centrifugation at 1000 rpm for 1 min and washed three times with binding buffer. Beads were boiled with 30 µl of Laemmli sample buffer and analyzed by Western blotting (WB). Briefly, samples were separated on 20% SDS-polyacrylamide gels and proteins were then transferred to nitrocellulose membrane (0.2 µm pore size) at 70 V for 40 min in the cold. Membranes were blocked with 5% milk in TBST and then probed with goat anti-human BD2 antibody (0.2 µg/ml; Peprotech), followed by secondary antibody (1:5000) at room temperature, and visualized by enhanced chemiluminescence. To study the ability of hBD2 to compete with RBD binding to ACE2, ACE2 HEK 293T cells were seeded in 6 cm plates. At 50% confluency, media was replaced with conditioned media from HEK 293T cells transfected with secreted FLAG-RBD plasmid, or with control media, in the presence or absence of hBD2 (1.0 and 3.0 µg/ml) and incubated at 37°C for 30 min. Cells were washed and collected in PBS-EDTA solution and then lysed in Triton lysis buffer. Lysates were centrifuged at 12000 g for 10 min at 4°C and immunoprecipitated using M2 FLAG beads (Sigma) for 2 hours at 4°C. Beads were collected, washed, boiled with Laemmli sample buffer, and analyzed by Western blotting.

CoV-2 spike-pseudotyped luciferase assay

Pseudotyped SARS-CoV-2 spike virus was generated and the luciferase assay was carried out as described previously (Crawford et al., 2020). Briefly, HEK 293T cells were transfected with luciferase-IRES-ZSgreen, HDM-HgPM2, HDM-tat1b, PRC-CMV-Rev1b, and SARS-CoV-2 Spike-ALAYT plasmids as described (Crawford et al., 2020). Culture supernatants were harvested 48 hours after transfection and used to infect ACE2 HEK293T cells.
To study the effect of hBD2 on spike-pseudotyped virus entry, ACE2 HEK 293T cells were incubated with pseudovirions and varying concentrations of hBD2 (0-15 µg/ml) for 48 hours. Cells were lysed and luminescence was measured using a luciferase assay system following the manufacturer's instructions (Promega, Inc.) on a Spectramax i3 microplate detection platform (Molecular Devices, Inc.).

Figure 1. Molecular dynamics simulations of RBD:ACE2 (as a reference) show that the protein complex is stable. (A) RMSF of RBD (left) and ACE2 (right) in the complex over 50 ns in comparison with values for the unbound (free) proteins; the secondary structure of ACE2 and RBD are indicated. (B) Difference in RMSF between bound and free proteins. The data are mapped to the cartoon representation of the complex, with the color bar (bottom) indicating the range of -0.5 Å (in blue) to 0.5 Å (in red). (C) Number of hydrogen bonds for the RBD bound to ACE2 over the course of the simulation. (D) Table of the most prominent h-bonds and their occupancy.

Figure 2. Cartoon representation of RBD:hBD-2.
(A) Comparison of the initial structure (cyan for hBD-2, green for RBD) and the last structure (magenta for hBD-2, raspberry for RBD) of the RBD:hBD-2 complex after 500 ns of all-atom MD simulation. (B) RMSD of the proteins in the complex and of the complex itself.

Figure 3. The RBD and hBD-2 proteins retain considerable dynamics as a complex. (A) RMSF of RBD (left) and hBD-2 (right) in the complex over 500 ns in comparison with values for the unbound (free) proteins; the secondary structure of ACE2 and RBD are indicated. (B) Difference in RMSF between bound and free proteins. The data are mapped to the cartoon representation of the complex, with the color bar (bottom) indicating the range of -0.5 Å (in blue) to 0.5 Å (in red). (C) Number of hydrogen bonds for the RBD bound to hBD-2 over the simulation. (D) Table of the most prominent h-bonds and their occupancy.

Figure 4. Similar regions/residues are involved in RBD contact with ACE2 as with hBD-2.
Distance maps of inter-protein contacts in (A) the RBD:ACE2 complex and (B) the RBD:hBD-2 complex, with distances color coded by average proximity over the length of the simulations (see color scale, right).

Figure 5. Biophysical and biological assays demonstrating hBD-2 binding to RBD. (A) Concentration-dependent binding of recombinant hBD-2 (rhBD-2) to fluorescently labeled recombinant RBD (rRBD), as measured by microscale thermophoresis. hBD-2 was used under oxidizing (black data points) and under reducing conditions (red). (B) Functional ELISA assay showing that rhBD-2 binds to immobilized rRBD with a linear range of concentrations (1.5 to 100 nM). (C) Recombinant His-RBD (5 µg) and hBD-2 (7.5 µg) were incubated as described in Methods and precipitated with Ni-NTA beads to pull down His-tagged RBD. Co-precipitation of hBD-2 was assessed by Western blotting. Lane 1 shows 20% input of hBD-2 and lane 2 shows Ni-NTA precipitation to examine background binding of hBD-2 to the beads. Data are representative of three independent experiments. (D) ACE2 HEK 293T cells were incubated with FLAG-RBD, with and without hBD-2 at the indicated concentrations. Anti-FLAG immunoprecipitation was performed to precipitate ACE2 bound to FLAG-RBD and to assess the effect of hBD-2 addition on RBD:ACE2 binding. Data are representative of two biological replicates.

Figure 6. hBD2 inhibits CoV-2 spike-pseudotyped virus entry into ACE2 293T cells. (A) ACE2 HEK 293T cells were infected with CoV-2 spike-pseudotyped virus and luciferase activity was assessed at 48 hours post infection. (B) Effect of hBD2 on CoV-2 spike-pseudotyped virus cell entry was assessed as in (A). (C) Percentage infection was calculated from the RLU values in (B), taking the spike alone group as 100%. (D) Effect of hBD2 on VSVG-pseudotyped virus entry was assessed as in (A). (E) Percentage infection was calculated from the RLU values in (D), taking the VSVG alone group as 100%. (F) Titration of hBD2 concentration (0-15 µg/ml) on spike-mediated pseudovirus entry and luciferase activity. (G) Percentage of spike infection was calculated from the RLU values in (F), taking the spike alone group as 100%. (H) hBD2-mediated percent inhibition of spike-mediated viral entry; IC50 was calculated by plotting hBD2 concentration (in µM) against the % inhibition observed. Values given are mean ± SEM of two independent experiments performed in triplicate. ***p < 0.001, **p < 0.01, *p < 0.05, and ns (non-significant) against the group treated with CoV-2 spike-pseudotyped virus alone.

REFERENCES:
Alcayaga-Miranda, F., Cuenca, J., and Khoury, M. (2017).
Antimicrobial Activity of Mesenchymal Stem Cells: Current Status and New Perspectives of Antimicrobial Peptide-Based Therapies. Front Immunol 8, 339. Amaro, R.E., and Mulholland, A.J. (2020). Biomolecular Simulations in the Time of COVID19, and After. Comput Sci Eng 22, 30-36. Barros, E.P., Casalino, L., Gaieb, Z., Dommer, A.C., Wang, Y., Fallon, L., Raguette, L., Belfon, K., Simmerling, C., and Amaro, R.E. (2020). The Flexibility of ACE2 in the Context of SARS-CoV-2 Infection. Biophys J. Bensch, K.W., Raida, M., Mägert, H.J., Schulz-Knappe, P., and Forssmann, W.G. (1995). hBD-1: a novel beta- defensin from human plasma. FEBS Lett 368, 331-335. Brielle, E.S., Schneidman-Duhovny, D., and Linial, M. (2020). The SARS-CoV-2 Exerts a Distinctive Strategy for Interacting with the ACE2 Human Receptor. Viruses 12. Cantuti-Castelvetri, L., Ojha, R., Pedro, L.D., Djannatian, M., Franz, J., Kuivanen, S., van der Meer, F., Kallio, K., Kaya, T., Anastasina, M., et al. (2020). Neuropilin-1 facilitates SARS-CoV-2 cell entry and infectivity. Science, eabd2985. Chen, J., Wang, R., Wang, M., and Wei, G.W. (2020). Mutations Strengthened SARS-CoV-2 Infectivity. J Mol Biol 432, 5212-5226. Chow, L., Johnson, V., Impastato, R., Coy, J., Strumpf, A., and Dow, S. (2020). Antibacterial activity of human mesenchymal stem cells mediated directly by constitutively secreted factors and indirectly by activation of innate immune effector cells. Stem Cells Transl Med 9, 235-249. Chuang, G.Y., Kozakov, D., Brenke, R., Comeau, S.R., and Vajda, S. (2008). DARS (Decoys As the Reference State) potentials for protein-protein docking. Biophys J 95, 4217-4227. Crawford, K.H.D., Eguia, R., Dingens, A.S., Loes, A.N., Malone, K.D., Wolf, C.R., Chu, H.Y., Tortorici, M.A., Veesler, D., Murphy, M., et al. (2020). Protocol and Reagents for Pseudotyping Lentiviral Particles with SARS-CoV-2 Spike Protein for Neutralization Assays. Viruses 12. 
Daly, J.L., Simonetti, B., Klein, K., Chen, K.-E., Williamson, M.K., Antón-Plágaro, C., Shoemark, D.K., Simón-Gracia, L., Bauer, M., Hollandi, R., et al. (2020). Neuropilin-1 is a host factor for SARS-CoV-2 infection. Science, eabd3072. Dawgul, M.A., Greber, K.E., Sawicki, W., and Kamysz, W. (2016). Human host defense peptides - role in maintaining human homeostasis and pathological processes. Curr Med Chem. Diamond, G., Beckloff, N., and Ryan, L.K. (2008). Host defense peptides in the oral cavity and the lung: similarities and differences. J Dent Res 87, 915-927. Diamond, G., Beckloff, N., Weinberg, A., and Kisich, K.O. (2009). The roles of antimicrobial peptides in innate host defense. Curr Pharm Des 15, 2377-2392. Diamond, G., and Ryan, L. (2011). Beta-defensins: what are they really doing in the oral cavity? Oral Dis 17, 628- 635. Dominguez, C., Boelens, R., and Bonvin, A.M. (2003). HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc 125, 1731-1737. Doss, M., White, M.R., Tecle, T., and Hartshorn, K.L. (2010). Human defensins and LL-37 in mucosal immunity. J Leukoc Biol 87, 79-92. Genheden, S., and Ryde, U. (2015). The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin Drug Discov 10, 449-461. Ghorbani, M., Brooks, B.R., and Klauda, J.B. (2020). Critical Sequence Hotspots for Binding of Novel Coronavirus to Angiotensin Converter Enzyme as Evaluated by Molecular Simulations. J Phys Chem B 124, 10034-10047. Ghosh, S.K., Gerken, T.A., Schneider, K.M., Feng, Z., McCormick, T.S., and Weinberg, A. (2007). Quantification of human beta-defensin-2 and -3 in body fluids: application for studies of innate immunity. Clin Chem 53, 757-765. Goodwin, K., Viboud, C., and Simonsen, L. (2006). Antibody response to influenza vaccination in the elderly: a quantitative review. Vaccine 24, 1159-1169. 
Gross, L.Z.F., Sacerdoti, M., Piiper, A., Zeuzem, S., Leroux, A.E., and Biondi, R.M. (2020). ACE2, the Receptor that Enables Infection by SARS-CoV-2: Biochemistry, Structure, Allostery and Evaluation of the Potential Development of ACE2 Modulators. ChemMedChem 15, 1682-1690. Harder, J., Bartels, J., Christophers, E., and Schröder, J.M. (2001). Isolation and characterization of human beta-defensin-3, a novel human inducible peptide antibiotic. J Biol Chem 276, 5707-5713. Harder, J., Bartels, J., Christophers, E., and Schröder, J.M. (1997). A peptide antibiotic from human skin. Nature 387, 861. Hati, S., and Bhattacharyya, S. (2020). Impact of Thiol–Disulfide Balance on the Binding of Covid-19 Spike Protein with Angiotensin-Converting Enzyme 2 Receptor. ACS Omega 5, 16292-16298. Herrera, R., Morris, M., Rosbe, K., Feng, Z., Weinberg, A., and Tugizov, S. (2016). Human beta-defensins 2 and -3 cointernalize with human immunodeficiency virus via heparan sulfate proteoglycans and reduce infectivity of intracellular virions in tonsil epithelial cells. Virology 487, 172-187. Hoover, D.M., Rajashankar, K.R., Blumenthal, R., Puri, A., Oppenheim, J.J., Chertov, O., and Lubkowski, J. (2000). The structure of human beta-defensin-2 shows evidence of higher order oligomerization. J Biol Chem 275, 32911-32918. Huang, J., Rauscher, S., Nawrocki, G., Ran, T., Feig, M., de Groot, B.L., Grubmüller, H., and MacKerell, A.D., Jr. (2017). CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat Methods 14, 71-73.
Humphrey, W., Dalke, A., and Schulten, K. (1996). VMD: visual molecular dynamics. J Mol Graph 14, 33-38, 27-38. Joly, S., Maze, C., McCray, P.B., Jr., and Guthmiller, J.M. (2004). Human beta-defensins 2 and 3 demonstrate strain- selective activity against oral microorganisms. J Clin Microbiol 42, 1024-1029. Kaselitz, T.B., Martin, E.T., Power, L.E., and Cinti, S. (2019). Impact of Vaccination on Morbidity and Mortality in Adults Hospitalized With Influenza A, 2014–2015. Infectious Diseases in Clinical Practice 27, 328-333. Khurshid, Z., Naseem, M., Yahya, I.A.F., Mali, M., Sannam Khan, R., Sahibzada, H.A., Zafar, M.S., Faraz Moin, S., and Khan, E. (2017). Significance and Diagnostic Role of Antimicrobial Cathelicidins (LL-37) Peptides in Oral Health. Biomolecules 7. Kim, J., Yang, Y.L., Jang, S.H., and Jang, Y.S. (2018). Human β-defensin 2 plays a regulatory role in innate antiviral immunity and is capable of potentiating the induction of antigen-specific immunity. Virol J 15, 124. Kim, Y.I., Kim, S.G., Kim, S.M., Kim, E.H., Park, S.J., Yu, K.M., Chang, J.H., Kim, E.J., Lee, S., Casel, M.A.B., et al. (2020). Infection and Rapid Transmission of SARS-CoV-2 in Ferrets. Cell Host Microbe 27, 704-709.e702. Koeninger, L., Armbruster, N.S., Brinch, K.S., Kjaerulf, S., Andersen, B., Langnau, C., Autenrieth, S.E., Schneidawind, D., Stange, E.F., Malek, N.P., et al. (2020). Human β-Defensin 2 Mediated Immune Modulation as Treatment for Experimental Colitis. Front Immunol 11, 93. Kota, S., Sabbah, A., Chang, T.H., Harnack, R., Xiang, Y., Meng, X., and Bose, S. (2008). Role of human beta-defensin- 2 during tumor necrosis factor-alpha/NF-kappaB-mediated innate antiviral response against human respiratory syncytial virus. J Biol Chem 283, 22417-22429. Kozakov, D., Brenke, R., Comeau, S.R., and Vajda, S. (2006). PIPER: an FFT-based protein docking program with pairwise potentials. Proteins 65, 392-406. 
Kozakov, D., Hall, D.R., Xia, B., Porter, K.A., Padhorny, D., Yueh, C., Beglov, D., and Vajda, S. (2017). The ClusPro web server for protein-protein docking. Nat Protoc 12, 255-278. Krasnodembskaya, A., Song, Y., Fang, X., Gupta, N., Serikov, V., Lee, J.W., and Matthay, M.A. (2010). Antibacterial effect of human mesenchymal stem cells is mediated in part from secretion of the antimicrobial peptide LL-37. Stem Cells 28, 2229-2238. Lan, J., Ge, J., Yu, J., Shan, S., Zhou, H., Fan, S., Zhang, Q., Shi, X., Wang, Q., Zhang, L., et al. (2020). Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature 581, 215-220. Lee, B., and Richards, F.M. (1971). The interpretation of protein structures: estimation of static accessibility. J Mol Biol 55, 379-400. Lee, S.H., Kim, J.E., Lim, H.H., Lee, H.M., and Choi, J.O. (2002). Antimicrobial defensin peptides of the human nasal mucosa. Ann Otol Rhinol Laryngol 111, 135-141. Leikina, E., Delanoe-Ayari, H., Melikov, K., Cho, M.S., Chen, A., Waring, A.J., Wang, W., Xie, Y., Loo, J.A., Lehrer, R.I., et al. (2005). Carbohydrate-binding molecules inhibit viral fusion and entry by crosslinking membrane glycoproteins. Nat Immunol 6, 995-1001. LeMessurier, K.S., Lin, Y., McCullers, J.A., and Samarasinghe, A.E. (2016). Antimicrobial peptides alter early immune response to influenza A virus infection in C57BL/6 mice. Antiviral Res 133, 208-217. Lever, A.M., Strappe, P.M., and Zhao, J. (2004). Lentiviral vectors. J Biomed Sci 11, 439-449.
Li, Q., Wu, J., Nie, J., Zhang, L., Hao, H., Liu, S., Zhao, C., Zhang, Q., Liu, H., Nie, L., et al. (2020). The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity. Cell 182, 1284-1294.e1289. Lokhande, K.B.B., Tanushree; Swamy, K. Venkateswara; Deshpande, Manisha (2020). An in Silico Scientific Basis for LL-37 as a Therapeutic and Vitamin D as Preventive for Covid-19. ChemRxiv. Malik, A., Prahlad, D., Kulkarni, N., and Kayal, A. (2020). Interfacial Water Molecules Make RBD of SPIKE Protein and Human ACE2 to Stick Together. bioRxiv, 2020.2006.2015.152892. Manzanares-Meza, L.D., and Medina-Contreras, O. (2020). SARS-CoV-2 and influenza: a comparative overview and treatment implications. Bol Med Hosp Infant Mex 77, 262-273. Mathews, M., Jia, H.P., Guthmiller, J.M., Losh, G., Graham, S., Johnson, G.K., Tack, B.F., and McCray, P.B., Jr. (1999). Production of beta-defensin antimicrobial peptides by the oral mucosa and salivary glands. Infect Immun 67, 2740- 2745. McCallum, M., Walls, A.C., Bowen, J.E., Corti, D., and Veesler, D. (2020). Structure-guided covalent stabilization of coronavirus spike glycoprotein trimers in the closed conformation. Nat Struct Mol Biol. Mi, B., Liu, J., Liu, Y., Hu, L., Liu, Y., Panayi, A.C., Zhou, W., and Liu, G. (2018). The Designer Antimicrobial Peptide A-hBD-2 Facilitates Skin Wound Healing by Stimulating Keratinocyte Migration and Proliferation. Cell Physiol Biochem 51, 647-663. Moll, G., Drzeniek, N., Kamhieh-Milz, J., Geissler, S., Volk, H.D., and Reinke, P. (2020). MSC Therapies for COVID- 19: Importance of Patient Coagulopathy, Thromboprophylaxis, Cell Product Quality and Mode of Delivery for Treatment Safety and Efficacy. Front Immunol 11, 1091. Mookherjee, N., Anderson, M.A., Haagsman, H.P., and Davidson, D.J. (2020). Antimicrobial host defence peptides: functions and clinical potential. Nature reviews Drug discovery 19, 311-332. Mulder, K.C., Lima, L.A., Miranda, V.J., Dias, S.C., and Franco, O.L. (2013). 
Current scenario of peptide-based drugs: the key roles of cationic antitumor and antiviral peptides. Front Microbiol 4, 321. Ndifon, W., Wingreen, N.S., and Levin, S.A. (2009). Differential neutralization efficiency of hemagglutinin epitopes, antibody interference, and the design of influenza vaccines. Proc Natl Acad Sci U S A 106, 8701-8706. Ooi, C.Y., Pang, T., Leach, S.T., Katz, T., Day, A.S., and Jaffe, A. (2015). Fecal Human β-Defensin 2 in Children with Cystic Fibrosis: Is There a Diminished Intestinal Innate Immune Response? Dig Dis Sci 60, 2946-2952. Ovsyannikova, I.G., Schaid, D.J., Larrabee, B.R., Haralambieva, I.H., Kennedy, R.B., and Poland, G.A. (2017). A large population-based association study between HLA and KIR genotypes and measles vaccine antibody responses. PLoS One 12, e0171261. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kalé, L., and Schulten, K. (2005). Scalable molecular dynamics with NAMD. J Comput Chem 26, 1781-1802. Pinkerton, J.W., Kim, R.Y., Koeninger, L., Armbruster, N.S., Hansbro, N.G., Brown, A.C., Jayaraman, R., Shen, S., Malek, N., Cooper, M.A., et al. (2020). Human β-defensin-2 suppresses key features of asthma in murine models of allergic airways disease. Clin Exp Allergy. Pogue, K., Jensen, J.L., Stancil, C.K., Ferguson, D.G., Hughes, S.J., Mello, E.J., Burgess, R., Berges, B.K., Quaye, A., and Poole, B.D. (2020). Influences on Attitudes Regarding Potential COVID-19 Vaccination in the United States. Vaccines (Basel) 8. Porter, K.A., Xia, B., Beglov, D., Bohnuud, T., Alam, N., Schueler-Furman, O., and Kozakov, D. (2017). ClusPro PeptiDock: efficient global docking of peptide recognition motifs using FFT. Bioinformatics 33, 3299-3301. Quiñones-Mateu, M.E., Lederman, M.M., Feng, Z., Chakraborty, B., Weber, J., Rangel, H.R., Marotta, M.L., Mirza, M., Jiang, B., Kiser, P., et al. (2003). Human epithelial beta-defensins 2 and 3 inhibit HIV-1 replication. Aids 17, F39- 48. 
Ramakrishnan, P., Wang, W., and Wallach, D. (2004). Receptor-specific signaling for both the alternative and the canonical NF-kappaB activation pathways by NF-kappaB-inducing kinase. Immunity 21, 477-489. Ray, D., Le, L., and Andricioaei, I. (2020). Distant Residues Modulate the Conformational Opening in SARS-CoV-2 Spike Protein. bioRxiv, 2020.2012.2007.415596. Rivas-Santiago, B., Schwander, S.K., Sarabia, C., Diamond, G., Klein-Patel, M.E., Hernandez-Pando, R., Ellner, J.J., and Sada, E. (2005). Human β-defensin 2 is expressed and associated with Mycobacterium tuberculosis during infection of human alveolar epithelial cells. Infect Immun 73, 4505-4511. Roth, A., Lütke, S., Meinberger, D., Hermes, G., Sengle, G., Koch, M., Streichert, T., and Klatt, A.R. (2020). LL-37 fights SARS-CoV-2: The Vitamin D-Inducible Peptide LL-37 Inhibits Binding of SARS-CoV-2 Spike Protein to its Cellular Receptor Angiotensin Converting Enzyme 2 In Vitro. bioRxiv, 2020.2012.2002.408153. Roy, S., Bag, A.K., Singh, R.K., Talmadge, J.E., Batra, S.K., and Datta, K. (2017). Multifaceted Role of Neuropilins in the Immune System: Potential Targets for Immunotherapy. Front Immunol 8, 1228. Ryan, L.K., Dai, J., Yin, Z., Megjugorac, N., Uhlhorn, V., Yim, S., Schwartz, K.D., Abrahams, J.M., Diamond, G., and Fitzgerald-Bocarsly, P. (2011). Modulation of human beta-defensin-1 (hBD-1) in plasmacytoid dendritic cells (PDC), monocytes, and epithelial cells by influenza virus, Herpes simplex virus, and Sendai virus and its possible role in innate immunity.
J Leukoc Biol 90, 343-356. Sakamoto, N., Mukae, H., Fujii, T., Ishii, H., Yoshioka, S., Kakugawa, T., Sugiyama, K., Mizuta, Y., Kadota, J., Nakazato, M., et al. (2005). Differential effects of alpha- and beta-defensin on cytokine production by cultured human bronchial epithelial cells. Am J Physiol Lung Cell Mol Physiol 288, L508-513. Sawai, M.V., Jia, H.P., Liu, L., Aseyev, V., Wiencek, J.M., McCray, P.B., Jr., Ganz, T., Kearney, W.R., and Tack, B.F. (2001). The NMR structure of human beta-defensin-2 reveals a novel alpha-helical segment. Biochemistry 40, 3810-3816. Schibli, D.J., Hunter, H.N., Aseyev, V., Starner, T.D., Wiencek, J.M., McCray, P.B., Jr., Tack, B.F., and Vogel, H.J. (2002). The solution structures of the human beta-defensins lead to a better understanding of the potent bactericidal activity of HBD3 against Staphylococcus aureus. J Biol Chem 277, 8279-8289. Schwarzinger, M., Flicoteaux, R., Cortarenoda, S., Obadia, Y., and Moatti, J.P. (2010). Low acceptability of A/H1N1 pandemic vaccination in French adult population: did public health policy fuel public dissonance? PLoS One 5, e10199. Semple, F., and Dorin, J.R. (2012). β-Defensins: multifunctional modulators of infection, inflammation and more? J Innate Immun 4, 337-348. Singh, P.K., Jia, H.P., Wiles, K., Hesselberth, J., Liu, L., Conway, B.A., Greenberg, E.P., Valore, E.V., Welsh, M.J., Ganz, T., et al. (1998). Production of beta-defensins by human airway epithelia. Proc Natl Acad Sci U S A 95, 14961- 14966. Siu, Y.L., Teoh, K.T., Lo, J., Chan, C.M., Kien, F., Escriou, N., Tsao, S.W., Nicholls, J.M., Altmeyer, R., Peiris, J.S., et al. (2008). The M, E, and N structural proteins of the severe acute respiratory syndrome coronavirus are required for efficient assembly, trafficking, and release of virus-like particles. J Virol 82, 11318-11330. Spinello, A., Saltalamacchia, A., and Magistrato, A. (2020). 
Is the Rigidity of SARS-CoV-2 Spike Receptor-Binding Motif the Hallmark for Its Enhanced Infectivity? Insights from All-Atom Simulations. J Phys Chem Lett 11, 4785-4790. Starr, T.N., Greaney, A.J., Hilton, S.K., Ellis, D., Crawford, K.H.D., Dingens, A.S., Navarro, M.J., Bowen, J.E., Tortorici, M.A., Walls, A.C., et al. (2020). Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell 182, 1295-1310.e1220. Sutton, M.T., Fletcher, D., Ghosh, S.K., Weinberg, A., van Heeckeren, R., Kaur, S., Sadeghi, Z., Hijaz, A., Reese, J., Lazarus, H.M., et al. (2016). Antimicrobial Properties of Mesenchymal Stem Cells: Therapeutic Potential for Cystic Fibrosis Infection, and Treatment. Stem Cells Int 2016, 5303048. Tai, W., He, L., Zhang, X., Pu, J., Voronin, D., Jiang, S., Zhou, Y., and Du, L. (2020). Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine. Cell Mol Immunol 17, 613-620. Tanner, D.E., Chan, K.Y., Phillips, J.C., and Schulten, K. (2011). Parallel Generalized Born Implicit Solvent Calculations with NAMD. J Chem Theory Comput 7, 3635-3642. Tripathi, S., Wang, G., White, M., Qi, L., Taubenberger, J., and Hartshorn, K.L. (2015). Antiviral Activity of the Human Cathelicidin, LL-37, and Derived Peptides on Seasonal and Pandemic Influenza A Viruses. PLoS One 10, e0124706.
Tsuchiya, A., Takeuchi, S., Iwasawa, T., Kumagai, M., Sato, T., Motegi, S., Ishii, Y., Koseki, Y., Tomiyoshi, K., Natsui, K., et al. (2020). Therapeutic potential of mesenchymal stem cells and their exosomes in severe novel coronavirus disease 2019 (COVID-19) cases. Inflamm Regen 40, 14. Vajda, S., Yueh, C., Beglov, D., Bohnuud, T., Mottarella, S.E., Xia, B., Hall, D.R., and Kozakov, D. (2017). New additions to the ClusPro server motivated by CAPRI. Proteins 85, 435-444. van Zundert, G.C.P., Rodrigues, J., Trellet, M., Schmitz, C., Kastritis, P.L., Karaca, E., Melquiond, A.S.J., van Dijk, M., de Vries, S.J., and Bonvin, A. (2016). The HADDOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes. J Mol Biol 428, 720-725. Walls, A.C., Park, Y.J., Tortorici, M.A., Wall, A., McGuire, A.T., and Veesler, D. (2020). Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell 181, 281-292.e286. Walls, A.C., Tortorici, M.A., Bosch, B.J., Frenz, B., Rottier, P.J.M., DiMaio, F., Rey, F.A., and Veesler, D. (2016). Cryo- electron microscopy structure of a coronavirus spike glycoprotein trimer. Nature 531, 114-117. Wang, R., Hozumi, Y., Yin, C., and Wei, G.W. (2020). Decoding SARS-CoV-2 Transmission and Evolution and Ramifications for COVID-19 Diagnosis, Vaccine, and Medicine. J Chem Inf Model. Warnke, P.H., Voss, E., Russo, P.A., Stephens, S., Kleine, M., Terheyden, H., and Liu, Q. (2013). Antimicrobial peptide coating of dental implants: biocompatibility assessment of recombinant human beta defensin-2 for human cells. Int J Oral Maxillofac Implants 28, 982-988. Wrapp, D., Wang, N., Corbett, K.S., Goldsmith, J.A., Hsieh, C.L., Abiona, O., Graham, B.S., and McLellan, J.S. (2020). Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260-1263. Xiong, X., Qu, K., Ciazynska, K.A., Hosmillo, M., Carter, A.P., Ebrahimi, S., Ke, Z., Scheres, S.H.W., Bergamaschi, L., Grice, G.L., et al. (2020). 
A thermostable, closed, SARS-CoV-2 spike protein trimer. bioRxiv, 2020.2006.2015.152835. Yan, R., Zhang, Y., Li, Y., Xia, L., Guo, Y., and Zhou, Q. (2020). Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science 367, 1444-1448. Yeasmin, R., Buck, M., Weinberg, A., and Zhang, L. (2018). Translocation of Human β Defensin Type 3 through a Neutrally Charged Lipid Membrane: A Free Energy Study. J Phys Chem B 122, 11883-11894. Yoshimoto, F.K. (2020). The Proteins of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS CoV-2 or n-COV19), the Cause of COVID-19. Protein J 39, 198-216. Zhang, L., Borthakur, S., and Buck, M. (2016). Dissociation of a Dynamic Protein Complex Studied by All-Atom Molecular Simulations. Biophys J 110, 877-886. Zhang, L., and Buck, M. (2017). Molecular Dynamics Simulations Reveal Isoform Specific Contact Dynamics between the Plexin Rho GTPase Binding Domain (RBD) and Small Rho GTPases Rac1 and Rnd1. J Phys Chem B 121, 1485-1498. Zhao, H., Zhou, J., Zhang, K., Chu, H., Liu, D., Poon, V.K., Chan, C.C., Leung, H.C., Fai, N., Lin, Y.P., et al. (2016). A novel peptide with potent and broad-spectrum antiviral activities against multiple respiratory viruses. Sci Rep 6, 22008.

10_1101-2021_01_07_425675 ---- Mass spectrometry-based sequencing of the anti-FLAG-M2 antibody using multiple proteases and a dual fragmentation scheme

Mass spectrometry-based sequencing of the anti-FLAG-M2 antibody using multiple proteases and a dual fragmentation scheme

Authors:
Weiwei Peng1#, Matti F.
Pronker1#, Joost Snijder1* 5 6 #equal contribution 7 *corresponding author: j.snijder@uu.nl 8 9 Affiliation: 10 1 Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research 11 and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH 12 Utrecht, The Netherlands 13 14 Keywords: 15 mass spectrometry, antibody, de novo sequencing, EThcD, stepped HCD, Herceptin, FLAG tag, 16 anti-FLAG-M2. 17 .CC-BY 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 8, 2021. ; https://doi.org/10.1101/2021.01.07.425675doi: bioRxiv preprint https://doi.org/10.1101/2021.01.07.425675 http://creativecommons.org/licenses/by/4.0/ 2 Abstract: 18 Antibody sequence information is crucial to understanding the structural basis for antigen binding 19 and enables the use of antibodies as therapeutics and research tools. Here we demonstrate a 20 method for direct de novo sequencing of monoclonal IgG from the purified antibody products. The 21 method uses a panel of multiple complementary proteases to generate suitable peptides for de 22 novo sequencing by LC-MS/MS in a bottom-up fashion. Furthermore, we apply a dual 23 fragmentation scheme, using both stepped high-energy collision dissociation (stepped HCD) and 24 electron transfer high-energy collision dissociation (EThcD) on all peptide precursors. The method 25 achieves full sequence coverage of the monoclonal antibody Herceptin, with an accuracy of 99% 26 in the variable regions. We applied the method to sequence the widely used anti-FLAG-M2 mouse 27 monoclonal antibody, which we successfully validated by remodeling a high-resolution crystal 28 structure of the Fab and demonstrating binding to a FLAG-tagged target protein in Western blot 29 analysis. 
The method thus offers robust and reliable sequences of monoclonal antibodies.

Introduction
Antibodies can bind a great molecular diversity of antigens, owing to the high degree of sequence diversity that is available through somatic recombination, hypermutation, and heavy-light chain pairings 1-2. Sequence information on antibodies is therefore crucial to understanding the structural basis of antigen binding and how somatic hypermutation governs affinity maturation, and, by mapping out the antibody repertoire, to an overall understanding of the adaptive immune response in health and disease. Moreover, antibodies have become invaluable research tools in the life sciences and are ever more widely developed as therapeutic agents 3-4. In this context, sequence information is crucial for the use, production and validation of these important research tools and biopharmaceutical agents 5-6.

Antibody sequences are typically obtained through cloning and sequencing of the coding mRNAs of the paired heavy and light chains 7-9. The sequencing workflows thereby rely on isolation of the antibody-producing cells from peripheral blood monocytes, or spleen and bone marrow tissues. These antibody-producing cells are not always readily available, however, and cloning/sequencing of the paired heavy and light chains is a non-trivial task with a limited success rate 7-9. Moreover, antibodies are secreted in bodily fluids and mucus.
Antibodies are thereby in large part functionally disconnected from their producing B-cell, which raises questions on how the secreted antibody pool relates quantitatively to the underlying B-cell population and whether there are potential sampling biases in current antibody sequencing strategies.

Direct mass spectrometry (MS)-based sequencing of the secreted antibody products is a useful complementary tool that can address some of the challenges faced by conventional sequencing strategies relying on cloning/sequencing of the coding mRNAs 10-17. MS-based methods do not rely on the availability of the antibody-producing cells, but rather target the polypeptide products directly, offering the prospect of a next generation of serology, in which secreted antibody sequences might be obtained from any bodily fluid. Whereas MS-based de novo sequencing still has a long way to go towards this goal, owing to limitations in sample requirements, sequencing accuracy, read length and sequence assembly, MS has been successfully used to profile the antibody repertoire and obtain (partial) antibody sequences beyond those available from conventional sequencing strategies based on cloning/sequencing of the coding mRNAs 10-17.

Most MS-based strategies for antibody sequencing rely on a proteomics-type bottom-up LC-MS/MS workflow, in which the antibody product is digested into smaller peptides for MS analysis 14, 18-23. Available germline antibody sequences are then often used either as a template to guide assembly of de novo peptide reads (such as in PEAKS Ab) 23, or as a starting point to iteratively identify somatic mutations to arrive at the mature antibody sequence (such as in Supernovo) 21. To maximize sequence coverage and aid read assembly, these MS-based workflows typically use a combination of complementary proteases and non-specific digestion to generate overlapping peptides. The most straightforward application of these MS-based sequencing workflows is the sequencing of monoclonal antibodies from (lost) hybridoma cell lines, but the same workflows also form the basis of more advanced and challenging applications to characterize polyclonal antibody mixtures and profile the full antibody repertoire from serum.

Here we describe an efficient protocol for MS-based sequencing of monoclonal antibodies. The protocol requires approximately 200 picomol of the antibody product and sample preparation can be completed within one working day. We selected a panel of 9 proteases with complementary specificities, which are active in the same buffer conditions for parallel digestion of the antibodies. We developed a dual fragmentation strategy for MS/MS analysis of the resulting peptides to yield rich sequence information from the fragmentation spectra of the peptides. The protocol yields full and deep sequence coverage of the variable domains of both heavy and light chains as demonstrated on the monoclonal antibody Herceptin. As a test case, we used our protocol to sequence the widely used anti-FLAG-M2 mouse monoclonal antibody, for which no sequence was publicly available despite its described use in 5000+ peer-reviewed publications 24-25.
The protocol achieved full sequence coverage of the variable domains of both heavy and light chains, including all complementarity determining regions (CDRs). The obtained sequence was successfully validated by remodeling the published crystal structure of the anti-FLAG-M2 Fab and demonstrating binding of the synthetic recombinant antibody produced from the experimental sequence to a FLAG-tagged protein in Western blot analysis. The protocol developed here thus offers robust and reliable sequencing of monoclonal antibodies with prospective applications for sequencing secreted antibodies from bodily fluids.

Results
We used an in-solution digestion protocol, with sodium-deoxycholate as the denaturing agent, to generate peptides from the antibodies for LC-MS/MS analysis. Following heat denaturation and disulfide bond reduction, we used iodoacetic acid as the alkylating agent to cap free cysteines. Note that conventional alkylating agents like iodo-/chloroacetamide generate +57 Da mass differences on cysteines and primary amines, which may lead to spurious assignments as glycine residues in de novo sequencing. The +58 Da mass difference generated by alkylation with iodoacetic acid circumvents this potential pitfall.

We chose a panel of 9 proteases with activity at pH 7.5-8.5, so that the denatured, reduced and alkylated antibodies could be easily split for parallel digestion under the same buffer conditions.
These proteases (with indicated cleavage specificities) included: trypsin (C-terminal of R/K), chymotrypsin (C-terminal of F/Y/W/M/L), α-lytic protease (C-terminal of T/A/S/V), elastase (unspecific), thermolysin (unspecific), lysN (N-terminal of K), lysC (C-terminal of K), aspN (N-terminal of D/E), and gluC (C-terminal of D/E). Correct placement or assembly of peptide reads is a common challenge in de novo sequencing, which can be facilitated by sufficient overlap between the peptide reads. Missed cleavages and longer reads promote such overlap, so we opted to perform a brief 4-hour digestion. Following digestion, sodium deoxycholate (SDC) is removed by precipitation and the peptide supernatant is desalted, ready for LC-MS/MS analysis. The resulting raw data were used for automated de novo sequencing with the Supernovo software package.

As peptide fragmentation is dependent on many factors like length, charge state, composition and sequence 26, we needed a versatile fragmentation strategy to accommodate the diversity of antibody-derived peptides generated by the 9 proteases. We opted for a dual fragmentation scheme that applies both stepped high-energy collision dissociation (stepped HCD) and electron transfer high-energy collision dissociation (EThcD) on all peptide precursors 27-29. The stepped HCD fragmentation includes three collision energies to cover multiple dissociation regimes and the EThcD fragmentation works especially well for higher charge states, also adding complementary c/z ions for maximum sequence coverage.

We used the monoclonal antibody Herceptin (also known as Trastuzumab) as a benchmark to test our protocol 30-31. From the total dataset of 9 proteases, we collected 4408 peptide reads (defined as peptides with score >=500, see methods for details), of which 2866 had superior stepped HCD fragmentation and 1722 had superior EThcD fragmentation (see Table S1).
Sequence coverage was 100% in both heavy and light chains across the variable and constant domains (see Figures S1 and S2). The median depth of coverage was 148 overall and slightly higher in the light chain (see Table S1 and Figure S1-2). The median depth of coverage in the CDRs of both chains ranged from 42 to 210.

Figure 1. Mass spectrometry-based de novo sequencing of the monoclonal antibody Herceptin. The variable regions of the Heavy (A) and Light Chains (B) are shown. The MS-based sequence is shown alongside the known Herceptin sequence, with differences highlighted by asterisks (*). Exemplary MS/MS spectra supporting the assigned sequences of the Heavy and Light Chain CDRs are shown below the alignments. Peptide sequence and fragment coverage are indicated on top of the spectra, with b/c ions indicated in blue and y/z ions in red. The same coloring is used to annotate peaks in the spectra, with additional peaks such as intact/charge reduced precursors, neutral losses and immonium ions indicated in green. Note that to prevent overlapping peak labels, only a subset of successfully matched peaks is annotated.

The experimentally determined de novo sequence is shown alongside the known Herceptin sequence for the variable domains of both chains in Figure 1, with exemplary MS/MS spectra for the CDRs. We achieved an overall sequence accuracy of 99% with the automated sequencing procedure of Supernovo, with 3 incorrect assignments in the light chain.
In framework 3 of the light chain, I75 was incorrectly assigned as the isomer Leucine (L), a common MS-based sequencing error. In CDRL3 of the light chain, an additional misassignment was made for the dipeptide H91/Y92, which was incorrectly assigned as W91/N92. The dipeptides HY and WN have identical masses, and the misassignment of W91/N92 (especially W91) was poorly supported by the fragmentation spectra, in contrast to the correct H91/Y92 assignment (see c6/c7 in fragmentation spectra, Figure 1). Overall, the protocol yielded highly accurate sequences at a combined 230/233 positions of the variable domains in Herceptin.

Figure 2. Mass spectrometry-based de novo sequence of the mouse monoclonal anti-FLAG-M2 antibody. The variable regions of the Heavy (A) and Light Chains (B) are shown. The MS-based sequence is shown alongside the previously published sequence from the crystal structure of the Fab (PDB ID: 2G60), and the germline sequence (IMGT-DomainGapAlign; IGHV1-04/IGHJ2; IGKV1-117/IGKJ1). Differential residues are highlighted by asterisks (*). Exemplary MS/MS spectra in support of the assigned sequences are shown below the alignments. Peptide sequence and fragment coverage are indicated on top of the spectra, with b/c ions indicated in blue, y/z ions in red. The same coloring is used to annotate peaks in the spectra, with additional peaks such as intact/charge reduced precursors, neutral losses and immonium ions indicated in green.
Note that to prevent overlapping peak labels, only a subset of successfully matched peaks is annotated.

We next applied our sequencing protocol to the mouse monoclonal anti-FLAG-M2 antibody as a test case 24. Despite the widespread use of anti-FLAG-M2 to detect and purify FLAG-tagged proteins 32, the only publicly available sequences can be found in the crystal structure of the Fab 33. The modelled sequence of the original crystal structure had to be inferred from germline sequences that could match the experimental electron density and also includes many placeholder Alanines at positions that could not be straightforwardly interpreted. The full anti-FLAG-M2 dataset from the 9 proteases included 3371 peptide reads (with scores >= 500); 1983 with superior stepped HCD fragmentation spectra, and 1388 with superior EThcD spectra. We achieved full sequence coverage of the variable regions of both heavy and light chains, with a median depth of coverage in the CDRs ranging from 32 to 192 (see Table S1). As for Herceptin, the depth of coverage was better in the light chain compared to the heavy chain (see Figure S1-S2). The full MS-based anti-FLAG-M2 sequences can be found in FASTA format in the supplementary information.

Figure 3. Validation of the MS-based anti-FLAG-M2 sequence.
A) The previously published crystal structure of the anti-FLAG-M2 Fab was remodeled with the experimentally determined sequence, shown in surface rendering with CDRs and differential residues highlighted in colors. B) 2Fo-Fc electron density of the new refined map contoured at 1 RMSD is shown in blue and Fo-Fc positive difference density of the original deposited map contoured at 1.7 RMSD in green around the CDR loops of the heavy and light chains. Differential residues between the published crystal structure and the model based on our antibody sequencing are indicated in purple. C) Western blot validation of the synthetic recombinant anti-FLAG-M2 antibody produced with the experimentally determined sequence demonstrates equivalent FLAG-tag binding compared to commercial anti-FLAG-M2 (see also Figure S3).

The MS-based sequences of anti-FLAG-M2 are shown alongside the crystal structure sequences and the inferred germline precursors with exemplary MS/MS spectra for the CDRs in Figure 2. The experimentally determined sequence reveals that anti-FLAG-M2 is a mouse IgG1, with an IGHV1-04/IGHJ2 heavy chain and IGKV1-117/IGKJ1 kappa light chain. The experimentally determined sequence differs from the Fab crystal structure sequence at 34 and 9 positions in the heavy and light chain, respectively. To validate the experimentally determined sequences, we remodeled the crystal structure using the MS-based heavy and light chains, resulting in much improved model statistics (see Figure 3 and Table S2).
The experimental electron densities show excellent support of the MS-based sequence (as shown for the CDRs in Figure 3B). A notable exception is L51 in CDRH2 of the heavy chain. The MS-based sequence was assigned as Leucine, but the experimental electron density supports assignment of the isomer Isoleucine instead (see Figure S3). In contrast to the original model, our new MS-based model reveals a predominantly positively charged paratope (see Figure S4), which potentially complements the -3 net charge of the FLAG tag epitope (DYKDDDDK) to mediate binding. The experimentally determined anti-FLAG-M2 sequence, with the L51I correction, was further validated by testing binding of the synthetic recombinant antibody to a purified FLAG-tagged protein in Western blot analysis (see Figure 3C and S5). The synthetic recombinant antibody showed equivalent binding compared to the original antibody sample used for sequencing, confirming that the experimentally determined sequence is reliable to obtain the recombinant antibody product with the desired functional profile.

Discussion
There are four other monoclonal antibody sequences against the FLAG tag publicly available through the ABCD (AntiBodies Chemically Defined) database 34-36. Comparison of the CDRs of anti-FLAG-M2 with these additional four monoclonal antibodies reveals a few common motifs that may determine FLAG-tag binding specificity (see Table S3). In the heavy chain, the only common motif between all five monoclonals is that the first three residues of CDRH1 follow a GXS sequence. In addition, the last three residues of CDRH3 of anti-FLAG-M2 are YDY, similar to MDY in H28, and YDF in EEh13.6 (and EEh14.3 also ends CDRH3 with an aromatic F residue). In contrast to the heavy chain, the CDRs of the light chain are almost completely conserved in 4/5 monoclonals with only minimal differences compared to germline.
The anti-FLAG-M2 and H28 monoclonals were specifically raised in mice against the FLAG-tag epitope 24, 35, whereas the computationally designed EEh13.6 and EEh14.3 monoclonals contain the same light chain from an EE-dipeptide tag directed antibody 34. This suggests that the IGKV1-117/IGKJ1 light chain may be a common determinant of binding to a small negatively charged peptide epitope like the FLAG-tag and is readily available as a hardcoded germline sequence in the mouse antibody repertoire. The availability of the anti-FLAG-M2 sequences may contribute to the wider use of this important research tool, as well as the development and engineering of better FLAG-tag directed antibodies.

This example illustrates that our MS-based sequencing protocol yields robust and reliable monoclonal antibody sequences. The protocol described here also formed the basis of a recent application where we sequenced an antibody directly from patient-derived serum, in combination with top-down fragmentation of the isolated Fab fragment 37. The dual fragmentation strategy yields high-quality spectra suitable for de novo sequencing and may further contribute to the exciting prospect of a new era of serology in which antibody sequences can be directly obtained from bodily fluids.

Methods

Sample preparation
Anti-FLAG M2 antibody was purchased from Sigma (catalogue number F1804). Herceptin was provided by Roche (Penzberg, Germany).
27 μg of each sample was denatured in 2% sodium deoxycholate (SDC), 200 mM Tris-HCl, 10 mM tris(2-carboxyethyl)phosphine (TCEP), pH 8.0 at 95°C for 10 min, followed by a 30 min incubation at 37°C for reduction. The sample was then alkylated by adding iodoacetic acid to a final concentration of 40 mM and incubated in the dark at room temperature for 45 min. For each digest, 3 μg of sample was then digested with one of the following proteases: trypsin, chymotrypsin, lysN, lysC, gluC, aspN, aLP (α-lytic protease), thermolysin and elastase, in a 1:50 ratio (w:w) in a total volume of 100 µL of 50 mM ammonium bicarbonate at 37°C for 4 h. After digestion, SDC was removed by adding 2 µL formic acid (FA) and centrifugation at 14000 g for 20 min. Following centrifugation, the supernatant containing the peptides was collected for desalting on a 30 µm Oasis HLB 96-well plate (Waters). The Oasis HLB sorbent was activated with 100% acetonitrile and subsequently equilibrated with 10% formic acid in water. Next, peptides were bound to the sorbent, washed twice with 10% formic acid in water and eluted with 100 µL of 50% acetonitrile/5% formic acid in water (v/v). The eluted peptides were vacuum-dried and reconstituted in 100 µL 2% FA.

Mass Spectrometry
The digested peptides (single injection of 0.2 µg) were separated by online reversed phase chromatography on an Agilent 1290 UHPLC (column packed with Poroshell 120 EC C18; dimensions 50 cm x 75 µm, 2.7 µm, Agilent Technologies) coupled to a Thermo Scientific Orbitrap Fusion mass spectrometer. Samples were eluted over a 90 min gradient from 0% to 35% acetonitrile at a flow rate of 0.3 μL/min. Peptides were analyzed with a resolution setting of 60000 in MS1. MS1 scans were obtained with standard AGC target, maximum injection time of 50 ms, and scan range 350-2000. The precursors were selected with a 3 m/z window and fragmented by stepped HCD as well as EThcD. The stepped HCD fragmentation included steps of 25%, 35% and 50% NCE. EThcD fragmentation was performed with calibrated charge-dependent ETD parameters and 27% NCE supplemental activation. For both fragmentation types, MS2 scans were acquired at 30000 resolution, 800% Normalized AGC target, 250 ms maximum injection time, scan range 120-3500.

MS Data Analysis
Automated de novo sequencing was performed with Supernovo (version 3.10, Protein Metrics Inc.). Custom parameters were used as follows: non-specific digestion; precursor and product mass tolerances were set to 12 ppm and 0.02 Da, respectively; carboxymethylation (+58.005479) on cysteine was set as fixed modification; oxidation on methionine and tryptophan was set as variable common 1 modification; carboxymethylation on the N-terminus, pyroglutamic acid conversion of glutamine and glutamic acid on the N-terminus, and deamidation on asparagine/glutamine were set as variable rare 1 modifications. Peptides were filtered for score >=500 for the final evaluation of spectrum quality and (depth of) coverage.
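The mass values behind these search settings, and behind the isobaric misassignments discussed in the Results, can be checked with simple arithmetic. The following is a minimal sketch for illustration only and is not part of the Supernovo workflow: the function and variable names are hypothetical, the atomic masses are standard monoisotopic values, and only the residues needed for the comparison are listed.

```python
# Monoisotopic atomic masses in Da (standard values).
ATOM = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Elemental compositions of amino acid residues (amino acid minus one water).
RESIDUE = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "H": {"C": 6, "H": 7, "N": 3, "O": 1},
    "Y": {"C": 9, "H": 9, "N": 1, "O": 2},
    "W": {"C": 11, "H": 10, "N": 2, "O": 1},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
}

# Net elemental change on cysteine for the two alkylation chemistries in the text.
ALKYLATION = {
    "carbamidomethyl": {"C": 2, "H": 3, "N": 1, "O": 1},  # iodo-/chloroacetamide
    "carboxymethyl": {"C": 2, "H": 2, "N": 0, "O": 2},    # iodoacetic acid
}


def composition_mass(composition):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(ATOM[element] * count for element, count in composition.items())


def residue_sum(sequence):
    """Summed residue mass of a peptide fragment (no terminal groups added)."""
    return sum(composition_mass(RESIDUE[aa]) for aa in sequence)


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, ppm=12.0):
    """True if the observed mass lies within the given ppm window."""
    return abs(ppm_error(observed, theoretical)) <= ppm


if __name__ == "__main__":
    for name, comp in ALKYLATION.items():
        print(f"{name}: {composition_mass(comp):.6f} Da")
    print(f"glycine residue: {composition_mass(RESIDUE['G']):.6f} Da")
    print(f"HY: {residue_sum('HY'):.6f} Da, WN: {residue_sum('WN'):.6f} Da")
```

Carbamidomethylation and a glycine residue share the composition C2H3NO, so no mass tolerance can separate them, whereas carboxymethylation differs by roughly 0.98 Da, far outside a 12 ppm window for typical peptide masses. The dipeptides HY and WN likewise share the composition C15H16N4O3, as do the isomers I and L, so precursor mass alone cannot discriminate them; only fragment ions that split the dipeptide (or complementary evidence such as electron density) can.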
Supernovo generates peptide groups for redundant MS/MS spectra, including when stepped HCD and EThcD fragmentation on the same precursor both generate good peptide-spectrum matches. In these cases only the best-matched spectrum is counted as representative for that group. This criterion was used in counting the number of peptide reads reported in Table S1. Germline sequences and CDR boundaries were inferred using IMGT/DomainGapAlign 38-39.

Revision of the anti-FLAG-M2 Fab crystal structure model
As a starting point for model building, the reflection file and coordinates of the published anti-FLAG-M2 Fab crystal structure were used (PDB ID: 2G60) 33. Care was taken to use the original Rfree labels of the deposited reflection file for refinement, so as not to introduce extra model bias. Differential residues between this structure and our mass spectrometry-derived anti-FLAG sequence were manually mutated and fitted in the density using Coot 40. Many spurious water molecules that caused severe steric clashes in the original model were also manually removed in Coot. Densities for two sulfate and one chloride ion were identified and built into the model. The original crystallization solution contained 0.1 M ammonium sulfate. Iterative cycles of model geometry optimization in real space in Coot and reciprocal space refinement by Phenix were used to generate the final model, which was validated with Molprobity 41-42.
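Listing the differential residues between two sequences, as done for the remodeling above and for the asterisk tracks in Figures 1 and 2, amounts to a position-by-position comparison of pre-aligned sequences. The sketch below is one possible way to do this and is not the authors' procedure; the function names are hypothetical, and the example strings are toy fragments chosen only to mirror the HY/WN swap described in the Results.

```python
def differential_positions(reference, query, treat_il_as_equal=False):
    """Return (position, reference_residue, query_residue) for each mismatch.

    Both sequences must already be aligned to equal length; positions are 1-based.
    With treat_il_as_equal=True, I/L swaps are ignored because the two residues
    are isomers that differ in neither composition nor mass.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned to equal length")
    differences = []
    for position, (ref_aa, query_aa) in enumerate(zip(reference, query), start=1):
        if ref_aa == query_aa:
            continue
        if treat_il_as_equal and {ref_aa, query_aa} == {"I", "L"}:
            continue
        differences.append((position, ref_aa, query_aa))
    return differences


def asterisk_track(reference, query):
    """Build a marker line with '*' under every differing position."""
    return "".join("*" if a != b else " " for a, b in zip(reference, query))


def accuracy(reference, query):
    """Fraction of aligned positions at which the two sequences agree."""
    return 1 - len(differential_positions(reference, query)) / len(reference)


if __name__ == "__main__":
    known = "QQHYTTPPT"      # toy reference fragment (illustrative only)
    de_novo = "QQWNTTPPT"    # toy de novo read with an isobaric HY -> WN swap
    print(known)
    print(de_novo)
    print(asterisk_track(known, de_novo))
    print(differential_positions(known, de_novo))
```

For scale, the 230 correct assignments out of 233 variable-domain positions reported for Herceptin correspond to 98.7%, which the text rounds to 99%.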

Cloning and expression of synthetic recombinant anti-FLAG-M2
To recombinantly express full-length anti-FLAG-M2, the proteomic sequences of both the light and heavy chains were reverse-translated and codon optimized for expression in human cells using the Integrated DNA Technologies (IDT) web tool (http://www.idtdna.com/CodonOpt) 43. For the linker and Fc region of the heavy chain, the standard mouse Ig gamma-1 (IGHG1) amino acid sequence (Uniprot P01868.1) was used. An N-terminal secretion signal peptide derived from human IgG light chain (MEAPAQLLFLLLLWLPDTTG) was added to the N-termini of both heavy and light chains. BamHI and NotI restriction sites were added to the 5’ and 3’ ends of the coding regions, respectively. Only for the light chain, a double stop codon was introduced at the 3’ site before the NotI restriction site. The coding regions were subcloned using BamHI and NotI restriction-ligation into a pRK5 expression vector with a C-terminal octahistidine tag between the NotI site and a double stop codon 3’ of the insert, so that only the heavy chain has a C-terminal AAAHHHHHHHH sequence for Nickel-affinity purification (the triple alanine resulting from the NotI site). The L51I correction in the heavy chain was introduced later (after observing it in the crystal structure) by IVA cloning 44. Expression plasmids for the heavy and light chain were mixed in a 1:1 (w/w) ratio for transient transfection in HEK293 cells with polyethylenimine, following standard procedures. Medium was collected 6 days after transfection and cells were spun down by 10 minutes of centrifugation at 1000 g.
Antibody was directly purified from the supernatant using Ni-sepharose excel resin (Cytiva Life Sciences), washing with 500 mM NaCl, 2 mM CaCl2, 15 mM imidazole, 20 mM HEPES pH 7.8 and eluting with 500 mM NaCl, 2 mM CaCl2, 200 mM imidazole, 20 mM HEPES pH 7.8.

Western blot validation of anti-FLAG-M2 binding
To test binding of our recombinant anti-FLAG-M2 to the FLAG-tag epitope, compared to the commercially available anti-FLAG-M2 (Sigma), we used both antibodies to probe Western blots of a FLAG-tagged protein in parallel. Purified Rabies virus glycoprotein ectodomain (SAD B19 strain, UNIPROT residues 20-450) with or without a C-terminal FLAG-tag followed by a foldon trimerization domain and an octahistidine tag was heated to 95 °C in XT sample buffer (Biorad) for 5 minutes. Samples were run twice on a Criterion XT 4-12% polyacrylamide gel (Biorad) in MES XT buffer (Biorad) before Western blot transfer to a nitrocellulose membrane in tris-glycine buffer (Biorad) with 20% methanol. The membrane was blocked with 5% (w/v) dry non-fat milk in phosphate-buffered saline (PBS) overnight at 4 °C. The membrane was cut in two (one half for the commercial and one half for the recombinant anti-FLAG-M2) and each half was probed with either commercial (Sigma) or recombinant anti-FLAG-M2 at 1 µg/mL in PBS for 45 minutes.
After washing three times with PBST (PBS with 0.1% v/v Tween20), a polyclonal goat anti-mouse antibody fused to horseradish peroxidase (HRP) was used to detect binding of anti-FLAG-M2 to the FLAG-tagged protein for both membranes. The membranes were washed three more times with PBST before applying enhanced chemiluminescence (ECL; Pierce) reagent to image the blots in parallel.

Data Availability
The raw LC-MS/MS data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD023419. The coordinates and reflection file with phases for the remodeled crystal structure of the anti-FLAG-M2 Fab have been deposited in the Protein Data Bank under accession code 7BG1.

Acknowledgements
Herceptin was a kind gift from Roche (Penzberg, Germany). We would like to acknowledge support by Protein Metrics Inc. through access to Supernovo software and helpful discussion on de novo antibody sequencing. We would like to thank everyone in the Biomolecular Mass Spectrometry and Proteomics group at Utrecht University for support and helpful discussions. This research was funded by the Dutch Research Council NWO Gravitation 2013 BOO, Institute for Chemical Immunology (ICI; 024.002.009).

Author Contributions
WP and JS conceived of the project. WP carried out the MS experiments. WP and JS analyzed the MS data. MFP remodeled the crystal structure.
MFP cloned and produced the synthetic recombinant antibody and carried out Western blotting. JS supervised the project. JS wrote the first draft and all authors contributed to preparing the final version of the manuscript.

Competing Interests
The authors declare no competing interests.

References
1. Tonegawa, S., Somatic generation of antibody diversity. Nature 1983, 302 (5909), 575-581.
2. Watson, C. T.; Glanville, J.; Marasco, W. A., The individual and population genetics of antibody immunity. Trends in immunology 2017, 38 (7), 459-470.
3. Carter, P. J.; Lazar, G. A., Next generation antibody drugs: pursuit of the 'high-hanging fruit'. Nature Reviews Drug Discovery 2018, 17 (3), 197.
4. Grilo, A. L.; Mantalaris, A., The increasingly human and profitable monoclonal antibody market. Trends in biotechnology 2019, 37 (1), 9-16.
5. Baker, M., Blame it on the antibodies. Nature 2015, 521 (7552), 274.
6. Uhlen, M.; Bandrowski, A.; Carr, S.; Edwards, A.; Ellenberg, J.; Lundberg, E.; Rimm, D. L.; Rodriguez, H.; Hiltke, T.; Snyder, M., A proposal for validation of antibodies. Nature methods 2016, 13 (10), 823-827.
7. Fischer, N. In Sequencing antibody repertoires: the next generation, MAbs, Taylor & Francis: 2011; pp 17-20.
8. Georgiou, G.; Ippolito, G. C.; Beausang, J.; Busse, C. E.; Wardemann, H.; Quake, S. R., The promise and challenge of high-throughput sequencing of the antibody repertoire. Nature biotechnology 2014, 32 (2), 158-168.
9. Robinson, W. H., Sequencing the functional antibody repertoire—diagnostic and therapeutic discovery. Nature Reviews Rheumatology 2015, 11 (3), 171.
10. Boutz, D. R.; Horton, A. P.; Wine, Y.; Lavinder, J. J.; Georgiou, G.; Marcotte, E. M., Proteomic identification of monoclonal antibodies from serum. Analytical chemistry 2014, 86 (10), 4758-4766.
11. Castellana, N. E.; McCutcheon, K.; Pham, V. C.; Harden, K.; Nguyen, A.; Young, J.; Adams, C.; Schroeder, K.; Arnott, D.; Bafna, V., Resurrection of a clinical antibody: Template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin-α antibody. Proteomics 2011, 11 (3), 395-405.
12. Chen, J.; Zheng, Q.; Hammers, C. M.; Ellebrecht, C. T.; Mukherjee, E. M.; Tang, H.-Y.; Lin, C.; Yuan, H.; Pan, M.; Langenhan, J., Proteomic analysis of pemphigus autoantibodies indicates a larger, more diverse, and more dynamic repertoire than determined by B cell genetics. Cell reports 2017, 18 (1), 237-247.
13. Cheung, W. C.; Beausoleil, S. A.; Zhang, X.; Sato, S.; Schieferl, S. M.; Wieler, J. S.; Beaudet, J. G.; Ramenani, R. K.; Popova, L.; Comb, M. J., A proteomics approach for the identification and cloning of monoclonal antibodies from serum. Nature biotechnology 2012, 30 (5), 447-452.
14. Guthals, A.; Gan, Y.; Murray, L.; Chen, Y.; Stinson, J.; Nakamura, G.; Lill, J. R.; Sandoval, W.; Bandeira, N., De novo MS/MS sequencing of native human antibodies. Journal of proteome research 2017, 16 (1), 45-54.
15. Lee, J.; Boutz, D. R.; Chromikova, V.; Joyce, M. G.; Vollmers, C.; Leung, K.; Horton, A. P.; DeKosky, B. J.; Lee, C.-H.; Lavinder, J. J., Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination. Nature medicine 2016, 22 (12), 1456-1464.
16. Lee, J.; Paparoditis, P.; Horton, A. P.; Frühwirth, A.; McDaniel, J. R.; Jung, J.; Boutz, D. R.; Hussein, D. A.; Tanno, Y.; Pappas, L., Persistent antibody clonotypes dominate the serum response to influenza over multiple years and repeated vaccinations. Cell host & microbe 2019, 25 (3), 367-376.e5.
17. Lindesmith, L. C.; McDaniel, J. R.; Changela, A.; Verardi, R.; Kerr, S. A.; Costantini, V.; Brewer-Jensen, P. D.; Mallory, M. L.; Voss, W. N.; Boutz, D. R., Sera antibody repertoire analyses reveal mechanisms of broad and pandemic strain neutralizing responses after human norovirus vaccination. Immunity 2019, 50 (6), 1530-1541.e8.
18. Bandeira, N.; Pham, V.; Pevzner, P.; Arnott, D.; Lill, J. R., Automated de novo protein sequencing of monoclonal antibodies. Nature biotechnology 2008, 26 (12), 1336-1338.
19. Rickert, K. W.; Grinberg, L.; Woods, R. M.; Wilson, S.; Bowen, M. A.; Baca, M. In Combining phage display with de novo protein sequencing for reverse engineering of monoclonal antibodies, MAbs, Taylor & Francis: 2016; pp 501-512.
20. Savidor, A.; Barzilay, R.; Elinger, D.; Yarden, Y.; Lindzen, M.; Gabashvili, A.; Tal, O. A.; Levin, Y., Database-Independent Protein Sequencing (DiPS) Enables Full-Length de Novo Protein and Antibody Sequence Determination. Molecular & Cellular Proteomics 2017, 16 (6), 1151-1161.
21. Sen, K. I.; Tang, W. H.; Nayak, S.; Kil, Y. J.; Bern, M.; Ozoglu, B.; Ueberheide, B.; Davis, D.; Becker, C., Automated antibody de novo sequencing and its utility in biopharmaceutical discovery. Journal of The American Society for Mass Spectrometry 2017, 28 (5), 803-810.
22. Sousa, E.; Olland, S.; Shih, H. H.; Marquette, K.; Martone, R.; Lu, Z.; Paulsen, J.; Gill, D.; He, T., Primary sequence determination of a monoclonal antibody against α-synuclein using a novel mass spectrometry-based approach. International Journal of Mass Spectrometry 2012, 312, 61-69.
23. Tran, N. H.; Rahman, M. Z.; He, L.; Xin, L.; Shan, B.; Li, M., Complete de novo assembly of monoclonal antibody sequences. Scientific reports 2016, 6 (1), 1-10.
24. Brizzard, B. L.; Chubet, R. G.; Vizard, D., Immunoaffinity purification of FLAG epitope-tagged bacterial alkaline phosphatase using a novel monoclonal antibody and peptide elution. Biotechniques 1994, 16 (4), 730-735.
25. Sigma-Aldrich anti-FLAG-M2 F1804 product page. https://www.sigmaaldrich.com/catalog/product/sigma/f1804?lang=en&region=NL (accessed 05-01-2021).
26. Paizs, B.; Suhai, S., Fragmentation pathways of protonated peptides. Mass spectrometry reviews 2005, 24 (4), 508-548.
27. Diedrich, J. K.; Pinto, A. F.; Yates III, J. R., Energy dependence of HCD on peptide fragmentation: stepped collisional energy finds the sweet spot. Journal of the American Society for Mass Spectrometry 2013, 24 (11), 1690-1699.
28. Frese, C. K.; Altelaar, A. M.; van den Toorn, H.; Nolting, D.; Griep-Raming, J.; Heck, A. J.; Mohammed, S., Toward full peptide sequence coverage by dual fragmentation combining electron-transfer and higher-energy collision dissociation tandem mass spectrometry. Analytical chemistry 2012, 84 (22), 9668-9673.
29. Frese, C. K.; Zhou, H.; Taus, T.; Altelaar, A. M.; Mechtler, K.; Heck, A. J.; Mohammed, S., Unambiguous phosphosite localization using electron-transfer/higher-energy collision dissociation (EThcD). Journal of proteome research 2013, 12 (3), 1520-1525.
30. Carter, P.; Presta, L.; Gorman, C. M.; Ridgway, J.; Henner, D.; Wong, W.; Rowland, A. M.; Kotts, C.; Carver, M. E.; Shepard, H. M., Humanization of an anti-p185HER2 antibody for human cancer therapy. Proceedings of the National Academy of Sciences 1992, 89 (10), 4285-4289.
31. Slamon, D. J.; Leyland-Jones, B.; Shak, S.; Fuchs, H.; Paton, V.; Bajamonde, A.; Fleming, T.; Eiermann, W.; Wolter, J.; Pegram, M., Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. New England journal of medicine 2001, 344 (11), 783-792.
32. Einhauer, A.; Jungbauer, A., The FLAG™ peptide, a versatile fusion tag for the purification of recombinant proteins. Journal of biochemical and biophysical methods 2001, 49 (1-3), 455-465.
33. Roosild, T. P.; Castronovo, S.; Choe, S., Structure of anti-FLAG M2 Fab domain and its use in the stabilization of engineered membrane proteins. Acta Crystallographica Section F: Structural Biology and Crystallization Communications 2006, 62 (9), 835-839.
34. Entzminger, K. C.; Hyun, J.-m.; Pantazes, R. J.; Patterson-Orazem, A. C.; Qerqez, A. N.; Frye, Z. P.; Hughes, R. A.; Ellington, A. D.; Lieberman, R. L.; Maranas, C. D., De novo design of antibody complementarity determining regions binding a FLAG tetra-peptide. Scientific reports 2017, 7 (1), 1-11.
35. Ikeda, K.; Koga, T.; Sasaki, F.; Ueno, A.; Saeki, K.; Okuno, T.; Yokomizo, T., Generation and characterization of a human-mouse chimeric high-affinity antibody that detects the DYKDDDDK FLAG peptide. Biochemical and Biophysical Research Communications 2017, 486 (4), 1077-1082.
36. Lima, W. C.; Gasteiger, E.; Marcatili, P.; Duek, P.; Bairoch, A.; Cosson, P., The ABCD database: a repository for chemically defined antibodies. Nucleic acids research 2020, 48 (D1), D261-D264.
37. Bondt, A.; Hoek, M.; Tamara, S.; de Graaf, B.; Peng, W.; Schulte, D.; den Boer, M. A.; Greisch, J.-F.; Varkila, M. R.; Snijder, J., Human Plasma IgG1 Repertoires are Simple, Unique, and Dynamic. SSRN 2020.
38. Ehrenmann, F.; Kaas, Q.; Lefranc, M.-P., IMGT/3Dstructure-DB and IMGT/DomainGapAlign: a database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF. Nucleic acids research 2010, 38 (suppl_1), D301-D307.
39. Ehrenmann, F.; Lefranc, M.-P., IMGT/DomainGapAlign: IMGT standardized analysis of amino acid sequences of variable, constant, and groove domains (IG, TR, MH, IgSF, MhSF). Cold Spring Harbor Protocols 2011, 2011 (6), pdb.prot5636.
40. Emsley, P.; Cowtan, K., Coot: model-building tools for molecular graphics. Acta Crystallographica Section D: Biological Crystallography 2004, 60 (12), 2126-2132.
41. Afonine, P. V.; Grosse-Kunstleve, R. W.; Echols, N.; Headd, J. J.; Moriarty, N. W.; Mustyakimov, M.; Terwilliger, T. C.; Urzhumtsev, A.; Zwart, P. H.; Adams, P. D., Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallographica Section D: Biological Crystallography 2012, 68 (4), 352-367.
42. Chen, V. B.; Arendall, W. B.; Headd, J. J.; Keedy, D. A.; Immormino, R. M.; Kapral, G. J.; Murray, L. W.; Richardson, J. S.; Richardson, D. C., MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallographica Section D: Biological Crystallography 2010, 66 (1), 12-21.
43. Fuglsang, A., Codon optimizer: a freeware tool for codon optimization. Protein expression and purification 2003, 31 (2), 247-249.
44. García-Nafría, J.; Watson, J. F.; Greger, I. H., IVA cloning: a single-tube universal cloning system exploiting bacterial in vivo assembly. Scientific reports 2016, 6, 27459.

10_1101-2021_01_07_425723 ----

Engineering the thermotolerant industrial yeast Kluyveromyces marxianus for anaerobic growth

Wijbrand J. C. Dekker, Raúl A. Ortiz-Merino, Astrid Kaljouw, Julius Battjes, Frank W. Wiering, Christiaan Mooiman, Pilar de la Torre, and Jack T. Pronk*

Department of Biotechnology, Delft University of Technology, van der Maasweg 9, 2629 HZ Delft, The Netherlands

*Corresponding author: Department of Biotechnology, Delft University of Technology, van der Maasweg 9, 2629 HZ Delft, The Netherlands, E-mail: j.t.pronk@tudelft.nl, Tel: +31 15 2783214.

Wijbrand J.C. Dekker w.j.c.dekker@tudelft.nl
Raúl A. Ortiz-Merino raul.ortiz@tudelft.nl https://orcid.org/0000-0003-4186-8941
Astrid Kaljouw astridk20@gmail.com
Julius Battjes juliusbattjes@hotmail.com
Frank Willem Wiering frank.wiering@gmail.com
Christiaan Mooiman c.mooiman@tudelft.nl
Pilar de la Torre pilartocortes@gmail.com
Jack T. Pronk j.t.pronk@tudelft.nl https://orcid.org/0000-0002-5617-4611

Manuscript for submission in Nature Biotechnology, section: Article.

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.07.425723; this version posted January 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract
Current large-scale, anaerobic industrial processes for ethanol production from renewable carbohydrates predominantly rely on the mesophilic yeast Saccharomyces cerevisiae. Use of thermotolerant, facultatively fermentative yeasts such as Kluyveromyces marxianus could confer significant economic benefits. However, in contrast to S. cerevisiae, these yeasts cannot grow in the absence of oxygen. Responses of K. marxianus and S. cerevisiae to different oxygen-limitation regimes were analyzed in chemostats. Genome and transcriptome analysis, physiological responses to sterol supplementation and sterol-uptake measurements identified absence of a functional sterol-uptake mechanism as a key factor underlying the oxygen requirement of K. marxianus. Heterologous expression of a squalene-tetrahymanol cyclase enabled oxygen-independent synthesis of the sterol surrogate tetrahymanol in K. marxianus. After a brief adaptation under oxygen-limited conditions, tetrahymanol-expressing K. marxianus strains grew anaerobically on glucose at temperatures of up to 45 °C. These results open up new directions in the development of thermotolerant yeast strains for anaerobic industrial applications.

Keywords: Ergosterol, tetrahymanol, anaerobic metabolism, thermotolerance, ethanol production, yeast biotechnology, metabolic engineering
In terms of product volume (87 Mton y⁻¹) [1,2], anaerobic conversion of carbohydrates into ethanol by the yeast Saccharomyces cerevisiae is the single largest process in industrial biotechnology. For fermentation products such as ethanol, anaerobic process conditions are required to maximize product yields and to minimize both cooling costs and complexity of bioreactors [3]. While S. cerevisiae is applied in many large-scale processes and is readily accessible to modern genome-editing techniques [4,5], several non-Saccharomyces yeasts have traits that are attractive for industrial application. In particular, the high maximum growth temperature of thermotolerant yeasts, such as Kluyveromyces marxianus (up to 50 °C as opposed to 39 °C for S. cerevisiae), could enable lower cooling costs [6–8]. Moreover, it could reduce the required dosage of fungal polysaccharide hydrolases during simultaneous saccharification and fermentation (SSF) processes [9,10]. However, as-yet unidentified oxygen requirements hamper implementation of K. marxianus in large-scale anaerobic processes [11–13].

In S. cerevisiae, fast anaerobic growth on synthetic media requires supplementation with a source of unsaturated fatty acids (UFA), sterols, as well as several vitamins [14–17]. These nutritional requirements reflect well-characterized, oxygen-dependent biosynthetic reactions. UFA synthesis involves the oxygen-dependent acyl-CoA desaturase Ole1, NAD⁺ synthesis depends on the oxygenases Bna2, Bna4, and Bna1, while synthesis of ergosterol, the main yeast sterol, even requires 12 moles of oxygen per mole.
Oxygen-dependent reactions in NAD⁺ synthesis can be bypassed by nutritional supplementation of nicotinic acid, which is a standard ingredient of synthetic media for cultivation of S. cerevisiae [17,18]. Ergosterol and the UFA source Tween 80 (polyethoxylated sorbitan oleate) are routinely included in media for anaerobic cultivation as ‘anaerobic growth factors’ (AGF) [15,17,19]. Under anaerobic conditions, S. cerevisiae imports exogenous sterols via the ABC transporters Aus1 and Pdr11 [20]. Mechanisms for uptake and hydrolysis of Tween 80 by S. cerevisiae are unknown but, after its release, oleate is activated by the acyl-CoA synthetases Faa1 and Faa4 [21,22].

Outside the whole-genome duplicated (WGD) clade of Saccharomycotina yeasts, only few yeasts (including Candida albicans and Brettanomyces bruxellensis) are capable of anaerobic growth in synthetic media supplemented with vitamins, ergosterol and Tween 80 [12,13,23,24]. However, most currently known yeast species readily ferment glucose to ethanol and carbon dioxide when exposed to oxygen-limited growth conditions [13,25,26], indicating that they do not depend on respiration for energy conservation. The inability of the large majority of facultatively fermentative yeast species to grow under strictly anaerobic conditions is therefore commonly attributed to incompletely understood oxygen requirements for biosynthetic processes [11].
Several oxygen-requiring processes have been proposed, including involvement of a respiration-coupled dihydroorotate dehydrogenase in pyrimidine biosynthesis, limitations in uptake and/or metabolism of anaerobic growth factors, and redox-cofactor balancing constraints [11,13,27].

Quantitation, identification and elimination of oxygen requirements in non-Saccharomyces yeasts is hampered by the very small amounts of oxygen required for non-dissimilatory purposes. For example, preventing entry of the small amounts of oxygen required for sterol and UFA synthesis in laboratory-scale bioreactor cultures of S. cerevisiae requires extreme measures, such as sparging with ultra-pure nitrogen gas and use of tubing and seals that are resistant to oxygen diffusion [25,28]. This technical challenge contributes to conflicting reports on the ability of non-Saccharomyces yeasts to grow anaerobically, as exemplified by studies on the thermotolerant yeast K. marxianus [29–31]. Paradoxically, the same small oxygen requirements can represent a real challenge in large-scale bioreactors, in which oxygen availability is limited by low surface-to-volume ratios and vigorous carbon-dioxide production.

Identification of the non-dissimilatory oxygen requirements of non-conventional yeast species is required to eliminate a key bottleneck for their application in industrial anaerobic processes and, on a fundamental level, can shed light on the roles of oxygen in eukaryotic metabolism. The goal of this study was to identify and eliminate the non-dissimilatory oxygen requirements of the facultatively fermentative, thermotolerant yeast K. marxianus. To this end, we analyzed and compared physiological and transcriptional responses of K. marxianus and S. cerevisiae to different oxygen- and anaerobic-growth-factor limitation regimes in chemostat cultures. Based on the outcome of this comparative analysis, subsequent experiments focused on characterization and engineering of sterol metabolism and yielded K. marxianus strains that grew anaerobically at 45 °C.

Results

K. marxianus and S. cerevisiae show different physiological responses to extreme oxygen limitation
To investigate oxygen requirements of K. marxianus, physiological responses of strain CBS6556 were studied in glucose-grown chemostat cultures operated at a dilution rate of 0.10 h⁻¹ and subjected to different oxygenation and AGF limitation regimes (Fig. 1a). Physiological parameters of K. marxianus in these cultures were compared to those of S. cerevisiae CEN.PK113-7D subjected to the same cultivation regimes.

In glucose-limited, aerobic chemostat cultures (supplied with 0.5 L air·min⁻¹, corresponding to 54 mmol O2 h⁻¹), the Crabtree-negative yeast K. marxianus [32] and the Crabtree-positive yeast S. cerevisiae [33] both exhibited a fully respiratory dissimilation of glucose, as evident from absence of ethanol production and a respiratory quotient (RQ) close to 1 (Table 1). Apparent biomass yields on glucose of both yeasts exceeded 0.5 g biomass (g glucose)⁻¹ and were approximately 10 % higher than previously reported due to co-consumption of ethanol, which was used as solvent for the anaerobic growth factor ergosterol [32,34].
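The apparent biomass yields and biomass-specific rates discussed above follow from standard steady-state chemostat mass balances. The sketch below is an illustration under stated assumptions, not the authors' calculation: it uses the mean biomass and residual glucose concentrations of the aerobic cultures from Table 1 together with the nominal 7.5 g·L⁻¹ glucose feed, it ignores the co-consumed ethanol, and the function names are hypothetical. Its estimates therefore approximate, but need not exactly reproduce, the reported values.

```python
GLUCOSE_MW = 180.16  # g/mol, molar mass of glucose


def apparent_biomass_yield(biomass, glucose_in, glucose_out):
    """Apparent yield (g biomass per g glucose consumed) at steady state."""
    return biomass / (glucose_in - glucose_out)


def specific_glucose_uptake(dilution_rate, biomass, glucose_in, glucose_out):
    """Biomass-specific glucose uptake in mmol per g biomass per h.

    Steady-state balance: q_s = D * (C_s,in - C_s) / C_x.
    Returned as a positive magnitude; sign conventions differ between reports.
    """
    grams_per_g_per_h = dilution_rate * (glucose_in - glucose_out) / biomass
    return grams_per_g_per_h / GLUCOSE_MW * 1000.0


# Aerobic cultures: nominal feed 7.5 g/L glucose, D = 0.10 per hour,
# mean biomass and residual glucose concentrations taken from Table 1.
aerobic = {
    "S. cerevisiae CEN.PK113-7D": {"biomass": 4.22, "residual": 0.00},
    "K. marxianus CBS6556": {"biomass": 3.79, "residual": 0.00},
}

for strain, data in aerobic.items():
    y = apparent_biomass_yield(data["biomass"], 7.5, data["residual"])
    q = specific_glucose_uptake(0.10, data["biomass"], 7.5, data["residual"])
    print(f"{strain}: yield ~ {y:.2f} g/g, q_glucose ~ {q:.2f} mmol/g/h")
```

The same two functions can be applied to the oxygen-limited cultures by substituting the 20 g·L⁻¹ feed concentration and the corresponding biomass and residual glucose values.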
At a reduced oxygen-supply rate of 0.4 mmol O2 h⁻¹, both yeasts exhibited a mixed respiro-fermentative glucose metabolism. RQ values close to 50 and biomass-specific ethanol-production rates of 11.5 ± 0.6 mmol·g⁻¹·h⁻¹ for K. marxianus and 7.5 ± 0.1 mmol·g⁻¹·h⁻¹ for S. cerevisiae (Table 1) indicated that glucose dissimilation in these cultures was predominantly fermentative. Biomass-specific rates of glycerol production, which under oxygen-limited conditions enables re-oxidation of NADH generated in biosynthetic reactions [35], were approximately 2.5-fold higher (p = 2.3·10⁻⁴) in K. marxianus than in S. cerevisiae. Glycerol production showed that the reduced oxygen-supply rate constrained mitochondrial respiration. However, low residual glucose concentrations (Table 1) indicated that sufficient oxygen was provided to meet most or all of the biosynthetic oxygen requirements of K. marxianus.

To explore growth of K. marxianus under an even more stringent oxygen limitation, we exploited previously documented challenges in achieving complete anaerobiosis in laboratory bioreactors [19,28]. Even in chemostats sparged with pure nitrogen, S. cerevisiae grew on synthetic medium lacking Tween 80 and ergosterol, albeit at an increased residual glucose concentration (Fig. 1, Table 1). In contrast, K. marxianus cultures sparged with pure N2 and supplemented with both AGFs consumed only 20 % of the glucose fed to the cultures.
These severely oxygen-limited cultures showed a residual glucose concentration of 15.9 ± 0.3 g·L⁻¹ and a low but constant biomass concentration of 0.4 ± 0.0 g·L⁻¹. This pronounced response of K. marxianus to extreme oxygen-limitation provided an experimental context for further analyzing its unknown oxygen requirements.

S. cerevisiae can import exogenous sterols under severely oxygen-limited or anaerobic conditions [20]. If the latter were also true for K. marxianus, omission of ergosterol from the growth medium of severely oxygen-limited cultures would increase biomass-specific oxygen requirements and lead to an even lower biomass concentration. In practice however, omission of ergosterol led to a small increase of the biomass concentration and a corresponding decrease of the residual glucose concentration in severely oxygen-limited chemostat cultures (Fig. 1b, Table 1). This observation suggested that, in contrast to S. cerevisiae, K. marxianus cannot replace de novo oxygen-dependent sterol synthesis by uptake of exogenous sterols.

Fig. 1 | Chemostat cultivation of S. cerevisiae CEN.PK113-7D and K. marxianus CBS6556 under different aeration and anaerobic-growth-factor (AGF) supplementation regimes. The ingoing gas flow of all cultures was 500 mL·min⁻¹, with oxygen partial pressures of 21·10⁴ ppm (O21·10⁴), 840 ppm (O840), or < 0.5 ppm (O0.5). The AGFs ergosterol (E) and/or Tween 80 (T) were added to media as indicated. a, Schematic representation of experimental set-up. Data for each cultivation regime were obtained from independent replicate chemostat cultures. b, Residual glucose concentrations and biomass-specific oxygen consumption rates (qO2) under different aeration and AGF-supplementation regimes. Data represent mean and standard deviation of independent replicate chemostat cultures. c, Distribution of consumed glucose over biomass and products in chemostat cultures of S. cerevisiae (left column) and K. marxianus (right column), normalized to a glucose uptake rate of 1.00 mol·h⁻¹. Numbers in boxes indicate averages of measured metabolite formation rates (mol·h⁻¹) and biomass production rates (g dry weight·h⁻¹) for each aeration and AGF supplementation regime.

Table 1 | Physiology of S. cerevisiae CEN.PK113-7D and K. marxianus CBS6556 in glucose-grown chemostat cultures with different aeration and anaerobic-growth-factor (AGF) supplementation regimes. Cultures were grown at pH 6.0 on synthetic medium with urea as nitrogen source and 7.5 g·L⁻¹ glucose (aerobic cultures) or 20 g·L⁻¹ glucose (oxygen-limited cultures) as carbon and energy source. Data are represented as mean ± SE of data from independent chemostat cultures for each condition. The AGFs ergosterol (E) and Tween 80 (T) were added to the media as indicated. Cultures were aerated at 500 mL·min⁻¹ with gas mixtures containing 21·10⁴ ppm O2 (O21·10⁴), 840 ppm O2 (O840) or < 0.5 ppm O2 (O0.5). Tween 80 was omitted from media used for aerobic cultivation to prevent excessive foaming.
Ethanol measurements were corrected for evaporation (Supplementary Fig. 1). Positive and negative biomass-specific conversion rates (q) represent consumption and production rates, respectively. Columns Sc 1 to Sc 5 are S. cerevisiae CEN.PK113-7D conditions 1 to 5; columns Km 1 to Km 4 are K. marxianus CBS6556 conditions 1 to 4.

| Parameter | Sc 1 | Sc 2 | Sc 3 | Sc 4 | Sc 5 | Km 1 | Km 2 | Km 3 | Km 4 |
|---|---|---|---|---|---|---|---|---|---|
| Aeration regime | O21·10^4 | O840 | O0.5 | O0.5 | O0.5 | O21·10^4 | O840 | O0.5 | O0.5 |
| AGF | E | TE | TE | T | - | E | TE | TE | T |
| Replicates | 3 | 3 | 2 | 5 | 2 | 2 | 5 | 2 | 2 |
| D (h-1) | 0.10 ± 0.00 | 0.10 ± 0.00 | 0.10 ± 0.00 | 0.10 ± 0.00 | 0.10 ± 0.00 | 0.10 ± 0.00 | 0.11 ± 0.01 | 0.12 ± 0.01 | 0.12 ± 0.01 |
| Biomass (g·L-1) | 4.22 ± 0.06 | 2.29 ± 0.04 | 1.98 ± 0.01 | 1.56 ± 0.03 | 1.12 ± 0.02 | 3.79 ± 0.02 | 1.57 ± 0.10 | 0.35 ± 0.02 | 0.50 ± 0.04 |
| Residual glucose (g·L-1) | 0.00 ± 0.00 | 0.07 ± 0.00 | 0.06 ± 0.02 | 0.23 ± 0.04 | 1.47 ± 0.01 | 0.00 ± 0.00 | 0.10 ± 0.02 | 15.92 ± 0.26 | 13.67 ± 0.16 |
| Y biomass/glucose (g·g-1) | 0.57 ± 0.01 | 0.12 ± 0.00 | 0.10 ± 0.00 | 0.08 ± 0.00 | 0.06 ± 0.00 | 0.53 ± 0.00 | 0.08 ± 0.00 | 0.09 ± 0.00 | 0.09 ± 0.01 |
| Y ethanol/glucose (g·g-1) | - | 1.67 ± 0.06 | 1.63 ± 0.02 | 1.65 ± 0.02 | 1.68 ± 0.02 | - | 1.53 ± 0.03 | 1.31 ± 0.05 | 1.40 ± 0.02 |
| qglucose (mmol·g-1·h-1) | -0.95 ± 0.03 | -4.59 ± 0.10 | -5.25 ± 0.04 | -6.77 ± 0.27 | -9.06 ± 0.15 | -1.05 ± 0.00 | -7.46 ± 0.30 | -7.30 ± 0.81 | -8.53 ± 0.00 |
| qethanol (mmol·g-1·h-1) | -0.44 ± 0.03 | 7.48 ± 0.10 | 8.40 ± 0.02 | 10.96 ± 0.56 | 15.03 ± 0.47 | -0.52 ± 0.00 | 11.49 ± 0.44 | 10.25 ± 0.66 | 12.69 ± 0.11 |
| RQ | 1.08 ± 0.02 | 52.2 ± 2.4 | - | - | - | 1.06 ± 0.01 | 49.3 ± 7.5 | - | - |
| Glycerol/biomass (mmol·(g biomass)-1) | 0.00 ± 0.00 | 3.67 ± 0.05 | 5.58 ± 0.02 | 6.73 ± 0.25 | 11.26 ± 0.40 | 0.00 ± 0.00 | 9.51 ± 0.46 | 16.90 ± 0.76 | 18.45 ± 2.09 |
| Carbon recovery (%) | 99.9 ± 0.7 | 101.2 ± 3.3 | 100.4 ± 0.1 | 100.1 ± 1.3 | 104.0 ± 0.2 | 100.5 ± 0.1 | 91.1 ± 2.0 | 101.6 ± 6.5 | 99.7 ± 3.9 |
| Degree of reduction recovery (%) | 98.4 ± 0.7 | 100.9 ± 0.8 | 100.1 ± 0.9 | 98.1 ± 0.6 | 100.1 ± 1.8 | 98.8 ± 0.1 | 94.5 ± 0.4 | 97.8 ± 6.2 | 99.1 ± 3.5 |

Transcriptional responses of K. marxianus to oxygen limitation involve ergosterol metabolism

To further investigate the non-dissimilatory oxygen requirements of K. marxianus, transcriptome analyses were performed on cultures of S. cerevisiae and K. marxianus grown under the aeration and anaerobic-growth-factor supplementation regimes discussed above. The genome sequence of K. marxianus CBS6556 was only available as a draft assembly and was not annotated36. Therefore, long-read genome sequencing, assembly and de novo genome annotation were performed, and the annotation was refined by using transcriptome assemblies (Data availability). Comparative transcriptome analysis of S. cerevisiae and K. marxianus focused on orthologous genes with divergent expression patterns, which revealed a strikingly different transcriptional response to growth limitation by oxygen and/or anaerobic-growth-factor availability (Fig. 2).

In S.
cerevisiae, import of exogenous sterols by Aus1 and Pdr11 can alleviate the impact of oxygen limitation on sterol biosynthesis20. Consistent with this role of sterol uptake, sterol biosynthetic genes in S. cerevisiae were only highly upregulated in severely oxygen-limited cultures when ergosterol was omitted from the growth medium (Fig. 3b, Supplementary Fig. 6, contrast 43). The mevalonate pathway for synthesis of the sterol precursor squalene, which does not require oxygen, was also upregulated (contrast 43), reflecting a relief of feedback regulation by ergosterol37. In contrast, K. marxianus showed a pronounced upregulation of genes involved in sterol, isoprenoid and fatty-acid metabolism (Fig. 2ab, Fig. 3, contrast 31) in severely oxygen-limited cultures supplemented with ergosterol and Tween 80. No further increase of the expression levels of sterol biosynthetic genes was observed upon omission of these anaerobic growth factors from the medium of these cultures (Supplementary Fig. 6, contrast 43). These observations suggested that K. marxianus may be unable to import ergosterol when sterol synthesis is compromised. Consistent with this hypothesis, co-orthology prediction with Proteinortho38 revealed no orthologs of the S. cerevisiae sterol transporters Aus1 and Pdr11 in K. marxianus.

K. marxianus harbors two dihydroorotate dehydrogenases, a cytosolic fumarate-dependent enzyme (KmUra1) and a mitochondrial quinone-dependent enzyme (KmUra9).
In vivo activity of the latter requires oxygen because the reduced quinone is reoxidized by the mitochondrial respiratory chain39. Consistent with these different oxygen requirements, KmURA9 was downregulated under severely oxygen-limited conditions, while KmURA1 was upregulated (Fig. 2b, contrast 31). Upregulation of KmURA1 coincided with increased production of succinate (Table 1).

Fig. 2 | Transcriptional response of K. marxianus and S. cerevisiae to oxygen limitation and to sterol and Tween 80 supplementation. Transcriptome analyses were performed for each cultivation regime (1 to 5) of S. cerevisiae CEN.PK113-7D (scer) and K. marxianus CBS6556 (kmar). Data for each regime were obtained from independent replicate chemostat cultures (Fig. 1). a, Comparison of GO-term gene-set enrichment analysis of biological processes in contrast 31 of S. cerevisiae and K. marxianus, with short descriptions of GO-terms (Supplementary Fig. 2-5). GO-terms were vertically ordered based on their distinct directionality calculated with Piano40, with GO-terms enriched solely with upregulated genes (blue) at the top, GO-terms with mixed or no directionality in the middle (white) and GO-terms with solely downregulated genes at the bottom (brown). b, c, d, Subsets of differentially expressed orthologous genes obtained from the gene-set analyses for both yeasts in contrasts 31 and 43, with genes without orthologs depicted with a logFC value of 0 in the respective yeast. b, S.
cerevisiae genes previously shown as consistently upregulated under anaerobic conditions in four different nutrient limitations41. c, As described for panel b but for downregulated genes. d, Differentially expressed genes uniquely found in this study. e, f, g, h, Highlighted gene-sets showing divergent expression patterns across the two yeasts. e, S. cerevisiae genes upregulated in contrast 31 but downregulated in K. marxianus. f, S. cerevisiae genes downregulated in contrast 31 but upregulated in K. marxianus. g, h, Similar to e and f but for contrast 43.

Fig. 3 | Different transcriptional regulation of ergosterol-biosynthesis in K. marxianus and S. cerevisiae. a, RNAseq was performed on independent replicate chemostat cultures of S. cerevisiae CEN.PK113-7D and K. marxianus CBS6556 for each aeration and anaerobic-growth-factor supplementation regime (1 to 5; Fig. 1). b, Transcriptional differences in the mevalonate- and ergosterol-pathway genes of S. cerevisiae and K. marxianus for contrasts 21 (O2 840 TE | O2 21·10^4 E), 31
(O2 0.5 TE | O2 21·10^4 E), 32 (O2 0.5 TE | O2 840 TE), 43 (O2 0.5 T | O2 0.5 TE), 54 (O2 0.5 | O2 0.5 T). Lumped biochemical reactions are represented by arrows. Colors indicate up- (blue) or down-regulation (brown), with color intensity indicating the log2 fold change and the color range capped at a maximum of 4. Reactions are annotated with the corresponding gene; K. marxianus genes are indicated with the names of the S. cerevisiae orthologs. Ergosterol uptake by S. cerevisiae requires additional factors beyond the membrane transporters Aus1 and Pdr11 42. No orthologs of the sterol transporters or Hmg2 were identified for K. marxianus, and low read counts for Erg3, Erg9 and Erg20 precluded differential gene expression analysis across all conditions (dark grey). Enzyme abbreviations: Erg10 acetyl-CoA acetyltransferase, Erg13 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) synthase, Hmg1/Hmg2 HMG-CoA reductase, Erg12 mevalonate kinase, Erg8 phosphomevalonate kinase, Mvd1 mevalonate pyrophosphate decarboxylase, Idi1 isopentenyl diphosphate:dimethylallyl diphosphate (IPP) isomerase, Erg20 farnesyl pyrophosphate synthetase, Erg9 farnesyl-diphosphate transferase (squalene synthase), Erg7 lanosterol synthase, Erg11 lanosterol 14α-demethylase, Cyb5 cytochrome b5 (electron donor for sterol C5-6 desaturation), Ncp1 NADP-cytochrome P450 reductase, Erg24 C-14 sterol reductase, Erg25 C-4 methyl sterol oxidase, Erg26 C-3 sterol dehydrogenase, Erg27 3-keto-sterol reductase, Erg28 endoplasmic reticulum membrane protein (may facilitate protein-protein interactions between Erg26 and Erg27, or tether these to the ER), Erg6 Δ24-sterol C-methyltransferase, Erg2 C-8 sterol isomerase, Erg3 C-5 sterol desaturase, Erg5 C-22 sterol desaturase, Erg4 C24/28 sterol
reductase, Aus1/Pdr11 plasma-membrane sterol transporter.

Absence of sterol import in K. marxianus

To test the hypothesis that K. marxianus lacks a functional sterol-uptake mechanism, uptake of the fluorescent sterol derivative 25-NBD-cholesterol (NBDC) was measured by flow cytometry43. Because S. cerevisiae sterol transporters are not expressed under aerobic conditions20, and to avoid interference by sterol synthesis, NBDC uptake was analysed in anaerobic cell suspensions (Fig. 4a). Four hours after NBDC addition to cell suspensions of the reference strain S. cerevisiae IMX585, median single-cell fluorescence had increased 66-fold (Fig. 4bc). In contrast, the congenic sterol-transporter-deficient strain IMK809 (aus1Δ pdr11Δ) showed only a 6-fold increase in fluorescence, probably reflecting detergent-resistant binding of NBDC to S. cerevisiae cell-wall proteins43,44. K. marxianus strains CBS6556 and NBRC1777 did not show increased fluorescence after either 4 h or 23 h of incubation with NBDC (< 2-fold, Fig. 4bc, Supplementary Fig. 7).

Fig. 4 | Uptake of the fluorescent sterol derivative NBDC by S. cerevisiae and K. marxianus strains. a, Experimental approach. S. cerevisiae strains IMX585 (reference) and IMK809 (aus1Δ pdr11Δ), and K.
marxianus strains NBRC1777 and CBS6556 were each anaerobically incubated in four replicate shake-flask cultures. NBDC and Tween 80 (NBDC T) were added to two cultures, while only Tween 80 (T) was added to the other two. After 4 h incubation, cells were stained with propidium iodide (PI) and analysed by flow cytometry. PI staining was used to eliminate cells with compromised membrane integrity from analysis of NBDC fluorescence. Cultivation conditions and flow cytometry gating are described in Methods and in Supplementary Fig. 8, Supplementary Data set 1 and 2. b, Median and pooled standard deviation of fluorescence intensity (λex 488 nm | λem 533/30 nm, FL1-A) of PI-negative cells with variance of biological replicates after 4 h exposure to Tween 80 (white bars) or Tween 80 and NBDC (blue bars). Variance was pooled for the strains IMX585, CBS6556 and NBRC1777 by repeating the experiment. c, NBDC fluorescence-intensity distribution of cells in a sample from a single culture for each strain, shown as modal-scaled density function. Dashed lines represent background fluorescence of unstained cells of S. cerevisiae and K. marxianus. Fluorescence data for 23-h incubations with NBDC are shown in Supplementary Fig. 7.

Engineering K. marxianus for oxygen-independent growth

Sterol uptake by S. cerevisiae, which requires cell wall proteins as well as a membrane transporter, has not yet been fully resolved42,43. Instead of expressing a heterologous sterol-import system in K.
marxianus, we therefore explored production of tetrahymanol, which acts as a sterol surrogate in strictly anaerobic fungi 45. Expression of a squalene-tetrahymanol cyclase from Tetrahymena thermophila (TtSTC1), which catalyzes the single-step oxygen-independent conversion of squalene into tetrahymanol (Fig. 5a), was recently shown to enable sterol-independent growth of S. cerevisiae46. TtSTC1 was expressed in K. marxianus NBRC1777, which is more genetically amenable than strain CBS6556 47. After 40 h of anaerobic incubation, the resulting strain contained 2.4 ± 0.4 mg·(g biomass)-1 tetrahymanol, 0.4 ± 0.1 mg·g-1 ergosterol and no detectable squalene, while strain NBRC1777 contained 3.5 ± 0.1 mg·g-1 squalene and 3.4 ± 0.2 mg·g-1 ergosterol (Fig. 5b). In strictly anaerobic cultures on sterol-free medium, strain NBRC1777 grew immediately after inoculation but not after transfer to a second anaerobic culture (Fig. 5c), consistent with ‘carry-over’ of ergosterol from the aerobic preculture19. The tetrahymanol-producing strain did not grow under these conditions (Fig. 5c) but showed sustained growth under severely oxygen-limited conditions that did not support growth of strain NBRC1777 (Fig. 5de). Single-cell isolates derived from these oxygen-limited cultures (IMS1111, IMS1131, IMS1132, IMS1133) showed instantaneous as well as sustained growth under strictly anaerobic conditions (Fig. 5f and 5g).
Tetrahymanol contents in the first, second and third cycle of anaerobic cultivation of isolate IMS1111 were 7.6 ± 0.0 mg·g-1, 28.0 ± 13.0 mg·g-1 and 11.5 ± 0.1 mg·g-1, respectively (Fig. 5b), while no ergosterol was detected.

To identify whether adaptation of the tetrahymanol-producing strain IMX2323 to anaerobic growth involved genetic changes, its genome and those of the four adapted isolates were sequenced (Supplementary Table 1). No copy number variations were detected in any of the four adapted isolates. Only strain IMS1111 showed two non-conservative mutations in coding regions: a single-nucleotide insertion in a transposon-borne gene and a stop codon at position 350 (of 496 bp) in KmCLN3, which encodes a G1 cyclin48. The apparent absence of mutations in the three other, independently adapted strains indicated that their ability to grow anaerobically reflected a non-genetic adaptation.

Fig. 5 | Sterol-independent anaerobic growth of K. marxianus strains expressing TtSTC1. a, Oxygen-dependent sterol synthesis and cyclisation of squalene to tetrahymanol by TtStc1. b, Squalene, ergosterol, and tetrahymanol contents with mean and standard error of the mean of (left panel) S. cerevisiae strains IMX585 (reference), IMX1438 (sga1Δ::TtSTC1), and K. marxianus strains NBRC1777 (reference), IMX2323 (TtSTC1). Lipid composition of single-cell isolate IMS1111 (TtSTC1) (right panel) over 3 serial transfers (C1-C3).
Data from replicate cultures grown in strictly anaerobic (c, f, g) or severely oxygen-limited shake-flask cultures (d, e). Aerobically grown pre-cultures were used to inoculate the first anaerobic culture on SMG-urea and Tween 80; when the optical density started to stabilize, the cultures were transferred to new media. Data depicted are of each replicate culture (points) and the mean (dotted line) from independent biological duplicate cultures; serial-transfer cultures are indicated as C1-C5. Strains NBRC1777 (wild-type, upward red triangles), IMX2323 (TtSTC1, cyan downward triangle), and the single-cell isolates IMS1111 (TtSTC1, orange circles), IMS1131 (TtSTC1, blue circles), IMS1132 (TtSTC1, yellow circles), IMS1133 (TtSTC1, purple circles). S. cerevisiae IMX585 (reference, purple circle) and IMX1438 (TtSTC1, orange circles). c, Extended data with double inoculum size are available in Supplementary Fig. 10. d, Extended data are available in Supplementary Fig. 9a.

Test of anaerobic thermotolerance and selection for fast growing anaerobes

One of the attractive phenotypes of K. marxianus for industrial application is its high thermotolerance, with reported maximum growth temperatures of 46-52 °C49,50. To test if anaerobically growing tetrahymanol-producing strains retained thermotolerance, strain IMS1111 was grown in anaerobic sequential-batch-reactor (SBR) cultures (Fig. 6) in which, after an initial growth cycle at 30 °C, the growth temperature was shifted to 42 °C.
The specific growth rate at 42 °C progressively accelerated from 0.06 h-1 to 0.13 h-1 over 17 SBR cycles (corresponding to ca. 290 generations; Fig. 6b). A subsequent temperature increase to 45 °C led to a strong decrease of the specific growth rate which, after approximately 1000 generations of selective growth, stabilized at approximately 0.08 h-1. Whole-population genome sequencing of the evolved populations revealed no common mutations or chromosomal copy number variations (Supplementary Table 1). These data show that TtSTC1-expressing K. marxianus can grow anaerobically at temperatures up to at least 45 °C.

Fig. 6 | Thermotolerance and anaerobic growth of tetrahymanol-producing K. marxianus strain. The strain IMS1111 was grown in triplicate sequential batch bioreactor cultivations in synthetic media supplemented with 20 g·L-1 glucose and 420 mg·L-1 Tween 80 at pH 5.0. a, Experimental design of sequential batch fermentation with cycles at step-wise increasing temperatures to select for faster growing mutants. Each cycle consisted of three phases: (i) (re)filling of the bioreactor with fresh media up to 100 mL and adjustment of temperature to a new set-point, (ii) anaerobic batch fermentation at a fixed culture temperature with continuous N2 sparging for monitoring of CO2 in the culture off-gas, and (iii) fast broth withdrawal leaving 7 mL (14.3-fold dilution) to inoculate the next batch.
b, Estimated maximum specific growth rate (circles) of each batch cycle for the three independent bioreactor cultivations (M3R blue, M5R orange, M6L grey) with the estimated number of generations. The growth rate was calculated from the CO2 production as measured in the off-gas; it should be interpreted as an estimate and in some cases could not be calculated. The culture temperature profile (dotted line) for each independent bioreactor cultivation (blue, grey, orange) consisted of a step-wise increment of the temperature at the onset of the fermentation phase in each batch cycle. c, Representative section of CO2 off-gas profiles of the individual bioreactor (M5R) cultivation over time with CO2 fraction (orange line) and culture temperature (grey dotted line). Data of the entire experiment are available in Supplementary Fig. 11 (Data availability).

Discussion

Industrial production of ethanol from carbohydrates relies on S. cerevisiae, due to its capacity for efficient, fast alcoholic fermentation and growth under strictly anaerobic process conditions. Many facultatively fermentative yeast species outside the Saccharomycotina WGD-clade also rapidly ferment sugars to ethanol under oxygen-limited conditions26, but cannot grow and ferment in the complete absence of oxygen11,13,25. Identifying and eliminating oxygen requirements of these yeasts is essential to unlock their industrially relevant traits for application.
Here, this challenge was addressed for the thermotolerant yeast K. marxianus, using a systematic approach based on chemostat-based quantitative physiology, genome and transcriptome analysis, sterol-uptake assays and genetic modification. S. cerevisiae, which was used as a reference in this study, shows strongly different genome-wide expression profiles under aerobic and anaerobic or oxygen-limited conditions51. Although only a small fraction of these differences were conserved in K. marxianus (Fig. 2), we were able to identify absence of a functional sterol import system as the critical cause for its inability to grow anaerobically. Enabling synthesis of the sterol surrogate tetrahymanol yielded strains that grew anaerobically at temperatures above the permissive temperature range of S. cerevisiae.

A short adaptation phase of tetrahymanol-producing K. marxianus strains under oxygen-limited conditions reproducibly enabled strictly anaerobic growth. Although this ability was retained after aerobic isolation of single-cell lines, we were unable to attribute this adaptation to mutations. In contrast to wild-type K. marxianus, a non-adapted tetrahymanol-producing strain did not show ‘carry-over growth’ after transfer from aerobic to strictly anaerobic conditions and adapted cultures showed reduced squalene contents (Fig. 5).
These observations suggest that interactions between tetrahymanol, ergosterol and/or squalene influence the onset of anaerobic growth and that oxygen-limited growth results in a stable balance between these lipids that is permissive for anaerobic growth.

Comparative genomic studies in Saccharomycotina yeasts have previously led to the hypothesis that sterol transporters are absent from pre-WGD yeast species11,52. While our observations on K. marxianus reinforce this hypothesis, which was hitherto not experimentally tested, they do not exclude involvement of additional oxygen-requiring reactions in other non-Saccharomyces yeasts. For example, pyrimidine biosynthesis is often cited as a key oxygen-requiring process in non-Saccharomyces yeasts, due to involvement of a respiratory-chain-linked dihydroorotate dehydrogenase (DHOD)53,54. K. marxianus is among a small number of yeast species that, in addition to this respiration-dependent enzyme (KmUra9), also harbor a fumarate-dependent DHOD (KmUra1)55. In K. marxianus the activation of this oxygen-independent KmUra1 is a crucial adaptation for anaerobic pyrimidine biosynthesis. The experimental approach followed in the present study should be applicable to resolve the role of pyrimidine biosynthesis and other oxygen-requiring reactions in additional yeast species.

Enabling K. marxianus to grow anaerobically represents an important step towards application of this thermotolerant yeast in large-scale anaerobic bioprocesses. However, specific growth rates and biomass yields of tetrahymanol-expressing K. marxianus in anaerobic cultures were lower than those of wild-type S. cerevisiae strains. A similar phenotype of tetrahymanol-producing S. cerevisiae was proposed to reflect an increased membrane permeability46.
Additional membrane engineering or expression of a functional sterol transport system is therefore required for further development of robust, anaerobically growing industrial strains of K. marxianus56.

Online Methods

Yeast strains, maintenance and shake-flask cultivation

Saccharomyces cerevisiae CEN.PK113-7D57,58 (MATa MAL2-8c SUC2) was obtained from Dr. Peter Kötter, J.W. Goethe University, Frankfurt. Kluyveromyces marxianus strains CBS 6556 (ATCC 26548; NCYC 2597; NRRL Y-7571) and NBRC 1777 (IFO 1777) were obtained from the Westerdijk Fungal Biodiversity Institute (Utrecht, The Netherlands) and the Biological Resource Center, NITE (NBRC) (Chiba, Japan), respectively. Stock cultures of S. cerevisiae were grown at 30 °C in an orbital shaker set at 200 rpm, in 500 mL shake flasks containing 100 mL YPD (10 g·L-1 Bacto yeast extract, 20 g·L-1 Bacto peptone, 20 g·L-1 glucose). For cultures of K. marxianus, the glucose concentration was reduced to 7.5 g·L-1. After addition of glycerol to early stationary-phase cultures, to a concentration of 30 % (v/v), 2 mL aliquots were stored at -80 °C. Shake-flask precultures for bioreactor experiments were grown in 100 mL synthetic medium (SM) with glucose as carbon source and urea as nitrogen source (SMG-urea)17,59. For anaerobic cultivation, synthetic medium was supplemented with ergosterol (10 mg·L-1) and Tween 80 (420 mg·L-1) as described previously14,17,19.
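The stock and medium preparation described above involves two small calculations: how much glycerol brings a culture to 30 % (v/v), and how much ergosterol and Tween 80 a given medium volume needs. The following Python sketch illustrates both. The function names are hypothetical, and the glycerol calculation assumes that undiluted glycerol is added and that volumes are additive, neither of which is stated in the text.

```python
# Anaerobic growth factor concentrations taken from the text (mg per litre of medium).
AGF_MG_PER_L = {"ergosterol": 10.0, "Tween 80": 420.0}


def glycerol_to_add_ml(culture_ml: float, final_fraction: float = 0.30) -> float:
    """Volume of pure glycerol (mL) that makes up `final_fraction` (v/v) of the final mix.

    Assumption (not from the source): pure glycerol is added and volumes are additive,
    so final_fraction = glycerol / (culture + glycerol).
    """
    if culture_ml <= 0:
        raise ValueError("culture_ml must be positive")
    if not 0.0 < final_fraction < 1.0:
        raise ValueError("final_fraction must lie strictly between 0 and 1")
    return culture_ml * final_fraction / (1.0 - final_fraction)


def agf_amounts_mg(medium_l: float) -> dict:
    """Mass (mg) of each anaerobic growth factor needed for `medium_l` litres of medium."""
    if medium_l <= 0:
        raise ValueError("medium_l must be positive")
    return {name: conc * medium_l for name, conc in AGF_MG_PER_L.items()}
```

Under these assumptions, `glycerol_to_add_ml(100.0)` gives roughly 43 mL of glycerol for a 100 mL culture, and `agf_amounts_mg(0.1)` gives the ergosterol and Tween 80 masses for a 100 mL preculture.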
Expression cassette and plasmid construction

Plasmids used in this study are described in Table 4. To construct plasmids pUDE659 (gRNAAUS1) and pUDE663 (gRNAPDR11), the pROS11 plasmid-backbone was PCR amplified using Phusion HF polymerase (Thermo Scientific, Waltham, MA) with the double-binding primer 6005. PCR amplifications were performed with desalted or PAGE-purified oligonucleotide primers (Sigma-Aldrich, St Louis, MO) according to the manufacturer’s instructions. To introduce the gRNA-encoding nucleotide sequences into gRNA-expression plasmids, a 2μm fragment was first amplified with primers 11228 and 11232 containing the specific sequence as primer overhang, using pROS11 as template. PCR products were purified with the GenElute PCR Clean-Up Kit (Sigma-Aldrich) or Gel DNA Recovery Kit (Zymo Research, Irvine, CA). The two DNA fragments were then assembled by Gibson Assembly (New England Biolabs, Ipswich, MA) according to the manufacturer’s instructions. Gibson assembly reactions were downscaled to 10 µL, with DNA fragments at 0.01 pmol·µL-1 in a 1:1 molar ratio, and incubated for 1 h at 50 °C. Chemically competent E. coli XL1-Blue was transformed with the Gibson assembly mix via a 5 min incubation on ice followed by a 40 s heat shock at 42 °C and 1 h recovery in non-selective LB medium. Transformants were selected on LB agar containing the appropriate antibiotic.
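The Gibson assembly amounts above are given in molar terms, whereas DNA is usually quantified by mass. The following Python sketch converts between the two. It assumes each fragment is present at 0.01 pmol·µL-1 in the 10 µL reaction and uses the common approximation of 650 g·mol-1 per base pair of double-stranded DNA. The fragment names and lengths in the usage note are hypothetical placeholders, not values from this study.

```python
# Approximate average molar mass of one base pair of double-stranded DNA (g/mol).
# This is a widely used rule of thumb, not a value taken from the source.
AVG_BP_MASS_G_PER_MOL = 650.0


def pmol_to_ng(pmol: float, length_bp: int) -> float:
    """Convert an amount of dsDNA in pmol to ng for a fragment of `length_bp` base pairs."""
    if pmol < 0 or length_bp <= 0:
        raise ValueError("pmol must be non-negative and length_bp positive")
    # 1 pmol of a fragment weighs length_bp * 650 pg, i.e. length_bp * 0.65 ng.
    return pmol * length_bp * AVG_BP_MASS_G_PER_MOL / 1000.0


def fragment_amounts(reaction_ul: float, conc_pmol_per_ul: float, lengths_bp: dict) -> dict:
    """Per-fragment amounts (pmol and ng) for an equimolar (1:1) assembly reaction."""
    pmol_each = reaction_ul * conc_pmol_per_ul
    return {
        name: {"pmol": pmol_each, "ng": pmol_to_ng(pmol_each, bp)}
        for name, bp in lengths_bp.items()
    }
```

For example, `fragment_amounts(10.0, 0.01, {"backbone": 6000, "insert": 1500})` reports 0.1 pmol of each fragment, and shows that at equal molar amounts the longer fragment requires proportionally more mass.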
Golden Gate assembly with the yeast tool kit60 was performed in 20 µL reaction mixtures containing 0.75 µL BsaI HF V2 (NEB, #R3733), 2 µL DNA ligase buffer with ATP (New England Biolabs), 0.5 µL T7 DNA ligase (NEB), 20 fmol DNA donor fragments and MilliQ water. Before ligation at 16 °C was initiated by addition of T7 DNA ligase, an initial BsaI digestion (30 min at 37 °C) was performed. Then 30 cycles of digestion and ligation at 37 °C and 16 °C, respectively, were performed, with 5 min incubation times for each reaction. Thermocycling was terminated with a 5 min final digestion step at 60 °C.

To construct a TtSTC1 expression vector, the coding sequence of TtSTC1 (pUD696) was PCR amplified with primer pair 16096/16097 and Golden Gate assembled with the donor plasmids pGGkd015 (ori ampR), pP2 (KmPDC1p) and pYTK053 (ScADH1t), resulting in pUDE909 (ori ampR KmPDC1p-TtSTC1-ScADH1t). For integration of the TtSTC1 cassette into the lac4 locus, both upstream and downstream flanks (877/878 bp) of the lac4 locus were PCR amplified with the primer pairs 14197/14198 and 14199/14200, respectively. An empty integration vector, pGGKd068, was constructed by BsaI Golden Gate cloning of pYTK047 (GFP-dropout), pYTK079 (hygB), pYTK090 (kanR), pYTK073 (ConRE’) and pYTK008 (ConLS’) together with the two lac4 homologous nucleotide sequences. Plasmid assembly was verified by PCR amplification with primers 15210, 9335, 16274 and 16275 and by digestion with BsmBI (New England Biolabs, #R0580). The integration vector pUDI246 with the TtSTC1 expression cassette was constructed by Gibson assembly of the PCR-amplified pGGKd068 and pUDE909 with primer pairs 16274/16275 and 16272/16273, thereby adding 20 bp overlaps for assembly.
For this step, the incubation time of the Gibson assembly was increased to 90 min. Plasmid assembly was verified by diagnostic PCR amplification using DreamTaq polymerase (Thermo Scientific) with primers 5941, 8442, 15216 and subsequent Illumina short-read sequencing.

Table 2 | Strains used in this study. Abbreviations: Saccharomyces cerevisiae (Sc), Kluyveromyces marxianus (Km), Tetrahymena thermophila (Tt).

Species | Strain | Relevant genotype | Reference
S. cerevisiae | CEN.PK113-7D | MATa URA3 HIS3 LEU2 TRP1 MAL2-8c SUC2 | Entian and Kötter, 2007⁵⁷
S. cerevisiae | IMX585 | CEN.PK113-7D can1Δ::cas9-natNT2 | Mans et al., 2015⁶¹
S. cerevisiae | IMX1438 | IMX585 sga1Δ::TtSTC1 | Wiersma et al., 2020⁴⁶
S. cerevisiae | IMK802 | IMX585 aus1Δ | This study
S. cerevisiae | IMK806 | IMX585 pdr11Δ | This study
S. cerevisiae | IMK809 | IMX585 aus1Δ pdr11Δ | This study
K. marxianus | CBS6556 | URA3 HIS3 LEU2 TRP1 | CBS-KNAW*
K. marxianus | NBRC1777 | URA3 HIS3 LEU2 TRP1 | NBRC**
K. marxianus | IMX2323 | KmPDC1p-TtSTC1-ScADH1t-hygB | This study
K. marxianus | IMS1111 | KmPDC1p-TtSTC1-ScADH1t-hygB | This study
K. marxianus | IMS1112 | KmPDC1p-TtSTC1-ScADH1t-hygB | This study
K. marxianus | IMS1113 | KmPDC1p-TtSTC1-ScADH1t-hygB | This study
K. marxianus | IMS1131 | KmPDC1p-TtSTC1-ScADH1t-hygB | This study
K. marxianus | IMS1132 | KmPDC1p-TtSTC1-ScADH1t-hygB | This study
K. marxianus | IMS1133 | KmPDC1p-TtSTC1-ScADH1t-hygB | This study

Table 3 | CRISPR gRNA target sequences used in this study. gRNA target sequences are shown including the PAM sequence (the three 3'-terminal nucleotides). Position in ORF indicates the base pair after which the Cas9-mediated double-strand break is introduced. AT score indicates the AT content of the 20-bp target sequence and RNA score indicates the fraction of unpaired nucleotides of the 20-bp target sequence, predicted with the complete gRNA sequence using a minimum free energy prediction by the RNAfold algorithm⁶².

Locus | Target sequence (5'-3') | Position in ORF (bp) | AT score | RNA score
AUS1 | CATTATTGTAAATGATTTGGTGG | 320/4184 | 0.75 | 1
PDR11 | ATCTTTCATATAAATAACATAGG | 1627/4235 | 0.85 | 1

Table 4 | Plasmids used in this study. Restriction enzyme recognition sites are indicated in superscript. US/DS represent upstream and downstream homologous recombination sequences used for genomic integration into the K. marxianus lac4 locus. Abbreviations: Saccharomyces cerevisiae (Sc), Kluyveromyces marxianus (Km), Tetrahymena thermophila (Tt).
Plasmid | Characteristics | Source
pGGkd015 | ori ampR ConLS GFP ConR1 | Hassing et al., 2019⁶³
pGGKd068 | ori kanR NotIKmlac4US BsmBIConRE’BsaIsfGFPBsaI ConLS’BsmBI hygB Kmlac4DSNotI | This study
pP2 | ori camR KmPDC1p | Rajkumar et al., 2019⁴⁷
pROS11 | ori ampR 2μm amdSYM pSNR52-gRNACAN1 pRSNR52-gRNAADE2 | Mans et al., 2015⁶¹
pUD696 | ori kanR TtSTC1 | Wiersma et al., 2020⁴⁶
pUDE659 | ori ampR 2μm amdSYM pSNR52-gRNAAUS1 pRSNR52-gRNAAUS1 | This study
pUDE663 | ori ampR 2μm amdSYM pSNR52-gRNAPDR11 pRSNR52-gRNAPDR11 | This study
pUDE909 | ori ampR KmPDC1p-TtSTC1-ScADH1t | This study
pUDI246 | ori kanR NotIKmlac4US KmPDC1p-TtSTC1-ScADH1t hygB Kmlac4DSNotI | This study
pYTK008 | ori camR ConLS’ | Lee et al., 2015⁶⁰
pYTK047 | ori camR GFP dropout | Lee et al., 2015⁶⁰
pYTK053 | ori camR ScADH1t | Lee et al., 2015⁶⁰
pYTK073 | ori camR ConRE' | Lee et al., 2015⁶⁰
pYTK079 | ori camR hygB | Lee et al., 2015⁶⁰

Table 5 | Oligonucleotide primers used in this study.
Primer | Sequence (5'->3')
11228 | TGCGCATGTTTCGGCGTTCGAAACTTCTCCGCAGTGAAAGATAAATGATCCATTATTGTAAATGATTTGGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG
11232 | TGCGCATGTTTCGGCGTTCGAAACTTCTCCGCAGTGAAAGATAAATGATCATCTTTCATATAAATAACATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAG
11233 | TAGTAAAGACTGCTGTAATTCATCTCTCAGTCCTTGCAGTCTGCTTTTTCTGGAATTAATTACCATTTTTAAATATATTTCTACTTTCTACTTAATAGCAATTTTAATTAATCTAATTAT
11234 | ATAATTAGATTAATTAAAATTGCTATTAAGTAGAAAGTAGAAATATATTTAAAAATGGTAATTAATTCCAGAAAAAGCAGACTGCAAGGACTGAGAGATGAATTACAGCAGTCTTTACTA
11241 | TAGCAAAAAAATTCACAACTAAACACGATAGAGTAAAATTAGAGAAGCAACGCCTCGCGGTCAGTGAATAGCGTTCCGTTAGAAAACATTCAAAATTACCTAATACTATTCAACAGTTCT
11242 | AGAACTGTTGAATAGTATTAGGTAATTTTGAATGTTTTCTAACGGAACGCTATTCACTGACCGCGAGGCGTTGCTTCTCTAATTTTACTCTATCGTGTTTAGTTGTGAATTTTTTTGCTA
11243 | TGTCACTACAGCCACAGCAG
11244 | TTGGTAAGGCGCCACACTAG
11251 | AGAGAAGCGCCACATAGACG
11252 | TGCATATGCTACGGGTGACG
11897 | CACCCAAGTATGGTGGGTAG
14148 | AAGCATCGTCTCATCGGTCTCATATGTCAATTTCAAAGTACTTCACTCCCGTTGCTGAC
14149 | TTATGCCGTCTCAGGTCTCAGGATTTAGTTCTGTACAGGCTTCTTC
14150 | TTATGCCGTCTCAGGTCTCAAGAATTAGTTCTGTACAGGCTTCTTC
14151 | AAGCATCGTCTCATCGGTCTCATATGTCTTTCACTAAAATCGCTGCCTTATTAG
14152 | TTATGCCGTCTCAGGTCTCAGGATATCATAAGAGCATAGCAGCGGCACCGGCAATAG
14197 | AAGCATCGTCTCATCGGTCTCACAATGAAAGTGATTGAAGAACCCTCAAAC
14198 | TTATGCCGTCTCAGGTCTCAAGGGTTAAGCAATTGGATCCTACC
14199 | AAGCATCGTCTCATCGGTCTCAGAGTTGCTTAATTAGCTTGTACATGGCTTTG
14200 | TTATGCCGTCTCAGGTCTCATCGGGAAGGCCCATATTGAAGACG
14339 | CCCAAATCATTTACAATAATGGATCATTTATC
14340 | CATGTTATTTATATGAAAGATGATCATTTATC
16366 | GTCCCTAGGTTCGTCATT
16367 | CAAGATCAATGGTGGCTCTC

Strain construction

The lithium-acetate/polyethylene-glycol method was used for yeast transformation⁶⁴. Homologous repair (HR) DNA fragments for markerless CRISPR-Cas9-mediated gene deletions in S. cerevisiae were constructed by annealing two 120 bp primers, using primer pairs 11241/11242 and 11233/11234 for deletion of PDR11 and AUS1, respectively. After transformation of S.
cerevisiae IMX585 with gRNA plasmids pUDE659 and pUDE663 and double-stranded repair fragments, transformants were selected on synthetic medium with acetamide as sole nitrogen source⁶⁵. Deletion of AUS1 and PDR11 was confirmed by PCR amplification with primer pairs 11243/11244 and 11251/11252, respectively. Loss of gRNA plasmids was induced by cultivation of single-colony isolates on YPD, after which plasmid loss was assessed by absence of growth of single-cell isolates on synthetic medium with acetamide as nitrogen source. An aus1Δ pdr11Δ double-deletion strain was similarly constructed by chemical transformation of S. cerevisiae IMK802 with pUDE663 and repair DNA. To integrate a TtSTC1 expression cassette into the K. marxianus lac4 locus, K. marxianus NBRC1777 was transformed with 2 μg of NotI-digested pUDI246 DNA. After centrifugation, cells were resuspended in YPD and incubated at 30 °C for 3 h. Cells were then again centrifuged, resuspended in demineralized water and plated on agar containing 200 µg·L⁻¹ hygromycin B (InvivoGen, Toulouse, France) and 40 µg·L⁻¹ X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside; Fermentas, Waltham, MA). Colonies that could not convert X-gal were analyzed for correct genomic integration of TtSTC1 by diagnostic PCR with primers 16366, 16367 and 11897. Genomic integration of TtSTC1 into the chromosome outside the lac4 locus was confirmed by short-read Illumina sequencing.
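The AT scores listed in Table 3 for the AUS1 and PDR11 gRNA targets can be reproduced with a few lines of code. The sketch below is an illustration only: the function name is hypothetical, and it assumes the PAM is the last three nucleotides of each listed sequence, so that the AT score is the A/T fraction of the preceding 20 bp.

```python
def at_score(target_with_pam: str, pam_len: int = 3) -> float:
    """Fraction of A/T bases in the 20-bp protospacer that precedes the PAM."""
    protospacer = target_with_pam.upper()[:-pam_len]
    if len(protospacer) != 20:
        raise ValueError("expected a 20-bp target followed by the PAM")
    return sum(base in "AT" for base in protospacer) / len(protospacer)


# Target sequences (including PAM) as listed in Table 3.
TARGETS = {
    "AUS1": "CATTATTGTAAATGATTTGGTGG",
    "PDR11": "ATCTTTCATATAAATAACATAGG",
}

if __name__ == "__main__":
    for locus, sequence in TARGETS.items():
        print(f"{locus}: PAM={sequence[-3:]}, AT score={at_score(sequence):.2f}")
```

Both values agree with the table (0.75 for AUS1 and 0.85 for PDR11), and both PAMs match the NGG pattern expected for Cas9.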
Chemostat cultivation

Chemostat cultures were grown at 30 °C in 2 L bioreactors (Applikon, Delft, the Netherlands) with a stirrer speed of 800 rpm. The dilution rate was set at 0.10 h⁻¹ and a constant working volume of 1.2 L was maintained by connecting the effluent pump to a level sensor. Cultures were grown on synthetic medium with vitamins¹⁷. Concentrated glucose solutions were autoclaved separately at 110 °C for 20 min and added at the concentrations indicated, along with sterile antifoam Pluronic 6100 PE (BASF, Ludwigshafen, Germany; final concentration 0.2 g·L⁻¹). Before autoclaving, bioreactors were tested for gas leakage by submerging them in water while applying a 0.3 bar overpressure.

Anaerobic conditions of bioreactor cultivations were maintained by continuous reactor headspace aeration with pure nitrogen gas (≤ 0.5 ppm O₂, HiQ Nitrogen 6.0, Linde AG, Schiedam, the Netherlands) at a flow rate of 500 mL N₂ min⁻¹ (2.4 vvm). Gas pressure of 1.2 bar of the reactor headspace was set with a reduction valve (Tescom Europe, Hannover, Germany) and remained constant during cultivation. To prevent oxygen diffusion into the cultivation, the bioreactor was equipped with Fluran tubing (14 Barrer O₂, F-5500-A, Saint-Gobain, Courbevoie, France) and Viton O-rings (Eriks, Alkmaar, the Netherlands), and no pH probes were mounted. The medium reservoir was deoxygenated by sparge aeration with nitrogen gas (≤ 1 ppm O₂, HiQ Nitrogen 5.0, Linde AG).
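The dilution rate and working volume stated above determine the medium feed rate and the mean residence time of the chemostat. The following sketch shows the arithmetic under the usual assumption of an ideally mixed vessel at constant volume; the function names are illustrative and do not come from the original work.

```python
import math


def feed_rate_l_per_h(dilution_rate_per_h: float, working_volume_l: float) -> float:
    """Medium inflow (L/h) needed to hold the given dilution rate."""
    return dilution_rate_per_h * working_volume_l


def residence_time_h(dilution_rate_per_h: float) -> float:
    """Mean residence time (h), the reciprocal of the dilution rate."""
    return 1.0 / dilution_rate_per_h


def hours_for_volume_changes(n_changes: float, dilution_rate_per_h: float) -> float:
    """Time (h) needed to pass n working volumes of medium through the vessel."""
    return n_changes / dilution_rate_per_h


def remaining_fraction(n_changes: float) -> float:
    """Fraction of the initial broth left after n volume changes,
    assuming ideal mixing (exponential washout)."""
    return math.exp(-n_changes)


if __name__ == "__main__":
    D, V = 0.10, 1.2  # h^-1 and L, values from the text
    print(f"feed rate: {feed_rate_l_per_h(D, V):.2f} L/h")
    print(f"residence time: {residence_time_h(D):.0f} h")
    print(f"time for 5 volume changes: {hours_for_volume_changes(5, D):.0f} h")
    print(f"initial broth remaining after 5 changes: {remaining_fraction(5):.2%}")
```

The exponential washout term explains why several volume changes are commonly required before a culture is treated as being in steady state: after five changes less than 1% of the initial broth remains.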
For aerobic cultivation the reactor was sparged continuously with dried air at a flow rate of 500 mL air min⁻¹ (2.4 vvm). Dissolved oxygen levels were analyzed by Clark electrodes (AppliSens, Applikon) and remained above 40% during the cultivation. For micro-aerobic cultivations, nitrogen (≤ 1 ppm O₂, HiQ Nitrogen 5.0, Linde AG) and air were mixed continuously by controlling the fractions of mass flow rate of the dry gas to a total flow of 500 mL min⁻¹ per bioreactor. The mixed gas was distributed to each bioreactor and analyzed separately in real-time. Continuous cultures were assumed to be in steady state when, after at least 5 volume changes, culture dry weight and the specific carbon dioxide production rates changed by less than 10%.

Cell density was routinely measured at a wavelength of 660 nm with a Jenway 7200 spectrophotometer (Cole-Parmer, Staffordshire, UK). Cell dry weight of the cultures was determined by filtering exactly 10 mL of culture broth over pre-dried and weighed membrane filters (0.45 µm, Thermo Fisher Scientific), which were subsequently washed with demineralized water, dried in a microwave oven (20 min, 350 W) and weighed again⁶⁶.

Metabolite analysis

For determination of substrate and extracellular metabolite concentrations, culture supernatants were obtained by centrifugation of culture samples (5 min at 13000 rpm) and analyzed by high-performance liquid chromatography (HPLC) on a Waters Alliance 2690 HPLC (Waters, MA, USA) equipped with a Bio-Rad HPX-87H ion exchange column (BioRad, Veenendaal, the Netherlands) operated at 60 °C with a mobile phase of 5 mM H₂SO₄ at a flow rate of 0.6 mL·min⁻¹. Compounds were detected by means of a dual-wavelength absorbance detector (Waters 2487) and a refractive index detector (Waters 2410) and compared to reference compounds (Sigma-Aldrich). Residual glucose concentrations in continuous cultivations were determined by HPLC analysis of culture samples that were rapidly quenched with cold steel beads⁶⁷.

Gas analysis

The off-gas from bioreactor cultures was cooled with a condenser (2 °C) and dried with a PermaPure Dryer (Inacom Instruments, Veenendaal, the Netherlands) prior to analysis of the carbon dioxide and oxygen fraction with a Rosemount NGA 2000 Analyser (Baar, Switzerland). The Rosemount gas analyzer was calibrated with defined mixtures of 1.98 % O₂, 3.01 % CO₂ and high quality nitrogen gas N6 (Linde AG).

Ethanol evaporation rate

To correct for ethanol evaporation in the continuous bioreactor cultivations, the ethanol evaporation rate was determined in the same experimental bioreactor set-up without the yeast. Ethanol (400 mM) was added to SM glucose media with urea, after which the decrease in the ethanol concentration was measured over time by periodic sampling and quantification by HPLC analysis over the course of at least 140 hours. To reflect the media composition used for the different oxygen regimes and anaerobic growth factor supplementation, the ethanol evaporation was measured for bioreactor sparge aeration with Tween 80, and for bioreactor head-space aeration both with and without Tween 80. The ethanol evaporation rate was measured for each condition in triplicate.
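The text does not state how the evaporation rate was derived from the measured concentration profile. One plausible approach, shown here purely as a sketch, is to assume first-order loss, C(t) = C0·exp(-k·t), and estimate k as the negative least-squares slope of ln C against time. The time points and concentrations below are synthetic values generated for illustration, not measurements from the study.

```python
import math


def first_order_rate(times_h, conc_mM):
    """Estimate k (h^-1) in C(t) = C0 * exp(-k * t) by ordinary least squares
    on ln(C) versus t. First-order loss is an assumption of this sketch."""
    logs = [math.log(c) for c in conc_mM]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return -sxy / sxx  # slope is negative for a decaying signal


if __name__ == "__main__":
    # Synthetic example: 400 mM start, hypothetical rate constant of 0.005 h^-1.
    times = [0, 24, 48, 96, 144]
    concentrations = [400.0 * math.exp(-0.005 * t) for t in times]
    k = first_order_rate(times, concentrations)
    print(f"estimated evaporation rate constant: {k:.4f} h^-1")
```

With triplicate measurements per condition, one rate constant could be fitted per replicate and then averaged, which keeps the replicate-to-replicate spread visible.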
Lipid extractions & GC analysis

For analysis of triterpene and triterpenoid cell contents, biomass was harvested, washed once with demineralized water and stored as pellet at -80 °C before freeze-drying the pellets using an Alpha 1-4 LD Plus (Martin Christ, Osterode am Harz, Germany) at -60 °C and 0.05 mbar. Freeze-dried biomass was saponified with 2.0 M NaOH (Bio-Ultra, Sigma-Aldrich) in methylation glass tubes (PYREX™ borosilicate glass, Thermo Fisher Scientific) at 70 °C. As internal standard, 5α-cholestane (Sigma-Aldrich) was added to the saponified biomass suspension. Subsequently tert-butyl-methyl-ether (tBME, Sigma-Aldrich) was added for organic phase extraction. Samples were extracted twice using tBME and dried with sodium sulfate (Merck, Darmstadt, Germany) to remove remaining traces of water. The organic phase was either concentrated by evaporation with N₂ gas aeration or transferred directly to an injection vial (VWR International, Amsterdam, the Netherlands). The contents were measured by GC-FID using an Agilent 7890A Gas Chromatograph (Agilent Technologies, Santa Clara, CA) equipped with an Agilent CP9013 column (Agilent). The oven was programmed to start at 80 °C for 1 min, ramp first to 280 °C at 60 °C·min⁻¹ and secondly to 320 °C at a rate of 10 °C·min⁻¹, with a final temperature hold of 15 min. Spectra were compared to separate calibration lines of squalene, ergosterol, α-cholestane, cholesterol and tetrahymanol as described previously⁴⁶.
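The oven program described above implies a fixed run time per injection, which can be checked with a small calculation. The helper below is an illustrative sketch with a hypothetical function name; it simply sums the initial hold, the duration of each linear ramp and the final hold.

```python
def oven_program_minutes(start_temp_c, initial_hold_min, ramps, final_hold_min):
    """Total duration (min) of a GC oven program.

    ramps is a list of (target_temp_c, rate_c_per_min) tuples applied in order.
    """
    total = initial_hold_min
    temp = start_temp_c
    for target, rate in ramps:
        total += (target - temp) / rate  # time spent on this linear ramp
        temp = target
    return total + final_hold_min


if __name__ == "__main__":
    # Values from the text: 80 °C for 1 min, 60 °C/min to 280 °C,
    # 10 °C/min to 320 °C, 15 min final hold.
    runtime = oven_program_minutes(80, 1, [(280, 60), (320, 10)], 15)
    print(f"run time per injection: {runtime:.1f} min")
```

The two ramps take about 3.3 min and 4 min, so the program lasts roughly 23.3 min per sample, most of which is the final hold at 320 °C.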
Sterol uptake assay

Sterol uptake was monitored by the uptake of fluorescently labelled 25-NBD-cholesterol (Avanti Polar Lipids, Alabaster, AL). A stock solution of 25-NBD-cholesterol (NBDC) was prepared in ethanol under an argon atmosphere and stored at -20 °C. Shake flasks with 10 mL SM glucose media were inoculated with yeast strains from a cryo-stock and cultivated aerobically at 200 rpm at 30 °C overnight. The yeast cultures were subsequently diluted to an OD660 of 0.2 in 400 mL SM glucose media in 500 mL shake flasks to gradually reduce the availability of oxygen and incubated overnight. Yeast cultures were transferred to fresh SM media with 40 g·L⁻¹ glucose and incubated under anaerobic conditions at 30 °C at 200 rpm. After 22 hours of anaerobic incubation, 4 µg·L⁻¹ NBD-cholesterol with 420 mg·L⁻¹ Tween 80 was pulsed to the cultures. Samples were taken and washed twice with PBS containing 5 mL·L⁻¹ Tergitol NP-40 (pH 7.0; Sigma-Aldrich) before resuspension in PBS and subsequent analysis. Propidium iodide (PI) (Invitrogen) was added to the sample (20 µM) and cells were stained according to the manufacturer’s instructions⁶⁸. PI intercalates with DNA in cells with a compromised cell membrane, which results in red fluorescence. Samples both unstained and stained with PI were analyzed with an Accuri C6 flow cytometer (BD Biosciences, Franklin Lakes, NJ) with a 488 nm laser, and fluorescence was measured with an emission filter of 533/30 nm (FL1) for NBD-cholesterol and > 670 nm (FL3) for PI.
Cell gating and median fluorescence of cells were determined using FlowJo (v10, BD Bioscience). Cells were gated based on forward scatter (FSC) and side scatter (SSC) to exclude potential artifacts or clumping cells. Within this gated population, PI-positive and PI-negative cells were differentiated based on the cell fluorescence across the FL3 and FL1 dimensions. Flow cytometric gates were drafted for each yeast species and used for all samples. The gating strategy is given in Supplementary Fig. 8. Fluorescence of a strain was determined from a sample of cells from independent shake-flask cultures and compared to cells from identical unstained cultures of the exact same chronological age. The staining experiment of the strains IMX585, CBS6556 and NBRC1777 was repeated twice for reproducibility; the mean and pooled variance were subsequently calculated from the biological duplicates of the two experiments. The NBDC intensity and cell counts obtained from the NBDC experiments are available for re-analysis in Supplementary Data set 1, and raw flow cytometry plots are depicted in Supplementary Data set 2.

Long read sequencing, assembly, and annotation

Cells were grown overnight in 500-mL shake flasks containing 100 mL liquid YPD medium at 30 °C in an orbital shaker at 200 rpm. After reaching stationary phase the cells were harvested for a total OD660 of 600 by centrifugation for 5 min at 4000 g. Genomic DNA of CBS6556 and NBRC1777 was isolated using the Qiagen genomic DNA 100/G kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. MinION genomic libraries were prepared using the 1D Genomic DNA by ligation kit (SQK-LSK108) for CBS6556, and the 1D native barcoding Genomic DNA kit (EXP-NBD103 & LSK108) for NBRC1777, according to the manufacturer’s instructions with the exception of using 80% EtOH during the ‘End Repair/dA-tailing module’ step. Flow cell quality was tested by running the MinKNOW platform QC (Oxford Nanopore Technologies, Oxford, UK). Flow cells were prepared by removing 20 μL buffer and subsequently primed with priming buffer. The DNA library was loaded dropwise into the flow cell for sequencing. The SQK-LSK108 library was sequenced on a R9 chemistry flow cell (FLO-MIN106) for 48 h. Base-calling was performed using Albacore (v2.3.1, Oxford Nanopore Technologies) for CBS6556, and for NBRC1777 with Guppy (v2.1.3, Oxford Nanopore Technologies) using dna_r9.4.1_450bps_flipflop.cfg. CBS6556 reads were assembled using Canu (v1.8)⁶⁹, and NBRC1777 reads were assembled using Flye (v2.7.1-b1673)⁷⁰. Assemblies were polished with Pilon (v1.18)⁷¹ using Illumina data available at the Sequence Read Archive under accessions SRX3637961 and SRX3541357. Both de novo genome assemblies were annotated using Funannotate (v1.7.1)⁷², trained and refined using de novo transcriptome assemblies (see below), adding functional annotation with Interproscan (v5.25-64.0)⁷³.

Illumina sequencing

Plasmids were sequenced on a MiniSeq (Illumina, San Diego, CA) platform. Library preparation was performed with Nextera XT DNA library preparation according to the manufacturer’s instructions (Illumina).
The library preparation included the MiniSeq Mid Output kit (300 cycles), and the input and final DNA were quantified with the Qubit HS dsDNA kit (Life Technologies, Thermo Fisher Scientific). Nucleotide sequences were assembled with SPAdes⁷⁴ and compared to the intended in silico DNA construct. For whole-genome sequencing, yeast cells were harvested from overnight cultures and DNA was isolated with the Qiagen genomic DNA 100/G kit (Qiagen) as described earlier. DNA quantity was measured with the Qubit BR dsDNA kit (Thermo Fisher Scientific). 300 bp paired-end libraries were prepared with the TruSeq DNA PCR-free library prep kit (Illumina) according to the manufacturer’s instructions. Short read whole-genome sequencing was performed on a MiSeq platform (Illumina).

RNA isolation, sequencing and transcriptome analysis

Culture broth from chemostat cultures was directly sampled into liquid nitrogen to prevent mRNA turnover. The cell cultures were stored at -80 °C and processed within 10 days after sampling. After thawing on ice, cells were harvested by centrifugation. Total RNA was extracted by a 5 min heat shock at 65 °C with a mix of isoamyl alcohol, phenol and chloroform at a ratio of 125:24:1, respectively (Invitrogen). RNA was extracted from the organic phase with Tris-HCl and subsequently precipitated by the addition of 3 M Na-acetate and 40 % (v/v) ethanol at -20 °C. Precipitated RNA was washed with ethanol, collected and, after drying, resuspended in RNase-free water.
The quantity of total RNA was determined with a Qubit RNA BR assay kit (Thermo Fisher Scientific). RNA quality was determined by the RNA integrity number with RNA screen tape using a Tapestation (Agilent). RNA libraries were prepared with the TruSeq Stranded mRNA LT protocol (Illumina, #15031047) and subjected to paired-end sequencing (151 bp read length, NovaSeq Illumina) by Macrogen (Macrogen Europe, Amsterdam, the Netherlands).

Pooled RNAseq libraries were used to perform de novo transcriptome assembly using Trinity (v2.8.3)⁷⁵, which was subsequently used as evidence for both CBS6556 and NBRC1777 genome annotations. RNAseq libraries were mapped to the CBS6556 genome assembly described above, using bowtie (v1.2.1.1)⁷⁶ with parameters (-v 0 -k 10 --best -M 1) to allow no mismatches, select the best out of 10 possible alignments per read, and for reads having more than one possible alignment randomly report only one. Alignments were filtered and sorted using samtools (v1.3.1)⁷⁷. Read counts were obtained with featureCounts (v1.6.0)⁷⁸ using parameters (-B -C) to only count reads for which both pairs are aligned to the same chromosome.

Differential gene expression (DGE) analysis was performed using edgeR (v3.28.1)⁷⁹. Genes with 0 read counts in all conditions were filtered out from the analysis, as were genes with less than 10 counts per million.
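The low-expression filter described above can be illustrated with a small, self-contained example. The original analysis was carried out with edgeR in R; the Python sketch below is not the authors' code. It assumes that a gene is discarded when its counts per million (CPM) stay below the threshold in every sample, which the text does not specify, and it uses made-up gene names and counts.

```python
def counts_per_million(counts):
    """Convert raw counts to CPM using each sample's total library size.

    counts maps gene name -> list of raw read counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    library_sizes = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        gene: [1e6 * row[i] / library_sizes[i] for i in range(n_samples)]
        for gene, row in counts.items()
    }


def filter_genes(counts, min_cpm=10.0):
    """Keep genes with at least one read and CPM >= min_cpm in some sample.
    The 'in some sample' rule is an assumption of this sketch."""
    cpm = counts_per_million(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(row) > 0 and max(cpm[gene]) >= min_cpm
    }


# Toy count table (hypothetical genes, two samples of one million reads each).
TOY_COUNTS = {
    "gene_a": [500000, 600000],
    "gene_b": [0, 0],           # never detected
    "gene_c": [2, 3],           # 2 and 3 CPM, below the threshold
    "gene_d": [499998, 399997],
}

if __name__ == "__main__":
    print(sorted(filter_genes(TOY_COUNTS)))
```

Filtering on CPM rather than raw counts makes the threshold comparable between libraries of different sequencing depth.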
Counts were normalized using the trimmed mean of M values (TMM) method⁸⁰, and dispersion was estimated using generalized linear models. Differentially expressed genes were then calculated using a log ratio test adjusted with the Benjamini-Hochberg method. Absolute log2 fold-change values > 2, false discovery rate < 0.5, and P value < 0.05 were used as significance cutoffs.

Gene set analysis (GSA) based on gene ontology (GO) terms was used to get a functional interpretation of the DGE analysis. For this purpose, GO terms were first obtained for the S. cerevisiae CEN.PK113-7D (GCA_002571405.2) and K. marxianus CBS6556 genome annotations using Funannotate and Interproscan as described above. Afterwards, Funannotate compare was used to get (co)ortholog groups of genes generated with ProteinOrtho5³⁸ using the following public genome annotations: S. cerevisiae S288C (GCF_000146045.2), K. marxianus NBRC1777 (GCA_001417835.1) and K. marxianus DMKU3-1042 (GCF_001417885.1), in addition to the new genome annotations generated here for S. cerevisiae CEN.PK113-7D, and K. marxianus CBS6556 and NBRC1777. Predicted GO terms for S. cerevisiae CEN.PK113-7D and K. marxianus CBS6556 were kept, and merged with those from corresponding (co)orthologs from S. cerevisiae S288C. Genes with term GO:0005840 (ribosome) were not considered for further analyses. GSA was then performed with Piano (v2.4.0)⁴⁰. Gene set statistics were first calculated with the Stouffer, Wilcoxon rank-sum test, and reporter methods implemented in Piano. Afterwards, consensus results were derived by p-value and rank aggregation, and were considered significant if absolute Fold Change values were > 1. ComplexHeatmap (v2.4.3)⁸¹ was used to draw GSA results into Fig. 2, highlighting differentially expressed genes found in a previous study⁵¹. DGE and GSA were performed using R (v4.0.2)⁸².

Anaerobic growth experiments

Anaerobic shake-flask experiments were performed in a Bactron anaerobic workstation (BACTRON300-2, Sheldon Manufacturing, Cornelius, OR) at 30 °C. The gas atmosphere consisted of 85% N₂, 10% CO₂ and 5% H₂ and was maintained anaerobic by a Pd catalyst. The catalyst was regenerated by heating to 160 °C every week and interchanged by placing it in the airlock whenever the pass-box was used. 50-mL shake flasks were filled with 40 mL (80 % volumetric) media and placed on an orbital shaker (KS 130 basic, IKA, Staufen, Germany) set at 240 rpm inside the anaerobic chamber. Sterile growth media was placed inside the anaerobic chamber 24 h prior to inoculation to ensure complete removal of traces of oxygen.

The anaerobic growth ability of the yeast strains was tested on SMG-urea with 50 g·L⁻¹ glucose at pH 6.0 with Tween 80 prepared as described earlier. The growth experiments were started from aerobic pre-cultures on SMG-urea media and the anaerobic shake flasks were inoculated at an OD660 of 0.2 (corresponding to an OD600 of 0.14). In order to minimize opening the anaerobic chamber, culture growth was monitored by optical density measurements inside the chamber using an Ultrospec 10 cell density meter (Biochrom, Cambridge, UK) at a 600 nm wavelength. When the optical density of a culture no longer increased or decreased, new shake-flask cultures were inoculated by serial transfer at an initial OD600 of 0.2.
Laboratory evolution in low oxygen atmosphere

Adaptive laboratory evolution for strict anaerobic growth was performed in a Bactron anaerobic workstation (BACTRON BAC-X-2E, Sheldon Manufacturing) at 30 °C. 50-mL shake flasks were filled with 40 mL SMG-urea with 50 g·L⁻¹ glucose and including 420 mg·L⁻¹ Tween 80. Subsequently the shake-flask media were inoculated with IMX2323 from glycerol cryo-stock at OD660 < 0.01 and thereafter placed inside the anaerobic chamber. Due to frequent opening of the pass-box and lack of catalyst inside the pass-box, oxygen entry was more permissive. After the optical density of the cultures no longer increased, cultures were transferred to new media by 40-50x serial dilution. For IMS1111, IMS1112 and IMS1113 three, and for IMS1131, IMS1132 and IMS1133 four, serial transfers in shake-flask media were performed, after which single colony isolates were made by plating on YPD agar media with hygromycin antibiotic at 30 °C aerobically. Single colony isolates were subsequently restreaked three times in sequence on the same media before the isolates were propagated in SM glucose media and stored as glycerol cryo-stocks.

To determine if an oxygen-limited pre-culture was required for the strict anaerobic growth of the IMX2323 strain, a cross-validation experiment was performed.
In parallel, yeast strains were cultivated in 50-mL shake-flask cultures with SMG-urea with 50 g·L⁻¹ glucose at pH 6.0 with Tween 80 in both the Bactron anaerobic workstation (BACTRON BAC-X-2E, Sheldon Manufacturing) with low levels of oxygen contamination, and in the Bactron anaerobic workstation (BACTRON300-2, Sheldon Manufacturing) with strict control of oxygen contamination. After stagnation of growth was observed in the second serial transfer of the shake-flask cultures, a 1.5 mL sample of each culture was taken, sealed, and used to inoculate fresh media in the other Bactron anaerobic workstation. Simultaneously, the original culture was used to inoculate fresh media in the same Bactron anaerobic workstation, thereby resulting in 4 parallel cultures of each strain, of which half were derived from the other Bactron anaerobic workstation.

Laboratory evolution in sequential batch reactors

Laboratory evolution for selection of fast growth at high temperatures was performed in 400-mL MultiFors (Infors Benelux, Velp, the Netherlands) bioreactors with a working volume of 100 mL for the strain IMS1111 on SMG 20 g·L⁻¹ glucose media with Tween 80 in triplicate. Anaerobic conditions were created and maintained by continuous aeration of the cultures with 50 mL·min⁻¹ (0.5 vvm) N₂ gas and continuous aeration of the media vessels with N₂ gas. The pH was set at 5.0 and maintained by the continuous addition of sterile 2 M KOH.
Growth was monitored by analysis of the CO₂ in the bioreactor off-gas, and a new empty-refill cycle was initiated once at least 15 h of batch time had elapsed and the CO₂ signal had dropped to 70% of the maximum reached in that batch. The dilution factor of each empty-refill cycle was 14.3-fold (100 mL working volume, 7 mL residual volume). The first batch fermentation was performed at 30 °C, after which the temperature was increased to 42 °C in the second batch and maintained for 18 consecutive sequential batches. After the 18th batch cycle at 42 °C, the culture temperature was increased further to 45 °C and subsequently maintained. Growth rate was calculated from the CO₂ production, measured as the CO₂ fraction in the culture off-gas, in essence as described previously⁸³. In short, the CO₂ fraction in the off-gas was converted to a CO₂ evolution rate in mmol per hour and subsequently summed over time for each cycle. The resulting cumulative CO₂ profile was transformed to its natural logarithm, after which the stepwise slope of the log-transformed data was calculated. Datapoints of this stepwise slope were then excluded iteratively, using as exclusion criterion a value more than one standard deviation below the mean.

Variant calling
DNA sequencing reads were aligned to the NBRC1777 genome sequence described above, which included an additional sequence containing the TtSTC1 construct, and used to detect sequence variants with a previously reported method⁸⁴. Briefly, reads were aligned using BWA (v0.7.15-r1142-dirty)⁸⁵, alignments were processed using samtools (v1.3.1)⁷⁷ and Picard tools (v2.20.2-SNAPSHOT) (http://broadinstitute.github.io/picard), and variants were then called using the Genome Analysis Toolkit (v3.8-1-0-gf15c1c3ef)⁸⁶ HaplotypeCaller in DISCOVERY and GVCF modes. Variants were only called at sites with a minimum variant confidence normalized by unfiltered depth of variant samples (QD) of 20, read depth (DP) ≥ 5, and genotype quality (GQ) > 20, excluding a 7.1 kb region in chromosome 5 containing rDNA. Variants were annotated using the genome annotation described above, including the TtSTC1 construct, with SnpEff (v5.0)⁸⁷ and VCFannotator (http://vcfannotator.sourceforge.net).

Statistics
Statistical tests are two-sided t-tests assuming unequal variance unless specifically stated otherwise. We denote technical replicates as measurements derived from a single cell culture. Biological replicates are measurements originating from independent cell cultures. Independent experiments are experiments with an identical set-up that were performed on different days. Where possible, variances from independent experiments with identical set-up were pooled, but independent time-course experiments (anaerobic growth studies) are reported separately. p-values were corrected for multiple-hypothesis testing, which is reported specifically in each case. No data were excluded on the basis of their outcome.

Data availability
Data supporting the findings of this work are available within the paper, and source data for all figures in this study are available at the www.data.4TU.nl repository under doi:10.4121/13265552.
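As an illustration of the growth-rate calculation described under "Laboratory evolution in sequential batch reactors", the following minimal Python sketch walks through the stated steps: cumulative CO2, natural-log transformation, stepwise slope, and iterative exclusion of slopes more than one standard deviation below the mean. It is not the code used in this study. The function names are hypothetical, and three choices are assumptions that the text does not specify: the use of the trapezoidal rule for "summed over time", stopping once a pass excludes no further points, and reporting the mean of the retained slopes as the growth rate. The input is taken to be a CO2 evolution rate in mmol per hour. The conversion from off-gas fraction is omitted, because any constant conversion factor only shifts the log-transformed profile and leaves its slope unchanged.

```python
import math
from statistics import mean, stdev


def cumulative_co2(times_h, cer_mmol_per_h):
    """Integrate the CO2 evolution rate to cumulative CO2 (mmol).

    Assumption: trapezoidal rule. Returns (times, totals), both starting at
    the second sample, because the cumulative amount at the first sample is
    zero and has no logarithm.
    """
    totals, running = [], 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        running += 0.5 * (cer_mmol_per_h[i] + cer_mmol_per_h[i - 1]) * dt
        totals.append(running)
    return list(times_h[1:]), totals


def stepwise_slopes(times_h, cumulative_mmol):
    """Slope of ln(cumulative CO2) between consecutive samples, in 1/h."""
    logs = [math.log(c) for c in cumulative_mmol]
    return [
        (logs[i] - logs[i - 1]) / (times_h[i] - times_h[i - 1])
        for i in range(1, len(logs))
    ]


def growth_rate(slopes, n_sd=1.0):
    """Iteratively drop slopes more than n_sd standard deviations below the
    mean, then return the mean of the remaining slopes (assumed summary)."""
    kept = list(slopes)
    while len(kept) > 2:
        cutoff = mean(kept) - n_sd * stdev(kept)
        survivors = [s for s in kept if s >= cutoff]
        if len(survivors) == len(kept):
            break  # no point excluded in this pass, so the set is stable
        kept = survivors
    return mean(kept)
```

In use, the three functions would be chained per empty-refill cycle: `t, c = cumulative_co2(times, cer)`, then `growth_rate(stepwise_slopes(t, c))`. Excluding only low slopes removes the lag and declining phases of a batch while keeping the fastest, exponential part of the profile.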
The raw RNA-sequencing data that support the findings of this study are available from the Gene Expression Omnibus (GEO) website (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE164344.
Whole-genome sequencing data of the CBS6556, NBRC1777 and evolved strains were deposited at NCBI (https://www.ncbi.nlm.nih.gov/) under BioProject accession number PRJNA679749.

Code availability
The code used to generate the results of this study is archived in a GitLab repository (https://gitlab.tudelft.nl/rortizmerino/kmar_anaerobic).

Authors' contributions
WD and JTP designed the study and wrote the manuscript. WD performed molecular cloning, bioreactor cultivation experiments, transcriptome analysis and sterol-uptake experiments. JB contributed to bioreactor cultivation experiments and molecular cloning. FW contributed to the molecular cloning and sterol-uptake experiments. AK and CM contributed to bioreactor experiments and transcriptome studies. PdlT performed plasmid and genome sequencing. RO contributed to transcriptome analysis and performed sequence annotation and assembly.

Acknowledgements
We thank Mark Bisschops and Hannes Jürgens for fruitful discussions. We thank Erik de Hulster for fermentation support and Marcel van den Broek for input on the bioinformatics analyses.

Competing interest
WD and JTP are co-inventors on a patent application that covers aspects of this work.
The authors declare no conflict of interest.

Funding
This work was supported by an Advanced Grant (grant #694633) of the European Research Council to JTP.

References
1. Annual World Fuel Ethanol Production. Renewable Fuels Association (2020). Available at: https://ethanolrfa.org/statistics/annual-ethanol-production/. (Accessed: 2nd May 2020)
2. Jansen, M. L. A. et al. Saccharomyces cerevisiae strains for second-generation ethanol production: from academic exploration to industrial implementation. FEMS Yeast Res. 17, 1–20 (2017).
3. Weusthuis, R. A., Lamot, I., van der Oost, J. & Sanders, J. P. M. Microbial production of bulk chemicals: development of anaerobic processes. Trends Biotechnol. 29, 153–158 (2011).
4. Favaro, L., Jansen, T. & van Zyl, W. H. Exploring industrial and natural Saccharomyces cerevisiae strains for the bio-based economy from biomass: the case of bioethanol. Crit. Rev. Biotechnol. 39, 800–816 (2019).
5. Stovicek, V., Holkenbrink, C. & Borodina, I. CRISPR/Cas system for yeast genome engineering: advances and applications. FEMS Yeast Res. 17, 1–16 (2017).
6. Hong, J., Wang, Y., Kumagai, H. & Tamaki, H. Construction of thermotolerant yeast expressing thermostable cellulase genes. J. Biotechnol. 130, 114–123 (2007).
7. Laman Trip, D. S. & Youk, H. Yeasts collectively extend the limits of habitable temperatures by secreting glutathione. Nat. Microbiol. 5, 943–954 (2020).
8. Choudhary, J., Singh, S. & Nain, L. Thermotolerant fermenting yeasts for simultaneous saccharification fermentation of lignocellulosic biomass. Electron. J. Biotechnol. 21, 82–92 (2016).
9. Thorwall, S., Schwartz, C., Chartron, J. W. & Wheeldon, I. Stress-tolerant non-conventional microbes enable next-generation chemical biosynthesis. Nat. Chem. Biol. 16, 113–121 (2020).
10. Mejía-Barajas, J. A. et al. Second-Generation Bioethanol Production through a Simultaneous Saccharification-Fermentation Process Using Kluyveromyces Marxianus Thermotolerant Yeast. in Special Topics in Renewable Energy Systems (InTech, 2018). doi:10.5772/intechopen.78052
11. Snoek, I. S. I. & Steensma, H. Y. Why does Kluyveromyces lactis not grow under anaerobic conditions? Comparison of essential anaerobic genes of Saccharomyces cerevisiae with the Kluyveromyces lactis genome. FEMS Yeast Res. 6, 393–403 (2006).
12. Visser, W., Scheffers, W. A., Batenburg-Van der Vegte, W. H. & Van Dijken, J. P. Oxygen requirements of yeasts. Appl. Environ. Microbiol. 56, 3785–3792 (1990).
13. Merico, A., Sulo, P., Piškur, J. & Compagno, C. Fermentative lifestyle in yeasts belonging to the Saccharomyces complex. FEBS J. 274, 976–989 (2007).
14. Andreasen, A. A. & Stier, T. J. B. Anaerobic nutrition of Saccharomyces cerevisiae I. Ergosterol requirement for growth in a defined medium. J. Cell. Physiol. 41, 23–26 (1953).
15. Andreasen, A. A. & Stier, T. J. B. Anaerobic nutrition of Saccharomyces cerevisiae II. Unsaturated fatty acid requirement for growth in a defined medium. J. Cell. Physiol. 43, 271–281 (1953).
16. Passi, S. et al. Saturated dicarboxylic acids as products of unsaturated fatty acid oxidation. Biochim. Biophys. Acta - Lipids Lipid Metab. 1168, 190–198 (1993).
17. Verduyn, C., Postma, E., Scheffers, W. A. & van Dijken, J. P. Physiology of Saccharomyces Cerevisiae in Anaerobic Glucose-Limited Chemostat Cultures. J. Gen. Microbiol. 136, 395–403 (1990).
18. Perli, T., Wronska, A. K., Ortiz-Merino, R. A., Pronk, J. T. & Daran, J. M. Vitamin requirements and biosynthesis in Saccharomyces cerevisiae. Yeast 1–22 (2020). doi:10.1002/yea.3461
19. Dekker, W. J. C., Wiersma, S. J., Bouwknegt, J., Mooiman, C. & Pronk, J. T. Anaerobic growth of Saccharomyces cerevisiae CEN.PK113-7D does not depend on synthesis or supplementation of unsaturated fatty acids. FEMS Yeast Res. 19, (2019).
20. Wilcox, L. J. et al. Transcriptional profiling identifies two members of the ATP-binding cassette transporter superfamily required for sterol uptake in yeast. J. Biol. Chem. 277, 32466–32472 (2002).
21. Black, P. N. & DiRusso, C. C. Yeast acyl-CoA synthetases at the crossroads of fatty acid metabolism and regulation. Biochim. Biophys. Acta - Mol. Cell Biol. Lipids 1771, 286–298 (2007).
22. Jacquier, N. & Schneiter, R. Ypk1, the yeast orthologue of the human serum- and glucocorticoid-induced kinase, is required for efficient uptake of fatty acids. J. Cell Sci. 123, 2218–2227 (2010).
23. Blomqvist, J., Nogue, V. S., Gorwa-Grauslund, M. & Passoth, V. Physiological requirements for growth and competitiveness of Dekkera bruxellensis under oxygen limited or anaerobic conditions. Yeast 29, 265–274 (2012).
24. Zavrel, M., Hoot, S. J. & White, T. C. Comparison of sterol import under aerobic and anaerobic conditions in three fungal species, Candida albicans, Candida glabrata, and Saccharomyces cerevisiae. Eukaryot. Cell 12, 725–738 (2013).
25. Visser, W., Scheffers, W. A., Batenburg-Van der Vegte, W. H. & Van Dijken, J. P. Oxygen requirements of yeasts. Appl. Environ. Microbiol. 56, 3785–3792 (1990).
26. Dashko, S., Zhou, N., Compagno, C. & Piškur, J. Why, when, and how did yeast evolve alcoholic fermentation? FEMS Yeast Res. 14, 826–832 (2014).
27. Snoek, I. S. I. & Steensma, H. Y. Factors involved in anaerobic growth of Saccharomyces cerevisiae. Yeast 24, 1–10 (2007).
28. Vale da Costa, B. L., Basso, T. O., Raghavendran, V. & Gombert, A. K. Anaerobiosis revisited: growth of Saccharomyces cerevisiae under extremely low oxygen availability. Appl. Microbiol. Biotechnol. 1–16 (2018). doi:10.1007/s00253-017-8732-4
29. Wilkins, M. R., Mueller, M., Eichling, S. & Banat, I. M. Fermentation of xylose by the thermotolerant yeast strains Kluyveromyces marxianus IMB2, IMB4, and IMB5 under anaerobic conditions. Process Biochem. 43, 346–350 (2008).
30. Hughes, S. R. et al. Automated UV-C Mutagenesis of Kluyveromyces marxianus NRRL Y-1109 and Selection for Microaerophilic Growth and Ethanol Production at Elevated Temperature on Biomass Sugars. J. Lab. Autom. 18, 276–290 (2013).
31. Tetsuya, G. et al. Bioethanol Production from Lignocellulosic Biomass by a Novel Kluyveromyces marxianus Strain. Biosci. Biotechnol. Biochem. 77, 1505–1510 (2013).
32. van Urk, H., Postma, E., Scheffers, W. A. & van Dijken, J. P. Glucose Transport in Crabtree-positive and Crabtree-negative Yeasts. J. Gen. Microbiol. 135, 2399–2406 (1989).
33. von Meyenburg, K. Katabolit-Repression und der Sprossungszyklus von Saccharomyces cerevisiae. (ETH Zürich, 1969). doi:10.3929/ethz-a-000099923
34. Rouwenhorst, R. J., Visser, L. E., Van Der Baan, A. A., Scheffers, W. A. & Van Dijken, J. P. Production, Distribution, and Kinetic Properties of Inulinase in Continuous Cultures of Kluyveromyces marxianus CBS 6556. Appl. Environ. Microbiol. 54, 1131–1137 (1988).
35. Bakker, B. M. et al. Stoichiometry and compartmentation of NADH metabolism in Saccharomyces cerevisiae. FEMS Microbiol. Rev. 25, 15–37 (2001).
36. Jeong, H. et al. Genome sequence of the thermotolerant yeast Kluyveromyces marxianus var. marxianus KCTC 17555. Eukaryot. Cell 11, 1584–1585 (2012).
37. Jordá, T. & Puig, S. Regulation of Ergosterol Biosynthesis in Saccharomyces cerevisiae. Genes (Basel). 11, 795 (2020).
38. Lechner, M. et al. Proteinortho: Detection of (Co-)orthologs in large-scale analysis. BMC Bioinformatics 12, 124 (2011).
39. Nagy, M., Lacroute, F. & Thomas, D. Divergent evolution of pyrimidine biosynthesis between anaerobic and aerobic yeasts. Proc. Natl. Acad. Sci. U. S. A. 89, 8966–8970 (1992).
40. Väremo, L., Nielsen, J. & Nookaew, I. Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods. Nucleic Acids Res. 41, 4378–4391 (2013).
41. Tai, S. L. et al. Two-dimensional transcriptome analysis in chemostat cultures: Combinatorial effects of oxygen availability and macronutrient limitation in Saccharomyces cerevisiae. J. Biol. Chem. 280, 437–447 (2005).
42. Alimardani, P. et al. SUT1-promoted sterol uptake involves the ABC transporter Aus1 and the mannoprotein Dan1 whose synergistic action is sufficient for this process. Biochem. J. 381, 195–202 (2004).
43. Marek, M., Silvestro, D., Fredslund, M. D., Andersen, T. G. & Pomorski, T. G. Serum albumin promotes ATP-binding cassette transporter-dependent sterol uptake in yeast. FEMS Yeast Res. 14, 1223–1233 (2014).
44. Marek, M. et al. The yeast plasma membrane ATP binding cassette (ABC) transporter Aus1: Purification, characterization, and the effect of lipids on its activity. J. Biol. Chem. 286, 21835–21843 (2011).
45. Takishita, K. et al. Lateral transfer of tetrahymanol-synthesizing genes has allowed multiple diverse eukaryote lineages to independently adapt to environments without oxygen. Biol. Direct 7, 5 (2012).
46. Wiersma, S. J., Mooiman, C., Giera, M. & Pronk, J. T. Squalene-Tetrahymanol Cyclase Expression Enables Sterol-Independent Growth of Saccharomyces cerevisiae. Appl. Environ. Microbiol. 86, 1–15 (2020).
47. Rajkumar, A. S., Varela, J. A., Juergens, H., Daran, J. G. & Morrissey, J. P. Biological Parts for Kluyveromyces marxianus Synthetic Biology. Front. Bioeng. Biotechnol. 7, 1–15 (2019).
48. Landry, B. D., Doyle, J. P., Toczyski, D. P. & Benanti, J. A. F-Box Protein Specificity for G1 Cyclins Is Dictated by Subcellular Localization. PLoS Genet. 8, e1002851 (2012).
49. Fonseca, G. G., Heinzle, E., Wittmann, C. & Gombert, A. K. The yeast Kluyveromyces marxianus and its biotechnological potential. Appl. Microbiol. Biotechnol. 79, 339–354 (2008).
50. Madeira-Jr, J. V. & Gombert, A. K. Towards high-temperature fuel ethanol production using Kluyveromyces marxianus: On the search for plug-in strains for the Brazilian sugarcane-based biorefinery. Biomass and Bioenergy 119, 217–228 (2018).
51. Tai, S. L. et al. Two-dimensional transcriptome analysis in chemostat cultures: Combinatorial effects of oxygen availability and macronutrient limitation in Saccharomyces cerevisiae. J. Biol. Chem. 280, 437–447 (2005).
52. Seret, M. L., Diffels, J. F., Goffeau, A. & Baret, P. V. Combined phylogeny and neighborhood analysis of the evolution of the ABC transporters conferring multiple drug resistance in hemiascomycete yeasts. BMC Genomics 10, 459 (2009).
53. Shi, N. Q. & Jeffries, T. W. Anaerobic growth and improved fermentation of Pichia stipitis bearing a URA1 gene from Saccharomyces cerevisiae. Appl. Microbiol. Biotechnol. 50, 339–345 (1998).
54. Gojković, Z. et al. Horizontal gene transfer promoted evolution of the ability to propagate under anaerobic conditions in yeasts. Mol. Genet. Genomics 271, 387–393 (2004).
55. Riley, R. et al. Comparative genomics of biotechnologically important yeasts. Proc. Natl. Acad. Sci. U. S. A. 113, 9882–9887 (2016).
56. Guo, L., Pang, Z., Gao, C., Chen, X. & Liu, L. Engineering microbial cell morphology and membrane homeostasis toward industrial applications. Curr. Opin. Biotechnol. 66, 18–26 (2020).
57. Entian, K.-D. & Kötter, P. 25 Yeast Genetic Strain and Plasmid Collections. in Methods in Microbiology 629–666 (2007). doi:10.1016/S0580-9517(06)36025-4
58. Nijkamp, J. F. et al. De novo sequencing, assembly and analysis of the genome of the laboratory strain Saccharomyces cerevisiae CEN.PK113-7D, a model for modern industrial biotechnology. Microb. Cell Fact. 11, 36 (2012).
59. Bracher, J. M. et al. Laboratory evolution of a biotin-requiring Saccharomyces cerevisiae strain for full biotin prototrophy and identification of causal mutations. Appl. Environ. Microbiol. 83, 1–16 (2017).
60. Lee, M. E., DeLoache, W. C., Cervantes, B. & Dueber, J. E. A Highly Characterized Yeast Toolkit for Modular, Multipart Assembly. ACS Synth. Biol. 4, 975–986 (2015).
61. Mans, R. et al. CRISPR/Cas9: A molecular Swiss army knife for simultaneous introduction of multiple genetic modifications in Saccharomyces cerevisiae. FEMS Yeast Res. 15, 1–15 (2015).
62. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
63. Hassing, E. J., de Groot, P. A., Marquenie, V. R., Pronk, J. T. & Daran, J. M. G. Connecting central carbon and aromatic amino acid metabolisms to improve de novo 2-phenylethanol production in Saccharomyces cerevisiae. Metab. Eng. 56, 165–180 (2019).
64. Gietz, R. D. & Woods, R. A. Genetic Transformation of Yeast. Biotechniques 30, 816–831 (2001).
65. Solis-Escalante, D. et al. amdSYM, A new dominant recyclable marker cassette for Saccharomyces cerevisiae. FEMS Yeast Res. 13, 126–139 (2013).
66. Postma, E., Verduyn, C., Scheffers, W. A. & Van Dijken, J. P. Enzymic analysis of the crabtree effect in glucose-limited chemostat cultures of Saccharomyces cerevisiae. Appl. Environ. Microbiol. 55, 468–477 (1989).
67. Mashego, M. R., van Gulik, W. M., Vinke, J. L. & Heijnen, J. J. Critical evaluation of sampling techniques for residual glucose determination in carbon-limited chemostat culture of Saccharomyces cerevisiae. Biotechnol. Bioeng. 83, 395–399 (2003).
68. Boender, L. G. M., De Hulster, E. A. F., Van Maris, A. J. A., Daran-Lapujade, P. A. S. & Pronk, J. T. Quantitative physiology of Saccharomyces cerevisiae at near-zero specific growth rates. Appl. Environ. Microbiol. 75, 5607–5614 (2009).
69. Koren, S. et al. Canu: Scalable and accurate long-read assembly via adaptive κ-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
70. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
71. Walker, B. J. et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS One 9, (2014).
72. Palmer, J. & Stajich, J. funannotate. (2019). doi:10.5281/zenodo.3548120
73. Jones, P. et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
74. Bankevich, A. et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
75. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
76. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
77. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
78. Liao, Y., Smyth, G. K. & Shi, W. FeatureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
79. McCarthy, D. J., Chen, Y. & Smyth, G. K. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 40, 4288–4297 (2012).
80. Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, (2010).
81. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
82. R Core Team. R: A Language and Environment for Statistical Computing. (2017).
83. Juergens, H. et al. Evaluation of a novel cloud-based software platform for structured experiment design and linked data analytics. Sci. Data 5, 1–12 (2018).
84. Ortiz-Merino, R. A. et al. Ploidy Variation in Kluyveromyces marxianus Separates Dairy and Non-dairy Isolates. Front. Genet. 9, 1–16 (2018).
85. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
86. Auwera, G. A. et al. From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Curr. Protoc. Bioinforma. 43, 11.10.1-11.10.33 (2013).
87. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 6, 80–92 (2012).

Description of Additional Supplementary Files

Supplementary Data Set 1 | Overview of flow cytometry samples with meta-data. Meta-data table listing file names; frequency of cells compared to parent; number of cells in each group; strain name; time point of fluorescence measurement, after 4 hours (1) or 23 hours (2); staining of cells with propidium iodide (PI), indicated as PI (stained) or - (not stained); staining of cells with Tween 80 and NBD-cholesterol (TN) or with Tween 80 only (T); and species names, abbreviated as K. marxianus (Km) or S. cerevisiae (Sc).
[Example picture of file FlowCyto_Table.xlsx]
Filename                   Strain   Time point  PI  #  Day  Staining  Cells/PI-ne  Cells/PI-po  Cells/PI-ne
A09 CBS6556_T_A_PI_1.fcs   CBS6556  1           PI  A  1    T         576          411000       75590
B09 CBS6556_T_B_PI_1.fcs   CBS6556  1           PI  B  1    T         625          398024       88212
A01 IMX585_T_A___1.fcs     IMX585   1           -   A  2    T         1391         3            472000

Supplementary Data Set 2 | Flow cytometry non-gated data of FL3-A versus FL1-A of all samples. Flow cytometry data showing uptake of fluorescent NBDC by K. marxianus and S. cerevisiae strains, with for each sample the intensity of counts (pseudo-colored) at 533/30 nm (FL1) for NBDC and > 670 nm (FL3) for PI.
[Example of first row of FlowCyto_FL1_FL3.pdf]

Supplemental material for:
Engineering the thermotolerant industrial yeast Kluyveromyces marxianus for anaerobic growth
Wijbrand J. C. Dekker, Raúl A. Ortiz-Merino, Astrid Kaljouw, Julius Battjes, Frank Wiering, Christiaan Mooiman, Pilar de la Torre, and Jack T. Pronk*
Department of Biotechnology, Delft University of Technology, van der Maasweg 9, 2629 HZ Delft, The Netherlands
*Corresponding author: Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands, E-mail: j.t.pronk@tudelft.nl, Tel: +31 15 2783214.

Supplementary Fig. 1 | Ethanol evaporation rate. Ethanol concentration over time in a reactor containing 1200 mL SM glucose urea medium, maintained at 30 °C, stirred at 800 rpm and aerated at a volumetric gas flow rate of 500 mL·min⁻¹. The reactor off-gas was cooled by passing it through a condenser cooled to 2 °C.
Circles and orange line represent sparge aeration with Tween 80 (T) supplementation of the media, diamonds and blue line represent head-space aeration with Tween 80, and triangles and red line represent head-space aeration with Tween 80 omitted. Data represent the mean with standard deviation from three independent reactor experiments.
AGF   Aeration type   Ethanol evaporation (mmol·h⁻¹)
T     Sparge          0.00578 ± 0.00062
T     Head-space      0.00625 ± 0.00032
-     Head-space      0.00653 ± 0.00020
[Plot: ethanol concentration (mM) versus time (h)]

Supplementary Fig. 2 | Consensus biological process GO term enrichment for K. marxianus contrast 31. GO terms are clustered according to their rank. See legend of Fig. 2 for experimental details.

Supplementary Fig. 3 | Consensus biological process GO term enrichment for K. marxianus contrast 43. GO terms are clustered according to their rank. See legend of Fig. 2 for experimental details.

Supplementary Fig. 4 | Consensus biological process GO term enrichment for S. cerevisiae contrast 31. GO terms are clustered according to their rank. See legend of Fig. 2 for experimental details.

Supplementary Fig. 5 | Consensus biological process GO term enrichment for S. cerevisiae contrast 43. GO terms are clustered according to their rank. See legend of Fig. 2 for experimental details.

Supplementary Fig. 6 | GO term enrichment comparison of biological process of K. marxianus (kmar) to S. cerevisiae (scer) of contrast 43.
GO terms were annotated with the color of distinct directionality (up (blue), down (brown)) and the color intensity was determined by the magnitude of the inverse rank. GO terms with significant mixed-directionality or non-directionality, as having no pronounced distinct directionality, are colored white. Shared GO terms between K. marxianus and S. cerevisiae are connected by a line.

Supplementary Fig. 7 | Uptake of the fluorescent sterol derivative NBDC by S. cerevisiae and K. marxianus strains after 23 h staining. Flow cytometry data of Fig. 4 with prolonged staining after pulse-addition of NBD-cholesterol to the shake-flask cultures for 23 h. Bar charts of the median and pooled standard deviation of the NBD-cholesterol fluorescence intensity of PI-negative cells with pooled variance from the biological replicate cultures. See legend Fig. 4 for experimental details.

Supplementary Fig. 8 | Flow cytometry gating strategy of both K. marxianus (left panel) and S. cerevisiae (right panel) samples.
Gates were set per species for all samples, independent of NBDC staining. Density of events was calculated by FlowJo software and represented in pseudo-color (blue, low density; red, high density). The gate between PI-negative and PI-positive was inside the “Cells” gated population.

Supplementary Fig. 9 | Cross-validation of oxygen-limited and anaerobic growth of K. marxianus IMX2323. Strains were grown in shake-flask cultures in an oxygen-limited (a) and strict anaerobic environment (b). To perform cross-validation between the two parallel running experiments, a 1.5 mL aliquot of each culture was sealed and transferred quickly between anaerobic chambers and used to inoculate two shake-flask cultures, represented with crossed arrows (⤮). The cultures from the strain NBRC1777 (⤮) in the third transfer (C3) in the strict anaerobic environment (b) were hence inoculated from an aliquot of the cultures of NBRC1777 (C2) grown in the oxygen-limited environment (a). This resulted in a serial transfer with a 26.7-fold dilution from transfer C2 to C3. Aerobically grown pre-cultures were used to inoculate the first anaerobic culture on SMG-urea containing 50 g·L-1 glucose and Tween 80.
Data depicted are of each replicate culture (points) and the mean (dotted line) from independent biological duplicate cultures; serial transfer cultures are represented with the number of the respective transfer (C1-3).

Supplementary Fig. 10 | Sterol-independent anaerobic growth of S. cerevisiae IMX585 (reference), IMX1438 (TtSTC1), K. marxianus NBRC1777 (reference) and IMX2323 (TtSTC1). Aerobically grown pre-cultures were used to inoculate shake-flask cultures with SMG-urea containing 50 g·L-1 glucose and Tween 80 in a strict anaerobic environment at an OD600 of 0.1 for all strains, and both at OD600 of 0.1 and 0.6 for NBRC1777 and IMX2323. Data depicted are of each replicate culture (points) and the mean (dotted line) from independent biological duplicate cultures; serial transfer cultures are represented with the number of the respective transfer (C1-2).

Supplementary Fig. 11 | CO2 fraction in the off-gas of K. marxianus IMS1111. Production of CO2 as measured by the fraction of CO2 in the off-gas of the individual bioreactor cultivations of the K.
marxianus strain IMS1111 on SMG media pH 5.0 with 20 g·L-1 glucose and 420 mg·L-1 Tween 80 over time (left panels). The temperature profile was incrementally increased at the beginning of a new batch cycle (right panels). After 430 h the performance of the off-gas analyzer of replicate M3R deteriorated.

Supplementary Table 1 | Mutations identified by whole-genome sequencing in comparison to the reference K. marxianus strain IMX2323. Overview of mutations detected in the strains selected for strict anaerobic growth (IMS1111, IMS1131, IMS1132, IMS1133) compared to the TtSTC1-engineered strain (IMX2323). The resequenced IMS1111 after 4 transfers in strict anaerobic conditions is, for clarity, referred to by the strain name IMS1115. Overview of mutations of the bioreactor populations after prolonged selection for anaerobic growth at elevated temperatures, represented by the bioreactor replicates (M3R, M5R, and M6L). Mutations in coding regions are annotated as synonymous (SYN), non-synonymous (NSY), insertions or deletions. Mutations in non-coding regions are reported with the identifier of the neighboring gene, directionality and strand (+/-). For K. marxianus genes, corresponding S. cerevisiae orthologs with the S288C identifier are listed if applicable.
QD refers to quality by depth calculated by GATK and genotyping overviews are given per strain using the GATK fields GT: 1/1 for homozygous alternative, 1/0 for heterozygous, AD: allelic depth (number of reads per reference and alternative alleles called), DP: approximate read depth at the corresponding genomic position, and GQ: genotype quality. NA indicates variants were not called in that position in the corresponding strain.

Chromosome | Position | Description | Type | Kmar ID | S288c SystID | Gene | QD | IMX2323 | IMS1111 | IMS1131 | IMS1132 | IMS1133 | IMS1115 | M3R | M5R | M6L

Mutation spectra of IMX2323 derived single isolates after selection for strict anaerobic growth
3 | 897844 | Asp-747-Asp | CDS:(SYN) | TPUv2_002092 | YDR283C | Gcn2 | 32 | NA | 1/1:0,120:120:99 | NA | NA | NA | 1/1:0,105:105:99 | 1/1:0,99:99:99 | 1/1:0,110:110:99 | 1/1:0,118:118:99
8 | 59156 | codon: TCA | CDS:INSERTION[1] | TPUv2_004766 | Transposon | | 27 | NA | 1/1:0,7:7:21 | NA | 1/1:0,15:15:45 | 1/1:0,9:9:27 | 1/1:0,9:9:27 | 1/1:0,12:12:36 | 1/1:0,7:7:21 | 1/1:0,7:7:21
8 | 550450 | Trp-350-STP | CDS:(NON) | TPUv2_004999 | YAL040C | Cln3 | 23 | NA | 1/1:0,119:119:99 | NA | NA | NA | 1/1:0,143:143:99 | 1/1:0,89:89:99 | 1/1:0,117:117:99 | 1/1:1,98:99:99
4 | 459750 | TPUv2_002639-T1 | p3UTR:+ | TPUv2_002639 | YGR156W | Pti1 | 35 | NA | NA | 1/1:0,9:9:29 | 1/1:0,9:11:54 | 1/1:0,9:9:38 | 1/1:0,4:6:24 | 1/1:0,10:10:35 | 1/1:0,7:7:26 | NA
5 | 177429 | TPUv2_003161-T1 | p5UTR:- | TPUv2_003161 | YBR283C | Ssh1 | 27 | NA | NA | 1/1:0,9:9:27 | NA | NA | 0/1:1,7:8:21 | NA | NA | NA
5 | 909477 | UTP22 | p5UTR:+ | TPUv2_003518 | YGR090W | Utp22 | 35 | NA | 1/1:1,11:12:34 | NA | NA | 1/1:1,8:9:24 | 1/1:0,11:11:36 | NA | NA | NA

Mutations in whole populations after selection for anaerobic growth at elevated temperatures
3 | 1352430 | codon: AAT | CDS:DELETION[-3] | TPUv2_002327 | YLR352W | Lug1 | 22 | NA | NA | NA | NA | NA | NA | NA | NA | 0/1:39,65:107:99
8 | 635779 | codon: CAG | CDS:INSERTION[9] | TPUv2_005049 | No similarity | | 26 | NA | NA | NA | NA | NA | NA | NA | NA | 0/1:25,49:74:99

10_1101-2021_01_07_425737 ----

Isolation of the Buchnera aphidicola flagellum basal body from the Buchnera membrane

Matthew J. Schepers1, James N. Yelland1, Nancy A. Moran2*, David W. Taylor1,3-5*

1Institute for Cell and Molecular Biology, University of Texas at Austin, Austin, TX, 78712
2Department of Integrative Biology, University of Texas at Austin, Austin, TX, 78712
3Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712
4Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX, 78712
5LIVESTRONG Cancer Institute, Dell Medical School, Austin, TX, 78712

*Correspondence to: dtaylor@utexas.edu (D.W.T.); nancy.moran@austin.utexas.edu (N.A.M.)

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 8, 2021.
; https://doi.org/10.1101/2021.01.07.425737doi: bioRxiv preprint https://doi.org/10.1101/2021.01.07.425737

Abstract

Buchnera aphidicola is an intracellular bacterial symbiont of aphids and maintains a small genome of only 600 kbp. Buchnera is thought to maintain only genes relevant to the symbiosis with its aphid host. Curiously, the Buchnera genome contains gene clusters coding for flagellum basal body structural proteins and for flagellum type III export machinery. These structures have been shown to be highly expressed and present in large numbers on Buchnera cells. No recognizable pathogenicity factors or secreted proteins have been identified in the Buchnera genome, and the relevance of this protein complex to the symbiosis is unknown. Here, we show isolation of Buchnera flagella from the cellular membrane of Buchnera, confirming the enrichment of flagellum proteins relative to other proteins in the Buchnera proteome. This will facilitate studies of the structure and function of the Buchnera flagellum, and its role in this model symbiosis.

Introduction

Buchnera aphidicola is an obligate endosymbiont of aphid species worldwide1 and is a model for bacterial genome reduction, maintaining one of the smallest genomes yet discovered, only 600 kbp2,3. Though Buchnera has lost genes not essential for its symbiotic lifestyle2,4,5, it retains genes associated with amino acid biosynthesis, reflecting its participation in a nutritional symbiosis2,6,7.
Though the exchange of amino acids and vitamins between the aphid host and Buchnera has been well-documented6,8,9, the molecular mechanism by which these metabolites cross Buchnera membranes is unknown: Buchnera maintains a small number of genes coding for membrane transport proteins, most of which are located at the inner membrane2,10. The permeability of the Buchnera outer membrane remains a mystery, considering the paucity of annotated transporter genes in sequenced Buchnera genomes. Genes coding for proteins localizing to the outer membrane of Buchnera include small β-barrel aquaporins, which allow passive diffusion of small molecules, and flagellum basal body components2,9,10. Investigation into protein expression by these symbiotic partners has shown that flagellum basal body components are highly expressed by Buchnera11. Indeed, transmission electron microscopy images of Buchnera reveal flagellum basal bodies studded all over the bacterial outer membrane12. Despite its abundance on the Buchnera cell surface, the role of this protein complex in maintaining the aphid-Buchnera symbiosis is unknown13.

Buchnera of the pea aphid (Acyrthosiphon pisum) maintains 26 genes coding for flagellum proteins in three discrete clusters. The maintained genes code for the structural proteins required for formation of a flagellum basal body and a partial flagellar hook, as well as the Type III cytoplasmic export proteins. Buchnera lineages vary in the set of flagellum genes retained (Supplementary Table 1), but all have lost genes encoding the flagellin and motor proteins14, indicating a functional shift away from cell motility. The bacterial flagellum structure is an evolutionary homologue of the injectisome (Type III secretion system, or T3SS), a macromolecular protein complex used to
deliver effector proteins, often to a eukaryotic host15,16,17. Flagellum assembly occurs in a stepwise, sequential manner beginning from the bacterial cytoplasm, identical to that of the T3SS18,19,20. Buchnera maintains genes coding for the proteins required for a functional T3SS2,12, as shown in studies of Yersinia21 and Salmonella22,23. Gram-negative bacteria have also been shown to export proteins through a flagellum basal body21,24,25. The bacterial flagellum could be repurposed to serve a novel function for the aphid-Buchnera symbiosis. The basal body could serve as a type III protein exporter to secrete proteins to signal to the aphid host or as a surface signal molecule for host recognition during infection of new aphid embryos.

Here, we present a procedure for isolation of flagellum basal body complexes adapted for an endosymbiont26, allowing for removal of these structures directly from Buchnera and enrichment of flagellum basal body complexes after isolation. This procedure will enable further characterization of the basal bodies and their modifications for a role in symbiosis.

Results

Isolation of hook basal bodies from Buchnera

Purification of the complex was initially assessed at multiple timepoints along the procedure. Samples were taken of initial Buchnera cell lysate, lysate after raising the pH to 10, protein suspension after the first 5000g spin, the third 5000g spin, and finally after the 30,000g spin and overnight incubation in TET buffer. SDS-PAGE showed sixteen bands were present after the staining procedure and their sizes corresponded to those of constituent proteins of the Buchnera flagellum basal body (Supplementary Figure 1). Protein samples were extracted from the gel and subjected to mass spectrometry analysis.
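The samples taken at the start (lysate) and end (final) of the procedure can be compared protein by protein. The legend of Supplementary Figure 2 defines the enrichment score as the unique spectral count for a protein in the final step divided by its count in the cell lysate. A minimal Python sketch of that ratio follows. The spectral counts are invented placeholders for illustration only, not values from this study, and the function name `enrichment_scores` and the choice to return `None` when a protein has no lysate counts are assumptions rather than the authors' implementation.

```python
from typing import Dict, Optional


def enrichment_scores(
    lysate: Dict[str, int], final: Dict[str, int]
) -> Dict[str, Optional[float]]:
    """Per-protein ratio of final-step to lysate unique spectral counts."""
    scores: Dict[str, Optional[float]] = {}
    for protein, lysate_count in lysate.items():
        final_count = final.get(protein, 0)
        # Assumption: the ratio is left undefined (None) when the protein
        # was not detected in the lysate, to avoid division by zero.
        scores[protein] = final_count / lysate_count if lysate_count > 0 else None
    return scores


def label(score: Optional[float]) -> str:
    """Classify a score relative to 1 (no change)."""
    if score is None:
        return "undefined"
    if score > 1:
        return "enriched"
    if score < 1:
        return "reduced"
    return "unchanged"


# Hypothetical unique spectral counts, for illustration only.
lysate_counts = {"FliF": 4, "FlgI": 5, "FliN": 6, "FlhA": 0}
final_counts = {"FliF": 40, "FlgI": 20, "FliN": 3}

scores = enrichment_scores(lysate_counts, final_counts)
for protein, score in scores.items():
    print(protein, score, label(score))
```

A score above 1 corresponds to a protein that became more abundant relative to the lysate, and a score below 1 to one that was depleted, which is how the text below distinguishes enriched from reduced flagellum proteins.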
Mass spectrometry analysis of isolated basal bodies

Protein ID LC-MS/MS spectral counts were provided by the University of Texas at Austin Proteomics Core Facility. We compared our samples to proteomic datasets from homogenized whole aphids, and from bacteriocytes purified from pea aphids11. Buchnera flagellum proteins were highly enriched by our isolation procedure, especially FliF, FlgI, FlgE, FlhA, and FlgF (Figure 1). These results indicate that all but two flagellum proteins present in the mass spectrometry samples were enriched during the isolation procedure: structural proteins FliE, FliF, FlgI, FlgE, FlgF, and FlgH were enriched threefold or more from the start to the finish of the procedure. FlgB, FlgC, FlgG, FliG, FliH, and FliI were enriched, though not to the extent of the other structural proteins. Type III secretion proteins FlhA and FliP were shown to be enriched by this procedure (Figure 2, Supplementary Figure 2). The widespread enrichment of Buchnera flagellum proteins indicates that our adapted procedure for isolating macromolecular protein complexes from the membranes of endosymbiotic bacteria was successful. Only flagellum proteins FlgK and FliN were reduced by the isolation procedure, perhaps because of their localization to the periphery of the flagellum.

Basal bodies resemble top hats via electron microscopy

We analyzed the isolated basal bodies by negative stain electron microscopy. While raw micrographs showed heterogeneous particles, likely due to disassembly of the complex, detergent micelles, and contaminating proteins, there were several particles that appeared regularly.
These single particles resembled a top hat with both rod- and ring-shaped features (Figure 3), similar in size and shape to those observed in TEM images of whole Buchnera cells12.

Discussion

Here, we demonstrate a procedure for isolating macromolecular protein complexes from Buchnera aphidicola, an obligate endosymbiotic bacterium that cannot be cultured or genetically manipulated. Identifying the changes in these complexes could elucidate how Buchnera’s adaptation over millions of years to a mutualistic lifestyle has affected its proteome. As Buchnera is not motile and is confined to host-derived “symbiosomal” vesicles inside bacteriocytes28,29, the retention and expression of these partial flagella indicate that they have become repurposed. These complexes have previously been hypothesized to act as type III secretion systems for provisioning peptides or signal factors to the aphid host13. Indeed, the proteins retained in the Buchnera flagellum constitute the structural proteins and machinery required for a functional type III secretion system21. Transcriptome analyses of pea aphid lines with different Buchnera titers reveal differences in expression of flagellar genes30.
In aphid lines that harbor relatively low numbers of Buchnera, the endosymbionts have elevated relative expression of mRNA associated with flagellar secretion genes (fliP, fliQ, and fliR), while Buchnera in aphid lines with high Buchnera numbers had elevated expression of genes for flagellum structural proteins30.

Though heavily expressed in Buchnera of pea aphids, components of the flagellum basal body are not maintained equally among lineages of Buchnera of different aphid species based on available genomic sequences14 (Supplementary Table 1). Genes coding for proteins associated with type III secretion activity (flhA, flhB, fliP, fliQ, and fliR) and basal body structural proteins (fliE, fliF, flgB, flgC, flgF, flgG, and flgH) are well maintained across Buchnera lineages, but genes coding for hook proteins (flgD, flgE, and flgK) and the flagellum-specific ATPase (fliI) are frequently shed. A more extreme example is the Buchnera strain harbored by aphids of genus Stegophylla: having the smallest sequenced Buchnera genome discovered thus far (412 kbp), these Buchnera have completely lost genes associated with flagellum structure and Type III secretion activity. In all but the most extreme examples, the Buchnera flagellum is well maintained, pointing to a continuing role for this complex in this ancient symbiosis.

Buchnera’s tiny genome contains no known pathogenicity proteins or proteins previously associated with type III export2,31. Potentially, Buchnera flagellum basal bodies may instead serve as surface signals for recognition by the host. During vertical transfer of Buchnera from mother to daughter aphids, naked Buchnera cells are exocytosed from maternal bacteriocytes and move in aphid haemolymph to infect a nearby specialized syncytial cell of stage 7 embryos32. The purpose of the flagellum in the context of Buchnera’s symbiotic lifestyle remains unknown.
Further inquiry into this protein complex could reveal how the repurposing of a motility organelle facilitates this ancient and obligate symbiosis.

Methods

Buchnera extraction from aphids

Pea aphids (Acyrthosiphon pisum strain LSR1) were placed as all-female clones on Fava bean (Vicia faba) seedlings on 16h/8h light/dark cycles at 20ºC. Once reaching adulthood, apterous adults were raised on Fava bean plants on 16h/8h light/dark cycles and allowed to reproduce. After seven days, all aphids (fourth-instar larvae, typically amounting to 5g) were removed from the Fava bean plants. Aphids were weighed and surface-sterilized in 0.5% bleach solution, then rinsed twice in Ultrapure water (MilliporeSigma), 30 seconds each. Aphids were gently ground in a mortar and pestle in 40mL sterile Buffer A (25mM KCl (Sigma-Aldrich), 35mM Tris base (Sigma-Aldrich), 10mM MgCl2 (Sigma-Aldrich), 250mM anhydrous EDTA (Sigma-Aldrich), and 500mM Sucrose (Sigma-Aldrich) at pH 7.5). Aphid homogenate was vacuum filtered to 100μm, then centrifuged at 1500g for 10 min at 4ºC. Supernatant was discarded, and the resulting pellet was resuspended in 20mL Buffer A and vacuum-filtered three times, from 20μm to 10μm and finally to 5μm. The resulting filtrate was spun at 1500g for 30 min at 4ºC and supernatant discarded. The resulting pellet was resuspended in 10mL Sucrose solution (300mM sucrose (Sigma-Aldrich) and 100mM Tris base (Sigma-Aldrich)), then checked on a brightfield microscope for intact Buchnera cells. Buchnera cells remain alive while at 4ºC for a maximum of 24h.

Isolation of flagellum basal bodies from Buchnera cells

Buchnera was incubated with gentle spinning on ice with egg white lysozyme (0.1mg/mL, Sigma-Aldrich) for 30 min.
100mM anhydrous EDTA solution, pH 7.5 (Sigma-Aldrich) was added to a final concentration of 10mM. The pellet was taken off ice, and gradually raised to room temperature with gentle spinning for 30 min. Triton X-100 (Acros Organics) was added to 1% w/v, along with 1mg/mL RNase-free DNase I (Bovine Pancreas, Sigma-Aldrich) and allowed to stir for 30 min. After incubation, cell lysate was kept at 4ºC or on ice until use. The lysate was raised to pH 10 using 1N NaOH (Macron Fine Chemicals) to attempt to denature host and bacterial cytoplasmic proteins. The solution was spun at 5000g for 10 min at 4ºC three times, each time decanting the supernatant to a new tube. After three spins, the supernatant was transferred to a Nalgene Oak Ridge polyallomer centrifuge tube (Thermo-Fisher) and spun at 30,000g for 1h at 4ºC. Supernatant was gently decanted and the pellet covered with TET buffer (10mM Tris-HCl, 5mM EDTA, 0.1% Triton X-100, pH 8.0) and left overnight at 4ºC to soften and dissolve.

Submission of protein for mass spectrometry

Solubilized protein concentration was determined using an Eppendorf Biophotometer. 1.5mg protein was run on premade 4-12% Tris-Glycine SDS-PAGE gels (Thermo-Fisher) at 120V for 10 min. Gels were stained in Coomassie Brilliant Blue (Bio-Rad) for 30 min, then destained in 20% Acetic acid (Thermo-Fisher) for 30 min. Gel bands corresponding to the step in the procedure sampled (“Lysate,” “pH 10,” “Spin 1,” “Spin 3,” “Final”) were cut out and submitted to the University of Texas at Austin CBRS Biological Mass Spectrometry Facility for LC-MS/MS using a Dionex Ultimate 3000 RSLCnano LC coupled to a Thermo Orbitrap Fusion (Thermo-Fisher).
Samples were submitted in 50mL destain with Buchnera aphidicola str. APS provided as the reference organism (ASM960v1). Prior to HPLC separation, peptides were desalted using Millipore U-C18 ZipTip Pipette Tips (Millipore-Sigma). A 2cm long x 75μm ID C18 trap column was followed by a 25cm long x 75μm analytical column packed with C18 3μm material (Thermo Acclaim PepMap 100, Thermo-Fisher) running a gradient from 5-35%. The FT-MS resolution was set to 120,000, with an MS/MS cycle time of 3 seconds and acquisition in HCD ion trap mode. Raw data were processed using SEQUEST HT embedded in Proteome Discoverer (Thermo-Fisher). Scaffold 4 (Proteome Software) was used for validation of peptide and protein IDs.

EM and data collection

Protein from the final step of this procedure was stained using 3% Uranyl Acetate on a 400-mesh continuous carbon grid. Images were acquired using an FEI Talos transmission electron microscope operating at 200 kV, with 1.25 second exposures, a dose rate of 19 e⁻/Å², and a nominal magnification of 57,000X.

Whole aphid proteomic samples

For controls, proteomes were profiled for whole aphids, including both Buchnera and aphid cells. Aphids were mixed-aged populations grown at 20ºC in 30 cup cages and pooled into three replicate samples. Aphids were washed and homogenized in buffer as described above. The homogenate was centrifuged at 4000g for 15 min at 4ºC. Supernatant was removed, and the pellet was suspended with 2% SDS, 0.1M Tris-HCl, 0.1M DTT at 100ºC for 10 min, then centrifuged at 14,000g for 20 min at 4ºC to remove non-soluble material after adding the same volume of 8M Urea. Protein concentration was determined on an Eppendorf BioPhotometer.
5mg total protein was run on a Bis-Tris gel for less than 1 cm, and the band was excised and sent to the UT Proteomics Core for LC-MS/MS protein ID. Protein ID methods were identical to those detailed above.

Author contributions

M.J.S. raised aphids and prepared Buchnera protein extracts. J.N.Y. performed electron microscopy. N.A.M. and D.W.T. analyzed data and supervised and secured funding for this work. All authors reviewed the final manuscript.

Acknowledgements

We thank Eric Verbeke, Jack Bravo, and Evan Schwartz for their advice and ideas for isolating and imaging proteins from native cells; Julie Perreau, Margaret Steele, and Serena Zhao for creating a space in which ideas and techniques could be shared freely; and Kim Hammond for help with aphid raising and organization. This work was supported by the National Science Foundation 1551092 (to N.A.M.), a Welch Foundation grant F-1938 (to D.W.T.), National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) R35GM138348 (to D.W.T.), Army Research Office Grant W911NF-15-1-0120 (to D.W.T.), and a Robert J. Kleberg, Jr. and Helen C. Kleberg Foundation Medical Research Award (to D.W.T.). D.W.T. is a CPRIT Scholar supported by the Cancer Prevention and Research Institute of Texas (RR160088) and an Army Young Investigator supported by the Army Research Office (W911NF-19-1-0021).

Competing interests

The authors declare no competing interests.

Figures

Figure 1: Barplot showing flagellum protein enrichment before (Lysate) and after (Final) the isolation procedure compared to proteomic datasets generated with whole aphids and dissected bacteriocytes. Blue indicates “core” proteins required for secretion activity and red indicates accessory proteins maintained by Buchnera aphidicola in pea aphids.

Figure 2: Cartoon diagram of the reduced Buchnera aphidicola (pea aphid) flagellum. Colors indicate enrichment status of individual proteins at the final step of the procedure, corresponding to Figure 1.

Figure 3: Single particles of Buchnera flagellum complexes after the isolation procedure. Scale bars represent 50nm.

Supplementary Figure 1: Silver stained SDS gel created after the enrichment procedure was performed. The first lane is taken directly from the enrichment preparation after overnight incubation with TET buffer. The second lane is after concentrating the enriched proteins to 1 mg/mL.
The third lane is concentrated protein diluted to 0.5 mg/mL. Ladder values represent molecular weight in kDa. Symbols correspond to flagellar protein molecular weight: * corresponds to FlhA (78 kDa). † corresponds to FliF (63 kDa) and FlgK (63 kDa). º corresponds to FlgE (45 kDa), FliP (43 kDa), and FlgI (41 kDa). ‡ corresponds to FliG (38 kDa) and FliM (37 kDa). ∆ corresponds to FlgG (28 kDa), FlgF (28 kDa), FlgH (26 kDa), and FliH (26 kDa). Ø corresponds to FlgB (16 kDa), FliN (15 kDa), FlgC (15 kDa), and FliE (11 kDa).

Supplementary Figure 2: Dotplot of Buchnera aphidicola flagellum proteins found after LC-MS/MS analysis. The enrichment score for each protein is indicated on the x axis. Enrichment scores are calculated by dividing unique spectral counts for each protein in the final step by each protein present in the cell lysate. Core flagellum proteins (defined by proteins required for type III secretion activity and flagellum structure) are filled in green, accessory proteins are filled in white.
[Plot: x axis, enrichment score after isolation procedure (0 to 30); y axis, protein (flgB, flgK, fliE, fliN, fliP, fliI, fliG, flgC, flhA, fliM, fliH, flgE, flgI, fliF, flgH, flgF, flgG); fill, type (accessory or core).]

10_1101-2021_01_08_425855 ---- DeepHBV: A deep learning model to predict hepatitis B virus (HBV) integration sites.

Canbiao Wu1 ¶, Xiaofang Guo2 ¶, Mengyuan Li3 ¶, Xiayu Fu4, Zeliang Hou1, Manman Zhai1,5, Jingxian Shen1, Xiaofan Qiu1, Zifeng Cui3, Hongxian Xie6, Pengmin Qin5, Xuchu Weng1, Zheng Hu3,7*, Jiuxing Liang1*

1 Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, China; Institute for Brain Research and Rehabilitation, South China Normal University, Guangzhou, China.
2 Department of Medical Oncology of the Eastern Hospital, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
3 Department of Gynecological Oncology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
4 Department of Thoracic Surgery, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
5 School of Psychology, South China Normal University, Guangzhou, Guangdong, China
6 Generulor Company Bio-X Lab, Guangzhou, Guangdong, China
7 Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China

*Corresponding author Email: huzheng1998@163.com (ZH), liangjiuxing@m.scnu.edu.cn (JL)
¶These authors contributed equally to this work.
bioRxiv preprint doi: https://doi.org/10.1101/2021.01.08.425855; this version posted January 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license (http://creativecommons.org/licenses/by/4.0/).

Abstract
Hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. Previous studies showed that HBV can integrate into the host genome and further promote malignant transformation. In this study, we developed an attention-based deep learning model, DeepHBV, to predict HBV integration sites by learning local genomic features automatically. We trained and tested DeepHBV using the HBV integration site data from the dsVIS database. Initially, DeepHBV showed an AUROC of 0.6363 and an AUPR of 0.5471 on the dataset. Adding repeat peaks or TCGA Pan Cancer peaks significantly improved the model performance, giving AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. On an independent validation dataset of HBV integration sites from VISDB, DeepHBV with HBV integration sequences plus TCGA Pan Cancer peaks (AUROC of 0.7603 and AUPR of 0.6189) performed better than DeepHBV with HBV integration sequences plus repeat peaks (AUROC of 0.6657 and AUPR of 0.5737). Next, we found that transcription factor binding sites (TFBS) were significantly enriched near the genomic positions that received high attention from the convolutional neural network. The binding sites of AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra and Foxo3 were highlighted by the DeepHBV attention mechanism in both the dsVIS dataset and the VISDB dataset, revealing the HBV integration preference. In summary, DeepHBV is a robust and explainable deep learning model not only for the prediction of HBV integration sites but also for further mechanistic study of HBV-induced cancer.

Author summary
Hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. Previous studies showed that HBV can integrate into the host genome and further promote malignant transformation. In this study, we developed an attention-based deep learning model, DeepHBV, to predict HBV integration sites by learning local genomic features automatically. The performance of the DeepHBV model improved significantly after adding genomic features, reaching an AUROC of 0.9430 and an AUPR of 0.9310. Furthermore, we identified transcription factor binding sites enriched in the regions highlighted by the convolutional neural network. In summary, DeepHBV is a robust and explainable deep learning model not only for the prediction of HBV integration sites but also for the further study of the HBV integration mechanism.

Introduction
HBV is one of the main causes of viral hepatitis and liver cancer (hepatocellular carcinoma: HCC) [1]. It is a small DNA virus that can integrate into the host genome via an RNA intermediate [1]. First, HBV attaches to and enters hepatocytes, then transports its nucleocapsid, which contains a relaxed circular DNA (rcDNA), to the host nucleus. In the host nucleus, rcDNA is converted into covalently closed circular DNA (cccDNA), which produces messenger RNAs (mRNA) and pregenomic RNA (pgRNA) by transcription.
Via reverse transcription in host nucleus, pgRNA produces new rcDNA and double-stranded linear DNA (dslDNA), which tend to integrate into the host cell genome [2]. A previous study showed that HBV integration breakpoints are distributed randomly across the whole genome, with a handful of hotspots [3]. For instance, HBV was reported to recurrently integrate into the telomerase reverse transcriptase (TERT) and myeloid/lymphoid or mixed-lineage leukemia 4 (MLL4, also known as KMT2B) genes. The insertional events were also accompanied by altered expression of the integrated gene [2,3,5], indicating important biological impacts on the local genome. Further analysis revealed an association between HBV integration and genomic instability in these insertional events [4]. Moreover, significant enrichment of HBV integration was found near the following genomic features in tumours compared to non-tumour tissue: repetitive regions, fragile sites, CpG islands and telomeres [2]. However, the pattern and the mechanism of HBV integration remain to be explored. Many HBV integration sites are distributed throughout the human genome and seem completely random [4,6,7]. Whether the features and patterns of these “random” viral integration events can be learned and extracted remains an open question; answering it would greatly improve the understanding of HBV integration-induced carcinogenesis.
Deep learning has performed well in computational biology research, for example in medical image identification [8] and in discovering motifs in protein sequences [9]. The convolutional neural network (CNN) is one of the most widely used architectures in deep learning; it enables a computer to learn from training data [10]. Although deep learning performs well in a variety of fields, how it reaches a decision is hard to explain because of its black box effect. Therefore, an approach named the attention mechanism, which can highlight the most influential parts of the input, was invented to open the “black box” [11,12]. In this study, we developed DeepHBV, an attention-based model to predict HBV integration sites using deep learning. The attention mechanism calculates an attention weight for each position and, at the same time, connects the encoder and the decoder. It highlights the regions that DeepHBV concentrates on and helps to identify the patterns that receive attention. DeepHBV can predict HBV integration sites accurately and specifically, and the attention mechanism identified positions with potentially important biological meaning.

Results
DeepHBV effectively predicts HBV integration sites by adding genomic features.
The DeepHBV model structure and the scheme of encoding a 2 kb sample into a binary matrix are described in Fig 1. The DeepHBV model was tested with our HBV integration sites database (http://dsvis.wuhansoftware.com).
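As an illustration of the encoding of a 2 kb sample into a binary matrix, the following is a minimal sketch rather than the authors' implementation. The column order A, C, G, T and the all-zero row for ambiguous bases such as N are assumptions made for this example, and the function name is hypothetical.

```python
# Minimal one-hot encoding sketch (illustrative only).
# Assumptions: column order A, C, G, T; ambiguous bases (e.g. N) become all-zero rows.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return a len(sequence) x 4 binary matrix as a list of rows."""
    column = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in column:
            row[column[base]] = 1
        matrix.append(row)
    return matrix


# A 2000 bp sample becomes a 2000 x 4 matrix.
sample = "ACGT" * 500
encoded = one_hot_encode(sample)
print(len(encoded), len(encoded[0]))
```

With this layout every row has at most one non-zero entry, so no nucleotide is numerically "closer" to another than any other pair.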
HBV integration sequences were prepared from HBV integration sites as positive/negative samples following the steps in Methods. We used twice as many negative samples as positive samples to keep the data balanced and to improve the confidence level. The positive samples were divided into 2902 training samples and 1264 testing samples. Correspondingly, we extracted 5804 and 2528 negative samples as the negative training dataset and testing dataset. DeepHINT, an existing deep learning model for predicting HIV integration sites from their surroundings [15], was also evaluated using HBV integration sequences for training and testing. Both models were trained on the same HBV integration training dataset and evaluated on the same testing dataset. DeepHBV with HBV integration sequences showed an AUROC of 0.6363 and an AUPR of 0.5471, while DeepHINT with HBV integration sequences showed an AUROC of 0.6199 and an AUPR of 0.5152 (Fig 2). The comparison of DeepHBV and DeepHINT is described in the Discussion. Several previous studies showed that HBV integration has a preference for surrounding genomic features such as repeats, histone markers and CpG islands [2,4]. Thus, we tried to add these genomic features into DeepHBV by mixing genomic feature samples together with HBV integration sequences as new datasets, then trained and tested the updated DeepHBV models.
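For readers less familiar with the two metrics reported above, the sketch below shows one common way to compute AUROC (rank-sum formulation) and AUPR (step-wise average precision) from labels and prediction scores. It is a generic illustration, not the evaluation code used in the study; the function names and the toy data are hypothetical, and other AUPR estimators (for example trapezoidal interpolation) give slightly different values.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney) formulation; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    true_positives = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at each recalled positive
    return total / n_pos


# Toy example (hypothetical scores): 1 = integration site, 0 = background.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(round(auroc(toy_labels, toy_scores), 4), round(average_precision(toy_labels, toy_scores), 4))
```

As a point of reference, a classifier that scores samples at random has an expected AUROC of 0.5 and an expected average precision close to the positive fraction, which is about 1/3 for a test set with 1264 positive and 2528 negative samples.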
We downloaded the following genomic features from different datasets [16-18] and grouped them into four subgroups: (1) DNase Clusters, Fragile site, RepeatMasker; (2) CpG islands, GeneHancer; (3) Cons 20 Mammals, TCGA Pan-Cancer; (4) H3K4Me3 ChIP-seq, H3K27ac ChIP-seq (S2 Fig). After obtaining the positions of the genomic feature data (sources are listed in S2 Table), we extended the positions to 2000 bp and extracted the corresponding sequences from the hg38 reference genome. We defined these sequences as positive genomic feature samples. Then we mixed HBV integration sequences, positive genomic feature samples, and randomly picked negative genomic feature samples (see Methods) together and trained the DeepHBV model. Once a subgroup performed well, we re-tested each genomic feature in that subgroup to determine which specific genomic feature affected the model performance significantly (S2 Fig) (AUROC and AUPR values are recorded in S3 Table). From the ROC and PR curves, we found that adding the genomic features repeat (AUROC: 0.8378 and AUPR: 0.7535) or TCGA Pan Cancer (AUROC: 0.9430 and AUPR: 0.9310) to the HBV integration sequences significantly improved HBV integration site prediction compared with DeepHBV trained on HBV integration sequences alone (Fig 2). We also performed the same test on DeepHINT, but did not find a subgroup that substantially improved the model performance (results are recorded in S3 Table). Together, these results show that adding repeat or TCGA Pan Cancer peaks to the HBV integration sequences significantly improves the performance of DeepHBV.

Validation of DeepHBV using independent dataset VISDB
To assess whether DeepHBV generalises to other datasets, we tested the pre-trained DeepHBV models (DeepHBV with HBV integration sequences + repeat peaks and DeepHBV with HBV integration sequences + TCGA Pan Cancer peaks) on the HBV integration sites dataset in another virus integration site (VIS) database, VISDB [19]. The model trained with HBV integration sequences + repeat sequences showed an AUROC of 0.6657 and an AUPR of 0.5737, while the model trained with HBV integration sequences + TCGA Pan Cancer showed an AUROC of 0.7603 and an AUPR of 0.6189. The DeepHBV model with HBV integration sequences + TCGA Pan Cancer performed better than the DeepHBV model with HBV integration sequences + repeat, and was more robust on both the testing dataset from dsVIS (AUROC: 0.9430 and AUPR: 0.9310) and the independent testing dataset from VISDB (AUROC: 0.7603 and AUPR: 0.6189). Thus, we decided to use this model for further study of HBV integration sites.

Study the preference pattern of HBV integration by conserved sequence elements
DeepHBV can extract features with translation invariance through the pooling operation, which enables it to recognise certain patterns even if the features are slightly shifted. Including an attention mechanism in the DeepHBV framework may partly open the deep learning black box by assigning an attention weight to each position. Each attention weight represents the computational importance of that position in the DeepHBV judgement. The attention weights in the attention layer were extracted after two de-convolution operations and one de-pooling operation, and the output shape is 667×1. Each score represents the attention weight of a 3 bp region. Positions with higher attention weights may have a greater impact on pattern recognition by DeepHBV, meaning that these positions may be critical for identifying HBV integration positive samples. We first averaged the fractions of attention scores in all HBV integration sequences and normalized them to the mean of all positions. We then visualised the fractions of attention scores and found that peak-valley-peak patterns appeared only in positive samples (Fig 3). We were interested in the positions with higher attention weights in the convolutional neural network. In the attention weight distribution of DeepHBV with HBV integration sites + TCGA Pan Cancer, a cluster of attention weights much higher than the others often occurred in the positive samples, whereas the model of DeepHBV with HBV integration sites + repeat did not show this pattern (Fig 3). To further explore the pattern behind these positions with higher attention weights, we defined the sites with the top 5% highest attention weights as attention intensive sites, and the regions of 10 bp near them as attention intensive regions. We mapped these attention intensive sites onto the hg38 reference genome together with genomic features (Fig 4), but found that the positional relationship between attention intensive sites and genomic features was not clear. These results indicate that there may be other specific patterns closely related to HBV integration preference that can be recognized by the DeepHBV model.
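The selection of attention intensive sites and regions described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the 3 bp bin size and the top 5% cut-off come from the text, while interpreting "regions of 10 bp near them" as a flank of 10 bp on each side, the handling of ties, and the function name are assumptions.

```python
import math

BIN_SIZE = 3         # each attention weight covers a 3 bp region (from the text)
TOP_FRACTION = 0.05  # top 5% highest attention weights (from the text)
FLANK = 10           # assumption: "regions of 10 bp near them" read as +/- 10 bp


def attention_intensive_regions(weights, seq_len=2000):
    """Return (start, end) coordinates, 0-based and end-exclusive, within the sample."""
    n_top = max(1, math.ceil(len(weights) * TOP_FRACTION))
    ranked_bins = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    regions = []
    for bin_index in sorted(ranked_bins[:n_top]):
        site_start = bin_index * BIN_SIZE
        site_end = min(site_start + BIN_SIZE, seq_len)
        regions.append((max(0, site_start - FLANK), min(seq_len, site_end + FLANK)))
    return regions


# Hypothetical attention profile with 667 weights for a 2000 bp sample.
toy_weights = [i / 667 for i in range(667)]
regions = attention_intensive_regions(toy_weights)
print(len(regions), regions[0], regions[-1])
```

Coordinates returned here are relative to the 2 kb sample; mapping them onto hg38 would require adding the genomic start position of each sample.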
Convolution and pooling modules learn patterns with translation invariance: a deep learning network tends to learn domains that recur among different samples in the same pooling matrix, even if the learned feature is not at the same position in these different samples [20,21]. Attention intensive regions are therefore more likely to be conserved, owing to the translation invariance of the convolution and pooling module, and could give hints about the selection preference of HBV integration sites. Transcription factor binding site (TFBS) motifs are conserved genomic elements that can be critical to the regulation of downstream genes. Therefore, we tested whether TFBS play important roles in HBV integration preference. We used all HBV integration samples with prediction scores higher than 0.95 from dsVIS and VISDB separately to search for enriched local TFBS motifs in attention intensive regions using HOMER v 4.11.1 [22] with its vertebrate transcription factor databases (Table 1). In the results of DeepHBV with HBV integration sequences + TCGA Pan Cancer, binding sites of AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra, Foxo3, HEB, HIC1, HIF-1b, LRF, Meis1, MITF, MNT, MyoG, n-Myc, NPAS2, NPAS, Nr5a2, Ptf1a, Snail1, Tbx5, Tbx6, TCF7, TEAD1, TEAD3, TEAD4, TEAD, Tgif1, Tgif2, THRb, USF1, Usf2, Zac1, ZEB1, ZFX, ZNF692 and ZNF711 were enriched in the attention intensive regions of both dsVIS and VISDB sequences. We selected two representative samples to obtain a more intuitive display.
Genomic features, HBV integration sites from dsVIS and VISDB, attention intensive sites and TFBS were aligned and shown on the hg38 reference genome (Fig 4). Most attention intensive sites could be mapped to enriched TF motifs. The clusters of high attention weights from the output of DeepHBV with HBV integration sites plus TCGA Pan Cancer corresponded to the binding sites of the tumour suppressor gene HIC1 and the circadian clock-related elements BMAL1, CLOCK, c-Myc and NPAS2 (Fig 4). These data provide novel insights into HBV integration site selection preference and reveal biological importance that warrants future experimental confirmation.

Table 1. Enriched TFBS from attention intensive regions of DeepHBV with HBV integration sites + TCGA Pan Cancer peaks.
HOMER known results                 HOMER de novo results
Rank  Name              P-value     Rank  Best Match/Details  P-value
1     BMAL1             1E-323      1     TEAD3               1E-2283
2     NPAS              1.00E-259   2     EBF1                1E-1926
3     CLOCK             1.00E-165   3     TCF7                1E-958
4     c-Myc             1.00E-126   4     GRHL2               1E-504
5     ZFX               1.00E-108   5     Dux                 1E-477
6     Tgif2             1.00E-75    6     Ptf1a               1E-465
7     MNT               1.00E-71    7     TEAD                1E-385
8     LRF               1.00E-62    8     Ahr::Arnt           1.00E-302
9     Tbx5              1.00E-62    9     Sox5                1.00E-245
10    ZNF711            1.00E-57    10    TEAD                1.00E-233
11    n-Myc             1.00E-54    11    Zic2                1.00E-204
12    ZNF416            1.00E-52    12    Nr2e3               1.00E-197
13    USF1              1.00E-47    13    SOX18               1.00E-182
14    bHLHE40           1.00E-45    14    ZBTB14              1.00E-174
15    Rbpj1             1.00E-36    15    USF2                1.00E-153
16    Zac1              1.00E-35    16    Isl1                1.00E-142
17    Tgif1             1.00E-32    17    ZNF264              1.00E-142
18    ZEB1              1.00E-30    18    Ascl2               1.00E-133
19    THRb              1.00E-29    19    ZNF460              1.00E-120
20    Ptf1a             1.00E-29    20    LRF                 1.00E-117
21    bHLHE41           1.00E-29    21    ZNF416              1.00E-117
22    TEAD1             1.00E-27    22    PKNOX1              1.00E-103
23    Stat3             1.00E-24    23    Bcl6b               1.00E-91
24    Meis1             1.00E-21    24    Arnt                1.00E-90
25    c-Myc             1.00E-21    25    Osr2                1.00E-88
26    Usf2              1.00E-20    26    TFAP2A              1.00E-79
27    NPAS2             1.00E-17
28    HIC1              1.00E-17
29    TEAD              1.00E-17
30    TEAD4             1.00E-16
31    AR-halfsite       1.00E-16
32    STAT6             1.00E-15
33    TCF4              1.00E-13
34    MITF              1.00E-13
35    TEAD3             1.00E-13
36    Atf1              1.00E-12
37    HIF-1b            1.00E-11
38    Foxo3             1.00E-10
39    E2A               1.00E-09
40    TEAD2             1.00E-09
41    Mef2a             1.00E-08
42    ZNF692            1.00E-07
43    Nkx3.1            1.00E-07
44    COUP-TFII         1.00E-07
45    MyoG              1.00E-07
46    Nkx2.5            1.00E-06
47    Snail1            1.00E-05
48    HEB               1.00E-05
49    Tbx6              1.00E-05
50    SCRT1             1.00E-04
51    Nr5a2             1.00E-04
52    Nanog             1.00E-03
53    Oct11             1.00E-03
54    Elk1              1.00E-03
55    Erra              1.00E-03
56    Gata6             1.00E-03
57    BHLHA15           1.00E-03
58    AMYB              1.00E-03
59    Nr5a2             1.00E-03
60    NFkB-p65-Rel      1.00E-02
61    Zic               1.00E-02
62    TRPS1             1.00E-02
63    Hoxa9             1.00E-02
64    HIF2a             1.00E-02
65    Isl1              1.00E-02
66    CEBP:AP1          1.00E-02
67    EWS:FLI1-fusion   1.00E-02
68    FOXK1             1.00E-02
69    ETS               1.00E-02

Discussion
In this study, we developed an explainable attention-based deep learning model, DeepHBV, to predict HBV integration sites. In the comparison of DeepHBV and DeepHINT on predicting HBV integration sites (S3 Table), DeepHBV out-performed DeepHINT after adding genomic features due to its more suitable model structure and parameters on recognising the surroundings of HBV integration sites.
We applied two convolution layers (1st layer: 128 convolution kernels with a kernel size of 8; 2nd layer: 256 convolution kernels with a kernel size of 6) and one pooling layer (with a pooling size of 3) in DeepHBV, whereas DeepHINT has only one convolution layer (64 convolution kernels with a kernel size of 6) and one pooling layer (with a pooling size of 3). Adding convolution layers allows information from higher dimensions to be extracted, and adding convolution kernels allows more feature information to be extracted [23]. We trained the DeepHBV model using three strategies: (1) DNA sequences near HBV integration sites (HBV integration sequences), (2) HBV integration sequences + TCGA Pan Cancer peaks, (3) HBV integration sequences + repeat peaks. We found that adding either TCGA Pan Cancer or repeat peaks to the HBV integration sequences significantly improved the model performance, and that DeepHBV with HBV integration sequences plus TCGA Pan Cancer peaks performed better on the independent test dataset VISDB. However, the attention intensive regions could not be well aligned to these genomic features. Thus, we further inferred that other features such as TFBS motifs may influence DeepHBV in the prediction process. HOMER was therefore applied to recognise TFBS that might be related to HBV-related diseases or cancer development.
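To make the layer sizes quoted above concrete, the following sketch works through the sequence length after two convolutions (kernel sizes 8 and 6) and one pooling step (size 3) for a 2000 bp input. It is only an arithmetic illustration under stated assumptions: stride 1 for the convolutions, pooling stride equal to the pooling size, the layer order convolution, convolution, pooling, and the two common padding conventions ("same" and "valid"). None of these choices is specified in this section, and the helper names are hypothetical.

```python
import math


def conv_length(n, kernel, padding):
    """Output length of a stride-1 1D convolution."""
    return n if padding == "same" else n - kernel + 1


def pool_length(n, size, padding):
    """Output length of 1D pooling with stride equal to the pool size."""
    return math.ceil(n / size) if padding == "same" else n // size


def deephbv_like_length(n=2000, padding="same"):
    # Assumed layer order: conv (kernel 8) -> conv (kernel 6) -> pool (size 3).
    n = conv_length(n, 8, padding)
    n = conv_length(n, 6, padding)
    return pool_length(n, 3, padding)


for mode in ("same", "valid"):
    print(mode, deephbv_like_length(2000, mode))
```

Under the "same" convention the length after pooling is ceil(2000 / 3) = 667, which agrees with the 667×1 attention output reported in the Results, where each weight covers a 3 bp region; under "valid" padding the length would be 662.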
We noticed that the attention intensive regions identified by the attention mechanism of DeepHBV with HBV integration sequences + TCGA Pan Cancer were strongly concentrated on the binding sites of the tumour suppressor gene HIC1, the circadian clock-related elements BMAL1, CLOCK, c-Myc and NPAS2, and the transcription factors TEAD and Nr5a2. These DNA binding proteins are closely related to tumour development [24-30]. For instance, HIC1 is a tumour suppressor gene in hepatocarcinogenesis [24,25]. BMAL1, CLOCK, c-Myc and NPAS2 all participate in the regulation of the circadian clock [26], which is reported to promote HBV-related diseases [27,28]. In accordance, the binding motifs of circadian clock-related elements were also enriched in the attention intensive regions of DeepHBV with HBV integration sequences + repeats, further confirming the results (S4 Table). The other transcription factors identified by DeepHBV are TEAD and Nr5a2. TEAD deregulation affects well-established cancer genes such as BRAF, KRAS, MYC, NF2 and LKB1, and shows high correlation with clinicopathological parameters in human malignancies [29]. Nr5a2 (also known as liver receptor homolog-1, LRH-1) binds to the enhancer II (ENII) of HBV genes and serves as a critical regulator of their expression [30].

In summary, DeepHBV is a robust deep learning model that uses a convolutional neural network to predict HBV integrations. Our data provide new insight into the preference of HBV integration and into mechanism research on HBV-induced cancer.
Methods
Data preparation
Detailed step-by-step instructions for DeepHBV are provided in S1 and S2 Notes. To obtain positive training and testing samples for DeepHBV, we extracted the 1000 bp upstream and the 1000 bp downstream of each HBV integration site as the positive dataset. Each sample was denoted as $S = (n_1, n_2, \ldots, n_{2000})$, where $n_i$ represents the nucleotide at position $i$. As a deep learning network, DeepHBV also requires negative samples that do not contain HBV integration sites, to serve as background. The existence of HBV integration hot spots, which contain several integration events within a 30~100 kb range [13], prompted us to select background regions that keep sufficient distance from known HBV integration sites. We therefore discarded regions of 50 kb around known HBV integration sites on the hg38 reference genome and randomly selected 2 kb DNA sequences from the remaining regions as negative samples. We encoded the extracted DNA sequences with one-hot code so that distances and similarities between features could be calculated more accurately during training. Each nucleotide of the original DNA sequence was converted to a 4-bit binary vector in which each dimension corresponds to one nucleotide type. Thus, a 2000 bp DNA sequence was converted into a 2000×4 binary matrix.

Feature extraction
The DeepHBV model first applied a convolution and pooling module to learn and
obtain sequence features around HBV integration sites (S1 Fig). Each binary matrix representing a DNA sequence entered the convolution and pooling module, where the convolution was computed. We employed multiple convolution kernels with different weights in order to obtain different features. Let $S = (n_1, n_2, \ldots, n_{2000})$ denote a specific DNA sequence and $E$ the binary matrix encoded from $S$. The calculation in the convolution layer, $X = \mathrm{conv}(E)$, can be described as:

$$X_{k,i} = \sum_{j=0}^{p-1} \sum_{l=1}^{L} W_{k,j,l} \, E_{l,i+j} \qquad (1)$$

where $1 \le k \le d$, $d$ refers to the number of kernels, $1 \le i \le n - p + 1$, $i$ refers to the index, $p$ refers to the kernel size, $n$ refers to the input sequence length, $W$ refers to the kernel weight, and $L$ is the number of input channels summed over (the four one-hot nucleotide dimensions of $E$). After extracting the relevant eigenvectors (that is, feature vectors), the convolution layer activated them with a Rectified Linear Unit (ReLU). ReLU is an activation function in artificial neural networks that can be described as $f(x) = \max(0, x)$. We applied ReLU to the output matrix of each convolution layer, which sets negative elements to zero and yields a sparse matrix. ReLU imitates the activation of real neurons, which helps the model fit the data. We then applied max-pooling to reduce dimensionality while retaining as much predictive information as possible. At this point, the convolution and pooling module had produced the final eigenvector $F_c$ from the binary matrix representing the DNA sequence.

Attention mechanism in DeepHBV model
DeepHBV added an attention mechanism in order to capture and understand the
contribution of each position in the extracted eigenvector $F_c$. The eigenvector entered the attention layer, which calculated a weight for each dimension of $F_c$. The attention weight represents the contribution of the convolutional neural network (CNN) output at that position. The output of the attention layer, $t_j$, is the contribution score; a larger $t_j$ means that the position contributes more to HBV integration site prediction. All contribution scores were normalised to obtain the dense eigenvector, denoted $F_a$:

$$F_a = \sum_{j=1}^{q} a_j v_j \qquad (2)$$

where

$$a_j = \frac{\exp(t_j)}{\sum_{i=1}^{q} \exp(t_i)} \qquad (3)$$

Here $a_j$ represents the normalised score and $v_j$ represents the eigenvector at position $j$ of the input eigenmatrix. Each position represents an extracted eigenvector in each convolution kernel. The convolution-pooling module and the attention mechanism module need to be combined during prediction; in other words, the eigenvector $F_c$ and the corresponding importance score $F_a$ work together in HBV integration site prediction. We concatenated the values in $F_c$ and linearly mapped them to a new vector $F_v$:

$$F_v = \mathrm{dense}(\mathrm{flatten}(F_c)) \qquad (4)$$

In this step, the flatten layer performed the function $\mathrm{flatten}()$ to reduce dimension and concatenate data, and the dense layer executed the function $\mathrm{dense}()$, which maps the dimension-reduced data to a single value.
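The encoding, convolution and attention steps described so far can be sketched in a few lines of NumPy. This is a minimal illustration rather than the DeepHBV implementation: the function names are hypothetical, the sequence is stored as a (length × 4) matrix, unknown bases such as N are assumed to map to an all-zero row, and the attention scores $t_j$ are simply passed in as an argument because this section does not specify how the attention layer computes them.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # assumed channel order; the source does not state one


def one_hot_encode(seq):
    """Encode a DNA string as a (len(seq), 4) binary matrix."""
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, nt in enumerate(seq.upper()):
        if nt in index:  # unknown bases (e.g. N) stay all-zero (assumption)
            encoded[pos, index[nt]] = 1.0
    return encoded


def conv1d_relu(E, W):
    """Convolution of Eq. (1) followed by ReLU.

    E: (n, 4) one-hot matrix; W: (d, p, 4) kernel weights.
    Returns an (n - p + 1, d) matrix of activated features.
    """
    n = E.shape[0]
    d, p, _ = W.shape
    X = np.zeros((n - p + 1, d), dtype=np.float32)
    for i in range(n - p + 1):
        window = E[i:i + p]  # (p, 4) slice starting at position i
        # Sum over kernel offset j and nucleotide channel l for every kernel k.
        X[i] = np.tensordot(W, window, axes=([1, 2], [0, 1]))
    return np.maximum(X, 0.0)  # ReLU: f(x) = max(0, x)


def attention_pool(V, t):
    """Eqs. (2)-(3): softmax-normalise scores t and weight the rows of V.

    V: (q, m) matrix whose rows are the vectors v_j; t: (q,) scores t_j.
    Returns the weights a (q,) and F_a = sum_j a_j v_j (m,).
    """
    t = np.asarray(t, dtype=np.float64)
    t = t - t.max()  # subtracting the maximum leaves the softmax unchanged
    a = np.exp(t) / np.exp(t).sum()
    return a, a @ V


if __name__ == "__main__":
    E = one_hot_encode("ACGTAC")
    W = np.ones((2, 3, 4), dtype=np.float32)  # two toy kernels of size 3
    X = conv1d_relu(E, W)
    a, F_a = attention_pool(X, np.zeros(X.shape[0]))
    print(X.shape, a, F_a)
```

With equal scores the weights are uniform and $F_a$ is the plain average of the rows, which gives a quick sanity check; unequal scores shift $F_a$ towards the rows with larger $t_j$.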
Then the concatenation of $F_v$ and $F_a$ entered the linear classifier to calculate the probability that HBV integration happened within the current sequence:

$$P = \mathrm{sigmoid}(\mathrm{concat}(F_a, F_v)) \qquad (5)$$

where $P$ is the predicted score, $\mathrm{sigmoid}()$ is the activation function acting as the classifier in the final output, and $\mathrm{concat}()$ is the concatenation operation. Meanwhile, if the output eigenvector $F_c$ of the convolution-and-pooling module is given as input to the attention mechanism, the weight vector $W$ can be obtained:

$$W = \mathrm{att}(a_1, a_2, \ldots, a_q) \qquad (6)$$

where $\mathrm{att}()$ refers to the attention mechanism, $a_i$ denotes the eigenvector in the $i$th dimension of the eigenmatrix, and $W$ contains the contribution scores of each position in the eigenmatrix extracted by the convolution-and-pooling module.

DeepHBV model training
After confirming each parameter in DeepHBV (S1 Table), we trained the deep learning neural network model DeepHBV with binary cross-entropy. The loss function of DeepHBV can be defined as:

$$\mathrm{loss} = -\sum_i \left[ y_i \log(P_i) + (1 - y_i) \log(1 - P_i) \right] \qquad (7)$$

where $y_i$ is the binary tag value of sequence $i$ (in this dataset, positive samples were labelled as 1 and negative samples as 0) and $P_i$ is the predicted score for that sequence. The back-propagation algorithm was used during training, and the Nesterov-accelerated adaptive moment estimation (Nadam) gradient descent algorithm was applied to optimise the parameters.
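To make the loss in Eq. (7) concrete, the following NumPy sketch evaluates the binary cross-entropy for a handful of sequences, taking $y_i$ as the 0/1 label and $P_i$ as the sigmoid output of the classifier. It is an illustration under stated assumptions, not the training code of DeepHBV: the function names are hypothetical, and the small clipping constant `eps` is added here only to avoid taking the logarithm of zero.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used as the final activation in Eq. (5)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=np.float64)))


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Summed binary cross-entropy of Eq. (7).

    y_true: 0/1 labels (1 = sequence with an integration site, 0 = background).
    p_pred: predicted probabilities in [0, 1].
    eps:    clipping constant (an assumption of this sketch) to avoid log(0).
    """
    y = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(p_pred, dtype=np.float64), eps, 1.0 - eps)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = sigmoid([2.0, -1.5, 0.3, 0.8])  # toy classifier outputs
    print("loss:", binary_cross_entropy(labels, scores))
```

The sketch sums over samples as Eq. (7) is written; deep learning libraries commonly report the mean over a batch instead, which differs only by a constant factor for a fixed batch size.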
The model was implemented in Python 3.7 with the Keras library 2.2.4 [14], and three NVIDIA® Tesla V100-PCIE-32G GPUs (NVIDIA Corporation, California, USA) were used for training and testing. With these software and hardware settings, DeepHBV takes around 90 min for model training and 30 s for testing.

Data Availability
DeepHBV is available as open-source software and can be downloaded from https://github.com/JiuxingLiang/DeepHBV.git

References
1. Liang TJ. Hepatitis B: the virus and disease. Hepatology 2009;49(5 Suppl):S13-21.
2. Tu T, Budzinska MA, Shackel NA et al. HBV DNA Integration: Molecular Mechanisms and Clinical Implications. Viruses 2017;9(4).
3. Sung WK, Zheng H, Li S et al. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma. Nat Genet 2012;44(7):765-9.
4. Zhao LH, Liu X, Yan HX et al. Genomic and oncogenic preference of HBV integration in hepatocellular carcinoma. Nat Commun 2016;7:12992.
5. Ding D, Lou X, Hua D et al. Recurrent targeted genes of hepatitis B virus in the liver cancer genomes identified by a next-generation sequencing-based approach. PLoS Genet 2012;8(12):e1003065.
6. Tu T, Budzinska MA, Vondran FWR et al. Hepatitis B Virus DNA Integration Occurs Early in the Viral Life Cycle in an In Vitro Infection Model via Sodium Taurocholate Cotransporting Polypeptide-Dependent Uptake of Enveloped Virus Particles. J Virol 2018;92(11).
7. Mason WS, Gill US, Litwin S et al.
HBV DNA Integration and Clonal Hepatocyte Expansion in Chronic Hepatitis B Patients Considered Immune Tolerant. Gastroenterology 2016;151(5):986-998 e4.
8. Litjens G, Kooi T, Bejnordi BE et al. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60-88.
9. Bailey TL, Baker ME, Elkan CP. An artificial intelligence approach to motif discovery in protein sequences: Application to steroid dehydrogenases. The Journal of Steroid Biochemistry and Molecular Biology 1997;62(1):29-44.
10. Yamashita R, Nishio M, Do RKG et al. Convolutional neural networks: an overview and application in radiology. Insights into Imaging 2018;9(4):611-629.
11. Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate. Computer Science 2014.
12. Guidotti R, Monreale A, Ruggieri S et al. A Survey of Methods for Explaining Black Box Models. ACM Comput. Surv. 2018;51(5):Article 93.
13. Hu Z, Zhu D, Wang W et al. Genome-wide profiling of HPV integration in cervical cancer identifies clustered genomic hot spots and a potential microhomology-mediated integration mechanism. Nat Genet 2015;47(2):158-63.
14. Chollet F et al. Keras. 2015.
15. Hu H, Xiao A, Zhang S et al. DeepHINT: understanding HIV-1 integration via deep learning with attention. Bioinformatics 2019;35(10):1660-1667.
16. Haeussler M, Zweig AS, Tyner C et al. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res 2019;47(D1):D853-D858.
17. Inoue F, Kircher M, Martin B et al.
A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. Genome Res 2017;27(1):38-52.
18. Robinson JT, Thorvaldsdottir H, Winckler W et al. Integrative genomics viewer. Nature Biotechnology 2011;29(1):24-26.
19. Tang D, Li B, Xu T et al. VISDB: a manually curated database of viral integration sites in the human genome. Nucleic Acids Res 2019.
20. Zhang W, Itoh K, Tanida J et al. Parallel distributed processing model with local space-invariant interconnections and its optical architecture. Appl Opt 1990;29(32):4790-7.
21. Bruna J, Zaremba W, Szlam A et al. Spectral Networks and Locally Connected Networks on Graphs. Computer Science 2013.
22. Heinz S, Benner C, Spann N et al. Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Molecular Cell 2010;38(4):576-589.
23. Seide F, Gang L, Dong Y. Conversational speech transcription using context-dependent deep neural networks. 2012.
24. Taniguchi K, Roberts LR, Aderca IN et al. Mutational spectrum of beta-catenin, AXIN1, and AXIN2 in hepatocellular carcinomas and hepatoblastomas. Oncogene 2002;21(31):4863-71.
25. Zheng J, Xiong D, Sun X et al. Signification of Hypermethylated in Cancer 1 (HIC1) as Tumor Suppressor Gene in Tumor Progression. Cancer Microenviron 2012;5(3):285-93.
26. Paibomesai MI, Moghadam HK, Ferguson MM et al.
Clock genes and their genomic distributions in three species of salmonid fishes: Associations with genes regulating sexual maturation and cell cycling. BMC Res Notes 2010;3:215.
27. Fekry B, Ribas-Latre A, Baumgartner C et al. Incompatibility of the circadian protein BMAL1 and HNF4alpha in hepatocellular carcinoma. Nat Commun 2018;9(1):4349.
28. Mukherji A, Bailey SM, Staels B et al. The circadian clock and liver function in health and disease. J Hepatol 2019;71(1):200-211.
29. Huh HD, Kim DH, Jeong HS et al. Regulation of TEAD Transcription Factors in Cancer Biology. Cells 2019;8(6).
30. Cai YN, Zhou Q, Kong YY et al. LRH-1/hB1F and HNF1 synergistically up-regulate hepatitis B virus gene transcription and DNA replication. Cell Research 2003;13(6):451-458.

Figure legends
Figure 1. The deep learning framework applied in DeepHBV. (a) Scheme of encoding a 2 kb DNA sequence into a binary matrix using one-hot code; (b) a brief flowchart of the DeepHBV structure; the matrix shape is given in brackets, and a detailed flowchart is shown in S1 Fig.
Figure 2. Evaluation of DeepHBV and DeepHINT model prediction performance on the test dataset. (a) Receiver-operating characteristic (ROC) curves and (b) precision-recall (PR) curves.
“DeepHBV with HBV integration sequences” refers to the DeepHBV model with only HBV integration sequences as input; “DeepHINT with HBV integration sequences” refers to the DeepHINT model with only HBV integration sequences as input; “DeepHBV with HBV integration sequences + repeat” refers to DeepHBV with HBV integration sequences and repeat sequences as input; “DeepHBV with HBV integration sequences + TCGA Pan Cancer” refers to DeepHBV with HBV integration sequences and TCGA Pan Cancer sequences as input; “DeepHBV with HBV integration sequences + repeat + (test) VISDB” refers to DeepHBV using HBV integration sequences and repeat sequences for training and VISDB as the independent test dataset; “DeepHBV with HBV integration sequences + TCGA Pan Cancer + (test) VISDB” refers to DeepHBV using HBV integration sequences and TCGA Pan Cancer sequences for training and VISDB as the independent test dataset.
Figure 3. The attention weight distribution analysed by DeepHBV with HBV integration sequences + genomic features. (a) DeepHBV with HBV integration sequences + TCGA Pan Cancer peaks; (b) DeepHBV with HBV integration sequences + repeat peaks. The left graphs show the fractions of attention weight, averaged over all samples and normalised to the average of all positions; each index represents a 3 bp region because of the convolution and pooling operations. The graphs on the right show representative attention weight distributions of positive and negative samples.
Figure 4.
Attention intensive regions highlighted essential local genomic features on predicting HBV integration sites. Representative examples show the positional relationship between the attention intensive sites and several genomic features using the DeepHBV with HBV integration sequences + TCGA Pan Cancer model on (a) chr5:1,294,063-1,296,063 (hg38) and (b) chr5:1,291,277-1,293,277 (hg38). Each of these two sequences contains HBV integration sites from both dsVIS and VISDB. Enriched DNA binding proteins were detected by HOMER from the attention intensive regions in the output of DeepHBV; we then applied FIMO [1] to find the positions of the enriched motifs and labelled the motifs on the attention intensive regions. The UCSC Genome Browser [2] and Matplotlib [3] were used for visualisation. “HPV integration site” refers to the sites selected from our unpublished database and used as testing samples. “Attention Intensive Sites” denotes the sites with the top 5% attention weight. “RepeatMasker”, “TCGA Pan Cancer”, “DNase Clusters”, “Con20mammals”, “GeneHancer”, “Layered H3K27ac” and “Layered H3K36me3” are genomic features.

References
1. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics 2011;27(7):1017-8.
2. Haeussler M, Zweig AS, Tyner C et al. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res 2019;47(D1):D853-D858.
3. Hunter JD. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering 2007;9(3):90-95.
Supporting information
S1 Fig. DeepHBV framework. Each part represents a layer in the neural network, and $n \times n$ stands for the output dimension, which is explained in S2 Note. Two consecutive convolution layers were used to extract features; the max-pooling layer reduces the dimension while preserving the predictive information in the feature matrix; the dropout layer randomly drops some results to prevent over-fitting; the flatten layer reduces the dimensions and concatenates them; the dense layer maps the output of the previous layer to a specific value; the attention layer and attention flatten layer give a weight score to each dimension in the feature matrix; the concatenate layer concatenates the captured features from the convolution module with the importance scores of those features from the attention mechanism module. The prediction output gives the final probability of HBV integration.
S2 Fig. Prediction performance on the HBV integration dataset with different types of genomic features added in. Adding character group 1 or character group 3 gave a significant increase in AUPR and AUROC over the baseline DeepHBV model, indicating that DeepHBV can capture genomic features from these groups effectively. We therefore analysed each single item in character groups 1 and 3 and found that Repeats and TCGA Pan Cancer are the genomic features captured by DeepHBV that significantly improved model performance.
DeepHBV with HBV integration sequences + repeats reached an AUROC of 0.8378 and an AUPR of 0.7535, while DeepHBV with HBV integration sequences + TCGA Pan Cancer reached an AUROC of 0.9430 and an AUPR of 0.9310.
S1 Table. The parameters for the deep neural network used in DeepHBV.
S2 Table. Genomic features and sources. (Access date: November 16th, 2019)
S3 Table. Comparison of DeepHBV and DeepHINT result record.
S4 Table. Enriched TFBS from attention intensive regions of DeepHBV with HBV integration sites + repeat peaks.
S1 Note. DeepHBV framework. The DeepHBV neural network structure design and the hyperparameters involved in DeepHBV are noted.
S2 Note. Mathematical matters of DeepHBV. This part explains 8 mathematical matters of DeepHBV (i.e. encoding DNA sequences, convolution layers, the max pooling layer, dropout layer, attention layer, concatenate layer, linear classifier and optimisation algorithm).
10_1101-2021_01_08_425864 ----

Fatty acid oxidation participates of the survival to starvation, cell cycle progression and differentiation in the insect stages of Trypanosoma cruzi

Rodolpho Ornitz Oliveira Souza¹, Flávia Silva Damasceno¹, Sabrina Marsiccobetre¹, Marc Biran², Gilson Murata³, Rui Curi⁴, Frédéric Bringaud⁵, Ariel Mariano Silber¹*

¹ University of São Paulo, Laboratory of Biochemistry of Tryps – LaBTryps, Department of Parasitology, Institute of Biomedical Sciences – São Paulo, SP, Brazil
² Centre de Résonance Magnétique des Systèmes Biologiques (RMSB), Université de Bordeaux, CNRS UMR-5536, Bordeaux, France
³ University of São Paulo, Department of Physiology, Institute of Biomedical Sciences – São Paulo, SP, Brazil
⁴ Cruzeiro do Sul University, Interdisciplinary Post-Graduate Program in Health Sciences - São Paulo, SP, Brazil
⁵ Laboratoire de Microbiologie Fondamentale et Pathogénicité (MFP), Université de Bordeaux, CNRS UMR-5234, Bordeaux, France

*Corresponding author
E-mail: asilber@usp.br (AMS)

.CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 8, 2021. ; https://doi.org/10.1101/2021.01.08.425864doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425864 http://creativecommons.org/licenses/by/4.0/

Abstract
During its complex life cycle, Trypanosoma cruzi colonizes different niches in its insect and mammalian hosts.
This characteristic determined the types of parasites that adapted to face challenging environmental cues. The primary environmental challenge, particularly in the insect stages, is poor nutrient availability. These T. cruzi stages could be exposed to fatty acids originating from the degradation of the perimicrovillar membrane. In this study, we revisit the metabolic fate of fatty acid breakdown in T. cruzi. Herein, we show that during parasite proliferation, the glucose concentration in the medium can regulate the fatty acid metabolism. At the stationary phase, the parasites fully oxidize fatty acids. [U-¹⁴C]-palmitate can be taken up from the medium, leading to CO₂ production via beta-oxidation. Lastly, we also show that fatty acids are degraded through beta-oxidation. Additionally, through beta-oxidation, electrons are fed directly to oxidative phosphorylation, and acetyl-CoA is supplied to the tricarboxylic acid cycle, which can be used to feed other anabolic pathways such as the de novo biosynthesis of fatty acids.

Author Summary
Trypanosoma cruzi is a protist parasite with a life cycle involving two types of hosts, a vertebrate one (which includes humans, causing Chagas disease) and an invertebrate one (kissing bugs, which vectorize the infection among mammals). In both hosts, the parasite faces environmental challenges such as sudden changes in the metabolic composition of the medium in which they develop, severe starvation, osmotic stress and redox imbalance, among others. Because kissing bugs feed infrequently in nature, an intriguing aspect of T. cruzi biology (it exclusively inhabits the digestive tube of these insects) is how they subsist during long periods of starvation.
In this work, we show that this parasite performs a metabolic switch from glucose consumption to lipid oxidation, and it is able to consume lipids and the lipid-derived fatty acids from both internal origins as well as externally supplied compounds. When fatty acid oxidation is chemically inhibited by etomoxir, a very well-known drug that inhibits the translocation of fatty acids into the mitochondria, the proliferative insect stage of the parasites has dramatically diminished survival under severe metabolic stress and its differentiation into its infective forms is impaired. Our findings place fatty acids in the centre of the scene regarding the parasite's extraordinary resistance to nutrient-depleted environments.

Introduction
T. cruzi, a flagellated parasite, is the causative agent of Chagas disease, a neglected health problem endemic to the Americas [1]. The parasite life cycle is complex, alternating between replicative and non-replicative forms in two types of hosts, mammals and triatomine insects [2]. In mammalian hosts, two primary forms are recognized: replicative intracellular amastigotes and nondividing trypomastigotes, which are released from infected host cells into the extracellular medium. After being released from infected cells, trypomastigotes can spread the infection by infecting new cells, or they can be ingested by a triatomine bug during its blood meal.
Once inside the invertebrate host, the ingested trypomastigotes differentiate into epimastigotes, which initiate their proliferation and colonization of the insect digestive tract [3]. Once the epimastigotes reach the final portion of the digestive tube, they initiate differentiation into non-proliferative, infective metacyclic trypomastigotes. These forms will be expelled during a new blood meal and will be able to infect a new vertebrate host [2,4–6].
The diversity of environments through which T. cruzi passes during its life cycle (i.e., the digestive tube of the insect vector, the bloodstream and the mammalian cell cytoplasm) subjects it to different levels of nutrient availability [3,7]. Therefore, this organism evolved a robust, flexible and efficient metabolism [5,8]. As an example, it was recognized early on that epimastigotes are able to rapidly switch their metabolism, allowing the consumption of carbohydrates and different amino acids [9,10]. Several studies identified aspartate, asparagine, glutamate [11], proline [12–14], histidine [15], alanine [11,16] and glutamine [11,17] as oxidisable energy sources. Despite the quantity of accumulated information on amino acid and carbohydrate consumption, little is known about how T. cruzi uses fatty acids and how these compounds contribute to the parasite's metabolism and survival. In this study, we explore fatty acid metabolism in T. cruzi. We also address fatty acid regulation by external glucose levels and the involvement of their oxidation in the replication and differentiation of T. cruzi insect stages.

Methods
Parasites
Epimastigotes of T. cruzi strain CL clone 14 were maintained in the exponential growth phase by sub-culturing them for 48 h in Liver Infusion Tryptose (LIT) medium at 28 °C [18].
Metacyclic trypomastigotes were obtained through the differentiation of epimastigotes at the stationary growth phase in TAU-3AAG (Triatomine Artificial Urine supplemented with 10 mM proline, 50 mM glutamate, 2 mM aspartate and 10 mM glucose) as previously reported [19].

Fatty acid oxidation assays
Preparation of palmitate-BSA conjugates. Sodium palmitate at 70 mM was solubilized in water by heating it to 70 °C. Fatty acid-free BSA (FFA BSA) (Sigma®) was dissolved in PBS and warmed to 37 °C with continuous stirring. Solubilized palmitate was added to the BSA at 37 °C with continuous stirring (for a final concentration of 5 mM in 7% BSA). The palmitate-BSA conjugate was aliquoted and stored at −80 °C [20].

CO2 production from oxidisable carbon sources. To test the production of CO2 from palmitate, glucose or histidine, exponentially growing epimastigotes (5 × 10⁷ mL⁻¹) were washed twice in PBS and incubated for different times (0, 30, 60 and 120 min) in the presence of 0.1 mM of palmitate spiked with 0.2 µCi of 14C-U-substrates. To trap the CO2 produced, Whatman paper soaked in 2 M KOH solution was placed at the top of the tube. The 14CO2 trapped by this reaction was quantified by scintillation [15,16].

1H-NMR analysis of the exometabolome. Epimastigotes (1 × 10⁸ mL⁻¹) were collected by centrifugation at 1,400 × g for 10 min, washed twice with PBS and incubated in 1 mL (single point analysis) of PBS supplemented with 2 g/L NaHCO3 (pH 7.4).
The cells were maintained for 6 h at 27 °C in incubation buffer containing [U-13C]-glucose, non-enriched palmitate or no carbon sources. The integrity of the cells during the incubation was checked by microscopic observation. The supernatant (1 mL) was collected and 50 µl of maleate solution in D2O (10 mM) was added as an internal reference. 1H-NMR spectra were collected at 500.19 MHz on a Bruker Avance III 500 HD spectrometer equipped with a 5 mm Prodigy cryoprobe. The measurements were recorded at 25 °C. The acquisition conditions were as follows: 90° flip angle, 5,000 Hz spectral width, 32 K memory size, and 9.3 sec total recycling time. The measurements were performed with 64 scans for a total time of close to 10 min and 30 sec. The resonances of the obtained spectra were integrated and the metabolite concentrations were calculated using the ERETIC2 NMR quantification Bruker program.

Oxygen consumption. To evaluate the importance of internal fatty acid sources in O2 consumption, exponentially growing parasites were treated or not treated with 500 µM etomoxir (ETO, an inhibitor of carnitine palmitoyltransferase 1), washed twice in PBS and resuspended in Mitochondrial Cellular Respiration (MCR) buffer. The rates of oxygen consumption were measured using intact cells in a high-resolution oxygraph (Oxygraph-2k; Oroboros Instruments, Innsbruck, Austria). Oligomycin A (0.5 µg/mL) and FCCP (0.5 µM) were sequentially added to measure the respiration leak state and the optimal non-coupled respiration, respectively. The data were recorded and treated using DatLab 7 software [15,16,21].

Mitochondrial activity assays
MTT. The parasites were washed twice and incubated in PBS supplemented with 0.1 mM palmitate in 0.35% FFA BSA or with 0.35% FFA BSA alone. PBS supplemented with 5 mM glucose or 5 mM histidine and non-supplemented medium were used as positive and negative controls, respectively. The cell viability was evaluated at 24 h and 48 h after incubation using the MTT assay, as described in [15,16].

Alamar Blue. The parasites were washed twice and incubated in PBS or PBS supplemented with 500 μM ETO in 96-well plates. The plates were maintained at 28 °C during all the experiments. Every 24 h, the cells were incubated with 0.125 μg mL⁻¹ of Alamar Blue reagent and kept at 28 °C for 2 h protected from light. The fluorescence was measured using the wavelengths λexc = 530 nm and λem = 590 nm in the SpectraMax® i3 (Molecular Devices) plate reader.

Measurement of intracellular ATP content
The intracellular ATP levels were assessed using a luciferase assay kit (Sigma-Aldrich®), as described in [15–17]. In brief, the parasites were incubated in PBS supplemented (or not) with 0.1 mM palmitate, 0.35% FFA BSA, 5 mM glucose or 5 mM histidine for 24 h at 28 °C. The ATP concentrations were determined by using a calibration curve with ATP disodium salt (Sigma), and the luminescence at 570 nm was measured as indicated by the manufacturer.

Enzymatic activities
Carnitine palmitoyltransferase 1 (CPT1).
The epimastigotes were washed twice in PBS (1,000 × g, 5 min at 4 °C), resuspended in Tris-EDTA buffer (100 mM/2.5 mM, with 0.1% Triton X-100) containing 1 µM phenylmethyl-sulphonyl fluoride (PMSF), 0.5 mM N-alpha-p-tosyl-lysyl-chloromethyl ketone (TLCK), 0.01 mg aprotinin and 0.1 mM trans-epoxysuccinyl-L-leucyl amido (4-guanidino) butane (E-64) as protease inhibitors (Sigma Aldrich®) and lysed by sonication (5 pulses for 1 min each, 20%). The lysates were clarified by centrifugation at 10,000 × g for 30 min at 4 °C. The soluble fraction was collected and the proteins were quantified by the Bradford method [22] and adjusted to 0.1 mg/mL protein. The reaction mixture contained 0.5 mM L-carnitine, 0.1 mM palmitoyl-CoA and 2.5 mM DTNB in Tris-EDTA buffer (pH = 8.0). The CPT1 activity was measured spectrophotometrically at 412 nm through the reaction of DTNB with free HS-CoA, which forms the TNB⁻ ion. To calculate the specific activity, the absorbance values were converted into molarity by using the TNB⁻ molar extinction coefficient of 12,000 M⁻¹ cm⁻¹ [23]. As a blank, we performed the same assay without adding the substrate. All the enzymatic assays were performed in 96-well plates at a final volume of 0.2 mL in the SpectraMax® i3 (Molecular Devices).

Acetyl-CoA carboxylase (ACC). The ACC activity was measured spectrophotometrically by coupling its enzymatic reaction with that of citrate synthase (CS), which uses oxaloacetate and acetyl-CoA to produce citrate.
Measurements were performed as end-point assays in two steps. First, the reaction mixture contained 100 mM potassium phosphate buffer (pH = 8.0), 15 mM KHCO3, 5 mM MnCl2, 5 mM ATP, 1 mM acetyl-CoA and 0.1 μM biotin. The reaction was initiated by adding 0.1 mg of cell extract and incubated for 15 min at 28 °C. The reaction was stopped by adding perchloric acid 40% (v/v) and the mixture was centrifuged at 10,000 × g for 15 min at 4 °C. The second reaction was performed by using 0.1 mL of the supernatant from the first reaction, 20 mM oxaloacetate and 0.5 mM of DTNB in 100 mM potassium phosphate buffer (pH = 8.0). The reaction was initiated by adding 0.5 units of CS (Sigma Aldrich®). To calculate the specific activity of ACC, we converted the absorbance values to molarity by using the TNB⁻ molar extinction coefficient of 12,000 M⁻¹ cm⁻¹. For the blank reaction, we performed the same assay without acetyl-CoA [24].

Hexokinase (HK). The HK activity was measured as described in [25]. Briefly, the activity was measured by coupling the hexokinase activity with a commercial glucose-6-phosphate dehydrogenase (G6PD, Sigma), which oxidizes the glucose-6-phosphate resulting from the HK activity with the concomitant reduction of NADP+ to NADPH. The resulting NADPH was spectrophotometrically monitored at 340 nm. The reaction mixture contained 50 mM triethanolamine buffer pH 7.5, 5 mM MgCl2, 100 mM KCl, 10 mM glucose, 5 mM ATP and 5 U of commercial G6PD. To calculate the specific activity, the absorbance values were converted to molarity using the NADP(H) molar extinction coefficient of 6,220 M⁻¹ cm⁻¹.

Serine palmitoyltransferase (SPT). The SPT activity was measured through the reduction of DTNB by the free HS-CoA, which forms the TNB⁻ ion; this ion was measured spectrophotometrically at 412 nm as previously described [23].
In brief, the epimastigotes were washed twice in PBS, resuspended in Tris-EDTA buffer (100 mM/2.5 mM) containing Triton X-100 0.1% and lysed by sonication (20% power, for 2 min). The reaction mixture contained 0.1 mg of protein from the cell-free extract, 0.5 mM L-serine, 0.1 mM palmitoyl-CoA and 2.5 mM DTNB in Tris-EDTA buffer (100 mM/2.5 mM) pH = 8.0 [26]. To calculate the specific activity, we converted the absorbance values to molarity using the TNB⁻ molar extinction coefficient of 12,000 M⁻¹ cm⁻¹. For the blank reaction, we performed the same assay without adding palmitoyl-CoA. All the enzymatic assays were performed in 96-well plates in a final volume of 0.2 mL in the SpectraMax® i3 (Molecular Devices).

Glucose and triglyceride quantification
Spent LIT medium from epimastigote cultures was collected by recovering the supernatants after centrifugation (10,000 × g for 15 min at 4 °C). Each sample of spent LIT was analysed for its glucose and triglyceride contents using commercial kits (triglyceride monoreagent and glucose monoreagent by Bioclin Brazil) according to the manufacturer's instructions. These kits are based on colorimetric enzymatic reactions, and the absorbance of each assay was measured in 96-well plates at a final volume of 0.2 mL in the SpectraMax® i3 (Molecular Devices).

Proliferation assays
Exponentially growing T. cruzi epimastigotes (5 × 10⁷ mL⁻¹) were treated with different concentrations of ETO or not treated (negative control) in LIT medium.
As a positive control for growth inhibition, we used a combination of rotenone (60 µM) and antimycin (0.5 µM) [27]. The parasites (2.5 × 10⁶ mL⁻¹) were transferred to 96-well plates and then incubated at 28 °C. The cell proliferation was quantified by reading the optical density (OD) at 620 nm for eight days. The OD values were converted to cell numbers using a linear regression equation previously obtained under the same conditions. Each experiment was performed in quadruplicate [28].

Flow cytometry analyses
Cell death. Epimastigotes in the exponential phase of growth were maintained in LIT and treated with 500 µM ETO for 5 days. After the incubation time, the parasites were analysed as described in [28]. The cells were analysed by flow cytometry (FACSCalibur, BD Biosciences).

Cell cycle (DNA content). Epimastigotes in the exponential phase of growth were maintained in LIT and treated with 500 µM ETO over 5 days. After the incubation time, the parasites were washed twice in PBS and resuspended in lysis buffer (phosphate buffer Na2HPO4 7.7 mM; KH2PO4 2.3 mM; pH = 7.4) containing 64 µM digitonin. After incubating on ice for 30 min, 0.2 μg/mL propidium iodide was added. The samples were analysed by flow cytometry (Guava) following a protocol adapted from [29].

Fatty acid staining using BODIPY® 500/510. Exponentially growing epimastigotes were kept in LIT medium to reach three different cell densities (2.5 × 10⁷ mL⁻¹, 5 × 10⁷ mL⁻¹ and 10⁸ mL⁻¹) in 24-well plates at 28 °C. Twenty-four hours before the flow cytometry analysis, the parasites were treated with 1 µM C1-BODIPY® 500/510-C12. This fluorophore allows for measurements of the relationship between fatty acid accumulation and consumption by shifting the fluorescence filter. The samples were collected, washed twice in PBS and incubated in 4% paraformaldehyde for 15 min.
After incubation, the cells were washed twice with PBS and suspended in the same buffer. Flow cytometry analysis was performed with FL-1 and FL-2 filters in a FACS Fortessa BD®. The results were analysed using FlowJo software.

Fluorescence microscopy
The parasites were maintained in LIT medium as described above for fatty acid staining using BODIPY® 500/510. After incubation, the cells were washed twice in PBS and placed on glass slides. The images were acquired with a digital DFC 365 FX camera coupled to a DMI6000B/AF6000 microscope (Leica). The images were analysed using ImageJ software.

Results
Palmitate supports ATP synthesis in T. cruzi
We initially investigated the ability of T. cruzi epimastigotes to oxidize fatty acids. To this end, we used palmitate as a proxy for fatty acids in general. The parasites were incubated with 0.1 mM 14C-[U]-palmitate, which allowed us to measure the production of 1.3 nmoles of CO2 derived from palmitate oxidation during the first 60 min and 1.5 extra nmoles during the following 60 min (Fig 1a). This finding indicated that beta-oxidation and the further 'burning' of the resulting acetyl-CoA are operative in epimastigote mitochondria. Because palmitate is taken up from the extracellular medium and oxidized to CO2, it is reasonable to assume that it could contribute to resistance to severe nutritional stress. To support this idea, we tested the ability of palmitate to extend parasite survival under extreme nutritional stress.
Parasites were incubated for 24 and 48 h in PBS (negative control; under this condition we expected the lowest viability), 0.1 mM palmitate in PBS supplemented with BSA (as a palmitate carrier), 5.0 mM histidine in PBS or 5.0 mM glucose in PBS (both positive controls, since both metabolites are well known to extend the parasites' viability under metabolic stress conditions, see [15]). As an additional negative control, we used PBS supplemented with BSA without added palmitate. The viability of these cells was assayed by measuring the total reductive activity by MTT assay. Additionally, we measured the total ATP levels. Cells incubated in the presence of palmitate showed higher viability than the negative controls, but not as high as that of parasites incubated with glucose or histidine (Fig 1b). Consistently, parasites incubated in the presence of palmitate showed higher ATP contents than both negative controls. However, the intracellular ATP levels in the cells incubated with palmitate were half those of parasites incubated with histidine. Interestingly, palmitate kept the ATP content at levels comparable to glucose (Fig 1c).

Figure 1. Palmitate oxidation promotes ATP production and viability in epimastigote forms under starvation. Schematic representation of 14C-U-palmitate metabolism. The metabolites corresponding to labelled palmitate metabolism are presented in green.
A) 14CO2 production from epimastigotes incubated in PBS with 100 µM 14C-U-palmitate. The 14CO2 was captured at 0, 30, 60 and 120 min. B) Viability of epimastigote forms after incubation with different carbon sources and palmitate. The viability was assessed after 24 and 48 h by MTT assay. C) The intracellular ATP content was evaluated following incubation with or without different energy substrates (PBS, negative control). The ATP concentration was determined by luciferase assay and the data were adjusted by the number of cells. Statistical analysis was performed with one-way ANOVA followed by Tukey's post-test at p < 0.05 using the GraphPad Prism 8.0.2 software program. We represent the level of statistical significance in this figure as follows: *** p value < 0.001; ** p value < 0.01; * p value < 0.05. For a p value > 0.05 we consider the differences to be not significant (ns).

Epimastigote forms excrete acetate as a primary end-product of palmitate oxidation
Because the epimastigotes were able to oxidize 14C-U-palmitate to 14CO2, we were interested in analysing the exometabolome of parasites consuming exclusively glucose, exclusively palmitate or no carbon source. Thus, we subjected exponentially growing parasites to 16 h of starvation and then incubated them for 6 h in the presence of 0.3 mM palmitate, 10 mM 13C-U-glucose or without any carbon source. As a control, we analysed a sample of non-starved parasites. After the incubations, the extracellular media were collected and analysed by 1H-NMR spectrometry. As expected, all the incubation conditions produced different flux profiles for excreted metabolites (Fig 2 and S1 Fig). Under our experimental conditions, the non-starved parasites primarily excreted succinate and acetate in similar quantities, and alanine and lactate to a lesser extent.
Parasites starved for 16 h in PBS and left to incubate in the absence of other metabolites had diminished succinate production (~7-fold) but increased acetate production three-fold compared to the non-starved parasites. It is relevant to stress that the only possible origin for these metabolites is internal carbon sources (ICS). Notably, no other excreted metabolites were detected under these conditions, indicating that under starvation, most of the ICS are transformed into acetate as an end product, which is compatible with the oxidation of internal fatty acids. These results raise the question of the metabolic fates of glucose or fatty acids in previously starved parasites. Starved epimastigotes that recovered in the presence of glucose exhibited a profuse excretion of succinate (450-fold the quantity excreted by the starved cells) and roughly equivalent quantities of acetate compared with the starved cells. Interestingly, lactate and alanine were also excreted at similar levels. As expected, the recovery with glucose produced an increase in all the secreted metabolites. However, their distribution reveals a reconfiguration of the metabolism towards a predominant production of succinate. Finally, in epimastigotes incubated with palmitate, we observed an increase in acetate and alanine production to approximately 2.5 times the levels in parasites that recovered in the presence of glucose. Interestingly, succinate was excreted in a smaller quantity than acetate and alanine, but still at 10-fold the rate observed in the starved non-recovered cells. Surprisingly, there was also a significant production of pyruvate (not previously described in the literature, and not observed under any other conditions) and a small amount of lactate derived from palmitate.

Figure 2. Excreted end products of glucose and palmitate metabolism in epimastigote forms of T. cruzi. A) The extracellular medium of epimastigote forms incubated under different conditions was analysed by 1H-NMR spectrometry to detect and quantify the end-products. The resulting data are expressed in nmoles/h/10⁸ cells. Means ± SD of three independent experiments. ICS, internal carbon sources; nd, non-detectable. B) and C) Schematic representation of the contribution of glucose and palmitate to the metabolism of epimastigote forms of T. cruzi. The glycosomal compartment and TCA cycle are indicated. The amount of each end-product is indicated by the font size. Numbers indicate enzymatic steps: 1. glycolysis; 2. pyruvate dehydrogenase; 3. citrate synthase; 4. aconitase; 5. isocitrate dehydrogenase; 6. α-ketoglutarate dehydrogenase; 7. succinyl-CoA synthetase; 8. succinate dehydrogenase/complex II/fumarate reductase NADH-dependent; 9. fumarate hydratase; 10. malate dehydrogenase; 11. malic enzyme; 12. alanine dehydrogenase/alanine aminotransferase; 13. lactate dehydrogenase; 14. acetate:succinyl-CoA transferase; 15. acetyl-CoA hydrolase; 16. succinyl-CoA synthetase; 17. glycosomal fumarate reductase; and 18. palmitate oxidation by beta-oxidation, resulting in FADH2, NADH and acetyl-CoA. Abbreviations: Cit: Citrate, Aco: Aconitate, IsoC: Isocitrate, α-kg: α-Ketoglutarate, Suc-CoA: Succinyl-CoA, Suc: Succinate, Fum: Fumarate, Mal: Malate, and Oxa: Oxaloacetate.
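
The excretion rates in Fig 2 are expressed in nmoles/h/10⁸ cells. As a rough illustration of how such a normalisation can be done, the Python sketch below converts a metabolite concentration measured in the supernatant into that unit. It is not the authors' analysis pipeline: the function names and the example concentrations are hypothetical, and it assumes the NMR quantification reports concentrations in mM. Only the incubation volume (1 mL), the incubation time (6 h) and the cell number (10⁸) are taken from the Methods.

```python
def excretion_rate(conc_mM, volume_mL, hours, n_cells):
    """Return an excretion rate in nmol per hour per 1e8 cells.

    conc_mM   : metabolite concentration in the supernatant (mM, assumed unit)
    volume_mL : incubation volume in mL
    hours     : incubation time in hours
    n_cells   : total number of cells in the incubation
    """
    if hours <= 0 or n_cells <= 0:
        raise ValueError("hours and n_cells must be positive")
    # mM x mL gives micromoles; multiply by 1000 to obtain nanomoles.
    nmol = conc_mM * volume_mL * 1000.0
    return nmol / hours / (n_cells / 1e8)


def fold_change(rate, reference_rate):
    """Return the ratio of a rate to a reference rate (e.g. starved cells)."""
    if reference_rate == 0:
        raise ValueError("reference_rate must be non-zero")
    return rate / reference_rate


if __name__ == "__main__":
    # Hypothetical concentrations, NOT values from the paper.
    starved = excretion_rate(0.06, volume_mL=1.0, hours=6.0, n_cells=1e8)
    recovered = excretion_rate(0.6, volume_mL=1.0, hours=6.0, n_cells=1e8)
    print(f"starved:   {starved:.1f} nmol/h/1e8 cells")
    print(f"recovered: {recovered:.1f} nmol/h/1e8 cells")
    print(f"fold change: {fold_change(recovered, starved):.1f}")
```

Dividing by `n_cells / 1e8` rather than by `n_cells` keeps the result directly in the per-10⁸-cells unit used in the figure, so fold changes such as those quoted in the text are simple ratios of two such rates.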

Glucose metabolism represses the fatty acid oxidation in epimastigotes
Glucose is the primary carbon source for exponentially proliferating epimastigotes, and after its exhaustion from the culture medium, the parasites change their metabolism to preferentially use amino acids as carbon sources [10]. Therefore, we were interested in analysing whether this preference for glucose is maintained in relation to the consumption of lipids. To determine if glucose metabolism interferes with the consumption of fatty acids, we created a 48 h proliferation curve using parasites with an initial concentration adjusted to 2.5 × 10⁷ mL⁻¹ and quantified them every 24 h. Under these conditions, the parasites are at mid-exponential phase at the beginning of the experiment (0 h), at late exponential phase at 24 h, and reach stationary phase at 48 h at a concentration of 10 × 10⁷ mL⁻¹ (Fig 3A). At 0 h, 24 h and 48 h, the culture medium was collected to measure the remaining glucose and triacylglycerol (TAG) concentrations (Figs 3B and 3C). Most of the glucose was consumed during the first 24 h (during proliferation), while the concentration of TAGs remained the same. After 48 h of proliferation (stationary phase), the TAG levels and lipid contents of the droplets were decreased by 1.5-fold and 2-fold, respectively, suggesting that glucose is preferentially consumed relative to fatty acids. These data show a decrease in the extracellular TAGs between 24 and 48 h, when the glucose was already almost entirely consumed, suggesting that glucose negatively regulates fatty acid catabolism.

Figure 3. Changes in glucose and triacylglycerol contents in LIT medium. A) Growth curve of epimastigote forms. B) Glucose quantification over 48 h. C) Triacylglycerol levels over 48 h. In each experiment, we collected the medium at different times and subjected it to quantification according to the manufacturer's instructions. All the experiments were performed in triplicate. Statistical analysis was performed with one-way ANOVA followed by Tukey's post-test at p < 0.05 using the GraphPad Prism 8.0.2 software program. We represent the levels of statistical significance in this figure as follows: *** p value < 0.001; ** p value < 0.01; and * p value < 0.05. For p value > 0.05, we consider the differences not significant (ns).

Epimastigote forms use endogenous fatty acids to support growth after glucose exhaustion
From the previous results, we learned that under glucose deprivation, TAGs are taken up by the epimastigotes, and internally stored fatty acids are mobilized. However, so far we have not provided any evidence pointing to their use as reduced carbon sources. To confirm this idea, exponentially proliferating epimastigotes were incubated in PBS supplemented with palmitate and 14C-U-glucose, or reciprocally, glucose and 14C-U-palmitate. In both cases, the production of 14C-labelled CO2 was quantified. The presence of 5 mM glucose diminished the release of 14CO2 from 14C-U-palmitate by 90%, while the presence of palmitate did not interfere with the production of 14CO2 from 14C-U-glucose (Fig 4). Taken together, our results show that glucose inhibits TAG and fatty acid consumption, and after glucose exhaustion, a metabolic switch occurs towards the oxidation of internally stored fatty acids.

Figure 4.
Glucose metabolism inhibits FAO. Parasites were incubated in PBS in the presence of 14C-U-palmitate + 5 mM glucose or 14C-U-glucose + 0.1 mM palmitate, and the 14CO2 produced was captured after 120 min of incubation. The experiments were performed in triplicate. Statistical analysis was performed with one-way ANOVA followed by Tukey's post-test at p < 0.05 using the GraphPad Prism 8.0.2 software program. We represent the level of statistical significance in this figure as follows: *** p value < 0.001; ** p value < 0.01; and * p value < 0.05. For p value > 0.05, we consider the differences not significant (ns).

To monitor the dynamics of use or accumulation of fatty acids in lipid droplets, we used a fluorescent fatty acid analogue, BODIPY 500/510 C1-C12, as a probe. BODIPY shifts its fluorescence from red to green upon the uptake and catabolism of fatty acids, and from green to red when fatty acids are accumulated in the lipid droplets. Parasites collected at the mid and late exponential proliferation phases and the stationary phase were incubated with 1 μM BODIPY 500/510 C1-C12 for 16 h, before fluorescence determination by flow cytometry (Figs 5A, 5B and 5C). The fluorescence values increased with the harvesting time (and therefore, with the glucose depletion), indicating the increased uptake and use of fatty acids as substrates by a fatty acyl-CoA synthetase. These data were confirmed by fluorescence microscopy (Fig 5D). Interestingly, parasites in stationary phase showed an accumulation of activated fatty acids in spots along the cell. However, the number of lipid droplets increased upon parasite proliferation (Figs 6A, 6B and 6C). This observation indicates that, after glucose exhaustion, not only is fatty acid metabolism activated, but so is the parasite's storage of fatty acids into lipid droplets.

Figure 5. Flow cytometry reveals distinct patterns in fatty acid pools during epimastigote growth. The epimastigotes were treated with 1 µM of BODIPY C1-C12 (500/510) and analysed by flow cytometry and fluorescence microscopy. A) 0 h. B) 24 h. C) 48 h. In the flow cytometry histograms, dashed peaks represent unstained parasites. Green-filled peaks represent stained parasites. D) Mean fluorescence per cell. The fluorescence for each cell was calculated using ImageJ software. All the experiments were performed in triplicate. Statistical analysis was performed with one-way ANOVA followed by Tukey's post-test at p < 0.05 using the GraphPad Prism 8.0.2 software program. We represent the level of statistical significance in this figure as follows: *** p value < 0.001; ** p value < 0.01; and * p value < 0.05. For p value > 0.05, we consider the differences not significant (ns).

Figure 6. Epimastigote forms accumulate fatty acids into lipid droplets during growth. The epimastigotes were treated with 1 µM BODIPY C1-C12 (500/510) and analysed by flow cytometry and fluorescence microscopy. A) 0 h. B) 24 h. C) 48 h. In the flow cytometry histograms, dashed peaks represent unstained parasites. Yellow-filled peaks represent positively stained parasites. The number of green/yellow spots for each cell was calculated using ImageJ software. All the experiments were performed in triplicate.
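
The figure captions above report a one-way ANOVA followed by Tukey's post-test, run in GraphPad Prism. For readers who want to see what the first of these two steps computes, the following Python sketch calculates the one-way ANOVA F statistic from first principles using only the standard library. It is an illustration under stated assumptions, not the authors' procedure: the function name and the triplicate values are hypothetical, and the sketch stops at the F statistic and its degrees of freedom (it does not compute a p value or perform Tukey's test).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups : list of lists, one list of replicate measurements per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more values than groups")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the replicates around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical triplicate measurements at 0 h, 24 h and 48 h.
    t0 = [1.0, 2.0, 3.0]
    t24 = [2.0, 3.0, 4.0]
    t48 = [3.0, 4.0, 5.0]
    f_stat, df_b, df_w = one_way_anova_f([t0, t24, t48])
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the scatter of the replicates would explain; a post-test such as Tukey's is then used to decide which pairs of conditions differ.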
411 412 To find if the increase in fatty acid pools is accompanied by a change in the levels of enzymes 413 related to fatty acid metabolism, we evaluated the specific activities of the enzymes hexokinase (HK), 414 which is responsible for the initial step of glycolysis and an indicator of active glycolysis; acetyl-CoA 415 carboxylase (ACC), which produces malonyl-CoA for fatty acid synthesis and carnitine 416 palmitoyltransferase 1 (CPT1), the complex that plays a central role in fatty acid oxidation (FAO) by 417 controlling the entrance of long-chain fatty acids into the mitochondria [30]. For the control, we 418 selected the enzyme serine palmitoyltransferase 1 (SPT1), a constitutively expressed protein in T. 419 cruzi [31] (Fig 7). The hexokinase activity diminished up to 30% with the progression of the 420 proliferation curve and the correlated depletion of glucose (Fig 7A). In addition, the ACC activity is 421 no more detectable in the stationary phase cells (Fig 7B). By contrast, the CPT1 activity is increased 422 by ~4-fold when the stationary phase is reached (Fig 7C), which confirms that fatty acid degradation 423 occurs in the absence of glucose. It is noteworthy that the high levels of ACC activity in the presence 424 of glucose supports the idea that under these conditions, fatty acids are probably synthesized instead 425 of being catabolized. As expected, SPT1 did not change during the analysed time frame (Fig 7D). 426 427 Figure 6. Activities of enzymes related to lipid and glucose metabolism during T. cruzi growth 428 curves. A) (HK) Hexokinase B) (ACC) acetyl-CoA carboxylase, C) (CPT1) carnitine- 429 palmitoyltransferase, and D) (SPT) serine palmitoyltransferase. All these activities were measured in .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. 
crude extracts from epimastigote forms at different moments of the growth curve. All the experiments were performed in triplicate. Time course activities and controls are shown in S2 Fig. Statistical analysis was performed with one-way ANOVA followed by Tukey's post-test at p < 0.05, using the GraphPad Prism 8.0.2 software program. We represent the level of statistical significance in this figure as follows: *** p value < 0.001; ** p value < 0.01; and * p value < 0.05. For p values > 0.05, we consider the differences not significant (ns).

Etomoxir, a CPT1 inhibitor, affects T. cruzi proliferation and mitochondrial activity
To investigate the role of FAO in T. cruzi, we tested the effect of a well-characterized inhibitor of CPT1, etomoxir (ETO), on the proliferation of epimastigotes. Among the ETO concentrations tested here (from 0.1 to 500 µM), only the highest concentration arrested parasite proliferation (Fig 8A). Importantly, the ETO effect was manifested when the parasites reached the late exponential phase (a cell density of approximately 5 × 10^7 mL^-1). This result is consistent with our previous findings showing that FAO (and thus CPT1 activity) acquires an important role at this point in the proliferation curve. To confirm that CPT1 is in fact a target of ETO in T. cruzi, we assayed the drug's effect on the enzyme activity in cell-free extracts. Our results showed that 500 µM ETO diminished the CPT1 activity by almost 80% (Fig 8B). To confirm the interference of ETO with the beta-oxidation of fatty acids, parasites incubated in PBS containing 14C-U-palmitate were treated with 500 µM ETO to compare their production of 14CO2 with that of the untreated controls.
Palmitate-derived CO2 production diminished by 80% in ETO-treated cells compared to untreated parasites (Fig 8C). In addition, ETO treatment did not affect the metabolism of 14C-U-glucose or 14C-U-histidine, ruling out a possible nonspecific reaction of this drug with CoA-SH, as described in [32]. Other compounds described as FAO inhibitors were also tested, but none of them inhibited epimastigote proliferation or 14CO2 production from 14C-U-palmitate (S3 Fig). In addition, the BODIPY cytometric analysis of cells treated with 500 µM ETO showed a strong decrease in the CoA acylation levels (activation of fatty acids) with respect to the untreated controls, as confirmed by fluorescence microscopy (Fig 8D). To reinforce the validation of ETO for further experiments, a set of controls is provided in S3 Fig. Our preliminary conclusion is that ETO inhibited beta-oxidation by inhibiting CPT1, confirming that the breakdown of fatty acids is important for continued proliferation in the absence of glucose.

Figure 8. ETO inhibits CPT1 and interferes with cell proliferation in epimastigote forms. (A) Proliferation of epimastigote forms in the presence of 0.1 to 500 µM ETO. For the positive control of dead cells, a combination of antimycin (0.5 µM) and rotenone (60 µM) was used. (B) Inhibition of CPT1 activity in crude extracts using 250 and 500 µM ETO. (C) 14CO2 capture from 14C-U-palmitate oxidation. (D) Flow cytometry analysis and fluorescence microscopy of epimastigote forms treated (or not) with ETO. In the histograms, dashed peaks represent unstained parasites and green-filled peaks represent parasites stained with BODIPY C1-C12. All the experiments were performed in
triplicate. Statistical analysis was performed with one-way ANOVA followed by Tukey's post-test at p < 0.05 using the GraphPad Prism 8.0.2 software program. We represent the level of statistical significance in this figure as follows: *** p value < 0.001; ** p value < 0.01; and * p value < 0.05. For p values > 0.05, we consider the differences not significant (ns).

Etomoxir treatment affects cell cycle progression
The metabolic interference of ETO diminished epimastigote proliferation; however, this finding could be due to a decrease in the parasite proliferation rate or an increase in the death rate. Therefore, we checked whether this compound could induce cell death through programmed cell death (PCD) or necrosis. PCD is characterized by biochemical and morphological events such as exposure of phosphatidylserine, DNA fragmentation, decreases (or increases) in the ATP levels, and increases in reactive oxygen species (ROS), among others [33]. The parasites were treated with 500 µM ETO for 5 days, followed by incubation with propidium iodide (PI) for cell membrane integrity analysis and annexin V-FITC to evaluate the phosphatidylserine exposure. Parasites treated with ETO showed negative results for necrosis and programmed cell death markers (Fig 9A), indicating that cell proliferation was arrested but cell viability was maintained. Because the multiplication rates seemed to be diminished, we performed a cell cycle analysis. Notably, the treated parasites were enriched in G1 (85.9%) with respect to non-treated cells (43.6%), suggesting that ETO prevented the entry of epimastigotes into the S phase of the cell cycle (Fig 9B).
Last, we noticed that after washing out the ETO, the parasites recovered their proliferation at rates comparable to our untreated controls (Fig 9C).

Figure 9. Analysis of extracellular phosphatidylserine exposure, membrane integrity and cell cycle after ETO treatment. Parasites in the exponential growth phase were treated with 500 µM ETO for 5 days. (A) Following the incubation period, the parasites were labelled with propidium iodide (PI) and annexin V-FITC (ANX) and analysed by flow cytometry. (B) The cell cycle was assessed using PI staining. (C) Growth curves of epimastigote forms before and after removing the treatment. All the experiments were performed in triplicate. Statistical analysis was performed with one-way ANOVA followed by Tukey's post-test at p < 0.05, using the GraphPad Prism 8.0.2 software program. We represent the level of statistical significance in this figure as follows: *** p value < 0.001; ** p value < 0.01; and * p value < 0.05. For p values > 0.05, we consider the differences not significant (ns).

Inhibition of FAO by ETO affects energy metabolism, impairing the consumption of endogenous fatty acids
The evidence obtained to date suggests that parasites resist metabolic stress by mobilizing and consuming stored fatty acids. Therefore, it is reasonable to hypothesize that ETO, which blocks the mobilization of fatty acids into the mitochondria for oxidation, probably perturbs the ATP levels in late-exponential or stationary phase cells.
Parasites grown for 5 days with or without 500 µM ETO were collected to evaluate their oxygen consumption. The rates of O2 consumption corresponding to basal respiration were measured in cells resuspended in MCR respiration buffer. We then measured the leak respiration by inhibiting the ATP synthase with oligomycin A. Finally, to measure the maximum capacity of the electron transport system (ETS), we used the uncoupler FCCP [21]. Our results demonstrate that, compared to no treatment, ETO treatment diminishes the rate of basal oxygen consumption, the leak respiration and the ETS capacity; that is, respiratory rates were generally lower in ETO-treated parasites than in untreated ones. As expected, ETO treatment led to a 75% decrease in the levels of total intracellular ATP compared to untreated parasites (Fig 10A). Moreover, because all these experiments were conducted in the complete absence of an oxidizable external metabolite, our results show that the parasite is able to oxidize internal metabolites (Figs 10B and 10C). Taking into account that treating parasites with ETO diminished their basal respiration rates by approximately one-half (Figs 10B and 10C), it is reasonable to conclude that a relevant part of the respiration in the absence of external oxidizable metabolites is based on the consumption of internal lipids. This is consistent with the observation that epimastigotes supplied with non-fatty-acid carbon sources maintain their viability in the presence of ETO (S4 Fig). In summary, these results confirm that ETO interferes with ATP synthesis through oxidative phosphorylation in epimastigote forms.

Figure 10. Effects of ETO on respiration and ATP production in epimastigote forms of T. cruzi.
(A) Oxygen consumption of epimastigote forms after normal growth in LIT medium. (B) Oxygen consumption after treatment with 500 µM ETO. Parasites were grown in LIT medium with the compound until the 5th day. In black, a time-course register of the concentration (pmol) of O2 in the respiration chamber. In blue, the negative of the derivative of the O2 concentration with respect to time (velocity of O2 consumption in pmol per second). The parasites were washed twice in PBS and kept in MCR buffer at 28 °C during the assays (see Materials and Methods for more details). (C) The basal respiration (initial oxygen flux values, MCR), leak respiration after the sequential addition of 0.5 µg/mL of oligomycin A (2 µg/mL), and electron transfer system (ETS) capacity after the sequential addition of 0.5 µM FCCP (2 µM) were measured for each condition. (D) Intracellular levels of ATP after treating with 500 µM ETO. The intracellular ATP content was assessed following incubation with or without different energy substrates (PBS, negative control). The ATP concentration was determined by luciferase assay and the data were normalized to the number of cells. All the experiments were performed in triplicate. Statistical analysis was performed with one-way ANOVA followed by Tukey's post-test at p < 0.05 using GraphPad Prism 8.0.2 software. We represent the level of statistical significance as follows: *** p value < 0.001; ** p value < 0.01; and * p value < 0.05. For p values > 0.05, we consider the differences not significant (ns).

Endogenous fatty acids contribute to long-term starvation resistance in epimastigote forms
As previously demonstrated, ETO interferes with the consumption of endogenous fatty acids, and this impairment causes ATP depletion and cell cycle arrest. One intriguing characteristic of the insect stages of T. cruzi is their resistance to starvation. To assess the importance of internal fatty acids in this process, we incubated epimastigotes in PBS in the presence (or absence) of 500 µM ETO. The mitochondrial activity of these cells was followed every 24 h with Alamar Blue®. Our results showed that the mitochondrial activity of the parasites in the presence of ETO was reduced by 31% after 48 h of starvation, and by 65% after 72 h of starvation (Fig 11), compared to the controls (untreated parasites). These data confirmed our hypothesis that the breakdown of accumulated fatty acids partially contributes to the resistance of the parasite under severe starvation.

Figure 11. Internal fatty acid consumption contributes to parasite viability under severe nutritional starvation. Viability of epimastigote forms after incubation in PBS with or without ETO. The viability was assessed every 24 h using Alamar Blue®. Statistical analysis was performed with one-way ANOVA followed by Tukey's post-test at p < 0.05 using GraphPad Prism 8.0.2 software. We represent the levels of statistical significance as follows: *** p value < 0.001, and for p values > 0.05, we consider the differences not significant (ns).
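
The percentage reductions quoted for the starvation assay are relative to the untreated control at the same time point. A minimal sketch of that arithmetic follows, assuming triplicate readings that are averaged before comparison. The readings and the function name `percent_reduction` are hypothetical and are not taken from the authors' data.

```python
from statistics import mean

def percent_reduction(control, treated):
    """Percent decrease of the treated mean relative to the control mean."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return 100.0 * (control_mean - mean(treated)) / control_mean

# Hypothetical Alamar Blue readings (arbitrary fluorescence units), triplicates
control_readings = [200.0, 210.0, 190.0]
eto_readings = [150.0, 155.0, 145.0]
reduction = percent_reduction(control_readings, eto_readings)
print(f"Reduction relative to control: {reduction:.1f}%")
```

Comparing each time point with its own control, rather than with the zero-hour value, keeps the effect of ETO separate from the decline caused by starvation alone.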
Inhibition of CPT1 impairs metacyclogenesis
Considering that FAO increases in the epimastigotes during the stationary phase, and that differentiation into infective metacyclic trypomastigotes (metacyclogenesis) is triggered in the stationary phase of epimastigote parasites, one might expect a relationship between the consumption of fatty acids and metacyclogenesis. To explore this possibility, we initially compared the CPT1 activity of stationary epimastigote forms before and after a 24 h incubation in the differentiation medium TAU-3AAG. CPT1 activity increased after the parasites were subjected to metacyclogenesis in vitro (Fig 12A). Parasites were then subjected to differentiation in TAU-3AAG medium in the presence of the probe BODIPY. The probe was incorporated into lipid droplets, confirming that fatty acid metabolism was active at the beginning of metacyclogenesis (Fig 12B). To address the importance of FAO during differentiation, metacyclogenesis was induced in vitro in ETO-treated or untreated (control) parasites. ETO treatment interfered with differentiation, diminishing the number of metacyclic forms present in the culture (Fig 12C). In addition, this inhibition was dose-dependent, with an IC50 = 32.96 µM (Fig 12D). Importantly, we ruled out that the variation found in the differentiation rates was due to a selective death of treated epimastigotes, since their survival during this experiment in the presence or absence of ETO (from 5 to 500 µM) was not significantly different (S5 Fig). Based on these data, we conclude that fatty acid oxidation, at the level of CPT1, also participates in the regulation of metacyclogenesis.
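
An IC50 such as the one reported above is the inhibitor concentration at which the response falls to half of the untreated control. The sketch below estimates it by linear interpolation on log10(concentration) between the two doses that bracket 50%. This is a deliberately simple stand-in, not the nonlinear regression a package such as GraphPad Prism would normally apply, and the source does not state which fitting method was used. The doses, the assumed IC50 of 30 µM used to generate synthetic responses, and both function names are hypothetical.

```python
import math

def ic50_by_interpolation(concentrations, responses):
    """Estimate the concentration giving a 50% response.

    `responses` are percentages of the untreated control and are assumed to
    decrease as concentration increases. The estimate is a linear
    interpolation on log10(concentration) between the two doses that
    bracket 50%.
    """
    points = sorted(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo >= 50.0 >= r_hi and r_lo != r_hi:
            fraction = (r_lo - 50.0) / (r_lo - r_hi)
            log_ic50 = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("responses do not cross 50% within the tested range")

def hill_response(conc, ic50, slope=1.0):
    """Synthetic inhibition curve used only to generate example data."""
    return 100.0 / (1.0 + (conc / ic50) ** slope)

# Hypothetical doses (µM) spanning the 5 to 500 µM range mentioned in the text
doses = [5.0, 10.0, 50.0, 100.0, 500.0]
# Synthetic responses generated from an assumed IC50 of 30 µM (not measured data)
synthetic = [hill_response(c, ic50=30.0) for c in doses]
estimate = ic50_by_interpolation(doses, synthetic)
print(f"Interpolated IC50: {estimate:.1f} µM")
```

With widely spaced doses the interpolated value only approximates the IC50 of the underlying curve, which is one reason a full curve fit is usually preferred when enough concentrations have been tested.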
Figure 12. ETO inhibits metacyclogenesis. A) CPT1 activity of epimastigote forms in stationary phase and after 24 h of incubation in TAU-3AAG medium (for triggering metacyclogenesis). B) Fluorescence microscopy of cells incubated in TAU-3AAG in the presence of BODIPY® 500-510 C1-C12. C) Effects of different ETO concentrations on metacyclogenesis. The differentiation was evaluated by counting the cells in a Neubauer chamber each day for 6 days. This experiment was performed in triplicate. D) Percentage of differentiation at the 5th day of differentiation. Inset: IC50 of metacyclogenesis inhibition by ETO. The enzymatic activities were measured in duplicate. All the other experiments were performed in triplicate.

Discussion
During the journey of T. cruzi inside the insect vector, the glucose levels decrease rapidly after each blood meal [34], leaving the parasite exposed to an environment rich in amino and fatty acids in the digestive tube of Rhodnius prolixus [35,36]. Because the digestive tract of triatomine insects possesses a perimicrovillar membrane, which is composed primarily of lipids and is enriched in glycoproteins [37], it has been speculated that its degradation could provide lipids for parasite metabolism [38]. In this study, we showed that the insect stages of T. cruzi coordinate the activation of fatty acid consumption with the metabolism of glucose. Our experiments corroborate early studies about the relatively slow use of palmitate as an energy source by proliferating epimastigotes [39,40].
In addition, our results shed light on the end products excreted by epimastigote forms during incubation under starvation conditions, and during their recovery from starvation using glucose or palmitate. First, we showed that non-starved parasites, and starved parasites recovering in the presence of glucose, excrete succinate as their primary metabolic waste, as expected [41–43]. After 16 h of nutritional starvation, the consumption of internal carbon sources produces acetate as the primary end product. In the presence of glucose after 16 h of starvation, we found that glucose-derived carbons contribute to the excreted pools of acetate and lactate. Interestingly, palmitate metabolism contributed to the increase in acetate production, followed by the production of alanine, pyruvate, succinate and lactate. The unexpected production of alanine, pyruvate and lactate can be explained by an increase in the TCA cycle activity, producing malate, which can be converted into pyruvate by the decarboxylative reaction of the malic enzyme (ME) [44]. Pyruvate can be converted into alanine through a transamination reaction catalysed by an alanine [45], a tyrosine [46] or an aspartate [47] aminotransferase, or through reductive amination by an alanine dehydrogenase [48]. The excretion of lactate could be a consequence of lactate dehydrogenase activity. However, it should be noted that this enzymatic activity has not been observed to date. In relation to succinate production, a relevant factor favouring this process is the production of NADH by the third step of beta-oxidation (3-hydroxyacyl-CoA dehydrogenase). This NADH can be oxidized through the activity of NADH-dependent mitochondrial fumarate reductase [49], which concomitantly converts NADH into NAD+ and fumarate into succinate. This succinate can be excreted or re-used by the TCA cycle, and the resulting NAD+ can be used as a cofactor for other enzymes.
As previously mentioned, it is well known that during the initial phase of proliferation, epimastigotes preferentially consume glucose, and during the stationary phase, a metabolic switch occurs towards the consumption of amino acids [8,10,42]. Our results show that this switch constitutes a broader and more systemic metabolic reprogramming, which also includes FAO. We detected this switch through changes in the activities of key enzymes responsible for the regulation of FAO, such as CPT1 and ACC, whose activities increased and decreased, respectively, once glucose was depleted. Our findings showed that the inhibition of CPT1 affects the late phase of proliferation of epimastigotes, when the switch to FAO has already occurred.
An interesting question about T. cruzi epimastigotes is how they survive long periods of starvation. Early data showed high respiration levels in epimastigotes incubated in the absence of external oxidizable carbon sources. This oxygen consumption was attributed to the breakdown of TAGs into free fatty acids and their further oxidation [50]. Here, we confirmed this finding by inhibiting the internal fatty acid consumption, which in turn diminished the oxidative phosphorylation activity, internal ATP levels and the total reductive activity of parasites under severe nutritional stress. Even more notably, we showed that under these conditions, the lipids stored in lipid droplets [51,52] are consumed. Unlike what has been observed in procyclic forms of T.
brucei, in which the function of lipid droplets is not clear [53], our results show that in T. cruzi, they support epimastigote survival under extreme metabolic stress. Of course, the contribution of other metabolic sources and processes such as autophagy in coping with nutritional stress cannot be ruled out [54].
Multiple metabolic factors have been implicated in metacyclogenesis, such as proline, aspartate, glutamate [55], glutamine [17] and lipids present in the triatomine digestive tract [56]. Interestingly, the occurrence of metacyclic trypomastigotes in culture leads to an increase in CO2 production from labelled palmitate [39]. The ETO treatment inhibited metacyclogenesis in vitro, showing that the consumption of internal fatty acids is important for cell differentiation. Consequently, we propose that lipids are not only external signals of metacyclogenesis, as previously suggested [56], but also have a central role in the bioenergetics of metacyclogenesis. As in the oxidation of several amino acids, the acetyl-CoA produced from beta-oxidation, and probably the reduced cofactors resulting from these processes, contribute to the mitochondrial ATP production necessary to support this differentiation step.
In conclusion, fatty acids are important carbon sources for T. cruzi epimastigotes in the absence of glucose. Palmitate can be taken up by the cells and fuel the TCA cycle by producing acetyl-CoA, the oxidation of which generates CO2. However, in the absence of external carbon
sources, lipid droplets become the primary sources of fatty acids, helping the organism to survive nutritional stress. Importantly, FAO supports endogenous respiration rates and ATP production and powers metacyclogenesis.

Acknowledgements
We thank the Core Facility for Scientific Research at the University of Sao Paulo (CEFAP-USP/FLUIR) for the flow cytometry analysis and Dr. Mauro Javier Veliz Cortez (Department of Parasitology, ICB-USP) for the microscopy support.

References
[1] WHO | Chagas disease (American trypanosomiasis), WHO. (2018). https://www.who.int/chagas/en/ (accessed January 29, 2019).
[2] J.A. Perez-Molina, I. Molina, Chagas disease, Lancet. 391 (2018) 82–94. https://doi.org/10.1016/S0140-6736(17)31612-4.
[3] R. de F.P. Melo, A.A. Guarneri, A.M. Silber, The influence of environmental cues on the development of Trypanosoma cruzi in triatominae vector, Front. Cell. Infect. Microbiol. 10 (2020) 27. https://doi.org/10.3389/fcimb.2020.00027.
[4] W. De Souza, Basic cell biology of Trypanosoma cruzi, Curr. Pharm. Des. 8 (2002) 269–85. http://www.ncbi.nlm.nih.gov/pubmed/11860366.
[5] P. Lisvane Silva, B.S. Mantilla, M.J. Barison, C. Wrenger, A.M. Silber, The uniqueness of the Trypanosoma cruzi mitochondrion: opportunities to identify new drug target for the treatment of Chagas disease, Curr Pharm Des. 17 (2011) 2074–2099. https://www.ncbi.nlm.nih.gov/pubmed/21718252.
[6] C. Bern, Chagas’ Disease, N Engl J Med. 373 (2015) 1882.
https://doi.org/10.1056/NEJMc1510996.
[7] Y. Li, S. Shah-Simpson, K. Okrah, A.T. Belew, J. Choi, K.L. Caradonna, P. Padmanabhan, D.M. Ndegwa, M.R. Temanni, H. Corrada Bravo, N.M. El-Sayed, B.A. Burleigh, Transcriptome remodeling in Trypanosoma cruzi and human cells during intracellular infection, PLOS Pathog. 12 (2016) e1005511. https://doi.org/10.1371/journal.ppat.1005511.
[8] L. Marchese, J. Nascimento, F. Damasceno, F. Bringaud, P. Michels, A. Silber, The uptake and metabolism of amino acids, and their unique role in the biology of pathogenic trypanosomatids, Pathogens. 7 (2018) 36. https://doi.org/10.3390/pathogens7020036.
[9] J.J. Cazzulo, Energy metabolism in Trypanosoma cruzi, Subcell Biochem. 18 (1992) 235–257. https://www.ncbi.nlm.nih.gov/pubmed/1485353.
[10] M.J. Barison, L.N. Rapado, E.F. Merino, E.M. Furusho Pral, B.S. Mantilla, L. Marchese, C. Nowicki, A.M. Silber, M.B. Cassera, Metabolomic profiling reveals a finely tuned, starvation-induced metabolic switch in Trypanosoma cruzi epimastigotes, J Biol Chem. 292 (2017) 8964–8977. https://doi.org/10.1074/jbc.M117.778522.
[11] R. Zeledon, Comparative physiological studies on four species of hemoflagellates in culture. II. Effect of carbohydrates and related substances and some amino compounds on the respiration, J. Parasitol. 46 (1960) 541. https://doi.org/10.2307/3274935.
[12] D. Sylvester, S.M. Krassner, Proline metabolism in Trypanosoma cruzi epimastigotes, Comp Biochem Physiol B. 55 (1976) 443–447.
https://www.ncbi.nlm.nih.gov/pubmed/789007.
[13] L.S. Paes, B. Suarez Mantilla, F.M. Zimbres, E.M. Pral, P. Diogo de Melo, E.B. Tahara, A.J. Kowaltowski, M.C. Elias, A.M. Silber, Proline dehydrogenase regulates redox state and respiratory metabolism in Trypanosoma cruzi, PLoS One. 8 (2013) e69419. https://doi.org/10.1371/journal.pone.0069419.
[14] B.S. Mantilla, L.S. Paes, E.M.F. Pral, D.E. Martil, O.H. Thiemann, P. Fernández-Silva, E.L. Bastos, A.M. Silber, Role of Δ1-pyrroline-5-carboxylate dehydrogenase supports mitochondrial metabolism and host-cell invasion of Trypanosoma cruzi, J. Biol. Chem. 290 (2015). https://doi.org/10.1074/jbc.M114.574525.
[15] M.J. Barisón, F.S. Damasceno, B.S. Mantilla, A.M. Silber, The active transport of histidine and its role in ATP production in Trypanosoma cruzi, J. Bioenerg. Biomembr. 48 (2016) 437–49. https://doi.org/10.1007/s10863-016-9665-9.
[16] R.M.B.M. Girard, M. Crispim, M.B. Alencar, A.M. Silber, Uptake of L-alanine and its distinct roles in the bioenergetics of Trypanosoma cruzi, MSphere. 3 (2018). https://doi.org/10.1128/mSphereDirect.00338-18.
[17] F.S. Damasceno, M.J. Barisón, M. Crispim, R.O.O. Souza, L. Marchese, A.M. Silber, L-Glutamine uptake is developmentally regulated and is involved in metacyclogenesis in Trypanosoma cruzi, Mol. Biochem. Parasitol. 224 (2018). https://doi.org/10.1016/j.molbiopara.2018.07.007.
[18] E.P. Camargo, Growth and differentiation in Trypanosoma cruzi. I.
Origin of metacyclic trypanosomes in liquid media, Rev. Inst. Med. Trop. Sao Paulo. 6 (1964) 93–100. http://www.ncbi.nlm.nih.gov/pubmed/14177814 (accessed January 29, 2019).
[19] F.S. Damasceno, M.J. Barison, M. Crispim, R.O.O. Souza, L. Marchese, A.M. Silber, L-Glutamine uptake is developmentally regulated and is involved in metacyclogenesis in Trypanosoma cruzi, Mol Biochem Parasitol. 224 (2018) 17–25. https://doi.org/10.1016/j.molbiopara.2018.07.007.
[20] F.K. Huynh, M.F. Green, T.R. Koves, M.D. Hirschey, Measurement of fatty acid oxidation rates in animal tissues and cell lines, Methods Enzymol. 542 (2014) 391–405. https://doi.org/10.1016/B978-0-12-416618-9.00020-0.
[21] M.B. Alencar, R.B.M.M. Girard, A.M. Silber, Measurement of energy states of the trypanosomatid mitochondrion, Methods Mol. Biol. 2116 (2020) 655–671. https://doi.org/10.1007/978-1-0716-0294-2_39.
[22] M.M. Bradford, A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding, Anal. Biochem. 72 (1976) 248–54. https://doi.org/10.1006/abio.1976.9999.
[23] P.L.L. Bieber, T. Abraham, T. Helmrath, Y. Kim, R. Dehlin, A Rapid Spectrophotometric assay for carnitine palmitoyltransferase, 1972. https://ac.els-cdn.com/0003269772900619/1-s2.0-0003269772900619-main.pdf?_tid=d5130378-0890-4a27-8e0e-63a5c3921132&acdnat=1549722154_d3bdaa5775a2a24b64ccf923a6b8ba4d (accessed February 9, 2019).
[24] L.B. Willis, W. Saridah, W.
Omar, Ravigadevi Sambanthamurthi, A.J. Sinskey, Non-radioactive assay for Acetyl-CoA carboxylase activity, 2008. http://palmoilis.mpob.gov.my/publications/jopr2008sp2-laura.pdf (accessed February 9, 2019).
[25] G.E. Racagni, E.E. Machado de Domenech, Characterization of Trypanosoma cruzi hexokinase, Mol. Biochem. Parasitol. 9 (1983) 181–188. https://doi.org/10.1016/0166-6851(83)90108-1.
[26] M.F. Rütti, S. Richard, A. Penno, A. von Eckardstein, T. Hornemann, An improved method to determine serine palmitoyltransferase activity, J. Lipid Res. 50 (2009) 1237–44. https://doi.org/10.1194/jlr.D900001-JLR200.
[27] A. Magdaleno, I.Y. Ahn, L.S. Paes, A.M. Silber, Actions of a proline analogue, L-thiazolidine-4-carboxylic acid (T4C), on Trypanosoma cruzi, PLoS One. 4 (2009) e4534. https://doi.org/10.1371/journal.pone.0004534.
[28] F.S. Damasceno, M.J. Barison, E.M. Pral, L.S. Paes, A.M. Silber, Memantine, an antagonist of the NMDA glutamate receptor, affects cell proliferation, differentiation and the intracellular cycle and induces apoptosis in Trypanosoma cruzi, PLoS Negl Trop Dis. 8 (2014) e2717. https://doi.org/10.1371/journal.pntd.0002717.
[29] K. Figarella, M. Rawer, N.L. Uzcategui, B.K. Kubata, K. Lauber, F. Madeo, S. Wesselborg, M. Duszenko, Prostaglandin D2 induces programmed cell death in Trypanosoma brucei bloodstream form, Cell Death Differ. 12 (2005) 335–346. https://doi.org/10.1038/sj.cdd.4401564.
[30] G.D. Lopaschuk, S.R. Wall, P.M. Olley, N.J.
Davies, Etomoxir, a carnitine palmitoyltransferase I inhibitor, protects hearts from fatty acid-induced ischemic injury independent of changes in long chain acylcarnitine, Circ. Res. 63 (1988) 1036–1043. https://doi.org/10.1161/01.RES.63.8.0.2036.
[31] C.M. Koeller, N. Heise, The sphingolipid biosynthetic pathway is a potential target for chemotherapy against Chagas disease, Enzyme Res. 2011 (2011) 1–13. https://doi.org/10.4061/2011/648159.
[32] A.S. Divakaruni, W.Y. Hsieh, L. Minarrieta, T.N. Duong, K.K.O. Kim, B.R. Desousa, A.Y. Andreyev, C.E. Bowman, K. Caradonna, B.P. Dranka, D.A. Ferrick, M. Liesa, L. Stiles, G.W. Rogers, D. Braas, T.P. Ciaraldi, M.J. Wolfgang, T. Sparwasser, L. Berod, S.J. Bensinger, A.N. Murphy, Etomoxir inhibits macrophage polarization by disrupting CoA homeostasis, Cell Metab. 28 (2018) 490-503.e7. https://doi.org/10.1016/j.cmet.2018.06.001.
[33] M. Duszenko, K. Figarella, E.T. Macleod, S.C. Welburn, Death of a trypanosome: a selfish altruism, Trends Parasitol. 22 (2006) 536–542. https://doi.org/10.1016/j.pt.2006.08.010.
[34] A.C. Mariano, R. Santos, M.S. Gonzalez, D. Feder, E.A. Machado, B. Pascarelli, K.C. Gondim, J.R. Meyer-Fernandes, Synthesis and mobilization of glycogen and trehalose in adult male Rhodnius prolixus, Arch. Insect Biochem. Physiol. 72 (2009) 1–15. https://doi.org/10.1002/arch.20319.
[35] J.M.C. Ribeiro, F.A. Genta, M.H.F. Sorgine, R. Logullo, R.D. Mesquita, G.O. Paiva-Silva, D. Majerowicz, M. Medeiros, L. Koerich, W.R. Terra, C. Ferreira, A.C. Pimentel, P.M. Bisch, D.C. Leite, M.M.P. Diniz, J.L. da S.G. V. Junior, M.L. Da Silva, R.N. Araujo, A.C.P. Gandara, S. Brosson, D. Salmon, S. Bousbata, N. González-Caballero, A.M. Silber, M. Alves-Bezerra, K.C. Gondim, M.A.C. Silva-Neto, G.C. Atella, H. Araujo, F.A. Dias, C. Polycarpo, R.J. Vionette-Amaral, P. Fampa, A.C.A. Melo, A.S. Tanaka, C. Balczun, J.H.M. Oliveira, R.L.S.
Gonçalves, C. Lazoski, R. Rivera-Pomar, L. Diambra, G.A. Schaub, E.S. Garcia, P. Azambuja, G.R.C. Braz, P.L. Oliveira, An insight into the transcriptome of the digestive tract of the bloodsucking bug, Rhodnius prolixus, PLoS Negl. Trop. Dis. 8 (2014) e2594. https://doi.org/10.1371/journal.pntd.0002594.
[36] L. Antunes, J. Han, J. Pan, C.J.C. Moreira, P. Azambuja, Metabolic signatures of triatomine vectors of Trypanosoma cruzi unveiled by metabolomics, PLoS One. 8 (2013) e77283. https://doi.org/10.1371/journal.pone.0077283.
[37] K.C. Gondim, G.C. Atella, E.G. Pontes, D. Majerowicz, Lipid metabolism in insect disease vectors, Insect Biochem. Mol. Biol. 101 (2018) 108–123. https://doi.org/10.1016/j.ibmb.2018.08.005.
[38] P.R. Bittencourt-Cunha, L. Silva-Cardoso, G.A. de Oliveira, J.R. da Silva, A.B. da Silveira, G.E.G. Kluck, M. Souza-Lima, K.C. Gondim, M. Dansa-Petretsky, C.P. Silva, H. Masuda, M.A.C. da Silva Neto, G.C. Atella, Perimicrovillar membrane assembly: the fate of phospholipids synthesised by the midgut of Rhodnius prolixus, Mem. Inst. Oswaldo Cruz. 108 (2013) 494–500. https://doi.org/10.1590/S0074-0276108042013016.
[39] D.E. Wood, E.L. Schiller, Trypanosoma cruzi: comparative fatty acid metabolism of the epimastigotes and trypomastigotes in vitro, Exp.
Parasitol. 38 (1975) 202–7. http://www.ncbi.nlm.nih.gov/pubmed/1100424.
[40] D.E. Wood, Trypanosoma cruzi: fatty acid metabolism in vitro, Exp. Parasitol. 37 (1975) 60–6. http://www.ncbi.nlm.nih.gov/pubmed/1090440.
[41] J.J. Cazzulo, Aerobic fermentation of glucose by trypanosomatids, FASEB J. 6 (1992) 3153–61. https://doi.org/10.1096/FASEBJ.8.0.23.1397837.
[42] J.J. Cazzulo, Intermediate metabolism in Trypanosoma cruzi, J. Bioenerg. Biomembr. 26 (1994) 157–65. http://www.ncbi.nlm.nih.gov/pubmed/8056782 (accessed June 6, 2019).
[43] B. Frydman, C. Santos, J.J.B. Cannata, J.J. Cazzulo, Carbon-13 nuclear magnetic resonance analysis of [1-13C]-glucose metabolism in Trypanosoma cruzi. Evidence of the presence of two alanine pools and of two CO2 fixation reactions, Eur. J. Biochem. 192 (1990) 363–368. https://doi.org/10.1111/j.1432-1033.1990.tb19235.x.
[44] A.E. Leroux, D.A. Maugeri, F.R. Opperdoes, J.J. Cazzulo, C. Nowicki, Comparative studies on the biochemical properties of the malic enzymes from Trypanosoma cruzi and Trypanosoma brucei, FEMS Microbiol. Lett. 314 (2011) 25–33. https://doi.org/10.1111/j.1574-6968.2010.02142.x.
[45] C. Zelada, M. Montemartini, J.J. Cazzulo, C. Nowicki, Purification and partial structural and kinetic characterization of an alanine aminotransferase from epimastigotes of Trypanosoma cruzi, Mol. Biochem. Parasitol. 79 (1996) 225–228. https://doi.org/10.1016/0166-6851(96)02652-7.
[46] M. Montemartini, J. Buá, E. Bontempi, C. Zelada, A.M.
Ruiz, J. Santomé, J.J. Cazzulo, C. Nowicki, A recombinant tyrosine aminotransferase from Trypanosoma cruzi has both tyrosine aminotransferase and alanine aminotransferase activities, FEMS Microbiol. Lett. 133 (1995) 17–20. https://doi.org/10.1111/j.1574-6968.1995.tb07854.x.
[47] D. Marciano, C. Llorente, D.A. Maugeri, C. de la Fuente, F. Opperdoes, J.J. Cazzulo, C. Nowicki, Biochemical characterization of stage-specific isoforms of aspartate aminotransferases from Trypanosoma cruzi and Trypanosoma brucei, Mol. Biochem. Parasitol. 161 (2008) 12–20. https://doi.org/10.1016/j.molbiopara.2008.05.005.
[48] J.J. Cazzulo, S. Arauzo, B.M. Franke de Cazzulo, J.J.B. Cannata, On the production of glycerol and l-alanine during the aerobic fermentation of glucose by trypanosomatids, FEMS Microbiol. Lett. 51 (1988) 187–191. https://doi.org/10.1111/j.1574-6968.1988.tb02995.x.
[49] A. Boveris, C.M. Hertig, J.F. Turrens, Fumarate reductase and other mitochondrial activities in Trypanosoma cruzi, Mol. Biochem. Parasitol. 19 (1986) 163–169. https://doi.org/10.1016/0166-6851(86)90121-0.
[50] G.W. Rogerson, W.E. Gutteridge, Catabolic metabolism in Trypanosoma cruzi, Int. J. Parasitol. 10 (1980) 131–135. https://doi.org/10.1016/0020-7519(80)90024-7.
[51] M.G. Pereira, G. Visbal, T.F.R. Costa, S. Frases, W. de Souza, G. Atella, N. Cunha-e-Silva, Trypanosoma cruzi epimastigotes store cholesteryl esters in lipid droplets after cholesterol endocytosis, Mol. Biochem. Parasitol. 224 (2018) 6–16.
https://doi.org/10.1016/J.MOLBIOPARA.2018.07.004.
[52] M.G. Pereira, G. Visbal, L.T. Salgado, J.C. Vidal, J.L.P. Godinho, N.N.T. De Cicco, G.C. Atella, W. de Souza, N. Cunha-e-Silva, Trypanosoma cruzi epimastigotes are able to manage internal cholesterol levels under nutritional lipid stress conditions, PLoS One. 10 (2015) e0128949. https://doi.org/10.1371/journal.pone.0128949.
[53] S. Allmann, M. Mazet, N. Ziebart, G. Bouyssou, L. Fouillen, J.-W. Dupuy, M. Bonneu, P. Moreau, F. Bringaud, M. Boshart, Triacylglycerol storage in lipid droplets in procyclic Trypanosoma brucei, PLoS One. 9 (2014) e114628. https://doi.org/10.1371/journal.pone.0114628.
[54] V.E. Alvarez, G. Kosec, C. Sant’Anna, V. Turk, J.J. Cazzulo, B. Turk, Autophagy is involved in nutritional stress response and differentiation in Trypanosoma cruzi, J. Biol. Chem. 283 (2008) 3454–3464. https://doi.org/10.1074/jbc.M708474200.
[55] V.T. Contreras, J.M. Salles, N. Thomas, C.M. Morel, S. Goldenberg, In vitro differentiation of Trypanosoma cruzi under chemically defined conditions, Mol. Biochem. Parasitol. 16 (1985) 315–327. https://www.ncbi.nlm.nih.gov/pubmed/3903496.
[56] M.J. Wainszelbaum, M.L. Belaunzarán, E.M. Lammel, M. Florin-Christensen, J. Florin-Christensen, E.L.D. Isola, Free fatty acids induce cell differentiation to infective forms in Trypanosoma cruzi, Biochem. J. 375 (2003) 705–12. https://doi.org/10.1042/BJ20021907.
[57] C.C.P. Aires, L. IJlst, F. Stet, C. Prip-Buus, I.T. de Almeida, M. Duran, R.J.A. Wanders, M.F.B. Silva, Inhibition of hepatic carnitine palmitoyl-transferase I (CPT IA) by valproyl-CoA as a possible mechanism of valproate-induced steatosis, Biochem. Pharmacol. 79 (2010) 792–799. https://doi.org/10.1016/j.bcp.2009.10.011.
[58] P.F. Kantor, A. Lucien, R. Kozak, G.D. Lopaschuk, The antianginal drug trimetazidine shifts cardiac energy metabolism from fatty acid oxidation to glucose oxidation by inhibiting mitochondrial long-chain 3-ketoacyl coenzyme A thiolase, Circ. Res. 86 (2000) 580–588. https://doi.org/10.1161/01.RES.86.5.580.
[59] C.D.L. Folmes, A.S. Clanachan, G.D. Lopaschuk, Fatty acid oxidation inhibitors in the management of chronic complications of atherosclerosis, Curr. Atheroscler. Rep. 7 (2005) 63–70. https://doi.org/10.1007/s11883-005-0077-2.
[60] W.C. Stanley, S.R. Meadows, K.M. Kivilo, B.A. Roth, G.D. Lopaschuk, β-Hydroxybutyrate inhibits myocardial fatty acid oxidation in vivo independent of changes in malonyl-CoA content, Am. J. Physiol. Circ. Physiol. 285 (2003) H1626–H1631. https://doi.org/10.1152/ajpheart.00332.2003.
[61] R.S. O’Connor, L. Guo, S. Ghassemi, N.W. Snyder, A.J. Worth, L. Weng, Y. Kam, B. Philipson, S. Trefely, S. Nunez-Cruz, I.A. Blair, C.H. June, M.C. Milone, The CPT1a inhibitor, etomoxir induces severe oxidative stress at commonly used concentrations, Sci. Rep. 8 (2018) 6289. https://doi.org/10.1038/s41598-018-24676-6.
[62] C.-H. Yao, G.-Y. Liu, R. Wang, S.H. Moon, R.W. Gross, G.J. Patti, Identifying off-target effects of etomoxir reveals that carnitine palmitoyltransferase I is essential for cancer cell proliferation independent of β-oxidation, PLOS Biol. 16 (2018) e2003782. https://doi.org/10.1371/journal.pbio.2003782.
[63] S.N.
Rampersad, Multiple applications of Alamar Blue as an indicator of metabolic function and cellular health in cell viability bioassays, Sensors (Basel). 12 (2012) 12347–60. https://doi.org/10.3390/s120912347.
[64] J.J. Homsy, B. Granger, S.M. Krassner, Some factors inducing formation of metacyclic stages of Trypanosoma cruzi, J. Protozool. 36 (1989) 150–3. http://www.ncbi.nlm.nih.gov/pubmed/2657033 (accessed January 29, 2019).

Supporting information

S1

S1 Fig. 1H-NMR analysis of excreted end products from glucose and threonine metabolism. The metabolic end products (succinate, acetate, alanine and lactate) excreted by the epimastigote cells that were incubated after 6 h in PBS (A), PBS after 16 h of starvation without (B) or with D-[U-13C]-glucose (C) or palmitate (D) were determined by 1H-NMR. Each spectrum corresponds to one representative experiment from a set of at least 3. A part of each spectrum ranging from 0.5 ppm to 4 ppm is shown. The resonances were assigned as indicated: A12, acetate; A13, 13C-enriched acetate; Al12, alanine; Al13, 13C-enriched alanine; G13, 13C-enriched glucose; L12, lactate; L13, 13C-enriched lactate; P12, palmitate; S12, succinate; and S13, 13C-enriched succinate.

S2

S2 Fig. Time course activities of enzymes measured in this work. A) (ACC) acetyl-CoA carboxylase, B) (CPT1) carnitine-palmitoyltransferase, and C) (SPT) serine palmitoyltransferase.
All the activities were measured in cell-free extracts of epimastigote forms at different moments of the growth curve as indicated in the main text. All the measurements were performed in triplicate.

S3

To check whether other well-known FAO inhibitors have the same effect on the proliferation of T. cruzi epimastigotes, we performed the same assay as described in Materials and Methods, evaluating different concentrations of valproic acid (AV) [57], trimetazidine [58,59] and β-hydroxybutyrate [60], which are inhibitors of 3-ketothiolase. Because they did not affect the proliferation of the epimastigote forms, we used the highest concentration evaluated in these assays to test whether the compounds inhibit FAO, measured by 14CO2 trapping with U-14C-palmitate as a substrate. As observed, none of these compounds inhibited 14CO2 production from palmitate, confirming that they do not inhibit FAO in T. cruzi.

S3 Fig. Other FAO inhibitors did not affect cell proliferation and FAO in the epimastigote forms. The compounds were evaluated at concentrations between 0.1 and 1000 µM. For positive controls of dead cells, a combination of antimycin (0.5 µM) and rotenone (60 µM) was used. The maximum concentration tested for these compounds does not diminish CO2 liberation from FAO. A) Valproic acid (AV). B) Trimetazidine (TMZ). C) β-hydroxybutyrate (βHOB).

S4

In this study, we showed that the epimastigote forms of T. cruzi present low sensitivity in response to ETO treatment. Recently, some groups described off-target effects when ETO is used at concentrations of up to 200 μM [61,62]. To validate ETO as an FAO inhibitor of T.
cruzi, the parasites were incubated for 24 h in PBS (negative control), 0.1 mM palmitate supplemented with BSA, 5.0 mM histidine, 5 mM glucose, 0.1 mM carnitine and BSA without adding palmitate in the presence or absence of 500 μM ETO. The viability of these cells was inferred from the measured total reductive activity using MTT assays (see the Materials and Methods section for more details). As expected, ETO treatment did not affect the viability of cells incubated in glucose or histidine but did affect the viability of the cells incubated with palmitate or carnitine. Surprisingly, we also observed an ETO effect on parasites under metabolic stress, such as those incubated with PBS or BSA. This finding could be explained by the fact that under metabolic stress, the parasite mobilizes and consumes its internal lipids.

S4 Fig. ETO did not affect the viability of epimastigote forms in the presence of other carbon sources. The viability of epimastigote forms after incubation with different carbon sources and palmitate. The viability was assessed after 24 h using MTT.

S5

Because metacyclogenesis occurs in chemically defined conditions, we performed a viability assay to define the maximum tolerated concentration that allows the parasites to survive under ETO treatment. Stationary epimastigotes in TAU-3AAG medium were treated with different concentrations of ETO (range 5 to 500 μM) for 24 h.
The viability of these cells was inferred by measuring the total reductive activity using an Alamar blue assay [63]. Briefly, after 24 h in the presence or absence of ETO, the cells were incubated with 0.125 μg.mL-1 of Alamar blue reagent in accordance with the protocol described in [17]. Under these conditions, the parasites were 10 times more sensitive to ETO treatment, surviving when subjected to ETO concentrations between 5 and 50 µM (S5 Fig. A). This range of concentrations was used to treat the parasites maintained in TAU-3AAG medium and to follow differentiation by daily counts, based on the percentage of metacyclic trypomastigotes collected in the culture supernatant. To confirm that the parasites were still alive after 5 days under differentiation, we checked the viability of treated and untreated (control) cells using the same assay. As shown (S5 Fig. B), the parasites were viable under all the tested conditions. Considering that TAU-3AAG contains glucose in its composition, we performed an in vitro metacyclogenesis using only proline as a metabolic inducer [64]. As observed, even in the absence of glucose, ETO treatment affects metacyclogenesis.

S5 Fig. Viability of epimastigote forms subjected to metacyclogenesis under different ETO concentrations. A) Cell viability under metacyclogenesis after 24 h of treatment with different ETO concentrations. B) Cell viability under metacyclogenesis after 5 days in the presence of ETO. C) Effect of ETO on the metacyclogenesis induced by proline.
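Reductive-activity readouts such as the Alamar blue and MTT assays described in this supporting information are commonly reported as a percentage of an untreated control. The sketch below only illustrates that normalization under stated assumptions and is not the authors' analysis: the function name `percent_viability`, the blank subtraction, and every reading in the example are hypothetical.

```python
def percent_viability(treated: float, control: float, blank: float = 0.0) -> float:
    """Express a treated-well signal as a percentage of the untreated control.

    All arguments are raw plate readings in the same arbitrary units.
    `blank` is the reading of a cell-free well (an assumed correction step).
    """
    control_signal = control - blank
    if control_signal <= 0:
        raise ValueError("control signal must exceed the blank")
    return 100.0 * (treated - blank) / control_signal


# Hypothetical readings, for illustration only (not data from the study).
blank_reading = 100.0
control_reading = 1100.0   # untreated parasites
treated_reading = 600.0    # parasites incubated with ETO

print(percent_viability(treated_reading, control_reading, blank_reading))
```

Normalizing to a control measured on the same plate keeps conditions comparable across experiments, which is the usual reason for reporting relative rather than raw signals.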
10_1101-2021_01_08_423993 ---- Structural and mechanistic insights into the Artemis endonuclease and strategies for its inhibition

Yuliana Yosaatmadja1ⱡ, Hannah T Baddock2ⱡ, Joseph A Newman1, Marcin Bielinski3, Angeline E Gavard1, Shubhashish M M Mukhopadhyay1, Adam A Dannerfjord1, Christopher J Schofield3, Peter J McHugh2*, Opher Gileadi1*.

1Centre for Medicines Discovery, University of Oxford, ORCRB, Roosevelt Drive, Oxford, OX3 7DQ, United Kingdom; 2Department of Oncology, MRC-Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom; 3Chemistry Research Laboratory, University of Oxford, Mansfield Road, Oxford, OX1 3TA, United Kingdom.

* To whom correspondence should be addressed. Email: opher.gileadi@cmd.ox.ac.uk. Correspondence may also be addressed to peter.mchugh@imm.ox.ac.uk.

ⱡ These authors contributed equally.

ABSTRACT

Artemis (DCLRE1C) is an endonuclease that plays a key role in the development of B- and T-lymphocytes and in DNA double-strand break repair by non-homologous end-joining (NHEJ). Artemis is phosphorylated by DNA-PKcs and acts to open DNA hairpin intermediates generated during V(D)J and class-switch recombination. Consistently, Artemis deficiency leads to radiosensitive congenital severe immune deficiency (RS-SCID). Artemis belongs to a structural superfamily of nucleases that contain conserved metallo-β-lactamase (MBL) and β-CASP (CPSF-Artemis-SNM1-Pso2) domains. Here, we present crystal structures of the catalytic domain of wild type and variant forms of Artemis that cause RS-SCID Omenn syndrome. The truncated catalytic domain of Artemis is a constitutively active enzyme with similar activity to the phosphorylated full-length protein.
Our structures help explain the basis of the predominantly endonucleolytic activity of Artemis, which contrasts with the predominantly exonuclease activity of the closely related SNM1A and SNM1B nucleases. The structures also reveal a second metal binding site in its β-CASP domain that is unique to Artemis. By combining our structural data with that from a recently reported structure, we were able to model the interaction of Artemis with DNA substrates. Moreover, co-crystal structures with inhibitors indicate the potential for structure-guided development of inhibitors.

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.08.423993; this version posted January 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license (http://creativecommons.org/licenses/by/4.0/).

INTRODUCTION

Nucleases hydrolyse the phosphodiester bonds of nucleic acids and are grouped into two broad classes: exonucleases and endonucleases. Exonucleases are often non-sequence-specific, while endonucleases can be further grouped into sequence-specific endonucleases, such as restriction enzymes, and structure-selective endonucleases [1]. Artemis (also known as SNM1C or DCLRE1C), along with SNM1A (DCLRE1A) and Apollo (SNM1B or DCLRE1B), are human nucleases that fall into the extended structural family of metallo-β-lactamase (MBL) fold enzymes [2,3]. The N-terminal region of Artemis is predicted to have a core MBL fold (aa 1–155, 385–361) with an inserted β-CASP (CPSF73, Artemis, SNM1 and PSO2) domain (aa 156–384). β-CASP domains are present within the larger family of eukaryotic nucleic acid processing MBLs and confer both DNA/RNA binding and nuclease activity [2].
The C-terminal region of Artemis mediates protein-protein interactions, contains post-translational modification (PTM) sites, directs subcellular localisation, and may modulate catalytic activity [4–7].

Although SNM1A, SNM1B, and Artemis are predicted to have similar core structures for their catalytic domains, each has distinct cellular functions and substrate specificities. While SNM1A and SNM1B/Apollo are exclusively 5' to 3' exonucleases, the predominant activity of Artemis is endonucleolytic [5,7], although a minor 5' to 3' exonuclease activity has been reported [8]. Human SNM1A localises to sites of DNA damage, can digest past DNA damage lesions in vitro, and is involved in the repair of interstrand crosslinks (ICLs) [9–11]. SNM1B/Apollo is a shelterin-associated protein required for resection at newly replicated leading-strand telomeres to generate the 3'-overhang necessary for telomere loop (t-loop) formation and telomere protection [12–14]. Both SNM1A and SNM1B/Apollo prefer ssDNA substrates in vitro, with an absolute requirement for a free 5'-phosphate [3,15]. By contrast, Artemis prefers hairpins and DNA junctions as substrates for its endonuclease activity, although it is able to process ssDNA substrates [16–18].

The endonuclease activity of Artemis is responsible for hairpin opening in variable (diversity) joining (V(D)J) recombination [19] and contributes to end-processing in canonical non-homologous end-joining (c-NHEJ) DNA repair [20–23]. V(D)J recombination is initiated by the recognition and binding of recombination-activating gene proteins (RAG1 and RAG2) to the recombination signal sequences (RSSs) adjacent to the V, D, and J gene segments. Upon binding, the RAG proteins induce double-strand breaks (DSBs) and create a hairpin at the coding ends [24–26]. The Ku heterodimer recognises the DNA double-strand break and recruits DNA-dependent protein kinase catalytic subunit (DNA-PKcs) and Artemis to mediate hairpin opening [17].
Following hairpin opening, the NHEJ machinery containing the XRCC4/XLF(PAXX)/DNA-Ligase IV complex is recruited to catalyse the processing and ligation reactions of the DNA ends [20,27,28].

V(D)J recombination is an essential process in antibody maturation [16,29,30]. Mutations in the Artemis gene cause aberrant hairpin opening, resulting in radiosensitive severe combined immune deficiency (RS-SCID), with sensitivity to ionising radiation due to impairment of NHEJ, the predominant DSB repair pathway in mammalian cells [17,19,27], and another form of SCID (Omenn syndrome) associated with hypomorphic Artemis mutations [31,32]. Among the most common mutations leading to Artemis loss of function are large deletions in the first four exons and a nonsense founder mutation, as found in Navajo and Apache Native Americans [33]. In addition, missense mutations and in-frame deletions affecting highly conserved residues such as H35, D165 and H228 can also abolish Artemis’ protein function [34].

Owing to the key roles of Artemis and related DSB repair enzymes in both programmed V(D)J recombination and non-programmed c-NHEJ DSB repair, they are attractive pharmacological targets for the radiosensitisation of tumours.

Here, we present a high-resolution crystal structure of the catalytic core of Artemis (aa 1–361) containing both the MBL and β-CASP domains. This reveals that Artemis possesses a unique feature that is not present in SNM1A and SNM1B/Apollo: a second metal binding site in its β-CASP domain that bears a resemblance to classical Cys2His2 zinc finger motifs.
We propose that this second metal coordination site is involved in Artemis stabilisation and substrate specificity. We also present a model for Artemis DNA binding based on our data and another recently published structure. The Artemis DNA model is compared with models of DNA binding from related nucleases to reveal distinct features that define a role for Artemis in the end-joining reaction. Following development of an assay suitable for inhibitor screens, we identified drug-like molecules that could potentially inhibit both the Artemis active site and its essential zinc finger-like motif.

MATERIAL AND METHODS

Cloning and site directed mutagenesis of WT and mutant Artemis (aa 1-362)

The Artemis MBL-β-CASP domain (WT and mutant) encoding constructs were cloned into the baculovirus expression vector pBF-6HZB, which combines an N-terminal His6 sequence and the Z-basic tag (GenBank™ accession number KP233213.1) for efficient purification and to promote solubility. The Artemis gene was cloned using ligation independent cloning (LIC) [35]. Site directed mutagenesis was carried out by inverse PCR, whereby an entire plasmid is amplified using complementary mutagenic primers (oligonucleotides) with minimal cloning steps [36]. The whole plasmid was amplified by PCR using the high-fidelity and high-processivity enzyme Herculase II Fusion DNA Polymerase (Agilent). The PCR product was then added to a KLD enzyme mix (NEB) reaction and incubated at room temperature for 1 hour, prior to transformation into Escherichia coli cells.

Expression and purification of WT and mutant Artemis with IMAC (aa 1-362)

Baculovirus generation was performed as previously described [3]. Recombinant proteins were produced in Sf9 cells at 2 × 10⁶ cells/mL infected with 1.5 mL of P2 virus for WT and 3 mL of P2 virus for mutants. Infected Sf9 cells were harvested 70 h after infection by centrifugation (900 x g, 20 min).
The cell pellet was resuspended in 30 mL/L lysis buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 10 mM imidazole, 5% (v/v) glycerol and 1 mM TCEP), snap frozen in liquid nitrogen, then stored at −80 °C for later use. Thawed cell aliquots were lysed by sonication. The lysates were clarified by centrifugation (40,000 x g, 30 min), then the supernatant was passed through a 0.80 μm filter (Millipore) and loaded onto an immobilised metal affinity chromatography (IMAC) column (Ni-NTA Superflow Cartridge, Qiagen) equilibrated in lysis buffer. The immobilised protein was washed with lysis buffer, then eluted using a linear gradient of elution buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 300 mM imidazole, 5% (v/v) glycerol, and 1 mM TCEP). The protein-containing fractions were pooled and passed through an ion exchange column (HiTrap® SP FF, GE Healthcare Life Sciences) pre-equilibrated in SP buffer A (25 mM HEPES pH 7.5, 300 mM NaCl, 5% (v/v) glycerol and 1 mM TCEP). The protein was eluted using a linear gradient of SP buffer B (SP buffer A with 1 M NaCl), and fractions containing the tag-free Artemis were identified by electrophoresis. Artemis-containing fractions were pooled, supplemented with recombinant tobacco etch virus (TEV) protease for cleavage of the 6His-ZB tag, and dialysed overnight at 4°C in SP buffer A. The protein was subsequently loaded onto an ion exchange column (HiTrap® SP FF, GE Healthcare Life Sciences) pre-equilibrated in SP buffer A to remove the 6His-ZB tag and uncleaved protein.
The protein was eluted using a linear gradient of SP buffer B, and fractions containing the tag-free Artemis were identified by electrophoresis. Artemis-containing fractions from the SP column elution were combined and concentrated to 1 mL using a 30 kDa MWCO centrifugal concentrator. The protein was then loaded onto a Superdex 75 Increase 10/300 GL column equilibrated with SEC buffer (25 mM HEPES pH 7.5, 300 mM NaCl, 5% (v/v) glycerol, 2 mM TCEP). Mass spectrometric analysis of the purified proteins revealed masses of 41716.5 Da, 41650.5 Da, 41672.2 Da, and 41639.9 Da for the WT, H35A, D37A and H35D proteins, respectively. The calculated masses are 41715.09, 41649.2, 41671.2 and 41639.2 Da, respectively, so each measured mass is within 1.5 Da of its calculated value.

Expression and purification of WT truncated Artemis catalytic domain without IMAC (aa 1-362)

The truncated Artemis protein was expressed and purified in a similar manner to that described above, except for the first purification step, for which we used a 5 mL HiTrap® SP Fast Flow (GE Health Care) column. Following overnight TEV cleavage, the protein was subjected to a second ion exchange step (5 mL HiTrap® SP Fast Flow (GE Health Care)) for the removal of the Z-Basic protein tag. The protein was further purified by size exclusion chromatography (Highload® 16/200 Superdex® 200).

Cloning, expression and purification of full-length WT Artemis (aa 1-692)

The full-length Artemis encoding construct was cloned into pFB-CT10HF-LIC, a baculovirus expression vector containing a C-terminal His10 and FLAG tag.
pFB-CT10HF-LIC was a gift from Nicola Burgess-Brown (Addgene plasmid # 39191; http://n2t.net/addgene:39191; RRID: Addgene_39191). As for the truncated protein, the full-length Artemis gene was cloned using ligation independent cloning (LIC) [35]. The baculovirus-mediated expression of the full-length DCLRE1C/Artemis gene was performed in a manner similar to that used for the truncated protein. However, instead of 1.5 mL of P2 virus, 3.0 mL of P2 virus was used to infect Sf9 cells at 2 × 10⁶ cells/mL for the expression of the full-length Artemis construct. Cell harvesting and the initial IMAC purification steps were performed as described for the catalytic domain. Following IMAC purification, the protein was cleaved with TEV protease overnight in dialysis buffer (50 mM HEPES pH 7.5, 0.5 M NaCl, 5% glycerol and 1 mM TCEP) and then passed through a 5 mL Ni-sepharose column; the flowthrough fractions were collected. The Artemis protein was then concentrated using a centrifugal concentrator (Centricon, MWCO 30 kDa) before loading on a Superdex S200 HR 16/60 gel filtration column in dialysis buffer. Fractions containing purified Artemis protein were pooled and concentrated to 10 mg/mL.

Electrospray mass spectrometry (ESI-QTOF)

Reversed-phase chromatography was performed in-line prior to mass spectrometry using an Agilent 1290 uHPLC system (Agilent Technologies Inc., Palo Alto, CA, USA). Concentrated protein samples were diluted to 0.02 mg/mL in 0.1% formic acid and 50 µL was injected onto a 2.1 mm x 12.5 mm Zorbax 5 µm 300SB-C3 guard column housed in a column oven set at 40 °C. The solvent system consisted of 0.1% formic acid in ultra-high purity water (Millipore) (solvent A) and 0.1% formic acid in methanol (LC-MS grade, Chromasolve) (solvent B). Chromatography was performed as follows: initial conditions were 90% A and 10% B at a flow rate of 1.0 mL/min. A linear gradient from 10% B to 80% B was applied over 35 seconds.
Elution then proceeded isocratically at 95% B for 40 seconds, followed by equilibration at initial conditions for a further 15 seconds. Protein intact mass was determined using a 6530 electrospray ionisation quadrupole time-of-flight mass spectrometer (Agilent Technologies Inc., Palo Alto, CA, USA). The instrument was configured with the standard ESI source and operated in positive ion mode. The ion source was operated with the capillary voltage at 4000 V, nebulizer pressure at 60 psig, drying gas at 350 °C and drying gas flow rate at 12 L/min. The instrument ion optic voltages were as follows: fragmentor 250 V, skimmer 60 V and octopole RF 250 V.

Protein crystallisation and Soaking

Artemis (PDB:6TT5) was crystallised using the sitting drop vapour diffusion method by mixing 50 nL protein with 50 nL crystallisation solution comprising 0.2 M ammonium chloride, 20% (v/v) PEG 3350. Crystals grew after 2 weeks and reached maximum size within 3 weeks. An unliganded crystal was cryoprotected with mother liquor supplemented with 20% (v/v) ethylene glycol and flash frozen in liquid nitrogen. The non-IMAC purified Artemis (PDB:7AF1) was crystallised in a similar manner, with the addition of 20 nL of crystal seed solution obtained from a previous crystallisation experiment. The crystals were grown in a solution comprising 0.25 M ammonium chloride and 30% (v/v) PEG 3350 at 4°C. Crystals grew after one day and reached a maximum size within one week.
Artemis variants (mutants H33A and H35D) were crystallised using the sitting drop vapour diffusion method by mixing 50 nL protein with 50 nL crystallisation solution comprising 0.1 M sodium citrate pH 5.5, 20% PEG 3350, while the D37A variant was crystallised in 0.2 M ammonium acetate, 0.1 M bis-TRIS pH 5.5, 25% PEG 3350. All Artemis variants were crystallised in the presence of 20 nL of crystal seed solution obtained from a previous crystallisation experiment. Crystals grew after one day at 4°C and reached maximum size within one week.

Data collection and refinements

Data were collected at Diamond Light Source beamlines I04, I03, or I24. Diffraction data were processed using DIALS [37] and structures were solved by molecular replacement using PHASER [38] and the PDB coordinates 5Q2A. Model building and the addition of water molecules were performed in COOT [39] and structures were refined using REFMAC [40]. Data collection and refinement statistics are given in Table I. The X-ray fluorescence data were collected at Diamond Light Source I03 (6TT5) using 100% transmission and 12.7 keV, and I24 (7AF1) using 1% transmission and 12.8 keV (Suppl. Figure 1).

Generation of 3'-radiolabelled substrates

10 pmol of single-stranded DNA (Eurofins MWG Operon, Germany) was labelled with 3.3 pmol of α-32P-dATP (Perkin Elmer) by incubation with terminal deoxynucleotidyl transferase (TdT, 20 U; ThermoFisher Scientific) at 37 °C for 1 hour. This solution was then passed through a P6 Micro Bio-Spin chromatography column (BioRad), and the radiolabelled DNA was annealed with the appropriate unlabelled oligonucleotides (1:1.5 molar ratio of labelled to unlabelled oligonucleotide; see Supplementary Table 1 for sequences) by heating to 95 °C for 5 min and cooling to below 30 °C in annealing buffer (10 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.1 mM EDTA).
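The labelling and annealing quantities above reduce to simple molar arithmetic. The Python sketch below is illustrative only and is not part of the original protocol: the function names are hypothetical, and it assumes ideal mixing with no loss of DNA on the spin column. It computes the amount of unlabelled partner oligonucleotide implied by the 1:1.5 ratio, and the amount of substrate in a reaction of given concentration and volume (for example, the 10 nM substrate in a 10 µL gel-based assay).

```python
def unlabelled_partner_pmol(labelled_pmol: float, partner_ratio: float = 1.5) -> float:
    """Amount of unlabelled oligonucleotide (pmol) for a 1:partner_ratio anneal."""
    return labelled_pmol * partner_ratio


def label_per_dna(label_pmol: float, dna_pmol: float) -> float:
    """Molar ratio of radiolabelled nucleotide to DNA in the labelling reaction."""
    return label_pmol / dna_pmol


def pmol_in_reaction(conc_nM: float, volume_uL: float) -> float:
    """Substrate amount in pmol: nM x uL gives fmol, so divide by 1000."""
    return conc_nM * volume_uL / 1000.0


if __name__ == "__main__":
    # Values taken from the protocol text; loss-free recovery is an assumption.
    print("unlabelled partner (pmol):", unlabelled_partner_pmol(10.0))
    print("label : DNA molar ratio:", label_per_dna(3.3, 10.0))
    print("substrate per gel assay (pmol):", pmol_in_reaction(10.0, 10.0))
```

Under these assumptions, 10 pmol of labelled strand calls for 15 pmol of unlabelled partner, and a single 10 µL gel-based reaction at 10 nM consumes 0.1 pmol of substrate.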
Gel-based nuclease assays

Standard nuclease assays were carried out in reactions containing 20 mM HEPES-KOH, pH 7.5, 50 mM KCl, 10 mM MgCl2, 0.05% (v/v) Triton X-100, 5% (v/v) glycerol (final volume: 10 μL), and the indicated concentrations of Artemis. Reactions were started by the addition of DNA substrate (10 nM), incubated at 37°C for the indicated time, then quenched by addition of 10 μL stop solution (95% formamide, 10 mM EDTA, 0.25% (v/v) xylene cyanole, 0.25% (v/v) bromophenol blue) with incubation at 95 °C for 3 min. Reaction products were analysed by 20% denaturing polyacrylamide gel electrophoresis (gels made from a 40% solution of 19:1 acrylamide:bis-acrylamide (BioRad) and 7 M urea (Sigma Aldrich)) in 1 x TBE (Tris-borate EDTA) buffer. Electrophoresis was carried out at 700 V for 75 minutes; gels were subsequently fixed for 40 minutes in a 50% methanol, 10% acetic acid solution, and dried at 80°C for two hours under vacuum. Dried gels were exposed to a Kodak phosphor imager screen and scanned using a Typhoon 9500 instrument (GE).

Fluorescence-based nuclease assay

The protocol of Lee et al. [41] was adapted for structure-specific endonuclease activity. A ssDNA substrate was utilised containing a 5’ FITC-conjugated T and a 3’ BHQ-1 (black hole quencher)-conjugated T (Suppl. Table 1). Because the FITC and BHQ-1 are located proximal to one another, the intact substrate does not fluoresce prior to endonucleolytic incision.
Following endonucleolytic incision by DCLRE1C/Artemis, the FITC is uncoupled from the BHQ-1 and fluorescence increases. Inhibitors (at increasing concentrations) were incubated with Artemis for 10 minutes at room temperature before the reaction was started by the addition of DNA substrate. Assays were carried out in a 384-well format, in a 25 µL reaction volume. The buffer was the same as for the gel-based nuclease assays, the Artemis concentration was 50 nM, and the DNA substrate was at 25 nM. Fluorescence spectra were measured using a PHERAstar FSX (excitation: 495 nm; emission: 525 nm) with readings taken every 150 sec for 35 min at 37 °C.

RESULTS

Human Artemis (SNM1C or DCLRE1C) has a core catalytic fold similar to SNM1A and Apollo/SNM1B

The core catalytic domain of Artemis (aa 3–361) was produced in baculovirus-infected Sf9 cells fused to a highly basic His6-Zb tag, which confers tight binding to cation exchange columns. The protein was initially purified using immobilised metal affinity chromatography (IMAC) on a Nickel-Sepharose column as the first step. Subsequent preparations were performed without IMAC, to avoid the introduction of Ni²⁺ ions during purification. Artemis protein was purified as detailed in the Material and Methods, and crystals were subsequently grown and diffracted to 1.6 Å resolution (Table 1); the structure was solved using a structure of SNM1A (PDB coordinates 5Q2A) as the molecular replacement model. The resultant Artemis structure (PDB coordinates 7AF1) contains a single molecule in the asymmetric unit, with two zinc ions coordinated at the active site. The metal ions were identified using X-ray fluorescence (XRF) analysis during data collection at the Diamond Light Source. When the protein is purified using IMAC, the first zinc ion in the active site can be replaced by a nickel ion (PDB: 6TT5) (Figure 2E). The presence of the nickel ion in the crystal was also confirmed by XRF. The X-ray fluorescence analysis of the metal ions present in the structures is shown in Suppl. Figure 1. This metal ion coordination pattern has been observed with other members of the family, such as SNM1A and SNM1B/Apollo [3,9].

The overall fold of the human Artemis catalytic core is very similar to that of human SNM1A and SNM1B/Apollo (2 Å RMSD). It has the key structural characteristics of human MBL fold nucleases, with the di-metal containing active site at the interface between the MBL and β-CASP domains (Figure 1A). As anticipated, the MBL domain (Figure 1A and 1B, in pink) of Artemis has the typical α/β-β/α sandwich MBL fold [42] and contains all of the highly conserved motifs 1–4 (Figure 1C and D; and sequence alignment, Suppl. Figure 2), which are typical for the whole MBL superfamily, and motifs A–C, which are typical of the β-CASP fold containing family [2,27,43,44]. Motifs 1–4 (Figures 1D and 2C) are responsible for metal ion coordination in both DNA and RNA processing MBL enzymes [44].

As previously observed in crystal structures of human SNM1A and Apollo/SNM1B, Artemis can coordinate one or two metal ions in its active site. One zinc ion (Zn1) in the active site is coordinated by four residues (His33, His35, His115, and Asp116) and two water molecules (H2O 506 and 611) in an octahedral manner (Figure 2C).
The second zinc ion (Zn2) was refined with 30% occupancy and is coordinated by three residues (Asp37, His38, and Asp136) and two water molecules (H2O 506 and 529). The low occupancy of the second zinc ion, together with the two conformations (0.5 occupancy for each conformation) observed for Asp37 (Figure 2E), suggests that this site binds a metal ion less tightly than the Zn1 site, consistent with studies on other human MBL fold nucleases [3; Baddock et al., 2020].

The structure of human SNM1A (PDB: 5AHR) [3] was solved with a single zinc ion coordinated in the active site (Figure 2A). By contrast, SNM1B structures solved with a bound AMP (Baddock et al., accompanying paper) showed that both metal ions are positioned to coordinate the phosphate group of the AMP in an octahedral manner (Figure 2B). In summary, the octahedral coordination sites for the first zinc ion are contributed by three histidines, one aspartate residue, and either water molecules or a phosphate oxygen of the substrate; the second metal ion is more weakly coordinated in the SNM1 protein family, by one histidine and two aspartates, with the remaining three positions occupied by water or a phosphate oxygen of the DNA substrate. This can explain the partial occupancy of the second zinc position in SNM1 enzyme structures, where full occupancy may be achieved only in the presence of substrate. We propose that Artemis would coordinate a phosphate group of its substrate in a similar manner.
The structure of human CPSF-73 (PDB: 2I7T), an RNA processing nuclease, with two active site bound zinc ions and a phosphate molecule [45], shows that the two zinc ions are coordinated in a geometry very similar to that in the human MBL DNA processing enzymes [46,47]. A striking difference between the MBL RNA and DNA nucleases is that the second metal ion (M2) in the RNA processing nucleases is coordinated by an additional histidine residue (His418 for CPSF-73) [45] that is absent in the DNA processing enzymes (Figure 2D).

The structure of Artemis reveals a novel zinc-finger like motif in the β-CASP domain

Proteins with a β-CASP fold form a distinct sub-group within the MBL-fold superfamily that specifically acts on nucleic acids [2]. Artemis’ β-CASP domain comprises residues 156–384 and is the second globular domain (Figure 1A, shown in white) in the catalytic region, inserted within the Artemis MBL fold sequence between the small α-helices 6 and 7 (Figure 1B). The β-CASP domain has been proposed to facilitate substrate recognition and binding in the nucleic acid processing MBL fold-containing family of enzymes [2,48]. Another metal ion coordination site, unique to Artemis, is present in the β-CASP domain, with similarity to the canonical Cys2His2 zinc-finger motif [49,50]. Many DNA binding proteins, including transcription factors and a substantial number of DNA repair factors (including those involved in NHEJ), possess the classical Cys2His2 zinc finger motif, which serves as a structural feature stabilising the DNA binding domain [49–52]. A typical Cys2His2 zinc coordinating finger (Figure 3A) has a ββα motif, wherein the zinc ion is coordinated between an α-helix and two antiparallel β-sheets. The zinc ion confers structural stability, and hydrophobic residues located at the sides of the zinc coordination site enable specific binding of the zinc finger in the major groove of the DNA [49,50,53,54].
Similar to the canonical Cys2His2 zinc finger motif, the zinc ion coordination in Artemis’ β-CASP domain adopts a tetrahedral geometry, with coordination by two cysteine (Cys256 and Cys272) and two histidine (His228 and His254) residues (Figure 3B). However, in the case of Artemis the metal ion coordination site is sandwiched between two β-sheets instead of an α-helix and two antiparallel β-sheets. Almost all the residues in the zinc-finger like motif (His228, Cys256, and Cys272) are unique to Artemis (sequence alignment, Suppl. Figure 2), with only His254 being well conserved within the SNM1 family. However, the four residues that form the zinc-finger like motif are highly conserved in Artemis across different species (from human to marine sponge), implying functional importance (sequence alignment, Suppl. Figure 3). Consistently, substitutions of His228 and His254 (H228N and H254L), two of the zinc coordinating residues in the β-CASP domain of Artemis, cause RS-SCID in humans [34,44,55]. Patients with these inherited mutations suffer from impaired V(D)J recombination, leading to underdeveloped B and T lymphocytes. The importance of histidine 254 has been highlighted by de Villartay et al. [44], who showed that the full-length H254A Artemis variant is unable to carry out V(D)J recombination in vivo and has no discernible endonucleolytic activity in vitro.

Comparison of Artemis structure with 6WO0 and 6WNL

During the preparation of this manuscript a structural study on the catalytic core of Artemis was published [56].
This study reported two Artemis structures (PDB: 6WO0 and 6WNL) that are similar to our Artemis structure (PDB code 7AF1) (backbone RMSDs of 0.48 Å and 0.54 Å, respectively), with identical relative positioning of the MBL and β-CASP domains (Figure 4A and Suppl. Figure 4A). The only significant difference was that whilst we refined our structure with two zinc ions in the active site, both of the crystal forms reported by Karim et al. were modelled with a single active site zinc ion (Zn1), reinforcing the proposal of weaker metal ion binding at the Zn2 site.

Re-analysis of the 6WO0 and 6WNL structures

An unusual aspect of the Karim et al. structures is that both crystal forms were obtained in the presence of DNA and were reported to require DNA for their growth; the crystals showed a fluorescence signal supporting the presence of DNA (the oligonucleotides used contained a cyanine dye fluorophore), yet neither of the models presented contains DNA. The authors referred to some broken stacking electron density in a solvent channel in 6WNL and a patch of unsolved density approaching the active site in 6WO0, but state that the DNA “did not bind to the protein in a physiological way, and likely bound promiscuously to promote crystallization” [56]. We performed a careful re-examination of these structures, looking closely at the residual electron density. For the 6WNL structure we were able to locate a distorted duplex DNA of around 13 base pairs, which we propose may be the product of duplex annealing of the oligonucleotide used for crystallization (a semi-palindromic 13-mer that was designed to form a hairpin with phosphorothioate linkages in the single-stranded region) (Suppl. Figure 4B).
For this structure, we are in general agreement with Karim et al. that the DNA does not appear to make meaningful interactions with the protein that inform on the mechanism of nuclease activity, although this mode of association with DNA may possibly be relevant to alternative binding modes relating to higher order complexes containing Artemis.

By contrast, for the 6WO0 structure we were able to confidently build a DNA molecule that contacts the Artemis active site in a manner that we believe to be relevant to the Artemis nuclease activity. Our model contains an 8-nucleotide 5'-single-stranded extension with a short 2-base pair region of duplex DNA that reaches into the Artemis active site, making close contacts with the metal ion centre in a manner consistent with the proposed catalytic mechanism (Figure 4B). The sequence of the longest strand corresponds to the 10-nucleotide Cy5-labelled strand (cy5-GCGATCAGCT), with some residual density at the 5'-end that may be attributed to the cyanine fluorophore, which we did not include in our model. The complementary strand used for crystallization was 13 nucleotides long and was intended to produce a 5'-overhang, but only two bases and three phosphates could be located in the density. The abrupt manner in which the electron density apparently disappears from either end of this strand suggests that this is the product of a cleavage reaction, although it is possible that the remaining nucleotides are not located due to disorder.
The analysis of electron density at this site is complicated by the proximity to a crystallographic 2-fold symmetry axis, which brings a symmetry copy of the DNA molecule into a position where atoms partially overlap and the extended 5’ strands form a pseudo duplex (Suppl. Figure 5A). The occupancy of the entire DNA molecule is thus limited to 0.5, and the lower occupancy is reflected in the electron density map, which requires a lower contour level than would usually be applied (Suppl. Figure 5B). After carefully building and refining the DNA bound model described above, significant positive electron density was revealed for the second metal ion (Zn2 site), which we also included in the model with the same occupancy (0.5) as the DNA. Our model was refined to similar crystallographic R-factors as 6WO0 and has been deposited with PDB accession number 7ABS (refinement statistics are given in Table I).

Model for Artemis DNA binding

Using the crystallographically observed DNA as a template, we were able to make a model for Artemis binding to a longer section of double-stranded DNA by complementing unpaired bases on the single-stranded DNA overhang with canonical base pairs, whilst maintaining acceptable geometry of the sugar phosphate backbone (Figure 5A). The duplex section of this model deviates slightly from the ideal B-form geometry [57], in a manner that is reminiscent of certain transcription factor DNA complexes [58,59]. We have also extended the metal ion contacting strand by three nucleotides to form a 5'-overhang; the positioning of the overhang nucleotides is more speculative, but it was nevertheless possible to avoid clashes with protein residues whilst maintaining relaxed geometry (Figure 6C).
In the extended DNA complex model, Artemis contacts both strands of the DNA model in several areas; notably, a single phosphate lies above the di-metal ion bearing active site and ligates to both metal ions in the same manner as observed in structures of related enzymes with phosphate or phosphate-containing compounds (Baddock et al., 2020) [60]. The two downstream nucleotides on this strand pass close to the protein surface, forming possible interactions with both the main chain of Asp37 and the sidechain of Lys36, whilst subsequent nucleotides are not close to the protein (Figure 6A). The overhang portion of this strand continues with a slightly altered trajectory, potentially contacting Artemis in the vicinity of the cleft separating the MBL and β-CASP domains, with the potential to form favourable interactions with both positively charged (Arg172) and aromatic residues (Phe173 and Trp293) (Figure 6C).

The complementary strand forms interactions with the protein via backbone contacts that span a 4-nucleotide stretch between 5 and 8 bases from the 3'-terminus and contact positively charged sidechains in the MBL domain (Lys36, Lys40, Arg43, and Lys74) (Figure 6A). The 3'-end of this strand apparently terminates directly above a cluster of polar or positively charged residues in the β-CASP domain (Lys207, Lys288, Asn205) (Figure 6B).
Whilst the experimental DNA substrate (as used in co-crystallisation) and our model both contain a 3'-hydroxyl group, the model implies that addition of a 3'-phosphate could be accommodated and may be expected to make favourable interactions with the basic cluster of residues. Thus, our model illustrates a preferred binding mode for Artemis in which DNA with a 5'-overhang binds at the junction between double- and single-stranded regions, and the expected product of this reaction would be a blunt-ended DNA with a 5'-phosphate. In the case of hairpin DNA substrates, our model indicates the possibility for Artemis to accommodate a loop connecting the two strands, possibly of around 4 nucleotides or more, with the cleavage product being DNA with a 3'-overhang cleaved from the last paired base of the duplex.

Comparison of the Artemis DNA binding mode with that of other nucleases

We have recently determined the structure of SNM1B/Apollo in complex with two deoxyadenosine monophosphate nucleotides and, through a process of extrapolation similar to that outlined above, have independently built a model for SNM1B binding to DNA containing a 3'-overhang (one of its preferred substrates) (Baddock et al., accompanying paper). The overall mode of DNA binding is similar in the two models (Figure 5), with the two DNA duplexes being roughly parallel and forming contacts to similar regions on the MBL domain. The most important differences lie in the nature of the contacts formed to the active site and the paths of the various overhangs. In the SNM1B model, extensive contacts are made to the 5'-phosphate in a well-defined phosphate binding pocket. Both human SNM1A and SNM1B are exclusively 5'-phosphate exonucleases, with most of these phosphate binding residues being highly conserved (sequence alignment, Suppl. Figure 1, in yellow) (Baddock et al., accompanying paper). Interestingly, Artemis lacks these key phosphate binding residues and the 5'-phosphate binding pocket of SNM1A and SNM1B.
Instead, this pocket in Artemis is partially filled by the side chain of Phe318. In SNM1A and SNM1B, these contacts appear to define a high-affinity binding pocket exclusively for the 5'-phosphate of the DNA terminus, thus explaining the major differences in nuclease activities within the family, i.e., SNM1A and SNM1B being exonucleases and Artemis being an endonuclease. Further differences between Artemis and SNM1B/SNM1A are found in the loop connecting β-strands G and H (using Artemis numbering), which in Artemis is significantly longer and occupies a different position, contacting residues in the MBL domain (Figure 6B), compared to the loops in SNM1B and SNM1A that form part of the phosphate binding pocket and make potential contacts with the 3'-overhang. This loop displacement in Artemis may contribute to its ability to accommodate DNA substrates with either 5'-overhangs or hairpins, thus facilitating its function as a structure-specific endonuclease. Interestingly, the surface of the Artemis interface between the MBL and β-CASP domains contains a belt of positively charged residues (Figure 5A). These positive surface charges are proposed to facilitate productive DNA binding to the active site, consistent with our DNA binding model. One of the striking differences, at least in the available structures, is that the active site of Artemis is more open compared to those of SNM1A or SNM1B. This openness may reflect an ability to accommodate different substrate conformations, including hairpins, 3'- and 5'-overhangs, as well as DNA flaps and gaps.
Both human SNM1A and SNM1B appear to have a more sequestered active site that would only fit a single strand of DNA, which is consistent with previous findings on their preferred substrate selectivity [10].
Biochemical characterisation of truncated Artemis catalytic domain (aa 1–361)
To investigate the activity of our different versions of recombinant Artemis, we performed nuclease assays using radiolabelled DNA substrates. We compared the catalytic domain purified using IMAC (which contained Ni2+ in the active site) with protein purified using ion exchange (and avoiding IMAC), which contained predominantly Zn2+. We also tested the activity of full-length phosphorylated Artemis. The results show that both truncated enzymes have identical activities, which are also very similar to that of the full-length enzyme (Suppl. Figure 6). One notable difference between our full-length protein and that reported by Ma et al. [18] is that our full-length protein is active in the absence of DNA-PKcs. Intact protein mass spectrometric analysis of our full-length protein shows that the protein has undergone up to five phosphorylation events (Suppl. Figure 7). Poinsignon et al. have shown that Artemis is constitutively phosphorylated in cultured mammalian cells and is the target of additional phosphorylation in response to induced DNA damage [61]; it is interesting that the capacity to phosphorylate Artemis to produce an active form is also conserved in insect cells. We observed exonuclease activity with full-length Artemis at 10 nM (Suppl. Figure 8), though this was weak compared to its endonuclease activity at the same concentration. We observed no exonuclease activity for the truncated Artemis construct; it is possible that phosphorylation alters the balance between endonuclease and exonuclease activity, though the biological relevance of this, if any, remains to be validated.
As mentioned above, both human SNM1A and SNM1B require a 5'-phosphate for their activity [3,9,15].
To investigate whether there is a similar requirement for Artemis, we tested the activity of truncated Artemis against single-stranded and overhang DNA substrates with different 5'-end groups, including a phosphate, a hydroxyl group, and biotin groups (Figure 7A). The results imply that, at least under the tested conditions, Artemis is agnostic to the different end modifications, exhibiting comparable digestion of all substrates.
Extensive evidence demonstrates that full-length Artemis in complex with DNA-PKcs has structure-specific endonuclease activity [17,18,27]. These studies reported that Artemis can digest substrates including overhangs, hairpins, stem-loops, and splayed arms (pseudo-Y). To investigate the activity of the truncated Artemis catalytic domain (aa 1–361), we performed nuclease assays using a variety of radiolabelled DNA substrates (Figure 7B). The results show that truncated Artemis has substrate-specific endonuclease activity, with a preference for single-stranded DNA substrates and those that contain single-stranded character (e.g. 5'- and 3'-overhangs, splayed arms, and a lagging flap structure), compared with double-stranded DNA structures (e.g. dsDNA and a replication fork). This is in accordance with previous research, where Artemis has been reported to cleave around ss- to dsDNA junctions in DNA substrates (perhaps cite Chang et al., 2015 for this). The truncated Artemis catalytic domain also exhibits hairpin opening activity, in accordance with what has previously been reported (Suppl. Fig. 9).
On a duplex substrate (YM117 from Ma et al. [18]) with a 20 nt hairpin region, Artemis cleaves adjacent to the hairpin, consistent with previous data. It is clear that the truncated form of Artemis exhibits nuclease activity closely comparable to that of the phosphorylated full-length Artemis protein [18], indicating that the structural studies presented here reveal mechanistic insights of direct relevance to the DNA-PKcs-associated form of Artemis that engages in end-processing reactions in vivo.
Structural and biochemical characterisation of Artemis point mutations
Previous site-directed mutagenesis studies by Pannicke et al. targeting the metal ion coordinating residues in the active site (D17N, H33A, H35A, D37N) of full-length Artemis (aa 1–692) established the importance of active site motifs 1–4 for activity [27]. Each of these substitutions markedly reduced or abolished Artemis' ability to carry out its role in V(D)J recombination in vitro. We mutated, expressed, purified, and crystallised three forms of truncated Artemis (aa 1–361) with substitutions in several of these metal ion coordinating residues, i.e. D37A, H33A, and the Omenn syndrome patient mutation H35D [31,55]. We found that the overall architecture of the three variants is almost identical to the WT (Figure 8A). The D37A structure retains one Zn ion (Zn1) in the active site whilst losing the second (Zn2) (Figure 8B). Both the H33A and H35D variants additionally exhibited loss of the Zn1 ion. All three variants retained the Zn ion in the zinc finger-like motif of the β-CASP domain. The positions of the active site residues and the surrounding residues in the D37A variant superposed perfectly with those of WT Artemis. As previously mentioned, Asp37 can adopt two conformations, as seen in the 6TT5 structure, and the coordinated zinc ion is generally present at about 30% occupancy in both of the WT Artemis structures (PDB 6TT5 and 7AF1).
Therefore, it is unsurprising that mutation of Asp37 to alanine results in the loss of Zn2.
Differential scanning fluorimetry (DSF) experiments were carried out to investigate the stability of the three variants. DSF analysis showed that the D37A variant has similar thermal stability to the WT, suggesting that the protein is stable and folded in the presence of a single metal ion in the active site, whilst both H33A and H35D variants were substantially destabilized, with ΔTm around −13 °C compared to the WT Artemis (Suppl. Figure 10). Histidines 33 and 35 are the first two histidine residues in the HxHxDH motif (motif 2) in the MBL domain. Their role is to coordinate the first metal ion (Zn1 site) in the catalytic site. In the absence of metal ions in the catalytic site, the loop comprising residues 113–119 moves away from the active site (Figure 8C and D). Another small rearrangement occurs in helix α8 (residues 348–358) of the MBL domain. In both the H33A and H35D variants, helix α8 moves slightly closer toward strand β14, compared to the WT and D37A variant. Surprisingly, the biggest rearrangement occurs in β-strand E (residues 268–270) and α-helix E (residues 261–267), both located near the zinc finger motif in the β-CASP domain (Figure 8A). In the H33A and H35D variants, both β-strand E and α-helix E shifted upward and away from the zinc finger-like motif. These conformational changes may suggest some allosteric regulation in terms of substrate binding and catalytic activity of the enzyme.
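As a rough illustration of how a ΔTm such as the one reported from the DSF experiments above can be read from melt curves, the sketch below simulates two curves with a two-state Boltzmann sigmoid and takes each Tm as the temperature of steepest fluorescence increase. This is not the authors' analysis pipeline: the function names, the slope factor, and the absolute Tm values (55.25 °C and 42.25 °C) are hypothetical, chosen only so that the difference is −13 °C.

```python
import math


def boltzmann(temp_c, tm_c, slope=1.5, f_min=0.0, f_max=1.0):
    """Two-state Boltzmann sigmoid often used to describe a DSF melt curve."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm_c - temp_c) / slope))


def estimate_tm(temps, fluorescence):
    """Return the midpoint of the interval with the largest dF/dT."""
    best_slope, best_tm = float("-inf"), None
    for i in range(len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope = slope
            best_tm = 0.5 * (temps[i] + temps[i + 1])
    return best_tm


# Temperature ramp from 25 to 95 °C in 0.5 °C steps.
temps = [25.0 + 0.5 * i for i in range(141)]

# Hypothetical melting temperatures; only their difference mirrors the text.
wt_curve = [boltzmann(t, tm_c=55.25) for t in temps]
variant_curve = [boltzmann(t, tm_c=42.25) for t in temps]

delta_tm = estimate_tm(temps, variant_curve) - estimate_tm(temps, wt_curve)
print(f"Estimated ΔTm = {delta_tm:.1f} °C")
```

With real data, the fluorescence trace is noisy, so the derivative is usually smoothed or the sigmoid is fitted directly rather than differenced point by point.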
We also tested the activity of the D37A, H33A, and H35D Artemis variants in vitro using single-stranded 3'-end radiolabelled DNA as a substrate in a gel-based assay. All three variants lost their ability to digest the DNA substrate in vitro (Figure 9). These observations are in agreement with the results obtained with full-length variants by Ege et al. and Pannicke et al. [27,31]. Their studies show that the full-length Artemis variants H33A, H35D, and D37N are able to interact with and be phosphorylated by DNA-PKcs, but have lost the ability to digest DNA substrates in vitro. The combined results reveal the importance of the HxHxDH motif and highlight the importance of the di-metal catalytic core in the SNM1 family, not only in directly catalysing hydrolysis, but also likely in conformational changes involved in catalysis [43].
Identification of small molecule inhibitors of Artemis
Radiotherapy is a mainstay of cancer therapy; its effectiveness relies on inducing DNA double-strand breaks (DSBs) that contain complex, chemically modified ends that must be processed prior to repair [62,63]. The canonical non-homologous end-joining (c-NHEJ) pathway repairs 80% of DNA double-strand breaks in mammalian cells [20,23]. Therefore, combining radiation therapy with c-NHEJ inhibitors could selectively radiosensitise tumours. Weterings et al. have reported a compound that interferes with the binding of Ku70/80 to DNA, thereby increasing sensitivity to ionising radiation in human cell lines [64]; ATM inhibitors are also in advanced clinical development and represent the most developed strategy to inhibit DSB repair to increase the efficacy of radiotherapy [65].
Artemis, along with SNM1A and SNM1B, possesses a conserved MBL-fold domain that is similar to that of the true bacterial MBLs. Previous studies on human SNM1A and SNM1B/Apollo showed that ceftriaxone (Rocephin), a widely used β-lactam antibacterial (third-generation cephalosporin), inhibits the nuclease activity of both SNM1A and SNM1B [66]. To investigate whether this class of β-lactam antibacterial compounds could inhibit Artemis, we performed fluorescence-based nuclease assays with three cephalosporins, i.e. ceftriaxone, cefotaxime and 7-aminocephalosporanic acid (Figure 10A). The results show that neither cefotaxime nor the parent compound, 7-aminocephalosporanic acid, potently inhibits Artemis' activity, whilst ceftriaxone inhibits Artemis with a modest IC50 of 65 µM (Figure 10B).
We solved the structure of ceftriaxone bound to the catalytic domain of Artemis (purified by IMAC) at 1.9 Å resolution (Figure 10C) by soaking an Artemis crystal with ceftriaxone. This structure was solved by molecular replacement (using PDB: 6TT5 as a model), in the space group P1 with one protein molecule in the asymmetric unit. As before, in this structure Artemis possesses the canonical bilobar MBL and β-CASP fold with an active site containing one nickel ion, possibly related to the purification method. Ceftriaxone binds to the protein surface in an extended manner, making interactions with the active site, towards the β-CASP domain (Figure 10C). There is no evidence for cleavage of the β-lactam ring nor of loss of the C-3' cephalosporin side chain, reactions that can occur during 'true' MBL catalysed cephalosporin hydrolysis.
The electron density at the active site clearly reveals the presence of the ceftriaxone side chain in a position to coordinate the nickel ion (at the Zn1 site), replacing water molecules (waters 72 and 106) compared with the apo structure (Figure 10D). Despite the conservation of key elements of the active site of the MBL fold nucleases and the 'true' β-lactam hydrolysing MBLs [67], ceftriaxone does not interact with the nickel ion via its β-lactam ring (as occurs for the true MBLs), but via both carbonyl oxygens of the cyclic 1,2-diamide in its sidechain (Ni–O distances: 2 Å and 2.2 Å), i.e. it is not positioned for productive β-lactam hydrolysis. The amino-thiazole group (N7) of ceftriaxone forms hydrogen bonds with the side chain of Asn205, while the S1 of the 7-aminocephalosporanic acid core of the compound interacts with the hydroxyl of Tyr212 through an ethylene glycol molecule. The rest of the molecule appears to be flexible. The binding mode of ceftriaxone to Artemis shown in Figure 10C is near identical to that observed for ceftriaxone in the SNM1A structure (PDB: 5NZW) (Suppl. Figure 11).
One notable difference between the ceftriaxone-bound Artemis structure and the apo structure is the loss of a second metal ion at the active site (Figure 10D). In the apo structure (Figure 2C), this zinc ion is coordinated by residues Asp37, His38, and Asp130. With a single metal coordination in the ceftriaxone-bound structure, the Asp37 side chain is positioned away from the active site (Figure 10D), as seen in the nickel-bound (PDB: 6TT5) structure (Figure 2E).
To investigate the possibility of inhibiting Artemis through binding to the zinc finger motif in the β-CASP domain, we used the fluorescence-based nuclease assay to test three compounds known to react with thiol groups present in zinc fingers and which result in zinc ion displacement, i.e. ebselen [68], auranofin [69] and disulfiram [70]. We found that both ebselen and disulfiram inhibit Artemis with IC50 values around 8.5 µM and 10.8 µM respectively, whilst auranofin inhibits less potently (IC50 46 µM) (Suppl. Figure 12), indicating additional possible inhibitory strategies.
DISCUSSION
The DCLRE1C/Artemis gene was first discovered in 2000, following work with children with severe combined immunodeficiency disease (SCID) [71]. Subsequent studies have shown that Artemis is a key enzyme in V(D)J recombination [16,17,72] and the c-NHEJ DNA repair pathway [21,44,73], and that it is a structure-specific endonuclease and a member of the MBL fold structural superfamily [2]. Our structures of wild-type and catalytic site mutants of the SNM1C/DCLRE1C (Artemis) protein show that, like SNM1A and SNM1B/Apollo and the RNA processing enzyme CPSF73, Artemis has a typical α/β-β/α sandwich fold in its MBL domain and has a β-CASP domain, the latter a characteristic feature of MBL fold nucleases. However, both our Artemis structures and those recently reported by Karim et al. [56] reveal a unique structural feature of Artemis in its β-CASP domain that is not reported in other human MBL enzymes, i.e. a classical zinc-finger like motif. Moreover, collectively, these structures allow us to assign a likely mode of DNA substrate interaction for Artemis.
The role of the newly-described zinc-finger like motif remains unknown.
However, zinc-finger motifs are common structural features in DNA binding proteins such as transcription factors [50,54], and are also observed in a substantial number of required and accessory NHEJ proteins [52]. These zinc fingers provide structural stability and enhance substrate selectivity rather than being involved in catalytic reactions, and we propose that this is likely to be the case for Artemis. The fact that the residues (His 228, His 254, Cys 256, and Cys 272) that are involved in the zinc-finger like motif are highly conserved across Artemis proteins from different species suggests the importance of this structural feature. Furthermore, point mutations in His 228 and His 254 (H228N and H254L) have been reported in patients with a SCID phenotype [55].
The presence of one or two metal ions coordinated by the HxHxDH motif at the active site of Artemis reflects a hallmark of the SNM1 enzyme family [9,74]; the available evidence implies that metal ion binding at one site (Zn1 site in standard MBL nomenclature) is stronger than at the other (Zn2 site). By analogy with studies on the true MBLs, these metal ions are proposed to activate a water molecule that acts as the nucleophile for the phosphodiester cleavage. Our structure (PDB: 7AF1) suggests that the native metal ion(s) residing in the active site of Artemis is zinc, although a nickel ion can also occupy the same site depending on how the protein was purified (PDB: 6TT5).
Neither the presence of a Ni ion in the active site nor the truncation of the C-terminal tail appears to inhibit, at least substantially, the activity of Artemis. Thus, using radiolabelled gel-based nuclease assays, we showed that the truncated Artemis catalytic domain (aa 1–361) with either Zn or Ni ions in the active site (as observed crystallographically in the same preparations) has similar activity to the full-length Artemis construct (aa 1–693). Therefore, it seems likely that nickel ions are able to replace zinc ions in solution, but catalysis by MBL fold enzymes, including hydrolytic reactions, with metal ions other than zinc is well-precedented [67,75].
We also solved structures of three Artemis catalytic mutants: D37A, H33A, and an Omenn syndrome patient mutation, H35D. Using gel-based nuclease assays, we showed that these variants are biochemically inactive. Overall, the three variant structures are similar to the WT structure, even though H33A and H35D entirely lack any metal ions in the active site, although zinc was present in the zinc finger. Mutation of Asp37 to alanine results in the loss of the second metal in the catalytic site, likely explaining the loss of activity, although the first metal ion is still present. Note that some MBL fold hydrolases use two metal ions (e.g. B1 and B3 subfamilies of the true MBLs and RNase J1 from Bacillus subtilis) (Suppl. Figure 13A and C) [67,76], whereas others, sometimes with apparently very similar active sites, only use one metal ion (e.g. the B2 subfamily of the true MBLs and RNase J from Staphylococcus epidermidis) [67,77] (Suppl. Figure 13B and D). Thus, whilst our results support the importance of having both metals for the nuclease activity of Artemis, subtle features can influence MBL fold enzyme activity [67,74].
Following re-analysis of the Karim et al. structure (PDB code 6WO0), we were able to generate a model of a DNA overhang in complex with Artemis that informs on the substrate binding mode.
Our model shows that Artemis interacts with the DNA substrate in the interface between the MBL and the β-CASP domains. This interaction with the DNA substrate is mediated by a combination of polar or positively charged residues and aromatic residues of Artemis (Figure 6).
Artemis is the only identified MBL/β-CASP DNA processing enzyme that possesses substantive endonuclease activity. By contrast, both SNM1A and SNM1B/Apollo are strictly 5'-phosphate exonucleases [3,9,15]. The recent structure of SNM1B/Apollo in complex with two deoxyadenosine monophosphate nucleotides (PDB code: 7A1F), reported by Baddock et al. (2020), reveals a cluster of residues that form a 5'-phosphate binding pocket, adjacent to the metal centre. Structural sequence alignments of the three proteins show that these residues are highly conserved in SNM1A and SNM1B (Suppl. Figure 1). Apart from Ser 317, none of these conserved phosphate binding pocket residues are present in Artemis. Instead, the pocket is partially occupied by the Phe318 side chain, which is absent in both SNM1A and SNM1B. Artemis also possesses a longer and more flexible loop connecting β-strands 27 and 28 (Figure 6B), compared to the same loop in SNM1A and SNM1B, which makes up part of the 5'-phosphate binding pocket. The flexibility of this loop could enable accommodation of different types of DNA structures, such as hairpins and 5'-overhangs. These differences plausibly explain Artemis' substrate preferences and its primary activity as a structure-selective endonuclease.
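To put the IC50 values reported in the Results into perspective, the sketch below evaluates the fraction of nuclease activity expected to remain at a given inhibitor concentration under a simple one-site Hill model. The Hill coefficient of 1, the 10 µM test concentration, and the function name are assumptions made for illustration; the text reports only the IC50 values, not the shapes of the fitted curves.

```python
def fractional_activity(inhibitor_um, ic50_um, hill=1.0):
    """Fraction of uninhibited activity remaining under a simple Hill model.

    The Hill coefficient defaults to 1 as an assumption, not a fitted value.
    """
    if inhibitor_um < 0 or ic50_um <= 0:
        raise ValueError("concentrations must be non-negative and IC50 positive")
    return 1.0 / (1.0 + (inhibitor_um / ic50_um) ** hill)


# IC50 values (µM) as quoted in the text.
ic50_um = {
    "ebselen": 8.5,
    "disulfiram": 10.8,
    "auranofin": 46.0,
    "ceftriaxone": 65.0,
}

test_concentration_um = 10.0  # arbitrary concentration chosen for comparison
for name, ic50 in ic50_um.items():
    remaining = fractional_activity(test_concentration_um, ic50)
    print(f"{name}: {remaining:.0%} activity remaining at {test_concentration_um} µM")
```

By construction, activity is 50% when the inhibitor concentration equals the IC50; compounds with steeper or shallower dose-response curves would need a fitted Hill coefficient rather than the default used here.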
Of the three β-lactam antibacterial compounds previously shown to inhibit SNM1A [66], we only observed inhibition of Artemis with ceftriaxone. Although the potency of inhibition is moderate (IC50 65 µM), we were able to solve the structure of ceftriaxone in complex with Artemis. Notably, ceftriaxone does not bind with its β-lactam carbonyl located at the active site where it ligates to one zinc (or other metal) ion, but instead binds the single nickel ion in a bidentate manner via the carbonyls of the cyclic 1,2-diamide on its C-3' sidechain [74,78]. Studies with the true MBLs have shown that appropriate derivatisation of weakly binding molecules can lead to highly potent and selective inhibitors.
In proof-of-principle attempts to inhibit Artemis through its novel structural feature compared to other MBL fold nucleases, i.e. via its zinc-finger like motif, we tested three covalent inhibitors with thiol-reactive groups. Ebselen, disulfiram and auranofin have the potential to interact with zinc fingers, including via zinc ejection with consequent protein destabilization [69,70,79,80]. Both ebselen and auranofin are reported to have some antimicrobial properties [81]; ebselen is in clinical trials for a variety of conditions, ranging from stroke to bipolar disorder [82], and auranofin is used for treatment of rheumatoid arthritis [83]. Recent studies have also shown that ebselen inhibits enzymes from SARS-CoV-2, i.e. the main protease (Mpro) and the exonuclease ExoN (nsp14ExoN-nsp10) complex [84,85]. Disulfiram is a known acetaldehyde dehydrogenase inhibitor used in treatment for alcohol abuse disorder [86]. Our results show that both ebselen and disulfiram inhibit Artemis (IC50s 8.5 µM and 10.8 µM, respectively), whilst auranofin is less potent (IC50 46 µM).
Studies focussed on inhibiting the MBL fold nucleases are at an early stage compared with work on the true MBLs.
The structures and assays results presented here provide starting points with established drugs, from which it might be possible to generate selective Artemis inhibitors, either binding at the active site or elsewhere (including the apparently unique zinc finger of Artemis), in order to radiosensitise cells. ACCESSION NUMBERS Coordinates and structure factors have been deposited in the Protein Data Bank under accession codes 6TT5, 7AF1, 7AFS, 7AFU, 7AGI, 7APV and 7ABS. SUPPLEMENTARY DATA .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 8, 2021. ; https://doi.org/10.1101/2021.01.08.423993doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.423993 http://creativecommons.org/licenses/by/4.0/ Supplementary Data are available at NAR online. ACKNOWLEDGEMENT We are very grateful to Dr Rod Chalk and Tiago Moreira for mass spectrometry, and Dr Neil Patterson for the helpful discussions and data collections at Diamond Light Source. We acknowledge Diamond Light Source for time on Beamlines I03, I04 and I24 under Proposal MX19301. FUNDING This work was supported by a Cancer Research UK Programme Award [A24759 to PJM, OG and CJS] and Wellcome trust grant [106169/ZZ14/Z to OG and 106244/Z/14/Z to CJS ]. CONFLICT OF INTEREST The authors declare no conflict of interest. REFERENCES 1. Yang W (2011) Nucleases: Diversity of structure, function and mechanism. 2. Callebaut I, Moshous D, Mornon J-P & De Villartay JP (2002) Metallo-beta-lactamase fold within nucleic acids processing enzymes: the beta-CASP family. Nucleic Acids Res. 30, 3592–3601. 3. Allerston CK, Lee SY, Newman JA, Schofield CJ, Mchugh PJ & Gileadi O (2015) The structures of the SNM1A and SNM1B/Apollo nuclease domains reveal a potential basis for their distinct DNA processing activities. Nucleic Acids Res. 
43, 11047–11060. 4. Goodarzi AA, Yu Y, Riballo E, Douglas P, Walker SA, Ye R, Härer C, Marchetti C, Morrice N, Jeggo PA & Lees-Miller SP (2006) DNA-PK autophosphorylation facilitates Artemis endonuclease activity. EMBO J. 25, 3880–3889. 5. Malu S, De Ioannes P, Kozlov M, Greene M, Francis D, Hanna M, Pena J, Escalante CR, Kurosawa A, Erdjument-Bromage H, Tempst P, Adachi N, Vezzoni P, Villa A, Aggarwal AK & Cortes P (2012) Artemis C-terminal region facilitates V(D)J recombination through its interactions with DNA Ligase IV and DNA-pkcs. J. Exp. Med. 209, 955–963. 6. Niewolik D, Pannicke U, Lu H, Ma Y, Wang LCV, Kulesza P, Zandi E, Lieber MR & Schwarz K (2006) DNA-PKcs dependence of artemis endonucleolytic activity, differences between hairpins and 5′ or 3′ overhangs. J. Biol. Chem. 281, 33900–33909. 7. Niewolik D, Peter I, Butscher C & Schwarz K (2017) Autoinhibition of the nuclease ARTEMIS is mediated by a physical interaction between its catalytic and C-terminal domains. J. Biol. Chem. 292, 3351–3365. 8. Li S, Chang HH, Niewolik D, Hedrick MP, Pinkerton AB, Hassig CA, Schwarz K & Lieber MR (2014) Evidence that the DNA endonuclease ARTEMIS also has intrinsic 5′-exonuclease .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 8, 2021. ; https://doi.org/10.1101/2021.01.08.423993doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.423993 http://creativecommons.org/licenses/by/4.0/ activity. J. Biol. Chem. 289, 7825–7834. 9. Baddock HT, Yosaatmadja Y, Newman JA, Schofield CJ, Gileadi O & McHugh PJ (2020) The SNM1A DNA Repair Nuclease. DNA Repair (Amst). 594, 102941. 10. Yan Y, Akhter S, Zhang X & Legerski R (2010) The multifunctional SNM1 gene family: not just nucleases. Future Oncol. 6, 1015–1029. 11. 
Wang AT, Sengerova B, Cattell E, Inagawa T, Hartley JM, Kiakos K, Burgess-Brown NA, P SL, H EJ, Schofield CJ, Gileadi O, Hartley JA & Mchugh PJ (20AD) Human SNM1A and XPF– ERCC1 collaborate to initiate DNA interstrand cross-link repair. Genes Dev. 25, 1859– 1870. 12. Lenain C, Bauwens S, Amiard S, Brunori M, Giraud-Panis MJ & Gilson E (2006) The Apollo 5′ Exonuclease Functions Together with TRF2 to Protect Telomeres from DNA Repair. Curr. Biol. 16, 1303–1310. 13. van Overbeek M & de Lange T (2006) Apollo, an Artemis-Related Nuclease, Interacts with TRF2 and Protects Human Telomeres in S Phase. Curr. Biol. 16, 1295–1302. 14. Demuth I, Digweed M & Concannon P (2004) Human SNM1B is required for normal cellular response to both DNA interstrand crosslink-inducing agents and ionizing radiation. Oncogene 23, 8611–8618. 15. Sengerová B, Allerston CK, Abu M, Lee SY, Hartley J, Kiakos K, Schofield CJ, Hartley JA, Gileadi O & McHugh PJ (2012) Characterization of the human SNM1A and SNM1B/Apollo DNA repair exonucleases. J. Biol. Chem. 287, 26254–26267. 16. Mansilla-Soto J & Cortes P (2003) VDJ recombination: Artemis and its in vivo role in hairpin opening. J. Exp. Med. 197, 543–547. 17. Ma Y, Pannicke U, Schwarz K & Lieber MR (2002) Hairpin opening and overhang processing by an Artemis/DNA-dependent protein kinase complex in nonhomologous end joining and V(D)J recombination. Cell 108, 781–794. 18. Ma Y, Schwarz K & Lieber MR (2005) The Artemis:DNA-PKcs endonuclease cleaves DNA loops, flaps, and gaps. DNA Repair (Amst). 4, 845–851. 19. Moshous D, Callebaut I, De Chasseval R, Corneo B, Cavazzana-Calvo M, Le Deist F, Tezcan I, Sanal O, Bertrand Y, Philippe N, Fischer A & De Villartay JP (2001) Artemis, a novel DNA double-strand break repair/V(D)J recombination protein, is mutated in human severe combined immune deficiency. Cell 105, 177–186. 20. Lieber MR (2011) The mechanism of DSB repair by the NHEJ. Annu. Rev. Biochem. 79, 181–211. 21. 
Gu J, Li S, Zhang X, Wang LC, Niewolik D, Schwarz K, Legerski RJ, Zandi E & Lieber MR (2010) DNA-PKcs regulates a single-stranded DNA endonuclease activity of Artemis. DNA Repair (Amst). 9, 429–437. 22. Srivastava M & Raghavan SC (2015) DNA double-strand break repair inhibitors as cancer therapeutics. Chem. Biol. 22, 17–29. 23. Pannunzio NR, Watanabe G & Lieber MR (2018) Nonhomologous DNA end-joining for repair of DNA double-strand breaks. J. Biol. Chem. 293, 10512–10523. 24. Shockett PE & Schatz DG (1999) DNA Hairpin Opening Mediated by the RAG1 and RAG2 .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 8, 2021. ; https://doi.org/10.1101/2021.01.08.423993doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.423993 http://creativecommons.org/licenses/by/4.0/ Proteins. Mol. Cell. Biol. 19, 4159–4166. 25. De P, Peak MM & Rodgers KK (2004) DNA Cleavage Activity of the V(D)J Recombination Protein RAG1 Is Autoregulated. Mol. Cell. Biol. 24, 6850–6860. 26. Kim MS, Lapkouski M, Yang W & Gellert M (2015) Crystal structure of the V(D)J recombinase RAG1-RAG2. Nature 518, 507–511. 27. Pannicke U, Ma Y, Hopfner KP, Niewolik D, Lieber MR & Schwarz K (2004) Functional and biochemical dissection of the structure-specific nuclease ARTEMIS. EMBO J. 23, 1987– 1997. 28. Barnes DE, Stamp G, Rosewell I, Denzel A & Lindahl T (1998) Targeted disruption of the gene encoding DNA ligase IV leads to lethality in embryonic mice. Curr. Biol. 8, 1395– 1398. 29. Roth DB, Menetski JP, Nakajima P, Bosma MJ & Gellert M (1992) V ( D ) J Recombination : Broken DNA Molecules with Covalently Sealed ( Hairpin ) Coding Ends in scid Mouse Thymocytes. 70, 983–991. 30. Bassing CH, Swat W & Alt FW (2002) The mechanism and regulation of chromosomal V(D)J recombination. Cell 109, 45–55. 31. 
Ege M, Ma Y, Manfras B, Kalwak K, Lu H, Lieber MR, Schwarz K & Pannicke U (2005) Omenn syndrome due to ARTEMIS mutations. Blood 105, 4179–4186. 32. Volk T, Pannicke U, Reisli I, Bulashevska A, Ritter J, Björkman A, Schäffer AA, Fliegauf M, Sayar EH, Salzer U, Fisch P, Pfeifer D, Di Virgilio M, Cao H, Yang F, Zimmermann K, Keles S, Schindler D, Hammarström L, Caliskaner Z, Rizzi M, Hummel M, Pan-Hammarström Q, Schwarz K & Grimbacher B (2015) DCLRE1C (ARTEMIS) mutations causing phenotypes ranging from atypical severe combined immunodeficiency to mere antibody deficiency. Hum. Mol. Genet. 24, 7361–7372. 33. Li L, Moshous D, Zhou Y, Wang J, Xie G, Salido E, Hu D & Cowan MJ (2002) A Founder Mutation in Artemis, an SNM1-Like Protein, Causes SCID in Athabascan-Speaking Native Americans. J. Immunol. 168, 6323–6329. 34. Felgentreff K, Lee YN, Frugoni F, Du L, Van Der Burg M, Giliani S, Tezcan I, Reisli I, Mejstrikova E, De Villartay JP, Sleckman BP, Manis J & Notarangelo LD (2015) Functional analysis of naturally occurring DCLRE1C mutations and correlation with the clinical phenotype of ARTEMIS deficiency. J. Allergy Clin. Immunol. 136, 140-150.e7. 35. Savitsky P, Bray J, Cooper CDO, Marsden BD, Mahajan P, Burgess-Brown NA & Gileadi O (2010) High-throughput production of human proteins for crystallization: The SGC experience. J. Struct. Biol. 172, 3–13. 36. Dominy CN & Andrews DW (2003) Site-directed mutagenesis by inverse PCR. Methods Mol. Biol. 235, 209–223. 37. Winter G, Waterman DG, Parkhurst JM, Brewster AS, Gildea RJ, Gerstel M, Fuentes-Montero L, Vollmar M, Michels-Clark T, Young ID, Sauter NK & Evans G (2018) DIALS: Implementation and evaluation of a new integration package. Acta Crystallogr. Sect. D Struct. Biol. 74, 85–97. 38.
McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC & Read RJ (2007) Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674. 39. Emsley P & Cowtan K (2004) Coot: Model-building tools for molecular graphics. Acta Crystallogr. Sect. D Biol. Crystallogr. 60, 2126–2132. 40. Murshudov GN, Skubák P, Lebedev AA, Pannu NS, Steiner RA, Nicholls RA, Winn MD, Long F & Vagin AA (2011) REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr. Sect. D Biol. Crystallogr. 67, 355–367. 41. Lee SY, Brem J, Pettinati I, Claridge TDW, Gileadi O, Schofield CJ & McHugh PJ (2016) Cephalosporins inhibit human metallo β-lactamase fold DNA repair nucleases SNM1A and SNM1B/apollo. Chem. Commun. 52, 6727–6730. 42. Carfi A, Pares S, Duée E, Galleni M, Duez C, Frère JM & Dideberg O (1995) The 3-D structure of a zinc metallo-β-lactamase from Bacillus cereus reveals a new type of protein fold. EMBO J. 14, 4914–4921. 43. Li X & Moses RE (2003) The β-lactamase motif in Snm1 is required for repair of DNA double-strand breaks caused by interstrand crosslinks in S. cerevisiae. DNA Repair (Amst). 2, 121–129. 44. de Villartay JP, Shimazaki N, Charbonnier JB, Fischer A, Mornon JP, Lieber MR & Callebaut I (2009) A histidine in the β-CASP domain of Artemis is critical for its full in vitro and in vivo functions. DNA Repair (Amst). 8, 202–208. 45. Mandel CR, Kaneko S, Zhang H, Gebauer D, Vethantham V, Manley JL & Tong L (2006) Polyadenylation factor CPSF-73 is the pre-mRNA 3′-end-processing endonuclease. Nature 444, 953–956. 46.
Alberts IL, Nadassy K & Wodak SJ (1998) Analysis of zinc binding sites in protein crystal structures. Protein Sci. 7, 1700–1716. 47. Ataie NJ, Hoang QQ, Zahniser MPD, Tu Y, Milne A, Petsko GA & Ringe D (2008) Zinc Coordination Geometry and Ligand Binding Affinity: The Structural and Kinetic Analysis of the Second-Shell Serine 228 Residue and the Methionine 180 Residue of the Aminopeptidase from Vibrio proteolyticus. Biochemistry 47, 7673–7683. 48. Ishikawa H, Nakagawa N, Kuramitsu S & Masui R (2006) Crystal Structure of TTHA0252 from Thermus thermophilus HB8, a RNA Degradation Protein of the Metallo-β-lactamase Superfamily. 542, 535–542. 49. Matthews JM & Sunde M (2002) Zinc fingers - Folds for many occasions. IUBMB Life 54, 351–355. 50. Wolfe SA, Nekludova L & Pabo CO (2000) DNA Recognition by Cys2His2 Zinc Finger Proteins. Annu. Rev. Biophys. Biomol. Struct. 29, 183–212. 51. Krishna SS, Majumdar I & Grishin NV (2003) Structural classification of zinc fingers. Nucleic Acids Res. 31, 532–550. 52. Singh JK & van Attikum H (2020) DNA double-strand break repair: Putting zinc fingers on the sore spot. Semin. Cell Dev. Biol. 53. Wu X, Bishopric NH, Discher DJ, Murphy BJ & Webster KA (1996) Physical and functional sensitivity of zinc finger transcription factors to redox change. Mol. Cell. Biol. 16, 1035–1046. 54. Laity JH, Lee BM & Wright PE (2001) Zinc finger proteins: New insights into structural and functional diversity. Curr. Opin. Struct. Biol. 11, 39–46. 55.
Pannicke U, Hönig M, Schulze I, Rohr J, Heinz GA, Braun S, Janz I, Rump EM, Seidel MG, Matthes-Martin S, Soerensen J, Greil J, Stachel DK, Belohradsky BH, Albert MH, Schulz A, Ehl S, Friedrich W & Schwarz K (2010) The most frequent DCLRE1C (ARTEMIS) mutations are based on homologous recombination events. Hum. Mutat. 31, 197–207. 56. Karim MF, Liu S, Laciak AR, Volk L, Rosenblum M, Lieber MR, Wu M, Curtis R, Huang N, Carr G & Zhu G (2020) Structural analysis of the catalytic domain of Artemis endonuclease/SNM1C reveals distinct structural features. J. Biol. Chem. 444, jbc.RA120.014136. 57. Ussery DW (2002) DNA Structure: A-, B- and Z-DNA Helix Families. Encycl. Life Sci. 58. Elrod-Erickson M, Rould MA, Nekludova L & Pabo CO (1996) Zif268 protein-DNA complex refined at 1.6 Å: A model system for understanding zinc finger-DNA interactions. Structure 4, 1171–1180. 59. Locasale JW, Napoli AA, Chen S, Berman HM & Lawson CL (2009) Signatures of Protein-DNA Recognition in Free DNA Binding Sites. J. Mol. Biol. 386, 1054–1065. 60. Mandel CR, Kaneko S, Zhang H, Gebauer D, Vethantham V, Manley JL & Tong L (2006) Polyadenylation factor CPSF-73 is the pre-mRNA 3′-end-processing endonuclease. Nature 444, 953–956. 61. Poinsignon C, Moshous D, Callebaut I, De Chasseval R, Villey I & De Villartay JP (2004) The Metallo-β-Lactamase/β-CASP Domain of Artemis Constitutes the Catalytic Core for V(D)J Recombination. J. Exp. Med. 199, 315–321. 62. Jekimovs C, Bolderson E, Suraweera A, Adams M, O’Byrne KJ & Richard DJ (2014) Chemotherapeutic compounds targeting the DNA double-strand break repair pathways: The good, the bad, and the promising. Front. Oncol. 4 APR, 1–18. 63. Shibata A & Jeggo P (2019) A historical reflection on our understanding of radiation-induced DNA double strand break repair in somatic mammalian cells; interfacing the past with the present. Int. J. Radiat. Biol. 95, 945–956. 64.
Weterings E, Gallegos AC, Dominick LN, Cooke LS, Bartels TN, Vagner J, Matsunaga TO & Mahadevan D (2016) A novel small molecule inhibitor of the DNA repair protein Ku70/80. DNA Repair (Amst). 43, 98–106. 65. Jin MH & Oh DY (2019) ATM in DNA repair in cancer. Pharmacol. Ther. 203, 107391. 66. Lee SY, Brem J, Pettinati I, Claridge TDW, Gileadi O, Schofield CJ & McHugh PJ (2016) Cephalosporins inhibit human metallo β-lactamase fold DNA repair nucleases SNM1A and SNM1B/apollo. Chem. Commun. 52, 6727–6730. 67. Palzkill T (2013) Metallo-β-lactamase structure and function. Ann. N. Y. Acad. Sci. 1277, 91–104. 68. Spraggon G, Koesema E, Scarselli M, Malito E, Biagini M, Norais N, Emolo C, Barocchi MA, Giusti F, Hilleringmann M, Rappuoli R, Lesley S, Covacci A, Masignani V & Ferlenghi I (2010) Supramolecular organization of the repetitive backbone unit of the Streptococcus pneumoniae pilus. PLoS One 5. 69. Abbehausen C (2019) Zinc finger domains as therapeutic targets for metal-based compounds - an update. 11. 70. Chen S, Jeng K & Lai MMC (2017) Zinc Finger-Containing Cellular Transcription Corepressor ZBTB25 Promotes Influenza Virus RNA Transcription and Is a Target for Zinc Ejector Drugs. 91, 1–20. 71. Brandt VL & Roth DB (2003) Artemis: Guarding small children and, now, the genome. J. Clin. Invest. 111, 315–316. 72. Vázquez-Torres A (2012) Redox active thiol sensors of oxidative and nitrosative stress. Antioxidants Redox Signal. 17, 1201–1214. 73.
Mao Z, Bozzella M, Seluanov A & Gorbunova V (2008) Comparison of nonhomologous end joining and homologous recombination in human cells. DNA Repair (Amst). 7, 1765–1771. 74. Pettinati I, Brem J, Lee SY, McHugh PJ & Schofield CJ (2016) The Chemical Biology of Human Metallo-β-Lactamase Fold Proteins. Trends Biochem. Sci. 41, 338–355. 75. Cahill ST, Tarhonskaya H, Rydzik AM, Flashman E, McDonough MA, Schofield CJ & Brem J (2016) Use of ferrous iron by metallo-β-lactamases. J. Inorg. Biochem. 163, 185–193. 76. Newman JA, Hewitt L, Rodrigues C, Solovyova A, Harwood CR & Lewis RJ (2011) Unusual, dual endo- and exonuclease activity in the degradosome explained by crystal structure analysis of RNase J1. Structure 19, 1241–1251. 77. Raj R, Nadig S, Patel T & Gopal B (2020) Structural and biochemical characteristics of two Staphylococcus epidermidis RNase J paralogues RNase J1 and RNase J2. J. Biol. Chem., jbc.RA120.014876. 78. Hamed RB, Gomez-Castellanos R, Henry L, Ducho C, McDonough MA & Schofield CJ (2013) The enzymes of β-lactam biosynthesis. Nat. Prod. Rep. 30, 1–204. 79. Antony S & Bayse CA (2013) Density Functional Theory Study of the Attack of Ebselen on a Zinc-Finger Model. Inorg. Chem. 52, 13803–13805. 80. Lee Y, Wang Y, Duh Y, Yuan HS & Lim C (2013) Identification of Labile Zn Sites in Drug-Target Proteins. J. Am. Chem. Soc. 135, 14028–14031. 81. May HC, Yu JJ, Guentzel MN, Chambers JP, Cap AP & Arulanandam BP (2018) Repurposing auranofin, ebselen, and PX-12 as antimicrobial agents targeting the thioredoxin system. Front. Microbiol. 9, 1–10. 82. Noguchi N (2016) Ebselen, a useful tool for understanding cellular redox biology and a promising drug candidate for use in human diseases. Arch. Biochem. Biophys. 595, 109–112. 83. Roder C & Thomson MJ (2015) Auranofin: Repurposing an Old Drug for a Golden New Age. Drugs R D 15, 13–20. 84.
Jin Z, Du X, Xu Y, Deng Y, Liu M, Zhao Y, Zhang B, Li X, Zhang L, Peng C, Duan Y, Yu J, Wang L, Yang K, Liu F, Jiang R, Yang X, You T, Liu X, Yang X, Bai F, Liu H, Liu X, Guddat LW, Xu W, Xiao G, Qin C, Shi Z, Jiang H, Rao Z & Yang H (2020) Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature 582, 289–293. 85. Baddock HT, Brolih S, Yosaatmadja Y, Ratnaweera M, Bielinski M, Swift L, Cruz-Migoni A, Morris GM, Schofield CJ, Gileadi O & McHugh PJ (2020) Characterisation of the SARS-CoV-2 ExoN (nsp14ExoN-nsp10) complex: implications for its role in viral genome stability and inhibitor identification. bioRxiv, 2020.08.13.248211. 86. Skinner MD, Lahmek P, Pham H & Aubin HJ (2014) Disulfiram efficacy in the treatment of alcohol dependence: A meta-analysis. PLoS One 9.
FIGURES
Figure 1: Overall architecture of Artemis. A. A cartoon representation of the structure of human SNM1C/Artemis. The MBL domain, which contains the active site, is in pink; the β-CASP domain (white) contains a novel zinc-finger-like motif that has not been identified in other MBL/β-CASP nucleic acid processing enzymes.
The three zinc ions are represented by grey spheres. B. Topology diagram of the Artemis protein. The β-strands are represented by arrows and the α-helices by cylinders. The MBL domain (pink) has the typical α/β-β/α sandwich of the MBL superfamily, with an insert of the β-CASP domain (white) between the small helix α6 and helix α7. C. Overlay of structures of the human SNM1 family members: SNM1A, SNM1B and SNM1C. D. Cartoon representation of the amino acid sequence alignment for human SNM1A, SNM1B and SNM1C, showing the conserved MBL and β-CASP domains.
Figure 2: Active site views of human MBL/β-CASP nuclease family enzymes. Each of these catalytic sites contains 4 highly conserved motifs (1-4, in red). Motif 1 = Asp, motif 2 = 3 His and 1 Asp (HxHxDH), motif 3 = His and motif 4 = Asp. A. The human SNM1A active site with a single octahedrally coordinated zinc ion (grey) (PDB: 5AHR). B. The active site of human SNM1B/Apollo (PDB: 7A1F) with a nickel ion (green) and an iron ion (orange) with a coordinating AMP molecule. C. Human SNM1C/Artemis (PDB: 7AF1) purified in the absence of IMAC with two zinc ions (in grey) in its active site. A water molecule shared between the two metals (marked with an asterisk, *) is the proposed nucleophile for the hydrolytic reaction. D. The active site of the human RNA processing enzyme CPSF73 (PDB: 2I7T). The second zinc ion (M2) is coordinated by an additional histidine residue (His 418) which has no counterpart in the SNM1 proteins. E. The active site of human SNM1C/Artemis (PDB: 6TT5) purified with IMAC.
A nickel ion is present in the first metal coordination site.
Figure 3: Comparison of a novel zinc-finger-like motif in the β-CASP domain of Artemis with a canonical zinc-finger motif. A. Cartoon representation of the classical Cys2His2 zinc-finger motif, from transcription factor SP1F2 (PDB code: 1SP2). This has a ββα fold, where two Cys and two His residues are involved in zinc ion coordination; the sidechains of three conserved hydrophobic residues are shown. B. The β-CASP region of Artemis has a novel zinc-finger-like motif. The inset shows the four residues (two His and two Cys) coordinating the zinc ion (grey). The Fo-Fc electron-density map (scaled to 2.5σ in PyMOL) surrounding the zinc ion, calculated before the zinc ion was included in refinement, is shown.
Figure 4: Overall structure representation of the Artemis/SNM1C fold. A. Overlay of our WT Artemis structure (PDB code: 7AF1) (pink) with that of Karim et al. (PDB code: 6WO0) (aquamarine) (backbone RMSD 0.48 Å). B. Re-analysis of the latter (6WO0) structure, re-refined with a DNA molecule present (PDB code: 7ABS).
The 2Fo-Fc map (contoured at 0.6σ in PyMOL) is represented by a grey mesh surrounding the DNA (yellow).
Figure 5: Electrostatic surface potentials of the DNA-bound models for Artemis/SNM1C (A) and Apollo/SNM1B (B). Blue represents a more electropositive surface potential and red a more electronegative one. The active site contains the two metal ions, represented as spheres: grey for zinc, orange for iron and green for nickel. The N- and C-termini of the protein are indicated in red. The electrostatic surface potentials were generated using PyMOL (electrostatic range at +/- 5).
Figure 6: Proposed interactions of DNA with Artemis (PDB: 7ABS). Model for DNA binding to Artemis showing the residues contacting DNA. The two zinc ions at the active site are represented by grey spheres. A. A row of positively charged residues on the surface of the MBL domain interacts with the phosphate backbone of the DNA. B. A DNA overhang is located at the active site. A cluster of polar residues (N205, K207 and K288) is located in the β-CASP domain. The extended flexible loop that connects β27 and β28, which is unique to Artemis compared to SNM1A and SNM1B, is indicated on the right. C. The DNA overhang forms a hydrogen bond with Arginine 172 and interacts with a cluster of hydrophobic residues at the interface between the MBL and β-CASP domains.
Figure 7: Nuclease assay utilising truncated Artemis (aa 3–361) with various DNA substrates. A. The nuclease activity of Artemis is indifferent to the 5′ end group, indicative of true endonuclease activity. Increasing concentrations of Artemis, from 0 (NE; no enzyme) to 250 nM, were incubated with 10 nM ssDNA carrying either a 5′ phosphate, 5′ hydroxyl, or 5′ biotin moiety for 45 min at 37°C. B. Artemis is able to cleave DNA substrates containing single-stranded regions. Increasing amounts of Artemis were incubated with structurally diverse DNA substrates (10 nM) for 45 min at 37°C. Products for A and B were analysed by 20% denaturing PAGE. The DNA substrates utilised are represented at the top of the lanes and a red asterisk indicates the position of the 3′ radiolabel. The positions of DNA size markers run as a reference are indicated on the left, with sizes in nt.
[Figure 7 image labels: lanes NE, 10, 50, 100 and 250 nM [SNM1C]; substrates ssDNA, dsDNA, 5′ overhang, 3′ overhang, splayed arms, leading flap, lagging flap and replication fork; 5′ end groups PHO, OH and BIO; size markers 51, 41, 31, 26 and 21 nt; digestion products indicated.]
Figure 8: Views from the structures of the Artemis D37A, H33A and H35D variants. A. An overlay of the four Artemis structures: WT (PDB: 7AF1) in pink, D37A variant (PDB: 7AFS) in yellow, H33A variant (PDB: 7AFU) in cyan and the H35D patient mutation (PDB: 7AGI) in blue, showing that the general architecture of the three variants is the same as that of the WT structure. The nickel ion is represented as a green sphere, and the zinc ions as grey spheres. The movement of helix αE in the β-CASP domain is indicated by a red arrow. B. Left: the active site of the D37A variant has a single nickel ion with two complexing water molecules (red spheres); Right: the active site residues of WT Artemis (pink), superimposed with those of the D37A variant (yellow).
Aside from loss of the second metal in the D37A mutant, there is little movement at the active site. C. Active site residues of the H33A variant (cyan; left) and an overlay (right) with WT Artemis (pink). The two distinguishing features of the H33A variant are a lack of metal ions and movement of the loop containing His115. D. The active site of the H35D variant (blue; left) and an overlay (right) with WT Artemis (pink). The H35D point substitution is present in patients with Omenn syndrome. This variant lacks both metal ions, similarly to the H33A variant.
Figure 9: Comparing the activity of Artemis variants vs WT protein. Increasing amounts (from 0 to 250 nM) of WT and mutant Artemis proteins (as indicated) were incubated with 10 nM of a 51-nucleotide ssDNA substrate for 30 min at 37 °C. Reaction products were subsequently analysed by 20% denaturing PAGE. The sizes (in nucleotides) of the marker oligonucleotides are indicated on the left-hand side of the corresponding bands.
Figure 10: Artemis inhibition by β-lactam antibiotics. A. Structures of selected β-lactam antibacterials. B.
The inhibition profiles of β-lactams against the nuclease activity of Artemis were assessed via a real-time fluorescence-based nuclease assay. C. Cartoon representation of the structure of truncated Artemis (aa 1-361) with a ceftriaxone molecule (in white) bound at the active site. D. Active site residues with the Fo-Fc electron density map around the modelled ceftriaxone. The map is contoured at the 1.0 σ level and was calculated before the ceftriaxone molecule was included in the refinement.
TABLES
Table 1: Data collection and refinement statistics. * Data in parentheses are for the high-resolution shell.
| PDB ID | 6TT5 (Ni and Zn) | 7AF1 (2 Zn) | 7AFS (D37A) | 7AFU (H33A) | 7AGI (H35D) | 7APV (Ceftriaxone) | 7ABS (DNA bound) |
|---|---|---|---|---|---|---|---|
| Data collection and processing | | | | | | | |
| Diffraction source | DLS (I04) | DLS (I24) | DLS (I03) | DLS (I03) | DLS (I03) | DLS (I03) | APS 17-IDD |
| Wavelength (Å) | 0.979 | 0.976 | 0.976 | 0.976 | 0.976 | 0.976 | 1.00 |
| Space group | P1 | P1 | P1 | P1 | P1 | P1 | P21212 |
| Cell dimensions a, b, c (Å) | 35.87, 47.99, 48.25 | 35.75, 47.97, 48.15 | 35.91, 48.06, 48.21 | 35.97, 47.90, 48.37 | 35.97, 48.05, 48.44 | 35.88, 48.10, 48.25 | 72.81, 111.00, 55.17 |
| Cell angles α, β, γ (°) | 82.61, 76.37, 85.98 | 82.68, 76.35, 85.81 | 82.89, 76.43, 86.38 | 82.51, 75.94, 87.73 | 82.43, 76.01, 87.33 | 82.76, 76.29, 86.30 | 90.00, 90.00, 90.00 |
| Resolution (Å)* | 47.55 - 1.50 (1.53 - 1.50) | 47.53 - 1.70 (1.73 - 1.70) | 35.35 - 1.70 (1.73 - 1.70) | 35.53 - 1.56 (1.59 - 1.56) | 47.62 - 1.70 (1.73 - 1.70) | 47.69 - 1.95 (2.00 - 1.95) | 40 - 1.97 (2.02 - 1.97) |
| Rmerge (%)* | 6.0 (64.4) | 13.3 (79.5) | 5.3 (53.2) | 4.9 (31.5) | 4.8 (23.2) | 11.6 (65.0) | 5.0 (82.8) |
| I/σ(I) | 13.4 (3.2) | 5.8 (2.0) | 11.7 (2.2) | 11.3 (2.6) | 13.4 (3.8) | 8.6 (2.8) | 16 (2.4) |
| Completeness (%) | 94.7 (66.4) | 97.4 (95.6) | 97.3 (96.0) | 94.2 (63.7) | 97.4 (95.4) | 98.0 (97.2) | 99.6 (99.6) |
| Multiplicity | 3.6 (3.2) | 3.5 (3.5) | 3.6 (3.7) | 3.6 (3.3) | 3.7 (3.8) | 3.5 (3.6) | 6.4 (6.6) |
| Refinement | | | | | | | |
| Resolution (Å) | 47.55 - 1.50 | 47.53 - 1.70 | 47.66 - 1.70 | 35.53 - 1.56 | 47.62 - 1.70 | 47.96 - 1.95 | 40 - 1.97 |
| No. of reflections | 44746 | 31282 | 34470 | 39478 | 31745 | 21086 | 32168 |
| Rwork | 0.17 | 0.19 | 0.18 | 0.19 | 0.19 | 0.18 | 0.22 |
| Rfree | 0.19 | 0.21 | 0.22 | 0.21 | 0.21 | 0.23 | 0.28 |
| No. of atoms | | | | | | | |
| Protein | 2927 | 2989 | 2957 | 2985 | 3122 | 2975 | 2923 |
| Water | 224 | 144 | 163 | 255 | 164 | 107 | 105 |
| Zinc/Nickel | 3 | 3 | 2 | 1 | 1 | 2 | 3 |
| Ethylene glycol | 24 | 28 | 12 | 44 | 32 | 16 | - |
| DNA | - | - | - | - | - | - | 253 |
| Ceftriaxone | - | - | - | - | - | 1 | - |
| B-factors | 16.3 | 19.3 | 25.9 | 17.5 | 18.4 | 22.6 | 56 |
| r.m.s. deviations | | | | | | | |
| Bond length (Å) | 0.003 | 0.008 | 0.008 | 0.006 | 0.007 | 0.01 | 0.007 |
| Bond angles (°) | 1.23 | 1.33 | 1.29 | 1.28 | 1.32 | 1.5 | 1.47 |

10_1101-2021_01_08_425875 ----
A high content lipidomics method using scheduled MRM with variable retention time window and relative dwell time weightage
Akash Kumar Bhaskar1,2, Salwa Naushin1,2, Arjun Ray3, Shalini Pradhan1, Khushboo Adlakha1, Towfida Jahan Siddiqua1,4, Dipankar Malakar5, Shantanu Sengupta1,2*
1 CSIR-Institute of Genomics and Integrative Biology, Mathura Road, New Delhi-110025, India
2 Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
3 Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla, New Delhi-110020, India
4 Nutrition and Clinical Services Division, International Centre for Diarrheal Disease Research, Dhaka-1212, Bangladesh
5 SCIEX, 121, Udyog Vihar, Phase IV, Gurgaon-122015, Haryana, India
*Address for correspondence: Shantanu Sengupta, CSIR-Institute of Genomics and Integrative Biology, Mathura Road, Delhi-110025. Email: shantanus@igib.res.in
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 8, 2021.
; https://doi.org/10.1101/2021.01.08.425875doi: bioRxiv preprint mailto:shantanus@igib.res.in https://doi.org/10.1101/2021.01.08.425875
Abstract: Lipids are a highly diverse group of biomolecules that play a pivotal role in biological processes. Lipid compositions of bio-fluids are complex, reflecting a wide range of concentrations of different lipid classes and structural diversity within lipid species. This chemical complexity makes their identification and quantification challenging. Newer methods are thus highly desired for comprehensive analysis of lipid species, including identification of structural isomers. Herein, we propose a targeted-MRM method for large-scale, high-throughput lipidomics analysis using a combination of variable retention time window (variable-RTW) and relative dwell time weightage (relative-DTW) for different lipid species. With this method, we were able to detect more than 1000 lipid species (encompassing 18 lipid classes), including different structural isomers of triglyceride, diglyceride, and phospholipids, in a single run of 24 minutes. The limit of detection varied between 0.245 pmol/L and 1 nmol/L across lipid classes; it was lowest for phosphatidylethanolamine (245 fmol/L) and highest for diacylglycerol (1 nmol/L). Similarly, the limit of quantitation varied from 291 fmol/L to 2 nmol/L. The recovery of the method was in the acceptable range, and 849 lipid species were found to have a coefficient of variation (CV) <30%. Using this method, we demonstrate that lipids with ω-3 and ω-6 fatty acid chains are altered in individuals with vitamin B12 deficiency.
Introduction: Lipids constitute a highly diverse group of biomolecules that play important roles in the normal functioning of the body, including maintenance of cellular homeostasis, cell signaling and energy storage 1-5. Dysregulation of lipid homeostasis is associated with a large number of pathologies such as obesity and diabetes 6,7, cardiovascular disease 8, and cancer 9,10. Lipid compositions of bio-fluids are complex, reflecting a wide range of concentrations of different lipid classes and structural diversity within lipid species 11,12. Although the exact number of distinct lipids present in cells is not known, it is believed that the cellular lipidome consists of more than 1000 different lipid species, each with several structural isomers 4,13-15. Identification of lipids using traditional methods such as thin layer chromatography and gas chromatography is limited by their lower sensitivity and accuracy, and these methods are hence not suitable for comprehensive lipidomics studies 16,17. Recent advances in mass spectrometry (MS) based lipidomics have enabled accurate identification of a large number of lipid species from various biological sources 18,19. Analysis of lipids in both positive and negative ion modes in a single mass spectrometric scan, using an untargeted or targeted approach, has been used for greater coverage 20,21. The untargeted lipidomics approach, however, has some major challenges, especially with respect to identification and characterization of the lipid species, the time required to process large quantities of raw data, and the bias towards the detection of lipids with high abundance 19,22. These problems are greatly reduced in a targeted approach using multiple reaction monitoring (MRM), since defined groups of chemically characterized and annotated lipid species are analyzed 22,23.
The use of MRM enables simultaneous identification of around a hundred lipid species, including those with low abundance 24,25. The number of lipid species identified can be further increased by using scheduled MRM, where the MRM transitions are monitored only around the expected retention time of the eluting lipid species 21,26,27. This enables monitoring of a greater number of MRM transitions in a single mass spectrometric acquisition. Using scheduled MRM, Takeda et al. and other groups were able to identify and quantify hundreds of lipid species, including isomers of phospholipids (PLs) and diacylglycerol (DAG), in a single targeted scan 21,28,29. However, identification of triacylglycerols (TAG) was based on pseudotransitions, as identifying different species of TAG is challenging 21,30,31. The retention time window chosen in a scheduled MRM is usually of a fixed width. However, as the retention time window width varies for each lipid species, a variable window width for each lipid species could reduce the time necessary to develop high-throughput targeted methods. There are a few reports where a variable retention time window (dynamic MRM) has been used in various applications, including identifying lipids of a specific class 32-38. However, none of these studies involved comprehensive lipidome analysis. Further, in these studies, the dwell time for each peak was automatically fixed on the basis of the RT window width chosen. The quality of peaks can be improved by varying the dwell time weightage for each transition without compromising the cycle time.
Assigning a low dwell time weightage to highly abundant compounds and a high dwell time weightage to less abundant compounds, irrespective of the elution window, could help accommodate a large number of transitions in a single run with improved data quality. Here we report a rapid and sensitive targeted lipidomics method using scheduled MRM with variable retention time window width and relative dwell time weightage that enabled the identification of more than 1000 lipid species, including isomers of triglycerides, diglycerides, and phospholipids, in a single mass spectrometric scan of 24 minutes. As a demonstration of the applicability of this method, we show that vitamin B12 deficiency leads to alteration of various lipid species, which could explain the association of vitamin B12 deficiency with cardio-metabolic diseases previously reported in various studies. To the best of our knowledge, this is the largest number of lipid species identified to date in a single experiment.

Materials and Methods

Chemicals and reagents

MS-grade acetonitrile, methanol, water, and 2-propanol (IPA) and HPLC-grade dichloromethane (DCM) were purchased from Biosolve (Dieuze, France); ammonium acetate and ethanol were obtained from Merck (Merck & Co. Inc., Kenilworth, NJ, USA). The lipid internal standards used in the study, SM (d18:1-18:1(d9)), TAG (15:0-18:1(d7)-15:0), DAG (15:0-18:1(d7)), LPC (18:1(d7)), PC (15:0-18:1(d7)), LPE (18:1(d7)), PE (15:0-18:1(d7)), PG (15:0-18:1(d7)), PI (15:0-18:1(d7)), PS (15:0-18:1(d7)), and PA (15:0-18:1(d7)) in the form of Splash mix, and ceramide (17:0), were purchased from Avanti Polar Lipids (Alabaster, Alabama, USA).

Lipid extraction from human plasma

We used a modified Bligh and Dyer method with dichloromethane/methanol/water (2:2:1 v/v). The study was approved by the institutional ethical committee of CSIR-IGIB. Human plasma (10 μL) was mixed with 490 μL of water in a glass tube and incubated on ice for 10 minutes.
The lipid internal standard mix (5 µL, consisting of Splash mix and ceramide) was added to a mixture of methanol (2 mL) and dichloromethane (1 mL); the mixture was vortexed and incubated for 30 minutes at room temperature. After incubation, 500 μL water and 1 mL dichloromethane were added to the solution, which was vortexed for 5 seconds. The mixture was centrifuged at 300 g for 10 minutes, after which the phases had separated. The lower organic layer was collected into a fresh glass tube. Then 2 mL dichloromethane was added to the remaining mixture in the extraction tube, which was centrifuged again to collect the lower layer. This step was repeated once more. The solvent was evaporated in a vacuum dryer at 25 °C and the lipids were resuspended in 100 μL of ethanol, vortexed for 5 minutes, sonicated for 10 minutes, and vortexed again for 5 minutes. The suspension was transferred to polypropylene autosampler vials and subjected to LC-MS analysis.

Liquid chromatography-Mass spectrometry:

We used an Exion LC system with a Waters ACQUITY UPLC BEH HILIC XBridge Amide column (3.5 µm, 4.6 x 150 mm) for chromatographic separation. The oven temperature was set at 35 °C and the autosampler at 4 °C. Lipids were separated using buffer A (95% acetonitrile with 10 mM ammonium acetate, pH 8.2) and buffer B (50% acetonitrile with 10 mM ammonium acetate, pH 8.2) with the following gradient: at a flow rate of 0.7 mL/minute, buffer B was increased from 0.1% to 6% in 6 minutes and then to 25% in the next 4 minutes. In the next 1 minute buffer B was ramped up to 98%, further increased to 100% in the next 2 minutes, and held at the same concentration and flow rate for 30 seconds.
The flow rate was then increased from 0.7 mL/min to 1.5 mL/min and 100% buffer B was maintained for the next 5.1 minutes. Buffer B was brought back to the initial 0.1% in 0.1 minute and the column was equilibrated at the same concentration and flow for 4.3 minutes, before the flow rate was returned to the initial 0.7 mL/minute over the next 30 seconds and maintained there until the end of the 24-minute gradient. Additionally, the separation system was equilibrated for 3 minutes before subsequent runs. A Sciex QTRAP 6500+ LC/MS/MS system in low mass range, fitted with a Turbo source and electrospray ionization (ESI) probe, was used with the following parameters: curtain gas (CUR), 35 psi; temperature (TEM), 500 °C; source gas 1 (GS1), 50 psi; source gas 2 (GS2), 60 psi; ionization voltage (IS), 5500 V for positive mode and -4500 V for negative mode; target scan time, 0.5 sec; scan speed, 10 Da/s; settling time, 5.0000 msec; and MR pause, 5.0070 msec. Acquisition was done using Analyst 1.6.3 software.

Method development:

For identification and relative quantification of all the lipid species, a theoretical MRM library was generated using LIPID MAPS (https://www.lipidmaps.org/). Using internal standards from different lipid classes, the MRM parameters (collision energy, declustering potential, cell exit potential, and entrance potential) were optimized for 1224 lipid species belonging to 18 lipid classes: sphingomyelin (SM), ceramide (Cer), cholesterol ester (CE), monoacylglycerol (MAG), diacylglycerol (DAG), triacylglycerol (TAG), lysophosphatidic acid (LPA), phosphatidic acid (PA), lysophosphatidylcholine (LPC), phosphatidylcholine (PC), lysophosphatidylethanolamine (LPE), phosphatidylethanolamine (PE), lysophosphatidylinositol (LPI), phosphatidylinositol (PI), lysophosphatidylglycerol (LPG), phosphatidylglycerol (PG), lysophosphatidylserine (LPS), and phosphatidylserine (PS) (supplementary table 1).
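For reference, the LC gradient described in the liquid chromatography section above can be written out as a timetable. The following is a minimal Python sketch, not the instrument method itself. Two points are assumptions rather than statements from the text: the flow-rate increase to 1.5 mL/min is taken to occur within the 5.1-minute hold at 100% buffer B, and the final 0.5-minute hold is inferred from the stated 24-minute total. The names `GRADIENT_STEPS` and `build_timetable` are hypothetical.

```python
# Each step: (duration in minutes, %B at end of step, flow in mL/min at end of step).
# Values are transcribed from the gradient description; see the lead-in for assumptions.
GRADIENT_STEPS = [
    (6.0, 6.0, 0.7),     # 0.1% -> 6% B
    (4.0, 25.0, 0.7),    # 6% -> 25% B
    (1.0, 98.0, 0.7),    # ramp to 98% B
    (2.0, 100.0, 0.7),   # ramp to 100% B
    (0.5, 100.0, 0.7),   # hold for 30 seconds
    (5.1, 100.0, 1.5),   # flow raised to 1.5 mL/min, 100% B maintained (assumed same step)
    (0.1, 0.1, 1.5),     # return to initial 0.1% B
    (4.3, 0.1, 1.5),     # column equilibration at high flow
    (0.5, 0.1, 0.7),     # flow returned to 0.7 mL/min over 30 seconds
    (0.5, 0.1, 0.7),     # inferred hold until the 24-minute end point
]


def build_timetable(steps, start_percent_b=0.1, start_flow=0.7):
    """Convert step durations into cumulative (time, %B, flow) rows."""
    rows = [(0.0, start_percent_b, start_flow)]
    elapsed = 0.0
    for duration, percent_b, flow in steps:
        # Round to avoid floating-point drift when summing decimal minutes.
        elapsed = round(elapsed + duration, 3)
        rows.append((elapsed, percent_b, flow))
    return rows


if __name__ == "__main__":
    for time_min, percent_b, flow in build_timetable(GRADIENT_STEPS):
        print(f"{time_min:5.1f} min  {percent_b:5.1f} %B  {flow:.1f} mL/min")
```

Writing the steps this way makes it easy to confirm that the described segments add up to the stated 24-minute run.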
The MRM library consisted of 1236 transitions including 12 internal standards, of which 611 species were identified in positive mode (SM, CE, Cer, TAG, DAG, MAG) and 625 in negative mode (phospholipids and lysophospholipids). The current MRM panel covers major lipid classes and categories having fatty acids with 14-22 carbons and 0-6 double bonds per fatty acyl chain. Transitions were distributed into multiple unscheduled MRM methods, and the relative retention time of each transition was determined with respect to its internal standard on the Amide-HILIC column. Furthermore, the retention times were validated by MS/MS experiments using information dependent acquisition (IDA) with enhanced product ion (EPI) scans of specific ions in unscheduled MRM for each lipid class. MS/MS analysis in EPI mode used the conventional triple quadrupole ion path, with the third quadrupole operated as an ion trap. The basic parameters were kept the same as in the MRM experiment. MS/MS spectra were compared with MS/MS information from LIPID MAPS (http://www.lipidmaps.org/) to verify the structures of the putative lipid species, and structures were predicted from the MS/MS spectra based on specific cleavage rules for lipids.

Retention time window and Dwell time weightage

Using sMRM Builder (https://sciex.com/), an Excel-based tool from Sciex, the variable retention time window and variable dwell time weightage were optimized for all transitions. The tool works on the basis of the width and intensity of the chromatographic peak. With variable retention time window width, each MRM transition can have its own RT window.
Wider windows are assigned to analytes that show higher run-to-run variation or have broader peak widths. Variable dwell times were assigned to improve the signal-to-noise ratio of MRM transitions based on the abundance of the analyte in the sample, with a higher dwell time weightage assigned to analytes of low abundance (supplementary table 1). The dwell time for each species was assigned based on this weight, which maintains the cycle time and optimizes the signal-to-noise ratio for low-abundance peaks. Details of the optimized parameters are given in supplementary table 1.

Limit of Detection and Quantitation:

The limits of detection and quantitation were derived from the peak areas of known amounts of lipid internal standards added to lipid extract from human plasma (matrix). The master mix of lipid internal standards was prepared from Splash mix and ceramide (17:0) at the following concentrations: SM (41.86 nmol), Cer (24.91 nmol), TAG (70.59 nmol), DAG (15.99 nmol), LPC (48.23 nmol), PC (213.38 nmol), LPE (10.89 nmol), PE (8.02 nmol), PG (38.09 nmol), PI (5.40 nmol), PS (10.74 nmol), PA (10.73 nmol). The limit of blank (LoB) was based on the signal found in matrix alone (without internal standards; blank), measured in triplicate, and was calculated from the mean and standard deviation of the plasma matrix signal:

LoB = mean blank + 1.645 × (SD blank) 39

The raw analytical signal obtained for standards from plasma lipid extract (spiked with standards) was used to estimate the LoD and LoQ, using the following formulas:

LoD = mean blank + 3 × (SD blank) 40
LoQ = mean blank + 10 × (SD blank) 40

The standard solution was diluted serially with matrix and the lipid standards were run in the following concentration ranges: 319.39 fmol-41.86 nmol for SM, 190.06 fmol-24.91 nmol for Cer, 538.53 fmol-70.59 nmol for TAG, 121.97 fmol-15.99 nmol for DAG, 367.96 fmol-48.23 nmol for LPC, 1.63 pmol-213.38 nmol for PC, 83.09 fmol-10.89 nmol for LPE, 61.16 fmol-8.02 nmol for PE, 290.59 fmol-38.09 nmol for PG, 41.24 fmol-5.40 nmol for PI, 81.96 fmol-10.74 nmol for PS, and 81.83 fmol-10.73 nmol for PA. The lowest concentration with a signal above the estimated method limit (based on the formulas above) was taken as the LoD or LoQ, respectively. The mean and standard deviation were calculated from 3 replicates. Linearity was represented by R², with the LoQ taken as the lowest calibrator concentration for each lipid standard.

Spike and recovery and coefficient of variance:

Extraction recovery for the method was measured by comparing the peak areas of matrix extract spiked with standards before and after extraction. For this, 5 µL of lipid internal standard mix (standard mix : lipid extract resuspension volume = 1:20 v/v) was used. The percentage recovery and relative standard deviation were calculated from 3 biological replicates.

Relative recovery = Mean area of extracted sample with standard spiked before extraction / Mean area of extracted sample with standard spiked after extraction 41

% Relative Standard Deviation = Standard Deviation / Mean analytical signal × 100

The coefficient of variance (CV) of the method was determined by observing the variation of individual lipid species within a batch. The intra-batch variation was assessed by analyzing 5 technical replicates of lipids extracted from plasma.
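The limit formulas above, together with the rule of taking the lowest dilution whose signal exceeds each limit, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' spreadsheet: the sample standard deviation (n-1 denominator) is assumed because the text does not say which estimator was used, the function names are hypothetical, and the numbers in the usage note are invented for demonstration only.

```python
from statistics import mean, stdev


def method_limits(blank_signals):
    """Signal thresholds from replicate blank (matrix-only) peak areas.

    Assumes the sample standard deviation; the source does not specify.
    """
    blank_mean = mean(blank_signals)
    blank_sd = stdev(blank_signals)
    return {
        "LoB": blank_mean + 1.645 * blank_sd,
        "LoD": blank_mean + 3 * blank_sd,
        "LoQ": blank_mean + 10 * blank_sd,
    }


def lowest_concentration_above(threshold, dilution_series):
    """Return the lowest concentration whose mean signal exceeds the threshold.

    dilution_series is a list of (concentration, mean_signal) pairs.
    Returns None when no dilution exceeds the threshold.
    """
    passing = [conc for conc, signal in dilution_series if signal > threshold]
    return min(passing) if passing else None
```

As a usage illustration with invented values, blanks of 100, 110, and 90 area units give a mean of 100 and an SD of 10, so the LoD threshold is 130 and the LoQ threshold is 200. With a dilution series of (1.0, 120), (2.0, 150), (4.0, 260), (8.0, 500), the LoD would be reported at concentration 2.0 and the LoQ at 4.0.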
CV values were calculated only for those lipid species that had carryover of less than 20% and were present in at least 3 replicates 42. Inter-day variability for each lipid species was determined by analyzing lipids on 3 different days from a stock of pooled plasma. The CV values were reported for 3 different days (n=5, technical replicates) after sum normalization within lipid class.

Percentage CV = standard deviation / average intensity × 100

Alteration of plasma lipids due to vitamin B12 deficiency:

Study population: The study (which was part of a larger study) was designed to identify plasma lipids that were altered due to vitamin B12 deficiency. Apparently healthy individuals were classified into two groups based on their plasma vitamin B12 levels. Informed consent was obtained from the participants. The study was approved by the institutional ethical committee of CSIR-IGIB. Individuals with vitamin B12 values less than 150 pg/ml were considered to be vitamin B12 deficient and those with levels between 400 and 800 pg/ml were considered to be in the normal range. Lipids from plasma were extracted as described above. For this study, plasma from 95 individuals (48 with B12 deficiency and 47 with normal plasma vitamin B12 levels) was used. Lipids that had a CV<30% and that were altered by more than 1.3-fold with p<0.05 were considered to be significantly altered between the two groups.

Data analysis:

The .wiff files were processed in MultiQuant 3.0.2 for relative quantitation. For identification of the different lipid species, MS/MS spectra were matched against the structures of the putative lipid species (.mol files) using PeakView 2.0.1.
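The selection criteria stated above for significantly altered lipids (CV<30%, more than 1.3-fold change, p<0.05) amount to a simple filter. The sketch below is hypothetical: the function name and argument layout are ours, the p-value is taken as an input because the text does not name the statistical test, and "altered by more than 1.3-fold" is assumed to cover both increases and decreases.

```python
def is_significantly_altered(mean_deficient, mean_normal, p_value, cv_percent,
                             fold=1.3, alpha=0.05, cv_max=30.0):
    """Apply the three stated criteria to one lipid species.

    Assumption: a fold change counts in either direction, i.e. the
    deficient/normal ratio is above `fold` or below 1/`fold`.
    """
    ratio = mean_deficient / mean_normal
    changed = ratio > fold or ratio < 1.0 / fold
    return cv_percent < cv_max and p_value < alpha and changed
```

For example, a species with a deficient/normal ratio of 0.70, p = 0.01, and CV = 12% would pass, whereas the same species with a ratio of 1.2 would not.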
Statistical analysis was done using Excel. Figures were drawn using MATLAB (MATLAB, 2010. version 7.10.0 (R2010a), Natick, Massachusetts: The MathWorks Inc.), RAWGraphs (https://rawgraphs.io), and GraphPad Prism version 6.0.

Results:

We developed a scheduled MRM method that can identify more than 1000 lipid species in a single mass spectrometric acquisition using a combination of variable-RTW and relative-DTW for each lipid species along with an optimized LC gradient. Initially, we generated a theoretical MRM library using LIPID MAPS (http://www.lipidmaps.org/), which consisted of 1224 lipid species and 12 internal standards belonging to the 18 lipid classes. The total ion chromatogram is shown in figure 1a. The 18 classes of lipids were analyzed in the positive or negative ion mode. In the positive ion mode, [M+H]+ precursor ions were used for SM, Cer, and CE, while for neutral lipids (TAG, DAG, and MAG) [M+NH4]+ precursor ions were considered. Phospholipids (PLs) were identified in negative ion mode as [M-H]- precursor ions, except LPCs and PCs, for which [M+CH3COO]- ions were considered. The variable-RTW and relative-DTW for the different species were determined based on the intensity and width of the peaks obtained for each lipid species. For instance, in positive ion mode, SM (18:1) had a broader elution window (36.1 seconds) than CE (24:0) (32.5 seconds), but the signal intensity of CE (24:0) was lower than that of SM (18:1). Thus, to collect a sufficient number of data points, a higher dwell time weight of 3.01 was applied to CE (24:0), compared with 1.00 for SM (18:1) (figure 1b and 1c). Furthermore, LPC (20:4) and LPE (22:5) had the same elution window of 40.2 seconds, but a dwell time weightage of 1 was applied to LPC (20:4), compared with 1.15 for LPE (22:5) (figure 1d and 1e). A complete list of all parameters for each lipid species, along with retention window and dwell weightage, is given in supplementary table 1.
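To illustrate how relative dwell time weights such as those quoted above could translate into dwell times, the following Python sketch splits a fixed cycle among concurrently monitored transitions in proportion to their weights. This is a simplified model and an assumption on our part, not the algorithm used by sMRM Builder or the instrument software: it assumes the cycle equals the target scan time, that each transition costs one fixed pause, and that the remainder is divided by weight. Treating SM (18:1) and CE (24:0) as concurrent in the usage note is also purely illustrative.

```python
def allocate_dwell_times(weights, target_scan_time_ms, pause_ms):
    """Split one cycle among concurrent transitions by relative weight.

    weights: {transition name: relative dwell time weight}
    Simplified model (assumption): cycle = sum(dwell + pause) over
    all transitions that are concurrently monitored.
    """
    available_ms = target_scan_time_ms - len(weights) * pause_ms
    if available_ms <= 0:
        raise ValueError("too many concurrent transitions for this cycle time")
    total_weight = sum(weights.values())
    return {name: available_ms * weight / total_weight
            for name, weight in weights.items()}


if __name__ == "__main__":
    # Target scan time (0.5 s) and pause (about 5.007 ms) are taken from the
    # acquisition parameters; the pairing of these two species is hypothetical.
    dwell = allocate_dwell_times({"SM (18:1)": 1.00, "CE (24:0)": 3.01},
                                 target_scan_time_ms=500.0, pause_ms=5.007)
    for name, value in dwell.items():
        print(f"{name}: {value:.1f} ms")
```

Under this model the cycle time is unchanged by the weighting; a higher weight only shifts dwell time toward the low-abundance transition, which is the behaviour the text describes.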
Identification of isomers within lipid classes

To identify different lipid isomers, we used customized approaches for the various lipid classes. For TAGs, instead of using pseudo-transitions, we identified different TAG species on the basis of sn-position by selecting a unique parent ion/daughter ion (Q1/Q3) combination, based on the neutral loss of one of the sn-position fatty acyl chains (RCOOH) and NH3 from the parent ion [M+NH4]+. For instance, the parent ion (Q1) for TAG 52:6 is 868.8, while the product ion (Q3) is derived from the mass remaining after loss of the fatty acid present at one of the sn-positions, such as m/z 595.5 for TAG (52:6/FA16:0), as shown in figure 2. Using this approach we found 9 TAG species for TAG (52:6) based on the composition of the fatty acid present at one of the sn-positions (figure 2a). Furthermore, MS/MS through EPI scan unambiguously confirmed six of the 9 isomers of TAG 52:6 (supplementary figure 1). The MRM library used consists of 445 TAG species belonging to 96 different categories of TAG based on total chain length and unsaturation. Further validation of Q3 in MS/MS experiments through IDA-EPI scan confirmed the structural characterization of the Q3 ion with the MS/MS spectrum for 349 putative TAG species. Using this method, we were able to identify a total of 415 TAG species from 90 different categories of TAG (figure 3a). Among these 90 TAG categories, we found that TAG (52:3) was the most abundant form in human plasma (figure 3b and supplementary table 2). We identified 11 isomers of TAG (52:3), among which TAG (52:3/FA16:0) was the most abundant in human plasma (figure 3b and supplementary table 3).
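The Q1/Q3 logic described above for TAGs (ammonium adduct as precursor, neutral loss of one fatty acid plus NH3 as product) can be checked with a short calculation. The Python sketch below is illustrative only: it uses monoisotopic element masses and the general formulas for a TAG with n acyl carbons and d double bonds (C(n+3) H(2(n+3)-4-2d) O6) and for a free fatty acid (C n H(2n-2d) O2). The function names are hypothetical, and the authors' library values may have been derived differently.

```python
# Monoisotopic masses (Da) of the elements involved and of the electron.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
ELECTRON = 0.00054858


def formula_mass(c=0, h=0, n=0, o=0):
    """Monoisotopic mass of a neutral formula CcHhNnOo."""
    return c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]


def tag_mass(acyl_carbons, double_bonds):
    """Neutral TAG: C(n+3) H(2(n+3) - 4 - 2d) O6."""
    carbons = acyl_carbons + 3
    return formula_mass(c=carbons, h=2 * carbons - 4 - 2 * double_bonds, o=6)


def fatty_acid_mass(carbons, double_bonds):
    """Neutral free fatty acid RCOOH: C n H(2n - 2d) O2."""
    return formula_mass(c=carbons, h=2 * carbons - 2 * double_bonds, o=2)


def tag_transition(tag_carbons, tag_db, fa_carbons, fa_db):
    """Return (Q1, Q3): [M+NH4]+ and the ion after loss of RCOOH and NH3."""
    ammonium = formula_mass(n=1, h=4) - ELECTRON  # NH4+ cation
    ammonia = formula_mass(n=1, h=3)              # neutral NH3
    q1 = tag_mass(tag_carbons, tag_db) + ammonium
    q3 = q1 - fatty_acid_mass(fa_carbons, fa_db) - ammonia
    return q1, q3


if __name__ == "__main__":
    q1, q3 = tag_transition(52, 6, 16, 0)  # TAG (52:6/FA16:0)
    print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}")
```

For TAG (52:6/FA16:0) this monoisotopic calculation lands within about 0.1 Da of the 868.8/595.5 pair quoted in the text; the small difference reflects rounding and whichever mass convention was used to build the library.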
For phospholipids (PC, PE, PG, PS, PI, and PA), instead of the conventional method of using the head group loss in positive ion mode (e.g., PC 38:4, 868.607/184.4), we used a modified approach in negative ion mode, based on the loss of a fatty acid, to identify the phospholipids at the fatty acid composition level. Using this approach, we were able to identify isomers of phospholipids within a class, such as PC 16:0‒22:5, PC 18:0‒20:5, PC 18:1‒20:4, and PC 18:2‒20:3 for PC 38:5 (supplementary figure 2a). Further, EPI scans for MS/MS confirmed the fragmented daughter ions for the identification of three PC (38:5) isomers (supplementary figure 2b, 2c, and 2d). From the analysis of 455 phospholipids belonging to 6 phospholipid classes (PC, PE, PG, PI, PS, and PA) in the library, we were able to identify 385 phospholipid species. Among them, phospholipids (PC, PE, PG, PI, PS, PA) with a chain length of 36 and 2 unsaturations had the highest abundance (figure 3c and supplementary table 2). Within PLs, PC 34:2 had the highest abundance (supplementary figure 3 and supplementary table 4). We observed three isomers of PC 34:2, among which PC (16:0/18:2) was the most abundant (figure 3d and supplementary table 3). EPI scans confirmed the MS/MS identification of 182 PLs. We were also able to identify isomers of DAG (e.g., DAG 16:1/20:2 and DAG 18:1/18:2) (supplementary table 3). A list of all lipid species with their isomers and abundance in terms of area under the chromatogram is given in supplementary table 3. Further validation of retention time through EPI confirmed the matching of MS/MS spectra with putative lipid structures for the other lipid classes.

Method validation: Limit of blank (LoB), limit of detection (LoD), limit of quantitation (LoQ), and linear range.

The raw analytical signal in the blank was used to establish the LoB, which was determined from the area under the chromatogram for the selected transition of each lipid standard (supplementary table 5).
The LoD and LoQ were obtained from the raw analytical signal (area under the chromatogram) obtained by progressively diluting the lipid standards. The LoD and LoQ were based on the average values obtained in 3 replicates, reflecting inter-day variability as mentioned in the materials and methods section. A representative graph of LoD and LoQ for SM (positive mode) and PC (negative mode) is shown in figure 4a and 4b, while the values of LoD and LoQ for all the species are provided in table 1. The LoDs for all lipid classes were in the range of 0.245 pmol/L to 41.961 pmol/L, except for DAG, for which the LoD was 1 nmol/L. Detection limits for SM, LPC, PE, and PG were found to be in the femtomolar range, while the rest were in the picomolar range. The lowest LoQ was found for PG (0.291 pmol/L) and the highest for DAG (2 nmol/L). The linearity of the method was checked by defining the relationship between the raw analytical signal for each lipid standard and its concentration in the presence of matrix (plasma). The linear range was determined by checking performance from the LoQ to the highest concentration tested, based on the coefficient of determination (R²).

Spike and recovery and coefficient of variation

To determine the percent recovery of the lipid species, a known amount of lipid standards was added to plasma (matrix) either before or after (spike) extraction of the lipids from the plasma. The raw area signals obtained under these two conditions were compared to determine the percentage recovery. These experiments were performed on three different days and the average percent recovery of the lipid standards was determined (figure 5a and supplementary table 6).
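The spike-and-recovery comparison described above reduces to a ratio of mean peak areas, with the relative standard deviation reported alongside. A minimal Python sketch follows; the function names are hypothetical, the sample standard deviation is an assumption, and the numbers in the usage note are invented for illustration.

```python
from statistics import mean, stdev


def percent_recovery(pre_spike_areas, post_spike_areas):
    """Mean area with standards spiked before extraction, relative to the
    mean area with standards spiked after extraction, as a percentage."""
    return 100.0 * mean(pre_spike_areas) / mean(post_spike_areas)


def percent_rsd(values):
    """Relative standard deviation (sample SD assumed) as a percentage."""
    return 100.0 * stdev(values) / mean(values)
```

For instance, pre-extraction spike areas of 90, 95, and 100 against post-extraction spike areas of 100, 100, and 100 correspond to a recovery of 95%.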
To determine the coefficient of variation of the lipid species, we extracted lipids from plasma pooled from 5 individuals. For intra-batch variation, the same sample was subjected to mass spectrometric analysis 5 times. The coefficient of variation was calculated after sum normalization of the raw values obtained within each class. To obtain the inter-day variability, lipids were extracted from the same sample on 3 different days. A total of 1018, 952, and 986 lipid species were detected on day 1, day 2, and day 3, respectively. The median CV of all the identified lipids on the three days was 15.1%, 15.5%, and 14.7%, respectively. On day 1, out of 1018 lipid species, we observed 809 with a CV below 30%, of which 259 had a CV below 10% and 665 had a CV below 20% (figure 5b). On day 2 and day 3, 737 and 773 lipid species, respectively, had a CV below 30%. Across the three days, 410 phospholipids had CV<30%. The TAGs are a large class, and 413 species were measurable in plasma with a CV below 30%. Among lysoPLs, LPC and LPE were detected consistently on different days, but other lysoPLs were of very low abundance and therefore had much lower reproducibility. In total, 42 of 86 lysophospholipids had CV<30%. Overall, we identified 849 lipid species with CV<30% on at least one of the three days, of which 586 were consistently detected on all days with CV<30%. The detailed table of % CV for individual lipid species observed on the 3 different days is given in supplementary table 7.
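The CV calculation after sum normalization within a lipid class, as described above, can be sketched as follows. This is an illustration under assumptions rather than the authors' workflow (which used Excel): each replicate is normalized by the summed intensity of all species of that class in the same replicate, the sample standard deviation is used, and the function names are hypothetical.

```python
from statistics import mean, stdev


def sum_normalize(class_intensities):
    """Normalize one lipid class: {species: [intensity per replicate]}.

    Each value is divided by the total class intensity in its replicate.
    """
    n_replicates = len(next(iter(class_intensities.values())))
    totals = [sum(values[i] for values in class_intensities.values())
              for i in range(n_replicates)]
    return {species: [values[i] / totals[i] for i in range(n_replicates)]
            for species, values in class_intensities.items()}


def percent_cv(values):
    """Percentage CV = standard deviation / average intensity x 100."""
    return 100.0 * stdev(values) / mean(values)


def class_cv_table(class_intensities):
    """Percentage CV per species after sum normalization within the class."""
    normalized = sum_normalize(class_intensities)
    return {species: percent_cv(values) for species, values in normalized.items()}


def count_below(cv_table, threshold_percent):
    """Number of species whose CV falls below the threshold."""
    return sum(1 for cv in cv_table.values() if cv < threshold_percent)
```

One consequence of this normalization is that a uniform change in overall signal between replicates (for example, a different injection response) cancels out: if every species in a class doubles in one replicate, the normalized values and hence the CVs are unaffected.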
Lipidomics study in normal and vitamin B12 deficient human plasma

Vitamin B12 is a micronutrient mainly sourced from animal products, and its deficiency has been reported to result in lipid imbalance. Using this method, we attempted to identify lipid species that are altered due to vitamin B12 deficiency. There was no significant alteration between the two groups in any of the lipid classes when taken as a whole (supplementary table 8). However, when individual lipid species within the classes were compared, we found that lipid species containing an omega 3 fatty acid (FA 20:5) were significantly lower in the plasma of vitamin B12 deficient individuals (figure 6a). In total, 6 lipid species containing FA 20:5 were significantly decreased: two each from TAG and PC, and one each from PE and PA. Additionally, lipid species containing an omega 6 fatty acid (FA 18:2) were significantly higher in the vitamin B12 deficient condition (figure 6b, supplementary table 9). These results hint at the possibility of a lower ω-3:ω-6 ratio in vitamin B12 deficient individuals.

Discussion:

Lipids in general are known to be associated with the pathogenesis of various complex diseases 10. However, the exact role played by each lipid species has not been studied in detail, largely because of limitations in identifying individual lipid species in large-scale studies. We report a single-extraction, targeted mass spectrometric method using Amide-HILIC chromatography and scheduled MRM with variable-RTW and relative-DTW, which detects more than 1000 lipid species from 18 lipid classes, including various isomers, in a single run of 24 minutes.
With this method, which covers most of the lipid species present in human plasma with 14-22 carbon atoms and 0-6 double bonds per fatty acid chain, we could identify a considerably higher number of lipid species than reported in previous large-scale lipidomics studies 14,21,43-45. In this method, the MRM transitions were monitored in a particular time segment, rather than scanning for all the lipid species during the entire run. This strategy reduces the time required to monitor the multiple transitions. We improved the coverage by additionally optimizing the dwell time weightage assigned to each lipid species, which is required especially for lipid species of medium and low abundance. The dwell time for each lipid species was customized and the dwell weightage was optimized based on lipid species abundance without affecting the target scan time in each cycle. This improved peak quality while giving good reproducibility. Current methods for large-scale lipid analysis can identify only the lipid classes and fatty acid chains, but structural specificity is critical for studying the biological function of lipids. Determining the composition of the fatty acyl chains with respect to sn-position is a major limitation in large-scale lipidomics studies 21,30. Recently, using a combination of photochemical reactions (ozone-induced dissociation and ultraviolet photodissociation) with tandem MS, Cao et al. reported the identification of isomers of TAGs and PLs on the basis of sn-position and carbon-carbon double bond (C=C) position 46. Their identification also revealed the sequential loss of different fatty acyl chains based on sn-position, enabling identification of different positional isomers 46. However, single-step identification of TAG isomers in large-scale studies remains a challenge due
to the three fatty acyl chains on the glycerol backbone, which bears no easily ionizable moiety 21,30. We have focused on identification of structural isomers based on sn-position using an LC-MS platform, without adding an extra step that would increase analysis time and effort. We were able to detect structural isomers with respect to the fatty acyl chain at an sn-position, where the neutral loss of one of the sn-position fatty acyl chains (RCOOH) and NH3 from the parent ion [M+NH4]+ makes their detection possible. Detection was based purely on assigning a unique combination of Q1/Q3 to each structural isomer of a TAG species (figure 2a); however, one limitation of this method is the inability to assign the fatty acyl groups to their respective sn-positions (sn1, sn2, or sn3). Hence, the three fatty acyl chains are represented by the total number of carbon atoms and the unsaturation level (e.g., TAG (52:6)), and the identified fatty acid at one of the sn-positions (e.g., FA 14:0) is represented as TAG (52:6/FA14:0). The optimized Q3 for phospholipids including PC, PE, PG, PI, PS, and PA was derived from the neutral loss of their fatty acid side chains in negative ion mode (e.g., PC 16:0‒22:4, PC 18:0‒20:4, PC 18:1‒20:3, and PC 18:2‒20:2 for PC 38:4) (figure 3d). In total, we were able to identify 415 TAG species belonging to 90 different fatty acyl compositions (sn1+sn2+sn3) and 385 PL species based on fatty acyl compositions (sn1+sn2) (supplementary table 3). The LoD for the various lipid species in our method was between 0.245 pmol/L and 41.96 pmol/L, which was better than or similar to previously reported LoDs on different LC-MS platforms 21,27,31,44,45 and similar to a previously reported large-scale lipidomics method using supercritical fluid-scheduled MRM (5‒1,000 fmol/L) 21.
The LoQs in previously reported methods were in the nmol/L to µmol/L range, while we observed much lower LoQs (0.291 pmol/L to 167.84 pmol/L) 21. In addition, the limits were calculated from the mean raw analytical signal and SD, which gives a more realistic picture of the method and avoids reporting artificially low detection limits. In our method, DAG had the highest LoD and LoQ, 1 nmol/L and 2 nmol/L respectively, which were still lower than in previously reported methods for targeted analysis 21. The linearity of our method was found to be comparable to previous lipidomics methods 21,27,45. The recovery of lipid species in our method was in the range of 69.75%-113.19%, except for DAG (137.5%), which is within the generally accepted range for quantification and comparable with other lipidomics studies 21,27. The DAG class is not frequently quantified in other published papers; we observed a comparatively higher recovery for it because the concentration present in the lipid standard mix (1:20 of lipid standard master mix) used for the recovery test does not fall in the linear range 47. A major challenge in lipidomics experiments has been the high variability in the signals; even the study “shared reference materials harmonize lipidomics across MS-based detection platforms and laboratories” showed that most lipid species had large variability (CV between 30% and 80%) 48. However, the variability for endogenous lipid species that were normalized to the corresponding stable isotope-labelled analogue was lower than 30% 43,48. In this method, we used sum normalization (although we are not addressing batch effects in this study) and found that 849 lipid species had a CV <30% 43. Overall, the median CV of our method (15.1%, 15.5%, and 14.7% on the three days) was similar to or better than previous reports 21,27,31, and we have also reported species-specific CVs. It should be
No reuse allowed without permission. The copyright holder for this preprintthis version posted January 8, 2021. ; https://doi.org/10.1101/2021.01.08.425875doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425875 noted that most of the large scale lipidomics studies previously done reports the median or average CV of the method but not the species-specific CV 14,21,27,31 . Lipidomics study in normal and vitamin B12 deficient human plasma- Using the method developed we identified lipid species that are altered in individuals with vitamin B12 deficiency. Vitamin B12 is a cofactor of methyl malonyl CoA mutase and controls the transfer of long-chain fatty acyl-CoA into the mitochondria 49 . Deficiency of vitamin B12 results in accumulation of methylmalonyl CoA increasing lipogenesis via inhibition of beta-oxidation. In the last decade, several studies revealed that vitamin B12 deficiency causes alteration in the lipid profile through changes in lipid metabolism, either by modulating their synthesis or its transport 50 . In particular, the effects of vitamin B12 on omega 3 fatty acid and phospholipid metabolism have received much attention. Khaire A et al., found that vitamin B12 deficiency increased cholesterol levels but reduced docosahexaenoic acid (DHA-omega 3) 51 . An imbalance in maternal micronutrients (folic acid, vitamin B12) in Wistar rats increased maternal oxidative stress, decreases placental and pup brain DHA levels, and decreases placental global methylation levels 52,53 . Although various studies have shown that B12 deficiency results in adverse lipid profile as well as pathophysiological changes linked to CAD, type 2 diabetes mellitus and atherosclerosis, very few studies have independently investigated the effect of vitamin B12 status on changes in human plasma lipid among apparently healthy population 54-56 . Importantly the lipid species that are altered because of the vitamin deficiency are still not yet well understood. 
To our knowledge, this is the first study to identify lipids with significantly decreased ω-3 fatty acid (20:5) chains and increased ω-6 fatty acid (18:2) chains in relation to vitamin B12 deficiency; this shift might increase the ω-6 to ω-3 fatty acid ratio in human plasma and may promote the development of many chronic diseases. Notably, this study demonstrates for the first time in humans that vitamin B12 deficiency may induce a lower level of synthesis or a higher rate of degradation of lipid species containing an omega-3 fatty acid (FA 20:5). Most importantly, we found that although there was no significant alteration in the lipid classes, individual lipid species varied in vitamin B12 deficient individuals, clearly demonstrating the utility of identifying lipid species. The application of scheduled MRM with variable-RTW and relative-DTW enabled large-scale quantification of lipid species in a single run, in contrast to unscheduled, scheduled, or dynamic MRM. With this combinatorial approach, we were able to detect more than 1000 lipid species in plasma, including isomers of TAG, DAG and PLs. Additionally, we validated the retention times through MS/MS analysis in IDA-EPI scan mode by matching fragment (daughter) ions from the MS/MS spectrum to the putative lipid species structure. It should be noted that the MRMs currently used are specific for plasma and may not be ideal for other biological systems. Therefore, a separate MRM panel may need to be developed for each system. To the best of our knowledge, this is the largest number of lipid species identified to date in a single experiment.
Comprehensive identification of structural isomers in a large-scale lipidomics method is critical for studying the important biological functions of lipids.
Acknowledgement
The authors would like to thank Dr. Mainak Dutta from BITS Dubai, Mrs. Akanksha Singh and Dr. Christei Hunter of Sciex for their invaluable inputs and suggestions in shaping this study. Akash Kumar Bhaskar and Salwa Naushin would like to thank CSIR for their fellowship.
References
1 Smilowitz, J. T. et al. Nutritional lipidomics: molecular metabolism, analytics, and diagnostics. Molecular nutrition & food research 57, 1319-1335 (2013). 2 Muro, E., Atilla-Gokcumen, G. E. & Eggert, U. S. Lipids in cell biology: how can we understand them better? Molecular biology of the cell 25, 1819-1823 (2014). 3 Yáñez-Mó, M. et al. Biological properties of extracellular vesicles and their physiological functions. Journal of extracellular vesicles 4, 27066 (2015). 4 Van Meer, G., Voelker, D. R. & Feigenson, G. W. Membrane lipids: where they are and how they behave. Nature reviews Molecular cell biology 9, 112-124 (2008). 5 Glomset, J. A. Protein-lipid interactions on the surfaces of cell membranes. Curr. Opin. Struct. Biol 9, 425-427 (1999). 6 Ye, R., Onodera, T. & Scherer, P. E. Lipotoxicity and β cell maintenance in obesity and type 2 diabetes. Journal of the Endocrine Society 3, 617-631 (2019). 7 Fu, S. et al. Aberrant lipid metabolism disrupts calcium homeostasis causing liver endoplasmic reticulum stress in obesity. Nature 473, 528-531 (2011). 8 Yang, M., Zhang, Y. & Ren, J. Autophagic regulation of lipid homeostasis in cardiometabolic syndrome. Frontiers in cardiovascular medicine 5, 38 (2018). 9 Beloribi-Djefaflia, S., Vasseur, S. & Guillaumond, F. Lipid metabolic reprogramming in cancer cells. Oncogenesis 5, e189-e189 (2016). 10 Wymann, M. P. & Schneiter, R. Lipid signalling in disease. Nature reviews Molecular cell biology 9, 162-176 (2008). 11 Quehenberger, O. & Dennis, E. A.
The human plasma lipidome. New England Journal of Medicine 365, 1812-1823 (2011). 12 Shevchenko, A. & Simons, K. Lipidomics: coming to grips with lipid diversity. Nature reviews Molecular cell biology 11, 593-598 (2010). 13 Sud, M. et al. Lmsd: Lipid maps structure database. Nucleic acids research 35, D527-D532 (2007). 14 Pradas, I. et al. Lipidomics reveals a tissue-specific fingerprint. Frontiers in physiology 9, 1165 (2018). 15 van Meer, G. Cellular lipidomics. The EMBO journal 24, 3159-3165 (2005). 16 Brügger, B., Erben, G., Sandhoff, R., Wieland, F. T. & Lehmann, W. D. Quantitative analysis of biological membrane lipids at the low picomole level by nano-electrospray ionization tandem mass spectrometry. Proceedings of the National Academy of Sciences 94, 2339-2344 (1997). 17 Wu, Z., Shon, J. C. & Liu, K.-H. Mass spectrometry-based lipidomics and its application to biomedical research. Journal of lifestyle medicine 4, 17 (2014). 18 Wenk, M. R. The emerging field of lipidomics. Nature reviews Drug discovery 4, 594-610 (2005). 19 Han, X. & Gross, R. W. Global analyses of cellular lipidomes directly from crude extracts of biological samples by ESI mass spectrometry a bridge to lipidomics. Journal of lipid research 44, 1071-1079 (2003). 20 Kirkwood, J. S., Maier, C. & Stevens, J. F. Simultaneous, untargeted metabolic profiling of polar and nonpolar metabolites by LC‐Q‐TOF Mass Spectrometry. Current protocols in toxicology 56, 4.39. 31-34.39. 12 (2013). 21 Takeda, H. et al. Widely-targeted quantitative lipidomics method by supercritical fluid chromatography triple quadrupole mass spectrometry. Journal of lipid research 59, 1283-1293 (2018). 22 Contrepois, K. et al.
Cross-platform comparison of untargeted and targeted lipidomics approaches on aging mouse plasma. Scientific reports 8, 1-9 (2018). 23 Khan, M. J. et al. Evaluating a targeted multiple reaction monitoring approach to global untargeted lipidomic analyses of human plasma. Rapid Communications in Mass Spectrometry 34, e8911 (2020). 24 Dekker, B. Reduce complexity by choosing your reactions. Nature Methods 12, 16-16 (2015). 25 Mao, C. et al. Cloning and Characterization of a Mouse Endoplasmic Reticulum Alkaline Ceramidase AN ENZYME THAT PREFERENTIALLY REGULATES METABOLISM OF VERY LONG CHAIN CERAMIDES. Journal of Biological Chemistry 278, 31184-31191 (2003). 26 Song, J. et al. A highly efficient, high-throughput lipidomics platform for the quantitative detection of eicosanoids in human whole blood. Analytical biochemistry 433, 181-188 (2013). 27 Weir, J. M. et al. Plasma lipid profiling in a large population-based cohort. J Lipid Res 54, 2898- 2908, doi:10.1194/jlr.P035808 (2013). 28 Zhang, W. et al. Online photochemical derivatization enables comprehensive mass spectrometric analysis of unsaturated phospholipid isomers. Nature communications 10, 1-9 (2019). 29 Thomas, M. C., Mitchell, T. W. & Blanksby, S. J. Ozonolysis of phospholipid double bonds during electrospray ionization: A new tool for structure determination. Journal of the American Chemical Society 128, 58-59 (2006). 30 Baba, T., Campbell, J. L., Le Blanc, J. Y. & Baker, P. R. Structural identification of triacylglycerol isomers using electron impact excitation of ions from organics (EIEIO). Journal of lipid research 57, 2015-2027 (2016). 31 Tabassum, R. et al. Genetic architecture of human plasma lipidome and its link to cardiovascular disease. Nature communications 10, 1-14 (2019). 32 Li, J. et al. 
Large-scaled human serum sphingolipid profiling by using reversed-phase liquid chromatography coupled with dynamic multiple reaction monitoring of mass spectrometry: method development and application in hepatocellular carcinoma. Journal of chromatography A 1320, 103-110 (2013). 33 Liang, J. et al. A dynamic multiple reaction monitoring method for the multiple components quantification of complex traditional Chinese medicine preparations: Niuhuang Shangqing pill as an example. Journal of Chromatography a 1294, 58-69 (2013). 34 Rao, Z. et al. Development of a dynamic multiple reaction monitoring method for determination of digoxin and six active components of Ginkgo biloba leaf extract in rat plasma. Journal of Chromatography B 959, 27-35 (2014). 35 Shah, I., Petroczi, A., Uvacsek, M., Ránky, M. & Naughton, D. P. Hair-based rapid analyses for multiple drugs in forensics and doping: application of dynamic multiple reaction monitoring with LC-MS/MS. Chemistry Central Journal 8, 73 (2014). 36 Andrade, G. et al. Liquid chromatography–electrospray ionization tandem mass spectrometry and dynamic multiple reaction monitoring method for determining multiple pesticide residues in tomato. Food chemistry 175, 57-65 (2015). 37 Jia, Z.-X., Zhang, J.-L., Shen, C.-P. & Ma, L. Profile and quantification of human stratum corneum ceramides by normal-phase liquid chromatography coupled with dynamic multiple reaction monitoring of mass spectrometry: development of targeted lipidomic method and application to human stratum corneum of different age groups. Analytical and bioanalytical chemistry 408, 6623-6636 (2016). 38 Xu, G., Amicucci, M. J., Cheng, Z., Galermo, A. G. & Lebrilla, C. B.
Revisiting monosaccharide analysis–quantitation of a comprehensive set of monosaccharides using dynamic multiple reaction monitoring. Analyst 143, 200-207 (2018). 39 Armbruster, D. A. & Pry, T. Limit of blank, limit of detection and limit of quantitation. The clinical biochemist reviews 29, S49 (2008). 40 Armbruster, D. A., Tillman, M. D. & Hubbs, L. M. Limit of detection (LQD)/limit of quantitation (LOQ): comparison of the empirical and the statistical methods exemplified with GC-MS assays of abused drugs. Clinical chemistry 40, 1233-1238 (1994). 41 Rower, J. E., Bushman, L. R., Hammond, K. P., Kadam, R. S. & Aquilante, C. L. Validation of an LC/MS method for the determination of gemfibrozil in human plasma and its application to a pharmacokinetic study. Biomedical Chromatography 24, 1300-1308 (2010). 42 van Amsterdam, P. et al. The European Bioanalysis Forum community’s evaluation, interpretation and implementation of the European Medicines Agency guideline on Bioanalytical Method Validation. Bioanalysis 5, 645-659 (2013). 43 Medina, J. et al. Single-Step Extraction Coupled with Targeted HILIC-MS/MS Approach for Comprehensive Analysis of Human Plasma Lipidome and Polar Metabolome. Metabolites 10, 495 (2020). 44 Schoeny, H. et al. Preparative supercritical fluid chromatography for lipid class fractionation—a novel strategy in high-resolution mass spectrometry based lipidomics. Analytical and bioanalytical chemistry, 1-10 (2020). 45 Rampler, E. et al. Simultaneous non-polar and polar lipid analysis by on-line combination of HILIC, RP and high resolution MS. Analyst 143, 1250-1258 (2018). 46 Cao, W. et al. Large-scale lipid analysis with C= C location and sn-position isomer resolving power. Nature communications 11, 1-11 (2020). 47 Wolrab, D., Chocholoušková, M., Jirásko, R., Peterka, O. & Holčapek, M. 
Validation of lipidomic analysis of human plasma and serum by supercritical fluid chromatography–mass spectrometry and hydrophilic interaction liquid chromatography–mass spectrometry. Analytical and Bioanalytical Chemistry, 1-14 (2020). 48 Triebl, A. et al. Shared reference materials harmonize lipidomics across MS-based detection platforms and laboratories. Journal of lipid research 61, 105-115 (2020). 49 Green, R. et al. Vitamin B 12 deficiency. Nature reviews Disease primers 3, 1-20 (2017). 50 Saraswathy, K. N., Joshi, S., Yadav, S. & Garg, P. R. Metabolic distress in lipid & one carbon metabolic pathway through low vitamin B-12: a population based study from North India. Lipids in health and disease 17, 96 (2018). 51 Khaire, A., Rathod, R., Kale, A. & Joshi, S. Vitamin B12 and omega-3 fatty acids together regulate lipid metabolism in Wistar rats. Prostaglandins, Leukotrienes and Essential Fatty Acids 99, 7-17 (2015). 52 Kulkarni, A. et al. Effects of altered maternal folic acid, vitamin B 12 and docosahexaenoic acid on placental global DNA methylation patterns in Wistar rats. PLoS One 6, e17706 (2011). 53 Roy, S. et al. Maternal micronutrients (folic acid and vitamin B12) and omega 3 fatty acids: implications for neurodevelopmental risk in the rat offspring. Brain and Development 34, 64-71 (2012). 54 Adaikalakoteswari, A. et al. Vitamin B12 deficiency is associated with adverse lipid profile in Europeans and Indians with type 2 diabetes. Cardiovascular diabetology 13, 129 (2014). 55 Kumar, J. et al. Vitamin B12 deficiency is associated with coronary artery disease in an Indian population. Clinical Chemistry and Laboratory Medicine (CCLM) 47, 334-338 (2009).
56 Mahalle, N., Kulkarni, M. V., Garg, M. K. & Naik, S. S. Vitamin B12 deficiency and hyperhomocysteinemia as correlates of cardiovascular risk factors in Indian subjects with coronary artery disease. Journal of cardiology 61, 289-294 (2013).

Figure and table:

Figure 1 Chromatograms of the scheduled MRM method with variable-RTW and relative-DTW. a Total ion chromatogram of the method, consisting of 1224 lipid species and 12 internal standards from 18 lipid classes in positive or negative mode. b,c In positive ion mode, SM (18:1)-729/184.1 has an elution window of 36.1 seconds with dwell weight 1 (b) and CE (24:0)-754.7/369.4 has an elution window of 32.5 seconds with dwell weight 3.01 (c). d,e In negative ion mode, LPC (20:4)-602.3/303.2 (d) and LPE (22:5)-526.3/329.2 (e) have equal elution windows (40.2 seconds), but LPE (22:5) has a higher dwell weight (1.15) than LPC (20:4) (dwell weight 1).

Figure 2 XIC (extracted ion chromatogram) of nine isomers of TAG (52:6). The parent m/z for all isomers was 868.8, while the product m/z was derived from the remaining mass (R1+R2 with the glycerol backbone) after the loss of a fatty acid from the parent ion. R1+R2 can be any combination of fatty acids that sums to give the product ion. Dots of different colors represent different isomers confirmed through the IDA-EPI experiment (refer to supplementary figure 1).

Figure 3 Abundance of different lipids. a Abundance of different TAGs on the basis of total chain length and unsaturation. b 415 TAG isomers were detected from 90 different categories of TAG. c Abundance of different phospholipids on the basis of total chain length and unsaturation. d Abundance of 385 phospholipids belonging to 6 classes (PC, PE, PG, PI, PS, and PA); different dots of the same color represent isomers.

Figure 4 Representative graphs from positive and negative ion mode showing LoD, LoQ and the coefficient of determination; the x- and y-axes were log transformed. a SM from positive ion mode and b PC from negative ion mode.

Figure 5 Validation of the method. a Spike and recovery of different lipid classes; blue bars represent the recovery of lipids when a known concentration of lipid standards was spiked during extraction, and green bars represent the reference (the same concentration of lipid standards spiked after extraction). b Coefficient of variance on day 1, where 1018 lipid species from 15 lipid classes were detected (n=5).

Figure 6 Significantly dysregulated lipid species in vitamin B12 deficiency. a Lipid species containing omega-3 fatty acid 20:5, significantly down-regulated in vitamin B12 deficiency. b Lipid species containing omega-6 fatty acid 18:2, significantly up-regulated in vitamin B12 deficiency.

Table 1. Analytical validation of the method with lipid standards.

Lipid class  Ion mode  Number of lipid species  Internal standard         LoD (pmol/L)  LoQ (pmol/L)  Coefficient of determination (R²)
SM           ESI+      12                       SM (d18:1-18:1(d9))       0.319         0.639         0.99
CE           ESI+      21                       Ceramide (17:0)           6.082         12.164        0.99
Cer          ESI+      62                       -                         -             -             -
TAG          ESI+      445                      TAG (15:0-18:1(d7)-15:0)  17.233        34.466        0.99
DAG          ESI+      50                       DAG (15:0-18:1(d7))       999.184       1998.367      0.99
MAG          ESI+      17                       -                         -             -             -
LPC          ESI-      16                       LPC (18:1(d7))            0.368         5.887         0.99
PC           ESI-      79                       PC (15:0-18:1(d7))        13.024        26.048        0.98
LPE          ESI-      16                       LPE (18:1(d7))            1.329         5.318         0.99
PE           ESI-      142                      PE (15:0-18:1(d7))        0.245         0.979         0.99
LPG          ESI-      16                       PG (15:0-18:1(d7))        0.291         0.291         0.99
PG           ESI-      78                       -                         -             -             -
LPI          ESI-      16                       PI (15:0-18:1(d7))        2.639         10.557        0.98
PI           ESI-      77                       -                         -             -             -
LPS          ESI-      16                       PS (15:0-18:1(d7))        41.961        167.846       0.99
PS           ESI-      78                       -                         -             -             -
LPA          ESI-      6                        PA (15:0-18:1(d7))        41.897        167.587       0.97
PA           ESI-      77                       -                         -             -             -
-, no value listed for this row.

[Figure 1 image. a Total ion chromatogram (intensity versus time), peaks labelled by lipid class: TAG, Cer, CE, DAG, MAG, PG, PC, SM, PI, PE, LPC, LPE, PS, PA, LPG, LPI, LPS, LPA. b SM (18:1)-729.7/184.1. c CE (24:0)-754.7/369.4. d LPC (20:4)-602.3/303.2. e LPE (22:5)-526.3/329.2.]
[Figure 2 image. XIC traces (intensity versus time, min) for the TAG (52:6) transitions: TAG (52:6/FA14:0) 868.8/623.6; TAG (52:6/FA16:0) 868.8/595.5; TAG (52:6/FA16:1) 868.8/597.5; TAG (52:6/FA18:1) 868.8/569.5; TAG (52:6/FA18:2) 868.8/571.5; TAG (52:6/FA18:3) 868.8/573.5; TAG (52:6/FA20:4) 868.8/547.5; TAG (52:6/FA20:5) 868.8/549.5; TAG (52:6/FA22:6) 868.8/523.5.]

[Figure 3 image. Panels a to d plot unsaturation against chain length (42 to 60 for TAGs, 28 to 42 for phospholipids). Legends: abundance; isomers 1 to 12; lipid class (PC, PE, PG, PI, PS, PA).]

[Figure 4 image. Panels a and b, with LoD and LoQ marked.]

[Figure 5 image. Panels a and b. Coefficient of variance (0 to 150) by lipid class (SM, CE, CER, TAG, DAG, LPC, PC, LPE, PE, LPG, PG, PI, LPS, PS, PA), with a density plot.]

[Figure 6 image. Sum-normalized area in low versus normal vitamin B12 groups. a Omega 3 - 20:5: TAG(52:6/FA20:5)+NH4, TAG(56:7/FA20:5)+NH4, PC(16:0/20:5)+AcO, PC(20:0/20:5)+AcO, PE(18:0/20:5)-H, PA(20:0/20:5)-H. b Omega 6 - 18:2: TAG(46:3/FA18:2)+NH4, TAG(46:4/FA18:2)+NH4, TAG(48:4/FA18:2)+NH4, TAG(48:5/FA18:2)+NH4, TAG(50:4/FA18:2)+NH4, TAG(50:5/FA18:2)+NH4, TAG(51:4/FA18:2)+NH4, TAG(52:5/FA18:2)+NH4, TAG(52:6/FA18:2)+NH4, TAG(54:5/FA18:2)+NH4, TAG(54:6/FA18:2)+NH4, TAG(54:7/FA18:2)+NH4, TAG(55:5/FA18:2)+NH4, TAG(56:5/FA18:2)+NH4, PE(P-18:2/18:2)-H.]
10_1101-2021_01_08_425887 ----
Auto-CORPus: Automated and Consistent Outputs from Research Publications
Yan Hu1,a, Shujian Sun1,a, Thomas Rowlands2, Tim Beck2,3,b, and Joram M. Posma1,3,b
1 Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, SW7 2AZ, United Kingdom
2 Department of Genetics and Genome Biology, University of Leicester, LE1 7RH, United Kingdom
3 Health Data Research (HDR) UK, United Kingdom
a These authors contributed equally. b These authors contributed equally.
Abstract
Motivation: The availability of improved natural language processing (NLP) algorithms and models enables researchers to analyse larger corpora using open source tools. Text mining of biomedical literature is one area in which NLP has been used in recent years and which has large untapped potential. However, to generate corpora that can be analyzed using machine learning NLP algorithms, the corpora need to be standardized. Summarizing data from the literature for storage in databases typically requires manual curation, especially for extracting data from result tables.
Results: We present here an automated pipeline that cleans HTML files from biomedical literature. The output is a single JSON file that contains the text for each section, table data in machine-readable format, and lists of phenotypes and abbreviations found in the article. We analyzed a total of 2,441 Open Access articles from PubMed Central, from both Genome-Wide and Metabolome-Wide Association Studies, and developed a model to standardize the section headers based on the Information Artifact Ontology. Extraction of table data was developed on PubMed articles and fine-tuned using the equivalent publisher versions.
Availability: The Auto-CORPus package is freely available with detailed instructions from Github at https://github.com/jmp111/AutoCORPus/.
information artefact ontology | natural language processing | text standardization
Correspondence: timbeck [at] leicester.ac.uk and jmp111 [at] ic.ac.uk
Introduction
Natural language processing (NLP) is a branch of artificial intelligence that uses computers to process, understand and use human language. NLP is applied in many different fields including language modelling, speech recognition, text mining and translation systems. In the biomedical realm, NLP has been applied to extract, for example, medication data from electronic health records and patient clinical history from clinical notes, significantly speeding up processes that would otherwise require manual extraction by experts (1). Biomedical publications, unlike structured electronic health records, are semi-structured, and this makes it difficult to extract and integrate the relevant information (2). The format of research articles differs between publishers, and sections describing the same entity, for example statistical methods, can be found in different locations in the document in different publications. Both unstructured text and semi-structured document elements, such as headings, main texts and tables, can contain important information that can be extracted using text mining (3). The development of the genome-wide association study (GWAS) has been driven by the ongoing revolution in high-throughput genomic screening and a deeper understanding of the relationship between genetic variations and diseases/traits (4). In a typical GWAS, researchers collect data from study participants, use single nucleotide polymorphism (SNP) arrays to detect the common variants among participants, and conduct statistical tests to determine if the association between the variants and traits is significant.
The results are mostly represented in publication tables, but can also be found in the main text, and there are multiple community efforts to store these reported associations in queryable, online databases (5, 6). These efforts involve time-intensive and costly manual data curation to transcribe results from the publications, and supplementary information, into databases. Summary-level GWAS results are generally reported in the literature according to community norms (e.g. a SNP associated with a phenotype with a probability value), hence NLP algorithms can be trained to recognize the formats in which data are reported, to facilitate faster and scalable information extraction that is less prone to human error. Development of effective automatic text mining algorithms for GWAS literature can also potentially benefit other fields in biomedical research, as the body of biomedical literature grows every day. Yet previous attempts at mining scientific literature focused mainly on information extraction from abstracts, and some on the main text, while for the most part ignoring tables. To facilitate the process of preparing a corpus for NLP tasks such as named-entity recognition (NER), text classification or relationship extraction, we have developed an Automated pipeline for Consistent Outputs from Research Publications (Auto-CORPus) as a Python package. The main aims of Auto-CORPus are:
• To provide clean text outputs for each publication section with standardized section names
• To represent each publication’s tables in a JavaScript Object Notation (JSON) format to facilitate data import into databases
• To use the text outputs to find abbreviations used in the text
bioRxiv preprint, this version posted January 8, 2021; doi: https://doi.org/10.1101/2021.01.08.425887. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
We exemplify the package on a corpus of 1,200 Open Access GWAS publications whose data have been manually added to the GWAS Central database, to list phenotypes, SNPs and P-values found in the cleaned text (Figure 1). In addition, we include data on 1,200+ Metabolome-Wide Association Studies (MWAS) to ensure the methods are not biased towards one domain. MWAS focus on small molecules, some of which are end-products of cellular regulatory processes, that are the response of the human body to genetic or environmental variations (7).
Materials and Methods
Data. Hypertext Markup Language (HTML) files for 1,200 Open Access GWAS publications whose data exist in the GWAS Central database (5) were downloaded from PubMed Central (PMC) in March 2020. A further 1,241 Open Access publications of MWAS on cancer, gastrointestinal diseases, metabolic syndrome, sepsis and neurodegenerative, psychiatric, and brain illnesses were also downloaded in the same format. Publisher versions of ca. 10% of these publications were downloaded in July 2020 to test the algorithms on publications with different HTML formats. The GWAS dataset was randomly divided into 700 training publications used to develop the algorithms and a test set of the remaining 500 publications.
Processing. HTML files were loaded using the Beautifulsoup4 HTML parser package (v4.9.0). Beautifulsoup4 was used to convert HTML files to tree-like structures, with each branch representing an HTML section and each leaf an HTML element.
After HTML files were loaded, all superscripts, subscripts, and italics were converted to plain text. Auto-CORPus extracts h1, h2 and h3 tags for titles and headings, and p tags for paragraph texts using the default configuration. The headings and paragraphs are saved in a structured JavaScript Object Notation (JSON) file for each HTML file. Tables are extracted from the document using a different set of configuration files (separate configurations for different table structures can be defined and used) and saved in a new JSON model. This model ensures that tables of all formats and origins, not only those from GWAS publications, can be described in the same structured way, so that they can be used as input to rule-based or deep learning algorithms for data extraction. The data cells are stored in the “result” key, and their corresponding section name and header names are stored in the “section_name” and “columns” keys respectively. Therefore, extracting relationships between cells only requires simple rules.

Fig. 1. Workflow of the Auto-CORPus package.

Ontologies for entity recognition. The Information Artifact Ontology (IAO) was created to serve as a domain-neutral resource for the representation of types of information content entities such as documents, databases, and digital images (8). We used the v2020-06-10 model (9) in which 37 different terms exist that describe headers typically found in biomedical literature.
The extracted headers in the JSON file were first mapped to the IAO terms using the Lexical OWL Ontology Matcher (10). We then used fuzzy matching with the fuzzywuzzy package (v0.17.0) to map headers to the preferred section header terms and synonyms, with a similarity threshold of 0.8. Two independent researchers evaluated this threshold by confirming that all matches were accurate. After the direct IAO mapping and fuzzy matching, unmapped headers still exist. To map these headers, we developed a new method using a directed graph (digraph) for representation, since headers are not repeated within a document, are sequential and have a set order that can be exploited. Digraphs consist of nodes (entities, headers) and edges (links between nodes), and the weights of the nodes and edges are proportional to the number of publications in which these are found. While digraphs from individual publications are acyclic, the combined graph can contain cycles, hence digraphs, as opposed to directed acyclic graphs, are used. Unmapped headers are assigned a section based on the digraph and the headers in the publication that could be mapped (anchor points). For example, at this point in this article the main headers are ‘abstract’ followed by ‘introduction’ and ‘materials and methods’, which could make up a digraph. Another article with headers ‘abstract’, ‘background’ and ‘materials and methods’ has two anchor points that match the digraph, and the unmapped header (‘background’) can be inferred to be ‘introduction’ because it appears between the anchor points in the digraph (‘abstract’, ‘materials and methods’). We use this process to evaluate new potential synonyms for existing terms and to identify new potential terms for sections found in biomedical literature.

We used the Human Phenotype Ontology (HPO) to identify disease traits in the full texts. The HPO was developed with the goal of covering all common phenotypic abnormalities in human monogenic diseases (11).
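The three-step header mapping described in this section (direct lookup, fuzzy matching, then inference from anchor points in the digraph) can be illustrated with a minimal Python sketch. It is not the Auto-CORPus implementation: the synonym table and successor map are hypothetical and heavily abbreviated, the function names are invented for illustration, an exact string lookup stands in for the ontology matcher, and the standard-library difflib ratio stands in for the fuzzywuzzy scorer.

```python
from difflib import SequenceMatcher

# Hypothetical, heavily abbreviated synonym table (not the real IAO model).
IAO_SYNONYMS = {
    "abstract": ["abstract"],
    "introduction": ["introduction"],
    "methods": ["materials and methods", "methods", "experimental section"],
    "results": ["results"],
    "discussion": ["discussion"],
}

# Hypothetical digraph: section -> sections observed directly after it.
SUCCESSORS = {
    "abstract": ["introduction"],
    "introduction": ["methods", "results"],
    "methods": ["results"],
    "results": ["discussion"],
}


def direct_or_fuzzy(header, threshold=0.8):
    """Steps 1 and 2: exact synonym lookup, then best fuzzy match above threshold."""
    text = header.strip().lower()
    best_section, best_score = None, 0.0
    for section, synonyms in IAO_SYNONYMS.items():
        for synonym in synonyms:
            if text == synonym:
                return section
            score = SequenceMatcher(None, text, synonym).ratio()
            if score > best_score:
                best_section, best_score = section, score
    return best_section if best_score >= threshold else None


def infer_from_anchors(prev_section, next_section):
    """Step 3: sections that follow prev_section and precede next_section."""
    return [
        candidate
        for candidate in SUCCESSORS.get(prev_section, [])
        if next_section in SUCCESSORS.get(candidate, [])
    ]


def map_headers(headers):
    """Return a list of candidate sections for every header in document order."""
    mapped = [direct_or_fuzzy(h) for h in headers]
    result = []
    for i, section in enumerate(mapped):
        if section is not None:
            result.append([section])
            continue
        # Nearest mapped headers before and after the unmapped one.
        prev_anchor = next((m for m in reversed(mapped[:i]) if m), None)
        next_anchor = next((m for m in mapped[i + 1:] if m), None)
        result.append(infer_from_anchors(prev_anchor, next_anchor))
    return result


print(map_headers(["Abstract", "Background", "Materials and Methods"]))
```

With the toy tables above, the unmapped header 'Background' is resolved to 'introduction' from its two anchors, mirroring the example in the text. Returning a list per header keeps open the case where several sections fit between two anchors.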
Use cases: regular expression algorithms. Abbreviations in the full text are found using an adaptation of a previously published methodology (12) based on regular expressions, using the abbreviations package (v0.2.5). In brief, the method finds all brackets within a corpus. If the number of words in a bracket is <3, it considers whether the bracketed text could be an abbreviation. It searches for the characters within the brackets in the text on either side of the brackets, one by one. The first character of one of these words must match the first character within the bracket, and the other characters within the bracket must be contained in the words that follow the word whose first character matches the first character in the bracket. We combine the output of the package with abbreviations defined in the abbreviations section (if found) from the IAO/digraph model.

For phenotype entity recognition, first any abbreviations in paragraphs extracted from the full text are replaced by their definitions. This text is then tokenized using the spacy package (v2.3) (model en_core_web_sm) and compared against phenotypes and their synonyms defined by HPO to match disease traits. P-values and SNPs were identified in the full text and tables based on regular expressions, as they have a standard form. Pairs of P-value-SNP associations are found in the text using dependency parse trees (13).

Use cases: deep learning-based named-entity recognition. The first example of a use case is to recognize the assay with which the data were acquired; however, no models exist for this purpose. We fine-tuned a pre-existing model trained for biomedical NER, the biomedical Bidirectional Encoder Representations from Transformers (bioBERT) (14), using part of our corpus where only MWAS assays were tagged. We applied our fine-tuned model only on the paragraphs in the materials and methods sections to recognize the assays used.
A second bioBERT-based model was fine-tuned on phenotypes, which already exist in the data, and enriched in phenotypes associated with the MWAS publications. This model was applied only on the abstract and paragraphs from the results section. The third example was applied only on paragraphs from the results and discussion sections using an existing model specifically trained to recognize chemical entities, ChemListem (v0.1.0) (15).

Use cases: paragraph classification. It is possible that unmapped headers are mapped to multiple sections if the anchor points are far apart. To test the applicability of a machine learning model for classifying paragraphs, we trained a random forest classifier on a dataset consisting of 1,242 abstract paragraphs and 936 non-abstract paragraphs. 80% of the data was used for training and the remainder as the test set.

Results

The order of sections in biomedical literature. A total of 21,849 headers were extracted from the 2,441 publications, mapped to IAO (v2020-06-10) terms and visualized by means of a digraph with 372 unique nodes and 806 directed edges (Figure 2A). The major unmapped node is ‘associated data’, which is a header specific to PMC articles that appears at the beginning of each article before the abstract. The main structure of the biomedical articles that were analyzed is: abstract → introduction → materials → results → discussion → conclusion → acknowledgements → footnotes section → references. IAO has separate definitions for ‘materials’ (IAO:0000633), ‘methods’ (IAO:0000317) and ‘statistical methods’ (IAO:0000644) sections, hence they are separate nodes in the graph, and introduction is also often followed by headers that reflect the methods section (and synonyms). There is also a major directed edge from introduction directly to results, with materials and methods placed after the discussion and/or conclusion sections.
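To make the construction of such a digraph concrete, the following Python sketch counts, for a handful of made-up header sequences, the number of publications in which each node and each directed edge occurs, and then follows the heaviest outgoing edge from 'abstract' to recover a main structure. The four toy publications and the function names are hypothetical; the real digraph was built from 2,441 publications.

```python
from collections import Counter


def build_digraph(publications):
    """Weight every node and directed edge by the number of publications containing it."""
    node_weight, edge_weight = Counter(), Counter()
    for headers in publications:
        node_weight.update(set(headers))
        # Consecutive headers form the directed edges of one publication.
        edge_weight.update(set(zip(headers, headers[1:])))
    return node_weight, edge_weight


def main_path(edge_weight, start):
    """Follow the heaviest outgoing edge from each node until a dead end or a repeat."""
    path, seen = [start], {start}
    while True:
        outgoing = {dst: w for (src, dst), w in edge_weight.items() if src == path[-1]}
        if not outgoing:
            return path
        heaviest = max(outgoing, key=outgoing.get)
        if heaviest in seen:  # the combined graph may contain cycles
            return path
        path.append(heaviest)
        seen.add(heaviest)


# Hypothetical header sequences; the third places methods after the discussion.
publications = [
    ["abstract", "introduction", "methods", "results", "discussion"],
    ["abstract", "introduction", "methods", "results", "discussion", "conclusion"],
    ["abstract", "introduction", "results", "discussion", "methods"],
    ["abstract", "introduction", "methods", "results", "discussion", "conclusion"],
]

nodes, edges = build_digraph(publications)
print(main_path(edges, "abstract"))
```

Each toy publication is acyclic on its own, but the combined edge set contains the cycle discussion → methods → results → discussion, which is why the traversal guards against revisiting a node.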
All unmapped headers were investigated to evaluate whether some could be used as synonyms for existing categories. The digraph was also inspected by visualizing individual ego-networks, which show the edges around a specific node mapped to an existing IAO term. Figure 2B shows the ego-network for abstract, in which four main categories and one potential new synonym (precis, in red) were identified. The majority of unmapped headers (in purple) that follow the abstract relate to a document that is written as one coherent whole, with specific headers for each section or a general header for the full/main text. An additional four unmapped headers relate to ‘materials and methods’ in their broader sense: data, data description, participants and sample. The remaining two categories of unmapped headers to/from abstract can be classified as new sections, ‘graphical abstract’ and ‘highlights’. These headers were found alongside, and appear to be distinct from, the (textual) abstract.

Based on the digraph, we then assigned data and data description to be synonyms of the materials section, and participants and sample to a new category termed ‘participants’, which is related to, but deemed distinct from, the existing patients section (IAO:0000635). The same process was applied to ego-networks from other nodes linked to existing IAO terms to add additional synonyms and simplify the digraph. Figure 2C shows the resulting digraph with only existing and newly proposed section terms.
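As a rough illustration of the ego-network inspection described above, the Python sketch below extracts the edges entering and leaving one ego node from a weighted edge dictionary and lists the neighbours that have no mapped section. The edge weights, the set of mapped sections and the function names are all hypothetical.

```python
def ego_network(edges, ego):
    """Split the edges touching `ego` into incoming and outgoing neighbours."""
    incoming = {src: w for (src, dst), w in edges.items() if dst == ego}
    outgoing = {dst: w for (src, dst), w in edges.items() if src == ego}
    return incoming, outgoing


def unmapped_neighbours(edges, ego, mapped):
    """Neighbours of `ego` that are not yet mapped to a section term."""
    incoming, outgoing = ego_network(edges, ego)
    return sorted((set(incoming) | set(outgoing)) - set(mapped))


# Hypothetical weighted edges (weight = number of publications with that edge).
toy_edges = {
    ("associated data", "abstract"): 40,
    ("abstract", "introduction"): 30,
    ("abstract", "graphical abstract"): 3,
    ("abstract", "highlights"): 2,
    ("abstract", "participants"): 1,
    ("introduction", "methods"): 25,
}
mapped_sections = {"abstract", "introduction", "methods"}

candidates = unmapped_neighbours(toy_edges, "abstract", mapped_sections)
print(candidates)
```

The returned list is the set of headers a curator would review as potential new synonyms or new categories around the chosen ego node.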
New proposed elements for the IAO. Each existing IAO term contains one or more synonyms, and extracted headers were first mapped directly to these terms. Any headers that could not be mapped directly are mapped in the second step using fuzzy matching (e.g. the typographical error ‘experemintal section’ in PMC4286171 is correctly mapped to the methods section). The last step involves mapping remaining unmapped headers to existing terms based on the digraph and using the structure (anchor headers) of the publication. Headers that can be mapped to existing terms in the second and third steps are included as synonyms in the model. The existing categories for which new potential synonyms were identified are listed in Tables 1a and 1b with their existing synonyms and newly identified synonyms.

From the analysis of ego-networks four new potential categories were identified: disclosure, graphical abstract, highlights and participants. Table 2 details the proposed definition and synonyms for these categories. In the digraph in Figure 2C the disclosure section is located towards the end of a publication and in some instances is followed by the conflict of interest section.

Table data extraction with different configurations. PMC articles are standardized, which makes data extraction more straightforward; however, some publications are not deposited into PMC or other repositories and can only be found via publisher websites. While the package has been developed using a large set of PMC articles, we compared the Auto-CORPus output for PMC articles with the output for the equivalent articles made available by the publishers. We found no differences in how headers were extracted and paragraphs were classified based on the digraph. However, the representation of tables does differ substantially between publishers, hence a model developed on PMC articles alone will fail to extract the data.
We circumvent this issue by defining configuration files for different table formats, and we compare the accuracy of the data represented in the JSON format (Figure 3) between PMC and publisher versions of the same papers.

Using the default (PMC) configuration on non-PMC articles, none of the 302 tables are represented accurately in the JSON. Auto-CORPus allows a variety of configuration files (a single file, or all as a batch) to be used to extract data from tables. One configuration file, different from the default, correctly represented the data in JSON format for 93% (280) of tables. The remaining 22 tables could be represented correctly using 8 different configuration files. When the right configuration file is used for non-PMC articles, all tables (100%) are represented identically to the JSON output from the matching PMC version.

Use cases. The extracted paragraphs were classified as one (or more) categories based on the digraph. This is the purpose of the Auto-CORPus package: to prepare a corpus for analysis so that different sections can be used for specific purposes. We detail how these standardized texts can be used for entity recognition.

Paragraph classification. While many headers can be mapped using fuzzy matching plus the digraph structure, some headers remain unmapped (e.g. the headers in purple in Figure 2B: full text, main text, etc.) while others can be assigned to multiple (possible) sections. The choice of assigning multiple categories to unmapped headers based on the digraph is deliberate, as it ensures the algorithm does not wrongly assign a header to only one (e.g. ‘materials’ over ‘methods’). The next step is to perform the paragraph classification using NLP algorithms that learn from the word usage and context. We show that random forests can be used to this end by training one to distinguish between abstracts and other paragraphs.
435 paragraphs from the test set were predicted using a random forest trained on 1,743 paragraphs. For the test set, we obtained an F1-score of 0.90 for classifying abstracts (precision = 0.91, recall = 0.90) and 0.88 for classifying non-abstracts (precision = 0.87, recall = 0.88).

Abbreviation identification. The abbreviation detection algorithm searches through each paragraph using a rule-based approach to find all abbreviations used. Auto-CORPus then investigates whether a paragraph is mapped to the abbreviations category and, if found, it combines these two lists of abbreviations found in the publication. For example, when applied to an MWAS publication (16) which contains a header titled “ABBREVIATIONS”, the algorithm combines the 9 abbreviations listed by the authors with a further 7 identified from the text (Figure 4), including an abbreviation used with two spellings in the text.

Fig. 2. Digraph generated from analyzing section headers from 2,441 Open Access publications from PubMed Central. (A) Digraph of the v2020-06-10 IAO model, consisting of 372 unique nodes, of which 24 could be directly mapped to section terms (in orange) and the remainder are unmapped headers (in grey), and 806 directed edges. Relative node sizes and edge widths are directly proportional to the number of publications with these (subsequent) headers. Blue edges indicate the edge with the highest weight from the source node, edges that exist in fewer than 1% of publications are shown in light grey and the remainder in black. (B) Unmapped nodes connected to ‘abstract’ as ego node, excluding corpus specific nodes, grouped into different categories. Unlabeled nodes are titles of paragraphs in the main text. (C) Final digraph model used in Auto-CORPus to classify paragraphs after fuzzy matching. This model includes new (proposed) section terms and each section contains new synonyms identified in this analysis. ‘Associated Data’ is included as this is a PMC-specific header found before abstracts and can be used to indicate the start of most articles.

Rule-based extraction of GWAS summary-level data. GWAS Central relies on curated data extracted manually from publications or other databases. We investigated whether a rule-based approach to recognize phenotypes, SNPs and P-values can correctly identify data from publications contained within the database. A rule-based approach applying the HPO to the 500 GWAS publications from the test set identified a total of 9,599 unique disease traits (major and minor) in these publications. 949 traits are recorded for these publications in GWAS Central and the rule-based approach found 449 with a perfect match. For 65% of the publications all traits were correctly identified. SNPs have standardized formats, hence rule-based approaches are well suited for their identification. Likewise, P-values in GWAS publications are typically represented using scientific notation and can also be identified using rule-based methods. A total of 26,031 SNP/P-value pairs were found across the main text and tables of the 500 publications. For 62.4% of publications all associations recorded in the GWAS Central database are also found using this approach.
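The rule-based identification of SNPs and P-values can be sketched with two regular expressions in Python. The patterns below are illustrative assumptions rather than the expressions used in Auto-CORPus: they cover rs identifiers and a few common P-value spellings (decimal, "e" notation, and "× 10" with a flattened or caret exponent). The pairing step is also a simplification: each SNP is paired with the first P-value that follows it in the sentence, whereas this work pairs them using dependency parse trees.

```python
import re

# rs identifiers such as rs7903146 (assumed pattern).
SNP_RE = re.compile(r"\brs\d+\b")

# P-values such as "P = 3.2 × 10−8", "p<5e-6" or "P-value = 0.001" (assumed pattern).
PVALUE_RE = re.compile(
    r"\bp(?:[-\s]?values?)?\s*[=<>≤]\s*"       # "P", "P-value", "p value" plus comparator
    r"(\d+(?:\.\d+)?)"                          # mantissa
    r"(?:\s*(?:[x×*]\s*10\s*\^?|e)\s*([-−]?\s*\d+))?",  # optional exponent
    re.IGNORECASE,
)


def parse_pvalue(match):
    """Convert a PVALUE_RE match into a float."""
    exponent = match.group(2)
    if exponent is None:
        return float(match.group(1))
    # Normalise the Unicode minus sign and stray spaces left by text extraction.
    exponent = int(exponent.replace("−", "-").replace(" ", ""))
    return float(f"{match.group(1)}e{exponent}")


def snp_pvalue_pairs(sentence):
    """Pair every SNP with the first P-value that follows it in the sentence."""
    pvalues = list(PVALUE_RE.finditer(sentence))
    pairs = []
    for snp in SNP_RE.finditer(sentence):
        following = [p for p in pvalues if p.start() >= snp.end()]
        if following:
            pairs.append((snp.group(), parse_pvalue(following[0])))
    return pairs


sentence = (
    "The strongest association was observed for rs7903146 (P = 3.2 × 10−8), "
    "followed by rs12255372 (p<5e-6)."
)
print(snp_pvalue_pairs(sentence))
```

The example sentence is invented for illustration. A positional heuristic like this breaks down when several SNPs share one P-value or when the P-value precedes the SNP, which is the kind of case a dependency parse is better placed to resolve.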
While 57.6% of these publications present results (SNP/P-value pairs) only in tables, and 94.3% of pairs are found in tables, 276 associations were identified from the main text that are not represented in tables. 2,673 pairs match those recorded in the database (out of a total of 6,969 pairs for these publications); however, many associations in the database are not represented in the main text or tables but in supplementary materials. Auto-CORPus includes a separate function to convert csv/tsv data to the table JSON format (Figure 3), as summary-level results are often saved in these file formats as part of the supplementary information.

Named-entity recognition. Three different deep learning models were used for NER on specific paragraphs of publications. A pre-trained biomedical entity recognition algorithm (14) was fine-tuned using the results from the rule-based approach applied to GWAS data. Example sentences that contain HPO terms were used to fine-tune the transformer model, which was then applied to 928 MWAS publications from four broad and distinct phenotypes (cancer, gastrointestinal diseases, metabolic syndrome, and neurodegenerative, psychiatric and brain illnesses). The fine-tuned deep learning algorithm obtained accuracies between 0.76 and 0.97, averaging around 0.823 (Table 3).

We then fine-tuned the same base model for recognizing assays in text by training on sentences identified from the text that contain assays routinely used in MWAS. The first pass consisted of a rule-based approach, with fuzzy matching, to find sentences with terms, and these were then used to fine-tune the deep learning model. Figure 5 shows the resulting output in JSON format for one MWAS publication (16).
Lastly, we applied a domain specific algorithm for recognizing chemical entities in the text and tables (15) to identify metabolites in the same publication (Figure 5).

Category (IAO identifier) | Existing synonyms (IAO v2020-06-10) | New synonyms identified
abstract (IAO:0000315) | abstract | precis
acknowledgements (IAO:0000324) | acknowledgements, acknowledgments | acknowledgement, acknowledgment, acknowledgments and disclaimer
author contributions (IAO:0000323) | author contributions, contributions by the authors | authors’ contribution, authors’ contributions, authors’ roles, contributorship, main authors by consortium and author contributions
discussion (IAO:0000319) | discussion, discussion section | discussions
footnote (IAO:0000325) | endnote, footnote | footnotes
introduction (IAO:0000316) | background, introduction | introductory paragraph
methods (IAO:0000317) | experimental, experimental procedures, experimental section, materials and methods, methods | analytical methods, concise methods, experimental methods, method, method validation, methodology, methods and design, methods and procedures, methods and tools, methods/design, online methods, star methods, study design, study design and methods
references (IAO:0000320) | bibliography, literature cited, references | literature cited, reference, references, reference list, selected references, web site references
supplementary material (IAO:0000326) | additional information, appendix, supplemental information, supplementary material, supporting information | additional file, additional files, additional information and declarations, additional points, electronic supplementary material, electronic supplementary materials, online content, supplemental data, supplemental material, supplementary data, supplementary figures and tables, supplementary files, supplementary information, supplementary materials, supplementary materials figures, supplementary materials figures and tables, supplementary materials table, supplementary materials tables

Table 1a. Newly identified synonyms for existing IAO terms (00003xx) from the digraph mapping of 2,441 publications. Elements in italics have previously been submitted by us for inclusion into IAO and added in the latest release (v2020-12-09).

Category (IAO identifier) | Existing synonyms (IAO v2020-06-10) | New synonyms identified
abbreviations (IAO:0000606) | abbreviations, abbreviations list, abbreviations used, list of abbreviations, list of abbreviations used | abbreviation and acronyms, abbreviation list, abbreviations and acronyms, abbreviations used in this paper, definitions for abbreviations, glossary, key abbreviations, non-standard abbreviations, nonstandard abbreviations, nonstandard abbreviations and acronyms
author information (IAO:0000607) | author information, authors’ information | biographies, contributor information
availability (IAO:0000611) | availability, availability and requirements | availability of data, availability of data and materials, data archiving, data availability, data availability statement, data sharing statement
conclusion (IAO:0000615) | concluding remarks, conclusion, conclusions, findings, summary | conclusion and perspectives, summary and conclusion
conflict of interest (IAO:0000616) | competing interests, conflict of interest, conflict of interest statement, declaration of competing interests, disclosure of potential conflicts of interest | authors’ disclosures of potential conflicts of interest, competing financial interests, conflict of interests, conflicts of interest, declaration of competing interest, declaration of interest, declaration of interests, disclosure of conflict of interest, duality of interest, statement of interest
consent (IAO:0000618) | consent | informed consent
ethical approval (IAO:0000620) | ethical approval | ethics approval and consent to participate, ethical requirements, ethics, ethics statement
funding source declaration (IAO:0000623) | funding, funding information, funding sources, funding statement, funding/support, source of funding, sources of funding | financial support, grants, role of the funding source, study funding
future directions (IAO:0000625) | future challenges, future considerations, future developments, future directions, future outlook, future perspectives, future plans, future prospects, future research, future research directions, future studies, future work | outlook
materials (IAO:0000633) | materials | data, data description
statistical analysis (IAO:0000644) | statistical analysis | statistical methods, statistical methods and analysis, statistics
study limitations (IAO:0000631) | limitations, study limitations | strengths and limitations, study strengths and limitations

Table 1b. Newly identified synonyms for existing IAO terms (00006xx) from the digraph mapping of 2,441 publications. Elements in italics have previously been submitted by us for inclusion into IAO and added in the latest release (v2020-12-09).

Proposed category | Proposed definition | Proposed synonyms
disclosure | “A part of a document used to disclose any associations by authors that might be perceived as to potentially interfere with or prevent them from reporting research with complete objectivity.” | author disclosure statement, declarations, disclosure, disclosure statement, disclosures
graphical abstract | “An abstract that is a pictorial summary of the main findings described in a document.” | central illustration, graphical abstract, TOC image, visual abstract
highlights | “A short collection of key messages that describe the core findings and essence of the article in concise form. It is distinct and separate from the abstract and only conveys the results and concept of a study. It is devoid of jargon, acronyms and abbreviations and targeted at a broader, non-technical audience.” | author summary, editors’ summary, highlights, key points, overview, research in context, significance, TOC
participants | “A section describing the recruitment of subjects into a research study. This section is distinct from the ‘patients’ section and mostly focusses on healthy volunteers.” | participants, sample

Table 2. Newly proposed categories of entities found in 2,441 publications in the biomedical literature that could not be mapped to existing terms in IAO. Elements in italics have previously been submitted by us for inclusion into IAO and added in the latest release (v2020-12-09).

Known phenotype | Papers | Accuracy
cancer | 492 | 0.84
gastrointestinal diseases | 37 | 0.97
metabolic syndrome | 286 | 0.80
neurodegenerative, psychiatric, brain illnesses | 113 | 0.76

Table 3. Summary of results for named-entity recognition (NER) of phenotypes in MWAS papers.

Discussion

The analysis of our corpus of 2,441 Open Access publications has resulted in identifying well over 100 new synonyms for existing terms used in biomedical literature to indicate what a paragraph is about. In addition, we identified four new potential categories not previously included in the IAO. We previously submitted a subset of synonyms reported here and one of the new categories for inclusion in the IAO. These have been accepted by the IAO and are included in the latest release (v2020-12-09), hence we presented our analyses using the previous version of IAO that does not include part of our work. In the latest release, the ‘graphical abstract’ section has been added (IAO:0000707) based on our contribution. Also, a new ‘research participants’ (IAO:0000703) section has been added as a contribution by others in the same release; therefore synonyms found here for the new category ‘participants’ section will be proposed in future as synonyms for the ‘research participants’ section. While the disclosure section appears to be distinct from the conflict of interest section due to a directed edge in the digraph, its synonyms could also be proposed to be part of the existing conflict of interest section in IAO.

Standardization of text for NLP is an important step in preparing a corpus. Auto-CORPus outputs a JSON file of cleaned text, with standardized headers as well as all data presented in tables in JSON format. Standardizing headers is important because some sections are more important than others for specific tasks.
For example, no new findings can be found in an introduction; however, it is well suited to discovering the main phenotypes under study. Details on how these phenotypes are studied, and using what technologies, can only be found in materials/methods, and findings can only be found in results (and discussion) sections. Hence it is important to classify these paragraphs, and Auto-CORPus does this by using the structure of the publication and the digraph. We showed that we can further improve the assignment by training machine learning models with good accuracy to distinguish between different types of texts in cases where there may be ambiguity; this can be further improved by using a multi-class classifier and using all paragraphs. These data are then available for use in downstream analyses using dedicated algorithms for entity recognition or other methods.

Auto-CORPus is able to process all HTML formatted tables from both GWAS and MWAS corpora, as opposed to previous methods which could only operate on 86% of 3,573 tables (17). It takes Auto-CORPus on average 0.77 seconds to process all tables within a publication, compared to several minutes if this is done manually. Moreover, Auto-CORPus also supports parallel computing, thereby further reducing the time needed to process publications as these can be run in batch. The structured JSON output is machine readable and can be used to support data import into databases.

Here we used the JSON output of Auto-CORPus in several examples to demonstrate some potential use cases. We demonstrated that existing algorithms trained on biomedical data can be fine-tuned to recognize new entities such as assays and phenotypes, which also opens up the possibility of using these data to train new deep learning algorithms for recognizing new entities such as metabolites (as opposed to chemical entities), SNPs and P-values, as well as identifying the relationships between them from text.
NER algorithms have difficulty recognizing terms that are abbreviated; therefore, the list of abbreviations found by Auto-CORPus can be used to replace all abbreviations in the text with their definitions.

Fig. 3. Example of JSON format for table data from this work (shown for Table 3). The Auto-CORPus output for tables consists of ‘status’, ‘error message’ and ‘tables’ as top level fields, ‘tables’ has fields ‘identifier’, ‘title’, ‘columns’, ‘section’ and ‘footer’, and ‘section’ contains ‘section name’ and ‘results’.

Fig. 4. Example of JSON output of abbreviation detection using a rule-based approach on an MWAS publication (16).

Fig. 5. Example of JSON output of named-entity recognition (NER) on an MWAS publication (16) using a fine-tuned transformer-based deep learning model for assays and a bidirectional long short-term memory network for chemical entity recognition.

Conclusion

The Auto-CORPus package is freely available and can be deployed on local machines as well as on high-performance computing systems to process publications in batch. A step-by-step guide detailing how to use Auto-CORPus is supplied with the package. The key features of Auto-CORPus are that it:

1. outputs all text and table data in a standardized JSON format,
2. classifies each paragraph into separate categories of text, and
3. is implemented in pure Python code and does not have non-Python dependencies.
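As a rough illustration of the table JSON layout named in the Fig. 3 caption, the sketch below assembles a record for Table 3 and reads one column back from it. The field names follow the caption, but their exact spelling (underscores rather than spaces), the value types, the status string, and the helper functions `build_table_record` and `column_values` are assumptions made for this example and are not taken from the Auto-CORPus code.

```python
import json

# Hypothetical sketch of the table JSON layout named in the Fig. 3 caption.
# Key spellings (underscores instead of spaces) and value types are
# assumptions, not the package's confirmed schema.


def build_table_record(identifier, title, columns, rows, section_name="", footer=""):
    """Assemble one entry of the top-level 'tables' list."""
    return {
        "identifier": identifier,
        "title": title,
        "columns": columns,
        "section": [{"section_name": section_name, "results": rows}],
        "footer": footer,
    }


def column_values(table, column_name):
    """Collect every value of one column across all sections of a table."""
    idx = table["columns"].index(column_name)
    return [row[idx] for section in table["section"] for row in section["results"]]


# Values copied from Table 3 of this work.
table3 = build_table_record(
    identifier="3",
    title="Summary of results for named-entity recognition (NER) of phenotypes in MWAS papers.",
    columns=["Known phenotype", "Papers", "Accuracy"],
    rows=[
        ["cancer", "492", "0.84"],
        ["gastrointestinal diseases", "37", "0.97"],
        ["metabolic syndrome", "286", "0.80"],
        ["neurodegenerative, psychiatric, brain illnesses", "113", "0.76"],
    ],
)

# The status string is a placeholder; the real values are not given in the text.
document = {"status": "success", "error_message": "", "tables": [table3]}

# Round-trip through JSON text to confirm the record is machine readable.
restored = json.loads(json.dumps(document))
total_papers = sum(int(v) for v in column_values(restored["tables"][0], "Papers"))
print(total_papers)
```

Keeping cell values as strings, as assumed here, leaves numeric conversion to the downstream consumer, which avoids guessing a type for mixed-content table cells.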
ACKNOWLEDGEMENTS
We thank Mohamed Ibrahim (University of Leicester) for identifying different configurations of tables for different HTML formats, and Joy Li and Filip Makraduli (Imperial College London) for testing the package and providing feedback.

AUTHOR CONTRIBUTIONS
TB and JMP designed and supervised the research. SS and YH developed the pipeline and analyzed data. SS developed the initial table extraction algorithm and implemented the phenotype recognition algorithm. YH developed the section header standardization algorithm and implemented the abbreviation recognition algorithm. SS fine-tuned the table extraction algorithm for use on non-PMC texts. TR refined standardization of full texts and contributed algorithms for UTF-8 and UTF-16 conversions of non-ASCII characters to Unicode. SS, YH, TB and JMP wrote the manuscript.

FUNDING
This work has been supported by Health Data Research (HDR) UK and the Medical Research Council via a UKRI Innovation Fellowship to TB (MR/S003703/1) and a Rutherford Fund Fellowship to JMP (MR/S004033/1).

FOOTNOTE
ORCID: 0000-0002-4971-9003 (JMP).

Bibliography
1. Seyedmostafa Sheikhalishahi, Riccardo Miotto, Joel T Dudley, Alberto Lavelli, Fabio Rinaldi, and Venet Osmani. Natural language processing of clinical notes on chronic diseases: Systematic review. JMIR Med Inform, 7(2):e12239, 4 2019. ISSN 2291-9694. doi: 10.2196/12239.
2. Ramón A-A. Erhardt, Reinhard Schneider, and Christian Blaschke. Status of text-mining techniques applied to biomedical text. Drug Discovery Today, 11(7):315–325, 2006. ISSN 1359-6446. doi: https://doi.org/10.1016/j.drudis.2006.02.011.
3. Nikola Milosevic, Cassie Gregson, Robert Hernandez, and Goran Nenadic. A framework for information extraction from tables in biomedical literature. International Journal on Document Analysis and Recognition (IJDAR), 22(1):55–78, 2 2019. doi: 10.1007/s10032-019-00317-0.
4. Peter M. Visscher, Naomi R. Wray, Qian Zhang, Pamela Sklar, Mark I. McCarthy, Matthew A. Brown, and Jian Yang. 10 years of GWAS discovery: Biology, function, and translation. The American Journal of Human Genetics, 101(1):5–22, 2017. ISSN 0002-9297. doi: https://doi.org/10.1016/j.ajhg.2017.06.005.
5. Tim Beck, Tom Shorter, and Anthony J Brookes. GWAS Central: a comprehensive resource for the discovery and comparison of genotype and phenotype data from genome-wide association studies. Nucleic Acids Research, 48(D1):D933–D940, 10 2019. ISSN 0305-1048. doi: 10.1093/nar/gkz895.
6. Annalisa Buniello, Jacqueline A L MacArthur, Maria Cerezo, Laura W Harris, James Hayhurst, Cinzia Malangone, Aoife McMahon, Joannella Morales, Edward Mountjoy, Elliot Sollis, Daniel Suveges, Olga Vrousgou, Patricia L Whetzel, Ridwan Amode, Jose A Guillen, Harpreet S Riat, Stephen J Trevanion, Peggy Hall, Heather Junkins, Paul Flicek, Tony Burdett, Lucia A Hindorff, Fiona Cunningham, and Helen Parkinson. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Research, 47(D1):D1005–D1012, 11 2018. ISSN 0305-1048. doi: 10.1093/nar/gky1120.
7. Jeremy K. Nicholson, Elaine Holmes, and Paul Elliott. The metabolome-wide association study: A new look at human disease risk factors. Journal of Proteome Research, 7(9):3637–3638, 2008. doi: 10.1021/pr8005099. PMID: 18707153.
8. Werner Ceusters. An information artifact ontology perspective on data collections and associated representational artifacts. Studies in health technology and informatics, 180:68–72, 2012. ISSN 0926-9630.
9. Alan Ruttenberg, Adam Goldstein, Albert Goldfain, Barry Smith, Bjoern Peters, Carlo Torniai, Chris Mungall, Chris Stoeckert, Christian A. Boelling, Darren Natale, David Osumi-Sutherland, Gwen Frishkoff, Holger Stenzhorn, James A. Overton, James Malone, Jennifer Fostel, Jie Zheng, Jonathan Rees, Larisa Soldatova, Lawrence Hunter, Mathias Brochhausen, Matt Brush, Melanie Courtot, Michel Dumontier, Paolo Ciccarese, Pat Hayes, Philippe Rocca-Serra, Randy Dipert, Ron Rudnicki, Satya Sahoo, Sivaram Arabandi, Werner Ceusters, William Duncan, William Hogan, and Yongqun (Oliver) He. Information artefact ontology (v2020-06-10). https://raw.githubusercontent.com/information-artifact-ontology/IAO/v2020-06-10/iao.owl, 2020. Accessed: 2020-06-21.
10. A. Ghazvinian, N. F. Noy, and M. A. Musen. Creating mappings for ontologies in biomedicine: simple methods work. AMIA Annu Symp Proc, 2009:198–202, 11 2009.
11. Peter N. Robinson, Sebastian Köhler, Sebastian Bauer, Dominik Seelow, Denise Horn, and Stefan Mundlos. The human phenotype ontology: A tool for annotating and analyzing human hereditary disease. The American Journal of Human Genetics, 83(5):610–615, 2008. ISSN 0002-9297. doi: https://doi.org/10.1016/j.ajhg.2008.09.017.
12. Ariel Schwartz and Marti Hearst. A simple algorithm for identifying abbreviation definitions in biomedical text. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 4:451–62, 02 2003. doi: 10.1142/9789812776303_0042.
13. Katrin Fundel, Robert Küffner, and Ralf Zimmer. RelEx—Relation extraction using dependency parse trees. Bioinformatics, 23(3):365–371, 12 2006. ISSN 1367-4803. doi: 10.1093/bioinformatics/btl616.
14. Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, 09 2019. ISSN 1367-4803. doi: 10.1093/bioinformatics/btz682.
15. Peter Corbett and John Boyle. Chemlistem: chemical named entity recognition using recurrent neural networks. Journal of Cheminformatics, 10(1), 12 2018. doi: 10.1186/s13321-018-0313-8.
16. Charles R. Evans, Alla Karnovsky, Melissa A. Kovach, Theodore J. Standiford, Charles F. Burant, and Kathleen A. Stringer. Untargeted LC–MS metabolomics of bronchoalveolar lavage fluid differentiates acute respiratory distress syndrome from health. Journal of Proteome Research, 13(2):640–649, 12 2013. doi: 10.1021/pr4007624.
17. Nikola Milosevic, Cassie Gregson, Robert Hernandez, and Goran Nenadic. Disentangling the structure of tables in scientific literature. In Elisabeth Métais, Farid Meziane, Mohamad Saraee, Vijayan Sugumaran, and Sunil Vadera, editors, Natural Language Processing and Information Systems, pages 162–174. Springer International Publishing, 2016. ISBN 978-3-319-41754-7. doi: https://doi.org/10.1007/978-3-319-41754-7_14.
10_1101-2021_01_08_425379 ---- Competitive binding of STATs to receptor phospho-Tyr motifs accounts for altered cytokine responses in autoimmune disorders

Stephan Wilmes1*, Polly-Anne Jeffrey2*, Jonathan Martinez-Fabregas1, Maximillian Hafer3, Paul Fyfe1, Elizabeth Pohler1, Silvia Gaggero4, Martín López-García2, Grant Lythe2, Thomas Guerrier5, David Launay5, Mitra Suman4, Jacob Piehler3, Carmen Molina-París2# and Ignacio Moraga1#

1 Division of Cell Signalling and Immunology, School of Life Sciences, University of Dundee, Dundee, UK.
2 Department of Applied Mathematics, School of Mathematics, University of Leeds, Leeds, UK.
3 Department of Biology and Centre of Cellular Nanoanalytics, University of Osnabrück, Osnabrück, Germany.
4 Université de Lille, INSERM UMR1277 CNRS UMR9020–CANTHER and Institut pour la Recherche sur le Cancer de Lille (IRCL), Lille, France.
5 Univ. Lille, Inserm, CHU Lille, U1286 - INFINITE - Institute for Translational Research in Inflammation, F-59000 Lille, France.

* These authors contributed equally to this work
# These authors share senior authorship

ABSTRACT

Cytokines elicit pleiotropic and non-redundant activities despite strong overlap in their usage of receptors, JAK and STAT molecules. We use IL-6 and IL-27 to ask how two cytokines activating the same signaling pathway have different biological roles. We found that IL-27 induces more sustained STAT1 phosphorylation than IL-6, with the two cytokines inducing comparable levels of STAT3 phosphorylation.
Mathematical and statistical modelling of IL-6 and IL-27 signaling identified STAT3 binding to GP130, and STAT1 binding to IL-27Rα, as the main dynamical processes contributing to sustained pSTAT1 by IL-27. Mutation of Tyr613 on IL-27Rα decreased IL-27-induced STAT1 phosphorylation by 80% but had limited effect on STAT3 phosphorylation. Strong receptor/STAT coupling by IL-27 initiated a unique gene expression program, which required sustained STAT1 phosphorylation and IRF1 expression and was enriched in classical Interferon Stimulated Genes. Interestingly, the STAT/receptor coupling exhibited by IL-6/IL-27 was altered in patients with systemic lupus erythematosus (SLE). IL-6/IL-27 induced a more potent STAT1 activation in SLE patients than in healthy controls, which correlated with higher STAT1 expression in these patients. Partial inhibition of JAK activation by sub-saturating doses of Tofacitinib specifically lowered the levels of STAT1 activation by IL-6. Our data show that receptor and STAT concentrations critically contribute to shaping cytokine responses and generating functional pleiotropy in health and disease.

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.08.425379; this version posted January 9, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license (http://creativecommons.org/licenses/by/4.0/).
INTRODUCTION

IL-27 and IL-6 both have intricate functions regulating inflammatory responses (1). IL-27 is a hetero-dimeric cytokine comprised of p28 and EBI3 subunits (2). IL-27 exerts its activities by binding GP130 and IL-27Rα receptor subunits on the surface of responsive cells, triggering the activation of the JAK1/STAT1/STAT3 signaling pathway. IL-27 elicits both pro- and anti-inflammatory responses, although the latter activity seems to be the dominant one (3). IL-27 stimulation inhibits RORγt expression, thereby suppressing Th-17 commitment and limiting subsequent production of pro-inflammatory IL-17 (4, 5). Moreover, IL-27 induces strong production of anti-inflammatory IL-10 by (Tbet+ and FoxP3-) Tr-1 cells (6-8), further contributing to limiting the inflammatory response. IL-6 engages a hexameric receptor complex comprised of two copies each of IL-6Rα, GP130 and IL-6 (9), triggering the activation, as IL-27 does, of the JAK1/STAT1/STAT3 signaling pathway. However, in contrast to IL-27, IL-6 is known as a paradigmatic pro-inflammatory cytokine (10, 11). IL-6 inhibits lineage differentiation to Treg cells (12) while promoting Th-17 (13, 14), thus supporting its pro-inflammatory role. How IL-27 and IL-6 elicit opposite immuno-modulatory activities despite activating almost identical signaling pathways is currently not completely understood.
The relative and absolute STAT activation levels seem to have intricate roles, which lead to strong signaling and functional plasticity by cytokines. Although IL-6 robustly activates STAT3, it is capable of mounting a considerable STAT1 response as well (15). Moreover, in the absence of STAT3, IL-6 induces a strong STAT1 response comparable to that of IFNγ, a prototypic STAT1-activating cytokine (16). Likewise, the absence of STAT1 potentiates the STAT3 response to IL-27, which normally elicits a strong STAT1 response, causing it to mount an IL-6-like response (15). Furthermore, negative feedback mechanisms such as SOCS proteins and phosphatases have been described as critical players influencing STAT1 and STAT3 phosphorylation kinetics and thereby shaping their signal integration for GP130-utilizing cytokines (17-20). Yet, how all these molecular components are integrated by a given cell to produce the desired response is still an open question.

Among the IL-6/IL-12 cytokine family, IL-27 exhibits a unique STAT activation pattern. The majority of GP130-engaging cytokines preferentially activate STAT3, with activation of STAT1 being an accessory or balancing component (21, 22). IL-27, however, triggers STAT1 and STAT3 activation with high potency (23). Indeed, different studies have shown that IL-27 responses rely on either STAT1 (24-26) or STAT3 activation (7, 27). Moreover, recent transcriptomics studies showed that in the absence of STAT3, IL-6 and IL-27 lost more than 75% of target gene induction. Yet, STAT1 was the main factor driving the specificity of the IL-27 versus the IL-6 response, highlighting a critical interplay of STAT1 and STAT3 engagement (28). While the biological responses induced by IL-27 and IL-6 have been extensively studied (3, 11), the very initial steps of signal activation and kinetic integration by these two cytokines have not been comprehensively analysed.
Since the different biological outcomes elicited by IL-27 and IL-6 are most likely encoded in the early events of cytokine stimulation, here we specifically aimed to identify the molecular determinants underlying functional selectivity by IL-27 in human T-cells. We asked how a defined cytokine stimulus is propagated in time over multiple layers of signaling to produce the desired response. To this end, we probed IL-27 and IL-6 signaling at different scales, ranging from cell surface receptor assembly and early STAT1/3 effector activation to an unbiased and quantitative multi-omics approach: phospho-proteomics after early cytokine stimulation, kinetics of transcriptomic changes and alteration of the T-cell proteome upon prolonged cytokine exposure. IL-6 and IL-27 induced similar levels of assembly of their respective receptor complexes, which resulted in comparable phosphorylation of STAT3 by the two cytokines. IL-27, on the other hand, triggered a more sustained STAT1 phosphorylation. To decipher the molecular events that determine sustained STAT1 phosphorylation by IL-27, we mathematically modelled the STAT1 and STAT3 signaling kinetics induced by each of these cytokines. We identified differential binding of STAT1 and STAT3 to IL-27Rα and GP130, respectively, as the main factor contributing to a sustained STAT1 activation by IL-27.
At the transcriptional level, IL-27 triggered the expression of a unique gene program, which strictly required the cooperative action of sustained pSTAT1 and IRF1 expression to drive the induction of an interferon-like gene signature that profoundly shaped the T-cell proteome. Interestingly, our mathematical models of IL-6 and IL-27 signaling predicted that changes in receptor and STAT expression could fundamentally change the magnitude and timescale of the IL-6 and IL-27 responses. We found high levels of STAT1 expression in SLE patients when compared to healthy donors, which correlated with biased STAT1 responses induced by IL-6 and IL-27 in these patients. Strikingly, we could specifically inhibit STAT1 activation by IL-6 using suboptimal doses of the JAK inhibitor Tofacitinib. This could provide a new strategy to specifically target individual STATs engaged by cytokines.

RESULTS

IL-27 induces a more sustained STAT1 activation than HypIL-6 in human Th-1 cells

IL-6 and IL-27 are critical immuno-modulatory cytokines. While IL-6 engages a hexameric surface receptor comprised of two molecules of IL-6Rα and two molecules of GP130 to trigger the activation of STAT1 and STAT3 transcription factors (Figure 1a), IL-27 binds GP130 and IL-27Rα to trigger activation of the same STAT molecules (Figure 1a).
Despite sharing a common receptor subunit, GP130, and activating similar signaling pathways, these two cytokines exhibit non-redundant immuno-modulatory activities, with IL-6 eliciting a potent pro-inflammatory response and IL-27 acting more as an anti-inflammatory cytokine. Here, we set out to investigate the molecular rules that determine the functional specificity elicited by IL-6 and IL-27 using human Th-1 cells as a model experimental system. Because recombinant expression of human IL-27 is challenging, we recombinantly produced a murine single-chain variant of IL-27 (p28 and EBI3), which cross-reacts with the human receptors and triggers potent signaling, comparable to the signaling output produced by commercial human IL-27 (29) (Supp. Fig. 1a). In addition, we used a linker-connected single-chain fusion protein of IL-6Rα and IL-6 termed HyperIL-6 (HypIL-6) (30) to diminish IL-6 signaling variability due to changes in IL-6Rα expression during T cell activation (31). CD4+ T cells from human buffy coat samples were isolated by magnetic activated cell sorting (MACS) and grown under Th-1 polarizing conditions. Th-1 cells were then used to study in vitro signaling by IL-27 and IL-6 (Supp. Fig. 1b). We took advantage of a barcoding methodology allowing high-throughput multiparameter flow cytometry to perform detailed dose/response and kinetics studies of the signaling induced by HypIL-6 and IL-27 in Th-1 cells (32) (Supp. Fig. 1b). Dose-response experiments with IL-27 and HypIL-6 on Th-1 cells showed concentration-dependent phosphorylation of STAT1 and STAT3. Phosphorylation of STAT1/3 was more sensitive to activation by IL-27, with an EC50 of ~20 pM compared to ~400 pM for HypIL-6 (Figure 1b). Despite this difference in sensitivity, both cytokines yielded the same activation amplitude for pSTAT3. For pSTAT1, however, we observed a significantly reduced maximal amplitude for HypIL-6 relative to IL-27 (Figure 1b).
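To make the reported difference in sensitivity concrete, the sketch below evaluates a simple Hill-type dose-response curve at the two EC50 values quoted above (~20 pM for IL-27 and ~400 pM for HypIL-6). The functional form, the Hill coefficient of 1, the normalised maximal amplitude of 1 and the function name `hill_response` are assumptions chosen for illustration; they are not the fitting procedure used for Figure 1b.

```python
# Illustrative Hill-type dose-response curve. The Hill coefficient of 1 and
# the normalised maximal amplitude are assumptions made for this sketch.

EC50_IL27_PM = 20.0     # approximate EC50 reported for IL-27 (pM)
EC50_HYPIL6_PM = 400.0  # approximate EC50 reported for HypIL-6 (pM)


def hill_response(conc_pm, ec50_pm, top=1.0, hill=1.0):
    """Response at ligand concentration conc_pm, as a fraction of `top`."""
    if conc_pm < 0 or ec50_pm <= 0:
        raise ValueError("concentration must be >= 0 and EC50 must be > 0")
    return top * conc_pm**hill / (ec50_pm**hill + conc_pm**hill)


if __name__ == "__main__":
    # Compare the two cytokines over a hypothetical range of doses.
    for conc in (2.0, 20.0, 200.0, 2000.0, 20000.0):
        il27 = hill_response(conc, EC50_IL27_PM)
        hyp6 = hill_response(conc, EC50_HYPIL6_PM)
        print(f"{conc:>8.0f} pM  IL-27: {il27:.2f}  HypIL-6: {hyp6:.2f}")
```

Under these assumptions, a dose equal to the IL-27 EC50 gives half-maximal activation for IL-27 but under 5% of the maximum for HypIL-6. The reduced maximal pSTAT1 amplitude seen with HypIL-6 could be represented by a smaller `top` value, although no such value is given in the text.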
We next performed kinetic studies to assess whether the poor STAT1 activation by HypIL-6 was a result of different activation kinetics. For STAT3, we saw the peak of phosphorylation after ~15-30 minutes, followed by a gradual decline. Both cytokines exhibited an almost identical sustained pSTAT3 profile, with ~20% of activation still seen after 3 h of continuous stimulation. Interestingly, IL-27 activated STAT1 not only with higher amplitude but also in a more sustained manner than HypIL-6 (Figure 1c). This could be better appreciated when pSTAT1 levels were normalized to the maximal MFI for each cytokine, with IL-27 clearly inducing a more sustained phosphorylation of STAT1 than HypIL-6 (Supp. Fig. 1c). The same phenotype was observed in other T-cell subsets of activated PBMCs (Supp. Fig. 1d). As cell surface GP130 levels are significantly reduced upon T-cell activation (33), we next investigated whether the transient STAT1 activation profile induced by HypIL-6 resulted from limited availability of GP130. To this end, we generated an RPE1 cell clone stably expressing ten times higher levels of GP130 on its surface (Figure 1d, right panel). Stimulation of this RPE1 clone with HypIL-6 resulted in a more sustained activation of STAT3, with very little effect on STAT1 activation kinetics when compared to RPE1 wild type cells, suggesting that GP130 receptor density does not contribute to the transient STAT1 activation kinetics elicited by HypIL-6 (Figure 1d).

Ligand-induced cell-surface receptor assembly by IL-27 and HypIL-6
We next investigated whether IL-27 and HypIL-6 elicited differential cell surface receptor engagement that could explain their distinct signaling output. To this end, we measured the dynamics of receptor assembly in the plasma membrane of live cells by simultaneous dual-colour total internal reflection fluorescence (TIRF) imaging. RPE1 cells were chosen as a model experimental system since they do not express endogenous IL-27Rα (Supp. Fig. 1e). We used previously described RPE1 GP130 KO cells (Supp. Fig. 2a) (34) to transfect and express tagged variants of IL-27Rα and GP130, allowing quantitative site-specific fluorescence cell surface labelling by dye-conjugated nanobodies (NBs) (Figure 1e), as recently described (35). For both IL-27Rα and GP130 we found a random distribution and unhindered lateral diffusion of individual receptor monomers (Figure 1f). Single molecule co-localization combined with co-tracking analysis was then used to identify correlated motion of IL-27Rα and GP130, which was taken as a readout for receptor heterodimer formation (36) (Figure 1f, Figure 1 supp. Movie 1). In the resting state, we did not observe pre-assembly of IL-27Rα and GP130. However, after stimulation with IL-27 we found substantial heterodimerization (Figure 1f & 1g, Supp. Fig. 2b, Figure 1 supp. Movie 1 & 2). At elevated laser intensities, bleaching analysis of individual complexes confirmed a one-to-one (1:1) complex stoichiometry of IL-27Rα and GP130, whereas single-molecule Förster resonance energy transfer (FRET) further corroborated close molecular proximity of the two receptor chains (Figure 1h). We also observed association and dissociation events of receptor heterodimers, pointing to a dynamic equilibrium between monomers and dimers as proposed for other heterodimeric cytokine receptor systems (37, 38) (Figure 1 supp.
Movie 3). To measure homodimerization of GP130 by HypIL-6, we stochastically labelled GP130 with equal concentrations of the same NB species conjugated to either of the two dyes (39). We saw strong homodimerization of GP130 after stimulation with HypIL-6 (Figure 1g, Supp. Fig. 2b, Figure 1 supp. Movie 4). Homodimerization was confirmed either by single-color dual-step bleaching or dual-color single-step bleaching, as shown for other homodimeric cytokine receptors (Supp. Fig. 2c) (40). For both cytokine receptor systems, we saw a cytokine-induced reduction of the diffusion mobility, which has been ascribed to increased friction of receptor dimers diffusing in the plasma membrane. However, we note that HypIL-6 stimulation impaired diffusion of GP130 more strongly than IL-27 did, possibly indicating faster receptor internalization (Supp. Fig. 2d). Based on the dimerization data, we were able to calculate the two-dimensional equilibrium dissociation constants ($K_D^{2D}$) according to the law of mass action for a dynamic monomer-dimer equilibrium: for IL-27-induced heterodimerization of IL-27Ra and GP130, we calculated a $K_D^{2D}$ of ~0.81 µm⁻². In activated T-cells with high levels and a significant excess of IL-27Ra over GP130, this $K_D^{2D}$ ensures strong receptor assembly by IL-27 (41). The $K_D^{2D}$ for GP130 homodimerization by HypIL-6 was ~0.21 µm⁻². This higher affinity is most likely due to the two high-affinity binding sites engaged in the hexameric receptor complex (9). However, in T-cells the expression of GP130 can be particularly low, thus probably limiting HypIL-6 signaling. Taken together, these experiments marked ligand-induced receptor assembly as the initial step triggering downstream signaling for both IL-27 and HypIL-6, with no obvious differences in their receptor activation mechanism that could explain the more sustained STAT1 activation elicited by IL-27.
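
The law-of-mass-action calculation mentioned above can be illustrated with a short sketch. It assumes a simple A + B <-> AB heterodimer equilibrium with all densities expressed in molecules per µm²; the function names and the example densities are hypothetical and are not taken from the analysis pipeline used in the study. The homodimer case, where conventions for counting monomers differ, is deliberately left out.

```python
import math


def kd_2d(free_a: float, free_b: float, dimer: float) -> float:
    """Law of mass action for A + B <-> AB (densities in molecules per µm^2)."""
    if dimer <= 0:
        raise ValueError("dimer density must be positive")
    return free_a * free_b / dimer


def dimer_density(total_a: float, total_b: float, kd: float) -> float:
    """Equilibrium heterodimer density for given total densities and 2D KD.

    Solves d^2 - (A + B + KD) * d + A * B = 0 and keeps the smaller,
    physically meaningful root (d cannot exceed either total density).
    """
    s = total_a + total_b + kd
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0


KD_IL27 = 0.81  # µm^-2, the heterodimer value quoted in the text

# Hypothetical receptor densities, chosen only for illustration.
balanced = dimer_density(1.0, 1.0, KD_IL27)   # equal amounts of both chains
excess = dimer_density(10.0, 1.0, KD_IL27)    # tenfold excess of one chain
print(f"dimerized fraction of the limiting chain: {balanced:.2f} vs {excess:.2f}")
```

Under these assumptions, an excess of one chain pushes a larger fraction of the limiting chain into dimers, which is the qualitative argument made for T-cells with an excess of IL-27Ra over GP130.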
Mathematical and statistical analysis of HypIL-6 and IL-27 induced STAT kinetic responses

To gain further insight into the molecular rules and kinetics that define IL-27 sustained STAT1 phosphorylation, we developed two mathematical models of the initial steps of HypIL-6 and IL-27 receptor-mediated signaling, respectively. The mathematical model for each cytokine considers the following events: i) cytokine association and dissociation to a receptor chain (Figure 2a, Supp. Fig. 3a and 3b, top panel), ii) cytokine-induced dimer association and dissociation (Supp. Fig. 3a and 3b, bottom panel), iii) STAT1 (or STAT3) binding and unbinding to dimer (Supp. Fig. 3c and 3d), iv) STAT1 (or STAT3) phosphorylation when bound to dimer (Supp. Fig. 3c and 3d), v) internalisation/degradation of complexes (Supp. Fig. 3e and 3f), and vi) dephosphorylation of free STAT1 (or STAT3) (Supp. Fig. 3g). Details of model assumptions, model parameters and parameter inference are provided in the Material and Methods under Mathematical models and Bayesian inference. We first explored whether a feedback mechanism governs the way in which receptor molecules are internalised/degraded over time. To this end, and for each cytokine model, we considered two hypotheses: hypothesis 1 assumes that receptor complexes (Supp. Fig.
3e and 3f) are internalised with a rate proportional to the concentration of the species in which they are contained (e.g., different dimer types), and hypothesis 2 assumes that receptor complexes are internalised with a rate proportional to the product of the concentration of the species in which they are contained and the sum of the concentrations of free phosphorylated STAT1 and STAT3. Hypothesis 2 is consistent with a negative feedback mechanism in which pSTAT molecules translocate to the nucleus, where they increase the production of negative feedback proteins such as SOCS3. As described in the Material and Methods (Mathematical models and Bayesian inference), we made use of the RPE1 experimental data set to carry out mathematical model selection for the two hypotheses. We found that hypothesis 1 could explain the data better than hypothesis 2, with a probability of 70%. This result can be seen in Figure 2b, in which we plot the relative probability of each hypothesis for different values of the distance threshold between the mathematical model output and the data (see Mathematical models and Bayesian inference in Material and Methods for details); hypothesis 1 is denoted $H_1$ and hypothesis 2 is denoted $H_2$. For smaller values of the distance threshold, which indicate better support from the data for the mathematical model, the relative probability of hypothesis 1 is higher than that of hypothesis 2. We then used this result to explore the mathematical models for both cytokines under hypothesis 1; in particular, we performed parameter calibration. To this end (and as described in Material and Methods under Mathematical models and Bayesian inference), we carried out Bayesian inference with the mathematical models (hypothesis 1) and the experimental data sets to quantify the reaction rates (see Supp. Fig. 3) and initial molecular concentrations (see Table 1 and Table 2).
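
The relationship between the distance threshold and the relative probability of each hypothesis can be sketched with plain rejection-based approximate Bayesian computation. This is a simplified stand-in, not the inference procedure of the study: it assumes equal prior weight and an equal number of simulations for both hypotheses, and the distance values below are made up purely for illustration.

```python
from typing import Dict, List, Optional


def relative_model_probabilities(
    distances: Dict[str, List[float]], threshold: float
) -> Optional[Dict[str, float]]:
    """Estimate relative model probabilities by ABC rejection.

    distances maps a model name to the distances between simulated and
    observed data, one distance per prior-predictive simulation. A
    simulation is accepted when its distance is at or below the threshold.
    Returns None when no simulation of any model is accepted.
    """
    accepted = {
        name: sum(1 for d in values if d <= threshold)
        for name, values in distances.items()
    }
    total = sum(accepted.values())
    if total == 0:
        return None
    return {name: count / total for name, count in accepted.items()}


# Made-up distances for two hypothetical models labelled H1 and H2.
toy_distances = {
    "H1": [0.10, 0.20, 0.30, 0.90],
    "H2": [0.25, 0.80, 0.90, 1.00],
}
for eps in (1.0, 0.3):
    print(eps, relative_model_probabilities(toy_distances, eps))
```

In this toy case the two models are indistinguishable at a permissive threshold and separate as the threshold tightens, which is the kind of trend a plot of relative probability against distance threshold is meant to show.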
The Bayesian parameter calibration of the two models of cytokine signaling allows one to quantify the observed kinetics of STAT1/3 phosphorylation induced by HypIL-6 and IL-27 in RPE1 and Th-1 cells (Figure 2c). Substantial differences in STAT association rates to and dissociation rates from the dimeric complexes were inferred to critically contribute to defining pSTAT1/3 kinetics. Figure 2d shows the kernel density estimates (KDEs) for the posterior distributions of the rate constants and initial concentrations in the models. Here $k_{i,\mathrm{GP130}}$ denotes the rate at which STAT$i$ binds to GP130 and $k_{i,\mathrm{IL27Ra}}$ denotes the rate at which STAT$i$ binds to IL-27Ra, for $i \in \{1,3\}$. Our results indicate that STAT1 and STAT3 exhibit different binding preferences towards IL-27Ra and GP130, respectively. While STAT1 exhibits stronger binding to IL-27Ra than GP130 ($k_{1,\mathrm{IL27Ra}} > k_{1,\mathrm{GP130}}$), STAT3 exhibits stronger binding to GP130 than IL-27Ra ($k_{3,\mathrm{GP130}} > k_{3,\mathrm{IL27Ra}}$), in agreement with previous observations (42).

IL-27Rα cytoplasmic domain is required for sustained pSTAT1 kinetics

The Bayesian inference carried out with the experimental data and the mathematical models clearly indicated statistically significant differences in the binding rates of STAT1/STAT3 to GP130 and IL-27Ra, which account for the different phosphorylation kinetics exhibited by HypIL-6 and IL-27. Thus, we next investigated whether the more sustained STAT1 activation by IL-27 resulted from its specific engagement of IL-27Ra. For that, we used RPE1 cells, which do not express IL-27Ra (Supp. Fig.
1e), to systematically dissect the contribution of the IL-27Ra cytoplasmic domain to the differential pSTAT activation by IL-27. IL-27Ra's intracellular domain is very short and encodes only two Tyr residues that can be phosphorylated in response to IL-27 stimulation, i.e., Tyr543 and Tyr613 (Figure 3a). We mutated these two Tyr to Phe to analyse their contribution to IL-27 induced signaling. We stably expressed WT IL-27Ra as well as different IL-27Ra Tyr mutants in RPE1 cells with comparable cell surface expression levels (Figure 3b). Importantly, this reconstituted experimental system mimicked the pSTAT1/3 activation kinetics of T-cells (Supp. Fig. 4a). As the endogenous GP130 expression levels remain unaltered, all generated clones exhibited very comparable responses to HypIL-6 (Figure 3b, bottom panels). IL-27 triggered comparable levels of STAT1 and STAT3 activation in RPE1 cells reconstituted with IL-27Ra WT and the IL-27Ra Y543F mutant, suggesting that this Tyr residue does not contribute to signaling by this cytokine (Figure 3b and Supp. Fig. 4b). In RPE1 cells reconstituted with the IL-27Ra Y613F or Y543F-Y613F mutants, IL-27 stimulation resulted in 80% of the STAT3 activation, but only 20% of the STAT1 activation levels induced by this cytokine relative to IL-27Ra WT (Figure 3b) (43). These observations suggest a tight coupling of STAT phosphorylation to one of the receptor chains; namely, IL-27Ra with pSTAT1 and GP130 with pSTAT3.

We next tested how the cytoplasmic domains of GP130 and IL-27Ra shape the pSTAT kinetic profiles. To this end, we generated a stable RPE1 clone expressing a chimeric construct composed of the extracellular and transmembrane domains of IL-27Ra and the cytoplasmic domain of GP130 (Figure 3c, Supp. Fig. 5a). Again, as both cell lines express unaltered endogenous GP130 levels, they exhibited comparable responses to HypIL-6 (Figure 3c).
Strikingly, this domain swap resulted in a transient pSTAT1 kinetic response to IL-27, comparable to that of HypIL-6 stimulation. STAT3 activation, on the other hand, remained unaltered, suggesting that the cytoplasmic domain of IL-27Ra is essential for a sustained pSTAT1 response but not for pSTAT3.

Two plausible scenarios could explain the observed pSTAT1/3 activation differential by HypIL-6 and IL-27: i) the IL-27Ra-JAK2 complex phosphorylates STAT1 faster than the GP130-JAK1 complex or ii) pSTAT1 is more quickly dephosphorylated in the IL-6/GP130 receptor homodimer. In the latter case, pSTAT deactivation by constitutively expressed phosphatases could be an additional factor of regulation. Indeed, SHP-2 has been described to bind to GP130 and shape IL-6 responses (44). However, our Bayesian inference results (together with the mathematical models and the experimental data) identified the STAT/receptor association rates as the only rates that could account for the greater and more sustained activation of STAT1 by IL-27. We note (as described in the Material and Methods) that the phosphorylation rate, denoted by $q$, of STAT1 and STAT3 when bound to a dimer (homo- or hetero-) has been assumed to be independent of the STAT type and the receptor chain. Moreover, the model also included dephosphorylation of free pSTAT molecules, and predicted that the rates at which these reactions occur ($d_1$ and $d_3$) had rather similar posterior distributions, hence arguing against a role for phosphatases that specifically target STAT1 upon HypIL-6 stimulation. To distinguish between the two plausible scenarios, we next
determined the rates of pSTAT1/3 dephosphorylation by blocking JAK activity after cytokine stimulation with the JAK inhibitor Tofacitinib in RPE1 cells. Tofacitinib was added 15 minutes after stimulation with either cytokine, and pSTAT1 and pSTAT3 levels were measured at the indicated times. JAK inhibition markedly shortened the pSTAT1/3 activation profiles induced by both cytokines (Figure 3d, Supp. Fig. 5b). The relative dephosphorylation rates could then be determined from the signal intensity ratio of +/- Tofacitinib. Even though pSTAT1 levels were more affected by JAK inhibition than those of pSTAT3, the observed relative changes were nearly identical for IL-27 and HypIL-6. These findings were also confirmed for Th-1 cells (Supp. Fig. 5c & 5d) and indicate that selective phosphatase activity cannot explain the pSTAT1/3 differential by HypIL-6 and IL-27, in agreement with our mathematical modelling predictions.

Similarly, we tested whether neosynthesis of feedback inhibitors such as SOCS3 (19) would selectively impair signaling by HypIL-6 but not by IL-27. To this end, we pre-treated cells with Cycloheximide (CHX) and followed the pSTAT1/3 kinetics induced by the two cytokines (Supp. Fig. 6a & 6b). CHX treatment resulted in more sustained pSTAT3 activity for both cytokines. To our surprise, STAT1 phosphorylation by IL-27 was even more sustained, while pSTAT1 levels induced by HypIL-6 remained unaffected. These observations exclude the possibility that feedback inhibitors selectively impair STAT1 activation kinetics by HypIL-6, and thus feedback inhibitors do not account for the faster STAT1 dephosphorylation kinetics observed under HypIL-6 stimulation.
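
The Tofacitinib comparison described above reduces to a per-time-point ratio of the pSTAT signal with and without inhibitor. The sketch below shows that ratio and, under the additional assumption of first-order decay after the JAK block (an assumption made here for illustration, not stated in the text), an apparent decay rate obtained from a log-linear least-squares fit. All numbers are synthetic and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def inhibitor_ratio(treated: Sequence[float], untreated: Sequence[float]) -> List[float]:
    """Signal ratio (+inhibitor / -inhibitor) at each time point."""
    if len(treated) != len(untreated):
        raise ValueError("both traces need the same number of time points")
    return [t / u for t, u in zip(treated, untreated)]


def apparent_decay_rate(times: Sequence[float], ratios: Sequence[float]) -> float:
    """Slope of -ln(ratio) versus time, assuming first-order decay."""
    logs = [math.log(r) for r in ratios]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx


# Synthetic traces: minutes after inhibitor addition, arbitrary signal units.
minutes = [5.0, 15.0, 30.0, 60.0]
without_inhibitor = [100.0, 95.0, 80.0, 60.0]
with_inhibitor = [u * math.exp(-0.05 * t) for t, u in zip(minutes, without_inhibitor)]

ratios = inhibitor_ratio(with_inhibitor, without_inhibitor)
print(f"apparent decay rate: {apparent_decay_rate(minutes, ratios):.3f} per minute")
```

Comparing such ratios, or the rates fitted from them, between two stimuli is one way to check whether dephosphorylation differs between them; nearly identical values would argue against stimulus-specific phosphatase activity.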
Overall, our data from the chimera and mutant experiments, which were not used in the Bayesian calibration, provide strong and independent support for, as well as validation of, the mathematical models of HypIL-6 and IL-27 signaling, and point to the differential association/dissociation of STAT1 and STAT3 to IL-27Ra and GP130, respectively, as the main factor defining STAT phosphorylation kinetics in response to HypIL-6 and IL-27 stimulation.

Unique and overlapping effects of IL-27 and HypIL-6 on the Th-1 phosphoproteome

Thus far, we have investigated the differential activation of STAT1/STAT3 induced by HypIL-6 and IL-27. Next, we asked whether IL-27 and IL-6 induced the activation of additional and specific intracellular signaling programs that could contribute to their unique biological profiles. To this end, we investigated the IL-27 and HypIL-6 activated signalosome using quantitative mass-spectrometry-based phospho-proteomics. MACS-isolated CD4+ T cells were polarized into Th-1 cells and expanded in vitro for stable isotope labelling by amino acids in cell culture (SILAC). Cells were then stimulated for 15 min with saturating concentrations of IL-27 or HypIL-6, or left untreated. Samples were enriched for phosphopeptides (Ti-IMAC) and subjected to mass spectrometry, and raw files were analysed with MaxQuant software (Supp. Fig. 7a). In total we could quantify ~6400 phosphopeptides from 2600 proteins, identified across all conditions (unstimulated, IL-27, HypIL-6) for at least two out of three tested donors. For IL-27 and HypIL-6 we detected similar numbers of significantly upregulated (87 vs. 78) and downregulated (155 vs. 140) phosphorylation events (Figure 4a) and systematically categorized them according to their cellular location and ascribed biological functions (Supp. Fig. 7b & 7c) (45). The two cytokines shared approximately half of the upregulated and one third of the downregulated phospho-peptides (Supp. Fig.
8a) but also exhibited differential target phosphorylation (Figure 4b and Supp. Fig. 8b). As expected, we found multiple members of the STAT protein family among the top phosphorylation hits by the two cytokines, validating our study (Figure 4b & 4c). In line with our previous observations, we detected the same relative amplitudes for tyrosine-phosphorylated STAT3 and STAT1. In addition to tyrosine phosphorylation, we detected robust serine phosphorylation on S727 for STAT1 and STAT3 (Figure 4c). While pS-STAT1 activity correlated with pY-STAT1, with IL-27 being more potent than HypIL-6, this was not the case for STAT3. Despite an identical pY-STAT3 phosphorylation profile, HypIL-6 induced ~50% higher pS-STAT3 levels than IL-27 (Figure 4c). These results were corroborated by following the phosphorylation kinetics of pS-STAT1 and pS-STAT3 by flow cytometry (Figure 4d).

Given the overlapping phospho-proteomic changes, gene ontology (GO) analysis associated several sets of phosphopeptides with biological processes that were mostly shared between both cytokines (Figure 4e, Supp. Fig. 8c). A large set of phospho-peptides was linked to transcription initiation (including JAK/STAT signaling) or mRNA modification (Figure 5e). Interestingly, IL-27 stimulation was associated with negative regulation of RNA polymerase II, whereas a positive regulation was detected for HypIL-6.
A closer look into the functional regulation of RNA-pol II activity by the two cytokines revealed that multiple proteins involved in this process were differentially regulated by HypIL-6 and IL-27 (Figure 5f). While positive regulators of RNA-pol II transcription, such as Negative Elongation Factor A (NELFA), PPM1G, RCHY1 and POL2RA, were much more phosphorylated in response to HypIL-6 than IL-27, negative regulators of RNA-pol II transcription, such as LARP7, were much more engaged by IL-27 treatment than by HypIL-6 (Figure 4f). Interestingly, in a previous study we linked RNA-pol II regulation with the levels of STAT3 S727 phosphorylation induced by HypIL-6 via recruitment of CDK8 to STAT3-dependent genes (46). Our phospho-proteomic analysis thus suggests that IL-27 and HypIL-6 recruit different transcriptional complexes that could ultimately contribute to the gene expression specificity of the two cytokines.

Additionally, we identified several interesting IL-27-specific phosphorylation targets. One example was Ubiquitin Protein Ligase E3 Component N-Recognin 5 (UBR5). Phosphorylated UBR5 leads to ubiquitination and subsequent degradation of Rorgc (47), the key transcription factor required for Th-17 lineage commitment, thus limiting Th-17 differentiation (Supp. Fig. 8d). A second example is PAK2, which phosphorylates and stabilizes FoxP3, leading to higher levels of TReg cells (Supp. Fig. 8d) (48). Moreover, IL-27 stimulation led to a very strong phosphorylation of BCL2-associated agonist of cell death (BAD), a critical regulator of T-cell survival and a well-known substrate of the PAK2 kinase (49). Overall, our data show a large overlap between the IL-6 and IL-27 signaling programs, with a strong focus on JAK/STAT signaling. However, IL-27 engages additional signaling intermediaries that could contribute to its unique immuno-modulatory activities.
Further studies will be required to assess how these IL-27 specific signaling pockets contribute to shape IL-27 responses.

Kinetic decoupling of gene induction programs depends on sustained STAT1 activation and IRF1 expression by IL-27

Next, we investigated how the different kinetics of STAT activation induced by HypIL-6 and IL-27 ultimately modulated gene expression by these two cytokines. To this end, we performed RNA-seq analysis of Th-1 cells stimulated with HypIL-6 or IL-27 for 1h, 6h and 24h to obtain a dynamic perspective of gene regulation. We identified ~12500 shared genes that could be quantified for all three donors and throughout all tested experimental conditions. In a first step, we compared how similar the gene programs induced by HypIL-6 and IL-27 were. Principal component analysis (PCA) was run for a subset of genes found to be significantly up- (total ~250) or downregulated (total ~950) in any of the experimental conditions (p value ≤ 0.05, fold change ≥ +2 or ≤ -2). At one hour of stimulation HypIL-6 and IL-27 induced very similar gene programs, with the two cytokines clustering together in the PCA analysis regardless of whether we focused on the subsets of upregulated or downregulated genes (Figure 5a). However, the similarities between the two cytokines changed dramatically in the course of continuous stimulation.
While the two cytokines induced the downregulation of comparable gene programs at 6h and 24h of stimulation, as denoted by the close clustering in the PCA analysis (Figure 5a, right panel) and the fraction of shared genes (~40%, Figure 5b, Supp. Fig. 9a-c, Supp. Fig. 10a), this was not observed for upregulated genes. Although the two cytokines induced comparable gene upregulation programs after 1h of stimulation (~80% shared genes), this trend almost completely disappeared at later stimulation times (Figure 5a & 5b, Supp. Fig. 10b). This is well reflected by the absolute numbers of up- or downregulated genes observed for IL-27 and HypIL-6 (Figure 5c). Stimulation with both cytokines yielded a similar trend of gene downregulation (Figure 5c, right panel). However, while HypIL-6 stimulation resulted in a spike of gene upregulation at 1h that quickly disappeared at later stimulation times, IL-27 stimulation increased the number of upregulated genes beyond 6h of stimulation and maintained it even after 24h (Figure 5c, left panel). This "kinetic decoupling" of gene induction seems to have a striking functional relevance. Gene set enrichment analysis (GSEA) (50) identified several reactome pathways that were enriched for IL-27 over the course of stimulation, most of them linked with Interferon signaling and immune responses (Figure 5d). In contrast, no pathway enrichment was detected for HypIL-6 stimulation. Most importantly, the vast majority of IL-27-induced genes associated with these pathways were upregulated by IL-27 treatment and have been previously linked to STAT1 activation (51, 52) (Supp. Fig. 10c). Although HypIL-6 treatment resulted in the induction of some of these genes, their expression was very transient, in agreement with the short STAT1 activation kinetic profile exhibited by HypIL-6 (Supp. Fig. 10b & 10c).
Next, we performed cluster analysis to find further similarities and discrepancies between the gene expression programs engaged by HypIL-6 and IL-27 (Figure 5e). Since genes downregulated by IL-27 and HypIL-6 showed overall good similarity throughout the whole kinetic series, we mainly focused on differences in upregulated gene induction. We identified three functionally relevant gene clusters. The first gene cluster corresponds to genes that are transiently and equally induced by HypIL-6 and IL-27. These genes peak after one hour and return to basal levels after 6h and 24h of stimulation (Figure 5e). Interestingly, this cluster contains classical IL-6-induced and STAT3-dependent genes, such as members of the NFκB and Jun/Fos transcriptional complex (53), as well as the feedback inhibitor Suppressor Of Cytokine Signaling 3 (SOCS3) (54) and the T-cell early activation marker CD69 (Figure 5e). A second cluster corresponded to genes that were persistently activated by IL-27 but only transiently by HypIL-6 (Figure 5e). Among these genes we found classical STAT1-dependent genes, such as SOCS1, Programmed Cell Death Ligand 1 (PDL1 = CD274) (55) and members of the interferon-induced protein with tetratricopeptide repeats (IFIT) family. The third cluster corresponded to genes exhibiting strong and sustained activation by IL-27 after 6h and 24h of stimulation but no activation by HypIL-6 at all. This "2nd wave" of gene induction by IL-27 was almost exclusively composed of classical Interferon Stimulated Genes (ISGs) (Supp. Fig. 10c), such as STAT1 & 2, Guanylate Binding Protein 1 (GBP1), GBP2, 4 & 5, and IRF8 & 9. It is worth mentioning that genes in the third cluster appear to require persistent STAT1 activation (56, 57) and were the basis for the IFN signature identified in our reactome pathway analysis. Still, we were surprised by the magnitude of this 2nd gene wave.
Even though IL-27 exerts a sustained pSTAT1 kinetic profile, pSTAT1 levels were down to ~10% of maximal amplitude after 3h of stimulation. We reasoned that additional factors could further amplify the STAT1 response for IL-27 but not for HypIL-6. Within the 1st wave of STAT1-dependent genes, we also spotted the transcription factor Interferon Response Factor 1 (IRF1), which was continuously induced throughout the kinetic series in response to IL-27 but only transiently spiked after 1h of HypIL-6 stimulation (Figure 5e). IRF1 expression was shown to prolong pSTAT1 kinetics (58) and to be required for IL-27-dependent Tr-1 differentiation and function (59). We confirmed the kinetics of IRF1 protein expression by flow cytometry and showed higher and more sustained protein levels after IL-27 stimulation relative to HypIL-6 (Figure 6a). Next, we tested in our RPE1 cell system whether siRNA-mediated knockdown of IRF1 would alter the gene induction profiles of certain STAT1- or STAT3-dependent marker genes. In RPE1 cells reconstituted with IL-27Ra, IRF1 protein levels peaked around 6h after stimulation with IL-27, and transfection with IRF1-targeting siRNA knocked down expression by >80% (Figure 6b). Importantly, knockdown of IRF1 did not alter the overall kinetics of pSTAT1 and pSTAT3 activation (Figure 6c). Induction of the STAT1-dependent genes STAT1, GBP5 and OAS1 as well as the STAT3-dependent gene SOCS3 was followed by RT-qPCR (Figure 6d).
Interestingly, up to 6h of stimulation, the gene induction curves were identical for control- and IRF1-siRNA treated cells. Later than 6h (that is, when IRF1 protein levels are peaking) gene induction was decreased by 40-70% in the absence of IRF1. Strikingly, expression of SOCS3, a classical STAT3-dependent reporter gene, was transient and independent of IRF1 levels, highlighting that IRF1 selectively amplifies STAT1-dependent gene induction. Taken together, our data support a scenario whereby IL-27, by exhibiting a kinetic decoupling of STAT1 and STAT3 activation, is capable of triggering independent gene expression waves, which ultimately contribute to shape its distinct biology.

IL-27-induced STAT1 response drives global proteomic changes in Th-1 cells

Next, we aimed to uncover how the distinct gene expression programs engaged by HypIL-6 and IL-27 ultimately relate to alterations of the Th-1 cell proteome. For that, we continuously stimulated SILAC-labelled Th-1 cells for 24h with saturating doses of IL-27 and HypIL-6 and compared quantitative proteomic changes to unstimulated controls (Figure 7a). We quantified ~3600 proteins present in all three biological replicates and in all tested conditions (unstimulated/IL-27/HypIL-6). Both cytokines downregulated a similar number of proteins (IL-27: 57, HypIL-6: 52) (Figure 7b), with approximately half of them being shared by the two cytokines, mimicking our observations in the RNA-seq studies (Figure 7c, Supp. Fig. 11a). With 68 upregulated proteins, IL-27 was almost twice as potent as HypIL-6 (35 proteins), with very little overlap. Among the proteins upregulated by IL-27 but not HypIL-6, we detected several proteins with described immune-modulatory functions on T-cells. One of these proteins was Transforming Growth Factor β (TGF-β), which is a key regulator with pleiotropic functions on T-cells (60).
TGF-β has been identified to act synergistically with IL-27 to induce IL-10 secretion from Tr-1 cells, thus accounting for one of the key anti-inflammatory functions of IL-27 (61). On the other hand, we also found the SELPLG-encoded protein PSGL-1, which is critically required for efficient migration and adhesion of Th-1 cells to inflamed intestines (62, 63). Interestingly, we found LARP7 moderately upregulated by IL-27. This negative regulator of RNA pol II was also identified in our phospho-target screening and selectively engaged by IL-27 (Figure 4f). IL-27 and HypIL-6 share ~60% of downregulated proteins, but without strong functional patterns. Both cytokines downregulated several proteins related to the mitotic cell cycle (LIG1, CSNK2B, PSMB1) and to mRNA processing and splicing (NCBP2, PCBP2, NUDT21) (64). Strikingly, a significant number (~40%) of proteins upregulated by IL-27 belong to the group of ISGs (Figure 7b & 7c, Supp. Fig. 11b). This particular set of proteins, including STAT1, STAT2, MX Dynamin like GTPase 1 (MX1), Interferon Stimulated Gene 20 (ISG20) and Poly(ADP-Ribose) Polymerase Family Member 9 (PARP9), was not markedly altered by HypIL-6. Of note, the overall expression patterns of the most significantly altered proteins are congruent with the gene induction patterns observed after 6h and 24h (Figure 7d & 7e, Supp. Fig. 10b). Similarly, GSEA reactome analysis again identified pathways associated with interferon signaling and cytokine/immune system for IL-27 but failed to detect any significant functional enrichment by HypIL-6 (Figure 7e, Supp.
Fig. 11b & 11c). Finally, we correlated RNAseq-based gene induction patterns with detected proteomic changes. To our surprise, we found only a relatively low number of shared hits. However, the identified proteins belong exclusively to a group upregulated by IL-27 (Figure 7f). They are all located in the "2nd gene wave" cluster and all of them are regulated by ISGs (Figure 5e). Taken together, these results provide compelling evidence that sustained pSTAT1 activation by IL-27 accounts for its gene induction and proteomic profiles, thus giving a mechanistic explanation for the diverse biological outcomes of IL-27 and IL-6. Our observations are in good agreement with previous findings in cancer cells, showing that STAT1 activation in particular is responsible for proteomic remodeling by IL-27 (65).

Receptor and STAT concentrations determine the nature of the IL-6/IL-27 response

Our data suggest that STAT molecules compete for binding to a limited number of phospho-Tyr motifs in the intracellular domains of cytokine receptors. A direct consequence of this hypothesis is that cells can adjust and change their responses to cytokines by altering their concentrations of specific STATs or receptor molecules. To assess to what degree immune cells differ in their expression of cytokine receptors and STATs, we investigated levels of IL-6Ra, GP130, IL-27Ra, STAT1 and STAT3 protein expression across different immune cell populations, making use of the Immunological Proteomic Resource (ImmPRes - http://immpres.co.uk) database. Strikingly, the expression levels of these proteins change dramatically across the populations studied (Figure 8a), suggesting that these cells could potentially produce very different responses to HypIL-6 and IL-27 stimulation.
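
The competition hypothesis stated above can be made concrete with a toy mass-action sketch in which two STAT species bind reversibly to a shared, limited pool of docking sites. This is not the signaling model of the study: it ignores ligand binding, phosphorylation, internalisation and dephosphorylation, the rate constants are hypothetical, and integration uses a plain explicit Euler scheme. Only the concentration scales (25 nM sites, 500 nM of each STAT) echo the baseline values quoted later in the text.

```python
from typing import Tuple


def simulate_competition(
    sites_total: float,
    stat1_total: float,
    stat3_total: float,
    kon1: float,
    koff1: float,
    kon3: float,
    koff3: float,
    dt: float = 1e-3,
    steps: int = 20000,
) -> Tuple[float, float]:
    """Return (STAT1-bound, STAT3-bound) site concentrations at the end time.

    Two species compete for the same sites by reversible mass-action binding.
    Concentrations are in nM; time units follow the rate constants.
    """
    bound1 = 0.0
    bound3 = 0.0
    for _ in range(steps):
        free_sites = sites_total - bound1 - bound3
        d1 = kon1 * free_sites * (stat1_total - bound1) - koff1 * bound1
        d3 = kon3 * free_sites * (stat3_total - bound3) - koff3 * bound3
        bound1 += dt * d1
        bound3 += dt * d3
    return bound1, bound3


# Hypothetical rate constants: STAT3 binds these sites tenfold more tightly.
rates = dict(kon1=0.001, koff1=1.0, kon3=0.01, koff3=1.0)

baseline = simulate_competition(25.0, 500.0, 500.0, **rates)
high_stat1 = simulate_competition(25.0, 5000.0, 500.0, **rates)
print("baseline   (STAT1-bound, STAT3-bound):", baseline)
print("10x STAT1  (STAT1-bound, STAT3-bound):", high_stat1)
```

In this toy setting, raising the STAT1 concentration tenfold increases the number of sites occupied by STAT1 and displaces part of the STAT3 occupancy, which is the qualitative behaviour expected when binding sites are limiting.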
In order to quantify (and predict) how changes in expression levels of different proteins modify the kinetics of pSTAT, we made use of the two mathematical models of HypIL-6 and IL-27 stimulation and the parameters inferred with Bayesian methods. Using the posterior parameter distributions generated from the Bayesian parameter calibration, our mathematical models could accurately reproduce the experimental results generated across our study, i.e., signaling by the IL-27Ra chimeric and IL-27Ra-Y613F mutant receptors and dose/response studies (Supp. Fig. 12a-c). Having developed mathematical models which are able to accurately explain the experimental data (Supp. Fig. 5b and 5c) and reproduce independent experiments (Fig. 3b and 3c), we then sought to use the models to predict pSTAT signaling kinetics under different concentration regimes of receptors and STATs. To simplify the simulations, we focused our analysis on GP130 and STAT1, two of the proteins that vary greatly between the different immune populations (Figure 8a). As baseline values for the concentrations [GP130(0)], [IL27Ra(0)], [STAT1(0)] and [STAT3(0)] we used approximately the median values from the posterior distributions for each parameter: [GP130(0)] = 25 nM, [IL27Ra(0)] = 50 nM and [STAT1(0)] = [STAT3(0)] = 500 nM. To see the effect of varying GP130 concentrations on pSTAT signaling, we decreased the initial concentration of GP130 and simulated the model using the accepted parameter sets from the ABC-SMC to inform the other parameter values. A tenfold reduction in GP130 concentration ([GP130(0)] = 2.5 nM) resulted in a striking loss in pSTAT1 levels induced by HypIL-6, with very little effect
on pSTAT3 levels induced by this cytokine (Figure 8b). pSTAT1/3 kinetics induced by IL-27, however, were not affected by this decrease in GP130 concentration (Figure 8b). Interestingly, the HypIL-6 signaling profile predicted by our model at low GP130 concentrations strongly resembles the one induced by HypIL-6 in Th-1 cells (Figure 1c), where very low levels of GP130 are found, further confirming the robustness of the predictions generated by our mathematical models. When the concentration of STAT1 was increased by a factor of ten ([STAT1(0)] = 5000 nM), both HypIL-6 and IL-27 induced significantly higher levels of pSTAT1 activation (Figure 8b). pSTAT3 levels were not affected for HypIL-6 stimulation but were decreased for IL-27 stimulation (Figure 8b), further indicating the competitive nature of the binding of STAT1 and STAT3 to IL-27Ra and GP130. Overall, our mathematical model predicts that changes in GP130 and STAT1 expression produce a substantial remodeling of the HypIL-6 and IL-27 signalosome, which ultimately could lead to aberrant responses.

STAT1 protein levels in SLE patients modify HypIL-6 and IL-27 signaling responses

STAT1 is a classical IFN-responsive gene and STAT1 levels are highly increased in environments rich in IFNs (66). Thus, we next asked whether STAT1 levels would be increased in SLE patients, an example of a disease where IFNs have been shown to correlate with a poor prognosis, making use of available gene expression datasets (67). We did not find differences in the expression of GP130, IL-6Ra or IL-27Ra in SLE patients (Figure 8c). However, we detected a significant increase in the levels of STAT1 and STAT3 transcripts in these patients when compared to healthy controls, with the increase in STAT1 expression being significantly more pronounced (Figure 8c).
Since our mathematical model predicted that increases in STAT1 expression could significantly change cytokine-induced cellular responses by HypIL-6 and IL-27, we next experimentally tested this prediction. For that, we primed Th-1 cells with IFNa2 overnight to increase total STAT1 levels (and to a lower extent STAT3) in these cells (Supp. Fig. 13a). While both HypIL-6 and IL-27 induced comparable levels of pSTAT3 in primed and non-primed Th-1 cells, levels of pSTAT1 induced by the two cytokines were significantly upregulated in primed Th-1 cells, resulting in a biased STAT1 response and confirming our model predictions (Figure 8d). We next investigated whether this biased STAT1 activation by HypIL-6 and IL-27 observed in IFNa2-primed Th-1 cells was also present in SLE patients. For that, we collected PBMCs from six SLE patients and five age-matched healthy controls and measured STAT1 and STAT3 expression, as well as pSTAT1 and pSTAT3 induction by HypIL-6 and IL-27 after 15 min treatments in CD4+ T cells. Importantly, results comparable to those obtained with IFN-primed Th-1 cells were obtained, with signaling bias towards pSTAT1 in CD4+ T cells from SLE patients stimulated with HypIL-6 and IL-27 (Figure 8e, Supp. Fig. 13b & c), further supporting the notion that STAT concentrations play a critical role in defining cytokine responses in autoimmune disorders. Our data show that STAT1 and STAT3 compete for phospho-Tyr motifs in GP130, with STAT3 having an advantage resulting from its tighter affinity to GP130. Finally, we asked whether crippling JAK activity by using sub-saturating doses of JAK inhibitors could differentially affect STAT1 and STAT3 activation by HypIL-6 and therefore rescue the altered cytokine responses found in SLE patients. To test this, RPE1 and Th-1 cells were stimulated with saturating concentrations of HypIL-6 in the presence of titrated concentrations of Tofacitinib, a clinically approved JAK inhibitor.
Strikingly, Tofacitinib inhibited HypIL-6-induced pSTAT1 more efficiently than pSTAT3 in both RPE1 cells and Th-1 cells (Figure 8f). At a concentration of 50 nM, Tofacitinib inhibited pSTAT1 levels induced by HypIL-6 by 60%, while it inhibited pSTAT3 levels by only 30% (Figure 8f) – an effect that we did not observe for IL-27 stimulation (Supp. Fig. 13d). Overall, our results show that the changes in STAT concentrations found in autoimmune disorders shape cytokine signaling responses and could contribute to disease progression.

DISCUSSION:

Cytokine pleiotropy is the ability of a cytokine to exert a wide range of biological responses in different cell types. This functional pleiotropy has made the study of cytokine biology extremely challenging given the strong cross-talk and shared usage of key components of their signaling pathways, leading to a high degree of signaling plasticity, yet still allowing functional selectivity (68, 69). Here we aimed to identify the underlying determinants that define cytokine functional selectivity by comparing IL-27 and IL-6 at multiple scales – ranging from cell surface receptors to proteomic changes.
We show that IL-27 triggers a more sustained STAT1 phosphorylation than IL-6, via a high affinity STAT1/IL-27Ra interaction centered around Tyr613 on IL-27Ra. This in turn results in a more sustained IRF1 expression induced by IL-27, which leads to the upregulation of a second wave of gene expression unique to IL-27 and comprised of classical ISGs. We go one step further and show that this strong receptor/STAT coupling is altered in autoimmune disorders where STAT concentrations are often dysregulated. Increased expression of STAT1 in SLE patients biases HypIL-6 and IL-27 responses towards STAT1 activation, which may further contribute to the worsening of the disease. By using suboptimal doses of the JAK inhibitor Tofacitinib, we show that specific STAT proteins engaged by a given cytokine can be targeted. Overall, our study highlights a new layer of cytokine signaling regulation, whereby STAT affinity to specific cytokine receptor phospho-Tyr motifs controls STAT phosphorylation kinetics and the identity of the gene expression program engaged, ultimately ensuring the generation of functional diversity through the use of a limited set of signaling intermediaries. The tight coupling of one receptor subunit to one particular STAT that we have identified in our study is a rather unusual phenomenon for heterodimeric cytokine receptor complexes, which was first suggested by Owaki et al. (27). Generally, the entire signaling output driven by a cytokine-receptor complex emanates from a dominant receptor subunit, which carries several Tyr residues susceptible to phosphorylation (70, 71). This in turn results in competition between different STATs for binding to shared phospho-Tyr motifs in the dominant receptor chain, leading to different kinetics of STAT phosphorylation as observed for IL-6 stimulation (15) (Figure 1b).
Moreover, this localized signaling quantum allows phosphatases and feedback regulators – induced upon cytokine stimulation – to act in synergy to reset the system to its basal state, generating a very synchronous and coordinated signaling wave. Although very effective, this molecular paradigm has its limitations. STAT competition for the same pool of phospho-Tyr makes the system very sensitive to changes in STAT concentration. IFNg-primed cells, which exhibit increased STAT1 levels, trigger an IFNg-like STAT1 response upon IL-6 stimulation (16). IL-10 anti-inflammatory properties are lost in cells with high levels of STAT1 expression, as a result of a pro-inflammatory environment rich in IFNs (72). Indeed, we show that STAT1 transcript levels are increased in Crohn’s disease and SLE patients and that they contribute to altering IL-6 responses. Strikingly, IL-27 appears to have evolved away from this general model of cytokine signaling activation. Our results show that STAT1 activation by IL-27 is tightly coupled to IL-27Ra, while STAT3 activation by this cytokine mostly depends on GP130. This decoupled STAT1 and STAT3 activation by IL-27 is possible thanks to the presence of a putative high affinity STAT1 binding site on IL-27Ra that resembles the one present in IFNgR1 (41). As a result of this, IL-27 can trigger sustained and independent phosphorylation of both STAT1 and STAT3. This unique feature of IL-27 allows it to induce robust responses in dynamic immune environments. Indeed, our mathematical models of cytokine signaling and Bayesian inference, together with the experimental observations, show that changes in receptor concentration minimally affected
pSTAT1/3 induced by IL-27, while they fundamentally alter IL-6 responses. Overall, our data show that cytokine responses are versatile and adapt to the continuously changing cell proteome, highlighting the need to measure cytokine receptor and STAT expression levels, in addition to cytokine levels, in disease environments to better understand and predict altered responses elicited by dysregulated cytokines.

In recent years, it has become apparent that the stability of the cytokine-receptor complex influences signaling identity by cytokines (73). Short-lived complexes less efficiently activate those STAT molecules that bind with low affinity to phospho-Tyr motifs in a given cytokine receptor (34). Our current results further support this kinetic discrimination mechanism for STAT activation. Our statistical inference identified differences in STAT recognition of the cytokine receptor phospho-Tyr motifs as one of the major determinants of STAT phosphorylation kinetics. This parameter alone was sufficient to explain transient and sustained STAT1 phosphorylation induced by IL-6 and IL-27, respectively, without the need to invoke the action of phosphatases or negative feedback regulators such as SOCSs. Indeed, our results indicate that the rate of STAT1 dephosphorylation is similar between the IL-6 and IL-27 systems, suggesting that phosphatases do not contribute to these early kinetic differences. Moreover, blocking protein translation, and therefore the upregulation of negative feedback regulators by IL-6 treatment, did not result in a more sustained STAT1 phosphorylation by IL-6, again indicating that the transient kinetics of STAT1 phosphorylation by IL-6 is encoded at the receptor level and does not require further regulation.
However, recent reports have found that the amplitude of STAT1 phosphorylation in response to IL-6 is regulated by levels of PTPN2 expression, suggesting that phosphatases can play additional roles in shaping IL-6 responses beyond controlling the kinetics of STAT activation (74). STAT1 phosphorylation levels by IL-27, on the other hand, were significantly more sustained in the absence of protein translation, suggesting that negative feedback mechanisms are required to downmodulate signaling emanating from high affinity STAT-receptor interactions. Overall, our results suggest that while phosphatases and negative feedback regulators play an important role in maintaining cytokine signaling homeostasis (75), the kinetics of STAT activation appear to be already encoded at the level of receptor engagement, thus ensuring maximal efficiency and signal robustness.

Cytokine signaling plasticity can occur at the level of receptor activation. In recent years, a scenario has emerged suggesting that the absolute number of signaling-active receptor complexes is a critical determinant for signal output integration. Accordingly, specific biological responses were shown to be tuned either by abundance of cell surface receptors (76, 77) or by the level of receptor assembly (34, 38, 78). Here, we show for the first time that IL-27 induces dimerization of IL-27Ra and GP130 at the surface of live cells, in good agreement with previous studies on heterodimeric cytokine receptor systems (38, 73). For IL-27, the receptor subunits IL-27Ra and GP130 can be expressed at different ratios, as seen for naïve vs. activated T-cells (79) as well as intestinal cells (80). On T-cells, particularly after activation, IL-27Ra is expressed in strong excess over GP130, rendering GP130 the limiting factor for receptor complex assembly (41).
Interestingly, we observe that in addition to faster kinetics of STAT1 phosphorylation, HypIL-6 treatment induces a lower maximal amplitude in pSTAT1 activation in T cells. This is in stark contrast to our results in RPE1 cells, where high abundance of GP130 (~3000-4000 copies of cell surface GP130) is found. In these cells both cytokines elicited similar amplitudes of STAT1 phosphorylation. Our results suggest that surface receptor density in synergy with STAT binding dynamics to phospho-Tyr motifs on cytokine receptors act to define the amplitude and kinetics of STAT activation in response to cytokine stimulation.

The distinct STAT1 and STAT3 kinetic profiles induced by IL-6 and IL-27 are the prerequisite for time-correlated decoupling of genetic programs: a “shared GP130/STAT3-dependent wave” and an “IL-27-unique IL-27Ra/STAT1-dependent wave”. However, pSTAT1 levels induced by IL-27 at 3h were down to ~10% of maximal amplitude, suggesting that additional factors would be required to amplify the initial STAT1 response elicited by IL-27. We observed that IL-27 induces the expression of an early wave of classical STAT1-dependent genes, which is also shared by IL-6. However, while IL-27 induces the upregulation of these genes throughout the entire duration of the experiment, IL-6 only resulted in a transient spike. We reasoned that this additional factor required for IL-27 signal amplification would be among these early STAT1-dependent genes.
Among this set of genes we found the transcription factor IRF1, which had been shown to act as a feedback amplifier for pSTAT1 activity (58). Importantly, IRF1 protein levels have been shown to be upregulated in response to IL-27 and IFNg but not to IL-6 stimulation in hepatocytes (81). IRF1 plays a key role in chromatin accessibility, which is critically required for IL-27-induced differentiation of Tr1 cells and subsequent IL-10 secretion (59). Here, we show that the contribution of IRF1 to STAT1- but not STAT3-dependent genes is a generic feature of IL-27 signaling. This readily explains the significant transcriptomic overlap of IL-27 with type I (82) or type II interferons (15) after long-term stimulation with these cytokines. Along this line, it is not surprising that IL-27 – beyond its well-described effects on T-cell development – can also mount a considerable antiviral response as shown in hepatic cells and PBMCs (83, 84). Our results suggest that by modulating the kinetics of STAT phosphorylation, cytokines can modulate the expression of accessory transcription factors, such as IRF1, that act in synergy with STATs to fine-tune gene expression and provide functional diversity.

ACKNOWLEDGMENTS

We thank members of the Moraga, Molina-París, Piehler and Mitra laboratories for helpful advice and discussion. We thank G. Hikade and H. Kenneweg for technical support, C. P. Richter for providing software for single-molecule image analysis, R.
Kurre (Integrated Bioimaging Facility Osnabrück) for support with fluorescence microscopy and the FingerPrints Proteomics facility (Dundee) for support with the mass spectrometry data. This work was supported by the StG, LS6, Wellcome-Trust-202323/Z/16/Z (IM EP), ERC-206-STG grant (IM JMF EP PKF), EMBO (SW 454–2017), DFG (SFB 944, P8/Z, JP), National Heart, Lung and Blood Institute (K22HL125593, MK) and Contrat de Plan Etat Région Hauts de France and Institut pour la Recherche sur le Cancer de Lille (SM SG). CMP and GL were supported by H2020, QuanTII. PJ is supported by the EPSRC, AstraZeneca and Smith Institute (Smith Institute CASE studentship, award reference 1969354). Numerical work was undertaken on ARC3, which is part of the High Performance Computing facilities at the University of Leeds, UK.

COMPETING INTERESTS

The authors declare that they have no competing interests.

MATERIAL AND METHODS

Protein expression and purification:

Murine IL-27 was cloned as a linker-connected single-chain variant (p28+EBI3) as described in (29). Human HyperIL-6 (HypIL-6) and murine single-chain IL-27 were cloned into the pAcGP67-A vector (BD Biosciences) in frame with an N-terminal gp67 signal sequence and a C-terminal hexahistidine tag, and produced using the baculovirus expression system, as described in (85).
Baculovirus stocks were prepared by transfection and amplification in Spodoptera frugiperda (Sf9) cells grown in SF900II media (Invitrogen) and protein expression was carried out in suspension Trichoplusia ni (High Five) cells grown in InsectXpress media (Lonza). Purification was performed using the method described in (86). For IL-27, the cells were pelleted by centrifugation at 2000 rpm, prior to a precipitation step through addition of Tris pH 8.0, CaCl2 and NiCl2 to final concentrations of 200 mM, 50 mM and 1 mM, respectively. The precipitate formed was then removed through centrifugation at 6000 rpm. Nickel-NTA agarose beads (Qiagen) were added and the target proteins purified through batch binding followed by column washing in HBS-Hi buffer (HBS buffer supplemented to 500 mM NaCl and 5% glycerol, pH 7.2). Elution was performed using HBS-Hi buffer plus 200 mM imidazole. Final purification was performed by size exclusion chromatography on an ENrich SEC 650 300 column (Biorad), again equilibrated in HBS-Hi. Concentration of the purified sample was carried out using 10 kDa Millipore Amicon-Ultra spin concentrators. For HypIL-6, proteins were purified likewise, but in 10 mM HEPES (pH 7.2) containing 150 mM NaCl. Recombinant cytokines were purified to greater than 98% homogeneity.

For cell surface labeling, the anti-GFP nanobody (NB) “enhancer” and “minimizer” were used, which bind mEGFP with subnanomolar binding affinity (87). NB was cloned into pET-21a with an additional cysteine at the C-terminus for site-specific fluorophore conjugation in a 1:1 fluorophore:nanobody stoichiometry. Furthermore, a (PAS)5 sequence to increase protein stability and a His-tag for purification were fused at the C-terminus. Protein expression in E. coli Rosetta (DE3) and purification by immobilized metal ion affinity chromatography were carried out by standard protocols.
Purified protein was dialyzed against HEPES pH 7.5 and reacted with a two-fold molar excess of DY647 maleimide (Dyomics), ATTO 643 maleimide (AT643) and ATTO Rho11 maleimide (Rho11) (ATTO-TEC GmbH), respectively. After 1 h, a 3-fold molar excess (with respect to the maleimide) of cysteine was added to quench excess dye. Protein aggregates and free dye were subsequently removed by size exclusion chromatography (SEC). A labeling degree of 0.9-1:1 fluorophore:protein was achieved as determined by UV/Vis spectrophotometry.

CD4+ T cell purification and Th-1 differentiation:

Human buffy coats were obtained from the Scottish Blood Transfusion Service and peripheral blood mononuclear cells (PBMCs) of healthy donors were isolated from buffy coat samples by density gradient centrifugation according to manufacturer’s protocols (Lymphoprep, STEMCELL Technologies). From each donor, 100 × 10^6 PBMCs were used for isolation of CD4+ T-cells. Cells were decorated with anti-CD4-FITC antibodies (Biolegend, #357406) and isolated by magnetic separation according to manufacturer’s protocols (MACS Miltenyi) to a purity >98% CD4+. Freshly isolated resting CD4+ T cells (3 × 10^7 per donor) were activated under Th-1 polarizing conditions using ImmunoCult™ Human CD3/CD28 T Cell Activator (StemCell, Cat#10971) following manufacturer instructions for 3 days in RPMI-1640, 10% v/v
FBS, 100 U/ml penicillin-streptomycin (Gibco) in the presence of IL-2 (Novartis, #709421, 20 ng/ml), anti-IL-4 antibody (10 ng/ml, BD Biosciences, #554481) and IL-12 (20 ng/ml, BioLegend, #573002). After three days of priming, cells were expanded for another 5 days in the presence of IL-2 (20 ng/ml).

Human SLE patient samples:

This study was authorized by the French Competent Authority dealing with Research on Human Biological Samples, namely the French Ministry of Research. The Authorization number is ECH 19/04. To issue such authorization, the Ministry of Research sought the advice of an independent ethics committee, namely the “Comité de Protection des Personnes,” which voted positively, and all patients gave their written informed consent. Healthy volunteers were recruited to serve as control individuals. Blood samples from healthy controls and patients were collected in heparinized tubes (BD Vacutainer 368886, BD Biosciences San Jose, CA, USA) and PBMC samples were isolated using Ficoll (Pancoll, Pan Biotech #P04-60500) density gradient centrifugation. The isolated PBMCs were washed with PBS and the remaining red blood cells were lysed using RBC lysis buffer (ACK lysing buffer, Gibco #A10492-01) with a 3 min incubation at room temperature. Cells were washed in PBS, resuspended in 1 ml of freezing medium (with DMSO, PAN Biotech, #P07-90050) and transferred to a cryotube. Cryotubes were placed in a freezing container (Nalgene) at -80°C and then transferred into a liquid nitrogen container for long-term storage.

Classification and demographic information about SLE patients and healthy controls:

SLE patients were included if they fulfilled the American College of Rheumatology (ACR) Classification Criteria (Hochberg MC.
Updating the American College of Rheumatology revised criteria for the classification of systemic lupus erythematosus) (88). Exclusion criteria were current intake of 10 mg or more of prednisone or equivalent and/or use of immunosuppressants within the previous 6 months before inclusion. Use of hydroxychloroquine was not an exclusion criterion. Patients were mostly in clinical remission, half with biological remission, half with persistent anti-native DNA autoantibodies. All SLE patients and healthy controls were females between 41 and 58 years old.

(Phospho-) Proteomics:

For (phospho-) proteomic experiments, Th-1 cells from each donor were split into three different conditions after initial expansion: Light SILAC media (40 mg/ml L-Lysine K0 (Sigma, #L8662) and 84 mg/ml L-Arginine R0 (Sigma, #A8094)), medium SILAC media (49 mg/ml L-Lysine U-13C6 K6 (CKGAS, #CLM-2247-0.25) and 103 mg/ml L-Arginine U-13C6 R6 (CKGAS, #CLM-2265-0.25)) and heavy SILAC media (49.7 mg/ml L-Lysine U-13C6,U-15N2 K8 (CKGAS, #CNLM-291-H-0.25) and 105.8 mg/ml L-Arginine U-13C6,U-15N2 R10 (CKGAS, #CNLM-539-H-0.25)) prepared in RPMI SILAC media (Thermo Scientific, #88365) supplemented with 10% dialyzed FBS (HyClone, #SH30079.03), 5 ml L-Glutamine (Invitrogen, #25030024), 5 ml Pen/Strep (Invitrogen, #15140122), 5 ml MEM vitamin solution (Thermo Scientific, #11120052), 5 ml Selenium-Transferrin-Insulin (Thermo Scientific, #41400045) and expanded in the presence of 20 ng/ml IL-2 and 10 ng/ml anti-IL-4 for another 10 days in order to achieve complete labelling. Media was exchanged every two days. Incorporation of the medium and heavy versions of Lysine and Arginine was checked by mass spectrometry and samples with an incorporation greater than 95% were used. After expansion, cells were starved without IL-2 for 24 hours before stimulation with 10 nM IL-27 or 20 nM HypIL-6 for 15 minutes (phosphoproteomics) or 24 h (global proteomic changes).
Cells were then washed three times in ice-cold PBS, mixed in a 1:1:1 ratio, resuspended in SDS-containing lysis buffer (1% SDS in 100 mM Triethylammonium Bicarbonate buffer (TEAB)) and incubated on ice for 10 min to ensure cell lysis. Then, cell lysates were centrifuged at 20000 g for 10 minutes at 4°C and the supernatant was transferred to a clean tube. Protein concentration was determined using the BCA Protein Assay Kit (Thermo, #23227), and 10 mg of protein per experiment were reduced with 10 mM dithiothreitol (DTT, Sigma, #D0632) for 1 h at 55°C and alkylated with 20 mM iodoacetamide (IAA, Sigma, #I6125) for 30 min at RT. Protein was then precipitated using six volumes of chilled (-20°C) acetone overnight. After precipitation, the protein pellet was resuspended in 1 ml of 100 mM TEAB and digested with trypsin (1:100 w/w, Thermo, #90058) overnight at 37°C. Then, samples were cleared by centrifugation at 20000 g for 30 min at 4°C, and peptide concentration was quantified with the Quantitative Colorimetric Peptide Assay (Thermo, #23275). Phosphopeptide enrichment in the peptide fractions generated as described above was carried out using MagResyn Ti-IMAC following manufacturer instructions (2BScientific, MRTIM002).

High pH reverse phase fractionation for phosphoproteomics:

Samples were dissolved in 200 μL of 10 mM ammonium formate buffer pH 9.5 and peptides were fractionated using high pH RP chromatography.
A C18 column from Waters (XBridge peptide BEH, 130 Å, 3.5 µm, 4.6 × 150 mm, Ireland) with a guard column (XBridge, C18, 3.5 µm, 4.6 × 20 mm, Waters) was used on an Ultimate 3000 HPLC (Thermo-Scientific). Buffers A and B used for fractionation consisted, respectively, of 10 mM ammonium formate in milliQ water (Buffer A) and 10 mM ammonium formate in 90% acetonitrile (Buffer B); both buffers were adjusted to pH 9.5 with ammonia. Fractions were collected using a WPS-3000FC autosampler (Thermo-Scientific) at 1 min intervals. Column and guard column were equilibrated with 2% buffer B for 20 min at a constant flow rate of 0.8 ml/min and a constant temperature of 21°C. Samples (193 µl) were loaded onto the column at 0.8 ml/min, and the separation gradient ran from 2% buffer B to 8% B in 6 min, then from 8% B to 45% B within 54 min and finally from 45% B to 100% B in 5 min. The column was washed for 15 min at 100% buffer B and equilibrated at 2% buffer B for 20 min as mentioned above. The fraction collection started 1 min after injection and stopped after 80 min (total of 80 fractions, 800 µl each). Each peptide fraction was acidified immediately after elution from the column by adding 20 to 30 µl of 10% formic acid to each tube in the autosampler. The total number of fractions concatenated was set to 10. The content of fractions from each set was dried prior to further analysis.

LC-MS/MS Analysis:

LC-MS analysis was done at the FingerPrints Proteomics Facility (University of Dundee). Analysis of peptide readout was performed on a Q Exactive™ Plus Mass Spectrometer (Thermo Scientific) coupled with a Dionex Ultimate 3000 RS (Thermo Scientific). The LC buffers used were the following: buffer A (0.1% formic acid in Milli-Q water (v/v)) and buffer B (80% acetonitrile and 0.1% formic acid in Milli-Q water (v/v)).
Dried fractions were resuspended in 35 µl of 1% formic acid and aliquots of 15 μL of each fraction were loaded at 10 μL/min onto a trap column (100 μm × 2 cm, PepMap nanoViper C18 column, 5 μm, 100 Å, Thermo Scientific) equilibrated in 0.1% TFA. The trap column was washed for 5 min at the same flow rate with 0.1% TFA and then switched in-line with a Thermo Scientific resolving C18 column (75 μm × 50 cm, PepMap RSLC C18 column, 2 μm, 100 Å). The peptides were eluted from the column at a constant flow rate of 300 nl/min with a linear gradient from 2% buffer B to 5% buffer B in 5 min, then from 5% buffer B to 35% buffer B in 125 min, and finally from 35% buffer B to 98% buffer B in 2 min. The column was then washed with 98% buffer B for 20 min and re-equilibrated in 2% buffer B for 17 min. The column was kept at a constant temperature of 50°C. The Q Exactive Plus was operated in data-dependent positive ionization mode. The source voltage was set to 2.5 kV and the capillary temperature was 250°C.
A scan cycle comprised an MS1 scan (m/z range from 350-1600, ion injection time of 20 ms, resolution 70 000 and automatic gain control (AGC) 1 × 10^6) acquired in profile mode, followed by 15 sequential dependent MS2 scans (resolution 17500) of the most intense ions fulfilling predefined selection criteria (AGC 2 × 10^5, maximum ion injection time 100 ms, isolation window of 1.4 m/z, fixed first mass of 100 m/z, spectrum data type: centroid, intensity threshold 2 × 10^4, exclusion of unassigned, singly and >7 charged precursors, peptide match preferred, exclude isotopes on, dynamic exclusion time 45 s). The HCD collision energy was set to 27% of the normalized collision energy. Mass accuracy was checked before the start of sample analysis.

Mass spectrometry data analysis:

Q Exactive Plus Mass Spectrometer .RAW files were analyzed, and peptides and proteins quantified, using MaxQuant (89) with the built-in search engine Andromeda (90). Default settings were used, except that the minimal peptide length was set to 5, and the Andromeda search engine was configured for the UniProt Homo sapiens protein database (release date: 2018_09). Only peptide and protein ratios quantified in at least two out of the three replicates were considered, and the p-values were determined by Student’s t test and corrected for multiple testing using the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995).

Plasmid constructs:

For single molecule fluorescence microscopy, a monomeric non-fluorescent (Y67F) variant of eGFP was N-terminally fused to GP130. This tag (mXFPm) was engineered to specifically bind anti-GFP nanobody “minimizer” (aGFP-miNB). This construct was inserted into a modified version of pSems-26m (Covalys) using a signal peptide of Igk. The ORF was linked to a neomycin resistance cassette via an IRES site. An mXFPe-IL-27Ra construct was designed likewise but is recognized by aGFP nanobody “enhancer” (mXFPe).
The chimeric construct mXFP-IL-27Ra (ECD & TMD)-GP130(ICD) was a fusion construct of IL-27Ra (aa 33-540) and GP130 (aa 645-918).

Cell lines and media: HeLa cells were grown in DMEM containing 10% v/v FBS, penicillin-streptomycin, and L-glutamine (2 mM). RPE1 cells were grown in DMEM/F12 containing 10% v/v FBS, penicillin-streptomycin, and L-glutamine (2 mM). RPE1 cells were stably transfected with mXFPe-IL-27Ra, its mutants and the chimeric construct by the PEI method according to standard protocols. Using G418 selection (0.6 mg/ml), individual clones were selected, proliferated and characterized. For comparing receptor cell surface expression levels of stable clones expressing variants of IL-27Ra, cells were detached using PBS + 2 mM EDTA, spun down (300g, 5 min) and incubated with “enhancer” aGFP-enNBDy647 (10 nM, 15 min on ice). After incubation, cells were washed with PBS and run on the cytometer.

Flow cytometry staining and antibodies: For measuring dose-response curves of STAT1/3 phosphorylation (in either Th-1 cells or RPE1 clones), 96-well plates were prepared with 50 µl of cell suspension at 2 × 10⁶ cells/ml/well for Th-1 and 2 × 10⁵ cells/ml/well for RPE1. The latter were detached using Accutase (Sigma). Cells were stimulated with a set of different concentrations to obtain dose-response curves. To this end, cells were stimulated for 15 min at 37 °C with the respective cytokines, followed by PFA fixation (2%) for 15 min at RT.
For kinetic experiments, cell suspensions were stimulated with a defined, saturating concentration of cytokines (10 nM IL-27, 20 nM HypIL-6, 100 nM wt-IL-6) in reverse order so that all cell suspensions were PFA-fixed (2%) simultaneously. For pSTAT1/3 kinetic experiments under JAK inhibition, Tofacitinib (2 μM, Stratech, #S2789-SEL) was added after 15 min of stimulation and cells were PFA-fixed in the correct order. After fixation (15 min at RT), cells were spun down at 300g for 6 min at 4 °C. Cell pellets were resuspended and permeabilized in ice-cold methanol and kept for 30 min on ice. After permeabilization, cells were fluorescently barcoded according to (91). In brief, using two NHS-dyes (PacificBlue, #10163, DyLight800, #46421, Thermo Scientific), individual wells were stained with a combination of different concentrations of these dyes. After barcoding, cells were pooled and stained with anti-pSTAT1Alexa647 (Cell Signaling Technologies, #8009) and anti-pSTAT3Alexa488 (Biolegend, #651006) at a 1:100 dilution in PBS + 0.5% BSA for 1 h at RT. T-cells were also stained with anti-CD8AlexaFluor700 (1:120, Biolegend, #300920), anti-CD4PE (1:120, Biolegend, #357404) and anti-CD3BrilliantViolet510 (1:100, Biolegend, #300448). Cells were analyzed on the flow cytometer (Beckman Coulter, Cytoflex S) and individual cell populations were identified by their barcoding pattern. Mean fluorescence intensity (MFI) of pSTAT1 (Alexa647) and pSTAT3 (Alexa488) was measured for all individual cell populations. For measuring total STAT levels, methanol-permeabilized cells were stained with anti-STAT1Alexa647 (1:70, Biolegend, #558560) or anti-STAT3APC (1:50, Biolegend, #560392). For measuring total IRF1 levels, methanol-permeabilized cells were stained with anti-IRF1Alexa647 (1:50, Biolegend, #14105). For measuring cell surface levels of GP130, cells were detached with Accutase (Sigma) and stained with anti-GP130APC (1:100, Biolegend, #362006) for 1 h on ice.
RNA Transcriptome Sequencing: Human Th-1 cells from three donors (StemCell Technologies) were cultivated and stimulated as described above. Cells were washed in Hank’s balanced salt solution (HBSS, Gibco) and snap frozen for storage. RNA was isolated using the RNeasy Kit (Qiagen) according to the manufacturer’s protocol. All RNA 260/280 ratios were above 1.9. Of each sample, 1 μg of RNA was used. Transcriptomic analysis was done by Novogene as follows. Sequencing libraries were generated using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer’s recommendations, and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNase H-). Second strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of the 3’ ends of DNA fragments, NEBNext Adaptor with hairpin loop structure was ligated to prepare for hybridization. In order to select cDNA fragments of preferentially 150~200 bp in length, the library fragments were purified with the AMPure XP system (Beckman Coulter, Beverly, USA).
Then 3 μl USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37 °C for 15 min followed by 5 min at 95 °C before PCR. PCR was then performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. Finally, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system.

RNA Sequencing Data Analysis: Primary data analysis for quality control, mapping to the reference genome and quantification was conducted by Novogene as outlined below.

Quality control: Raw data (raw reads) in FASTQ format were first processed through in-house scripts. In this step, clean data (clean reads) were obtained by removing reads containing adapter and poly-N sequences and low-quality reads from the raw data. At the same time, Q20, Q30 and GC content of the clean data were calculated. All downstream analyses were based on the high-quality clean data.

Mapping to reference genome: Reference genome and gene model annotation files were downloaded directly from the genome website browser (NCBI/UCSC/Ensembl). Paired-end clean reads were mapped to the reference genome using HISAT2 software. HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome. These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads.

Quantification: HTSeq was used to count the number of reads mapped to each gene, including known and novel genes. The RPKM of each gene was then calculated from the length of the gene and the number of reads mapped to it. RPKM (Reads Per Kilobase of exon model per Million mapped reads) accounts for the effects of both sequencing depth and gene length on the read count, and is a commonly used measure for estimating gene expression levels.
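The RPKM normalisation described above can be sketched in a few lines. This is only an illustration of the standard definition (read count divided by gene length in kilobases and by library size in millions of mapped reads), not Novogene's pipeline; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads.

    read_count         -- reads mapped to the gene (e.g. an HTSeq count)
    gene_length_bp     -- length of the gene's exon model in base pairs
    total_mapped_reads -- total number of mapped reads in the library
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1_000            # corrects for gene length
    millions = total_mapped_reads / 1_000_000     # corrects for sequencing depth
    return read_count / (kilobases * millions)


# Hypothetical example: 500 reads on a 2 kb gene in a library of 20 million mapped reads.
example_value = rpkm(500, 2_000, 20_000_000)
```

Dividing by both factors is what makes values comparable between genes of different length and between libraries of different depth.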
For each identified gene, the fold change was calculated as the ratio of cytokine-stimulated to unstimulated expression levels within each donor, and an unpaired, two-tailed t test was applied to calculate p values. Genes were considered to be significantly altered if the p value was ≤ 0.05 and the log2 fold change was ≥ +1 or ≤ −1. Genes with an RPKM of less than 1 in two or more donors were excluded from analysis so as to remove genes with abundance near the detection limit. Genes without annotated function were also removed. Functional annotation of genes (KEGG pathways, GO terms) was done using the DAVID Bioinformatics Resource functional annotation tool (92, 93). The clustered heatmap was generated using the pheatmap package in RStudio.

siRNA-mediated knockdown of IRF1 in RPE1 cells: A set of four IRF1-siRNAs was purchased from Dharmacon, and each was tested individually to determine the level of knockdown achieved. The siRNA providing the highest level of IRF1 knockdown (Horizon, LQ-011704-00-0005, siRNA #2: UGAACUCCCUGCCAGAUAU) was subsequently used in all experiments. RPE1-IL27Ra cells were plated in 6-well dishes (0.4 × 10⁶ cells per well) and transfected the next day with IRF1-siRNA or control-GAPDH siRNA (Horizon, D-001830-10-05) (Dharmacon) using DharmaFect 1 transfection reagent (Dharmacon) following the manufacturer’s instructions for 24 h. At different timepoints of IL-27 (2 nM) or HypIL-6 (10 nM) stimulation, each sample was collected from one well of a 6-well dish. Cells were
trypsinized and each sample was spun down; pellets were snap-frozen in liquid nitrogen for subsequent RNA isolation (90%) or PFA-fixed for total IRF1 staining (10%) by flow cytometry.

Real-time quantitative PCR: Cells were subjected to RNA isolation using the Qiagen RNeasy kit. RNA (100 ng) was reverse transcribed to complementary DNA (cDNA) using an iScript cDNA synthesis kit (BioRad, #1708890), which was used as template for quantitative PCR. PowerTrack™ SYBR Green Master Mix (Takara, #A46109) was used for the reaction with the following primers:

Target    Forward primer            Reverse primer            Size (bp)
b-actin   CATGTACGTTGCTATCCAGGC     CTCCTTAATGTCACGCACGAT     250
STAT1     CTAGTGGAGTGGAAGCGGAG      CACCACAAACGAGCTCTGAA      252
GBP5      TCCTCGGATTATTGCTCGGC      CCTTTGCGCTTCAGCCTTTT      309
OAS1      GAAGGCAGCTCACGAAACC       AGGCCTCAGCCTCTTGTG        114
SOCS3     GTCCCCCCAGAAGAGCCTATTA    TTGACGGTCTTCCGACAGAGAT    118

b-actin was used as the housekeeping gene for normalization. Each siRNA knockdown experiment was performed in three replicates, with each qPCR sample measured in two technical replicates.

Mathematical models and Bayesian inference: We developed two new mathematical models, making use of ordinary differential equations (ODEs), for the initial steps of cytokine-receptor binding, dimer formation and signal activation by HypIL-6 and IL-27, respectively; namely, a set of ODEs for the HypIL-6 system and a separate set of ODEs for the IL-27 system (see the end of this section for the set of ODEs included in each model). These ODEs describe the rate of change over time of the concentration of each molecular species considered in the receptor-ligand systems (HypIL-6 and IL-27). By solving these ODEs, a time-course for the concentration of total (free and bound) phosphorylated STAT1 and STAT3 can be obtained and compared to the experimental data (Supp. Fig. 5b & c). The HypIL-6 and IL-27 mathematical models differ in the reactions involved in the formation of the signaling dimer for each cytokine. Under stimulation with HypIL-6, two HypIL-6 bound GP130 monomers are required to form the homodimer (Supp. Fig. 3a), whereas under IL-27 stimulation, we assume that IL-27 binds to the IL-27Ra chain and not to GP130 (Supp. Fig. 3b), and hence the heterodimer comprises an IL-27 molecule bound to an IL-27Ra monomer and one GP130 chain. In the mathematical models, we assume that upon formation of the dimers (homo- or heterodimer), these receptor chains become immediately phosphorylated. The models do not consider JAK molecules explicitly. We assume that these molecules are constitutively bound to their corresponding receptor chains and that they phosphorylate immediately upon receptor phosphorylation (dimer formation). After the formation of the dimer, which we denote by $D_6$ or $D_{27}$ when formed by HypIL-6 or IL-27 respectively, the biochemical reactions included in each mathematical model are similar, and are summarized as follows. Table 1 provides a description of the rates for each reaction considered in each (and both) mathematical model(s). In what follows we assume mass action kinetics for all the reactions.

A free cytoplasmic unphosphorylated STAT1 or STAT3 molecule can bind to either receptor chain in the dimer, provided that the intracellular tyrosine residue of the receptor in the dimer is free (Supp. Fig. 3c & d). The STAT1 or STAT3 molecule can subsequently dissociate from the receptor chain in the dimer or can become phosphorylated (with rate $q$) whilst bound to the dimer. We have assumed that the rate of STAT1 or STAT3 phosphorylation when bound does not depend on the STAT type (1 or 3) or on the receptor chain (Supp. Fig. 3c & d). Phosphorylated STAT1 (pSTAT1) and STAT3 (pSTAT3) molecules can dissociate from the dimer. Once free in the cytoplasm, they can then dephosphorylate (Supp. Fig. 3g). We have assumed that this rate of STAT dephosphorylation only depends on the concentration of the respective pSTAT type, free in the cytoplasm. We note that no allostery has been considered in the models and hence phosphorylated and unphosphorylated STAT molecules dissociate from the receptor with the same rate (Supp. Fig. 3c & d). Finally, any molecular species containing receptor molecules can be removed from the system, due to internalisation or degradation, via one of two hypothesised mechanisms (Supp. Fig. 3e & f):

• hypothesis 1 (H1): receptors (free or bound, phosphorylated or unphosphorylated) are internalised/degraded with a rate proportional to the concentration of the species in which they are contained, or
• hypothesis 2 (H2): receptors (free or bound, phosphorylated or unphosphorylated) are internalised/degraded with a rate proportional to the product of the concentration of the species in which they are contained and the sum of the concentrations of free cytoplasmic phosphorylated STAT1 and STAT3.

We note that hypothesis 1 assumes that receptor molecules (free or bound, phosphorylated or unphosphorylated) are being internalised/degraded as part of the natural cellular trafficking cycle.
Hypothesis 2 is consistent with a potential feedback mechanism, whereby the free cytoplasmic pSTAT molecules would migrate to the nucleus and increase the production of negative feedback proteins, such as SOCS3, which down-regulate cytokine signaling. Thus, the internalisation/degradation rate of receptor molecules (free or bound, phosphorylated or unphosphorylated) under hypothesis 2 increases with the total amount of free cytoplasmic phosphorylated STAT1 and STAT3, to account for this surface receptor down-regulation. A depiction of the reactions in both the HypIL-6 and IL-27 mathematical models and under each hypothesis is given in Supp. Fig. 3, where a), c), e) and g) describe the HypIL-6 model and b), d), f) and g) describe the IL-27 model. In this figure, $i \in \{1,3\}$ so that the reactions shown can involve either STAT1 or STAT3. Above or below each reaction arrow is a symbol which represents the rate at which the reaction occurs (under the assumption of mass action kinetics). The notation for the rate constants and initial concentrations in the models, along with their descriptions and units, is given in Table 1.

Parameter                          Description                                                        Unit
$r_{1,6}^{+}$, $r_{1,27}^{+}$      Rate of receptor-ligand binding                                    nM⁻¹ s⁻¹
$r_{1,6}^{-}$, $r_{1,27}^{-}$      Rate of receptor-ligand dissociation                               s⁻¹
$r_{2,6}^{+}$, $r_{2,27}^{+}$      Rate of monomers binding to form a dimer                           nM⁻¹ s⁻¹
$r_{2,6}^{-}$, $r_{2,27}^{-}$      Rate of dissociation of the dimer                                  s⁻¹
$k_{ia}^{+}$                       Rate of STAT$i$ binding to GP130                                   nM⁻¹ s⁻¹
$k_{ib}^{+}$                       Rate of STAT$i$ binding to IL-27Ra                                 nM⁻¹ s⁻¹
$k_{ia}^{-}$                       Rate of STAT$i$ dissociating from GP130                            s⁻¹
$k_{ib}^{-}$                       Rate of STAT$i$ dissociating from IL-27Ra                          s⁻¹
$q$                                Rate of STAT phosphorylation on the dimer                          s⁻¹
$d_i$                              Rate of free pSTAT$i$ dephosphorylation                            s⁻¹
$\beta_6$, $\beta_{27}$            Rate of receptor internalisation/degradation under hypothesis 1   s⁻¹
$\gamma_6$, $\gamma_{27}$          Rate of receptor internalisation/degradation under hypothesis 2   nM⁻¹ s⁻¹
$[R_1(0)]$                         Initial concentration of GP130                                     nM
$[R_2(0)]$                         Initial concentration of IL-27Rα                                   nM
$[S_i(0)]$                         Initial concentration of STAT$i$                                   nM

Table 1: Notation, definitions and units for the parameter values used in the mathematical models, where $i \in \{1,3\}$ so that STAT$i$ corresponds to STAT1 or STAT3.

The HypIL-6 mathematical model was formulated based on reactions involving the following species:

• $L_6$ = HypIL-6,
• $R_1$ = GP130,
• $C_1$ = GP130 - HypIL-6 monomer,
• $D_6$ = Phosphorylated GP130 - HypIL-6 - HypIL-6 - GP130 homodimer,
• $S_1$ = Unbound cytoplasmic unphosphorylated STAT1,
• $S_3$ = Unbound cytoplasmic unphosphorylated STAT3,
• $D_6 \cdot S_1$ = Dimer bound to STAT1,
• $D_6 \cdot S_3$ = Dimer bound to STAT3,
• $D_6 \cdot pS_1$ = Dimer bound to pSTAT1,
• $D_6 \cdot pS_3$ = Dimer bound to pSTAT3,
• $S_1 \cdot D_6 \cdot S_1$ = Dimer bound to two molecules of STAT1,
• $pS_1 \cdot D_6 \cdot S_1$ = Dimer bound to two molecules of STAT1, one of which is phosphorylated,
• $pS_1 \cdot D_6 \cdot pS_1$ = Dimer bound to two molecules of pSTAT1,
• $S_3 \cdot D_6 \cdot S_3$ = Dimer bound to two molecules of STAT3,
• $pS_3 \cdot D_6 \cdot S_3$ = Dimer bound to two molecules of STAT3, one of which is phosphorylated,
• $pS_3 \cdot D_6 \cdot pS_3$ = Dimer bound to two molecules of pSTAT3,
• $S_1 \cdot D_6 \cdot S_3$ = Dimer bound to one molecule of STAT1 and one of STAT3,
• $pS_1 \cdot D_6 \cdot S_3$ = Dimer bound to one molecule of pSTAT1 and one of STAT3,
• $S_1 \cdot D_6 \cdot pS_3$ = Dimer bound to one molecule of STAT1 and one of pSTAT3,
• $pS_1 \cdot D_6 \cdot pS_3$ = Dimer bound to one molecule of pSTAT1 and one of pSTAT3,
• $pS_1$ = Unbound cytoplasmic phosphorylated STAT1,
• $pS_3$ = Unbound cytoplasmic phosphorylated STAT3.

The initial reactions in the HypIL-6 signaling pathway can then be described by the ODEs (1) – (22), under the law of mass action, where the terms involving the parameter $\beta_6$ apply only to the model under hypothesis 1 and the terms involving the parameter $\gamma_6$ apply only to the model under hypothesis 2. Square brackets around a species denote the concentration of this species in nM, and “⋅” denotes a bond between two molecules/species. The ODEs are valid for any time $t$, with $t \ge 0$, but time has been omitted from the species concentrations for ease of notation. We note here that, for example, $[R_1] = [R_1](t)$ for all $t \ge 0$.

$$\frac{d[R_1]}{dt} = -r_{1,6}^{+}[R_1][L_6] + r_{1,6}^{-}[C_1] - \beta_6[R_1] - \gamma_6([pS_1]+[pS_3])[R_1] \tag{1}$$

$$\frac{d[L_6]}{dt} = -r_{1,6}^{+}[R_1][L_6] + r_{1,6}^{-}[C_1] \tag{2}$$

$$\frac{d[C_1]}{dt} = r_{1,6}^{+}[R_1][L_6] - r_{1,6}^{-}[C_1] - 2r_{2,6}^{+}[C_1]^2 + 2r_{2,6}^{-}[D_6] - \beta_6[C_1] - \gamma_6([pS_1]+[pS_3])[C_1] \tag{3}$$

$$\frac{d[D_6]}{dt} = r_{2,6}^{+}[C_1]^2 - r_{2,6}^{-}[D_6] - 2k_{1a}^{+}[D_6][S_1] + k_{1a}^{-}([D_6 \cdot S_1] + [D_6 \cdot pS_1]) - 2k_{3a}^{+}[D_6][S_3] + k_{3a}^{-}([D_6 \cdot S_3] + [D_6 \cdot pS_3]) - \beta_6[D_6] - \gamma_6([pS_1]+[pS_3])[D_6] \tag{4}$$

$$\frac{d[S_1]}{dt} = -k_{1a}^{+}[S_1](2[D_6] + [D_6 \cdot S_1] + [D_6 \cdot S_3] + [D_6 \cdot pS_1] + [D_6 \cdot pS_3]) + k_{1a}^{-}([D_6 \cdot S_1] + 2[S_1 \cdot D_6 \cdot S_1] + [S_1 \cdot D_6 \cdot S_3] + [S_1 \cdot D_6 \cdot pS_1] + [S_1 \cdot D_6 \cdot pS_3]) + d_1[pS_1] \tag{5}$$

$$\frac{d[S_3]}{dt} = -k_{3a}^{+}[S_3](2[D_6] + [D_6 \cdot S_3] + [D_6 \cdot S_1] + [D_6 \cdot pS_3] + [D_6 \cdot pS_1]) + k_{3a}^{-}([D_6 \cdot S_3] + 2[S_3 \cdot D_6 \cdot S_3] + [S_1 \cdot D_6 \cdot S_3] + [S_3 \cdot D_6 \cdot pS_3] + [pS_1 \cdot D_6 \cdot S_3]) + d_3[pS_3] \tag{6}$$

$$\frac{d[D_6 \cdot S_1]}{dt} = 2k_{1a}^{+}[S_1][D_6] - k_{1a}^{-}[D_6 \cdot S_1] - k_{1a}^{+}[D_6 \cdot S_1][S_1] + 2k_{1a}^{-}[S_1 \cdot D_6 \cdot S_1] - k_{3a}^{+}[D_6 \cdot S_1][S_3] + k_{3a}^{-}[S_1 \cdot D_6 \cdot S_3] - q[D_6 \cdot S_1] + k_{1a}^{-}[S_1 \cdot D_6 \cdot pS_1] + k_{3a}^{-}[S_1 \cdot D_6 \cdot pS_3] - \beta_6[D_6 \cdot S_1] - \gamma_6([pS_1]+[pS_3])[D_6 \cdot S_1] \tag{7}$$

$$\frac{d[D_6 \cdot S_3]}{dt} = 2k_{3a}^{+}[S_3][D_6] - k_{3a}^{-}[D_6 \cdot S_3] - k_{3a}^{+}[D_6 \cdot S_3][S_3] + 2k_{3a}^{-}[S_3 \cdot D_6 \cdot S_3] - k_{1a}^{+}[D_6 \cdot S_3][S_1] + k_{1a}^{-}[S_1 \cdot D_6 \cdot S_3] - q[D_6 \cdot S_3] + k_{1a}^{-}[pS_1 \cdot D_6 \cdot S_3] + k_{3a}^{-}[S_3 \cdot D_6 \cdot pS_3] - \beta_6[D_6 \cdot S_3] - \gamma_6([pS_1]+[pS_3])[D_6 \cdot S_3] \tag{8}$$

$$\frac{d[D_6 \cdot pS_1]}{dt} = -k_{1a}^{+}[S_1][D_6 \cdot pS_1] + k_{1a}^{-}[S_1 \cdot D_6 \cdot pS_1] - k_{3a}^{+}[S_3][D_6 \cdot pS_1] + k_{3a}^{-}[pS_1 \cdot D_6 \cdot S_3] + q[D_6 \cdot S_1] - k_{1a}^{-}[D_6 \cdot pS_1] + 2k_{1a}^{-}[pS_1 \cdot D_6 \cdot pS_1] + k_{3a}^{-}[pS_1 \cdot D_6 \cdot pS_3] - \beta_6[D_6 \cdot pS_1] - \gamma_6([pS_1]+[pS_3])[D_6 \cdot pS_1] \tag{9}$$

$$\frac{d[D_6 \cdot pS_3]}{dt} = -k_{3a}^{+}[S_3][D_6 \cdot pS_3] + k_{3a}^{-}[S_3 \cdot D_6 \cdot pS_3] - k_{1a}^{+}[S_1][D_6 \cdot pS_3] + k_{1a}^{-}[S_1 \cdot D_6 \cdot pS_3] + q[D_6 \cdot S_3] - k_{3a}^{-}[D_6 \cdot pS_3] + 2k_{3a}^{-}[pS_3 \cdot D_6 \cdot pS_3] + k_{1a}^{-}[pS_1 \cdot D_6 \cdot pS_3] - \beta_6[D_6 \cdot pS_3] - \gamma_6([pS_1]+[pS_3])[D_6 \cdot pS_3] \tag{10}$$

$$\frac{d[S_1 \cdot D_6 \cdot S_1]}{dt} = k_{1a}^{+}[S_1][D_6 \cdot S_1] - 2k_{1a}^{-}[S_1 \cdot D_6 \cdot S_1] - 2q[S_1 \cdot D_6 \cdot S_1] - \beta_6[S_1 \cdot D_6 \cdot S_1] - \gamma_6([pS_1]+[pS_3])[S_1 \cdot D_6 \cdot S_1] \tag{11}$$

$$\frac{d[S_3 \cdot D_6 \cdot S_3]}{dt} = k_{3a}^{+}[S_3][D_6 \cdot S_3] - 2k_{3a}^{-}[S_3 \cdot D_6 \cdot S_3] - 2q[S_3 \cdot D_6 \cdot S_3] - \beta_6[S_3 \cdot D_6 \cdot S_3] - \gamma_6([pS_1]+[pS_3])[S_3 \cdot D_6 \cdot S_3] \tag{12}$$

$$\frac{d[pS_1 \cdot D_6 \cdot S_1]}{dt} = k_{1a}^{+}[pS_1 \cdot D_6][S_1] - 2k_{1a}^{-}[pS_1 \cdot D_6 \cdot S_1] + 2q[S_1 \cdot D_6 \cdot S_1] - q[pS_1 \cdot D_6 \cdot S_1] - \beta_6[pS_1 \cdot D_6 \cdot S_1] - \gamma_6([pS_1]+[pS_3])[pS_1 \cdot D_6 \cdot S_1] \tag{13}$$

$$\frac{d[pS_3 \cdot D_6 \cdot S_3]}{dt} = k_{3a}^{+}[pS_3 \cdot D_6][S_3] - 2k_{3a}^{-}[pS_3 \cdot D_6 \cdot S_3] + 2q[S_3 \cdot D_6 \cdot S_3] - q[pS_3 \cdot D_6 \cdot S_3] - \beta_6[pS_3 \cdot D_6 \cdot S_3] - \gamma_6([pS_1]+[pS_3])[pS_3 \cdot D_6 \cdot S_3] \tag{14}$$

$$\frac{d[pS_1 \cdot D_6 \cdot pS_1]}{dt} = q[pS_1 \cdot D_6 \cdot S_1] - 2k_{1a}^{-}[pS_1 \cdot D_6 \cdot pS_1] - \beta_6[pS_1 \cdot D_6 \cdot pS_1] - \gamma_6([pS_1]+[pS_3])[pS_1 \cdot D_6 \cdot pS_1] \tag{15}$$

$$\frac{d[pS_3 \cdot D_6 \cdot pS_3]}{dt} = q[pS_3 \cdot D_6 \cdot S_3] - 2k_{3a}^{-}[pS_3 \cdot D_6 \cdot pS_3] - \beta_6[pS_3 \cdot D_6 \cdot pS_3] - \gamma_6([pS_1]+[pS_3])[pS_3 \cdot D_6 \cdot pS_3] \tag{16}$$

$$\frac{d[S_1 \cdot D_6 \cdot S_3]}{dt} = k_{1a}^{+}[S_1][D_6 \cdot S_3] - k_{1a}^{-}[S_1 \cdot D_6 \cdot S_3] + k_{3a}^{+}[S_1 \cdot D_6][S_3] - k_{3a}^{-}[S_1 \cdot D_6 \cdot S_3] - 2q[S_1 \cdot D_6 \cdot S_3] - \beta_6[S_1 \cdot D_6 \cdot S_3] - \gamma_6([pS_1]+[pS_3])[S_1 \cdot D_6 \cdot S_3] \tag{17}$$

$$\frac{d[pS_1 \cdot D_6 \cdot S_3]}{dt} = q[S_1 \cdot D_6 \cdot S_3] + k_{3a}^{+}[pS_1 \cdot D_6][S_3] - k_{3a}^{-}[pS_1 \cdot D_6 \cdot S_3] - q[pS_1 \cdot D_6 \cdot S_3] - k_{1a}^{-}[pS_1 \cdot D_6 \cdot S_3] - \beta_6[pS_1 \cdot D_6 \cdot S_3] - \gamma_6([pS_1]+[pS_3])[pS_1 \cdot D_6 \cdot S_3] \tag{18}$$

$$\frac{d[S_1 \cdot D_6 \cdot pS_3]}{dt} = q[S_1 \cdot D_6 \cdot S_3] + k_{1a}^{+}[S_1][D_6 \cdot pS_3] - k_{1a}^{-}[S_1 \cdot D_6 \cdot pS_3] - q[S_1 \cdot D_6 \cdot pS_3] - k_{3a}^{-}[S_1 \cdot D_6 \cdot pS_3] - \beta_6[S_1 \cdot D_6 \cdot pS_3] - \gamma_6([pS_1]+[pS_3])[S_1 \cdot D_6 \cdot pS_3] \tag{19}$$

$$\frac{d[pS_1 \cdot D_6 \cdot pS_3]}{dt} = q([S_1 \cdot D_6 \cdot pS_3] + [pS_1 \cdot D_6 \cdot S_3]) - [pS_1 \cdot D_6 \cdot pS_3](k_{1a}^{-} + k_{3a}^{-}) - \beta_6[pS_1 \cdot D_6 \cdot pS_3] - \gamma_6([pS_1]+[pS_3])[pS_1 \cdot D_6 \cdot pS_3] \tag{20}$$

$$\frac{d[pS_1]}{dt} = k_{1a}^{-}([D_6 \cdot pS_1] + [S_1 \cdot D_6 \cdot pS_1] + [S_3 \cdot D_6 \cdot pS_1] + [pS_3 \cdot D_6 \cdot pS_1] + 2[pS_1 \cdot D_6 \cdot pS_1]) - d_1[pS_1] \tag{21}$$

$$\frac{d[pS_3]}{dt} = k_{3a}^{-}([D_6 \cdot pS_3] + [S_3 \cdot D_6 \cdot pS_3] + [S_1 \cdot D_6 \cdot pS_3] + [pS_1 \cdot D_6 \cdot pS_3] + 2[pS_3 \cdot D_6 \cdot pS_3]) - d_3[pS_3] \tag{22}$$

Similarly, and with some species in common with the HypIL-6 model, the IL-27 model has been formulated based on reactions involving the following species:

• $L_{27}$ = IL-27,
• $R_1$ = GP130,
• $R_2$ = IL-27Ra,
• $C_2$ = IL-27Ra - IL-27 monomer,
• $D_{27}$ = Phosphorylated IL-27Ra - IL-27 - GP130 heterodimer,
• $S_1$ = Unbound cytoplasmic unphosphorylated STAT1,
• $S_3$ = Unbound cytoplasmic unphosphorylated STAT3,
• $S_1 \cdot D_{27}$ = Dimer bound to STAT1 via $R_1$,
• $S_3 \cdot D_{27}$ = Dimer bound to STAT3 via $R_1$,
• $pS_1 \cdot D_{27}$ = Dimer bound to pSTAT1 via $R_1$,
• $pS_3 \cdot D_{27}$ = Dimer bound to pSTAT3 via $R_1$,
• $D_{27} \cdot S_1$ = Dimer bound to STAT1 via $R_2$,
• $D_{27} \cdot S_3$ = Dimer bound to STAT3 via $R_2$,
• $D_{27} \cdot pS_1$ = Dimer bound to pSTAT1 via $R_2$,
• $D_{27} \cdot pS_3$ = Dimer bound to pSTAT3 via $R_2$,
• $S_1 \cdot D_{27} \cdot S_1$ = Dimer bound to two molecules of STAT1,
• $pS_1 \cdot D_{27} \cdot S_1$ = Dimer bound to two molecules of STAT1, one of them phosphorylated on $R_1$,
• $S_1 \cdot D_{27} \cdot pS_1$ = Dimer bound to two molecules of STAT1, one of them phosphorylated on $R_2$,
• $pS_1 \cdot D_{27} \cdot pS_1$ = Dimer bound to two molecules of pSTAT1,
• $S_3 \cdot D_{27} \cdot S_3$ = Dimer bound to two molecules of STAT3,
• $pS_3 \cdot D_{27} \cdot S_3$ = Dimer bound to two molecules of STAT3, one of them phosphorylated on $R_1$,
• $S_3 \cdot D_{27} \cdot pS_3$ = Dimer bound to two molecules of STAT3, one of them phosphorylated on $R_2$,
• $pS_3 \cdot D_{27} \cdot pS_3$ = Dimer bound to two molecules of pSTAT3,
• $S_1 \cdot D_{27} \cdot S_3$ = Dimer bound to STAT1 via $R_1$ and STAT3 via $R_2$,
• $S_3 \cdot D_{27} \cdot S_1$ = Dimer bound to STAT1 via $R_2$ and STAT3 via $R_1$,
• $pS_1 \cdot D_{27} \cdot S_3$ = Dimer bound to pSTAT1 via $R_1$ and STAT3 via $R_2$,
• $S_3 \cdot D_{27} \cdot pS_1$ = Dimer bound to pSTAT1 via $R_2$ and STAT3 via $R_1$,
• $S_1 \cdot D_{27} \cdot pS_3$ = Dimer bound to STAT1 via $R_1$ and pSTAT3 via $R_2$,
• $pS_3 \cdot D_{27} \cdot S_1$ = Dimer bound to STAT1 via $R_2$ and pSTAT3 via $R_1$,
• $pS_1 \cdot D_{27} \cdot pS_3$ = Dimer bound to pSTAT1 via $R_1$ and pSTAT3 via $R_2$,
• $pS_3 \cdot D_{27} \cdot pS_1$ = Dimer bound to pSTAT3 via $R_1$ and pSTAT1 via $R_2$,
• $pS_1$ = Unbound cytoplasmic phosphorylated STAT1,
• $pS_3$ = Unbound cytoplasmic phosphorylated STAT3.
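To illustrate how mass-action ODEs of this kind can be integrated numerically, the sketch below solves only the receptor-level part of the HypIL-6 model: Equations (1)–(4) with every STAT-dependent term dropped, under hypothesis 1. It is a simplified stand-in, not the authors' implementation. The rate constants, initial concentrations, step size and the hand-written fourth-order Runge-Kutta integrator are all assumptions chosen for illustration.

```python
from typing import Callable, List

State = List[float]  # [R1, L6, C1, D6] concentrations in nM

# Hypothetical parameter values (not taken from the paper).
PARAMS = dict(r1p=0.01, r1m=0.001, r2p=0.01, r2m=0.001, beta=0.0)
Y0: State = [1.0, 10.0, 0.0, 0.0]  # hypothetical initial [R1], [L6], [C1], [D6]


def receptor_rhs(y: State, r1p: float, r1m: float,
                 r2p: float, r2m: float, beta: float) -> State:
    """Right-hand side of Eqs. (1)-(4) without STAT terms, hypothesis 1."""
    R, L, C, D = y
    bind = r1p * R * L - r1m * C       # net flux of R1 + L6 -> C1
    dimerise = r2p * C * C - r2m * D   # net flux of 2 C1 -> D6
    return [
        -bind - beta * R,               # d[R1]/dt
        -bind,                          # d[L6]/dt
        bind - 2.0 * dimerise - beta * C,  # d[C1]/dt (two monomers per dimer)
        dimerise - beta * D,            # d[D6]/dt
    ]


def rk4(rhs: Callable[[State], State], y0: State, t_end: float, dt: float) -> State:
    """Classical fixed-step fourth-order Runge-Kutta integration to t_end."""
    y = list(y0)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(y)
        k2 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = rhs([yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y


def simulate(t_end: float = 600.0, dt: float = 0.1, **overrides: float) -> State:
    """Integrate the reduced system; keyword overrides replace PARAMS entries."""
    params = {**PARAMS, **overrides}
    return rk4(lambda y: receptor_rhs(y, **params), Y0, t_end, dt)
```

A useful sanity check on such a sketch is conservation: with `beta=0` the total receptor `R + C + 2*D` and total ligand `L + C + 2*D` stay constant, and with `beta > 0` the total receptor decays exponentially at rate `beta`. For the full 22- or 33-species systems, a stiff-capable library solver would normally be preferable to a fixed-step scheme.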
Again, under the law of mass action, the initial reactions in the IL-27 signaling pathway can be described by the ODEs (23) – (55).

$$\frac{d[R_1]}{dt} = -r_{2,27}^{+}[C_2][R_1] + r_{2,27}^{-}[D_{27}] - \beta_{27}[R_1] - \gamma_{27}([pS_1]+[pS_3])[R_1] \tag{23}$$

$$\frac{d[R_2]}{dt} = -r_{1,27}^{+}[R_2][L_{27}] + r_{1,27}^{-}[C_2] - \beta_{27}[R_2] - \gamma_{27}([pS_1]+[pS_3])[R_2] \tag{24}$$

$$\frac{d[L_{27}]}{dt} = -r_{1,27}^{+}[R_2][L_{27}] + r_{1,27}^{-}[C_2] \tag{25}$$

$$\frac{d[C_2]}{dt} = r_{1,27}^{+}[R_2][L_{27}] - r_{1,27}^{-}[C_2] - r_{2,27}^{+}[C_2][R_1] + r_{2,27}^{-}[D_{27}] - \beta_{27}[C_2] - \gamma_{27}([pS_1]+[pS_3])[C_2] \tag{26}$$

$$\frac{d[D_{27}]}{dt} = r_{2,27}^{+}[C_2][R_1] - r_{2,27}^{-}[D_{27}] - (k_{1a}^{+} + k_{1b}^{+})[D_{27}][S_1] + k_{1a}^{-}([S_1 \cdot D_{27}] + [pS_1 \cdot D_{27}]) + k_{1b}^{-}([D_{27} \cdot S_1] + [D_{27} \cdot pS_1]) - (k_{3a}^{+} + k_{3b}^{+})[D_{27}][S_3] + k_{3a}^{-}([S_3 \cdot D_{27}] + [pS_3 \cdot D_{27}]) + k_{3b}^{-}([D_{27} \cdot S_3] + [D_{27} \cdot pS_3]) - \beta_{27}[D_{27}] - \gamma_{27}([pS_1]+[pS_3])[D_{27}] \tag{27}$$

$$\frac{d[S_1]}{dt} = -k_{1a}^{+}[S_1]([D_{27}] + [D_{27} \cdot S_1] + [D_{27} \cdot pS_1] + [D_{27} \cdot S_3] + [D_{27} \cdot pS_3]) + k_{1a}^{-}([S_1 \cdot D_{27}] + [S_1 \cdot D_{27} \cdot S_1] + [S_1 \cdot D_{27} \cdot pS_1] + [S_1 \cdot D_{27} \cdot S_3] + [S_1 \cdot D_{27} \cdot pS_3]) - k_{1b}^{+}[S_1]([D_{27}] + [S_1 \cdot D_{27}] + [pS_1 \cdot D_{27}] + [S_3 \cdot D_{27}] + [pS_3 \cdot D_{27}]) + k_{1b}^{-}([D_{27} \cdot S_1] + [S_1 \cdot D_{27} \cdot S_1] + [pS_1 \cdot D_{27} \cdot S_1] + [S_3 \cdot D_{27} \cdot S_1] + [pS_3 \cdot D_{27} \cdot S_1]) + d_1[pS_1] \tag{28}$$

$$\frac{d[S_3]}{dt} = -k_{3a}^{+}[S_3]([D_{27}] + [D_{27} \cdot S_1] + [D_{27} \cdot pS_1] + [D_{27} \cdot S_3] + [D_{27} \cdot pS_3]) + k_{3a}^{-}([S_3 \cdot D_{27}] + [S_3 \cdot D_{27} \cdot S_1] + [S_3 \cdot D_{27} \cdot pS_1] + [S_3 \cdot D_{27} \cdot S_3] + [S_3 \cdot D_{27} \cdot pS_3]) - k_{3b}^{+}[S_3]([D_{27}] + [S_1 \cdot D_{27}] + [pS_1 \cdot D_{27}] + [S_3 \cdot D_{27}] + [pS_3 \cdot D_{27}]) + k_{3b}^{-}([D_{27} \cdot S_3] + [S_1 \cdot D_{27} \cdot S_3] + [pS_1 \cdot D_{27} \cdot S_3] + [S_3 \cdot D_{27} \cdot S_3] + [pS_3 \cdot D_{27} \cdot S_3]) + d_3[pS_3] \tag{29}$$

$$\frac{d[S_1 \cdot D_{27}]}{dt} = k_{1a}^{+}[S_1][D_{27}] - k_{1a}^{-}[S_1 \cdot D_{27}] - q[S_1 \cdot D_{27}] - k_{1b}^{+}[S_1][S_1 \cdot D_{27}] + k_{1b}^{-}[S_1 \cdot D_{27} \cdot S_1] - k_{3b}^{+}[S_3][S_1 \cdot D_{27}] + k_{3b}^{-}[S_1 \cdot D_{27} \cdot S_3] + k_{1b}^{-}[S_1 \cdot D_{27} \cdot pS_1] + k_{3b}^{-}[S_1 \cdot D_{27} \cdot pS_3] - \beta_{27}[S_1 \cdot D_{27}] - \gamma_{27}([pS_1]+[pS_3])[S_1 \cdot D_{27}] \tag{30}$$

$$\frac{d[D_{27} \cdot S_1]}{dt} = k_{1b}^{+}[S_1][D_{27}] - k_{1b}^{-}[D_{27} \cdot S_1] - q[D_{27} \cdot S_1] - k_{1a}^{+}[S_1][D_{27} \cdot S_1] + k_{1a}^{-}[S_1 \cdot D_{27} \cdot S_1] - k_{3a}^{+}[S_3][D_{27} \cdot S_1] + k_{3a}^{-}[S_3 \cdot D_{27} \cdot S_1] + k_{1a}^{-}[pS_1 \cdot D_{27} \cdot S_1] + k_{3a}^{-}[pS_3 \cdot D_{27} \cdot S_1] - \beta_{27}[D_{27} \cdot S_1] - \gamma_{27}([pS_1]+[pS_3])[D_{27} \cdot S_1] \tag{31}$$

$$\frac{d[S_3 \cdot D_{27}]}{dt} = k_{3a}^{+}[S_3][D_{27}] - k_{3a}^{-}[S_3 \cdot D_{27}] - q[S_3 \cdot D_{27}] - k_{3b}^{+}[S_3][S_3 \cdot D_{27}] + k_{3b}^{-}[S_3 \cdot D_{27} \cdot S_3] - k_{1b}^{+}[S_1][S_3 \cdot D_{27}] + k_{1b}^{-}[S_3 \cdot D_{27} \cdot S_1] + k_{3b}^{-}[S_3 \cdot D_{27} \cdot pS_3] + k_{1b}^{-}[S_3 \cdot D_{27} \cdot pS_1] - \beta_{27}[S_3 \cdot D_{27}] - \gamma_{27}([pS_1]+[pS_3])[S_3 \cdot D_{27}] \tag{32}$$

$$\frac{d[D_{27} \cdot S_3]}{dt} = k_{3b}^{+}[S_3][D_{27}] - k_{3b}^{-}[D_{27} \cdot S_3] - q[D_{27} \cdot S_3] - k_{3a}^{+}[S_3][D_{27} \cdot S_3] + k_{3a}^{-}[S_3 \cdot D_{27} \cdot S_3] - k_{1a}^{+}[S_1][D_{27} \cdot S_3] + k_{1a}^{-}[S_1 \cdot D_{27} \cdot S_3] + k_{3a}^{-}[pS_3 \cdot D_{27} \cdot S_3] + k_{1a}^{-}[pS_1 \cdot D_{27} \cdot S_3] - \beta_{27}[D_{27} \cdot S_3] - \gamma_{27}([pS_1]+[pS_3])[D_{27} \cdot S_3] \tag{33}$$

$$\frac{d[pS_1 \cdot D_{27}]}{dt} = -k_{1b}^{+}[pS_1 \cdot D_{27}][S_1] + k_{1b}^{-}[pS_1 \cdot D_{27} \cdot S_1] - k_{3b}^{+}[pS_1 \cdot D_{27}][S_3] + k_{3b}^{-}[pS_1 \cdot D_{27} \cdot S_3] + q[S_1 \cdot D_{27}] - k_{1a}^{-}[pS_1 \cdot D_{27}] + k_{1b}^{-}[pS_1 \cdot D_{27} \cdot pS_1] + k_{3b}^{-}[pS_1 \cdot D_{27} \cdot pS_3] - \beta_{27}[pS_1 \cdot D_{27}] - \gamma_{27}([pS_1]+[pS_3])[pS_1 \cdot D_{27}] \tag{34}$$

$$\frac{d[D_{27} \cdot pS_1]}{dt} = -k_{1a}^{+}[D_{27} \cdot pS_1][S_1] + k_{1a}^{-}[S_1 \cdot D_{27} \cdot pS_1] - k_{3a}^{+}[D_{27} \cdot pS_1][S_3] + k_{3a}^{-}[S_3 \cdot D_{27} \cdot pS_1] + q[D_{27} \cdot S_1] - k_{1b}^{-}[D_{27} \cdot pS_1] + k_{1a}^{-}[pS_1 \cdot D_{27} \cdot pS_1] + k_{3a}^{-}[pS_3 \cdot D_{27} \cdot pS_1] - \beta_{27}[D_{27} \cdot pS_1] - \gamma_{27}([pS_1]+[pS_3])[D_{27} \cdot pS_1] \tag{35}$$

$$\frac{d[pS_3 \cdot D_{27}]}{dt} = -k_{3b}^{+}[pS_3 \cdot D_{27}][S_3] + k_{3b}^{-}[pS_3 \cdot D_{27} \cdot S_3] - k_{1b}^{+}[pS_3 \cdot D_{27}][S_1] + k_{1b}^{-}[pS_3 \cdot D_{27} \cdot S_1] + q[S_3 \cdot D_{27}] - k_{3a}^{-}[pS_3 \cdot D_{27}] + k_{3b}^{-}[pS_3 \cdot D_{27} \cdot pS_3] + k_{1b}^{-}[pS_3 \cdot D_{27} \cdot pS_1] - \beta_{27}[pS_3 \cdot D_{27}] - \gamma_{27}([pS_1]+[pS_3])[pS_3 \cdot D_{27}] \tag{36}$$

$$\frac{d[D_{27} \cdot pS_3]}{dt} = -k_{3a}^{+}[D_{27} \cdot pS_3][S_3] + k_{3a}^{-}[S_3 \cdot D_{27} \cdot pS_3] - k_{1a}^{+}[D_{27} \cdot pS_3][S_1] + k_{1a}^{-}[S_1 \cdot D_{27} \cdot pS_3] + q[D_{27} \cdot S_3] - k_{3b}^{-}[D_{27} \cdot pS_3] + k_{3a}^{-}[pS_3 \cdot D_{27} \cdot pS_3] + k_{1a}^{-}[pS_1 \cdot D_{27} \cdot pS_3] - \beta_{27}[D_{27} \cdot pS_3] - \gamma_{27}([pS_1]+[pS_3])[D_{27} \cdot pS_3] \tag{37}$$

$$\frac{d[S_1 \cdot D_{27} \cdot S_1]}{dt} = k_{1a}^{+}[S_1][D_{27} \cdot S_1] - k_{1a}^{-}[S_1 \cdot D_{27} \cdot S_1] + k_{1b}^{+}[S_1 \cdot D_{27}][S_1] - k_{1b}^{-}[S_1 \cdot D_{27} \cdot S_1] - 2q[S_1 \cdot D_{27} \cdot S_1] - \beta_{27}[S_1 \cdot D_{27} \cdot S_1] - \gamma_{27}([pS_1]+[pS_3])[S_1 \cdot D_{27} \cdot S_1] \tag{38}$$

$$\frac{d[pS_1 \cdot D_{27} \cdot S_1]}{dt} = k_{1b}^{+}[pS_1 \cdot D_{27}][S_1] - k_{1b}^{-}[pS_1 \cdot D_{27} \cdot S_1] + q[S_1 \cdot D_{27} \cdot S_1] - q[pS_1 \cdot D_{27} \cdot S_1] - k_{1a}^{-}[pS_1 \cdot D_{27} \cdot S_1] - \beta_{27}[pS_1 \cdot D_{27} \cdot S_1] - \gamma_{27}([pS_1]+[pS_3])[pS_1 \cdot D_{27} \cdot S_1] \tag{39}$$

$$\frac{d[S_1 \cdot D_{27} \cdot pS_1]}{dt} = k_{1a}^{+}[S_1][D_{27} \cdot pS_1] - k_{1a}^{-}[S_1 \cdot D_{27} \cdot pS_1] + q[S_1 \cdot D_{27} \cdot S_1] - q[S_1 \cdot D_{27} \cdot pS_1] - k_{1b}^{-}[S_1 \cdot D_{27} \cdot pS_1] - \beta_{27}[S_1 \cdot D_{27} \cdot pS_1] - \gamma_{27}([pS_1]+[pS_3])[S_1 \cdot D_{27} \cdot pS_1] \tag{40}$$

$$\frac{d[pS_1 \cdot D_{27} \cdot pS_1]}{dt} = q([S_1 \cdot D_{27} \cdot pS_1] + [pS_1 \cdot D_{27} \cdot S_1]) - [pS_1 \cdot D_{27} \cdot pS_1](k_{1a}^{-} + k_{1b}^{-}) - \beta_{27}[pS_1 \cdot D_{27} \cdot pS_1] - \gamma_{27}([pS_1]+[pS_3])[pS_1 \cdot D_{27} \cdot pS_1] \tag{41}$$

$$\frac{d[S_3 \cdot D_{27} \cdot S_3]}{dt} = k_{3a}^{+}[S_3][D_{27} \cdot S_3] - k_{3a}^{-}[S_3 \cdot D_{27} \cdot S_3] + k_{3b}^{+}[S_3 \cdot D_{27}][S_3] - k_{3b}^{-}[S_3 \cdot D_{27} \cdot S_3] - 2q[S_3 \cdot D_{27} \cdot S_3] - \beta_{27}[S_3 \cdot D_{27} \cdot S_3] - \gamma_{27}([pS_1]+[pS_3])[S_3 \cdot D_{27} \cdot S_3] \tag{42}$$

$$\frac{d[pS_3 \cdot D_{27} \cdot S_3]}{dt} = k_{3b}^{+}[pS_3 \cdot D_{27}][S_3] - k_{3b}^{-}[pS_3 \cdot D_{27} \cdot S_3] + q[S_3 \cdot D_{27} \cdot S_3] - q[pS_3 \cdot D_{27} \cdot S_3] - k_{3a}^{-}[pS_3 \cdot D_{27} \cdot S_3] - \beta_{27}[pS_3 \cdot D_{27} \cdot S_3] - \gamma_{27}([pS_1]+[pS_3])[pS_3 \cdot D_{27} \cdot S_3] \tag{43}$$

$$\frac{d[S_3 \cdot D_{27} \cdot pS_3]}{dt} = k_{3a}^{+}[S_3][D_{27} \cdot pS_3] - k_{3a}^{-}[S_3 \cdot D_{27} \cdot pS_3] + q[S_3 \cdot D_{27} \cdot S_3] - q[S_3 \cdot D_{27} \cdot pS_3] - k_{3b}^{-}[S_3 \cdot D_{27} \cdot pS_3] - \beta_{27}[S_3 \cdot D_{27} \cdot pS_3] - \gamma_{27}([pS_1]+[pS_3])[S_3 \cdot D_{27} \cdot pS_3] \tag{44}$$

$$\frac{d[pS_3 \cdot D_{27} \cdot pS_3]}{dt} = q([S_3 \cdot D_{27} \cdot pS_3] + [pS_3 \cdot D_{27} \cdot S_3]) - [pS_3 \cdot D_{27} \cdot pS_3](k_{3a}^{-} + k_{3b}^{-}) - \beta_{27}[pS_3 \cdot D_{27} \cdot pS_3] - \gamma_{27}([pS_1]+[pS_3])[pS_3 \cdot D_{27} \cdot pS_3] \tag{45}$$

$$\frac{d[S_1 \cdot D_{27} \cdot S_3]}{dt} = k_{1a}^{+}[S_1][D_{27} \cdot S_3] - k_{1a}^{-}[S_1 \cdot D_{27} \cdot S_3] + k_{3b}^{+}[S_1 \cdot D_{27}][S_3] - k_{3b}^{-}[S_1 \cdot D_{27} \cdot S_3] - 2q[S_1 \cdot D_{27} \cdot S_3] - \beta_{27}[S_1 \cdot D_{27} \cdot S_3] - \gamma_{27}([pS_1]+[pS_3])[S_1 \cdot D_{27} \cdot S_3] \tag{46}$$

$$\frac{d[S_3 \cdot D_{27} \cdot S_1]}{dt} = k_{3a}^{+}[S_3][D_{27} \cdot S_1] - k_{3a}^{-}[S_3 \cdot D_{27} \cdot S_1] + k_{1b}^{+}[S_3 \cdot D_{27}][S_1] - k_{1b}^{-}[S_3 \cdot D_{27} \cdot S_1] - 2q[S_3 \cdot D_{27} \cdot S_1] - \beta_{27}[S_3 \cdot D_{27} \cdot S_1] - \gamma_{27}([pS_1]+[pS_3])[S_3 \cdot D_{27} \cdot S_1] \tag{47}$$

$$\frac{d[pS_1 \cdot D_{27} \cdot S_3]}{dt} = k_{3b}^{+}[pS_1 \cdot D_{27}][S_3] - k_{3b}^{-}[pS_1 \cdot D_{27} \cdot S_3] + q[S_1 \cdot D_{27} \cdot S_3] - q[pS_1 \cdot D_{27} \cdot S_3] - k_{1a}^{-}[pS_1 \cdot D_{27} \cdot S_3] - \beta_{27}[pS_1 \cdot D_{27} \cdot S_3] - \gamma_{27}([pS_1]+[pS_3])[pS_1 \cdot D_{27} \cdot S_3] \tag{48}$$

$$\frac{d[pS_3 \cdot D_{27} \cdot S_1]}{dt} = k_{1b}^{+}[pS_3 \cdot D_{27}][S_1] - k_{1b}^{-}[pS_3 \cdot D_{27} \cdot S_1] + q[S_3 \cdot D_{27} \cdot S_1] - q[pS_3 \cdot D_{27} \cdot S_1] - k_{3a}^{-}[pS_3 \cdot D_{27} \cdot S_1] - \beta_{27}[pS_3 \cdot D_{27} \cdot S_1] - \gamma_{27}([pS_1]+[pS_3])[pS_3 \cdot D_{27} \cdot S_1] \tag{49}$$

$$\frac{d[S_1 \cdot D_{27} \cdot pS_3]}{dt} = k_{1a}^{+}[S_1][D_{27} \cdot pS_3] - k_{1a}^{-}[S_1 \cdot D_{27} \cdot pS_3] + q[S_1 \cdot D_{27} \cdot S_3] - q[S_1 \cdot D_{27} \cdot pS_3] - k_{3b}^{-}[S_1 \cdot D_{27} \cdot pS_3] - \beta_{27}[S_1 \cdot D_{27} \cdot pS_3] - \gamma_{27}([pS_1]+[pS_3])[S_1 \cdot D_{27} \cdot pS_3] \tag{50}$$

$$\frac{d[S_3 \cdot D_{27} \cdot pS_1]}{dt} = k_{3a}^{+}[S_3][D_{27} \cdot pS_1] - k_{3a}^{-}[S_3 \cdot D_{27} \cdot pS_1] + q[S_3 \cdot D_{27} \cdot S_1] - q[S_3 \cdot D_{27} \cdot pS_1] - k_{1b}^{-}[S_3 \cdot D_{27} \cdot pS_1] - \beta_{27}[S_3 \cdot D_{27} \cdot pS_1] - \gamma_{27}([pS_1]+[pS_3])[S_3 \cdot D_{27} \cdot pS_1] \tag{51}$$

$$\frac{d[pS_1 \cdot D_{27} \cdot pS_3]}{dt} = q([S_1 \cdot D_{27} \cdot pS_3] + [pS_1 \cdot D_{27} \cdot S_3]) - [pS_1 \cdot D_{27} \cdot pS_3](k_{1a}^{-} + k_{3b}^{-}) - \beta_{27}[pS_1 \cdot D_{27} \cdot pS_3] - \gamma_{27}([pS_1]+[pS_3])[pS_1 \cdot D_{27} \cdot pS_3] \tag{52}$$

$$\frac{d[pS_3 \cdot D_{27} \cdot pS_1]}{dt} = q([S_3 \cdot D_{27} \cdot pS_1] + [pS_3 \cdot D_{27} \cdot S_1]) - [pS_3 \cdot D_{27} \cdot pS_1](k_{3a}^{-} + k_{1b}^{-}) - \beta_{27}[pS_3 \cdot D_{27} \cdot pS_1] - \gamma_{27}([pS_1]+[pS_3])[pS_3 \cdot D_{27} \cdot pS_1] \tag{53}$$

$$\frac{d[pS_1]}{dt} = k_{1a}^{-}([pS_1 \cdot D_{27}] + [pS_1 \cdot D_{27} \cdot S_1] + [pS_1 \cdot D_{27} \cdot pS_1] + [pS_1 \cdot D_{27} \cdot S_3] + [pS_1 \cdot D_{27} \cdot pS_3]) + k_{1b}^{-}([D_{27} \cdot pS_1] + [S_1 \cdot D_{27} \cdot pS_1] + [pS_1 \cdot D_{27} \cdot pS_1] + [S_3 \cdot D_{27} \cdot pS_1] + [pS_3 \cdot D_{27} \cdot pS_1]) - d_1[pS_1] \tag{54}$$

$$\frac{d[pS_3]}{dt} = k_{3a}^{-}([pS_3 \cdot D_{27}] + [pS_3 \cdot D_{27} \cdot S_3] + [pS_3 \cdot D_{27} \cdot pS_3] + [pS_3 \cdot D_{27} \cdot S_1] + [pS_3 \cdot D_{27} \cdot pS_1]) + k_{3b}^{-}([D_{27} \cdot pS_3] + [S_3 \cdot D_{27} \cdot pS_3] + [pS_3 \cdot D_{27} \cdot pS_3] + [S_1 \cdot D_{27} \cdot pS_3] + [pS_1 \cdot D_{27} \cdot pS_3]) - d_3[pS_3] \tag{55}$$

Similarly to the HypIL-6 model, the terms in Equations (23) - (55) involving the parameter $\beta_{27}$ apply only to the model under hypothesis 1 and the terms involving the parameter $\gamma_{27}$ apply only to the model under hypothesis 2. We now describe how we have made use of the experimental data (Supp. Fig. 6b and 6c)
to parameterise the mathematical models described above. Since the experimental outputs are levels of pSTAT1 and pSTAT3 as a function of time under HypIL-6 and IL-27 stimulation (Fig. 6b and 6c supp.), we consider two model outputs of interest for the HypIL-6 and IL-27 mathematical models, which are proportional to the experimental data in Supp. Figure 6b and 6c; namely, the sum of all molecular species (variables) containing phosphorylated STAT1, free or bound ($[pS_1]_{T,j}$, for $j \in \{6,27\}$), and the sum of all species (variables) containing phosphorylated STAT3, free or bound ($[pS_3]_{T,j}$, for $j \in \{6,27\}$). The concentrations of the two model outputs of interest at any time $t$ are given by

$$[pS_1]_{T,6}(t) = [D_6 \cdot pS_1](t) + [pS_1 \cdot D_6 \cdot S_1](t) + 2[pS_1 \cdot D_6 \cdot pS_1](t) + [pS_1 \cdot D_6 \cdot S_3](t) + [pS_1 \cdot D_6 \cdot pS_3](t) + [pS_1](t), \tag{56}$$

$$[pS_3]_{T,6}(t) = [D_6 \cdot pS_3](t) + [pS_3 \cdot D_6 \cdot S_3](t) + 2[pS_3 \cdot D_6 \cdot pS_3](t) + [pS_3 \cdot D_6 \cdot S_1](t) + [pS_3 \cdot D_6 \cdot pS_1](t) + [pS_3](t), \tag{57}$$

for the HypIL-6 model, and by

$$[pS_1]_{T,27}(t) = [pS_1 \cdot D_{27}](t) + [D_{27} \cdot pS_1](t) + [pS_1 \cdot D_{27} \cdot S_1](t) + [S_1 \cdot D_{27} \cdot pS_1](t) + 2[pS_1 \cdot D_{27} \cdot pS_1](t) + [pS_1 \cdot D_{27} \cdot S_3](t) + [S_3 \cdot D_{27} \cdot pS_1](t) + [pS_1 \cdot D_{27} \cdot pS_3](t) + [pS_3 \cdot D_{27} \cdot pS_1](t) + [pS_1](t), \tag{58}$$

$$[pS_3]_{T,27}(t) = [pS_3 \cdot D_{27}](t) + [D_{27} \cdot pS_3](t) + [pS_3 \cdot D_{27} \cdot S_3](t) + [S_3 \cdot D_{27} \cdot pS_3](t) + 2[pS_3 \cdot D_{27} \cdot pS_3](t) + [pS_3 \cdot D_{27} \cdot S_1](t) + [S_1 \cdot D_{27} \cdot pS_3](t) + [pS_1 \cdot D_{27} \cdot pS_3](t) + [pS_3 \cdot D_{27} \cdot pS_1](t) + [pS_3](t), \tag{59}$$

for the IL-27 model.

Having developed two mathematical models for the stimulation of the experimental system with HypIL-6 and IL-27, our objective was then to parameterise these models by means of approximate Bayesian computation sequential Monte Carlo (ABC-SMC). Firstly, a Bayesian model selection was carried out to determine which hypothesis (mechanism) of internalisation/degradation of receptor molecules is most likely given the data.
Once a hypothesis was selected, the ABC-SMC method, together with the experimental data, allows one to obtain posterior distributions for each of the parameter values and initial concentrations in the mathematical models. In this way, we can learn which reactions and parameters in the models cause the differential signaling by pSTAT1 observed when stimulating with HypIL-6 and IL-27.

The experimental data that we compared with the mathematical model outputs were the mean relative fluorescence intensities of total phosphorylated STAT1 and total phosphorylated STAT3 in both RPE1 and Th-1 cells (Supp. Figure 5b and 5c). We normalised the data to obtain dimensionless values, which can be compared with the mathematical model outputs. Firstly, we constructed a linear model for the fluorescence intensity (background fluorescence) of antibodies for phosphorylated STAT1 and STAT3 in unstimulated cells. We subtracted the value of this linear model at each time point from the corresponding fluorescence intensity in HypIL-6 and IL-27 stimulated cells, for each repeat of the experiment and each cell type. Denoting by $f$ the experimental fluorescence intensity, $f(r, i, tp, j, d)$ corresponds to the fluorescence intensity for the $r$th repeat, $r \in R = \{1,2,3,4\}$, with antibody for STAT$i$, $i \in I = \{1,3\}$, at time point $tp \in TP = \{0, 5, 15, 30, 60, 90, 120, 180\}$ min, under stimulation by cytokine IL-$j$ (HypIL-$j$ when $j = 6$), with $j \in J = \{6,27\}$, and in cell type $d \in D = \{\text{RPE1}, \text{Th-1}\}$.
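The background-subtraction step described above can be illustrated with a minimal Python sketch. It assumes that the linear model is a straight-line fit of unstimulated intensity against time, which the text suggests but does not state explicitly; the function name and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def subtract_linear_background(times_min, stimulated, unstimulated):
    """Fit a straight line to the unstimulated intensities over time and
    subtract its value from the stimulated intensities at each time point."""
    times = np.asarray(times_min, dtype=float)
    # Degree-1 least-squares fit: background(t) ~ slope * t + intercept
    # (assumption: the "linear model" of the text is linear in time).
    slope, intercept = np.polyfit(times, np.asarray(unstimulated, dtype=float), 1)
    background = slope * times + intercept
    return np.asarray(stimulated, dtype=float) - background

# Time points from the text (min); intensities below are made-up numbers.
times = [0, 5, 15, 30, 60, 90, 120, 180]
signal = [0, 40, 80, 100, 70, 50, 30, 10]
unstim = [100 + 0.1 * t for t in times]
stim = [u + s for u, s in zip(unstim, signal)]
corrected = subtract_linear_background(times, stim, unstim)
```

In this toy case the unstimulated trace is exactly linear, so the corrected values recover the added signal.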
Each data point $\mathit{data}(r,i,tp,j,d)$ to be used in the Bayesian inference and Bayesian model selection was then computed as

$$\mathit{data}(r,i,tp,j,d) = \frac{f(r,i,tp,j,d)}{f(r,i,tp = 30\ \text{min}, j = 27, d)}.$$

To compare the model output, $\mathit{sim}$, with the data, the output was normalised in the same way as the data, i.e.,

$$\mathit{sim}(i,tp,j,d) = \frac{[pS_i]_{T,j}(tp,d)}{[pS_i]_{T,27}(30\ \text{min},d)},$$

where $[pS_i]_{T,j}(tp,d)$ denotes the total concentration of phosphorylated STAT$i$ at time $tp$ (see Equations 56-59) when considering cell type $d$. In this way, experimental data and the mathematical model outputs are comparable. The similarity between the model output and the data points is then computed by the introduction of a distance measure $\delta(\mathit{sim},\mathit{data})$. Here, this distance measure was chosen as a generalisation of the Euclidean distance, where

$$\delta_d(\mathit{sim},\mathit{data})^2 = \sum_{j \in J} \sum_{tp \in TP} \sum_{i \in I} \big(\mathit{sim}(i,tp,j,d) - \mu_{data}(i,tp,j,d)\big)^2,$$

for $d \in D = \{\text{RPE1}, \text{Th-1}\}$, where $\mu_{data}(i,tp,j,d)$ is the mean of the four repeats of the data and is given by

$$\mu_{data}(i,tp,j,d) = \frac{1}{4} \sum_{r=1}^{4} \mathit{data}(r,i,tp,j,d).$$

To carry out the Bayesian model selection and Bayesian parameter inference, prior beliefs about the parameters were firstly defined. Each of the parameters (reaction rates) and initial concentrations in the model was sampled from a prior distribution, where the distribution was informed by experimental data or values from the literature, when possible. The choice of prior distributions is given in Table 2.

Parameter | Prior distribution | Reference
$r_{1,6}^{+}$ | $10^{r}$ for $r \sim N(-3, 1.5)$ | *
$r_{1,6}^{-}$ | $10^{r}$ for $r \sim N(-3.9, 1.96)$ | *
$r_{1,27}^{+}$ | $10^{r}$ for $r \sim N(-2.34, 1.17)$ | *
$r_{1,27}^{-}$ | $10^{r}$ for $r \sim N(-2.82, 1.41)$ | *
$r_{2,j}^{+}$ for $j \in \{6,27\}$ | $10^{r}$ for $r \sim \mathit{Unif}(-2, 3)$ | (94)
$r_{2,j}^{-}$ for $j \in \{6,27\}$ | $10^{r}$ for $r \sim \mathit{Unif}(-3, 1)$ | (94)
$k_{ia}^{+}, k_{ib}^{+}$ for $i \in \{1,3\}$ | $10^{r}$ for $r \sim \mathit{Unif}(-7, 1)$ | **
$k_{ia}^{-}, k_{ib}^{-}$ for $i \in \{1,3\}$ | $10^{r}$ for $r \sim \mathit{Unif}(-2, 1)$ | **
$q$ | $10^{r}$ for $r \sim \mathit{Unif}(-3, 2)$ | Assumed
$d_i$ for $i \in \{1,3\}$ | $10^{r}$ for $r \sim \mathit{Unif}(-5, -2)$ | ***
$\beta_j$ for $j \in \{6,27\}$ | $10^{r}$ for $r \sim \mathit{Unif}(-5, -1)$ | †
$[R_1(0)]$ | $N(12.7, 6.35)$ | ‡
$[R_2(0)]$ | $N(33.8, 16.9)$ | ‡
$[S_1(0)]$ | $N(300, 100)$ | (95)
$[S_3(0)]$ | $N(400, 100)$ | (95)

Table 2: Prior distributions assigned to each parameter and initial concentration in the model.
* These distributions are centred around measurements obtained from cell surface receptor quantification experiments.
** These distributions were derived based on $K_D$ values obtained from the literature (42).
*** These distributions are based on values derived from experimental data in which the cells were treated with Tofacitinib.
† These distributions were based on values derived from experimental data in which the cells were treated with cycloheximide.
‡ These distributions were based on computations involving approximate cell sizes and average numbers of molecules per cell.

We then used the prior distributions from Table 2 to carry out a Bayesian model selection to determine which hypothesis is most likely given the RPE1 data for both HypIL-6 and IL-27 signaling. We ran $10^{6}$ simulations for each mathematical model (HypIL-6 and IL-27) and for each hypothesis, sampling model parameters from their prior distributions. We then computed a summary statistic for varying values of $\delta_{RPE1,*}$, the distance threshold between the mathematical model and data at which parameters are accepted (or rejected) in the ABC.
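As an illustration of the normalisation and distance measure defined above, the following Python sketch computes the repeat-averaged data and the squared distance for a single cell type. The function names, the flattening of the (STAT, time point, cytokine) index into one axis, and the toy numbers are assumptions made for this example only.

```python
import numpy as np

def normalise(series, reference):
    """Divide a series by a reference value (in the text: the IL-27 value at 30 min)."""
    return np.asarray(series, dtype=float) / float(reference)

def squared_distance(sim, data):
    """Sum of squared differences between the simulation and the repeat-averaged data.

    sim  : shape (n_points,), one entry per (STAT, time point, cytokine) combination
    data : shape (n_repeats, n_points), normalised experimental repeats
    """
    sim = np.asarray(sim, dtype=float)
    mu_data = np.asarray(data, dtype=float).mean(axis=0)  # mean over repeats
    return float(np.sum((sim - mu_data) ** 2))

# Toy example with 2 repeats and 3 index combinations (made-up numbers).
data = np.array([[0.0, 0.5, 1.0],
                 [0.2, 0.7, 1.0]])
sim = normalise([0.2, 1.2, 2.0], reference=2.0)
dist2 = squared_distance(sim, data)
```

Here the simulated values coincide with the mean of the repeats, so the squared distance is zero up to rounding; a parameter set producing such an output would be accepted for any non-negative threshold.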
Finally, we computed $f(H_k)$, the number of accepted parameter sets for hypothesis $k$, where the parameter sets are accepted if they result in a distance value less than or equal to the distance threshold $\delta_{RPE1,*}$. This allowed us to compute the relative probability, $p(H_k)$, of each hypothesis, as defined by the following equation

$$p(H_k \mid \delta_{RPE1,*}) = \frac{f(H_k \mid \delta_{RPE1,*})}{f(H_1 \mid \delta_{RPE1,*}) + f(H_2 \mid \delta_{RPE1,*})},$$

for $k \in \{1,2\}$. The results of the model selection analysis for RPE1 are shown in Figure 2d, where the relative probability of hypothesis 1 increases as $\delta_{RPE1,*}$ tends to 0, whilst the relative probability of hypothesis 2 decreases. We hence concluded that the experimental data together with the mathematical models for HypIL-6 and IL-27 signaling provide greater support to hypothesis 1 (around 70%) than to hypothesis 2 (around 30%). We note that as the distance threshold, $\delta_{RPE1,*}$, is increased, both hypotheses become equally likely, as is to be expected.

Given the results of the model selection, the Bayesian parameter inference for the mathematical models of HypIL-6 and IL-27 signaling was only carried out for hypothesis 1. We used the ABC sequential Monte Carlo (ABC-SMC) approach (96) to obtain posterior distributions for the parameters in Table 1, making use of the prior distributions in Table 2. All model parameters in Table 1 were estimated for the RPE1 data set. A subset of the parameters, which we would expect may vary with cell type, were then estimated for the Th-1 data set.
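The relative-probability calculation used in the model selection described earlier in this section can be sketched as follows. This is a minimal illustration, assuming that the same number of prior simulations was run for each hypothesis (as stated in the text) and that the distances from those simulations are already available as arrays; the function name and the toy distances are hypothetical.

```python
import numpy as np

def relative_probabilities(dist_h1, dist_h2, threshold):
    """Relative probability of two hypotheses at a given distance threshold.

    Counts the simulations of each hypothesis whose distance is less than or
    equal to the threshold and normalises the two counts by their sum.
    Returns None if no parameter set is accepted under either hypothesis.
    """
    n1 = int(np.sum(np.asarray(dist_h1, dtype=float) <= threshold))
    n2 = int(np.sum(np.asarray(dist_h2, dtype=float) <= threshold))
    total = n1 + n2
    if total == 0:
        return None
    return n1 / total, n2 / total

# Toy distances for three simulations per hypothesis (made-up numbers).
dist_h1 = [0.10, 0.20, 0.90]
dist_h2 = [0.15, 0.80, 0.95]
p_small = relative_probabilities(dist_h1, dist_h2, threshold=0.5)
p_large = relative_probabilities(dist_h1, dist_h2, threshold=1.0)
```

With a large threshold every simulation is accepted and both hypotheses receive probability 0.5, which mirrors the remark above that the hypotheses become equally likely as the threshold increases.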
In particular, the parameters not being estimated for Th-1 were sampled from the posterior distributions obtained via the ABC-SMC for RPE1, and those parameters estimated separately for Th-1 were: $q$, $d_1$, $d_3$, $\beta_6$, $\beta_{27}$, $[R_1(0)]$, $[R_2(0)]$, $[S_1(0)]$ and $[S_3(0)]$.

To further validate the two mathematical models of cytokine signaling, we aimed to reproduce additional experimental results making use of the posterior parameter predictions from the RPE1 data ABC-SMC. Firstly, in order to replicate the experimental dose response curve seen in Supp. Fig. 2a, we ran both models using the $10^{4}$ accepted parameter sets from the ABC-SMC for 18 different values of cytokine concentration, within the range $[10^{-4}, 10^{2}]$ log nM. The results of this analysis are seen in Supp. Fig. 12b. We also modified the mathematical models to allow them to describe the IL-27Rα-GP130 chimera experiments (Fig. 3c). In particular, a new mathematical model for the chimera experiments was developed as follows: it consisted of the ODEs from the IL-27 model which are involved in the formation of the dimer (Equations (23) - (26)) and the ODEs from the HypIL-6 model post-dimer formation (Equations (5) - (22)), in which $D_6$ was replaced by $D_{27}$. The ODE for the IL-27 induced dimer in the chimera model was as follows

$$\frac{d[D_{27}]}{dt} = r_{2,27}^{+}[C_2][R_1] - r_{2,27}^{-}[D_{27}] - 2k_{1a}^{+}[D_{27}][S_1] + k_{1a}^{-}([S_1 \cdot D_{27}] + [pS_1 \cdot D_{27}]) - 2k_{3a}^{+}[D_{27}][S_3] + k_{3a}^{-}([S_3 \cdot D_{27}] + [pS_3 \cdot D_{27}]) - \beta_{27}[D_{27}].$$

We simulated both the original mathematical model of IL-27 and the chimera model using the accepted parameter sets from the ABC-SMC. The results can be seen in Supp. Fig. 12a. Finally, we focussed on one of the mutant varieties of IL-27Rα, Y613F, and sought to reproduce the results of Fig. 3b making use of the mathematical model of IL-27 signaling.
Since the mutation decreases the affinity of STAT1 to IL-27Rα, we fixed the association and dissociation rates of STAT1 to the IL-27Rα chain, $k_{1b}^{+}$ and $k_{1b}^{-}$, at values which resulted in a high µM affinity. The specific values chosen were $k_{1b}^{+} = 10^{-4}$ nM⁻¹ s⁻¹ and $k_{1b}^{-} = 10^{1}$ s⁻¹, which yields an affinity of $10^{2}$ µM. The rate $k_{1b}^{-}$ was chosen as approximately the median of the posterior distribution for this parameter from the ABC-SMC, and the rate $k_{1b}^{+}$ was then significantly decreased in order to increase the affinity ($K_D$) value. We simulated the mathematical model of IL-27 signaling using the $10^{4}$ accepted parameter sets from the ABC-SMC, but with the rates $k_{1b}^{+}$ and $k_{1b}^{-}$ fixed as described above. The pointwise medians and 95% credible intervals of these simulations are plotted in Supp. Fig. 12c, as well as the simulations for the WT, without altering any of the parameter values from the posterior distributions. Altering the binding affinity of STAT1 to IL-27Rα in this way in the mathematical model allows us to generate results which replicate reasonably well the experimental observations for the Y613F mutant in Figure 3b.

Live-cell dual-color single-molecule imaging studies: Single molecule imaging experiments were carried out by total internal reflection fluorescence (TIRF) microscopy with an inverted microscope (Olympus IX71) equipped with a triple-line total internal reflection (TIR) illumination condenser (Olympus) and a back-illuminated electron multiplied (EM) CCD camera (iXon DU897D, 512 x 512 pixel, Andor Technology) as recently described (38-40). A 150× magnification objective with a numerical aperture of 1.45 (UAPO 150×/1.45 TIRFM, Olympus) was used for TIR illumination. All experiments were carried out at room temperature in medium without phenol red supplemented with an oxygen scavenger and a redox-active photoprotectant to minimize photobleaching (97).
For heterodimerization experiments of IL-27Ra and GP130, cell surface labeling of RPE1 GP130 KO cells co-transfected with mXFPe-IL-27Ra and mXFPm-GP130 was achieved by adding aGFP-enNBRHO11 and aGFP-miNBDY647 to the medium at equal concentrations (5 nM) and incubating for at least 5 min prior to stimulation with IL-27 (20 nM) or HypIL-6 (20 nM). For homodimerization experiments with mXFPm-GP130, aGFP-miNBDY647 and aGFP-miNBRHO11 (98) were used for cell surface receptor labelling as described above. The nanobodies were kept in the bulk solution during the whole experiment in order to ensure high equilibrium binding to mXFP-GP130. For simultaneous dual color acquisition, aGFP-NBRHO11 was excited by a 561 nm diode-pumped solid-state laser at 0.95 mW (~32 W/cm²) and aGFP-NBDY647 by a 642 nm laser diode at 0.65 mW (~22 W/cm²). Fluorescence was detected using a spectral image splitter (DualView, Optical Insight) with a 640 DCXR dichroic beam splitter (Chroma) in combination with the bandpass filters 585/40 (Semrock) for detection of RHO11 and 690/70 (Chroma) for detection of DY647, dividing each emission channel into 512 × 256 pixels. Image stacks of 150 frames were recorded at 32 ms/frame.

Single molecule localization and single molecule tracking were carried out using the multiple-target tracing (MTT) algorithm (99) as described previously (100). Step-length histograms were obtained from single molecule trajectories and fitted by a two-fraction mixture model of Brownian diffusion.
Average diffusion constants were determined from the slope (2-10 steps) of the mean square displacement versus time lapse diagrams. Immobile molecules were identified by the density-based spatial clustering of applications with noise (DBSCAN) algorithm as described recently (101). For comparing diffusion properties and for co-tracking analysis, immobile particles were excluded from the data set.

Prior to co-localization analysis, imaging channels were aligned with sub-pixel precision by using a spatial transformation. To this end, a transformation matrix was calculated based on a calibration measurement with multicolour fluorescent beads (TetraSpeck microspheres 0.1 µm, Invitrogen) visible in both spectral channels (cp2tform of type 'affine', The MathWorks MATLAB 2009a). Individual molecules detected in both spectral channels were regarded as co-localized if a particle was detected in both channels of a single frame within a distance threshold of 100 nm radius. For single molecule co-tracking analysis, the MTT algorithm was applied to this dataset of co-localized molecules to reconstruct co-locomotion trajectories (co-trajectories) from the identified population of co-localizations. For the co-tracking analysis, only trajectories with a minimum of 10 steps (~320 ms) were considered in order to robustly remove random receptor co-localizations (39). For heterodimerization experiments of mXFPe-IL-27Ra and mXFPm-GP130, the relative fraction of dimerized receptors was calculated from the number of co-trajectories relative to the number of IL-27Ra trajectories. GP130 was expressed in moderate excess (~1.5-2 fold), so that maximal receptor assembly was not limited by abundance of the low-affinity subunit GP130.
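To make the diffusion analysis mentioned above more concrete, the following Python sketch estimates a diffusion constant from the slope of the mean square displacement (MSD) over time lags of 2 to 10 steps. It is not the MTT-based pipeline used in the study; it assumes free two-dimensional Brownian motion, for which MSD = 4Dt, and the function names and toy trajectory are hypothetical.

```python
import numpy as np

def mean_square_displacement(track, max_lag):
    """MSD of one 2D trajectory (shape (n_frames, 2)) for lags 1..max_lag."""
    track = np.asarray(track, dtype=float)
    msd = []
    for lag in range(1, max_lag + 1):
        disp = track[lag:] - track[:-lag]          # displacements at this lag
        msd.append(np.mean(np.sum(disp ** 2, axis=1)))
    return np.array(msd)

def diffusion_constant(msd, frame_time_s, first_lag=2, last_lag=10):
    """Diffusion constant from the MSD slope between first_lag and last_lag.

    Assumes free 2D Brownian diffusion, so that slope = 4 * D.
    """
    lags = np.arange(first_lag, last_lag + 1)
    slope, _ = np.polyfit(lags * frame_time_s, msd[first_lag - 1:last_lag], 1)
    return slope / 4.0

# Toy check of the MSD routine on a particle moving one unit per frame along x.
toy_msd = mean_square_displacement([[0, 0], [1, 0], [2, 0], [3, 0]], max_lag=3)

# Synthetic MSD curve for D = 0.1 um^2/s at 32 ms/frame with a constant offset.
frame_time = 0.032
lag_times = np.arange(1, 11) * frame_time
d_est = diffusion_constant(4 * 0.1 * lag_times + 0.01, frame_time)
```

Fitting the slope rather than a single MSD value makes the estimate insensitive to a constant offset such as that introduced by localization error.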
For homodimerization experiments with GP130, the relative fraction of co-tracked molecules was determined with respect to the absolute number of trajectories and corrected for GP130 stochastically double-labelled with the same fluorophore species as follows:

$$AB^{*} = \frac{AB}{2 \times \left[\left(\frac{A}{A+B}\right) \times \left(\frac{B}{A+B}\right)\right]}, \qquad \text{rel. co-locomotion} = \frac{2 \times AB^{*}}{(A+B)},$$

where A, B, AB and AB* are the numbers of trajectories observed for Rho11, DY647, co-trajectories and corrected co-trajectories, respectively.

The two-dimensional equilibrium dissociation constants ($K_D^{2D}$) were calculated according to the law of mass action for a monomer-dimer equilibrium:

Heterodimerization (IL-27Ra+GP130):

$$K_D^{2D} = \frac{\big([GP130] - (\alpha \times [IL27Ra])\big) \times \big([IL27Ra] - (\alpha \times [IL27Ra])\big)}{(\alpha \times [IL27Ra])}$$

or

$$K_D^{2D} = [GP130] \times \left(\frac{1}{\alpha} - 1\right) + [IL27Ra] \times (\alpha - 1)$$

with: $\alpha$ = fraction of IL27-bound IL27Ra in complex with GP130.

Homodimerization (GP130+GP130):

$$K_D^{2D} = \frac{[M]^2}{[D]} = \frac{([M]_0 - 2[D])^2}{[D]}$$

$$K_D^{2D} = \frac{\big([GP130] - 2 \times (\alpha \times [GP130])\big)^2}{2 \times (\alpha \times [GP130])}$$

with: $\alpha$ = fraction of GP130 homodimers relative to [GP130]/2,

where [M] and [D] are the concentrations of the monomer and the dimer, respectively, and [M]0 is the total receptor concentration.
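The mass-action expressions above can be checked numerically with a short Python sketch. The function names and the example densities are hypothetical; the heterodimer function implements the first heterodimerization expression, and the homodimer function implements the general monomer-dimer form $([M]_0 - 2[D])^2/[D]$ rather than the expression written in terms of $\alpha$.

```python
def kd_2d_heterodimer(gp130, il27ra, alpha):
    """2D dissociation constant for IL-27Ra + GP130.

    gp130, il27ra : total receptor densities (same units, e.g. molecules per um^2)
    alpha         : fraction of IL-27Ra found in complex with GP130 (0 < alpha <= 1)
    """
    dimer = alpha * il27ra                      # density of heterodimers
    return (gp130 - dimer) * (il27ra - dimer) / dimer

def kd_2d_homodimer(total, dimer):
    """2D dissociation constant for a monomer-dimer equilibrium.

    total : total receptor density [M]0
    dimer : dimer density [D]; the monomer density is [M]0 - 2[D]
    """
    monomer = total - 2.0 * dimer
    return monomer ** 2 / dimer

# Made-up densities, for illustration only.
kd_het = kd_2d_heterodimer(gp130=2.0, il27ra=1.0, alpha=0.5)
kd_het_alt = 2.0 * (1 / 0.5 - 1) + 1.0 * (0.5 - 1)   # second form of the same expression
kd_hom = kd_2d_homodimer(total=1.0, dimer=0.25)
```

For these inputs the two heterodimerization expressions agree, which is a quick consistency check on the algebra linking them.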
; https://doi.org/10.1101/2021.01.08.425379doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425379 http://creativecommons.org/licenses/by/4.0/ 40 References: 1. J. J. O'Shea, R. Plenge, JAK and STAT signaling molecules in immunoregulation and immune-mediated disease. Immunity 36, 542-550 (2012). 2. S. Pflanz et al., IL-27, a heterodimeric cytokine composed of EBI3 and p28 protein, induces proliferation of naive CD4+ T cells. Immunity 16, 779-790 (2002). 3. H. Yoshida, C. A. Hunter, The immunobiology of interleukin-27. Annu Rev Immunol 33, 417-443 (2015). 4. J. S. Stumhofer et al., Interleukin 27 negatively regulates the development of interleukin 17-producing T helper cells during chronic inflammation of the central nervous system. Nat Immunol 7, 937-945 (2006). 5. C. Diveu et al., IL-27 blocks RORc expression to inhibit lineage commitment of Th17 cells. J Immunol 182, 5748-5756 (2009). 6. D. C. Fitzgerald et al., Suppression of autoimmune inflammation of the central nervous system by interleukin 10 secreted by interleukin 27-stimulated T cells. Nat Immunol 8, 1372-1379 (2007). 7. J. S. Stumhofer et al., Interleukins 27 and 6 induce STAT3-mediated T cell production of interleukin 10. Nat Immunol 8, 1363-1371 (2007). 8. C. Pot, L. Apetoh, A. Awasthi, V. K. Kuchroo, Induction of regulatory Tr1 cells and inhibition of T(H)17 cells by IL-27. Semin Immunol 23, 438-445 (2011). 9. M. J. Boulanger, D. C. Chow, E. E. Brevnova, K. C. Garcia, Hexameric structure and assembly of the interleukin-6/IL-6 alpha-receptor/gp130 complex. Science 300, 2101- 2104 (2003). 10. S. Rose-John, Interleukin-6 Family Cytokines. Cold Spring Harb Perspect Biol 10, (2018). 11. C. A. Hunter, S. A. Jones, IL-6 as a keystone cytokine in health and disease. Nature Immunology 16, 448-457 (2015). 12. T. Korn et al., IL-6 controls Th17 immunity in vivo by inhibiting the conversion of conventional T cells into Foxp3+ regulatory T cells. Proc Natl Acad Sci U S A 105, 18460-18465 (2008). 13. A. 
Kimura, T. Kishimoto, IL-6: regulator of Treg/Th17 balance. Eur J Immunol 40, 1830-1835 (2010). 14. G. W. Jones et al., Loss of CD4+ T cell IL-6R expression during inflammation underlines a role for IL-6 trans signaling in the local maintenance of Th17 cells. J Immunol 184, 2130-2139 (2010). 15. C. Rolvering et al., Crosstalk between different family members: IL27 recapitulates IFN gamma responses in HCC cells, but is inhibited by IL6-type cytokines. Bba-Mol Cell Res 1864, 516-526 (2017). 16. A. P. Costa-Pereira et al., Mutational switch of an IL-6 response to an interferon- gamma-like response. P Natl Acad Sci USA 99, 8043-8047 (2002). 17. J. Schmitz, M. Weissenbach, S. Haan, P. C. Heinrich, F. Schaper, SOCS3 exerts its inhibitory function on interleukin-6 signal transduction through the SHP2 recruitment site of gp130. Journal of Biological Chemistry 275, 12848-12856 (2000). 18. H. Yasukawa et al., IL-6 induces an anti-inflammatory response in the absence of SOCS3 in macrophages. Nat Immunol 4, 551-556 (2003). 19. B. A. Croker et al., SOCS3 negatively regulates IL-6 signaling in vivo. Nat Immunol 4, 540-545 (2003). 20. C. Brender et al., Suppressor of cytokine signaling 3 regulates CD8 T-cell proliferation by inhibition of interleukins 6 and 27. Blood 110, 2528-2536 (2007). .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425379doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425379 http://creativecommons.org/licenses/by/4.0/ 41 21. A. Camporeale, V. Poli, IL-6, IL-17 and STAT3: a holy trinity in auto-immunity? Front Biosci (Landmark Ed) 17, 2306-2326 (2012). 22. G. Regis, S. Pensa, D. Boselli, F. Novelli, V. Poli, Ups and downs: the STAT1:STAT3 seesaw of Interferon and gp130 receptor signalling. 
Semin Cell Dev Biol 19, 351-359 (2008). 23. S. Lucas, N. Ghilardi, J. Li, F. J. de Sauvage, IL-27 regulates IL-12 responsiveness of naive CD4(+) T cells through Stat1-dependent and -independent mechanisms. P Natl Acad Sci USA 100, 15047-15052 (2003). 24. S. Kamiya et al., An indispensable role for STAT1 in IL-27-induced T-bet expression but not proliferation of naive CD4(+) T cells. Journal of Immunology 173, 3871-3877 (2004). 25. A. Takeda et al., Cutting edge: Role of IL-27/WSX-1 signaling for induction of T-Bet through activation of STAT1 during initial Th1 commitment. Journal of Immunology 170, 4886-4890 (2003). 26. C. Neufert et al., IL-27 controls the development of inducible regulatory T cells and Th17 cells via differential effects on STAT1. Eur J Immunol 37, 1809-1816 (2007). 27. T. Owaki et al., STAT3 is indispensable to IL-27-mediated cell proliferation but not to IL-27-induced Th1 differentiation and suppression of proinflammatory cytokine production. Journal of Immunology 180, 2903-2911 (2008). 28. K. Hirahara et al., Asymmetric Action of STAT Transcription Factors Drives Transcriptional Outputs and Cytokine Specificity. Immunity 42, 877-889 (2015). 29. S. Oniki et al., Interleukin-23 and interleukin-27 exert quite different antitumor and vaccine effects on poorly immunogenic melanoma. Cancer Res 66, 6395-6404 (2006). 30. M. Fischer et al., I. A bioactive designer cytokine for human hematopoietic progenitor cell expansion. Nat Biotechnol 15, 142-145 (1997). 31. H. H. Oberg, D. Wesch, S. Grussel, S. Rose-John, D. Kabelitz, Differential expression of CD126 and CD130 mediates different STAT-3 phosphorylation in CD4+CD25- and CD25high regulatory T cells. Int Immunol 18, 555-563 (2006). 32. P. O. Krutzik, M. R. Clutter, A. Trejo, G. P. Nolan, Fluorescent cell barcoding for multiplex flow cytometry. Curr Protoc Cytom Chapter 6, Unit 6 31 (2011). 33. U. A. Betz, W. 
Muller, Regulated expression of gp130 and IL-6 receptor alpha chain in T cell maturation and activation. Int Immunol 10, 1175-1184 (1998). 34. J. Martinez-Fabregas et al., Kinetics of cytokine receptor trafficking determine signaling and functional selectivity. Elife 8, (2019). 35. C. Gorby et al., Engineered IL-10 variants elicit potent immunomodulatory effects at low ligand doses. Sci Signal 13, (2020). 36. V. Ruprecht, Weghuber, J., Wieser, S., Schütz, G. J, in Advances in Planar Lipid Bilayers and Liposomes. (2010), vol. 12,, pp. 21-40. 37. I. Moraga et al., Instructive roles for cytokine-receptor binding parameters in determining signaling and functional potency. Science Signaling 8, (2015). 38. S. Wilmes et al., Receptor dimerization dynamics as a regulatory valve for plasticity of type I interferon signaling. J Cell Biol 209, 579-593 (2015). 39. S. Wilmes et al., Mechanism of homodimeric cytokine receptor activation and dysregulation by oncogenic mutations. Science 367, 643-652 (2020). 40. I. Moraga et al., Tuning Cytokine Receptor Signaling by Re-orienting Dimer Geometry with Surrogate Ligands. Cell 160, 1196-1208 (2015). 41. S. Pflanz et al., WSX-1 and glycoprotein 130 constitute a signal-transducing receptor for IL-27. J Immunol 172, 2225-2231 (2004). .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425379doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425379 http://creativecommons.org/licenses/by/4.0/ 42 42. M. Wiederkehr-Adam et al., Characterization of phosphopeptide motifs specific for the Src homology 2 domains of signal transducer and activator of transcription 1 (STAT1) and STAT3. J Biol Chem 278, 16117-16128 (2003). 43. A. Pradhan, Q. T. Lambert, L. N. Griner, G. W. 
Reuther, Activation of JAK2-V617F by components of heterodimeric cytokine receptors. J Biol Chem 285, 16651-16663 (2010). 44. H. Kim, T. S. Hawley, R. G. Hawley, H. Baumann, Protein tyrosine phosphatase 2 (SHP-2) moderates signaling by gp130 but is not required for the induction of acute- phase plasma protein genes in hepatic cells. Mol Cell Biol 18, 1525-1533 (1998). 45. D. W. Huang, B. T. Sherman, R. A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44-57 (2009). 46. J. Bancerek et al., CDK8 kinase phosphorylates transcription factor STAT1 to selectively regulate the interferon response. Immunity 38, 250-262 (2013). 47. S. Rutz et al., Deubiquitinase DUBA is a post-translational brake on interleukin-17 production in T cells. Nature 518, 417-421 (2015). 48. K. L. O'Hagan, S. D. Miller, H. Phee, Pak2 is essential for the function of Foxp3+regulatory T cells through maintaining a suppressive Treg phenotype. Sci Rep- Uk 7, (2017). 49. D. Z. Ye, J. Field, PAK signaling in cancer. Cell Logist 2, 105-116 (2012). 50. Y. Liao, J. Wang, E. J. Jaehnig, Z. Shi, B. Zhang, WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res 47, W199-W205 (2019). 51. J. Satoh, H. Tabunoki, A Comprehensive Profile of ChIP-Seq-Based STAT1 Target Genes Suggests the Complexity of STAT1-Mediated Gene Regulatory Mechanisms. Gene Regul Syst Bio 7, 41-56 (2013). 52. I. Rusinova et al., Interferome v2.0: an updated database of annotated interferon- regulated genes. Nucleic Acids Res 41, D1040-1046 (2013). 53. H. N. Suh et al., Role of interleukin-6 in the control of DNA synthesis of hepatocytes: involvement of PKC, p44/42 MAPKs, and PPARdelta. Cell Physiol Biochem 22, 673- 684 (2008). 54. A. V. Villarino et al., IL-27 limits IL-2 production during Th1 differentiation. J Immunol 176, 237-247 (2006). 55. K. 
Hirahara et al., Interleukin-27 Priming of T Cells Controls IL-17 Production In trans via Induction of the Ligand PD-L1. Immunity 36, 1017-1030 (2012). 56. X. Hu et al., Sensitization of IFN-gamma Jak-STAT signaling during macrophage activation. Nat Immunol 3, 859-866 (2002). 57. V. Francois-Newton, M. Livingstone, B. Payelle-Brogard, G. Uze, S. Pellegrini, USP18 establishes the transcriptional and anti-proliferative interferon alpha/beta differential. Biochem J 446, 509-516 (2012). 58. K. Zenke, M. Muroi, K. I. Tanamoto, IRF1 supports DNA binding of STAT1 by promoting its phosphorylation. Immunol Cell Biol 96, 1095-1103 (2018). 59. K. Karwacz et al., Critical role of IRF1 and BATF in forming chromatin landscape during type 1 regulatory cell differentiation. Nat Immunol 18, 412-421 (2017). 60. A. Yoshimura, Y. Wakabayashi, T. Mori, Cellular and molecular basis for the regulation of inflammation by TGF-beta. J Biochem 147, 781-792 (2010). 61. A. Awasthi et al., A dominant function for interleukin 27 in generating interleukin 10- producing anti-inflammatory T cells. Nat Immunol 8, 1380-1389 (2007). 62. J. B. Brown et al., P-selectin glycoprotein ligand-1 is needed for sequential recruitment of T-helper 1 (Th1) and local generation of Th17 T cells in dextran sodium sulfate (DSS) colitis. Inflamm Bowel Dis 18, 323-332 (2012). .CC-BY 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425379doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425379 http://creativecommons.org/licenses/by/4.0/ 43 63. M. Matsumoto et al., CD43 collaborates with P-selectin glycoprotein ligand-1 to mediate E-selectin-dependent T cell migration into inflamed skin. J Immunol 178, 2499-2506 (2007). 64. D. N. 
Slenter et al., WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res 46, D661-D667 (2018).
65. A. Petretto et al., Proteomic analysis uncovers common effects of IFN-gamma and IL-27 on the HLA class I antigen presentation machinery in human cancer cells. Oncotarget 7, 72518-72536 (2016).
66. L. H. Wong, I. Hatzinisiriou, R. J. Devenish, S. J. Ralph, IFN-gamma priming up-regulates IFN-stimulated gene factor 3 (ISGF3) components, augmenting responsiveness of IFN-resistant melanoma cells to type I IFNs. J Immunol 160, 5475-5484 (1998).
67. M. Tokuyama et al., ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. Proc Natl Acad Sci U S A 115, 12565-12572 (2018).
68. C. Garbers et al., Plasticity and cross-talk of interleukin 6-type cytokines. Cytokine Growth Factor Rev 23, 85-97 (2012).
69. S. Kang, M. Narazaki, H. Metwally, T. Kishimoto, Historical overview of the interleukin-6 family cytokine. J Exp Med 217, (2020).
70. R. Umeshita-Suyama et al., Characterization of IL-4 and IL-13 signals dependent on the human IL-13 receptor alpha chain 1: redundancy of requirement of tyrosine residue for STAT3 activation. Int Immunol 12, 1499-1509 (2000).
71. O. W. Nadeau et al., The proximal tyrosines of the cytoplasmic domain of the beta chain of the type I interferon receptor are essential for signal transducer and activator of transcription (Stat) 2 activation. Evidence that two Stat2 sites are required to reach a threshold of interferon alpha-induced Stat2 tyrosine phosphorylation that allows normal formation of interferon-stimulated gene factor 3. J Biol Chem 274, 4045-4052 (1999).
72. M. N. Sharif et al., IFN-alpha priming results in a gain of proinflammatory function by IL-10: implications for systemic lupus erythematosus pathogenesis. J Immunol 172, 6476-6481 (2004).
73. D.
Richter et al., Ligand-induced type II interleukin-4 receptor dimers are sustained by rapid re-association within plasma membrane microcompartments. Nat Commun 8, 15976 (2017).
74. J. P. Twohig et al., Activation of naive CD4(+) T cells re-tunes STAT1 signaling to deliver unique cytokine responses in memory CD4(+) T cells. Nat Immunol 20, 458-470 (2019).
75. P. C. Heinrich et al., Principles of interleukin (IL)-6-type cytokine signalling and its regulation. Biochem J 374, 1-20 (2003).
76. D. Levin, D. Harari, G. Schreiber, Stochastic receptor expression determines cell fate upon interferon treatment. Mol Cell Biol 31, 3252-3266 (2011).
77. I. Moraga, D. Harari, G. Schreiber, G. Uze, S. Pellegrini, Receptor density is key to the alpha2/beta interferon differential activities. Mol Cell Biol 29, 4778-4787 (2009).
78. C. C. M. Ho et al., Decoupling the Functional Pleiotropy of Stem Cell Factor by Tuning c-Kit Signaling. Cell 168, 1041-1052 e1018 (2017).
79. P. Charlot-Rabiega, E. Bardel, C. Dietrich, R. Kastelein, O. Devergne, Signaling events involved in interleukin 27 (IL-27)-induced proliferation of human naive CD4+ T cells and B cells. J Biol Chem 286, 27350-27362 (2011).
80. J. Diegelmann, T. Olszak, B. Goke, R. S. Blumberg, S. Brand, A Novel Role for Interleukin-27 (IL-27) as Mediator of Intestinal Epithelial Barrier Protection Mediated via Differential Signal Transducer and Activator of Transcription (STAT) Protein Signaling and Induction of Antibacterial and Anti-inflammatory Proteins. Journal of Biological Chemistry 287, 286-298 (2012).
81.
H. Bender et al., Interleukin-27 displays interferon-gamma-like functions in human hepatoma cells and hepatocytes. Hepatology 50, 585-591 (2009).
82. T. Imamichi, J. Yang, W. Huang da, B. Sherman, R. A. Lempicki, Interleukin-27 induces interferon-inducible genes: analysis of gene expression profiles using Affymetrix microarray and DAVID. Methods Mol Biol 820, 25-53 (2012).
83. J. M. Fakruddin et al., Noninfectious papilloma virus-like particles inhibit HIV-1 replication: implications for immune control of HIV-1 infection by IL-27. Blood 109, 1841-1849 (2007).
84. A. C. Frank et al., Interleukin-27, an anti-HIV-1 cytokine, inhibits replication of hepatitis C virus. J Interferon Cytokine Res 30, 427-431 (2010).
85. S. L. LaPorte et al., Molecular and structural basis of cytokine receptor pleiotropy in the interleukin-4/13 system. Cell 132, 259-272 (2008).
86. J. B. Spangler, I. Moraga, K. M. Jude, C. S. Savvides, K. C. Garcia, A strategy for the selection of monovalent antibodies that span protein dimer interfaces. J Biol Chem 294, 13876-13886 (2019).
87. A. Kirchhofer et al., Modulation of protein properties in living cells using nanobodies. Nat Struct Mol Biol 17, 133-138 (2010).
88. M. C. Hochberg, Updating the American College of Rheumatology revised criteria for the classification of systemic lupus erythematosus. Arthritis Rheum 40, 1725 (1997).
89. J. Cox, M. Mann, MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26, 1367-1372 (2008).
90. J. Cox et al., Andromeda: a peptide search engine integrated into the MaxQuant environment. J Proteome Res 10, 1794-1805 (2011).
91. P. O. Krutzik, G. P. Nolan, Fluorescent cell barcoding in flow cytometry allows high-throughput drug screening and signaling profiling. Nat Methods 3, 361-368 (2006).
92. W. Huang da, B. T. Sherman, R. A.
Lempicki, Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37, 1-13 (2009).
93. W. Huang da, B. T. Sherman, R. A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44-57 (2009).
94. N. Kozer et al., Exploring higher-order EGFR oligomerisation and phosphorylation--a combined experimental and theoretical approach. Mol Biosyst 9, 1849-1863 (2013).
95. D. N. Itzhak, S. Tyanova, J. Cox, G. H. Borner, Global, quantitative and dynamic mapping of protein subcellular localization. Elife 5, (2016).
96. T. Toni, D. Welch, N. Strelkowa, A. Ipsen, M. P. Stumpf, Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6, 187-202 (2009).
97. J. Vogelsang et al., A reducing and oxidizing system minimizes photobleaching and blinking of fluorescent dyes. Angew Chem Int Ed Engl 47, 5465-5469 (2008).
98. A. Kirchhofer et al., Modulation of protein properties in living cells using nanobodies. Nat Struct Mol Biol 17, 133-U162 (2010).
99. A. Serge, N. Bertaux, H. Rigneault, D. Marguet, Dynamic multiple-target tracing to probe spatiotemporal cartography of cell membranes. Nat Methods 5, 687-694 (2008).
100. C. You et al., Receptor dimer stabilization by hierarchical plasma membrane microcompartments regulates cytokine signaling. Sci Adv 2, e1600452 (2016).
101. F. Roder, A. Lubk, D. Wolf, T. Niermann, Noise estimation for off-axis electron holography. Ultramicroscopy 144, 32-42 (2014).

FIGURE LEGENDS:

Figure 1: Cytokine receptor activation by IL-27 and (Hyp)IL-6.
a) Cartoon model of stepwise assembly of the IL-27 and HypIL-6-induced receptor complex and subsequent activation of STAT1 and STAT3.
b) Dose-dependent phosphorylation of STAT1 and STAT3 as a response to IL-27 and HypIL-6 stimulation in Th-1 cells, normalized to maximal IL-27 stimulation. Data was obtained from three biological replicates with two technical replicates each, showing mean ± std dev.
c) Phosphorylation kinetics of STAT1 and STAT3 followed after stimulation with saturating concentrations of IL-27 (2nM) and HypIL-6 (20nM) or unstimulated Th-1 cells, normalized to maximal IL-27 stimulation. Data was obtained from five biological replicates with two technical replicates each, showing mean ± std dev.
d) Top: Phosphorylation kinetics of STAT1 and STAT3 followed after stimulation with HypIL-6 (20nM) or left unstimulated, comparing wt RPE1 and RPE1 GP130KO reconstituted with high levels of mXFPm-GP130 (=10x [GP130]). Data was normalized to maximal stimulation levels of wt RPE1. Left: cell surface GP130 levels comparing RPE1 GP130KO, wt RPE1 and RPE1 GP130KO stably expressing mXFPm-GP130 measured by flow cytometry. Data was obtained from one biological replicate with two technical replicates, showing mean ± std dev. Bottom right: cell surface levels of GP130 measured by flow cytometry for indicated cell lines.
e) Cartoon model of cell surface labeling of mXFP-tagged receptors by dye-conjugated anti-GFP nanobodies (NB) and identification of receptor dimers by single molecule dual-colour co-localization.
f) Raw data of dual-colour single-molecule TIRF imaging of mXFPe-IL-27RαNB-RHO11 and GP130NB-DY649 after stimulation with IL-27. Particles from the insets (IL-27Rα: red & GP130: blue) were followed by single molecule tracking (150 frames ~ 4.8s) and trajectories >10 steps (320ms) are displayed. Receptor heterodimerization was detected by co-localization/co-tracking analysis.
g) Relative number of co-trajectories observed for heterodimerization of IL-27Rα and GP130 as well as homodimerization of GP130 for unstimulated cells or after indicated cytokine stimulation. Each data point represents the analysis from one cell with a minimum of 23 cells measured for each condition. *P < 0.05, **P ≤ 0.01, ***P ≤ 0.001; n.s., not significant.
h) Stoichiometry of the IL-27–induced receptor complex revealed by bleaching analysis. Left: Intensity traces of mXFPe-IL27RαNB-RHO11 and GP130NB-DY649 were followed until fluorophore bleaching. Middle: Merged imaging raw data for selected timepoints. Right: overlay of the trajectories for IL-27Rα (red) and GP130 (blue).

Figure 2: Mathematical modelling results in RPE1 and Th-1 cells.
a) Simplified cartoon model of IL-27/HypIL-6 signal propagation layers and coverage of the mathematical modelling approach.
b) Model selection results showing the relative probabilities of each hypothesis, for different values of the distance threshold, 𝛿∗, in RPE1 cells.
c) Pointwise median and 95% credible intervals of the predictions from the mathematical model, calibrated with the experimental data, using the posterior distributions for the parameters from the ABC-SMC.
d) Kernel density estimates of the posterior distributions for the parameters 𝑝 ∈ {𝑟#,. & ,𝑟#,. , ,𝑟",. & ,𝑟",. , ,𝑘$% & ,𝑘$% , ,𝑘$' & ,𝑘$' , ,𝑞,𝑑$,𝛽., [𝑅#(0)],[𝑅"(0)],[𝑆#(0)],[𝑆((0)]} in the mathematical models where 𝑗 ∈ {6,27} and 𝑖 ∈ {1,3}.

Figure 3: IL-27Rα cytoplasmic domain is required for sustained pSTAT1 kinetics.
a) Representation of the cytoplasmic domain of IL-27Rα with its highlighted tyrosine residues Y543 and Y613.
b) STAT1 and STAT3 phosphorylation kinetics of RPE1 clones stably expressing wt and mutant IL-27Rα after stimulation with IL-27 (10 nM, top panels) or after stimulation with HypIL-6 (20 nM, bottom panels), normalized to maximal levels of wt IL-27Rα stimulated with IL-27 (top) or HypIL-6 (bottom). Data was obtained from three experiments with two technical replicates each, showing mean ± std dev. Bottom right: cell surface levels of IL-27Rα variants measured by flow cytometry for the indicated cell lines.
c) Cytoplasmic domain of IL-27Rα is required for sustained pSTAT1 activation. Left: Cartoon representation of receptor complexes. Right: STAT1 and STAT3 phosphorylation kinetics of RPE1 clones stably expressing wt IL-27Rα and IL-27Rα-GP130 chimera after stimulation with IL-27 (10 nM, top panels) or after stimulation with HypIL-6 (20 nM, bottom panels). Data was normalized to maximal levels for each cytokine and cell line. Data was obtained from two experiments with two technical replicates each, showing mean ± std dev.
d) Phosphatases do not account for differential pSTAT1/3 activity induced by IL-27 and HypIL-6. Left: Schematic representation of workflow using JAK inhibitor Tofacitinib.
Right: MFI ratio of Tofacitinib-treated and non-treated RPE1 mXFPe-IL-27Rα cells for pSTAT1 and pSTAT3 after stimulation with IL-27 (10nM) and HypIL-6 (20nM). Data was obtained from two experiments with two technical replicates each, showing mean ± std dev.

Figure 4: Unique and overlapping effects of IL-27 and HypIL-6 on the phosphoproteome of Th-1 cells.
a) Volcano plot of the phospho-sites regulated (p value ≤ 0.05, fold change ≥ +1.5 or ≤ -1.5) by IL-27 (left) and HypIL-6 (right). Data was obtained from three biological replicates.
b) Heatmap representation (examples) of shared and differentially up- (left) and downregulated (right) phospho-sites after IL-27 and HypIL-6 stimulation. Data represents the mean (log2) fold change of three biological replicates.
c) Tyrosine and Serine phosphorylation of selected STAT proteins after stimulation with IL-27 (red) and HypIL-6 (blue). *P < 0.05, **P ≤ 0.01, ***P ≤ 0.001; n.s., not significant.
d) pS727-STAT1 and pS727-STAT3 phosphorylation kinetics in Th-1 cells after stimulation with IL-27 or HypIL-6, normalized to maximal IL-27 stimulation. Data was obtained from three biological replicates with two technical replicates each, showing mean ± std dev.
e) GO analysis “biological processes” of the phospho-sites regulated by IL-27 (red) and HypIL-6 (blue) represented as bubble-plots.
f) Phosphorylation of target proteins associated with STAT3/CDK transcription initiation complex after stimulation with IL-27 (blue) and HypIL-6 (red) and schematic representation of transcription regulation of RNA polymerase II with identified phospho-sites (red flags).

Figure 5: Kinetic decoupling of gene induction programs depends on sustained STAT1 activation by IL-27.
a) Principal component analysis for genes found to be significantly upregulated (left) or downregulated (right) for at least one of the tested conditions (time & cytokine). Data was obtained from three biological replicates.
b) Kinetics of gene induction shared between IL-27 and HypIL-6 (relative to IL-27) for upregulated genes (red) or downregulated genes (green).
c) Kinetics of gene numbers induced after IL-27 and HypIL-6 stimulation for upregulated genes (left) and downregulated genes (right).
d) GSEA reactome analysis of selected pathways with significantly altered gene induction in response to IL-27 or HypIL-6 stimulation. Data represents the mean (log2) fold change of three biological replicates.
e) Cluster analysis comparing the gene induction kinetics after IL-27 or HypIL-6 stimulation. Gene induction heatmaps for example genes as well as induction kinetics (mean) are shown for highlighted gene clusters. Data represents the mean (log2) fold change of three biological replicates.

Figure 6: IL-27-induced upregulation of IRF1 amplifies induction of STAT1-dependent genes.
a) Kinetics of IRF1 protein expression as a response to continuous IL-27 and HypIL-6 stimulation in Th-1 cells. Data was obtained from three biological replicates with two technical replicates each, showing mean ± std dev. Dotted line indicates baseline level.
b) Kinetics of IRF1 protein expression and siRNA-mediated IRF1 knockdown in RPE1 IL-27Rα cells stimulated with IL-27 (2nM). Data was obtained from one representative experiment with two technical replicates, normalized to maximal IRF1 induction (6h), showing mean ± std dev.
c) Kinetics of STAT1 (left) and STAT3 (right) phosphorylation after siRNA-mediated IRF1 knockdown in RPE1 IL-27Rα cells stimulated with IL-27 (2nM). Data was obtained from one representative experiment with two technical replicates, showing mean ± std dev.
d) Kinetics of gene induction (STAT1, GBP5, OAS1, SOCS3) followed by RT qPCR in RPE1 IL-27Rα cells stimulated with IL-27 (2nM) with and without siRNA-mediated knockdown of IRF1. Data was obtained from three experiments with two technical replicates each, showing mean ± SEM.

Figure 7: IL-27-induced STAT1 response drives global proteomic changes in Th-1 cells.
a) Workflow for quantitative SILAC proteomic analysis of Th-1 cells continuously stimulated (24h) with IL-27 (10nM), HypIL-6 (20nM) or left untreated.
b) Global proteomic changes in Th-1 cells induced by IL-27 (left) or HypIL-6 (right) represented as volcano plots. Proteins significantly up- or downregulated are highlighted in red (p value ≤ 0.05, fold change ≥ +1.5 or ≤ -1.5). Significantly altered ISG-encoded proteins by IL-27 are highlighted in yellow. Data was obtained from three biological replicates.
c) Venn diagrams comparing unique upregulated (left) and downregulated (right) proteins by IL-27 (blue) and HypIL-6 (red) as well as shared altered proteins. ISG-encoded proteins are highlighted in yellow.
d) Heatmaps of the top 30 up- and downregulated proteins by IL-27 compared to HypIL-6. Data representation of the mean (log2) fold change of three biological replicates.
e) Heatmap representation and enrichment plot of proteins identified by GSEA reactome pathway enrichment analysis “Cytokine signaling and immune system” induced by IL-27. Data representation of the mean (log2) fold change of three biological replicates.
f) Correlation of IL-27 and HypIL-6-induced RNA-seq transcript levels (≥ +2 or ≤ -2 fc) with quantitative proteomic data (≥ +1.5 or ≤ -1.5 fc). Data representation of the mean (log2) fold change of three biological replicates.

Figure 8: Receptor and STAT concentrations determine the nature of the cytokine response.
a) Copy numbers of indicated proteins determined for different T-cell subsets using mass-spectrometry based proteomics (ImmPRes - http://immpres.co.uk).
b) Model predictions for varying levels of STAT1 and STAT3 (left panel) or IL-27Rα and GP130 (right panel) for phosphorylation kinetics of STAT1 and STAT3.
c) Gene expression profiles determined by RNAseq analysis comparing indicated genes of a cohort of SLE risk patients with a cohort of healthy controls. Data obtained from: Proc Natl Acad Sci U S A 115, 12565-12572 (2018). *P < 0.05, **P ≤ 0.01, ***P ≤ 0.001; n.s., not significant.
d) Dose-dependent phosphorylation of STAT1 and STAT3 as a response to IL-27 and HypIL-6 stimulation in naive and IFNα2-primed (2nM, 24h) Th-1 cells, normalized to maximal IL-27 stimulation (ctrl). Data was obtained from four biological replicates with two technical replicates each, showing mean ± std dev.
e) Phosphorylation of STAT1 (left) and STAT3 (right) as a response to IL-27 (2nM, 15min) and HypIL-6 (10nM, 15min) stimulation in healthy control (ctrl) and SLE patient CD4+ T-cells. Data was obtained from five healthy control donors and six SLE patients. *P < 0.05, **P ≤ 0.01, ***P ≤ 0.001; n.s., not significant.
f) Tofacitinib titration to inhibit STAT1 and STAT3 phosphorylation by HypIL-6 (10nM, 15min) in Th-1 cells (left) and RPE1 cells stably expressing wt IL-27Rα (right).

Supp.
Figure 1:
a) Comparison of dose-dependent phosphorylation (STAT1/3) of purchased IL-27 and mIL-27sc in activated CD4+ cells, normalized to maximal MFI levels. Data was obtained from one (purchased) or two (mIL-27sc) biological replicates with two technical replicates each, showing mean ± std dev.
b) Schematic workflow of T-cell isolation, Th-1 differentiation, fluorescence barcoding and gating strategy for high throughput flow cytometry.
c) Phosphorylation kinetics of STAT1 and STAT3 followed after stimulation with IL-27 (10nM) and HypIL-6 (20nM) or unstimulated Th-1 cells. Data (from Fig. 1c) was normalized to maximal MFI levels for each cytokine. Data was obtained from five biological replicates with two technical replicates each, showing mean ± std dev.
d) Phosphorylation kinetics of activated PBMCs (CD4+, CD8+) of STAT1 and STAT3 followed after stimulation with IL-27 (2nM) and HypIL-6 (20nM) or unstimulated cells. Data was normalized to maximal IL-27 stimulation. Data was obtained from two biological replicates with two technical replicates each, showing mean ± std dev.
e) Dose-response experiments in wt RPE1 cells for pSTAT1 (left) and pSTAT3 (right), stimulated with IL-27 or HypIL-6, normalized to maximal HypIL-6 stimulation. Data was obtained from one representative experiment with two technical replicates, showing mean ± std dev.

Supp.
Figure 2:
a) Dose-response experiments for pSTAT1 and pSTAT3 comparing RPE1 GP130 KO cells (left), wt RPE1 (middle) and RPE1 mXFPe-IL27Rα (right) after stimulation with IL-27 or HypIL-6, normalized to maximal HypIL-6 stimulation. Data was obtained from one representative experiment with two technical replicates, showing mean ± std dev.
b) Ligand-induced receptor dimerization: Top panel: Dual-colour co-tracking of IL-27Rα and GP130 in the absence (top) and presence (bottom) of IL-27 (20nM). Trajectories (150 frames, ~4.8 s) of individual mXFPe-IL27RαNB-RHO11 (red) and GP130NB-DY649 (blue) and co-trajectories (magenta) are shown for a representative cell. Bottom panel: Dual-colour co-tracking of GP130 in the absence (top) and presence (bottom) of HypIL-6 (20nM). Trajectories (150 frames, ~4.8 s) of individual mXFPe-IL27RαNB-RHO11 (red) and GP130NB-DY649 (blue) and co-trajectories (magenta) are shown for a representative cell.
c) Top: Cartoon model of cell surface labeling of mXFP-tagged GP130 by dye-conjugated anti-GFP nanobodies (NB) and formation of single-colour homodimers (left) or dual-colour homodimers (right). Below: Examples for intensity traces of single-colour dual-step bleaching (left) or dual-colour single-step bleaching (right). Insets show raw data for selected timepoints and corresponding trajectories.
d) Top: comparison of diffusion coefficients (D) for mXFPe-IL-27RαNB-RHO11 (red) and mXFPm-GP130NB-DY649 (blue) in presence and absence of IL-27 stimulation (20nM), as well as co-trajectories after IL-27 stimulation (magenta). Bottom: comparison of diffusion coefficients for mXFPm-GP130NB-RHO11 (red) in presence and absence of HypIL-6 stimulation (20nM), as well as co-trajectories after HypIL-6 stimulation (magenta). Each data point represents the analysis from one cell with a minimum of 23 cells measured for each condition. *P < 0.05, **P ≤ 0.01, ***P ≤ 0.001; n.s., not significant.

Supp.
Figure 3:
a) Reactions involving ligand binding and dimerization in the HypIL-6 model.
b) Reactions involving ligand binding and dimerization in the IL-27 model.
c) Reactions involving the STAT molecules (S_j for j ∈ {1,3}) in the HypIL-6 model.
d) Reactions involving the STAT molecules (S_j for j ∈ {1,3}) in the IL-27 model.
e) Reactions involving receptor internalisation/degradation in the HypIL-6 model. Here H1 = β_6 and H2 = γ_6([pS1] + [pS1]).
f) Reactions involving receptor internalisation/degradation in the IL-27 model. Here H1 = β_27 and H2 = γ_27([pS1] + [pS1]).
g) Dephosphorylation of (S_j for j ∈ {1,3}) in the cytoplasm. This reaction occurs in both models.
h) Key for the molecules in the reactions.

Supp. Figure 4:
a) STAT1 (left) and STAT3 (right) phosphorylation kinetics of RPE1 clones stably expressing wt IL-27Rα after stimulation with IL-27 or after stimulation with HypIL-6 normalized to maximal IL-27 stimulation. Data was obtained from three experiments with two technical replicates each, showing mean ± std dev.
b) Dose-response experiments for pSTAT1 (left) and pSTAT3 (right) in RPE1 cells stably expressing wt IL-27Rα or tyrosine-mutants after stimulation with IL-27, normalized to maximal stimulation of wt IL-27Rα. Data was obtained from one representative experiment with two technical replicates, showing mean ± std dev.

Supp. Figure 5:
a) Dose-response experiments for pSTAT1 (left) and pSTAT3 (right) in RPE1 cells stably expressing wt IL-27Rα or IL-27Rα-GP130 chimera after stimulation with IL-27.
Data was normalized to maximal stimulation of wt IL-27Rα. Data was obtained from one representative experiment with two technical replicates, showing mean ± std dev.
b) STAT1 (left) and STAT3 (right) phosphorylation kinetics in RPE1 IL-27Rα cells stimulated with IL-27 or HypIL-6 with and without JAK inhibition by Tofacitinib. Data was normalized to maximal IL-27 stimulation. Data was obtained from two experiments with two technical replicates each, showing mean ± std dev.
c) STAT1 (left) and STAT3 (right) phosphorylation kinetics in Th-1 cells stimulated with IL-27 or HypIL-6 with and without JAK inhibition by Tofacitinib. Data was normalized to maximal IL-27 stimulation. Data was obtained from two biological replicates with two technical replicates each, showing mean ± std dev.
d) MFI ratio of Tofacitinib-treated and non-treated Th-1 cells for pSTAT1 (left) and pSTAT3 (right) after stimulation with IL-27 (10nM) and HypIL-6 (20nM). Data was obtained from two biological replicates with two technical replicates each, showing mean ± std dev.

Supp. Figure 6:
a) STAT1 (left) and STAT3 (right) phosphorylation kinetics in RPE1 IL-27Rα cells stimulated with IL-27 or HypIL-6 with and without pretreatment with cycloheximide (CHX). Data was normalized to maximal IL-27 stimulation. Data was obtained from two experiments with two technical replicates each, showing mean ± std dev.
b) STAT1 (left) and STAT3 (right) phosphorylation kinetics in Th-1 cells stimulated with IL-27 or HypIL-6 with and without pretreatment with cycloheximide (CHX). Data was normalized to maximal IL-27 stimulation. Data was obtained from two biological replicates with two technical replicates each, showing mean ± std dev.

Supp. Figure 7:
a) Workflow for quantitative SILAC phospho-proteomic analysis of Th-1 cells stimulated (15min) with IL-27 (10 nM), HypIL-6 (20 nM) or left untreated.
b) Schematic representation of the main GO terms regulated by IL-27 as inferred from our p-proteomics studies.
Red represents downregulated p-sites and blue represents upregulated p-sites upon IL-27 stimulation of human primary Th-1 cells.
c) Schematic representation of the main GO terms regulated by HypIL-6 as inferred from our p-proteomics studies. Red represents downregulated p-sites and blue represents upregulated p-sites upon HypIL-6 stimulation of human primary Th-1 cells.

Supp. Figure 8:
a) Venn diagrams comparing the numbers of unique upregulated (left) and downregulated (right) phospho-sites by IL-27 (blue) and HypIL-6 (red) as well as the number of shared phospho-sites.
b) List of most strongly altered phosphosites (downregulated: green; upregulated: red) in response to IL-27 (left) or HypIL-6 (right).
c) GO analysis “cellular location” and “UP keywords” of the phospho-sites regulated by IL-27 (red) and HypIL-6 (blue) represented as bubble-plots.
d) Phosphorylation of target proteins related to Treg functions and schematic representation of their activity on T-cells.

Supp. Figure 9:
a) Kinetics of gene induction in Th-1 cells induced by IL-27 represented as volcano plots. Genes significantly up- or downregulated are highlighted in red (p value ≤ 0.05, fold change ≥ +2 or ≤ -2). Data was obtained from three biological replicates.
b) Kinetics of gene induction in Th-1 cells induced by HypIL-6 represented as volcano plots. Genes significantly up- or downregulated are highlighted in red (p value ≤ 0.05, fold change ≥ +2 or ≤ -2). Data was obtained from three biological replicates.
c) Kinetics of gene induction in Th-1 cells induced by HypIL-6 represented as volcano plots. Genes identified to be significantly up- or downregulated by IL-27 are highlighted in red (p value ≤ 0.05, fold change ≥ +2 or ≤ -2). Data was obtained from three biological replicates.

Supp. Figure 10:
a) Gene induction kinetics represented as pie-charts, separated for upregulated genes (top panel) and downregulated genes (bottom panel).
b) Kinetics of ISG induction (examples) as heatmap representation comparing IL-27 with HypIL-6 (top) and GSEA reactome pathway enrichment “IFN signaling” for genes induced by IL-27 after 6h (bottom). Data represents the mean (log2) fold change of three biological replicates.
c) Heatmaps of the top 30 up- and downregulated genes by IL-27 compared to HypIL-6 for 1h, 6h and 24h. Data represents the mean (log2) fold change of three biological replicates.
d) Kinetics of IRF1 protein expression as a response to continuous IL-27 and HypIL-6 stimulation in Th-1 cells. Data was obtained from three biological replicates with two technical replicates each, showing mean ± std dev.

Supp. Figure 11:
a) Pie charts of proteomic changes (unique & shared) for upregulated (left) and downregulated (right) proteins in response to IL-27 or HypIL-6 stimulation in Th-1 cells.
b) Left: GSEA reactome pathway enrichment analysis “Interferon signaling” for proteins induced by IL-27. Middle: heatmap representation of pathway-associated proteins comparing IL-27 with HypIL-6 stimulation. Data represents the mean (log2) fold change of three biological replicates. Right: Localization of the identified proteins in context to the data distribution of IL-27-induced proteomic changes. Pathway-associated proteins are highlighted for IL-27 (blue) and HypIL-6 (red) as well as non-significant data distribution (grey).
c) Left: GSEA reactome pathway enrichment analysis “Cytokine signaling and immune system” for proteins induced by IL-27. Middle: heatmap representation of pathway-associated proteins comparing IL-27 with HypIL-6 stimulation. Data represents the mean (log2) fold change of three biological replicates. Right: Localization of the identified proteins in context to the data distribution of IL-27-induced proteomic changes. Pathway-associated proteins are highlighted for IL-27 (blue) and HypIL-6 (red) as well as non-significant data distribution (grey). d) Average intensity distribution of untreated proteomic data. Top up- and downregulated proteins (≥ +4x or ≤ -4x change) altered by IL-27 (left) or HypIL-6 (right) stimulation are indicated.
Supp. Figure 12: a) Pointwise median and 95% credible intervals of the WT and chimera mathematical models, using the posterior distributions for the parameters from the ABC-SMC. b) Dose response curve in RPE1 using the posterior distributions from the ABC-SMC and varying the concentrations of HypIL-6 and IL-27 in the model. c) Pointwise median and 95% credible intervals of the WT mathematical model and simulations of a mutant model with 𝑘#' & = 10,> nM-1 s-1 and 𝑘#' , = 10M s-1, using the posterior distributions for the parameters from the ABC-SMC for the other parameters.
Supp. Figure 13: a) Fold induction of total STAT1 and STAT3 levels in Th-1 measured by flow cytometry. Data was obtained from two biological replicates.
b) Total levels of STAT1 and STAT3 measured in CD4+ by flow cytometry for healthy control (ctrl) and Lupus patients (SLE). Data was obtained from five (ctrl) and six (SLE) biological replicates. *P < 0.05, **P ≤ 0.01, ***P ≤ 0.001; n.s., not significant. c) Ratio of pSTAT1 and pSTAT3 after IL-27 (15 min, 2 nM) or HypIL-6 (15 min, 10 nM) stimulation measured in CD4+ by flow cytometry for healthy control (ctrl) and Lupus patients (SLE). Data was obtained from five (ctrl) and six (SLE) biological replicates normalized to mean ratio of healthy control samples. d) Tofacitinib titration to inhibit STAT1 and STAT3 phosphorylation by IL-27 (2 nM) in Th-1 cells (left) and RPE1 cells stably expressing wt IL-27Rα (right).
Supp. Movie 1: Single-molecule co-tracking as a readout for dimerization of cytokine receptors. Cell surface labelling of mXFPe-IL-27Rα by eNBRHO11 (left, top) and mXFPm-GP130 by mNBDY649 (left, bottom) after stimulation with IL-27 (20 nM). In the overlay of the zoomed section of both spectral channels (mXFPe-IL-27RαRHO11: Red, mXFPm-GP130DY649: Blue), yellow lines indicate co-locomotion of IL-27Rα and GP130 (≥ 10 steps). Acquisition frame rate: 30 Hz, Playback: real time.
Supp. Movie 2: Dynamics of IL-27-induced receptor assembly. Formation of a single-molecule heterodimer of mXFPe-IL-27RαRHO11 (Red) and mXFPm-GP130DY649 (Blue) in presence of IL-27. Yellow lines indicate co-locomotion of IL-27Rα and GP130 (≥ 10 steps). Acquisition frame rate: 30 Hz, Playback: real time with break at time of receptor dimerization.
Supp. Movie 3: Ligand-induced heterodimerization of IL-27Rα and GP130. Overlay of the two spectral channels (mXFPe-IL-27RαRHO11: Red, mXFPm-GP130DY649: Blue) in absence (left) or presence (right) of IL-27 (20 nM). Yellow lines indicate co-locomotion of IL-27Rα and GP130 (≥ 10 steps). Acquisition frame rate: 30 Hz, Playback: real time.
Supp. Movie 4: Ligand-induced homodimerization of GP130. Overlay of the two spectral channels (mXFPm-GP130RHO11: Red, mXFPm-GP130DY649: Blue) in absence (left) or presence (right) of HypIL-6 (20 nM). Yellow lines indicate co-locomotion of IL-27Rα and GP130 (≥ 10 steps). Acquisition frame rate: 30 Hz, Playback: real time.
[Figure panels for Fig. 1–8 and supp. Fig. 1–13 are not reproduced here: text extraction retained only axis ticks, axis labels (e.g. pSTAT1 / rel. MFI, pSTAT3 / rel. MFI, time / min, c / log nM, fold change / log2, p value / -log10) and gene, protein or phosphosite labels.]
10_1101-2021_01_08_425897 ---- 62441649
APOBEC1 mediated C-to-U RNA editing: target sequence and trans-acting factor contribution to 177 RNA editing events in 119 murine transcripts in vivo.
Saeed Soleymanjahi1, Valerie Blanc1 and Nicholas O. Davidson1,2
1Division of Gastroenterology, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63105
2To whom communication should be addressed: Email: nod@wustl.edu
Running title: APOBEC1 mediated C to U RNA editing
Keywords: RNA folding; A1CF; RBM47
January 8, 2021
bioRxiv preprint doi: https://doi.org/10.1101/2021.01.08.425897; this version posted January 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
ABSTRACT (184 words)
Mammalian C-to-U RNA editing was described more than 30 years ago as a single nucleotide modification in APOB RNA in small intestine, later shown to be mediated by the RNA-specific cytidine deaminase APOBEC1. Reports of other examples of C-to-U RNA editing, coupled with the advent of genome-wide transcriptome sequencing, identified an expanded range of APOBEC1 targets. Here we analyze the cis-acting regulatory components of verified murine C-to-U RNA editing targets, including nearest neighbor as well as flanking sequence requirements and folding predictions. We summarize findings demonstrating the relative importance of trans-acting factors (A1CF, RBM47) acting in concert with APOBEC1. Using this information, we developed a multivariable linear regression model to predict APOBEC1 dependent C-to-U RNA editing efficiency, incorporating factors independently associated with editing frequencies based on 103 Sanger-confirmed editing sites, which accounted for 84% of the observed variance. Co-factor dominance was associated with editing frequency, with RNAs targeted by both RBM47 and A1CF observed to be edited at a lower frequency than RBM47 dominant targets. The model also predicted a composite score for available human C-to-U RNA targets, which again correlated with editing frequency.
INTRODUCTION
Mammalian C-to-U RNA editing was identified as the molecular basis for human intestinal APOB48 production more than three decades ago (Chen et al. 1987; Hospattankar et al. 1987; Powell et al. 1987). A site-specific enzymatic deamination of C6666 to U of Apob mRNA was originally considered the sole example of mammalian C-to-U RNA editing, occurring at a single nucleotide in a 14 kilobase transcript and mediated by an RNA specific cytidine deaminase (APOBEC1) (Teng et al. 1993). With the advent of massively parallel RNA sequencing technology we now appreciate that APOBEC1 mediated RNA editing targets hundreds of sites (Rosenberg et al. 2011; Blanc et al. 2014) mostly within 3’ untranslated regions of mRNA transcripts. This expanded range of targets of C-to-U RNA editing prompted us to reexamine key functional attributes in the regulatory motifs (both cis-acting elements and trans-acting factors) that impact editing frequency, focusing primarily on data emerging from studies of mouse cell and tissue-specific C-to-U RNA editing. Earlier studies identified RNA motifs (Davies et al. 1989) contained within a 26-nucleotide segment flanking the edited cytidine base in vivo (in cell lines) or within 55 nucleotides using S100 extracts from rat hepatoma cells (Bostrom et al. 1989; Driscoll et al. 1989). Those, and other studies, established that Apob RNA editing reflects both the tissue/cell of origin as well as RNA elements remote and adjacent to the edited base (Bostrom et al. 1989; Davies et al. 1989).
A granular examination of the regions flanking the edited base in Apob RNA demonstrated a critical 3’ sequence 6671-6681, downstream of C6666, in which mutations reduced or abolished editing activity (Shah et al. 1991). This 3’ site, termed a “mooring sequence”, was associated with a 27S “editosome” complex (Smith et al. 1991), and was both necessary and sufficient for site-specific Apob RNA editing and editosome assembly (Backus and Smith 1991). Other cis-acting elements include a 5 nucleotide spacer region between the edited cytidine and the mooring sequence, and also sequences 5’ of the editing site that regulate editing efficiency (Backus and Smith 1992; Backus et al. 1994), along with AU-rich regions both 5’ and 3’ of the edited cytidine that together function in concert with the mooring sequence (Hersberger and Innerarity 1998). Advances in our understanding of physiological Apob RNA editing emerged in parallel from both the delineation of key RNA regions (summarized above) and the identification of components of the Apob RNA editosome (Sowden et al. 1996). APOBEC1, the catalytic deaminase (Teng et al. 1993), is necessary for physiological C-to-U RNA editing in vivo (Hirano et al. 1996) and in vitro (Giannoni et al. 1994). Using the mooring sequence of Apob RNA as bait, two groups identified APOBEC1 complementation factor (A1CF), an RNA-binding protein sufficient in vitro to support efficient editing in the presence of APOBEC1 and Apob mRNA (Lellek et al. 2000; Mehta et al. 2000). Those findings reinforced the importance of both the mooring sequence and an RNA binding component of the editosome in promoting Apob RNA editing.
However, while A1CF and APOBEC1 are sufficient to support in vitro Apob RNA editing, neither heterozygous (Blanc et al. 2005) nor homozygous genetic deletion of A1cf impaired Apob RNA editing in vivo in mouse tissues (Snyder et al. 2017), suggesting that an alternate complementation factor was likely involved. Other work identified a homologous RNA binding protein, RBM47, that functioned to promote Apob RNA editing both in vivo and in vitro (Fossat et al. 2014), and more recent studies utilizing conditional, tissue-specific deletion of A1cf and Rbm47 indicate that both factors play distinctive roles in APOBEC1-mediated C-to-U RNA editing, including Apob as well as a range of other APOBEC1 targets (Blanc et al. 2019). These findings together establish important regulatory roles for both cis-acting elements and trans-acting factors in C-to-U mRNA editing. However, the majority of studies delineating cis-acting elements reflect earlier, in vitro experiments using ApoB mRNA, and relatively little is known regarding the role of cis-acting elements in tissue-specific C-to-U RNA editing of other transcripts in vivo. Here we use statistical modeling to investigate the independent roles of candidate regulatory factors in mouse C-to-U mRNA editing using data from in vivo studies from over 170 editing sites in 119 transcripts (Meier et al. 2005; Rosenberg et al. 2011; Gu et al. 2012; Blanc et al. 2014; Rayon-Estrada et al. 2017; Snyder et al. 2017; Blanc et al. 2019; Kanata et al. 2019). We also examined these regulatory factors in known human mRNA targets (Chen et al. 1987; Powell et al. 1987; Skuse et al. 1996; Mukhopadhyay et al. 2002; Grohmann et al.
2010; Schaefermeier and Heinze 2017).

RESULTS
Descriptive data
A total of 177 C-to-U RNA editing sites were identified based on eight studies that met inclusion and exclusion criteria (Meier et al. 2005; Rosenberg et al. 2011; Gu et al. 2012; Blanc et al. 2014; Rayon-Estrada et al. 2017; Snyder et al. 2017; Blanc et al. 2019; Kanata et al. 2019), representing 119 distinct RNA editing targets. 84% (100/119) of RNA targets were edited at one chromosomal location (Figure 1C) and 75% (89/119) of mRNA targets were edited at both a single chromosomal location and also within a single tissue (Figure 1D). The majority of editing sites occur in the 3’ untranslated region (142/177; 80%), with exonic editing sites the next most abundant subgroup (28/177; 16%, Figure 1E). Chromosome X harbors the highest number of editing sites (18/177; 10%), followed by chromosomes 2 and 3 (15/177; 8.5% for both, Supplemental Figure 1). 103/177 editing sites were confirmed by Sanger sequencing, with a mean editing frequency of 37 ± 22%.
Base content of sequences flanking edited and mutated cytidines
AU content was enriched (~87%) in nucleotides both immediately upstream and downstream of the edited cytidine across mouse RNA editing targets (Figure 2A and 2C). The average AU content across the region 10 nucleotides upstream to 20 nucleotides downstream of the edited cytidine was ~70% (60 - 87%). Because APOBEC1 has been shown to be a DNA mutator (Harris et al. 2003; Wolfe et al. 2019; Wolfe et al. 2020), we determined the AU content of the mutated deoxycytidine region flanking human DNA targets (Nik-Zainal et al. 2012) to be ~66% at a site one nucleotide downstream of the edited base (Figure 2B, C).
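The window-based AU content described above can be illustrated with a minimal sketch. It assumes the AU percentage is computed over the flanking bases only (the edited cytidine itself is excluded) and that the default window is 10 nucleotides upstream and 20 downstream; the function name `au_content` and the toy sequence are hypothetical and are not taken from the study's own analysis.

```python
def au_content(seq, edited_index, upstream=10, downstream=20):
    """Percent A/U among bases flanking the edited cytidine.

    Assumption: the edited C itself is excluded from the window,
    and DNA-style input (T) is treated as U.
    """
    seq = seq.upper().replace("T", "U")
    if seq[edited_index] != "C":
        raise ValueError("edited_index must point at a cytidine")
    start = max(0, edited_index - upstream)
    flank = seq[start:edited_index] + seq[edited_index + 1:edited_index + 1 + downstream]
    if not flank:
        raise ValueError("no flanking bases in the requested window")
    return 100.0 * sum(base in "AU" for base in flank) / len(flank)


# Hypothetical toy sequence (not a real transcript); the edited C is at index 10.
toy = "GAUAUAUAUACAUUUAGAUCAGUAUAUUAGC"
print(round(au_content(toy, 10), 1))
```

Restricting the window (for example `upstream=1, downstream=1`) gives the nearest-neighbor AU content, which is the quantity reported as enriched immediately adjacent to the edited base.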
The average AU content in the sequence 10 nucleotides upstream and 10 nucleotides downstream of mutated deoxycytidines is 59% (57-66.0%). The average AU content was 90% and 80% in nucleotides immediately upstream and downstream, respectively, of the targeted deoxycytidine in a subgroup of over 700 DNA editing events of the C to T type (Nik-Zainal et al. 2012), which is closer to the distribution found in C to U RNA editing targets. These features suggest that AU enrichment is an important component of the editing function of APOBEC1 on both RNA and DNA targets, especially for the C/dC to U/dT change.
Factors influencing editing frequency
Regulatory-spacer-mooring cassette: We observed no significant associations between editing frequency and mismatches in motif A (r=-0.05, P=.46) or motif B (r=-0.1, P=.20) (Supplemental Figure 2), while mismatches in motif C (r=-0.24, P=.001) and motif D (r=-0.20, P=.008) negatively impacted editing frequency (Figure 3B). AU content of motif B showed a trend towards negative association with editing frequency (r=-0.13, P=.08, Figure 3C), but AU contents of motifs A (r=0.06, P=.4), C (r=-0.02, P=.8), and D (r=-0.02, P=.78) did not impact editing frequency (Supplemental Figure 2). The abundance of G in motif C (r=0.17, P=.02), abundance of C in motif B (r=0.13, P=.08), and G/C fraction in motif C (r=0.14, P=.04) showed either a significant association or a trend toward association with editing frequency. The spacer sequence averaged 5 ± 4 nucleotides, ranging from 0 to 20, with a trend toward a negative association between length and editing frequency (r=-0.14, P=.09).
The mean spacer sequence AU content was 73 ± 23%, with no association between editing frequency and AU content (r=-0.1, P=.2, Supplemental Figure 3). However, G abundance (r=-0.23, P=.01) and G/C fraction (r=-0.20, P=.03) of the spacer showed significant associations with editing frequency in Sanger-confirmed targets. The mean number of mismatches in the first 4 nucleotides of the spacer sequence was 2.5 ± 1, with a higher number of mismatches exerting a significant negative impact on editing frequency (r=-0.24, P=.01) (Figure 3D). The mean number of mismatches in the mooring sequence was 2.1 ± 1.8, ranging from 0 to 8 nucleotides. The number of mismatches showed a significant negative association with editing frequency (r=-0.30, P=.0003, Figure 3E). The base content of individual nucleotides surrounding the edited cytidine showed significant associations with editing frequency, which were more pronounced in nucleotides closer to the edited cytidine (Figure 3F, Supplemental Table 1). Furthermore, overall AU content of downstream sequence +16 to +20 had a positive impact on editing frequency (r=0.17, P=.02) (Supplemental Figure 3). However, G abundance in the downstream 20 nucleotides (r=-0.24, P=.001) and G/C fraction in the downstream 10 nucleotides (r=-0.16, P=.09) showed a significant negative association and a trend toward negative association, respectively, with editing frequency in Sanger-confirmed targets.
Secondary structure: We generated a predicted secondary structure for 172 editing sites, with four subgroups based on overall structure and location of the edited cytidine: loop (Cloop), stem (Cstem), tail (Ctail), and non-canonical structure (NC).
The majority of editing sites were in the Cloop subgroup (59%), followed by Cstem (20%), Ctail (13%), and NC (8%) subgroups (Figure 4A). Editing sites in the Ctail subgroup exhibited lower editing frequencies compared to editing sites in Cloop (29 ± 12 vs 41 ± 23%, P=.02) or Cstem (37 ± 21%, P=.04) subgroups. No significant differences were detected in other comparisons (Figure 4B). The edited cytidine was located in the loop, stem, and tail of the secondary structure in 110 (64%), 38 (22%), and 24 (14%) of the edited RNAs, respectively. Editing sites with the edited cytidine within the loop exhibited significantly higher editing frequency compared to those with the edited cytidine in the tail (40 ± 24% vs 28 ± 12%, P=.04). Other subgroups exhibited comparable editing frequencies (Supplemental Figure 4). The majority (78%) of editing sites contained a mooring sequence located in the main stem-loop structure (Figure 4C), with the remainder located in the tail or secondary loop. Average editing efficiency was significantly higher in targets where the mooring sequence was located in the main stem-loop (Figure 4D). We also calculated the proportion of total nucleotides that constitute the main stem-loop in the secondary structure. The average ratio was 0.62 ± 0.18, ranging from 0.28 to 1 (Supplemental Table 2), with higher ratios associated with higher editing frequency of the corresponding editing site (r=0.20, P=.007) (Figure 4E). Finally, we considered the orientation of free tails in the secondary structure in terms of length and symmetry. Symmetric free tails were observed in 59% of editing sites (Supplemental Figure 4).
The length of the 5’ free tail showed a negative association with editing frequency (r=-0.14, P=.04, Figure 4F), while no significant associations were detected between either the length of the 3’ tail or symmetry of tails and editing frequency (Supplemental Figure 4).
Trans-acting factors and tissue specificity: Data for relative dominance of cofactors in APOBEC1-dependent RNA editing were available for 72 editing sites for targets in small intestine or liver (Blanc et al. 2019). RBM47 was identified as the dominant factor in 60/72 (83%) sites; A1CF was the dominant factor in 5/72 (7%) editing sites, with the remaining sites (7/72; 10%) exhibiting equal co-dominance (Figure 5A). The average editing frequencies differed across the groups, with 41 ± 20% in RBM47-dominant targets, 23 ± 14% in A1CF-dominant targets, and 27 ± 11% in the co-dominant group (P=.03) (Figure 5B). The majority of RNA editing targets were edited in one tissue (103/119; 86%, Figure 5C), while the maximum number of tissues in which an editing target is edited (at the same site) is 5 (Cd36). The small intestine harbors the highest number of verified editing sites (95/177; 54%), followed by liver (31/177; 17%), and adipose tissue (19/177; 11%, Figure 5D). Sites edited in brain tissue showed the highest average editing frequency (54 ± 35%, n=11), followed by bone marrow myeloid cells (50 ± 22%, n=4), and kidney (47 ± 29%, n=10, Figure 5E). We then developed a multivariable linear regression model to predict APOBEC1 dependent C-to-U RNA editing efficiency, incorporating factors independently associated with editing frequencies (Table 1). This model, based on 103 Sanger-confirmed editing sites with available data for all of the parameters mentioned, accounted for 84% of variance in editing frequency of editing sites included (R2=0.84, P<.001, Table 1).
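To make the "variance accounted for" statement concrete, the following is a minimal sketch of how R2 can be obtained from an ordinary least-squares fit and how it changes when a predictor is dropped. It is not the authors' model: the data are synthetic, only two illustrative predictors are used (named here `mooring_mismatches` and `base_content_score` for readability), and the coefficients are arbitrary assumptions.

```python
import numpy as np


def r_squared(X, y):
    """Fit y ~ intercept + X by ordinary least squares and return R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)            # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())      # total variation
    return 1.0 - ss_res / ss_tot


# Synthetic data only: 103 matches the number of Sanger-confirmed sites,
# but every value below is simulated for illustration.
rng = np.random.default_rng(0)
n = 103
mooring_mismatches = rng.integers(0, 9, size=n)
base_content_score = rng.normal(size=n)
freq = 40 - 3 * mooring_mismatches + 10 * base_content_score + rng.normal(scale=5, size=n)

full = r_squared(np.column_stack([mooring_mismatches, base_content_score]), freq)
reduced = r_squared(mooring_mismatches.reshape(-1, 1), freq)
print(f"full R2={full:.2f}, reduced R2={reduced:.2f}")
```

For nested least-squares models with an intercept, the reduced model can never have a higher R2 than the full model, so the size of the drop is a simple way to gauge how much a single predictor contributes.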
The final multivariable model revealed several factors independently associated with editing frequency, specifically the number of mismatches in mooring sequence; regulatory sequence motif D; AU content of regulatory sequence motif B; overall secondary structure for group Ctail vs group Cloop; location of mooring sequence in secondary structure; and a “base content score” parameter that represents base content of the sequences flanking the edited cytidine (Table 1). Removing “base content score” from the model reduced the explained variance from R2=0.84 to R2=0.59. Next, we added a co-factor dominance variable and fit the model using the 72 editing sites with available data for cofactor dominance. Along with other factors mentioned above, co-factor dominance showed significant association with editing frequency (Table 1), with RNAs targeted by both RBM47 and A1CF observed to be edited at a lower frequency than RBM47 dominant targets. Factors associated with co-factor dominance (Figure 6, Supplemental Table 3, Supplemental Figure 5) included tissue-specificity, with higher frequency of RBM47-dominant sites in small intestine compared to liver (91 vs 63%, P=.008) and A1CF-dominant and co-dominant editing sites more prevalent in liver. The number of mooring sequence mismatches also varied among the three subgroups: 1.1 ± 1.3 in the RBM47-dominant subgroup; 2.0 ± 2.5 in the A1CF-dominant subgroup; and 2.9 ± 0.4 in the co-dominant subgroup (P=.004). This was also the case regarding mismatches in the spacer: 2.4 ± 1.2 in the RBM47-dominant subgroup; 2.7 ± 1.5 in the A1CF-dominant subgroup; 3.8 ± 0.4 in the co-dominant subgroup (P=.02).
AU content (%) of downstream sequence +6 to +10 was higher in the RBM47-dominant subgroup (P=.01). Finally, the location of the edited cytidine in the secondary structure of the mRNA strand was different across the three subgroups (P=.04, Figure 6). We used pairwise multinomial logistic regression to determine factors independently associated with co-factor dominance (Figure 6C, Supplemental Table 4). Ctail editing sites, those with more mismatches in mooring and regulatory motif C, lower AU content in downstream sequence, and higher AU content in regulatory motif D were more likely co-dominant. Editing sites from small intestine and those with higher AU content of downstream sequence were more likely RBM47-dominant. Editing sites from liver and those with more mismatches in regulatory motif B were more likely A1CF-dominant (Figure 6C).
Human mRNA targets
Finally, we turned to an analysis of human C-to-U RNA editing targets for which this same panel of parameters was available (Table 2). Aside from APOB RNA, which is known to be edited in the small intestine (Chen et al. 1987; Powell et al. 1987), other targets have been identified in central or peripheral nervous tissue (Skuse et al. 1996; Mukhopadhyay et al. 2002; Meier et al. 2005; Schaefermeier and Heinze 2017). The human targets were categorized into low editing (NF1, GLYRα2, GLYRα3) and high editing (APOB, TPH2B exon3, TPH2B exon7) subgroups using 20% as cut-off.
A composite score (maximum=6) was generated based on six parameters introduced in the mouse model with notable variance between the two subgroups including mismatches in mooring sequence, spacer length, location of the edited cytidine, and relative abundance of stem-loop bases (Table 2). High editing targets exhibited a significantly higher composite score (4.7 vs 2, P=.001) compared to low editing targets and the composite score significantly correlated with editing frequency in individual targets (r=0.95, P=.005). The canonical editing target ApoB (Chen et al. 1987; Powell et al. 1987) achieved a score of 5 (out of 6), reflecting the observation that one of the six parameters (AU% of regulatory motifs) in human APOB is non-preferential compared to the editing-promoting features identified in the mouse multivariable model.

DISCUSSION
The current study reflects our analysis of 177 C-to-U RNA editing sites from 119 target mRNAs, with the majority residing within the 3’ untranslated region. Our multivariable model identified several key factors influencing editing frequency, including host tissue, base content of nucleotides surrounding the edited cytidine, number of mismatches in regulatory and mooring sequences, AU content of the regulatory sequence, overall secondary structure, location of the mooring sequence, and co-factor dominance. These factors, each exerting independent effects, together accounted for 84% of the variance in editing frequency.
Our findings also showed that mismatches in the mooring and regulatory sequences, AU content of regulatory and downstream sequences, host tissue and secondary structure of target mRNA were associated with the pattern of co-factor dominance. Several aspects of these primary conclusions merit further discussion. Previous studies investigating the key factors that regulate C-to-U mRNA editing were confined to in vitro studies and predicated on a single mRNA target (ApoB) (Backus and Smith 1991; Shah et al. 1991; Smith et al. 1991; Backus and Smith 1992; Hersberger and Innerarity 1998). With the expanded range of verified C-to-U RNA editing targets now available for interrogation, we revisited the original assumptions to understand more globally the determinants of C-to-U mRNA editing efficiency. In undertaking this analysis, we were reminded that the requirements for C-to-U mRNA editing in vitro often appear more stringent than in vivo (Backus and Smith 1991; Shah et al. 1991), which further emphasizes the importance of our findings. In addition, our approach included both cis-acting sequence- and folding-related predictions along with the role of trans-acting factors and took advantage of statistical modeling to adjust for confounding or modifier effects between these factors to identify their role in editing frequency. We began with the assumptions established for Apob RNA editing which identified a 26 nucleotide segment encompassing the edited base, spacer, mooring sequence, and part of regulatory sequence as the minimal sequence competent for physiological editing in vitro and in vivo (Davies et al. 1989; Shah et al. 1991; Backus and Smith 1992).
Those studies identified an 11-nucleotide mooring sequence as essential and sufficient for editosome assembly and site-specific C-to-U editing (Backus and Smith 1991; Shah et al. 1991; Backus and Smith 1992) and established optimal positioning of the mooring sequence relative to the edited base in Apob RNA (Backus and Smith 1992). The current work supports the key conclusions of this original mooring sequence model as applied to the entire range of C-to-U RNA editing targets. We observed that mismatches in either the mooring or regulatory sequences were independent factors governing editing frequency. By contrast, while mismatches in the spacer sequence also showed negative association with editing frequency, the impact of spacer mismatches was not retained in the final model, nor was the length of the spacer associated with editing frequency. Furthermore, we found mismatches in the regulatory sequence motif C to be more important than mismatches in motif B. These inconsistencies might conceivably reflect the context in which an RNA segment is studied (Backus and Smith 1992). For example, our analysis reflects physiological conditions in which naturally occurring mRNA targets are edited, while the aforementioned study used in vitro data based on varying lengths of Apob mRNA embedded within different mRNA contexts (Apoe RNA) (Backus and Smith 1992). In addition to the components of the mooring sequence model, we examined variations in the base content in different segments/motifs as well as among individual nucleotides surrounding the edited cytidine. As expected, we found that sequences flanking the edited cytidine exhibited high AU content. We further observed a similarly high AU content in the flanking sequences of a range of proposed APOBEC-mediated DNA mutation targets in human cancer tissues and cell lines (Alexandrov et al. 2013; Petljak et al. 2019), especially in targets with dC/dT change (Nik-Zainal et al. 2012). This observation implies that APOBEC-mediated DNA and RNA editing frequency may each be functionally modified by AU enrichment in the flanking sequences surrounding modifiable bases. The base content in individual nucleotides surrounding the edited cytidine also exerted significant impact on editing frequency, particularly in a 10-nucleotide segment spanning the edited cytidine (Supplemental Table 1), accounting for 25% of the variance in editing frequency independent of the mooring sequence model. Our findings regarding individual nucleotides surrounding the edited cytidine are consistent with findings for both DNA and RNA editing targets, particularly in the setting of cancers (Backus and Smith 1992; Conticello 2012; Roberts et al. 2013; Saraconi et al. 2014; Gao et al. 2018; Arbab et al. 2020). Recent work examining the sequence-editing relationship of a large in vitro library of DNA targets edited by different synthetic cytidine base editors (CBEs) (Arbab et al. 2020) showed that the base content of a 6-nucleotide window spanning the edited cytidine explained 23-57% of the editing variance, in particular one or two nucleotides immediately 5’ of the edited nucleotide. That study also demonstrated that occurrence of T and C nucleotides at position -1 increased, while a G nucleotide at that position decreased, editing frequency (Arbab et al. 2020). However, in contrast to our findings, the presence of A at position -1 had either a negative or null effect on DNA editing activity (Arbab et al. 2020). This latter finding is consistent with the lower AU content observed in nucleotides adjacent to the edited cytidine in Apobec-1 DNA targets compared to the AU content in RNA targets.
Our findings assign greater importance to adjacent nucleotides in RNA editing frequency, similar to earlier reports that the five bases immediately 5’ of the edited cytidine in Apob mRNA exert a greater impact on editing activity compared to nucleotides further upstream of this segment (Backus and Smith 1991; Shah et al. 1991; Backus and Smith 1992). G/C fraction of a 6-nucleotide window spanning the edited cytidine in DNA targets is associated with editing activity of the synthetic CBEs (Arbab et al. 2020). Although we found significant associations of RNA editing with G/C fraction in segments surrounding the edited cytidine in univariate analyses, these associations were not retained in the final model. In contrast, the AU content of regulatory sequence motif B remained as an independent factor determining editing frequency in the final model. The conserved 26-nucleotide sequence around the edited C forms a stem-loop secondary structure, where the editing site is in an octa-loop (Richardson et al. 1998) as predicted for the 55-nucleotide sequence of ApoB mRNA (Shah et al. 1991). This stem-loop structure is predicted to play an important role in recognition of the editing site by the editing factors (Bostrom et al. 1989; Davies et al. 1989; Driscoll et al. 1989; Chen et al. 1990). Mutations resulting in loss of base pairing in peripheral parts of the stem did not impact the editing frequency (Shah et al. 1991). Editing sites with the cytidine located in central parts (e.g. loop) exhibited higher editing frequencies than those with the edited cytidine located in peripheral parts (e.g.
tail) and it is worth noting that the computer-based stem-loop structure was independently confirmed by NMR studies of a 31-nucleotide human ApoB mRNA (Maris et al. 2005). Those studies demonstrated that the location of the mooring sequence in the ApoB mRNA secondary structure plays a critical role in the RNA recognition by A1CF (Maris et al. 2005). In line with those findings, the current findings emphasize that the location of the mooring sequence in secondary structure of the target mRNA exerts significant independent impact on editing frequency. These predictions were confirmed in crystal structure studies of the carboxyl-terminal domain of APOBEC-1 and its interaction with cofactors and substrate RNA (Wolfe et al. 2020). Our conclusions regarding murine C-to-U editing frequency, such as mooring sequence, base content, and secondary structure appear consistent with a similar regulatory role among the smaller number of verified human targets. That being said, further study and expanded understanding of the range of C-to-U editing targets in human tissues will be needed as recently suggested (Destefanis et al. 2020), analogous to that for A-to-I editing (Bahn et al. 2012; Bazak et al. 2014). We recognize that other factors likely contribute to the variance in RNA editing frequency not covered by our model. We did not consider the role of naturally occurring variants in APOBEC1, for example, which may be a relevant consideration since mutations in APOBEC family genes were shown to modify the editing activity of related hybrid DNA cytosine base editors (Arbab et al. 2020).
Furthermore, genetic variants of APOBEC1 in humans were associated with altered frequency of GlyR editing (Kankowski et al. 2017). Other factors not included in our approach were entropy-related features, tertiary structure of the mRNA target, and other regulatory co-factors. Another limitation of the tissue-specific designation used to categorize editing frequency is that cell-specific features of editing frequency may have been overlooked. For example, small intestinal and liver preparations are likely a blend of cell types (MacParland et al. 2018; Elmentaite et al. 2020) and tumor tissues are highly heterogeneous in cellular composition (Barker et al. 2009). The current findings provide a platform for future approaches to resolve these questions.

MATERIALS AND METHODS

Search strategy

We conducted a comprehensive literature review covering 1987 (when ApoB RNA editing was first reported (Chen et al. 1987; Powell et al. 1987)) to November 2020, including studies published in English that reported C-to-U mRNA editing frequencies of individual or transcriptome-wide target genes. Databases searched included Medline, Scopus, Web of Science, Google Scholar, and ProQuest (for theses). The references of the full texts retrieved were also scrutinized for additional papers not indexed in the initial search.

Study selection

Primary records (N=528) were screened for relevance, and in vivo studies reporting editing frequencies of individual or transcriptome-wide APOBEC1-dependent C-to-U mRNA targets were selected, using a threshold of 10% editing frequency.
For analyses based on RNA sequence information, only targets with available sequence information or chromosomal location for the edited cytidine were included. Exclusion criteria were: studies that reported C-to-U mRNA editing frequencies of target genes in other species, studies reporting editing frequencies of target genes in animal models overexpressing APOBEC1, exclusively in vitro studies, and conference abstracts.

Human targets

We included studies reporting human C-to-U mRNA targets (Chen et al. 1987; Powell et al. 1987; Skuse et al. 1996; Mukhopadhyay et al. 2002; Grohmann et al. 2010; Schaefermeier and Heinze 2017). We also included work describing APOBEC1-mediated mutagenesis in human breast cancer (Nik-Zainal et al. 2012).

Data extraction

Two reviewers (SS and VB) conducted the extraction process independently, and discrepancies were resolved by consensus with input from a third reviewer (NOD). The parameters were categorized as follows:

General parameters: Gene name (RNA target), chromosomal and strand location of the edited cytidine, tissue site, and editing frequency determined by RNA-seq or Sanger sequencing, as illustrated for ApoB (Figure 1A). Editing frequencies determined by the two approaches were highly correlated (r=0.8, P<0.0001), and where both methodologies were available we used RNA-seq. We also defined relative dominance of editing co-factors (A1CF-dominant, RBM47-dominant, or co-dominant), relative mRNA expression (edited gene vs unedited gene) by RNA-seq or quantitative RT-PCR, and abundance of corresponding protein (edited gene vs unedited gene) by western blotting or proteomic comparison.
Co-factor dominance was determined based on the relative contribution of each co-factor to editing frequency. At each editing site, editing frequencies in mouse tissues deficient in A1cf or Rbm47 were compared to that of wild-type mice. The relative contribution of each co-factor was calculated by subtracting the editing frequency for each target in A1cf or Rbm47 knockout tissue from the total editing frequency in wild-type control. Editing sites with <20% difference between contributions of RBM47 and A1CF were considered co-dominant. Sites with ≥20% difference were considered either RBM47- or A1CF-dominant, depending on the co-factor with higher contribution (Blanc et al. 2019).

Sequence-related parameters: A sequence spanning 10 nucleotides upstream and 30 nucleotides downstream of the edited cytidine was extracted for each C-to-U mRNA editing site. These sequences were extracted either directly from the full text or using the online UCSC Genome Browser on Mouse (NCBI37/mm9) and Human (GRCh38/hg38) (https://genome.ucsc.edu/cgi-bin/hgGateway). Using the mooring sequence model (Backus and Smith 1992), three cis-acting elements were considered for each site. These elements included 1) a 10-nucleotide segment immediately upstream of the edited cytidine as “regulatory sequence”; 2) a 10-nucleotide segment downstream of the edited cytidine with complete or partial consensus with the canonical “mooring sequence” of ApoB mRNA; 3) the sequence between the edited cytidine and the 5’ end of the mooring sequence, referred to as “spacer”.
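As an illustration of the co-factor dominance rule described under General parameters above, the following minimal sketch classifies a single editing site. It assumes that the "<20% difference" criterion refers to percentage points of editing frequency on a 0-100 scale; the function and argument names are hypothetical and are not taken from the original analysis.

```python
def cofactor_dominance(wt_freq, a1cf_ko_freq, rbm47_ko_freq, threshold=20):
    """Classify an editing site by dominant co-factor.

    All frequencies are editing percentages (0-100).
    Assumption: the 20% cut-off is applied to the absolute difference,
    in percentage points, between the two co-factor contributions.
    """
    # Contribution of a co-factor = wild-type frequency minus the
    # frequency that remains when that co-factor is knocked out.
    a1cf_contribution = wt_freq - a1cf_ko_freq
    rbm47_contribution = wt_freq - rbm47_ko_freq

    if abs(rbm47_contribution - a1cf_contribution) < threshold:
        return "co-dominant"
    if rbm47_contribution > a1cf_contribution:
        return "RBM47-dominant"
    return "A1CF-dominant"


# Hypothetical site: 90% editing in wild type, 80% without A1cf, 20% without Rbm47.
print(cofactor_dominance(90, 80, 20))
```

In this hypothetical example the loss of Rbm47 removes far more editing than the loss of A1cf, so the site would be labeled RBM47-dominant.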
We used an unbiased approach to identify potential mooring sequences by taking the nearest segment to the edited cytidine with the lowest number of mismatch(es) compared to the canonical mooring sequence of ApoB RNA. For each of the three segments, we investigated the number of mismatches compared to the corresponding segment of the ApoB gene (Blanc et al. 2014), as well as length of spacer, the abundance of A and U nucleotides (AU content), and the G to C abundance ratio (G/C fraction; Arbab et al. 2020). We also calculated the relative abundance of A, G, C, and U individually across a region 10 nucleotides upstream and 20 nucleotides downstream of the edited cytidine across all editing sites. For comparison, we examined the base content of a sequence spanning 10 nucleotides upstream and downstream of the mutated deoxycytidine for over 6000 proposed C to X (T, A, and G) DNA mutation targets of the APOBEC family in human breast cancer (Nik-Zainal et al. 2012), along with the relative deoxynucleotide distribution in proximity to the edited site.

Secondary structure parameters: We used RNAstructure (Reuter and Mathews 2010) and Mfold (Zuker 2003) to determine the secondary structure of an RNA cassette consisting of regulatory sequence, edited cytidine, spacer, and mooring sequence. Secondary structures similar to that of the cassette for ApoB chr12: 8014860, consisting of one loop and one stem (with or without unassigned nucleotides with ≤4 unpaired bases inside the stem) as the main stem-loop, with or without free tail(s) at one or both ends of the stem, were considered canonical. Two other types of secondary structure were considered non-canonical (Figure 1B), with ≥2 loops located either at the ends of the stem or inside the stem. Loops inside the stem were circular open structures with ≥5 unpaired bases. Editing sites with canonical structure were further categorized into three subgroups based on location of the edited cytidine: loop (Cloop), stem (Cstem), or tail (Ctail).
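The mooring sequence search and AU content calculation described under Sequence-related parameters above can be sketched as follows. This is one possible reading of "nearest segment with the lowest number of mismatches": slide a window along the downstream sequence, keep the window with the fewest mismatches, and break ties in favor of the window closest to the edited cytidine. The reference string in the example is a caller-supplied placeholder, and all function names are hypothetical.

```python
def count_mismatches(segment, reference):
    """Number of positions at which two equal-length sequences differ."""
    if len(segment) != len(reference):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(segment, reference))


def au_content(seq):
    """Percentage of A and U nucleotides; None for an empty sequence."""
    if not seq:
        return None  # e.g. a spacer of length 0 has no defined AU content
    return 100.0 * sum(nt in "AU" for nt in seq) / len(seq)


def find_mooring(downstream, canonical):
    """Return (offset, mismatches) of the best candidate mooring window.

    `downstream` starts at the nucleotide immediately 3' of the edited C,
    so downstream[:offset] is the spacer. Ties are resolved toward the
    smallest offset (assumed meaning of "nearest").
    """
    width = len(canonical)
    if len(downstream) < width:
        raise ValueError("downstream sequence shorter than reference")
    best = None
    for offset in range(len(downstream) - width + 1):
        mm = count_mismatches(downstream[offset:offset + width], canonical)
        # Strict '<' keeps the earliest window when mismatch counts tie.
        if best is None or mm < best[1]:
            best = (offset, mm)
    return best


# Placeholder reference and a made-up 30-nt downstream sequence.
canonical = "UGAUCAGUAU"
downstream = "AAUU" + "UGAUCAGUAU" + "AUUAGCCGAUAGGCUA"
offset, mismatches = find_mooring(downstream, canonical)
spacer = downstream[:offset]
print(offset, mismatches, spacer, au_content(spacer))
```

Returning the offset rather than the matched segment keeps the spacer, its length, and its AU content derivable from a single value.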
In addition to overall secondary structure, we considered location of the edited cytidine, location of mooring sequence, symmetry of the free tails, and proportion of the nucleotides in the target cassette that constitute the main stem-loop. This proportion is 1.0 in the case of ApoB chr12: 8014860, where all the bases are part of the main stem-loop structure. Symmetry was defined based on the existence of free tails at both ends of the RNA strand.

Statistical methodology

Continuous variables are reported as means ± SD, and binary and categorical variables as relative proportions. The t-test and ANOVA were used to compare continuous parameters of interest between two groups or more than two groups, respectively. Chi-squared testing was used to compare binary or categorical variables among different groups. Pearson correlation (r) was used to investigate the correlation of two continuous variables. We used linear regression analyses to develop the final model of independent factors that correlate with editing frequency. We used the Hosmer and Lemeshow approach for model building (Hosmer Jr et al. 2013) to fit the multivariable regression model. In brief, we first used bivariate and/or simple regression analyses with a P value of 0.2 as the cut-off point to screen the variables and detect primary candidates for the multivariable model. Subsequently, we fitted the primary multivariable model using candidate variables from the screening phase. A backward elimination method was employed to reach the final multivariable model. Parameters with P values <0.05 or those that added to the model fitness were retained.
Next, the eliminated parameters were added back individually to the final model to determine their impact. Plausible interaction terms between final determinants were also checked. The final model was screened for collinearity. We used the same approach to develop a multinomial logistic regression model to identify factors that were independently associated with co-factor dominance in RNA editing sites. R-squared and pseudo R-squared were used to estimate the proportion of variance in the response variable that could be explained by the multivariable linear regression and multinomial logistic regression models, respectively. The same screening and retention methods were used to investigate the association of base content in a sequence 10 nucleotides upstream and 20 nucleotides downstream of the edited cytidine with editing frequency. However, after determining the nucleotides that were retained in the final regression model, a proxy parameter named “base content score” was calculated for each editing site based on the β coefficient values retrieved for individual nucleotides in the model. This parameter was used in the final model as the representative variable for base content of the aforementioned sequence in each editing site.
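The text does not give the exact formula for the base content score, so the following is only a sketch under an explicit assumption: the score for a site is taken to be the sum of the β coefficients of the retained (position, base) terms that are present at that site. The coefficient values, sequences, and function name below are hypothetical and serve only to show the bookkeeping.

```python
def base_content_score(upstream, downstream, betas):
    """Sum the coefficients of retained nucleotide terms present at a site.

    upstream   -- sequence 5' of the edited C; its last base is position -1
    downstream -- sequence 3' of the edited C; its first base is position +1
    betas      -- dict mapping (position, base) to a regression coefficient
    Assumption: the score is a plain sum with no intercept term.
    """
    score = 0.0
    for (position, base), beta in betas.items():
        if position == 0:
            raise ValueError("position 0 is the edited cytidine itself")
        if position < 0:
            nucleotide = upstream[position]        # -1 is the last upstream base
        else:
            nucleotide = downstream[position - 1]  # +1 is the first downstream base
        if nucleotide == base:
            score += beta
    return score


# Hypothetical coefficients for three retained terms.
betas = {(-1, "A"): 5.0, (2, "U"): 3.0, (-3, "G"): -2.0}
upstream = "CCGGAAUUGA"              # 10 nt, made up
downstream = "AUGCAUGCAUGCAUGCAUGC"  # 20 nt, made up
print(base_content_score(upstream, downstream, betas))
```

Collapsing the per-nucleotide terms into one variable in this way would let the final model carry a single base content predictor instead of many indicator terms.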
ACKNOWLEDGMENTS

This work was supported by National Institutes of Health grants DK-119437 and DK-112378 and by Washington University Digestive Diseases Research Core Center grant P30 DK-52574 (to NOD).

REFERENCES

UCSC Genome Browser on Mouse (NCBI37/mm9; 2007) and Human (GRCh38/hg38; 2013) assemblies.
Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, Bignell GR, Bolli N, Borg A, Borresen-Dale AL et al. 2013. Signatures of mutational processes in human cancer. Nature 500: 415-421.
Arbab M, Shen MW, Mok B, Wilson C, Matuszek Z, Cassa CA, Liu DR. 2020. Determinants of Base Editing Outcomes from Target Library Analysis and Machine Learning. Cell 182: 463-480 e430.
Backus JW, Schock D, Smith HC. 1994. Only cytidines 5' of the apolipoprotein B mRNA mooring sequence are edited. Biochim Biophys Acta 1219: 1-14.
Backus JW, Smith HC. 1991. Apolipoprotein B mRNA sequences 3' of the editing site are necessary and sufficient for editing and editosome assembly. Nucleic Acids Res 19: 6781-6786.
Backus JW, Smith HC. 1992. Three distinct RNA sequence elements are required for efficient apolipoprotein B (apoB) RNA editing in vitro. Nucleic Acids Res 20: 6007-6014.
Bahn JH, Lee JH, Li G, Greer C, Peng G, Xiao X. 2012. Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res 22: 142-150.
Barker N, Ridgway RA, van Es JH, van de Wetering M, Begthel H, van den Born M, Danenberg E, Clarke AR, Sansom OJ, Clevers H. 2009. Crypt stem cells as the cells-of-origin of intestinal cancer. Nature 457: 608-611.
Bazak L, Haviv A, Barak M, Jacob-Hirsch J, Deng P, Zhang R, Isaacs FJ, Rechavi G, Li JB, Eisenberg E et al. 2014. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes. Genome Res 24: 365-376.
Blanc V, Henderson JO, Newberry EP, Kennedy S, Luo J, Davidson NO. 2005. Targeted deletion of the murine apobec-1 complementation factor (acf) gene results in embryonic lethality. Molecular and cellular biology 25: 7260-7269.
Blanc V, Park E, Schaefer S, Miller M, Lin Y, Kennedy S, Billing AM, Ben Hamidane H, Graumann J, Mortazavi A et al. 2014. Genome-wide identification and functional analysis of Apobec-1-mediated C-to-U RNA editing in mouse small intestine and liver. Genome Biol 15: R79.
Blanc V, Xie Y, Kennedy S, Riordan JD, Rubin DC, Madison BB, Mills JC, Nadeau JH, Davidson NO. 2019. Apobec1 complementation factor (A1CF) and RBM47 interact in tissue-specific regulation of C to U RNA editing in mouse intestine and liver. RNA 25: 70-81.
Bostrom K, Lauer SJ, Poksay KS, Garcia Z, Taylor JM, Innerarity TL. 1989. Apolipoprotein B48 RNA editing in chimeric apolipoprotein EB mRNA. J Biol Chem 264: 15701-15708.
Chen SH, Habib G, Yang CY, Gu ZW, Lee BR, Weng SA, Silberman SR, Cai SJ, Deslypere JP, Rosseneu M et al. 1987. Apolipoprotein B-48 is the product of a messenger RNA with an organ-specific in-frame stop codon. Science 238: 363-366.
Chen SH, Li XX, Liao WS, Wu JH, Chan L. 1990. RNA editing of apolipoprotein B mRNA. Sequence specificity determined by in vitro coupled transcription editing. J Biol Chem 265: 6811-6816.
Conticello SG. 2012. Creative deaminases, self-inflicted damage, and genome evolution.
Annals of the New York Academy of Sciences 1267: 79-85.
Davies MS, Wallis SC, Driscoll DM, Wynne JK, Williams GW, Powell LM, Scott J. 1989. Sequence requirements for apolipoprotein B RNA editing in transfected rat hepatoma cells. J Biol Chem 264: 13395-13398.
Destefanis E, Avsar G, Groza P, Romitelli A, Torrini S, Pir P, Conticello SG, Aguilo F, Dassi E. 2020. A mark of disease: how mRNA modifications shape genetic and acquired pathologies. RNA.
Driscoll DM, Wynne JK, Wallis SC, Scott J. 1989. An in vitro system for the editing of apolipoprotein B mRNA. Cell 58: 519-525.
Elmentaite R, Ross ADB, Roberts K, James KR, Ortmann D, Gomes T, Nayak K, Tuck L, Pritchard S, Bayraktar OA et al. 2020. Single-Cell Sequencing of Developing Human Gut Reveals Transcriptional Links to Childhood Crohn's Disease. Dev Cell.
Fossat N, Tourle K, Radziewic T, Barratt K, Liebhold D, Studdert JB, Power M, Jones V, Loebel DA, Tam PP. 2014. C to U RNA editing mediated by APOBEC1 requires RNA-binding protein RBM47. EMBO Rep 15: 903-910.
Gao J, Choudhry H, Cao W. 2018. Apolipoprotein B mRNA editing enzyme catalytic polypeptide-like family genes activation and regulation during tumorigenesis. Cancer science 109: 2375-2382.
Giannoni F, Bonen DK, Funahashi T, Hadjiagapiou C, Burant CF, Davidson NO. 1994. Complementation of apolipoprotein B mRNA editing by human liver accompanied by secretion of apolipoprotein B48. J Biol Chem 269: 5932-5936.
Grohmann M, Hammer P, Walther M, Paulmann N, Buttner A, Eisenmenger W, Baghai TC, Schule C, Rupprecht R, Bader M et al. 2010. Alternative splicing and extensive RNA editing of human TPH2 transcripts. PloS one 5: e8956.
Gu T, Buaas FW, Simons AK, Ackert-Bicknell CL, Braun RE, Hibbs MA. 2012. Canonical A-to-I and C-to-U RNA editing is enriched at 3'UTRs and microRNA target sites in multiple mouse tissues. PLoS One 7: e33720.
Harris RS, Bishop KN, Sheehy AM, Craig HM, Petersen-Mahrt SK, Watt IN, Neuberger MS, Malim MH. 2003. DNA deamination mediates innate immunity to retroviral infection. Cell 113: 803-809.
Hersberger M, Innerarity TL. 1998. Two efficiency elements flanking the editing site of cytidine 6666 in the apolipoprotein B mRNA support mooring-dependent editing. J Biol Chem 273: 9435-9442.
Hirano K, Young SG, Farese RV, Jr., Ng J, Sande E, Warburton C, Powell-Braxton LM, Davidson NO. 1996. Targeted disruption of the mouse apobec-1 gene abolishes apolipoprotein B mRNA editing and eliminates apolipoprotein B48. J Biol Chem 271: 9887-9890.
Hosmer Jr DW, Lemeshow S, Sturdivant RX. 2013. Applied logistic regression. John Wiley & Sons.
Hospattankar AV, Higuchi K, Law SW, Meglin N, Brewer HB, Jr. 1987. Identification of a novel in-frame translational stop codon in human intestine apoB mRNA. Biochem Biophys Res Commun 148: 279-285.
Kanata E, Llorens F, Dafou D, Dimitriadis A, Thune K, Xanthopoulos K, Bekas N, Espinosa JC, Schmitz M, Marin-Moreno A et al. 2019. RNA editing alterations define manifestation of prion diseases. Proc Natl Acad Sci U S A 116: 19727-19735.
Kankowski S, Forstera B, Winkelmann A, Knauff P, Wanker EE, You XA, Semtner M, Hetsch F, Meier JC. 2017. A Novel RNA Editing Sensor Tool and a Specific Agonist Determine Neuronal Protein Expression of RNA-Edited Glycine Receptors and Identify a Genomic APOBEC1 Dimorphism as a New Genetic Risk Factor of Epilepsy.
Front Mol Neurosci 10: 439.
Lellek H, Kirsten R, Diehl I, Apostel F, Buck F, Greeve J. 2000. Purification and molecular cloning of a novel essential component of the apolipoprotein B mRNA editing enzyme-complex. J Biol Chem 275: 19848-19856.
MacParland SA, Liu JC, Ma XZ, Innes BT, Bartczak AM, Gage BK, Manuel J, Khuu N, Echeverri J, Linares I et al. 2018. Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations. Nat Commun 9: 4383.
Maris C, Masse J, Chester A, Navaratnam N, Allain FH. 2005. NMR structure of the apoB mRNA stem-loop and its interaction with the C to U editing APOBEC1 complementary factor. RNA 11: 173-186.
Mehta A, Kinter MT, Sherman NE, Driscoll DM. 2000. Molecular cloning of apobec-1 complementation factor, a novel RNA-binding protein involved in the editing of apolipoprotein B mRNA. Mol Cell Biol 20: 1846-1854.
Meier JC, Henneberger C, Melnick I, Racca C, Harvey RJ, Heinemann U, Schmieden V, Grantyn R. 2005. RNA editing produces glycine receptor alpha3(P185L), resulting in high agonist potency. Nat Neurosci 8: 736-744.
Mukhopadhyay D, Anant S, Lee RM, Kennedy S, Viskochil D, Davidson NO. 2002. C-->U editing of neurofibromatosis 1 mRNA occurs in tumors that express both the type II transcript and apobec-1, the catalytic subunit of the apolipoprotein B mRNA-editing enzyme. Am J Hum Genet 70: 38-50.
Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, Jones D, Hinton J, Marshall J, Stebbings LA et al. 2012. Mutational processes molding the genomes of 21 breast cancers. Cell 149: 979-993.
Petljak M, Alexandrov LB, Brammeld JS, Price S, Wedge DC, Grossmann S, Dawson KJ, Ju YS, Iorio F, Tubio JMC et al. 2019. Characterizing Mutational Signatures in Human Cancer Cell Lines Reveals Episodic APOBEC Mutagenesis. Cell 176: 1282-1294 e1220.
Powell LM, Wallis SC, Pease RJ, Edwards YH, Knott TJ, Scott J. 1987. A novel form of tissue-specific RNA processing produces apolipoprotein-B48 in intestine. Cell 50: 831-840.
Rayon-Estrada V, Harjanto D, Hamilton CE, Berchiche YA, Gantman EC, Sakmar TP, Bulloch K, Gagnidze K, Harroch S, McEwen BS et al. 2017. Epitranscriptomic profiling across cell types reveals associations between APOBEC1-mediated RNA editing, gene expression outcomes, and cellular function. Proc Natl Acad Sci U S A 114: 13296-13301.
Reuter JS, Mathews DH. 2010. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11: 129.
Richardson N, Navaratnam N, Scott J. 1998. Secondary structure for the apolipoprotein B mRNA editing site. Au-binding proteins interact with a stem loop. J Biol Chem 273: 31707-31717.
Roberts SA, Lawrence MS, Klimczak LJ, Grimm SA, Fargo D, Stojanov P, Kiezun A, Kryukov GV, Carter SL, Saksena G et al. 2013. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat Genet 45: 970-976.
Rosenberg BR, Hamilton CE, Mwangi MM, Dewell S, Papavasiliou FN. 2011. Transcriptome-wide sequencing reveals numerous APOBEC1 mRNA-editing targets in transcript 3' UTRs. Nat Struct Mol Biol 18: 230-236.
Saraconi G, Severi F, Sala C, Mattiuz G, Conticello SG. 2014.
The RNA editing enzyme APOBEC1 induces somatic mutations and a compatible mutational signature is present in esophageal adenocarcinomas. Genome Biol 15: 417.
Schaefermeier P, Heinze S. 2017. Hippocampal Characteristics and Invariant Sequence Elements Distribution of GLRA2 and GLRA3 C-to-U Editing. Mol Syndromol 8: 85-92.
Shah RR, Knott TJ, Legros JE, Navaratnam N, Greeve JC, Scott J. 1991. Sequence requirements for the editing of apolipoprotein B mRNA. J Biol Chem 266: 16301-16304.
Skuse GR, Cappione AJ, Sowden M, Metheny LJ, Smith HC. 1996. The neurofibromatosis type I messenger RNA undergoes base-modification RNA editing. Nucleic Acids Res 24: 478-485.
Smith HC, Kuo SR, Backus JW, Harris SG, Sparks CE, Sparks JD. 1991. In vitro apolipoprotein B mRNA editing: identification of a 27S editing complex. Proc Natl Acad Sci U S A 88: 1489-1493.
Snyder EM, McCarty C, Mehalow A, Svenson KL, Murray SA, Korstanje R, Braun RE. 2017. APOBEC1 complementation factor (A1CF) is dispensable for C-to-U RNA editing in vivo. RNA 23: 457-465.
Sowden M, Hamm JK, Spinelli S, Smith HC. 1996. Determinants involved in regulating the proportion of edited apolipoprotein B RNAs. RNA 2: 274-288.
Teng B, Burant CF, Davidson NO. 1993. Molecular cloning of an apolipoprotein B messenger RNA editing protein. Science 260: 1816-1819.
Wolfe AD, Arnold DB, Chen XS. 2019. Comparison of RNA Editing Activity of APOBEC1-A1CF and APOBEC1-RBM47 Complexes Reconstituted in HEK293T Cells. J Mol Biol 431: 1506-1517.
Wolfe AD, Li S, Goedderz C, Chen XS. 2020. The structure of APOBEC1 and insights into its RNA and DNA substrate selectivity. NAR Cancer 2: zcaa027.
Zuker M. 2003.
Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31: 3406-3415.

Table 1. Multivariable linear regression model for determinant factors of editing frequency in mouse APOBEC1-dependent C-to-U mRNA editing sites.

| Determinant of editing frequency | Subgroup | ß (95% CI) | P value |
| --- | --- | --- | --- |
| Model without co-factor group (N=103; R2=0.84; P<.001) | | | |
| Base content score | per unit increments | 1.00 [0.83, 1.17] | <.001 |
| Count of mismatches in mooring sequence | per unit increments | -5.89 [-7.48, -4.31] | <.001 |
| Count of mismatches in regulatory sequence motif D (whole sequence) | per unit increments | -2.00 [-3.58, -0.43] | .01 |
| AU content of regulatory sequence motif B | per 10% increments | -2.41 [-4.38, -0.45] | .02 |
| Overall secondary structure | C loop | Reference | |
| | C stem | 1.20 [-5.07, 7.47] | .7 |
| | C tail | -12.19 [-20.80, -3.58] | .006 |
| | Non-canonical | -10.67 [-20.92, -0.43] | .04 |
| Location of mooring sequence | Stem-loop | Reference | |
| | Other | -11.56 [-17.35, -5.77] | <.001 |
| After adding co-factor group to the model (N=72; R2=0.84; P<.001) | | | |
| Co-factor group | RBM47 dominant | Reference | |
| | Co-dominant | -12.30 [-20.63, -3.97] | .005 |
| | A1CF dominant | 11.54 [-0.64, 23.72] | .07 |

ß: represents average change (%) in the editing frequency compared to the reference group
CI: confidence interval
Table 2: Characteristics of human C-to-U mRNA editing targets

| Parameter | NF1 (low editing) | GLYCRA3 (low editing) | GLYCRA2 (low editing) | TPH2B (high editing) | TPH2B (high editing) | APOB (high editing) |
| --- | --- | --- | --- | --- | --- | --- |
| Editing location | C2914 | C554 | C575 | C385 (exon3) | C830 (exon7) | C6666 |
| Tissue | neural sheath / CNS tumor | hippocampus | hippocampus | amygdala | amygdala | small intestine |
| Editing frequency (%) | 10 | 10 | 17 | 89 | 98 | >95 |
| Mismatches in regulatory motif A | 1 | 3 | 3 | 2 | 3 | 0 |
| Mismatches in regulatory motif B | 2 | 4 | 5 | 4 | 5 | 0 |
| Mismatches in regulatory motif C | 4 | 4 | 4 | 4 | 4 | 0 |
| Mismatches in regulatory motif D | 6 | 8 | 9 | 8 | 9 | 0 |
| AU content (%) in regulatory motif A | 100 | 33 | 33 | 100 | 0 | 100 |
| AU content (%) in regulatory motif B | 100 | 60 | 20 | 100 | 20 | 80 |
| AU content (%) in regulatory motif C* | 60 | 40 | 60 | 40 | 40 | 100 |
| AU content (%) in regulatory motif D | 80 | 50 | 40 | 70 | 30 | 90 |
| Spacer length* | 6 | 2 | 2 | 0 | 3 | 4 |
| Spacer AU content (%) | 67 | 0 | 0 | | 33 | 100 |
| Mismatches in spacer | 2 | 2 | 2 | | 2 | 0 |
| Mismatches in mooring* | 3 | 4 | 2 | 1 | 5 | 0 |
| AU content (%) of 3 downstream bases* | 67 | 33 | 33 | 100 | 33 | 100 |
| AU content (%) of 20 downstream bases | 60 | 60 | 70 | 55 | 35 | 85 |
| Overall secondary structure | canonical | canonical | canonical | canonical | canonical | canonical |
| Location of edited C* | loop | tail | tail | stem | loop | loop |
| Location of mooring sequence | stem-loop | stem-loop | stem-loop | stem-loop | stem-loop | stem-loop |
| Ratio of stem-loop bases* | 0.46 | 0.375 | 0.5 | 0.45 | 0.92 | 0.96 |
| Free tail orientation | symmetric | symmetric | asymmetric | symmetric | asymmetric | asymmetric |
| Composite score | 2 | 2 | 2 | 5 | 4 | 5 |

CNS: central nervous system
* these items were used to calculate the composite score (total score = 6) as follows:
AU content (%) in regulatory motif C: < 50%: 1, ≥ 50%: 0
spacer length: ≤ 4: 1, > 4: 0
mismatches in mooring: < 3: 1, ≥ 3: 0
AU content (%) of 3 downstream bases: > 50%: 1, ≤ 50%: 0
location of edited C in secondary structure: stem-loop: 1, tail: 0
ratio of stem-loop bases: > 50%: 1, ≤ 50%: 0
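The composite score defined in the Table 2 footnote can be reproduced with a short script. The sketch below encodes the six scored items for each human target as transcribed from Table 2 and applies the stated thresholds. The dictionary keys and function name are illustrative. It assumes that the "50%" threshold for the ratio of stem-loop bases corresponds to 0.5 on the tabulated fractions, and that "stem-loop" for the location of the edited C covers both loop and stem.

```python
# Scored items per target, transcribed from Table 2.
SITES = {
    "NF1 C2914":    dict(au_motif_c=60,  spacer_len=6, mooring_mm=3, au_3_down=67,
                         c_location="loop", stem_loop_ratio=0.46),
    "GLYCRA3 C554": dict(au_motif_c=40,  spacer_len=2, mooring_mm=4, au_3_down=33,
                         c_location="tail", stem_loop_ratio=0.375),
    "GLYCRA2 C575": dict(au_motif_c=60,  spacer_len=2, mooring_mm=2, au_3_down=33,
                         c_location="tail", stem_loop_ratio=0.5),
    "TPH2B C385":   dict(au_motif_c=40,  spacer_len=0, mooring_mm=1, au_3_down=100,
                         c_location="stem", stem_loop_ratio=0.45),
    "TPH2B C830":   dict(au_motif_c=40,  spacer_len=3, mooring_mm=5, au_3_down=33,
                         c_location="loop", stem_loop_ratio=0.92),
    "APOB C6666":   dict(au_motif_c=100, spacer_len=4, mooring_mm=0, au_3_down=100,
                         c_location="loop", stem_loop_ratio=0.96),
}


def composite_score(site):
    """Count how many of the six footnote criteria a site satisfies (0-6)."""
    criteria = [
        site["au_motif_c"] < 50,                 # AU content of motif C < 50%
        site["spacer_len"] <= 4,                 # spacer length <= 4
        site["mooring_mm"] < 3,                  # fewer than 3 mooring mismatches
        site["au_3_down"] > 50,                  # AU content of 3 downstream bases > 50%
        site["c_location"] in ("loop", "stem"),  # edited C in stem-loop, not tail
        site["stem_loop_ratio"] > 0.5,           # ratio of stem-loop bases > 50%
    ]
    return sum(criteria)


for name, site in SITES.items():
    print(name, composite_score(site))
```

Under these assumptions the script returns the composite scores listed in the last row of Table 2 (2, 2, 2, 5, 4, 5).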
FIGURE LEGENDS

Figure 1. Characteristics of murine APOBEC1-mediated C-to-U mRNA editing sites. A: schematic presentation of mRNA target, chromosomal editing location, and editing sites considered. Each mRNA target could be edited at one or more chromosomal location(s) (blue boxes). Each editing location could be edited in one or more tissues giving rise to one or more editing site(s) per location (green boxes). Editing site(s) of each mRNA target are the sum of editing sites from all editing locations reported for that target. B: examples of canonical (ApoB chr12: 8014860, top) and two types of non-canonical (Kctd12 chr14: 103379573 and Dcn chr10: 96980535) secondary structures. C: distribution of number of chromosomal editing location(s), or targeted cytidine(s), per mRNA target. D: distribution of number of total editing sites per mRNA target considering all chromosomal location(s) edited at different tissue(s). E: distribution of location of editing sites within gene structure.

Figure 2. Base content of sequences flanking modified cytidine in RNA editing and DNA mutation targets. A: base content of 10 nucleotides upstream and 20 nucleotides downstream of edited cytidine in mouse APOBEC1-mediated C-to-U mRNA editing targets. B: base content of 10 nucleotides upstream and 10 nucleotides downstream of mutated cytidine in proposed human APOBEC-mediated DNA mutation targets in patients with breast cancer. C: comparison of AU base content (%) of nucleotides flanking modified cytidine in RNA editing targets and DNA mutation targets in mouse and human breast cancer patients, respectively.

Figure 3.
Characteristics of regulatory-spacer-mooring cassette and base content of individual nucleotides flanking edited cytidine in association with editing frequency. A: schematic illustration of regulatory-spacer-mooring cassette. Four motifs were defined for regulatory sequence: motif A for nucleotides -1 to -3; motif B for nucleotides -1 to -5; motif C for nucleotides -6 to -10; motif D representative of the whole sequence. B: association of the mismatches in motif D of regulatory sequence with editing frequency. C: association between the AU content (%) of regulatory sequence (motif B) and editing frequency. D: association of the mismatches in spacer (nucleotides +1 to +4 downstream of the edited cytidine) with editing frequency. E: association of the mismatches in mooring sequence with editing frequency. F: heatmap plot illustrating the association between base content of 30 nucleotides flanking the edited cytidine with editing frequency. Red color density in each cell represents the beta coefficient value of the corresponding base in the multivariable linear regression model fit including that nucleotide. The asterisks refer to the nucleotides that were retained in the final model. Mismatches in regulatory, spacer, and mooring sequences were determined in comparison to the corresponding sequences in ApoB mRNA (as reference). r: Pearson correlation coefficient.

Figure 4. Secondary structure-related features in association with editing frequency. A: distribution of different types of overall secondary structure in editing sites. C loop, C stem, C tail are three subtypes of canonical secondary structure based on the location of the edited cytidine.
B: association between type of secondary structure and editing frequency. C: distribution of the mooring sequence location in editing sites. “Other” refers to mooring sequences located in tail or stem/loop and not part of the main stem-loop structure. D: association of mooring sequence location with editing frequency. E: association between ratio of main stem-loop bases to total bases count and editing frequency. F: association of the 5’ free tail length with editing frequency. * P<.05; ** P<.001. r: Pearson correlation coefficient.

Figure 5. Dominance and tissue-specific cofactor patterns among editing sites. A: distribution of dominant co-factor in editosomes of editing sites. B: association of dominant co-factor with editing frequency. C: distribution of number of editing tissue(s) per mRNA target. D: tissue distribution of editing sites. E: average editing frequency of editing sites edited at different tissues. SI, small intestine.

Figure 6. Co-factor pattern and tissue-specific role in murine C-to-U mRNA editing sites. A: distribution of editing tissue across subgroups of editing sites with different dominant co-factor patterns. B: location of edited cytidine in secondary structure of editing sites with different dominant co-factor patterns. C: schematic presentation of factors that correlate with dominant co-factor pattern in editing sites. This graph is based on the findings derived from pairwise multinomial logistic regression models.
SUPPLEMENTAL FIGURE LEGENDS

Supplemental Figure 1. Chromosomal distribution of murine APOBEC1-mediated C-to-U mRNA editing sites. The black curve corresponds to left Y-axis and represents average editing frequencies of editing sites related to each chromosome. The blue curve corresponds to right Y-axis and represents number of editing sites related to each chromosome.

Supplemental Figure 2. Association of editing frequency with characteristics of regulatory sequence in murine APOBEC1-mediated C-to-U mRNA editing sites. A-C. Association of editing frequency with number of mismatches and AU content (%). D-F. Association of editing frequency with different regulatory sequence motifs. Mismatches were determined in comparison to the same regulatory sequence motif in ApoB mRNA (as reference).

Supplemental Figure 3. Association of editing frequency with characteristics of downstream sequence in murine APOBEC1-mediated C-to-U mRNA editing sites. A. Association of editing frequency with spacer length. B. Association of editing frequency with spacer AU content (%). C-F. Association of editing frequency with AU content of successive segments downstream of the edited cytidine.

Supplemental Figure 4. Association of editing frequency with secondary structure-related characteristics in C-to-U mRNA editing sites. A: distribution of edited cytidine location in secondary structure regardless of the overall secondary structure. B: association of editing frequency with edited cytidine location in secondary structure. C: distribution of free tail
orientation in editing sites. D: association of editing frequency with free tail orientation in editing sites. E: association of editing frequency with 3’ free tail length. * P<.05; *** P<.0001. r: Pearson correlation coefficient.

Supplemental Figure 5. Association of secondary structure-related characteristics with dominant co-factor pattern in APOBEC1-mediated C-to-U mRNA editing sites. A. Distribution of mooring sequence location presented in the context of different dominant co-factor patterns. B. Distribution of free tail orientation in secondary structure among editing sites, presented in the context of different dominant co-factor patterns.

Supplemental table 1. Multivariable linear regression model for individual nucleotides surrounding edited cytosine (-10 to +20) in mouse APOBEC1-dependent C-to-U mRNA editing sites.
Location of nucleotide relative to edited C | Base preference | β (95% CI) | P value
Nucleotide -8 | GU | 8.15 [3.0, 13.3] | 0.002
Nucleotide -7 | C | 12.7 [4.3, 21.0] | 0.003
Nucleotide -6 | G | 7.1 [0.6, 13.7] | 0.03
Nucleotide -5 | U | 5.2 [1.0, 9.5] | 0.02
Nucleotide -2 | AUC | 13.5 [9.0, 17.9] | <0.001
Nucleotide -1 | AU | 15.9 [4.0, 27.9] | 0.01
Nucleotide +1 | AGU | 19.5 [12.5, 26.6] | <0.001
Nucleotide +3 | G | 12.2 [7.4, 16.9] | <0.001
Nucleotide +4 | G | 15.9 [10.9, 21.0] | <0.001
Nucleotide +7 | C | 10.3 [1.5, 19.2] | 0.02
Nucleotide +9 | G | 9.7 [1.4, 18.0] | 0.02
Nucleotide +12 | AUC | 7.5 [1.0, 13.9] | 0.02
Nucleotide +16 | AC | 6.6 [2.2, 11.0] | 0.004
Nucleotide +17 | AU | 5.6 [0.5, 10.8] | 0.03
Nucleotide +18 | AU | 6.6 [1.5, 11.8] | 0.01
Nucleotide +19 | AC | 5.65 [1.3, 10.0] | 0.01
β: average change (%) in the editing frequency compared to the reference group (non-preferred group). CI: confidence interval.

Supplemental table 2. Descriptive data of regulatory-spacer-mooring cassette in mouse APOBEC1-dependent C-to-U mRNA editing sites.
Parameter | N | Mean | SD | Min | Max
Sequence-related features
Mismatches in regulatory (motif A) | 177 | 1.72 | 0.94 | 0 | 3
Mismatches in regulatory (motif B) | 177 | 3.35 | 1.12 | 0 | 5
Mismatches in regulatory (motif C) | 177 | 3.78 | 0.99 | 0 | 5
Mismatches in regulatory (motif D) | 177 | 7.12 | 1.76 | 0 | 10
AU content (%) of regulatory (motif A) | 177 | 75.14 | 26.00 | 0 | 100
AU content (%) of regulatory (motif B) | 177 | 73.44 | 22.10 | 0 | 100
AU content (%) of regulatory (motif C) | 177 | 63.00 | 23.40 | 0 | 100
AU content (%) of regulatory (motif D) | 177 | 68.25 | 18.40 | 10 | 100
Spacer length | 177 | 5.08 | 3.67 | 0 | 20
Mismatches in spacer | 152 | 2.54 | 1.09 | 0 | 4
AU content (%) of spacer | 172 | 72.65 | 23.39 | 0 | 100
Mismatches in mooring | 177 | 2.13 | 1.81 | 0 | 8
AU content (%) of downstream sequence +1 to +5 | 177 | 72.88 | 19.46 | 0 | 100
AU content (%) of downstream sequence +6 to +10 | 177 | 69.94 | 22.78 | 0 | 100
AU content (%) of downstream sequence +11 to +15 | 177 | 72.43 | 20.65 | 20 | 100
AU content (%) of downstream sequence +16 to +20 | 177 | 66.21 | 22.56 | 0 | 100
Secondary structure-related features
Proportion of the bases that constitute main stem-loop | 172 | 0.61 | 0.18 | 0.28 | 1
Length of 5’ free tail | 172 | 4.25 | 3.93 | 0 | 15
Length of 3’ free tail | 172 | 5.27 | 4.65 | 0 | 17
SD: standard deviation.

Supplemental table 3. Comparing three subgroups of mouse APOBEC1-dependent C-to-U mRNA editing sites based on co-factor dominance.
Parameter | RBM47-dominant N | RBM47-dominant Mean | RBM47-dominant SD | A1CF-dominant N | A1CF-dominant Mean | A1CF-dominant SD | Co-dominant N | Co-dominant Mean | Co-dominant SD | P value
Mismatches in regulatory (motif A) | 60 | 1.48 | 0.93 | 5 | 1.80 | 0.45 | 7 | 1.14 | 0.69 | .4
Mismatches in regulatory (motif B) | 60 | 3.05 | 1.13 | 5 | 3.60 | 0.55 | 7 | 3.00 | 0.82 | .51
Mismatches in regulatory (motif C) | 60 | 3.58 | 1.05 | 5 | 3.80 | 0.45 | 7 | 4.29 | 1.11 | .1
Mismatches in regulatory (motif D) | 60 | 6.63 | 1.90 | 5 | 7.40 | 0.55 | 7 | 7.29 | 1.50 | .44
AU content (%) of regulatory (motif A) | 60 | 82.22 | 18.88 | 5 | 80.00 | 18.26 | 7 | 85.71 | 17.82 | .8
AU content (%) of regulatory (motif B) | 60 | 76.33 | 16.67 | 5 | 84.00 | 16.73 | 7 | 82.86 | 17.99 | .5
AU content (%) of regulatory (motif C) | 60 | 62.67 | 22.84 | 5 | 72.00 | 17.89 | 7 | 62.86 | 21.38 | .6
AU content (%) of regulatory (motif D) | 60 | 69.50 | 14.89 | 5 | 78.00 | 13.04 | 7 | 72.86 | 12.54 | .4
Spacer length | 60 | 5.20 | 3.93 | 5 | 7.20 | 5.45 | 7 | 7.86 | 5.08 | .2
Mismatches in spacer (in 4-base cassette) | 40 | 2.43 | 1.20 | 4 | 2.75 | 1.50 | 6 | 3.83 | 0.41 | .02
Mismatches in spacer (relative abundance (%)) | 60 | 61.81 | 30.89 | 5 | 61.67 | 36.13 | 7 | 82.14 | 37.40 | .2
AU content (%) of spacer | 60 | 77.30 | 17.83 | 5 | 72.08 | 18.14 | 7 | 71.37 | 15.24 | .5
Mismatches in mooring | 60 | 1.12 | 1.30 | 5 | 2.00 | 2.55 | 7 | 2.86 | 0.38 | .004
AU content (%) of downstream sequence +1 to +5 | 60 | 77.33 | 14.94 | 5 | 80.00 | 20.00 | 7 | 71.43 | 15.74 | .7
AU content (%) of downstream sequence +6 to +10 | 60 | 77.67 | 18.81 | 5 | 60.00 | 24.49 | 7 | 57.14 | 13.80 | .01
AU content (%) of downstream sequence +11 to +15 | 60 | 80.33 | 15.40 | 5 | 72.00 | 17.89 | 7 | 65.71 | 15.12 | 0.06
AU content (%) of downstream sequence +16 to +20 | 60 | 70.33 | 20.00 | 5 | 72.00 | 10.95 | 7 | 77.14 | 17.99 | .6
Proportion of the bases that constitute main stem-loop | 60 | 0.62 | 0.18 | 5 | 0.71 | 0.10 | 7 | 0.59 | 0.21 | .5
Length of 5’ free tail | 60 | 4.08 | 3.81 | 5 | 2.40 | 3.91 | 7 | 6.86 | 6.20 | .3
Length of 3’ free tail | 60 | 5.35 | 4.84 | 5 | 6.00 | 2.55 | 7 | 5.00 | 5.66 | .6
SD: standard deviation.
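The two sequence-level quantities tabulated above, AU content (%) and the number of mismatches relative to a reference, can be illustrated with a minimal Python sketch. The function names and the example motifs are hypothetical and are not taken from the study; the sketch assumes that a mismatch is counted position by position against an equal-length reference (such as the corresponding ApoB segment), which may differ from how the authors handled length differences.

```python
def au_content(seq: str) -> float:
    """Percentage of A or U bases in an RNA sequence (T is treated as U)."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "AU" for base in seq) / len(seq)


def mismatches(seq: str, reference: str) -> int:
    """Count positions where seq differs from an equal-length reference."""
    if len(seq) != len(reference):
        raise ValueError("sequences must have the same length")
    a = seq.upper().replace("T", "U")
    b = reference.upper().replace("T", "U")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 5-nt motifs for illustration only (not sequences from the paper).
example_site = "AUAUG"
example_reference = "AUAUC"
print(au_content(example_site))
print(mismatches(example_site, example_reference))
```

Applied to each editing site, values of this kind would populate the "AU content (%)" and "Mismatches" rows before summary statistics are computed.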
Supplemental Table 4. Multinomial logistic regression model for determinant factors of co-factor dominancy in mouse APOBEC1-dependent C-to-U mRNA editing sites.

Determinant of co-factor dominancy | Subgroup | Coefficient (95% CI) | P value
A1CF-dominant vs RBM47-dominant
Tissue | Small intestine | Reference |
Tissue | Liver | 4.40 [0.34, 5.21] | .04
Location of edited cytosine | Loop | Reference |
Location of edited cytosine | Stem | -3.88 [-8.31, 0.55] | 0.08
Location of edited cytosine | Tail | -19.13 [-25.82, -12.44] | <0.001
Mismatches in mooring sequence | per unit increments | 0.30 [-0.97, 1.57] | 0.6
Mismatches in regulatory sequence motif B | per unit increments | 1.62 [0.063, 3.30] | .05
Mismatches in regulatory sequence motif C | per unit increments | 0.12 [-0.83, 1.08] | .8
AU content (%) of regulatory sequence motif D | per unit increments | 0.17 [-0.04, 0.39] | 0.1
AU content (%) of downstream sequence +1 to +5 | per unit increments | -0.02 [-0.09, 0.04] | 0.5
AU content (%) of downstream sequence +6 to +10 | per unit increments | -0.06 [-0.1, -0.02] | 0.006
AU content (%) of downstream sequence +11 to +15 | per unit increments | -0.06 [-0.18, 0.07] | 0.4
Co-dominant vs RBM47-dominant
Tissue | Small intestine | Reference |
Tissue | Liver | -1.73 [-6.00, 2.50] | 0.4
Location of edited cytosine in secondary structure | C loop | Reference |
Location of edited cytosine in secondary structure | C stem | 1.70 [-2.11, 5.51] | 0.4
Location of edited cytosine in secondary structure | C tail | 3.70 [0.72, 6.67] | 0.01
Mismatches in mooring sequence | per unit increments | 0.66 [0.01, 1.33] | .05
Mismatches in regulatory sequence motif B | per unit increments | -2.32 [-3.86, -0.79] | .003
Mismatches in regulatory sequence motif C | per unit increments | 3.16 [1.12, 5.21] | 0.002
AU content (%) of regulatory sequence motif D | per unit increments | 0.13 [0.02, 0.24] | 0.02
AU content (%) of downstream sequence +1 to +5 | per unit increments | -0.17 [-0.35, -0.01] | 0.04
AU content (%) of downstream sequence +6 to +10 | per unit increments | -0.10 [-0.28, 0.07] | 0.25
AU content (%) of downstream sequence +11 to +15 | per unit increments | -0.10 [-0.19, -0.01] | 0.03
Co-dominant vs A1CF-dominant
Tissue | Small intestine | Reference |
Tissue | Liver | -6.13 [-10.60, -0.31] | 0.04
Location of edited cytosine in secondary structure | C loop | Reference |
Location of edited cytosine in secondary structure | C stem | 5.58 [0.06, 9.22] | 0.05
Location of edited cytosine in secondary structure | C tail | 22.83 [15.53, 30.12] | <0.001
Mismatches in mooring sequence | per unit increments | 0.36 [-0.87, 1.59] | 0.6
Mismatches in regulatory sequence motif B | per unit increments | -3.94 [-6.27, -1.61] | 0.001
Mismatches in regulatory sequence motif C | per unit increments | 3.04 [0.91, 5.16] | 0.005
AU content (%) of regulatory sequence motif D | per unit increments | -0.04 [-0.29, 0.20] | 0.72
AU content (%) of downstream sequence +1 to +5 | per unit increments | -0.15 [-0.32, 0.02] | 0.09
AU content (%) of downstream sequence +6 to +10 | per unit increments | -0.04 [-0.22, 0.13] | 0.62
AU content (%) of downstream sequence +11 to +15 | per unit increments | -0.04 [-0.19, 0.11] | 0.58
Model parameters: N=72; Pseudo R2 = 0.59; P<.001. CI: confidence interval.
10_1101-2021_01_08_425958 ---- Reconstitution of cargo-induced LC3 lipidation in mammalian selective autophagy

Chunmei Chang1,3, Xiaoshan Shi1,3, Liv E. Jensen1,3, Adam L. Yokom1,3, Dorotea Fracchiolla2,3, Sascha Martens2,3 and James H. Hurley1,3,4

1 Department of Molecular and Cell Biology at California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720, USA
2 Department of Biochemistry and Cell Biology, Max Perutz Labs, University of Vienna, Vienna BioCenter, Dr. Bohr-Gasse 9, 1030 Vienna, Austria
3 Aligning Science Across Parkinson’s Collaborative Research Network, Chevy Chase, MD, USA
4 Corresponding author: James H.
Hurley, ORCID: 0000-0001-5054-5445, e-mail: jimhurley@berkeley.edu

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.08.425958; this version posted January 9, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

Selective autophagy of damaged mitochondria, intracellular pathogens, protein aggregates, endoplasmic reticulum, and other large cargoes is essential for health. The presence of cargo initiates phagophore biogenesis, which entails the conjugation of ATG8/LC3 family proteins to membrane phosphatidylethanolamine. Current models suggest that the presence of clustered ubiquitin chains on a cargo triggers a cascade of interactions from autophagic cargo receptors through the autophagy core complexes ULK1 and class III PI 3-kinase complex I (PI3KC3-C1), WIPI2, and the ATG7, ATG3, and ATG12-ATG5-ATG16L1 machinery of LC3 lipidation. This model was tested using giant unilamellar vesicles (GUVs), GST-Ub4 as a model cargo, the cargo receptors NDP52, TAX1BP1, and OPTN, and the autophagy core complexes. All three cargo receptors potently stimulated LC3 lipidation on GUVs. NDP52- and TAX1BP1-induced LC3 lipidation required the ULK1 complex together with all other components; however, ULK1 kinase activity was dispensable. In contrast, OPTN bypassed the ULK1 requirement completely. These data show that the cargo-dependent stimulation of LC3 lipidation is a common property of multiple autophagic cargo receptors, yet the details of core complex engagement vary considerably and unexpectedly between the different receptors.
Introduction

Macroautophagy (hereafter autophagy) is an evolutionarily conserved catabolic pathway that sequesters intracellular components in double membrane vesicles, autophagosomes, and delivers them to lysosomes for degradation (1). By removing excess or harmful materials like damaged mitochondria, protein aggregates, and invading pathogens, autophagy maintains cellular homeostasis and is cytoprotective (2). Autophagy is particularly important in maintaining the health of neurons, which are long-lived cells that have a high flux of membrane traffic. Defective autophagy of mitochondria (mitophagy) downstream of mutations in PINK1 and Parkin is thought to contribute to the etiology of a subset of Parkinson’s Disease (3).

The de novo formation of the autophagosome, central to autophagy, entails the formation of a membrane precursor, termed the phagophore (or isolation membrane), that expands and seals around cytosolic cargoes (4). A set of autophagy related (ATG) proteins drive autophagosome biogenesis. In mammalian cells, the unc-51-like kinase 1 (ULK1) complex, consisting of ULK1 itself, FIP200, ATG13, and ATG101, is typically recruited to the autophagosome formation site first. The class III phosphatidylinositol 3-kinase complex I (PI3KC3-C1) is subsequently activated to generate phosphatidylinositol-3-phosphate (PI(3)P). PI(3)P-enriched membranes serve as platforms to recruit the downstream effector WIPIs (WD-repeat protein interacting with phosphoinositides), and the ATG8/LC3 conjugation machinery (5, 6).
ATG2 transfers phospholipids from endoplasmic reticulum (ER) to the growing phagophore (7-9), while ATG9 translocates phospholipids from the cytoplasmic to the luminal leaflet, enabling phagophore expansion (10, 11). The above-mentioned proteins are sometimes referred to as the “core complexes” of autophagy.

The attachment of the ATG8 proteins of the LC3 and GABARAP subfamilies to the membrane lipid phosphatidylethanolamine (PE), termed LC3 lipidation, is a hallmark of autophagosome biogenesis. LC3 lipidation occurs via a ubiquitin-like conjugation cascade. The ubiquitin E1-like ATG7 and the E2-like ATG3 carry out the cognate reactions in the LC3 pathway. The ATG12-ATG5-ATG16L1 complex scaffolds transfer of LC3 from ATG3 to PE (12, 13). The role of the ATG12-ATG5-ATG16L1 complex is analogous to that of a RING domain ubiquitin E3 ligase, although there is no sequence homology between any of the subunits and ubiquitin E3 ligases. Covalent anchoring of LC3 to membrane is closely associated with phagophore membrane expansion (14-17) and cargo sequestration (18-20). Although recent evidence showed that autophagosome formation can still occur in mammalian cells lacking all six ATG8 family proteins (17, 21), the resulting autophagosomes were smaller and lysosomal fusion was impaired. LC3 lipidation is thus involved in multiple steps in autophagosome biogenesis, and is critical for promoting autophagosome-lysosome fusion (17, 21).

In most instances in mammalian cells, autophagy is highly selective and tightly regulated (22).
Several targets of selective autophagy have been described, including mitochondria (mitophagy), intracellular pathogens (xenophagy), aggregated proteins (aggrephagy), endoplasmic reticulum (reticulophagy), lipid droplets (lipophagy), and peroxisomes (pexophagy) (23). The achievement of selectivity relies on a family of autophagy receptors, which specifically bind to cargoes and the phagophore (24-26). Some types of selective autophagy like aggrephagy, mitophagy, and xenophagy are initiated by the ubiquitination of cargoes, which are recognized by a subset of cargo receptors including p62 (sequestosome-1), NBR1, optineurin (OPTN), NDP52 and Tax1-binding protein 1 (TAX1BP1). All of these receptors contain an LC3-interaction region (LIR), a ubiquitin binding domain (UBD), and a dimerization/oligomerization domain (18, 26, 27). These cargo receptors are well known to connect cargo to the phagophore through their interaction with both clustered ubiquitin chains and membrane-conjugated LC3.

Several cargo receptors have recently been shown to trigger autophagy initiation, thus functioning upstream of LC3 lipidation. NDP52 directly binds to and recruits the ULK1 complex to damaged mitochondria and intracellular bacteria by binding to the coiled-coil (CC) of the FIP200 subunit of the ULK1 complex (28-30). p62 is also recruited to FIP200, but to its CLAW domain instead of the CC (31). Initiation of mitophagy by OPTN, however, appears to be independent of ULK1 (32). These findings are beginning to reveal different roles for various cargo receptors in triggering early autophagy machinery assembly via distinct entry points.

In vitro reconstitution studies have recently been shown to recapitulate the steps in autophagosome formation, especially in yeast autophagy (33-36). These in vitro approaches are powerful tools for investigating the molecular mechanisms of such a complicated cell biological process, because they allow control over the multi-component composition and spatiotemporal arrangement of the system.
However, it has been challenging to reconstitute mammalian autophagosome formation because of the complexity of the mammalian autophagy machinery. As part of a long-term effort, we recently reconstituted the events from PI(3)P production by the PI3KC3-C1 to LC3 lipidation in mammalian autophagy in a giant unilamellar vesicle (GUV) model system (37). Separately, we showed that in the presence of clustered ubiquitin chains, NDP52 promotes recruitment of the ULK1 complex to membranes (29). Here, using the GUV model system, and focusing on autophagy receptors involved in mitophagy, we established a start-to-finish reconstitution of selective autophagy initiation from autophagy receptor engagement through LC3 lipidation. We found that NDP52, TAX1BP1 and OPTN triggered robust LC3 lipidation in the presence of the ULK1 complex, PI3KC3-C1 complex, WIPI2, and LC3 conjugation machinery. LC3 lipidation triggered by NDP52 and TAX1BP1 was dependent on both ULK1 and PI3KC3-C1, while OPTN-induced LC3 lipidation was only dependent on the activity of PI3KC3-C1. We further found that these cargo receptors trigger LC3 lipidation through distinct multivalent webs of interactions, thereby enabling rapid LC3 lipidation for autophagosome formation.

Results

Reconstitution of NDP52- and TAX1BP1-triggered LC3 lipidation

We sought to establish a purified system that recapitulates the initiation of mitophagy, which is known to utilize the cargo receptors NDP52, TAX1BP1, and OPTN (38-43), together with the core autophagy initiation machinery and intracellular membranes.
We used GUVs with an ER-like lipid composition to mimic the membranes, and a mixture of linear tetraubiquitin and cargo receptors to mimic cargo signals. These were incubated with a set of purified core autophagy machineries that are involved in autophagy initiation, including the ULK1 complex, the PI3KC3-C1 complex, the PI(3)P effector WIPI2d, the E1-related ATG7, the E2-related ATG3, the functional ubiquitin E3 ligase counterpart ATG12-ATG5-ATG16L1 complex (hereafter referred to as “E3” or “E3 complex” for brevity), and LC3B (Fig. 1A and fig. S1). All proteins and complexes used were full-length and wild-type, with the exception of the FIP200 Δ641-779 construct (fig. S2A), which was engineered to increase stability (fig. S2B) and prevent non-specific aggregation. Negative stain electron microscopy (NSEM) images showed that FIP200Δ641-779 had essentially the same structure as wild-type, while losing its propensity to aggregate (fig. S2C) (29). Contour lengths and end-to-end distances of FIP200Δ641-779 as analyzed by NSEM were comparable to those of the full-length protein (fig. S2D) (29). All of the fluorescently tagged fusion constructs were previously characterized and shown to be functional (29, 37). The typical concentrations of autophagy proteins in human cells are unknown, but as most are thought to be scarce, we used the following concentrations for all reconstitution reactions: 5 µM GST-Ub4, 500 nM cargo receptors, 25 nM ULK1 complex, 25 nM PI3KC3-C1 complex, 100 nM WIPI2d, 50 nM E3 complex, 100 nM ATG7, 100 nM ATG3, and 500 nM mCherry-LC3B.
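As a bench-side illustration of the final concentrations listed above, the following Python sketch computes how much of each stock would be pipetted into a reaction using the dilution relation C1·V1 = C2·V2. Only the final concentrations come from the text; the stock concentrations (here arbitrarily set to 20x final), the 20 µL reaction volume, and the function names are hypothetical assumptions, and the "buffer" entry stands for everything else in the reaction, including the GUV suspension.

```python
# Final concentrations stated in the text, converted to nM.
FINAL_NM = {
    "GST-Ub4": 5000,
    "cargo receptor": 500,
    "ULK1 complex": 25,
    "PI3KC3-C1": 25,
    "WIPI2d": 100,
    "E3 complex": 50,
    "ATG7": 100,
    "ATG3": 100,
    "mCherry-LC3B": 500,
}


def stock_volume_ul(final_nm, stock_nm, reaction_ul):
    """Volume of stock (µL) giving final_nm in reaction_ul, from C1*V1 = C2*V2."""
    if stock_nm < final_nm:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_nm * reaction_ul / stock_nm


def pipetting_plan(stocks_nm, reaction_ul):
    """Per-component volumes plus the remaining volume (buffer and GUVs)."""
    plan = {
        name: stock_volume_ul(final, stocks_nm[name], reaction_ul)
        for name, final in FINAL_NM.items()
    }
    remaining = reaction_ul - sum(plan.values())
    if remaining < 0:
        raise ValueError("component volumes exceed the reaction volume")
    plan["buffer"] = remaining
    return plan


# Hypothetical example: every stock is 20x its final concentration, 20 µL reaction.
example_stocks = {name: 20 * final for name, final in FINAL_NM.items()}
for name, volume in pipetting_plan(example_stocks, 20.0).items():
    print(f"{name}: {volume:.2f} µL")
```

Keeping the final concentrations in one table makes it easy to drop a component (for example, the ULK1 complex) and recompute the buffer volume for the omission experiments described below.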
We first investigated the recruitment of the E3 complex to membranes, since the localization of the E3 complex dictates the sites of LC3 lipidation in cells (44). We observed that in the presence of WIPI2d and the LC3 conjugation machinery, but not ULK1 complex or PI3KC3-C1, little or no GFP-E3 or mCherry-LC3B was recruited to the GUVs within 30 min (Fig. 1B, first column). The addition of PI3KC3-C1 triggered the membrane recruitment of both E3 and LC3B (Fig. 1B, second column), consistent with the previous observation that PI3KC3-C1 activity is required for E3 membrane targeting and LC3 lipidation (37). ULK1 phosphorylates core subunits of PI3KC3-C1 (45, 46), but the addition of ULK1 complex together with PI3KC3-C1 had similar effects to PI3KC3-C1 alone (Fig. 1B, third column). Because NDP52 and the ULK1 complex have been shown to interact directly with the E3 complex (47-49), we asked whether NDP52 or ULK1 could mediate the membrane recruitment of the E3 complex. However, the addition of NDP52 with GST-Ub4, alone or together with the ULK1 complex, did not result in an obvious increase in membrane enrichment of E3 or, subsequently, of LC3B (Fig. 1B, fourth and fifth columns). NDP52 and GST-Ub4 did, however, enhance PI3KC3-C1-triggered E3 and LC3B membrane recruitment (Fig. 1B, sixth column). Membrane recruitment was further enhanced when ULK1 complex was added (Fig. 1B, last column). Quantification of the kinetics of membrane binding showed that both E3 recruitment and LC3 lipidation were fastest when all the components were present (Fig. 1C and D). The kinetics of E3 recruitment to membranes were faster than those of LC3 (Fig. 1C and D), indicating that the E3 is being recruited in a catalytically competent form. We found that LC3 lipidation was slightly more efficient in the presence of the FIP200Δ641-779 version of the ULK1 complex relative to the version containing wild-type FIP200 (fig. S2E).
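One simple way to compare recruitment kinetics of two fluorescent channels, such as GFP-E3 and mCherry-LC3B, is to estimate the time at which each trace reaches half of its maximal rise and take the difference. The Python sketch below does this by linear interpolation. It is an illustration under stated assumptions, not the quantification procedure used for Fig. 1C and D: the function name is hypothetical and the traces are synthetic numbers invented for the example.

```python
def half_time(times, intensities):
    """Time at which a rising trace first reaches half of its maximal rise
    above the initial value, by linear interpolation between time points."""
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need two equally long series with at least two points")
    baseline = intensities[0]
    half = baseline + 0.5 * (max(intensities) - baseline)
    for i in range(1, len(times)):
        y0, y1 = intensities[i - 1], intensities[i]
        if y0 < half <= y1:
            frac = (half - y0) / (y1 - y0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("trace never crosses its half-maximum")


# Synthetic traces (arbitrary units), time in minutes; not data from the paper.
minutes = [0, 5, 10, 15, 20, 25, 30]
e3_trace = [0, 2, 6, 9, 10, 10, 10]
lc3_trace = [0, 0, 2, 6, 9, 10, 10]

lag = half_time(minutes, lc3_trace) - half_time(minutes, e3_trace)
print(f"LC3 lags E3 by {lag:.2f} min")
```

A positive lag means the second trace (here LC3) rises later than the first (here E3); averaging such lags over many vesicles would give a mean lag between the two events.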
We thus used the FIP200Δ641-779 ULK1 complex for all further reconstitution assays and refer to it hereafter as simply the “ULK1 complex”. Taken together, these data show that NDP52 triggers efficient LC3 lipidation when both ULK1 and PI3KC3-C1 complexes are present.

We next tested TAX1BP1, a structural paralog of NDP52 which has roles in mitophagy, xenophagy, and aggrephagy (42, 50, 51). We found that, similar to NDP52, TAX1BP1 induced the most robust and efficient E3 membrane binding and LC3 lipidation when both the ULK1 and PI3KC3-C1 complexes were included in the system (Fig. 1E and F and fig. S3). Although the addition of TAX1BP1 and GST-Ub4 resulted in a slightly stronger E3 recruitment than NDP52, no obvious increase in LC3 lipidation was observed (Fig. 1F and fig. S3, fourth and fifth columns). Together, these data indicate that, like NDP52, TAX1BP1 can trigger robust LC3 lipidation in response to a cargo mimetic in vitro, and that this is dependent on both the ULK1 and PI3KC3-C1 complexes.

Reconstitution of OPTN-triggered LC3 lipidation

We went on to investigate another cargo receptor, OPTN, which has been shown to mediate Parkin-dependent mitophagy (39). Residues S177 and S473 of OPTN are phosphorylated by TANK-binding kinase 1 (TBK1), modifications that were reported to enhance the binding of OPTN to both LC3 and ubiquitin (52). We first tested the phosphomimetic double mutant of OPTN S177D/S473D, hereafter “OPTNS2D”.
We observed that OPTNS2D and GST-Ub4 alone induced a modest recruitment of both E3 and LC3 to the GUV membrane, similar to the addition of PI3KC3-C1 (Fig. 2A, first four columns). In addition, OPTNS2D and GST-Ub4 dramatically increased E3 recruitment and LC3 lipidation by PI3KC3-C1 (Fig. 2A, sixth column). However, in contrast to the situation with NDP52 or TAX1BP1, the addition of ULK1 complex had no effect on either E3 or LC3 binding triggered by the OPTNS2D-Ub-PI3KC3-C1 axis (Fig. 2A, last column). The dynamics of LC3 lipidation and E3 binding when OPTNS2D, GST-Ub4 and PI3KC3-C1 were present were essentially the same in the presence or absence of ULK1 complex (Fig. 2B and C). These data indicate that OPTNS2D can also trigger robust LC3 lipidation but, in contrast to NDP52 and TAX1BP1, OPTN-triggered LC3 lipidation depended only on the activity of PI3KC3-C1. When we compared the kinetics of E3 recruitment with that of LC3 in the presence of all components, LC3 was recruited more slowly than E3, with a mean lag of 4.5 min (fig. S4). We also evaluated the activity of wild type OPTN in the presence or absence of PI3KC3-C1. OPTNWT and GST-Ub4 also enhanced membrane binding of E3 and LC3 lipidation, but more weakly than OPTNS2D (Fig. 2D and E), indicating that the higher affinities for LC3 and ubiquitin contributed to faster LC3 lipidation in the presence of OPTNS2D.

The kinase activity of ULK1 is dispensable for cargo receptor-induced LC3 lipidation

We next asked whether ULK1 kinase activity is required for receptor-induced LC3 lipidation, as the ULK1 kinase has been reported to phosphorylate multiple downstream autophagy components
; https://doi.org/10.1101/2021.01.08.425958doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425958 http://creativecommons.org/licenses/by-nc-nd/4.0/ 8 upon autophagy initiation, including the ATG14 and BECN1 subunits of PI3KC3-C1, ATG16L1 and ATG9A (45, 46, 53-55). We found that in the presence of NDP52, GST-Ub4, PI3KC3-C1, WIPI2d, and the LC3 conjugation machinery, the ULK1 kinase dead (KD) complex accelerated both E3 membrane recruitment and LC3 lipidation to the same extent as the wild-type complex (Fig. 3A and B). In contrast, in the presence of OPTNS2D and all the other components, neither wild-type nor KD ULK1 complex enhanced E3 binding or LC3 lipidation (fig. S5). These data support that OPTN triggered LC3 lipidation is independent of both the catalytic and non-catalytic activities of the ULK1 complex. Kinetics of ULK1 complex recruitment to membranes To analyze the differences between the three cargo receptors in more detail, we went on to investigate the kinetics of the recruitment of the upstream components as triggered by cargo receptors. We first monitored the kinetics of ULK1 complex recruitment. In the presence of WIPI2d and conjugation machinery, no detectable GFP tagged ULK1 complex was recruited to membrane (Fig. 4A, first column). The addition of NDP52 or TAX1BP1-Ub, with GST-Ub4, dramatically enhanced the membrane binding of ULK1 complex (Fig. 4A, second and third columns, B, and C), consistent with the previous observations that NDP52 directly recruited ULK1 complex in mitophagy or xenophagy (28-30). We noticed that only a little LC3 lipidation occurred even though the ULK1 complex was enriched on the membrane upon the addition of GST-Ub4 with either NDP52 or TAX1BP1 (Fig. 4A, second and third columns, B, and C). This suggested that ULK1 complex alone is in-sufficient to activate LC3 conjugation. In contrast to NDP52 or TAX1BP1, the addition of OPTNS2D and GST-Ub4 did not result in any increased ULK1 membrane binding (Fig. 
4A, fourth column, and D). However, when PI3KC3-C1 was added to the reaction, we observed obvious membrane binding of ULK1 complex, which was further enhanced by the addition of NDP52, TAX1BP1 or even OPTNS2D (Fig. 4A, last four columns, B to D). We interpret these data in terms of a two-step recruitment of ULK1 complex to the membrane, in which the ULK1 complex is initially recruited to the membrane by NDP52 or TAX1BP1, but not OPTN. Once PI3KC3-C1 is active on the membrane, ULK1 complex recruitment is promoted further, even when PI3KC3-C1 is recruited downstream of OPTN only. We then sought to understand the mechanism of PI3KC3-C1 dependent recruitment of the ULK1 complex as triggered by OPTNS2D. Omission of PI3KC3-C1 or WIPI2d almost completely eliminated the membrane binding of the ULK1 complex. Depletion of E3 complex also largely decreased ULK1 complex recruitment; however, depletion of ATG7 or LC3 slightly increased ULK1 membrane recruitment (Fig. 4E and F). As expected, omission of any of the components downstream of ULK1 complex eliminated OPTNS2D triggered LC3 lipidation (Fig. 4E and F). These data indicate that the PI3KC3-C1-WIPI2d-E3 axis is required for the further recruitment of ULK1, which is consistent with the previous observations that the translocation of ULK1 complex to omegasomes was stabilized by sustained PI(3)P synthesis (56) and that FIP200 could form a trimeric complex with ATG16 and WIPI2 (57).
The lack of dependence on ATG7 or LC3 rules out the possibility that a ULK1 LIR motif-LC3 interaction (58, 59) is driving ULK1 recruitment in these experiments. This recruitment of ULK1 complex downstream of OPTN does not lead to a feed forward increase in OPTN triggered LC3 lipidation, given that LC3 lipidation is similar in the presence or absence of ULK1 complex (Fig. 2B). This suggests that in OPTN-triggered mitophagy, the ULK1 complex functions at a stage of autophagosome formation subsequent to LC3 lipidation or in other processes that act in parallel to LC3 lipidation.

Kinetics of PI3KC3-C1 recruitment to membranes

We next monitored the kinetics of PI3KC3-C1 complex recruitment during LC3 lipidation. As distinct from the ULK1 complex, the intrinsic membrane affinity of PI3KC3-C1 enabled it to bind membranes even in the presence of WIPI2d and conjugation machinery but not cargo receptors (Fig. 5A, first column). This is consistent with the observation that PI3KC3-C1 alone can trigger LC3 lipidation in the absence of ULK1 complex (37). The addition of OPTNS2D and GST-Ub4, but not GST-Ub4 with NDP52 or TAX1BP1, dramatically enhanced the membrane binding of PI3KC3-C1 complex (Fig. 5A, first four columns, B to D). ULK1 complex alone did not increase PI3KC3-C1 membrane binding (Fig. 5A, fifth column). However, the addition of ULK1 complex did promote membrane recruitment of PI3KC3-C1 in the presence of NDP52 or TAX1BP1, although not OPTNS2D (Fig. 5A, last three columns, B to D). These data indicate that OPTN strongly enhances membrane recruitment of PI3KC3-C1 on its own. NDP52 and TAX1BP1 have a similar ultimate effect, but only in the presence of ULK1 complex. Given the dramatic increase of PI3KC3-C1 membrane binding triggered by OPTN, we sought to understand its mechanism. Omission of WIPI2d almost completely blocked the membrane binding of PI3KC3-C1 (Fig. 5E and F), consistent with the previous observation that PI3KC3-C1 and WIPI2d cooperatively bind to membranes (37). In contrast, the depletion of ATG7, ATG3, or LC3, but not E3, resulted in a slight decrease of PI3KC3-C1 binding (Fig. 5E and F), suggesting that a multivalent assembly of OPTN, PI3KC3-C1, WIPI2d, and E3 may be responsible for the membrane recruitment of PI3KC3-C1 by OPTN.

Interactions between cargo receptors and the core autophagy machinery

We found that NDP52, TAX1BP1 and OPTN trigger robust LC3 lipidation by dramatically enhancing the membrane binding of ULK1 or PI3KC3-C1 complex, and that they do so by distinct mechanisms. We therefore hypothesized that these cargo receptors could interact with the autophagy core complexes in distinct ways. To test this, we systematically analyzed the binding between these cargo receptors and different autophagy components by a microscopy-based bead interaction assay (Fig. 6A). The ULK1 complex was specifically recruited to beads coated with NDP52 and TAX1BP1, but not OPTNS2D (Fig. 6B and G), consistent with the observation that NDP52 or TAX1BP1 directly recruited ULK1 complex to the membrane. However, no detectable PI3KC3-C1 complex was recruited to beads coated with OPTNS2D. Instead, weak binding between PI3KC3-C1 and NDP52 or TAX1BP1 was detected (Fig. 6C and G), suggesting that the increased membrane binding of PI3KC3-C1 by OPTN was not mediated by a direct interaction. Weak binding was also observed between OPTNS2D and WIPI2d, NDP52 and WIPI2d, OPTNS2D and E3, and NDP52 and E3 (Fig. 6D, E and G).
We noticed a strong interaction between TAX1BP1 and WIPI2d or E3 (Fig. 6E and G), which may explain the stronger membrane recruitment of E3 by TAX1BP1 and GST-Ub4 alone. Interactions between NDP52, TAX1BP1, OPTNS2D and LC3B were weak at the tested concentrations (Fig. 6F and G). These data indicate that cargo receptors directly bind to multiple autophagy components, and thus trigger LC3 lipidation through a multivalent web of both strong and weak interactions.

Discussion

Over the past two years, rapid advances in the mechanistic cell biology of autophagy, and the elucidation of new activities for autophagy proteins, have crystallized into detailed models for mechanisms of autophagosome formation (4, 60). The ability to biochemically reconstitute such a pathway from purified components is a stringent and powerful test of such models. Moreover, reconstitution allows nuanced aspects of the interplay between components to be assessed with a rigor that is difficult in vivo. In the yeast model system, it was recently shown that a set of purified components could recapitulate the cargo-stimulated Atg8 lipidation and lipid transfer into Atg9 vesicles, confirming the function of Atg9 vesicles as the seeds of the phagophore (36). Progress in the reconstitution of human autophagy is less advanced, despite the importance of selective autophagy in many human diseases. We previously reconstituted the PI3KC3-C1, WIPI2, and E3 circuit, demonstrating positive feedback (37).
Upstream of this circuit, we found that NDP52 mediated the cargo-initiated recruitment of the ULK1 complex to membranes in vitro (29). Here, we showed that it was possible to reconstitute the circuit connecting the major selective cargo receptors involved in mitophagy (NDP52, TAX1BP1 and OPTN) from cargo recognition to LC3 lipidation, with each receptor manifesting unique properties (Fig. 7). One of the important recent conceptual advances in selective autophagy was the discovery that cargo receptors function upstream of the core autophagy initiating complexes (28, 30-32). This paradigm replaced the earlier model that cargo receptors connected substrates to pre-existing LC3-lipidated membranes. Here, we showed that cargo-engaged NDP52, TAX1BP1, or OPTN were capable of potently driving LC3 lipidation in the presence of physiologically plausible nanomolar concentrations of the purified autophagy initiation complexes. These reconstitution data directly confirm the new model for cargo-induced formation of LC3-lipidated membranes in human cells. We found that different cargo receptors use distinct mechanisms to trigger LC3 lipidation downstream of cargo. NDP52 is strongly dependent on the presence of the ULK1 complex, consistent with findings in xenophagy (28) and mitophagy (30). TAX1BP1 behaves much like NDP52, as expected based on the common presence of an N-terminal SKICH domain, the locus of FIP200 binding (28). However, TAX1BP1 was more active in promoting LC3 lipidation in the absence of PI3KC3-C1. Unexpectedly strong binding was observed between TAX1BP1 and E3 and WIPI2d. This raises the possibility that TAX1BP1-mediated selective autophagy may be less dependent on the core complexes than that mediated by NDP52. In sharp contrast to NDP52 and TAX1BP1, the in vitro LC3 lipidation downstream of OPTN is completely independent of the ULK1 complex.
Our finding is consistent with the recent report that OPTN-induced mitophagy, in contrast to that induced by NDP52, does not depend on the recruitment of FIP200 (32). It is also consistent with our observation of direct binding of NDP52 and TAX1BP1, but not OPTN, to the ULK1 complex in vitro. We found that OPTN-induced LC3 lipidation in vitro is strongly dependent on PI3KC3-C1 and WIPI2d, as expected based on the roles of these proteins in E3 activation. No single one of these complexes, or the E3 itself, bound strongly to OPTN, but all of them bound weakly. This suggests that a multiplicity of weak interactions with several factors contributes to the recruitment of the core complexes downstream of OPTN. ATG9A (32), which was not present in this study, likely contributes further to this multivalent web of low affinity interactions. Subunits of PI3KC3-C1 are phosphorylated by the ULK1 kinase (45, 55), and it has long been assumed that these phosphorylation events would promote autophagy. We found, however, that the kinase dead version of the ULK1 complex was as effective in promoting LC3 lipidation as wild-type. This result is consistent with a recent pharmacological study that found ULK1 kinase activity to be dispensable for PI3KC3-C1 activation at p62 condensates (61). In conclusion, we have reconstituted much of the process of cargo-stimulated selective autophagy using purified human proteins.
The remaining steps still to be completed in vitro are the ATG2 and ATG9-dependent transfer of phospholipids for phagophore growth, and the engulfment of cargo. The observations here provide powerful confirmation for the model that cargo itself triggers formation of LC3-lipidated membranes on a just-in-time basis. They also reveal nuances of how different cargo receptors utilize distinct repertoires of weak and strong interactions with the core complexes to trigger LC3 lipidation. These are subtleties that would have been difficult to uncover in traditional cellular knock out and rescue experiments. These unique modes of core complex recruitment may underlie the divergent core complex phenotypes that are seen in different classes of selective autophagy in different cellular contexts.

Acknowledgements

This work was supported by the Aligning Science Across Parkinson’s Collaborative Research Network ASAP-0350 (J.H.H. and S.M.), HFSP (RGP0026/2017 to J.H.H. and S.M.), NIH R01 GM111730 (J.H.H.), and the Jane Coffin Childs Foundation (A. L. Y.).

Conflict of interest

J.H.H. is a co-founder of Casma Therapeutics. S.M. is a member of the scientific advisory board of Casma Therapeutics.

Materials and Methods

Plasmid construction

Synthetic codon-optimized DNAs encoding components of the human ULK1 and PI3KC3-C1 complexes were subcloned into the pCAG vector with GST, MBP or TwinStrep-Flag (TSF) tag. Synthetic codon-optimized DNAs encoding human ATG12-ATG5-ATG16 were subcloned into the pGBdest vector with Strep tag.
DNA encoding human WIPI2d was subcloned into the pCAG vector with TSF tag. DNA encoding mouse ATG7 was subcloned into the pFastBac HT vector with His tag. DNAs encoding human ATG3 and LC3B were subcloned into the pET vector with His tag. DNAs encoding human NDP52, OPTN and TAX1BP1 were subcloned into the pGST2 vector with GST tag. DNA encoding linear tetraubiquitin was subcloned into the pGEX5 vector with GST tag. Details are shown in Table S1.

Protein expression and purification

The ULK1 complex, PI3KC3-C1 complex and WIPI2d protein were expressed in and purified from HEK293 GnTI- cells as described previously (29, 37). DNAs were transfected into cells using polyethylenimine (Polysciences). After 48-72 h expression, cells were harvested and lysed with lysis buffer (50 mM HEPES pH 7.4, 1% Triton X-100, 200 mM NaCl, 1 mM MgCl2, 10% glycerol, and 1 mM TCEP) supplemented with EDTA free protease inhibitors (Roche). The lysate was clarified by centrifugation (16000 rpm at 4 °C for 1 h) and incubated with resins. To purify GST-FIP200Δ641-779-MBP and GST-FIP200-MBP, the supernatant was incubated with Glutathione Sepharose 4B (GE Healthcare) with gentle shaking at 4 °C for 10 h. The mixture was then loaded onto a gravity flow column, and the resin was washed extensively with wash buffer (50 mM HEPES pH 8.0, 200 mM NaCl, 1 mM MgCl2 and 1 mM TCEP). Eluted protein samples were passed over amylose resin (New England Biolabs) for a second step of affinity purification. The final buffer after MBP affinity purification is 20 mM HEPES pH 8.0, 200 mM NaCl, 2 mM MgCl2, 1 mM TCEP and 50 mM maltose. To purify (±GFP)-ULK1 complex for studying the effect of FIP200Δ641-779 and ULK1 (kinase-dead mutant), the FIP200/ATG13/ATG101 subcomplex and ULK1 were expressed and purified separately. After the first step of affinity purification, the two samples were mixed, cleaved by TEV at 4 °C overnight, and subjected to a second step of affinity purification using the MBP tag. The final buffer after MBP affinity purification is 20 mM HEPES pH 8.0, 200 mM NaCl, 2 mM MgCl2, 1 mM TCEP and 50 mM maltose. For the rest of the GUV experiments, FIP200/ATG13/ATG101 subcomplex and ULK1 were expressed and purified separately in both first and second steps of affinity purification. The final buffer after second step MBP affinity purification is 20 mM HEPES pH 8.0, 200 mM NaCl, 2 mM MgCl2, 1 mM TCEP and 50 mM maltose. The complexes were used immediately for the GUV assays. To purify (±GFP)-PI3KC3-C1 complex, the supernatant was incubated with Glutathione Sepharose 4B (GE Healthcare) at 4 °C for 4 h, applied to a gravity column, and washed extensively with wash buffer (50 mM HEPES pH 8.0, 200 mM NaCl, 1 mM MgCl2, and 1 mM TCEP). The protein complexes were eluted with wash buffer containing 50 mM reduced glutathione, and then treated with TEV protease at 4 °C overnight. TEV-treated complexes were loaded on a Strep-Tactin Sepharose gravity flow column (IBA, GmbH). The complexes were eluted with a final buffer containing 20 mM HEPES pH 8.0, 200 mM NaCl, 2 mM MgCl2, 1 mM TCEP, and 10 mM desthiobiotin (Sigma), and then used immediately for the GUV assays. To purify (±GFP)-WIPI2d protein, the supernatant was incubated with Strep-Tactin Sepharose resin at 4 °C for 3 h, applied to a gravity column, and washed extensively with wash buffer (50 mM HEPES pH 7.5, 200 mM NaCl, and 1 mM TCEP). The proteins were eluted with wash buffer containing 10 mM desthiobiotin and applied onto a Superdex 200 column (16/60 prep grade, GE Healthcare).
The final buffer after gel filtration is 20 mM HEPES pH 7.5, 200 mM NaCl, and 1 mM TCEP. Fractions containing pure (±GFP)-WIPI2d protein were pooled, concentrated, snap frozen in liquid nitrogen and stored at -80 °C. The ATG12-ATG5-ATG16 complex and ATG7 protein were expressed in and purified from Sf9 cells as previously described (37). Sf9 cells were infected with a single P1 virus stock corresponding to the polycistronic construct encoding the ATG12-ATG5-ATG16 complex or ATG7. Cells were harvested 72 h after infection, then lysed and clarified following the same procedure described above for mammalian cells. To purify (±GFP)-ATG12-ATG5-ATG16 complex, the supernatant was incubated with Strep-Tactin Sepharose at 4 °C for 3 h, applied to a gravity column, and washed extensively with wash buffer (50 mM HEPES pH 7.5, 200 mM NaCl, and 1 mM TCEP). The proteins were eluted with wash buffer containing 10 mM desthiobiotin and applied onto a Superdex 6 column (10/300 Increase). The final buffer after gel filtration is 20 mM HEPES pH 7.5, 200 mM NaCl, and 1 mM TCEP. Peak fractions containing pure (±GFP)-ATG12-ATG5-ATG16 complexes were pooled, snap frozen in liquid nitrogen and stored at -80 °C. To purify ATG7 protein, the supernatant was loaded on a Ni-NTA (GE Healthcare) gravity flow column and washed extensively with wash buffer (50 mM HEPES pH 7.5, 200 mM NaCl, 20 mM imidazole and 1 mM TCEP). The proteins were eluted with wash buffer containing 200 mM imidazole and applied onto a Superdex 200 column (16/60 prep grade).
The final buffer after gel filtration is 20 mM HEPES pH 7.5, 200 mM NaCl, and 1 mM TCEP. Peak fractions containing pure ATG7 protein were pooled, snap frozen in liquid nitrogen and stored at -80 °C. The linear tetraubiquitin, NDP52, OPTN, TAX1BP1, ATG3 and mCherry-LC3B were expressed in and purified from E. coli BL21(DE3). Protein expression was induced with 100 μM IPTG when cells had grown to an OD600 of 0.8, and cells were grown further at 18 °C overnight. Cells were harvested and stored at -80 °C if needed. To purify GST tagged linear tetraubiquitin and receptors, the pellets were resuspended in a buffer containing 50 mM HEPES pH 7.5, 300 mM NaCl, 1 mM TCEP and protease inhibitors (Roche), and sonicated before being cleared at 16000 rpm at 4 °C for 1 h. The supernatant was incubated with Glutathione Sepharose 4B at 4 °C for 4 h, applied to a gravity column, and washed extensively with wash buffer (50 mM HEPES pH 7.5, 300 mM NaCl, and 1 mM TCEP). The proteins were eluted with wash buffer containing 50 mM reduced glutathione, and then applied onto a Superdex 6 column (10/300 Increase). The final buffer after gel filtration is 20 mM HEPES pH 8.0, 200 mM NaCl, and 1 mM TCEP. Peak fractions containing pure proteins were pooled, snap frozen in liquid nitrogen and stored at -80 °C. To purify ATG3 and mCherry-LC3B, the pellets were resuspended in a buffer containing 50 mM HEPES pH 7.5, 300 mM NaCl, 1 mM TCEP, 20 mM imidazole and protease inhibitors, sonicated and clarified. The supernatant was loaded on a Ni-NTA (GE Healthcare) gravity flow column and washed extensively with wash buffer (50 mM HEPES pH 7.5, 300 mM NaCl, 20 mM imidazole and 1 mM TCEP). The proteins were eluted with wash buffer containing 200 mM imidazole and applied onto a Superdex 75 column (16/60 prep grade). The final buffer after gel filtration is 20 mM HEPES pH 7.5, 200 mM NaCl, and 1 mM TCEP. Fractions containing pure proteins were pooled, concentrated, snap frozen in liquid nitrogen and stored at -80 °C.
Preparation of Giant Unilamellar Vesicles (GUVs)

GUVs were prepared by hydrogel-assisted swelling as described previously (37). Briefly, 60 μL 5% (w/w) polyvinyl alcohol (PVA) with a molecular weight of 145,000 (Millipore) was coated onto a plasma-cleaned coverslip of 25 mm diameter. The coated coverslip was placed in a heating incubator at 60 °C to dry the PVA film for 30 min. For all the GUV experiments, a lipid mixture with a molar composition of 64.8% DOPC, 20% DOPE, 10% POPI, 5% DOPS and 0.2% Atto647N DOPE at 1 mg/ml was spread uniformly onto the PVA film. The lipid-coated coverslip was then put under vacuum overnight to evaporate the solvent. 400 μL 400 mOsm sucrose solution was used for swelling for 1 h at room temperature, and the vesicles were then harvested and used within 12 h. Atto647N DOPE (ATTO-TEC) was used as the GUV membrane dye. All the other lipids for GUV preparation were from Avanti Polar Lipids.

In vitro reconstitution GUV assay

The reactions were set up in an eight-well observation chamber (Lab-Tek) at room temperature. The chamber was coated with 5 mg/ml β-casein for 30 min and washed three times with reaction buffer (20 mM HEPES at pH 8.0, 190 mM NaCl and 1 mM TCEP). A final concentration of 5 µM GST-4xUb, 500 nM cargo receptors, 25 nM ULK1 complex, 25 nM PI3KC3-C1 complex, 100 nM WIPI2d, 50 nM ATG12-ATG5-ATG16 complex, 100 nM ATG7, 100 nM ATG3, 500 nM mCherry-LC3B, 50 µM ATP, and 2 mM MnCl2 was used for all reactions unless otherwise specified.
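As a worked illustration of the reaction composition, the following minimal sketch converts each listed final concentration into the amount of material present per reaction, assuming the 120 µL final reaction volume used in this assay. The dictionary keys and the helper name `amount_pmol` are illustrative only and are not part of the authors' workflow.

```python
# Final concentrations listed for the GUV assay, expressed in nM.
FINAL_CONC_NM = {
    "GST-4xUb": 5_000,            # 5 µM
    "cargo receptor": 500,
    "ULK1 complex": 25,
    "PI3KC3-C1 complex": 25,
    "WIPI2d": 100,
    "ATG12-ATG5-ATG16 complex": 50,
    "ATG7": 100,
    "ATG3": 100,
    "mCherry-LC3B": 500,
    "ATP": 50_000,                # 50 µM
    "MnCl2": 2_000_000,           # 2 mM
}

REACTION_VOLUME_UL = 120  # final volume; assumed identical for every reaction


def amount_pmol(conc_nm, volume_ul):
    """Amount in pmol: nM x µL gives fmol, so divide by 1000."""
    return conc_nm * volume_ul / 1000


amounts = {
    name: amount_pmol(conc, REACTION_VOLUME_UL)
    for name, conc in FINAL_CONC_NM.items()
}

for name, pmol in amounts.items():
    print(f"{name:26s} {pmol:>10.1f} pmol")
```

For example, 25 nM ULK1 complex in 120 µL corresponds to 3 pmol of complex per reaction, whereas 5 µM GST-4xUb corresponds to 600 pmol.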
10 µL GUVs were added to initiate the reaction in a final volume of 120 µL. After 5 min incubation, during which random views were picked for imaging, time-lapse images were acquired in multitracking mode on a Nikon A1 confocal microscope with a 63 × Plan Apochromat 1.4 NA objective. Three biological replicates were performed for each experimental condition. Identical laser power and gain settings were used for all conditions.

Microscopy-based bead protein-protein interaction assay

A mixture of 0.5 µM GST or GST tagged cargo receptors and different ATG proteins was incubated with 10 µL Glutathione Sepharose beads (GE Healthcare) in a reaction buffer containing 20 mM HEPES at pH 8.0, 200 mM NaCl and 1 mM TCEP. The final concentrations of the different ATG proteins were as follows: 25 nM GFP-ULK1 complex, 25 nM GFP-PI3KC3-C1 complex, 100 nM GFP-WIPI2d, 50 nM ATG12-ATG5-ATG16-GFP complex, and 500 nM mCherry-LC3B. After incubation at room temperature for 60 min, the beads were washed three times, suspended in 120 µL reaction buffer, and then transferred to the observation chamber for imaging. Images were acquired on a Nikon A1 confocal microscope with a 63 × Plan Apochromat 1.4 NA objective. Three biological replicates were performed for each experimental condition.

Negative Stain Electron Microscopy Preparation, Collection and Coiled-coil Tracing

Protein sample of GST-FIP200Δ641-779-MBP was diluted to 100 nM concentration in elution buffer.
5 µL of sample was applied to continuous carbon grids which had been glow discharged in a PELCO easiGlow instrument for 45 s at 25 mA. Protein was wicked away using torn 597 Whatman paper and immediately stained with 2% uranyl acetate. Wicking was repeated for a second round of 2% uranyl acetate staining. Data were collected at 120 kV on a Tecnai T12 microscope with a nominal magnification of 49000x. 25 micrographs were taken with a Gatan CCD 4k x 4k camera at a pixel size of 2.2 Å/pixel. Protein particles were manually selected using the manual picking tool within Relion 3.1 and extracted at a binned box size of 120 by 120 corresponding to a pixel size of 8.8 Å/pixel. Extracted particles were measured for coiled-coil length in FIJI as previously described (29). In brief, 96 single particles were traced using the Simple Neurite Tracer plugin for FIJI. Histograms of the data were prepared for both the path length of the coiled-coil (82 nm) and the end to end distance of the coiled-coil (62 nm).

Image quantification

GUV images were analyzed using a custom script implemented in Python 3.6 (https://github.com/Hurley-Lab/GUVquantification/blob/main/GUVintensity-2channel.ipynb). Briefly, to obtain the outline of all the vesicles within a field of view, images were segmented into regions corresponding to local maxima of the membrane fluorescence channel, which were defined by applying an Otsu threshold to the differences between local maxima and minima. Then, binding of the fluorescently labelled proteins was quantified by taking the mean value of these segmented pixels in the fluorescent protein channel. Background was calculated as the average of the vesicle-internal background and the vesicle-external background and subtracted from the fluorescence signal. The intensity trajectories of multiple fields of view were then obtained frame by frame.
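The quantification steps described above can be sketched in a few lines. This is a simplified illustration and not the authors' script: it applies a plain Otsu threshold to the membrane channel rather than to differences between local maxima and minima, and it uses a single off-membrane background instead of averaging vesicle-internal and vesicle-external backgrounds. The function names and the synthetic test image are hypothetical.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Threshold that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                    # pixel count at or below each bin
    w1 = w0[-1] - w0                        # pixel count above each bin
    s0 = np.cumsum(hist * centers)          # intensity sum at or below each bin
    m0 = s0 / np.maximum(w0, 1)             # class means (guard against 0/0)
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)
    return centers[np.argmax(w0 * w1 * (m0 - m1) ** 2)]


def membrane_binding(membrane, protein):
    """Background-subtracted mean protein intensity on membrane pixels."""
    mask = membrane > otsu_threshold(membrane.ravel())
    # Simplification: one background estimate from all off-membrane pixels.
    return protein[mask].mean() - protein[~mask].mean()


# Synthetic field of view: one vesicle cross-section (a ring of radius 20 px).
yy, xx = np.mgrid[:64, :64]
ring = np.abs(np.hypot(yy - 32, xx - 32) - 20) < 1.5
membrane = np.where(ring, 100.0, 5.0)       # membrane dye channel
protein = np.where(ring, 60.0, 10.0)        # labelled protein channel

bound = membrane_binding(membrane, protein)
print(bound)
```

Applying `membrane_binding` to every frame of a time-lapse stack would give one intensity trajectory per field of view, which can then be averaged across fields.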
Multiple intensity trajectories were calculated, and their averages and standard deviations were reported. To quantify protein binding to beads, the outline of each bead was manually defined based on the bright field channel. The intensity threshold was calculated from the average intensities of pixels inside and outside of the bead, and intensity measurements of individual beads were then obtained. Averages and standard deviations were calculated from the measured values for each condition and plotted in a bar graph.

Statistical analysis

Statistical analysis was performed by unpaired Student’s t test using GraphPad Prism 9. P < 0.05 was considered statistically significant.

References

1. N. Mizushima, M. Komatsu, Autophagy: renovation of cells and tissues. Cell 147, 728-741 (2011).
2. H. Morishita, N. Mizushima, Diverse Cellular Roles of Autophagy. Annu Rev Cell Dev Biol 35, 453-475 (2019).
3. A. M. Pickrell, R. J. Youle, The Roles of PINK1, Parkin, and Mitochondrial Fidelity in Parkinson's Disease. Neuron 85, 257-273 (2015).
4. T. J. Melia, A. H. Lystad, A.
Simonsen, Autophagosome biogenesis: From membrane growth to closure. J Cell Biol 219, (2020).
5. J. H. Hurley, L. N. Young, Mechanisms of Autophagy Initiation. Annu Rev Biochem 86, 225-244 (2017).
6. N. Mizushima, T. Yoshimori, Y. Ohsumi, The role of Atg proteins in autophagosome formation. Annu Rev Cell Dev Biol 27, 107-132 (2011).
7. S. Maeda, C. Otomo, T. Otomo, The autophagic membrane tether ATG2A transfers lipids between membranes. Elife 8, (2019).
8. T. Osawa et al., Atg2 mediates direct lipid transfer between membranes for autophagosome formation. Nat Struct Mol Biol 26, 281-288 (2019).
9. D. P. Valverde et al., ATG2 transports lipids to promote autophagosome biogenesis. J Cell Biol 218, 1787-1798 (2019).
10. S. Maeda et al., Structure, lipid scrambling activity and role in autophagosome formation of ATG9A. Nat Struct Mol Biol 27, 1194-1201 (2020).
11. K. Matoba et al., Atg9 is a lipid scramblase that mediates autophagosomal membrane expansion. Nat Struct Mol Biol 27, 1185-1193 (2020).
12. T. Hanada et al., The Atg12-Atg5 conjugate has a novel E3-like activity for protein lipidation in autophagy. J Biol Chem 282, 37298-37302 (2007).
13. Y. Ichimura et al., A ubiquitin-like system mediates protein lipidation. Nature 408, 488-492 (2000).
14. N. Fujita et al., An Atg4B mutant hampers the lipidation of LC3 paralogues and causes defects in autophagosome closure. Mol Biol Cell 19, 4651-4659 (2008).
15. H. Nakatogawa, Y. Ichimura, Y. Ohsumi, Atg8, a ubiquitin-like protein required for autophagosome formation, mediates membrane tethering and hemifusion. Cell 130, 165-178 (2007).
16. Y. S. Sou et al., The Atg8 conjugation system is indispensable for proper development of autophagic isolation membranes in mice. Mol Biol Cell 19, 4762-4775 (2008).
17. K. Tsuboyama et al., The ATG conjugation systems are important for degradation of the inner autophagosomal membrane. Science 354, 1036-1041 (2016).
18. A. B. Birgisdottir, T. Lamark, T.
Johansen, The LIR motif - crucial for selective autophagy. J Cell Sci 126, 3237-3247 (2013).
19. V. Rogov, V. Dotsch, T. Johansen, V. Kirkin, Interactions between autophagy receptors and ubiquitin-like proteins form the molecular basis for selective autophagy. Mol Cell 53, 167-178 (2014).
20. J. Sawa-Makarska et al., Cargo binding to Atg19 unmasks additional Atg8 binding sites to mediate membrane-cargo apposition during selective autophagy. Nat Cell Biol 16, 425-433 (2014).
21. T. N. Nguyen et al., Atg8 family LC3/GABARAP proteins are crucial for autophagosome-lysosome fusion but not autophagosome formation during PINK1/Parkin mitophagy and starvation. J Cell Biol 215, 857-874 (2016).
22. F. Reggiori, M. Komatsu, K. Finley, A. Simonsen, Autophagy: more than a nonselective pathway. Int J Cell Biol 2012, 219625 (2012).
23. D. Gatica, V. Lahiri, D. J. Klionsky, Cargo recognition and degradation by selective autophagy. Nat Cell Biol 20, 233-242 (2018).
24. G. Zaffagnini, S. Martens, Mechanisms of Selective Autophagy. J Mol Biol 428, 1714-1724 (2016).
25. A. Stolz, A. Ernst, I. Dikic, Cargo recognition and trafficking in selective autophagy. Nat Cell Biol 16, 495-501 (2014).
26. V. Kirkin, V. V. Rogov, A Diversity of Selective Autophagy Receptors Determines the Specificity of the Autophagy Pathway. Mol Cell 76, 268-285 (2019).
27. V. Kirkin, D. G. McEwan, I. Novak, I. Dikic, A role for ubiquitin in selective autophagy. Mol Cell 34, 259-269 (2009).
28. B. J.
Ravenhill et al., The Cargo Receptor NDP52 Initiates Selective Autophagy by Recruiting the ULK Complex to Cytosol-Invading Bacteria. Mol Cell 74, 320-329 e326 (2019).
29. X. Shi, C. Chang, A. L. Yokom, L. E. Jensen, J. H. Hurley, The autophagy adaptor NDP52 and the FIP200 coiled-coil allosterically activate ULK1 complex membrane recruitment. Elife 9, (2020).
30. J. N. S. Vargas et al., Spatiotemporal Control of ULK1 Activation by NDP52 and TBK1 during Selective Autophagy. Mol Cell 74, 347-362 e346 (2019).
31. E. Turco et al., FIP200 Claw Domain Binding to p62 Promotes Autophagosome Formation at Ubiquitin Condensates. Mol Cell 74, 330-346 e311 (2019).
32. K. Yamano et al., Critical role of mitochondrial ubiquitination and the OPTN-ATG9A axis in mitophagy. J Cell Biol 219, (2020).
33. L. W. Brier, M. Zhang, L. Ge, Mechanistically dissecting autophagy: insights from in vitro reconstitution. Journal of Molecular Biology, (2016).
34. Y. Fujioka et al., Phase separation organizes the site of autophagosome formation. Nature 578, 301-305 (2020).
35. J. M. Alam, N. N. Noda, In vitro reconstitution of autophagic processes. Biochem Soc Trans 48, 2003-2014 (2020).
36. J. Sawa-Makarska et al., Reconstitution of autophagosome nucleation defines Atg9 vesicles as seeds for membrane formation. Science 369, (2020).
37. D. Fracchiolla, C. Chang, J. H. Hurley, S. Martens, A PI3K-WIPI2 positive feedback loop allosterically activates LC3 lipidation in autophagy. J Cell Biol 219, (2020).
38. Y. C. Wong, E. L. Holzbaur, Optineurin is an autophagy receptor for damaged mitochondria in parkin-mediated mitophagy that is disrupted by an ALS-linked mutation. Proc Natl Acad Sci U S A 111, E4439-4448 (2014).
39. M. Lazarou et al., The ubiquitin kinase PINK1 recruits autophagy receptors to induce mitophagy. Nature 524, 309-314 (2015).
40. J. M. Heo, A. Ordureau, J. A. Paulo, J. Rinehart, J. W.
Harper, The PINK1-PARKIN Mitochondrial Ubiquitylation Pathway Drives a Program of OPTN/NDP52 Recruitment and TBK1 Activation to Promote Mitophagy. Mol Cell 60, 7-20 (2015).
41. A. S. Moore, E. L. Holzbaur, Dynamic recruitment and activation of ALS-associated TBK1 with its target optineurin are required for efficient mitophagy. Proc Natl Acad Sci U S A 113, E3349-3358 (2016).
42. J. M. Heo et al., Integrated proteogenetic analysis reveals the landscape of a mitochondrial-autophagosome synapse during PARK2-dependent mitophagy. Sci Adv 5, eaay4624 (2019).
43. C. S. Evans, E. L. F. Holzbaur, Lysosomal degradation of depolarized mitochondria is rate-limiting in OPTN-dependent neuronal mitophagy. Autophagy 16, 962-964 (2020).
44. N. Fujita et al., The Atg16L complex specifies the site of LC3 lipidation for membrane biogenesis in autophagy. Mol Biol Cell 19, 2092-2100 (2008).
45. R. C. Russell et al., ULK1 induces autophagy by phosphorylating Beclin-1 and activating VPS34 lipid kinase. Nat Cell Biol 15, 741-750 (2013).
46. J. M. Park et al., The ULK1 complex mediates MTORC1 signaling to the autophagy initiation machinery via binding and phosphorylating ATG14. Autophagy 12, 547-564 (2016).
47. N. Gammoh, O. Florey, M. Overholtzer, X. Jiang, Interaction between FIP200 and ATG16L1 distinguishes ULK1 complex-dependent and -independent autophagy. Nat Struct Mol Biol 20, 144-149 (2013).
48. D. Fracchiolla et al., Mechanism of cargo-directed Atg8 conjugation during selective autophagy. Elife 5, (2016).
49. T.
Nishimura et al., FIP200 regulates targeting of Atg16L1 to the isolation membrane. EMBO Rep 14, 284-291 (2013).
50. S. A. Sarraf et al., Loss of TAX1BP1-Directed Autophagy Results in Protein Aggregate Accumulation in the Brain. Mol Cell 80, 779-795 e710 (2020).
51. D. A. Tumbarello et al., The Autophagy Receptor TAX1BP1 and the Molecular Motor Myosin VI Are Required for Clearance of Salmonella Typhimurium by Autophagy. PLoS Pathog 11, e1005174 (2015).
52. B. Richter et al., Phosphorylation of OPTN by TBK1 enhances its binding to Ub chains and promotes selective autophagy of damaged mitochondria. Proc Natl Acad Sci U S A 113, 4039-4044 (2016).
53. R. M. Alsaadi et al., ULK1-mediated phosphorylation of ATG16L1 promotes xenophagy, but destabilizes the ATG16L1 Crohn's mutant. EMBO Rep 20, e46885 (2019).
54. C. Zhou et al., Regulation of mATG9 trafficking by Src- and ULK1-mediated phosphorylation in basal and starvation-induced autophagy. Cell Res 27, 184-201 (2017).
55. D. F. Egan et al., Small Molecule Inhibition of the Autophagy Kinase ULK1 and Identification of ULK1 Substrates. Mol Cell 59, 285-297 (2015).
56. E. Karanasios et al., Dynamic association of the ULK1 complex with omegasomes during autophagy induction. J Cell Sci 126, 5224-5238 (2013).
57. H. C. Dooley et al., WIPI2 links LC3 conjugation with PI3P, autophagosome formation, and pathogen clearance by recruiting Atg12-5-16L1. Mol Cell 55, 238-252 (2014).
58. C. Kraft et al., Binding of the Atg1/ULK1 kinase to the ubiquitin-like protein Atg8 regulates autophagy. EMBO J 31, 3691-3703 (2012).
59. E. A. Alemu et al., ATG8 family proteins act as scaffolds for assembly of the ULK complex: sequence requirements for LC3-interacting region (LIR) motifs. J Biol Chem 287, 39275-39290 (2012).
60. H. Nakatogawa, Mechanisms governing autophagosome biogenesis. Nat Rev Mol Cell Biol 21, 439-458 (2020).
61. M. Zachari, M. Longo, I. G. Ganley, Aberrant autophagosome formation occurs upon small molecule inhibition of ULK1 kinase activity. Life Sci Alliance 3, (2020).
62. S. Baskaran et al., Architecture and dynamics of the autophagic phosphatidylinositol 3-kinase complex. Elife 3, (2014).
63. G. Stjepanovic, S. Baskaran, M. G. Lin, J. H. Hurley, Unveiling the role of VPS34 kinase domain dynamics in regulation of the autophagic PI3K complex. Mol Cell Oncol 4, e1367873 (2017).

Figures

Fig. 1 Reconstitution of NDP52 and TAX1BP1-triggered LC3 lipidation
(A) The schematic drawing illustrates the reaction setting. The blue curve indicates the GUV membrane. Gray cartoons are autophagy components present in the reaction.
(B) Representative confocal images showing the membrane recruitment of E3 complex (green) and LC3B (red). GUVs were incubated with WIPI2d, E3-GFP, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+, and different upstream components as listed above each image column. Images taken at 15 min and 30 min are shown. Scale bars, 10 µm.
(C and D) Quantitation of the kinetics of mCherry-LC3B (C) and E3-GFP (D) recruitment to the membrane from individual GUV tracing in A (averages of 50 vesicles are shown; error bars indicate standard deviations).
(E and F) GUVs were incubated with WIPI2d, E3-GFP, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+, and the different proteins listed above the images in Fig. S3. Quantitation of the kinetics of mCherry-LC3B (E) and E3-GFP (F) recruitment to the membrane from individual GUV tracing (averages of 50 vesicles are shown; error bars indicate standard deviations). All results are representative of three independent experiments.

Fig. 2 Reconstitution of OPTN-triggered LC3 lipidation
(A) Representative confocal images showing the membrane recruitment of E3 complex and LC3B.
GUVs were incubated with WIPI2d, E3-GFP, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+, and different proteins as listed above each image column. Images taken at 15 min and 30 min are shown. Scale bars, 10 µm.
(B and C) Quantitation of the kinetics of mCherry-LC3B (B) and E3-GFP (C) recruitment to the membrane from individual GUV tracing in A (averages of 50 vesicles are shown; error bars indicate standard deviations).
(D and E) GUVs were incubated with WIPI2d, E3-GFP, ATG7, ATG3, mCherry-LC3B, GST-Ub4, ATP/Mn2+, and OPTNWT or OPTNS2D in the presence or absence of PI3KC3-C1. Quantitation of the kinetics of mCherry-LC3B (D) and E3-GFP (E) recruitment to the membrane from individual GUV tracing is shown (averages of 50 vesicles; error bars indicate standard deviations). All results are representative of three independent experiments.

Fig. 3 The kinase activity of ULK1 is dispensable for cargo receptor-induced LC3 lipidation
(A) Representative confocal images showing the membrane recruitment of E3-GFP complex and mCherry-LC3B. GUVs were incubated with GST-Ub4, NDP52, PI3KC3-C1, WIPI2d, E3-GFP, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+ in the presence or absence of ULK1 WT complex or ULK1 kinase dead complex. Images taken at 30 min are shown. Scale bars, 10 µm.
(B) Quantitation of the kinetics of E3-GFP complex and mCherry-LC3B recruitment to the membrane from individual GUV tracing in A is shown (averages of 50 vesicles; error bars indicate standard deviations). All results are representative of three independent experiments.
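The quantitation described in these legends (an average over individual vesicle traces at each time point, with the standard deviation as the error bar) can be sketched as follows. This is an illustrative Python sketch, not the authors' analysis pipeline: the function name `summarize_trajectories`, the list-of-lists input layout, the toy numbers, and the choice of the sample standard deviation are all assumptions, since the source does not state which SD convention or software was used for this step.

```python
from statistics import mean, stdev


def summarize_trajectories(trajectories):
    """Return a list of (mean, sd) pairs, one per time point.

    `trajectories` is a hypothetical layout: one list of intensities per
    vesicle, all sampled at the same time points.
    """
    n_points = len(trajectories[0])
    if any(len(trace) != n_points for trace in trajectories):
        raise ValueError("all trajectories must share the same time points")
    # zip(*trajectories) regroups the values by time point instead of by vesicle.
    return [(mean(values), stdev(values)) for values in zip(*trajectories)]


# Toy data: three vesicles, three time points (the legends report 50 vesicles).
example_traces = [
    [0.0, 1.0, 2.0],
    [0.0, 3.0, 4.0],
    [0.0, 2.0, 6.0],
]
summary = summarize_trajectories(example_traces)
for time_index, (avg, sd) in enumerate(summary):
    print(f"time point {time_index}: mean={avg:.2f}, sd={sd:.2f}")
```

Each `(mean, sd)` pair corresponds to one plotted point and its error bar; swapping `stdev` for `pstdev` would give the population standard deviation instead.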
Fig. 4 Kinetics of ULK1 complex recruitment to membranes
(A) Representative confocal images showing the membrane recruitment of GFP-ULK1 complex and mCherry-LC3B. GUVs were incubated with WIPI2d, E3 complex, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+ and different protein combinations as listed above each image column. Images taken at 30 min are shown. Scale bars, 10 µm.
(B-D) GUVs were incubated with WIPI2d, E3 complex, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+, and GFP-ULK1 complex or GFP-ULK1 complex together with PI3KC3-C1 complex, in the presence or absence of NDP52 and GST-Ub4 (B), or TAX1BP1 and GST-Ub4 (C), or OPTNS2D and GST-Ub4 (D). Quantitation of the kinetics of GFP-ULK1 complex and mCherry-LC3B recruitment to the membrane from individual GUV tracing is shown (averages of 50 vesicles; error bars indicate standard deviations).
(E) Representative confocal images showing the membrane recruitment of GFP-ULK1 complex and mCherry-LC3B.
GUVs were incubated with GST-Ub4, OPTNS2D, GFP-ULK1 complex, PI3KC3-C1 complex, WIPI2d, E3 complex, ATG7, ATG3, mCherry-LC3B, and ATP/Mn2+, each time omitting one of the components downstream of ULK1 complex. Scale bars, 10 µm.
(F) Quantitation of the kinetics of GFP-ULK1 complex and mCherry-LC3B recruitment to the membrane from individual GUV tracing in E is shown (averages of 50 vesicles; error bars indicate standard deviations). All results are representative of three independent experiments.

Fig. 5 Kinetics of PI3KC3-C1 recruitment to membranes
(A) Representative confocal images showing the membrane recruitment of GFP-PI3KC3-C1 complex and mCherry-LC3B. GUVs were incubated with WIPI2d, E3 complex, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+ and different protein combinations as listed above each image column. Images taken at 30 min are shown. Scale bars, 10 µm.
(B-D) GUVs were incubated with WIPI2d, E3 complex, ATG7, ATG3, mCherry-LC3B, and GFP-PI3KC3-C1 complex or GFP-PI3KC3-C1 complex together with ULK1 complex, in the presence or absence of OPTNS2D and GST-Ub4 (B), or NDP52 and GST-Ub4 (C), or TAX1BP1 and GST-Ub4 (D). Quantitation of the kinetics of GFP-PI3KC3-C1 complex and mCherry-LC3B recruitment to the membrane from individual GUV tracing is shown (averages of 50 vesicles; error bars indicate standard deviations).
(E) Representative confocal images showing the membrane recruitment of GFP-PI3KC3-C1 complex and mCherry-LC3B. GUVs were incubated with GST-Ub4, OPTNS2D, GFP-PI3KC3-C1 complex, WIPI2d, E3 complex, ATG7, ATG3, mCherry-LC3B, and ATP/Mn2+, each time omitting one of the components downstream of PI3KC3-C1 complex. Scale bars, 10 µm.
(F) Quantitation of the kinetics of GFP-PI3KC3-C1 complex and mCherry-LC3B recruitment to the membrane from individual GUV tracing in E is shown (averages of 50 vesicles; error bars indicate standard deviations). All results are representative of three independent experiments.

Fig. 6 Interactions between cargo receptors and the core autophagy machinery
(A) The schematic drawing illustrates the bead-based pull-down setting.
(B-F) Representative confocal images showing recruitment of GFP-ULK1 complex (B), GFP-PI3KC3-C1 complex (C), E3-GFP (D), GFP-WIPI2d (E) or mCherry-LC3B (F) to beads coated with GST, GST-NDP52, GST-TAX1BP1 or GST-OPTNS2D.
A mixture of GST or GST-tagged cargo receptors with different fluorescent protein-tagged autophagy components was incubated with GSH beads for 1 h, and images were then taken. Scale bars, 50 µm.
(G) Quantification of the GFP or mCherry signal on beads is shown (averages of 40 beads; error bars indicate standard deviations). All results are representative of three independent experiments.

Fig. 7 Model for cargo receptor-mediated LC3 lipidation
In selective autophagy of targets marked by ubiquitination, cargo receptors such as NDP52, TAX1BP1, or OPTN first bind to ubiquitinated cargo and recruit multiple distinct autophagy machineries through a multivalent web of weak interactions. These components then work together to trigger membrane association of LC3 family proteins.

Supplementary Materials

Supplementary Figure Legends

Fig. S1 Purification of core autophagy machinery
All purified autophagy components were resolved by 10% SDS-PAGE and visualized by Coomassie Brilliant Blue staining.

Fig. S2 Characterization of ULK1 complex with FIP200Δ641-779
(A) The schematic drawing shows the domain structure of FIP200.
(B) Purified full-length FIP200 or FIP200Δ641-779 was resolved by 10% SDS-PAGE and visualized by Coomassie Brilliant Blue staining.
(C) Negative stain EM single particles of FIP200Δ641-779.
(D) Histogram of FIP200Δ641-779 path length and end-to-end distances.
(E) GUVs were incubated with GST-Ub4, NDP52, PI3KC3-C1, WIPI2d, E3, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+ in the presence of ULK1 WT complex or ULK1 complex with FIP200Δ641-779. Quantitation of the kinetics of mCherry-LC3B recruitment to the membrane from individual GUV tracing is shown (averages of 50 vesicles; error bars indicate standard deviations).

Fig. S3 Reconstitution of TAX1BP1-triggered LC3 lipidation
Representative confocal images showing the membrane recruitment of E3-GFP and mCherry-LC3B. GUVs were incubated with WIPI2d, E3-GFP, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+, and different upstream components as listed above each image column. Images taken at 15 min and 30 min are shown. Scale bars, 10 µm.

Fig. S4 The kinetics of E3 and LC3 membrane recruitment
Quantitation of the kinetics of E3-GFP and mCherry-LC3B recruitment to the membrane from individual GUV tracing is shown (averages of 50 vesicles; error bars indicate standard deviations). The GUVs were incubated with GST-Ub4, ULK1 complex, PI3KC3-C1, WIPI2d, E3, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+ in the presence of NDP52, TAX1BP1 or OPTNS2D. The data were fitted to a Boltzmann sigmoidal curve using GraphPad Prism 9, and t1/2 was calculated.

Fig. S5 The kinase activity of ULK1 is dispensable for OPTN-induced LC3 lipidation
(A) Representative confocal images showing the membrane recruitment of E3-GFP complex and mCherry-LC3B. GUVs were incubated with GST-Ub4, OPTNS2D, PI3KC3-C1, WIPI2d, E3-GFP, ATG7, ATG3, mCherry-LC3B, ATP/Mn2+ in the presence or absence of ULK1 WT complex or ULK1 kinase dead complex. Images taken at 30 min are shown. Scale bars, 10 µm.
(B) Quantitation of the kinetics of E3-GFP complex and mCherry-LC3B recruitment to the membrane from individual GUV tracing in C is shown (averages of 50 vesicles; error bars indicate standard deviations). All results are representative of three independent experiments.

Table S1
Construct | Vector | Expression system | Reference
GST-TEVcs-FIP200-MBP | pCAG | HEK GnTi | (29)
GST-TEVcs-FIP200Δ641-779-MBP | pCAG | HEK GnTi | This study
ATG13 | pCAG | HEK GnTi | (29)
GFP-ATG13 | pCAG | HEK GnTi | (29)
GST-TEVcs-ATG101 | pCAG | HEK GnTi | (29)
GST-TEVcs-GFP-ATG101 | pCAG | HEK GnTi | (29)
MBP-TSF-TEVcs-ULK1 | pCAG | HEK GnTi | (29)
MBP-TSF-TEVcs-ULK1K46I | pCAG | HEK GnTi | This study
GST-TEVcs-ATG14 | pCAG | HEK GnTi | (62)
GST-TEVcs-GFP-ATG14 | pCAG | HEK GnTi | (37)
TSF-TEVcs-VPS34 | pCAG | HEK GnTi | (37)
TSF-TEVcs-BECN1 | pCAG | HEK GnTi | (62)
VPS15 | pCAG | HEK GnTi | (63)
WIPI2d-TEVcs-TSF | pCAG | HEK GnTi | (37)
GFP-WIPI2d-TEVcs-TSF | pCAG | HEK GnTi | This study
ATG12-10xHis-TEVcs-ATG5-10xHis-TEVcs-ATG16L1-TEVcs-StrepII-ATG7-ATG10 | pGBdest | Sf9 | (37)
ATG12-10xHis-TEVcs-ATG5-10xHis-TEVcs-ATG16L1-GFP-TEVcs-StrepII-ATG7-ATG10 | pGBdest | Sf9 | (37)
6xHis-TEVcs-ATG7 | pFast BacHT(B) | Sf9 | (37)
6xHis-TEVcs-ATG3 | pET Duet-1 | E. coli | (37)
6xHis-TEVcs-mCherry-LC3B-Gly(∆5C) | pET Duet-1 | E. coli | (37)
GST-Ub4 | pGEX5 | E. coli | (29)
GST-NDP52 | pGST2 | E. coli | (29)
GST-TAX1BP1 | pGST2 | E. coli | This study
GST-OPTN | pGST2 | E. coli | This study
GST-OPTNS177DS473D | pGST2 | E. coli | This study

10_1101-2021_01_08_425976 ----

Semi-supervised Calibration of Risk with Noisy Event Times (SCORNET) Using Electronic Health Record Data

Yuri Ahuja, Liang Liang, Selena Huang, Tianxi Cai

January 9, 2021

Abstract

Leveraging large-scale electronic health record (EHR) data to estimate survival curves for clinical events can enable more powerful risk estimation and comparative effectiveness research. However, use of EHR data is hindered by a lack of direct event time observations. Occurrence times of relevant diagnostic codes or target disease mentions in clinical notes are at best a good approximation of the true disease onset time. On the other hand, extracting precise information on the exact event time requires laborious manual chart review and is sometimes altogether infeasible due to a lack of detailed documentation. Current status labels (binary indicators of phenotype status during follow-up) are significantly more efficient and feasible to compile, enabling more precise survival curve estimation given limited resources. Existing survival analysis methods using current status labels focus almost entirely on supervised estimation, and naive incorporation of unlabeled data into these methods may lead to biased results.
In this paper we propose Semi-supervised Calibration of Risk with Noisy Event Times (SCORNET), which yields a consistent and efficient survival curve estimator by leveraging a small set of current status labels and a large set of imperfect surrogate features. In addition to providing theoretical justification of SCORNET, we demonstrate in both simulation and real-world EHR settings that SCORNET achieves efficiency akin to the parametric Weibull regression model, while also exhibiting non-parametric flexibility and relatively low empirical bias in a variety of generative settings.

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.08.425976; this version posted January 9, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license (http://creativecommons.org/licenses/by-nc/4.0/).

1 Introduction

The Electronic Health Record (EHR) has in recent years become an increasingly available source of data for clinical and translational research (Kohane and others, 2012; Hripcsak and Albers, 2012; Miotto and others, 2016). Comprising heterogeneous clinical encounters including diagnostic and procedural billing codes, lab tests, prescriptions, and free text clinical notes for millions of patients, these rich data offer abundant opportunities for in silico epidemiological analysis. One application that has garnered recent interest is estimation of population disease risk within EHR patient cohorts, which can enable more powerful and precise estimation of real-world disease risks as well as comparative effectiveness analysis of alternative treatment strategies (Hodgkins and others, 2017; Dean and others, 2003; Liu and others, 2018; Panahiazar and others, 2015; Steele and others, 2018).
Several studies have had success estimating time to death within rule-defined disease cohorts (Panahiazar and others, 2015; Steele and others, 2018). However, estimating the temporal risk of developing a disease is a more challenging task due to EHR’s lack of direct observations of either disease status or the timing of disease on- set.Convenientproxiesofdiseasestatusoronset timebasedonreadilyavailable featuressuchas International Classification of Disease (ICD) codes often exhibit low specificity and systematic temporal biases, potentially yielding highly biased disease risk estimators if used as event time labels (Cipparone and others, 2015; Uno and others, 2018). On the other hand, extracting precise information on disease outcomes requires labor-intensive manualchartreview,whichisparticularlychallengingforeventtimessincetheeventmayoccuroutsideof the hospital system and only be mentioned during follow-up visits. It is thus only practically feasible to annotate the current status Δ = �() ≤ �) of the event time), where � is the follow up time. In this paper, we consider the problem of estimating the disease risk �(C) = %() ≤ C) when only a small numberoflabelsonΔ andalargequantityofunlabeledEHRfeaturesW, includingproxiesof),areavailable.Su- pervisedsurvival curveestimationwithcurrentstatusdataon {Δ,�} iswell established inthestatistical liter- ature with several available parametric, semi-parametric and non-parametric procedures (Vardi, 1982; Huang, 2 .CC-BY-NC 4.0 International licensemade available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprintthis version posted January 9, 2021. 
1996; van der Laan and Robins, 1998; van der Laan and Jewell, 2003; Lin and others, 2019, e.g.). For example, van der Laan and Robins (1998) proposed a non-parametric, locally efficient estimator via inverse probability of censoring weighting (IPCW), assuming that (1) $T$ and $C$ are conditionally independent given some informative baseline covariates $\mathbf{Z}_0 \subset \mathbf{W}$ (e.g. age, sex, etc.) and (2) a consistent estimator for the conditional density of $C \mid \mathbf{Z}_0$ is available. However, these existing estimators do not leverage unlabeled EHR feature information such as time to first surrogate ICD code, which may greatly improve risk estimation precision.

Since $\mathbf{W}$ may be highly predictive of $T$, the estimation of $S(t)$ can potentially be improved via semi-supervised learning (SSL) leveraging both the small set of $\Delta$ observations in the labeled set and the EHR features $\mathbf{W}$ in the unlabeled set. SSL has been shown to significantly mitigate bias and/or improve efficiency for various risk prediction applications (Chai and others, 2017; Liang and others, 2016; Bair and Tibshirani, 2004; Golub and others, 1999). For example, several studies employ semi-parametric models to impute event times in the unlabeled set for subsequent input into an outcome survival model alongside labeled data (Chai and others, 2017; Liang and others, 2016; Zhao and others, 2014; Uno and others, 2018; Hassett and others, 2017; Chubak and others, 2012; Choi and others, 2015; Kaji and others, 2019; Ruan and others, 2019; Ahuja and others, 2020a). While such imputation approaches may improve efficiency under correct specification of the imputation model, they are subject to significant bias if the imputation model is misspecified.
In addition, these existing methods do not allow for use of current status labels for training. Other general augmented inverse probability weighting methods in the missing data literature (Seaman and White, 2013; Rotnitzky and Robins, 2014, e.g.) are not directly applicable here since the probabilities of labels being observed tend to zero in the SSL setting.

We address this shortcoming by proposing Semi-supervised Calibration of Risk with Noisy Event Times (SCORNET) for estimation of $S(t)$. SCORNET utilizes current status labels while also employing a robust semi-supervised imputation approach on the extensive unlabeled set to maximize survival estimation efficiency. To mitigate imputation bias and maximize efficiency gain from the unlabeled data, SCORNET utilizes a highly flexible semi-non-parametric kernel regression model with EHR features as covariates, which ensures the validity of the resulting risk estimator without requiring the imputation model to hold. In addition to providing theoretical justifications for the SCORNET estimator, we illustrate via simulation studies that SCORNET substantially outperforms existing methods with regard to the bias-variance tradeoff.

The rest of the paper is organized as follows. In Section 2, we detail the SCORNET procedure along with its associated inference procedures.
In Section 3, we report risk estimation performance relative to existing methods in diverse simulation studies. To further illustrate the utility of SCORNET in clinical applications, we apply it to a real-world EHR study estimating the risk of heart failure among rheumatoid arthritis patients in Section 4. Finally, in Section 5 we briefly discuss the strengths, weaknesses, and potential applications of SCORNET.

2 Methods

2.1 Setup

Let $T$ denote the event time for which we are interested in estimating a cumulative distribution function $F(t) = P(T \le t)$ and survival function $S(t) = 1 - F(t)$. In the EHR study we do not observe $T$ but rather $\Delta = I(T \le C)$ for a small labeled subset, where $C$ is the follow up time with finite support $[0, \tau_c]$. For all subjects, we also observe a set of baseline covariates $\mathbf{Z}_0$ and longitudinal EHR features $\mathbf{Z}$. Since codes used in the EHR are often highly sensitive but not specific, there often exists some filter variable $\mathcal{F} \in \{0, 1\}$ such that $\Delta_i \mid (\mathcal{F}_i = 0, \mathbf{W}_i) = 0$ almost surely, where $\mathbf{W}_i = (\mathbf{Z}_{0,i}^{\mathsf T}, \mathbf{Z}_i^{\mathsf T})^{\mathsf T}$. Moreover, we assume that $(T, \mathbf{Z}) \perp C \mid \mathbf{Z}_0$. We assume that data for analysis consist of a small set of $n$ current-status-labeled observations randomly selected among those with $\mathcal{F} = 1$, along with a larger set of $N$ unlabeled observations:

$$\mathcal{D} = \{\mathbf{D}_i = (C_i, V_i\Delta_i, \mathbf{W}_i, V_i, \mathcal{F}_i)^{\mathsf T},\ i = 1, ..., N\} = \mathcal{L} \cup \mathcal{U},$$

where $V_i$ indicates whether observation $i$ is labeled,

$$\mathcal{L} = \{(C_i, \Delta_i, \mathbf{W}_i, 1, 1)^{\mathsf T} : \mathcal{F}_i = 1, V_i = 1, i = 1, ..., n\} \quad\text{and}\quad \mathcal{U} = \{(C_i, 0, \mathbf{W}_i, 0, \mathcal{F}_i)^{\mathsf T} : V_i = 0, i = n + 1, ..., N\},$$

with $\log(N)/\log(n) \to \nu_0 > 3/2$ as $n \to \infty$.
Since the censoring time $C$ may depend on $\mathbf{Z}_0$, we follow the IPCW strategy of van der Laan and Robins (1998) to weight observations by

$$\omega_{t,b}(C \mid \mathbf{Z}_0) = \frac{K_b(C - t)}{f_c(t \mid \mathbf{Z}_0)},$$

where $K_b(s) = K(s/b)/b$, $f_c(t \mid \mathbf{Z}_0) = dF_c(t \mid \mathbf{Z}_0)/dt$, $F_c(t \mid \mathbf{Z}_0) = P(C \le t \mid \mathbf{Z}_0)$, $K(\cdot)$ is some symmetric density function, and $0 < b = O(n^{-\nu})$ is the bandwidth thereof with $\nu \in (1/5, 1/3]$. IPCW enables consistent estimation of functionals of $T \le t$ and $\mathbf{W}$ since for any reasonable choice of function $q(\cdot)$ and $a, d \in \{0, 1\}$,

$$E\{\Delta_i^{d}\, q(\mathbf{W}_i)\, \mathcal{F}_i^{a}\, \omega_{t,b}(C_i \mid \mathbf{Z}_{0,i})\} = E\{I(T_i \le t)^{d}\, q(\mathbf{W}_i)\, \mathcal{F}_i^{a}\} + O(b^2). \quad (1)$$

The IPCW estimator for $\pi(t) = P(T_i \le t \mid \mathcal{F}_i = 1)$ proposed by van der Laan and Robins (1998) essentially corresponds to

$$\hat\pi(t) = \frac{\sum_{i=1}^{n} \mathcal{F}_i \Delta_i\, \omega_{t,b}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{n} \mathcal{F}_i\, \omega_{t,b}(C_i \mid \mathbf{Z}_{0,i})}$$

with $f_c(t \mid \mathbf{Z}_0)$ in $\omega_{t,b}(C_i \mid \mathbf{Z}_{0,i})$ replaced by a consistent estimator that converges faster than $n^{-1/4}$, which is not difficult to achieve under reasonable modeling assumptions since the distribution of $C_i \mid \mathbf{Z}_{0,i}$ can be estimated using the full data $\mathcal{D}$. To this end, we propose to derive an estimator for the conditional density $f_c(t \mid \mathbf{Z}_{0,i}) = \lambda_c(t \mid \mathbf{Z}_{0,i})\, S_c(t \mid \mathbf{Z}_{0,i})$ by imposing a semi-parametric model for $C \mid \mathbf{Z}_0$. Although many commonly employed models can be used since once again $C$ is fully observed for all patients, we illustrate our proposal by focusing on the Cox proportional hazards model (Cox, 1972) under which

$$\lambda_c(t \mid \mathbf{Z}_{0,i}) = \lambda_{0c}(t)\, e^{\boldsymbol\gamma^{\mathsf T}\mathbf{Z}_{0,i}} \quad\text{and}\quad S_c(t \mid \mathbf{Z}_{0,i}) \equiv 1 - F_c(t \mid \mathbf{Z}_{0,i}) = \exp\{-\Lambda_{0c}(t)\, e^{\boldsymbol\gamma^{\mathsf T}\mathbf{Z}_{0,i}}\}, \quad (2)$$

where $\lambda_c(t \mid \mathbf{Z}_{0,i})$ is the conditional hazard function of $C_i \mid \mathbf{Z}_{0,i}$, $\lambda_{0c}(t)$ is the unknown baseline hazard function, $\Lambda_{0c}(t) = \int_0^t \lambda_{0c}(s)\,ds$, and $\boldsymbol\gamma$ is the vector of unknown covariate effects.
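The kernel-based IPCW weight and the ratio estimator for $\pi(t)$ described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the Gaussian kernel is an assumed choice of symmetric density $K$, the function names are hypothetical, and the censoring density values $f_c(t \mid \mathbf{Z}_{0,i})$ are taken as given inputs rather than estimated from a Cox model.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, one possible symmetric kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def ipcw_weight(c_i, t, b, f_c_at_t):
    """omega_{t,b}(C_i | Z_0i) = K_b(C_i - t) / f_c(t | Z_0i), with K_b(s) = K(s / b) / b."""
    return gaussian_kernel((c_i - t) / b) / b / f_c_at_t


def ipcw_risk(t, b, C, delta, filt, f_c_at_t):
    """Ratio estimator: sum_i F_i * Delta_i * w_i / sum_i F_i * w_i over the labeled set."""
    num = 0.0
    den = 0.0
    for c_i, d_i, f_i, fc_i in zip(C, delta, filt, f_c_at_t):
        w = ipcw_weight(c_i, t, b, fc_i)
        num += f_i * d_i * w
        den += f_i * w
    return num / den


# Toy labeled data (hypothetical values, for illustration only).
C = [2.0, 3.5, 4.0, 5.5, 6.0]          # follow-up times
delta = [0, 0, 1, 1, 1]                # current status labels I(T <= C)
filt = [1, 1, 1, 1, 1]                 # filter-positive indicators
f_c_at_t = [0.2, 0.2, 0.2, 0.2, 0.2]   # assumed censoring density at t for each subject
risk_at_4 = ipcw_risk(4.0, 1.0, C, delta, filt, f_c_at_t)
```

Observations whose follow-up time lies near $t$ dominate the estimate, which is why the bandwidth $b$ governs the bias-variance tradeoff.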
2.2 SCORNET Estimation

As outlined in Figure 1, SCORNET consists of three steps: (1) estimating the conditional censoring density $f_c(t \mid \mathbf{Z}_0)$ using $\mathcal{D}$; (2) fitting an imputation working model for $\pi(t \mid \mathbf{W}) \equiv P(T \le t \mid \mathbf{W}, \mathcal{F} = 1)$ using $\mathcal{L}$, denoting the estimate of $\pi(t \mid \mathbf{W})$ as $\hat\pi(t \mid \mathbf{W})$; and (3) estimating $S(t)$ by marginalizing $\hat\pi(t \mid \mathbf{W})\mathcal{F} + \Delta(1 - \mathcal{F}) = \hat\pi(t \mid \mathbf{W})\mathcal{F}$ via IPCW.

Figure 1: Schematic of the SCORNET algorithm.

2.2.1 Step 1: Estimate $f_c(t \mid \mathbf{Z}_0)$ Under the Cox Model for $C \mid \mathbf{Z}_0$

To estimate $f_c(t \mid \mathbf{Z}_0)$, we fit the Cox model (2) to the full data $\mathcal{D}$ to obtain the partial likelihood estimator $\hat{\boldsymbol\gamma}$ for $\boldsymbol\gamma$. We subsequently estimate $\Lambda_{0c}(t)$ and $\lambda_{0c}(t)$ respectively as the standard Breslow estimator $\hat\Lambda_{0c}(t)$ and the kernel-smoothed Breslow estimator $\hat\lambda_{0c}(t)$ (Basha and Hoxha, 2019), where

$$\hat\Lambda_{0c}(t) = \sum_{j=1}^{N} \frac{I(C_j \le t)}{\sum_{i=1}^{N} I(C_i \ge C_j)\exp(\hat{\boldsymbol\gamma}^{\mathsf T}\mathbf{Z}_{0,i})}, \qquad \hat\lambda_{0c}(t) = \sum_{j=1}^{N} \frac{K_{a_N}(C_j - t)}{\sum_{i=1}^{N} I(C_i \ge C_j)\exp(\hat{\boldsymbol\gamma}^{\mathsf T}\mathbf{Z}_{0,i})},$$

and $a_N = O(N^{-\nu_2})$ with $\nu_2 \in (1/5, 1/3]$. We then obtain $\hat\lambda_c(t \mid \mathbf{Z}_{0,i}) = \hat\lambda_{0c}(t)\, e^{\hat{\boldsymbol\gamma}^{\mathsf T}\mathbf{Z}_{0,i}}$ and $\hat S_c(t \mid \mathbf{Z}_{0,i}) = \exp\{-\hat\Lambda_{0c}(t)\, e^{\hat{\boldsymbol\gamma}^{\mathsf T}\mathbf{Z}_{0,i}}\}$, and we estimate $f_c(t \mid \mathbf{Z}_{0,i})$ as $\hat f_c(t \mid \mathbf{Z}_{0,i}) = \hat\lambda_c(t \mid \mathbf{Z}_{0,i})\, \hat S_c(t \mid \mathbf{Z}_{0,i})$. Following standard asymptotic results for non-parametric kernel regression (Pagan and Ullah, 1999), it is not difficult to show that $\sup_{\mathbf{Z}_0, t} |\hat f_c(t \mid \mathbf{Z}_0) - f_c(t \mid \mathbf{Z}_0)| = O_p\{\log(N)^{1/2}(N a_N)^{-1/2}\} = o_p(\;)$. We denote the resulting estimate for the censoring weight as $\hat\omega_{t,b_n}(C_i \mid \mathbf{Z}_{0,i}) = K_{b_n}(C_i - t)/\hat f_c(t \mid \mathbf{Z}_{0,i})$.

2.2.2 Step 2: Estimate an Imputation Model $\pi(t \mid \mathbf{W}_i) \equiv P(T_i \le t \mid \mathbf{W}_i, \mathcal{F}_i = 1)$

To leverage the unlabeled data, we fit a flexible imputation working model

$$\pi(t \mid \mathbf{W}_i) = g\{\alpha(t) + \boldsymbol\beta_0(t)^{\mathsf T}\vec{\mathbf{Z}}_{0,i} + \boldsymbol\beta(t)^{\mathsf T}\mathbf{Z}_i\} = g\{\boldsymbol\theta(t)^{\mathsf T}\vec{\mathbf{W}}_i\}, \quad (3)$$

where $\mathbf{Z}_i$ denotes the EHR surrogate features, $\boldsymbol\theta(t) = (\alpha(t), \boldsymbol\beta_0(t)^{\mathsf T}, \boldsymbol\beta(t)^{\mathsf T})^{\mathsf T}$, and $\vec{\mathbf{W}}_i = (1, \vec{\mathbf{Z}}_{0,i}^{\mathsf T}, \mathbf{Z}_i^{\mathsf T})^{\mathsf T}$. Under (3), $P(T_i \le t \mid \mathbf{W}_i, \mathcal{F}_i = 1) = g\{\boldsymbol\theta(t)^{\mathsf T}\vec{\mathbf{W}}_i\}$, and hence we may estimate $\boldsymbol\theta(t)$ as $\hat{\boldsymbol\theta}(t) = (\hat\alpha(t), \hat{\boldsymbol\beta}_0(t)^{\mathsf T}, \hat{\boldsymbol\beta}(t)^{\mathsf T})^{\mathsf T}$, the solution to the IPCW estimating equation evaluated with $\mathcal{L}$,

$$\sum_{i=1}^{n} \hat\omega_{t,b_n}(C_i \mid \mathbf{Z}_{0,i})\, \mathcal{F}_i\, \vec{\mathbf{W}}_i \{\Delta_i - g(\boldsymbol\theta^{\mathsf T}\vec{\mathbf{W}}_i)\} = \mathbf{0},$$

where $b_n = O(n^{-\nu})$ with $\nu \in (1/5, 1/2)$. In practice, $b_n$ can be chosen via either standard cross-validation or heuristic plug-in values. For a future observation with filter status $\mathcal{F}_i = 1$ and covariates $\mathbf{W}_i$, we impute $I(T_i \le t)$ as the conditional risk $\hat\pi(t \mid \mathbf{W}_i) = g\{\hat{\boldsymbol\theta}(t)^{\mathsf T}\vec{\mathbf{W}}_i\}$. It is not difficult to show that $\hat{\boldsymbol\theta}(t)$ converges in probability to $\bar{\boldsymbol\theta}(t)$, the solution to the limiting estimating equation

$$E[\vec{\mathbf{W}}_i\{I(T_i \le t) - g(\boldsymbol\theta^{\mathsf T}\vec{\mathbf{W}}_i)\} \mid \mathcal{F}_i = 1] = \mathbf{0},$$

which ensures that

$$E\{\bar\pi(t \mid \mathbf{W}_i) \mid \mathcal{F}_i = 1\} = P(T_i \le t \mid \mathcal{F}_i = 1), \quad\text{where } \bar\pi(t \mid \mathbf{W}_i) = g\{\bar{\boldsymbol\theta}(t)^{\mathsf T}\vec{\mathbf{W}}_i\}, \quad (4)$$

regardless of the adequacy of the imputation model (3).

2.2.3 Step 3: Estimate $F(t)$ by Marginalizing Imputed Risks

Finally, we marginalize the imputed values $\hat\pi_{it} = \hat\pi(t \mid \mathbf{W}_i)$ for all $\mathcal{F}_i = 1$ and $\Delta_i = 0$ for all $\mathcal{F}_i = 0$ to estimate $F(t)$. Since $\mathcal{F}_i$ depends on $C_i$, we again employ IPCW to marginalize $\hat\pi_{it}\mathcal{F}_i + \Delta_i(1 - \mathcal{F}_i) = \hat\pi_{it}\mathcal{F}_i$ and thereby construct our
final estimator for $F(t)$:

$$\hat F(t) = \frac{\sum_{i=1}^{N}\{\hat\pi_{it}\mathcal{F}_i + \Delta_i(1 - \mathcal{F}_i)\}\, \hat\omega_{t,a_N}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{N} \hat\omega_{t,a_N}(C_i \mid \mathbf{Z}_{0,i})} = \frac{\sum_{i=1}^{N} \hat\pi_{it}\mathcal{F}_i\, \hat\omega_{t,a_N}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{N} \hat\omega_{t,a_N}(C_i \mid \mathbf{Z}_{0,i})}.$$

2.3 Inference for $\hat F(t)$

Following standard theory for non-parametric kernel regression (Pagan and Ullah, 1999), we show in the Supplementary Materials that $\hat F(t) \to P(T_i \le t, \mathcal{F}_i = 1) + P(T_i \le t, \mathcal{F}_i = 0) = P(T_i \le t) = F(t)$ in probability under mild regularity conditions and correct specification of the censoring model, regardless of the adequacy of the imputation model. Here, we note that for any $t \in [0, \tau_c]$, $P(\Delta_i = 1 \mid \mathcal{F}_i = 0, C_i = t, \mathbf{W}_i) = 0$ implies that $P(T_i \le t \mid \mathcal{F}_i = 0) = 0$. Furthermore,

$$(n b_n)^{1/2}\{\hat F(t) - F(t)\} = \left(\frac{b_n}{n}\right)^{1/2} \sum_{i=1}^{n} \hat\omega_{t,b_n}(C_i \mid \mathbf{Z}_{0,i})\, \mathcal{F}_i \{\Delta_i - \bar\pi(t \mid \mathbf{W}_i)\} + o_p(1) = \left(\frac{b_n}{n}\right)^{1/2} \sum_{i=1}^{n} \omega_{t,b_n}(C_i \mid \mathbf{Z}_{0,i})\, \mathcal{F}_i \{\Delta_i - \bar\pi(t \mid \mathbf{W}_i)\} + o_p(1)$$

since $\sup_t |\hat f_c(t \mid \mathbf{Z}_0) - f_c(t \mid \mathbf{Z}_0)| = o_p(\;)$. It follows that $(n b_n)^{1/2}\{\hat F(t) - F(t)\}$ is asymptotically normal with mean 0 and variance

$$\sigma^2(t) = R(K)\, E\{V(t \mid \mathbf{Z}_{0,i})/f_c(t \mid \mathbf{Z}_{0,i})\}, \quad (5)$$

where $V(t \mid \mathbf{Z}_{0,i}) = E[\mathcal{F}_i\{I(T_i \le t) - \bar\pi(t \mid \mathbf{W}_i)\}^2 \mid \mathbf{Z}_{0,i}]$ and $R(K) = \int K(x)^2\,dx$.

Our derivation for the asymptotic distribution of $\hat F(t)$ can effectively ignore the variability associated with the estimation of censoring weights, which simplifies the asymptotic variance $\sigma^2(t)$. Importantly, $\sigma^2(t)$ decreases as the imputation model approximates $\pi(t \mid \mathbf{W}_i)$ better, since

$$V(t \mid \mathbf{Z}_{0,i}) = E[\mathcal{F}_i\{I(T_i \le t) - \pi(t \mid \mathbf{W}_i)\}^2 \mid \mathbf{Z}_{0,i}] + E[\mathcal{F}_i\{\pi(t \mid \mathbf{W}_i) - \bar\pi(t \mid \mathbf{W}_i)\}^2 \mid \mathbf{Z}_{0,i}]$$

decreases. To estimate $\sigma^2(t)$ in practice, one may construct a plug-in estimator,

$$\hat\sigma^2(t) = \frac{b_n}{n} \sum_{i=1}^{n} \hat\omega_{t,b_n}(C_i \mid \mathbf{Z}_{0,i})^2\, \mathcal{F}_i \{\Delta_i - g(\hat{\boldsymbol\theta}(t)^{\mathsf T}\vec{\mathbf{W}}_i)\}^2.$$
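The marginalization step and the plug-in variance can be sketched as follows, assuming the censoring weights and the imputed risks have already been computed at a fixed time point $t$. The function names are hypothetical, and the Wald interval uses the normal approximation implied by the asymptotic result above, so this is an illustration under those assumptions rather than the released package's code.

```python
import math


def marginalize_risk(pi_hat, filt, weights):
    """F_hat(t): sum_i pi_hat_i * F_i * w_i / sum_i w_i over all N subjects.

    `weights` are the censoring weights evaluated with bandwidth a_N.
    Filter-negative subjects contribute zero to the numerator.
    """
    num = sum(p * f * w for p, f, w in zip(pi_hat, filt, weights))
    den = sum(weights)
    return num / den


def plug_in_variance(weights, filt, delta, pi_hat, b_n):
    """sigma_hat^2(t) = (b_n / n) * sum_i w_i^2 * F_i * (Delta_i - pi_hat_i)^2.

    All inputs here refer to the n labeled subjects, with weights evaluated at bandwidth b_n.
    """
    n = len(delta)
    total = sum(
        (w ** 2) * f * (d - p) ** 2
        for w, f, d, p in zip(weights, filt, delta, pi_hat)
    )
    return b_n / n * total


def wald_interval(f_hat, sigma2_hat, n, b_n, z=1.96):
    """Wald interval from the (n * b_n)^{1/2} normal approximation."""
    se = math.sqrt(sigma2_hat / (n * b_n))
    return f_hat - z * se, f_hat + z * se


# Toy inputs (hypothetical values).
f_hat = marginalize_risk([0.5, 0.2, 0.9], [1, 0, 1], [1.0, 1.0, 2.0])
sigma2 = plug_in_variance([1.0, 2.0], [1, 1], [1, 0], [0.5, 0.5], b_n=0.5)
lower, upper = wald_interval(f_hat, sigma2, n=2, b_n=0.5)
```

The interval is not clipped to $[0, 1]$ in this sketch; a practical implementation might truncate it or build the interval on a transformed scale.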
By contrast, the supervised IPCW estimator that incorporates filter negative patients takes the form

$$\hat F(t) = \frac{\sum_{i=1}^{N}\{\hat\pi(t)\mathcal{F}_i + \Delta_i(1 - \mathcal{F}_i)\}\, \hat\omega_{t,a_N}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{N} \hat\omega_{t,a_N}(C_i \mid \mathbf{Z}_{0,i})} = \frac{\sum_{i=1}^{N} \hat\pi(t)\mathcal{F}_i\, \hat\omega_{t,a_N}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{N} \hat\omega_{t,a_N}(C_i \mid \mathbf{Z}_{0,i})}.$$

The asymptotic variance of $(n b_n)^{1/2}\{\hat F(t) - F(t)\}$ is then $\sigma^2(t) = R(K)\, E\{V(t \mid \mathbf{Z}_{0,i})/f_c(t \mid \mathbf{Z}_{0,i})\}$, where $V(t \mid \mathbf{Z}_{0,i}) = E[\mathcal{F}_i\{I(T_i \le t) - \pi(t)\}^2 \mid \mathbf{Z}_{0,i}]$. The variance $\sigma^2(t)$ is equivalent to that of SCORNET if and only if the feature set $\mathbf{W}$ is uninformative for $T$ (i.e. $T \perp \mathbf{W}$). Supervised IPCW is otherwise less efficient, with relative efficiency controlled by the relative magnitudes of the marginal error $E[\{I(T_i \le t) - F(t)\}^2 \mid \mathcal{F}_i = 1, \mathbf{Z}_{0,i}]$ and the conditional error $E[\{I(T_i \le t) - \bar\pi(t \mid \mathbf{W}_i)\}^2 \mid \mathcal{F}_i = 1, \mathbf{Z}_{0,i}]$.

3 Simulation Study

We conduct extensive simulation experiments to evaluate the finite sample performance of the proposed SCORNET estimator in realistic settings with $n \in \{100, 200\}$ observed labels within the set of filter-positive patients, defining the filter to have 99% sensitivity and 88% specificity for $\Delta$.
We compare SCORNET to three existing survival function estimators with current status data: 1) parametric Weibull Accelerated Failure Time (AFT) regression with interval event times (Lin and others, 2019), 2) semi-parametric Cox Proportional Hazards regression with interval event times and Breslow baseline hazard estimation (Huang, 1996; Cox, 1972; Breslow, 1972), and 3) non-parametric IPCW estimation (van der Laan and Robins, 1998). We incorporate the filter in the Weibull and Cox models by setting $\Delta_i \mid (\mathcal{F}_i = 0) = 0$ and weighting the $n$ labeled filter-positive patients by $\frac{1}{n}\sum_{i=1}^{N} \mathcal{F}_i$. Weibull and Cox are implemented using the icenReg package in R, while IPCW is implemented per the algorithm detailed in van der Laan and Robins (1998), estimating the distribution of $C \mid \mathbf{Z}_0$ using $\mathcal{D}$ under the Cox model. We note that estimating the censoring distribution using $\mathcal{L}$ yields similar asymptotic performance to using $\mathcal{D}$, but in finite sample settings the latter offers higher efficiency.

Setting | $Z_0 \sim$ | $C \mid Z_0 \sim$ | $T \mid Z_0 \sim$ | $Z \sim$
1 | Unif$(-1, 1)$ | Weibull$(10e^{-0.5 Z_0}, 3/2)$ | Weibull$(15e^{-0.3 Z_0}, 5/2)$ | Normal$\{T + 1, \sigma(C)/2\}$
2 | Unif$(-1, 1)$ | Weibull$(10e^{-0.5 Z_0}, 3/2)$ | Weibull$(15, 5/2)$ | Normal$\{T + 1, \sigma(C)/2\}$
3 | Unif$(-1, 1)$ | Weibull$(10e^{-0.5 Z_0}, 3/2)$ | Weibull$(15e^{-0.3 Z_0^2}, 5/2)$ | Normal$\{T + 1, \sigma(C)/2\}$
4 | Unif$(-1, 1)$ | Weibull$(10e^{-0.5 Z_0}, 3/2)$ | Logistic$(15 - 4 Z_0, 3)$ | Normal$\{T + 1, \sigma(C)/2\}$
5 | Unif$(-1, 1)$ | Weibull$(10, 3/2)$ | Weibull$(15e^{-0.3 Z_0}, 5/2)$ | Normal$\{T + 1, \sigma(C)/2\}$
6 | Unif$(-1, 1)$ | Weibull$(10e^{-0.5 Z_0^2}, 3/2)$ | Weibull$(15e^{-0.3 Z_0}, 5/2)$ | Normal$\{T + 1, \sigma(C)/2\}$

Table 1: Generative parameters employed in our simulation study.
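To make the generative mechanism concrete, the following sketch draws data resembling setting 1 of Table 1 together with the 99% sensitive, 88% specific filter. Several details are assumptions rather than statements from the paper: the Weibull parameters are read as (scale, shape), the second Normal parameter is read as a standard deviation, $\sigma(C)$ is taken to be the standard deviation of the simulated follow-up times, and the function name is hypothetical.

```python
import math
import random
import statistics


def simulate_setting_1(n_total, seed=0):
    """Draw (Z0, C, T, Delta, Z, filter) under one reading of Table 1, setting 1."""
    rng = random.Random(seed)
    z0 = [rng.uniform(-1.0, 1.0) for _ in range(n_total)]
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    c = [rng.weibullvariate(10.0 * math.exp(-0.5 * z), 1.5) for z in z0]
    t = [rng.weibullvariate(15.0 * math.exp(-0.3 * z), 2.5) for z in z0]
    # Current status label: only whether the event precedes follow-up is recorded.
    delta = [int(t_i <= c_i) for t_i, c_i in zip(t, c)]
    # Surrogate feature centered at T + 1, with spread sigma(C) / 2 (assumed to be an SD).
    sd_c = statistics.pstdev(c)
    z = [rng.gauss(t_i + 1.0, sd_c / 2.0) for t_i in t]
    # Filter with 99% sensitivity and 88% specificity for Delta.
    filt = []
    for d in delta:
        p_positive = 0.99 if d == 1 else 1.0 - 0.88
        filt.append(int(rng.random() < p_positive))
    return {"z0": z0, "c": c, "t": t, "delta": delta, "z": z, "filter": filt}


data = simulate_setting_1(500, seed=1)
```

In a full experiment, $n$ filter-positive subjects would then be sampled at random to receive labels, and $T$ itself would be hidden from every estimator.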
We consider 6 diverse generative mechanisms as detailed in Table 1, including cases where Weibull-distributed accelerated failure time of $T \mid \mathbf{Z}_0$, proportional hazards of $T \mid \mathbf{Z}_0$, and proportional hazards of $C \mid \mathbf{Z}_0$ are respectively violated, as well as cases where SCORNET's imputation model is and is not misspecified. In settings 1, 2, and 5, we consider various cases where SCORNET and all comparator methods are correctly specified. In setting 1 we consider the specific case where $C$ and $T$ both depend on $\mathbf{Z}_0$, and both $C \mid \mathbf{Z}_0$ and $T \mid \mathbf{Z}_0$ are Weibull-distributed satisfying accelerated failure time and proportional hazards. In setting 2, by contrast, we consider a case where $T \perp \mathbf{Z}_0$ to assess robustness to over-parametrization of this relationship, and in setting 5 we consider a case where $C \perp \mathbf{Z}_0$ to evaluate robustness to over-parametrization thereof. In settings 3 and 4 we assess the benefit of SCORNET and IPCW's robustness to the distribution of $T \mid \mathbf{Z}_0$ when this distribution satisfies neither Weibull accelerated failure time nor proportional hazards. We evaluate SCORNET's sensitivity to misspecification of the imputation model in settings 1, 3, and 5, as compared to correct specification thereof in settings 2 and 4. Finally, in setting 6 we assess the sensitivity of SCORNET and IPCW to misspecification of the conditional censoring model $C \mid \mathbf{Z}_0$. For each given configuration, we compute the empirical bias, standard error, and root mean squared error (RMSE) of all estimators for $F(t)$ based on their average performance on
500 simulated datasets evaluated at 100 equally-spaced time points $t \in [Q_C(0.1) + b_n, Q_C(0.9) - b_n]$, where $Q_C$ denotes the quantile function of $C$ under the configuration. We used plug-in bandwidths of $b_n = \hat s(C)\, n^{-1/4}$ and $a_N = \hat s(C)\, N^{-1/4}$ for the imputation (Step 2) and marginalization (Step 3) steps of SCORNET respectively, where $\hat s$ is the empirical standard deviation of observed $C$. We present the performance of the estimators averaged over the selected time points using $n = 200$ labels in Figure 2. The performance at each time point can be found in Supplementary Figure 1, and time-averaged performance using $n = 100$ labels can be found in Supplementary Table 1 of the Supplementary Materials.

Figure 2: Time-averaged empirical absolute biases (left), standard errors (second from left), root relative efficiencies (second from right), and relative RMSEs (right) of the Weibull Accelerated Failure Time (red), Cox Proportional Hazards w/ Breslow baseline (blue), supervised IPCW (green), and SCORNET estimators using weakly informative (purple) and strongly informative (orange) surrogates, in various simulated settings with $n = 200$ observed current status labels.
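The plug-in bandwidths and the evaluation grid described above amount to a short calculation. The sketch below is illustrative only: the function names are hypothetical, the sample standard deviation is used for $\hat s(C)$, and the empirical deciles from Python's `statistics.quantiles` stand in for $Q_C(0.1)$ and $Q_C(0.9)$, which is one of several reasonable quantile definitions.

```python
import statistics


def plug_in_bandwidths(C, n_labeled):
    """Return (b_n, a_N) = (s(C) * n^(-1/4), s(C) * N^(-1/4))."""
    s = statistics.stdev(C)  # empirical standard deviation of observed follow-up times
    n_total = len(C)
    return s * n_labeled ** -0.25, s * n_total ** -0.25


def time_grid(C, b_n, num=100):
    """Equally spaced points in [Q_C(0.1) + b_n, Q_C(0.9) - b_n]."""
    deciles = statistics.quantiles(C, n=10)  # 9 cut points: 10th, ..., 90th percentiles
    lo = deciles[0] + b_n
    hi = deciles[-1] - b_n
    step = (hi - lo) / (num - 1)
    return [lo + k * step for k in range(num)]
```

Trimming the grid by $b_n$ on each side keeps the kernel window inside the central 80% of follow-up times, where the censoring density is less likely to be close to zero.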
As Figure 2 demonstrates, imputing using a strongly informative feature $\mathbf{Z}$ (SCORNET-Strong) results in consistently higher efficiency than just using the weakly informative baseline $\mathbf{Z}_0$ (SCORNET-Weak), which in turn is markedly more efficient than not leveraging the unlabeled set at all (IPCW). SCORNET makes minimal assumptions regarding the distribution of $T \mid \mathbf{Z}_0$, settling for non-parametric efficiency in exchange for enhanced flexibility. By contrast, the Weibull regression model fully parametrizes $T \mid \mathbf{Z}_0$, and the Cox model assumes proportional hazards thereof, increasing efficiency at the expense of bias in the case of misspecification. As expected, Weibull consistently achieves higher empirical efficiency than Cox, which in turn is more efficient than IPCW across settings. Notably, SCORNET consistently achieves empirical efficiency comparable to Weibull and significantly higher than Cox despite being far more flexible than both, again highlighting the efficiency gained by leveraging auxiliary information to impute unobserved risks. At the same time, SCORNET is much less susceptible to model misspecification bias than Weibull, as demonstrated by the latter's significantly higher bias and RMSE in setting 4. Indeed, SCORNET achieves relatively low mean absolute bias across settings, with MSE apparently dominated by variance rather than bias in the setting of 100-200 labels. Consistent with the theory, SCORNET is robust to misspecification of the imputation model in settings 1, 3, and 5, achieving equivalently insignificant bias as in setting 2 and marginally but not meaningfully higher bias than in setting 4. That said, correctness of the imputation model in settings 2 and 4 does not yield any meaningful change in relative efficiency, likely because inherent variability functionally dominates imputation model bias given so few labels.
Reassuringly, SCORNET (and IPCW) appear insensitive to misspecification of $C \mid \mathbf{Z}_0$ in setting 6, achieving functionally equivalent bias to the correctly-specified Weibull and Cox models. Altogether, these results corroborate the assertion that SCORNET's semi-supervised utilization of informative feature data to impute risks in the unlabeled set improves estimation efficiency without introducing bias regardless of the validity of the imputation model. Moreover, they suggest that SCORNET is particularly valuable in settings where (1) flexibility is desired with regard to the distribution of $T \mid \mathbf{Z}_0$, and (2) there exists a large set of unlabeled patients with associated EHR data, both of which are commonplace in retrospective observational studies.

Figure 3: Empirical coverage probabilities averaged over time (left) and plotted over time (right) of SCORNET-Strong's 95% confidence intervals constructed with the bootstrap (red) and plug-in (blue) standard error estimators in various simulated settings with $n = 200$ observed current status labels. See Table 1 for details of the generative mechanism employed in each setting.

To assess the finite sample performance of the proposed interval estimation procedures, we obtain standard error estimates both using the proposed plug-in estimator $\hat\sigma(t)$ and via bootstrap with 500 replicates. In Figure 3 we demonstrate empirical coverage probabilities of SCORNET's 95% Wald confidence intervals constructed using each standard error estimator, both averaged over the selected timepoints (left) and at each timepoint (right).
Reassuringly, we find that the 95% confidence intervals using both plug-in and bootstrap estimators achieve nearly 95% mean coverage across settings. Coverage only drops below 90% at the tails of the event time support due to moderately increased bias from kernel smoothing thereabout. The plug-in estimator achieves marginally lower coverage than the bootstrap estimator at the right tail due to underestimation of the true standard error, likely because of overfitting of the imputation model given low local censoring density (and thus low effective $N$). Notably, we do not observe this trend in setting 4, wherein correct specification of the imputation model obviates overfitting. Thus, we posit that the plug-in estimator can be reliably used for finite sample problems with $n \in [100, 200]$ labels as long as the conditional censoring density $f_c(C \mid \mathbf{Z}_0)$ is sufficiently high and the timepoints evaluated are sufficiently far from the tails of the event time support.

4 Application to Assessing Heart Failure Risk Among Rheumatoid Arthritis Patients

Rheumatoid arthritis (RA), a chronic inflammatory disease that affects approximately 1% of the general population, is associated with dramatically increased risk of heart failure (HF) morbidity and mortality (Kaplan, 2010; Nicola and others, 2005, 2006; Ahlers and others, 2020).
One study estimated that RA patients have a 1.9-fold lifetime risk of developing HF compared to matched RA-negative controls (Nicola and others, 2005), while another estimated that HF accounts for 13% of excess mortality among RA patients (Nicola and others, 2006). Ongoing interest lies in estimating the risk of developing HF subtypes in RA cohorts and quantifying the risk modifying effect of various RA treatments (Ahlers and others, 2020). Due to the increased availability of electronic health record (EHR) data, it is now possible to assess HF risk for a broader RA population using these data. For example, at Mass General Brigham we previously established an EHR cohort of $N_0 = 16{,}358$ RA patients (Huang and others, 2020). This large RA cohort can potentially be used to study the longitudinal risk of HF among RA patients. However, such analysis is not straightforward as HF status is not readily available within the RA cohort.

We propose to estimate HF risk among RA patients by leveraging (1) $n$ current status labels on HF status obtained via manual chart review, and (2) informative yet unlabeled EHR data, including time to first ICD code for HF, as surrogate variables $\mathbf{Z}$. We estimate both the age-specific HF risk, $F_{\text{age}}(\cdot)$, and the risk of developing HF after the patient's incident ICD code for RA (714), $F_{\text{RA+}}(\cdot)$, among patients with at least 6 months of follow up whose incident RA codes occur after the age of 16 to select for adult-onset as opposed to juvenile RA. Among filter-positive patients, defined as having at least 1 ICD code for HF, we have $n = 300$ labels on HF status $\Delta$ at the censoring time for age-specific HF risk, and we have $n = 126$ for post-RA HF risk. We let the baseline covariates $\mathbf{Z}_0$ include sex and decade of first EHR event for $F_{\text{age}}(\cdot)$, and sex, decade of first RA code, and age at first RA code for $F_{\text{RA+}}(\cdot)$. We obtain HF risk estimators based on SCORNET as well as the aforementioned comparator estimators.

For the imputation model in Step 2 of SCORNET, we consider three EHR-derived surrogate risk predictors for $\mathbf{Z}$: (1) the predicted $\Delta$ based on the unsupervised Multimodal Automated Phenotyping (MAP) algorithm, which uses the total counts of HF ICD codes and mentions of HF in clinical notes, as well as the total count of all ICD codes as a healthcare utilization measure (Liao and others, 2019); (2) the predicted $\Delta$ based on the unsupervised Surrogate-guided Ensemble Latent Dirichlet Allocation (sureLDA) algorithm, which leverages the features used in MAP as well as 121 additional manually-selected EHR features including counts of relevant medications, ICD codes, and concept unique identifiers (CUIs) in clinical notes (Ahuja and others, 2020b); and (3) the time to first HF ICD code. As in our simulation, we select plug-in bandwidths of $b_n = \hat s(C)\, n^{-1/4}$ and $a_N = \hat s(C)\, N^{-1/4}$ for the imputation and marginalization steps of SCORNET respectively, and we evaluate risk at 100 timepoints $t \in [Q_C(0.1) + b_n, Q_C(0.9) - b_n]$. We again compare the performance of SCORNET to that of Weibull, Cox, and IPCW, incorporating the filter in the Weibull and Cox models by propensity weighting as we do in the simulation study.

In Figure 4, we show the estimated HF risk curves along with their standard errors. Reassuringly, all methods appear to agree rather closely for estimation of both age-specific HF risk and HF risk after RA diagnosis.
For the latter quantity, however, Weibull and Cox appear to underfit while IPCW appears to overfit relative to the SCORNET estimator, which appears to achieve a reasonable middle ground. Moreover, SCORNET once again attains standard errors comparable to those of the Weibull estimator and meaningfully lower than those of the Cox and IPCW estimators. This suggests that while the Weibull and Cox models potentially fail to capture the complexity of the post-RA HF risk function, and IPCW is too unstable for a limited labeled set of size $n = 126$, SCORNET offers an attractive balance of efficiency and flexibility and is thus well suited to such a scenario. As shown in Figure 5, averaged over the selected timepoints, the root relative efficiency of SCORNET is 1.11, 2.55, and 3.31 compared to the Weibull, Cox, and IPCW estimators respectively for estimation of age-specific risk, and 1.34, 2.32, and 3.85 respectively for estimation of HF risk after RA diagnosis. Once again, the fact that SCORNET achieves efficiency moderately higher than the relatively inflexible Weibull model and significantly higher than the Cox and IPCW estimators reflects the value of leveraging available information from the EHR to bolster risk estimation efficiency.

Figure 4: Estimated age-specific and post-RA cumulative risks of heart failure (top) and bootstrap standard errors thereof (bottom) over time of the Weibull Accelerated Failure Time (red, short-long-dashed), Cox Proportional Hazards w/ Breslow baseline (blue, dot-dashed), supervised IPCW (green, dashed), and SCORNET (purple, solid) estimators.

Figure 5: Time-averaged bootstrap standard errors (left) and empirical root relative efficiencies (right) of the Weibull Accelerated Failure Time (red), Cox Proportional Hazards w/ Breslow baseline (blue), IPCW (green), and SCORNET (purple) estimators for estimation of (1) age-specific HF risk (left), and (2) HF risk after RA diagnosis (right), among RA patients in the Partners EHR database.

5 Discussion

By leveraging a sizeable unlabeled dataset containing imperfect surrogates of the true event times and a small set with observed current status labels, the SCORNET estimator serves as a robust and efficient alternative to existing model-free survival estimators with current status data. The semi-supervised nature of SCORNET makes it well suited to EHR-based survival estimation in settings where only a limited number of labels are available or readily obtainable. Moreover, by only requiring current status labels rather than the precise
timing of event onset, SCORNET greatly reduces the burden of chart review and increases the feasibility of studying disease risk using EHR data.

To allow for covariate-dependent censoring, which is frequently present in observational settings, SCORNET requires additional assumptions on the distribution of � | `0. Although we choose the proportional hazards model for illustration, any standard semi-parametric model will yield similar properties for the resulting estimator. Since {�, `0} are observed for all subjects, one can potentially allow for more flexible (i.e. nonparametric) censoring models. That said, our simulation results suggest that SCORNET is relatively insensitive to misspecification of the model for � | `0. Even under mild misspecification, it achieves consistently lower mean squared errors than existing estimators.

When interest lies in assessing how risk differs across different patient sub-populations, it is straightforward to extend SCORNET to estimate subgroup-specific risks for a small number of subgroups. However, future research is warranted to estimate covariate-specific risks for a general set of covariates.

6 Software

An R package, including a sample use case and complete documentation, is available at https://cran.r-project.org/web/packages/SCORNET/index.html. Source code can be found at https://github.com/celehs/SCORNET.

Funding

This work was supported by the U.S. National Institutes of Health Grants T32-AR05588512, T32-GM7489714, and R21-CA242940.
Acknowledgements

The authors declare no conflicts of interest.
10_1101-2021_01_08_426008 ----

AncestralClust: Clustering of Divergent Nucleotide Sequences by Ancestral Sequence Reconstruction using Phylogenetic Trees

Lenore Pipes 1,∗ and Rasmus Nielsen 1,2,3,∗

1Department of Integrative Biology, University of California-Berkeley, Berkeley, 94707, USA, 2Department of Statistics, University of California-Berkeley, Berkeley, CA 94707, USA, and 3Globe Institute, University of Copenhagen, 1350 København K, Denmark

∗To whom correspondence should be addressed.

Abstract

Motivation: Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing high speed clustering of highly similar sequences. We develop a phylogenetic clustering method which infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences.

Results: We describe a clustering program AncestralClust, which is developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than current popular methods.

Availability and implementation: AncestralClust is an Open Source program available at https://github.com/lpipes/ancestralclust

Contact: lpipes@berkeley.edu

Supplementary information: Supplementary figures and table are available online.
1 Introduction

Traditional clustering methods such as UCLUST (Edgar, 2010), CD-HIT (Fu et al., 2012), and DNACLUST (Ghodsi et al., 2011) use hierarchical or greedy algorithms that rely on user input of a sequence identity threshold. These methods were developed for high-speed clustering of large numbers of highly similar sequences (Ghodsi et al., 2011; Li et al., 2001; Edgar, 2010). They are generally considered unreliable for identity thresholds <75%, either because of the poor quality of alignments at low identities (Zou et al., 2018) or because the performance of the threshold used to count short words drops dramatically at low identities (Huang et al., 2010). At low identities, these methods produce uneven clusters in which the majority of sequences are contained in only a few clusters (Chen et al., 2018), and the high variance in cluster sizes reduces the utility of the clustering step for many practical purposes. Clustering of divergent sequences is a fundamental step in genomics analysis because it allows for an early divide-and-conquer strategy that will significantly increase the speed of downstream analyses (Zheng et al., 2018); it is also a frequent request of users of at least one clustering method (Huang et al., 2010). Currently, there are no clustering methods that can accurately cluster large, taxonomically divergent metabarcoding reference databases such as the Barcode of Life database (Ratnasingham and Hebert, 2007) into relatively even clusters. Only a few other methods, such as SpClust (Matar et al., 2019) and TreeCluster (Balaban et al., 2019), exist for clustering potentially divergent sequences. SpClust creates clusters using Laplacian Eigenmaps and a Gaussian Mixture Model, based on a similarity matrix calculated on all input sequences. While this approach is highly accurate, the calculation of an all-to-all similarity matrix is a computationally exhaustive step.
TreeCluster uses user-specified constraints for splitting a phylogenetic tree into clusters. However, TreeCluster requires an input tree and thus can also be prohibitively slow for large numbers of sequences where a phylogenetic tree is difficult to estimate reliably. With the increasing size of reference databases (Schoch et al., 2020), there is a need for new computationally efficient methods that can cluster divergent sequences. Here we present AncestralClust, which was specifically developed for clustering divergent metabarcoding reference sequences into clusters of relatively even size.

bioRxiv preprint, this version posted January 9, 2021; doi: https://doi.org/10.1101/2021.01.08.426008. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license (http://creativecommons.org/licenses/by/4.0/).

2 Methods

To cluster divergent sequences, we developed AncestralClust, which is written in C (Figure 1). First, k random sequences are chosen and aligned pairwise using the wavefront algorithm (Marco-Sola et al., 2020). A Jukes-Cantor distance matrix is constructed from the alignments, and a neighbor-joining phylogenetic tree is constructed from this matrix. The Jukes-Cantor model is chosen for computational speed; more complex models could in principle be used to potentially increase accuracy, at the cost of increased computational time. The C − 1 longest branches in the tree are then cut to yield C clusters. These subtrees comprise the initial starting clusters. The sequences in each starting cluster are aligned in a multiple sequence alignment using kalign3 (Lassmann, 2020).
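As an illustration of the distance step described above, the following minimal Python sketch computes the Jukes-Cantor distance from a single pairwise alignment. It is not taken from the AncestralClust source (which is written in C); the function name `jukes_cantor_distance` and the choice to skip gap and ambiguous positions are assumptions made for this example.

```python
import math


def jukes_cantor_distance(a: str, b: str) -> float:
    """Jukes-Cantor (JC69) distance between two aligned sequences.

    Assumption for this sketch: columns containing a gap or an
    ambiguous base in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous characters
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # the JC69 correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

In a full pipeline, a function of this kind would be applied to each of the k(k − 1)/2 pairwise alignments to fill the distance matrix that is passed to neighbor joining.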
The ancestral sequence at the root of the tree of each cluster is estimated using the maximum of the posterior probability of each nucleotide using standard programming algorithms from phylogenetics (see e.g., Yang, 2014). Each ancestral sequence is used as the representative sequence of its cluster. Next, the rest of the sequences are assigned to clusters based on the shortest nucleotide distance from the wavefront alignment between the sequence and the C ancestral sequences. If the shortest distance to any of the C ancestral sequences is larger than the average distance between clusters, the sequence is saved for the next iteration. We iterate this process until all sequences are assigned to a cluster. In each iteration after the first, a branch in the phylogenetic tree is cut if it is longer than the average length of the branches cut in the first iteration. In practice, only one or two iterations are needed for most data sets if k is defined to be sufficiently large.

We compared AncestralClust to five other state-of-the-art clustering methods: UCLUST (Edgar, 2010), meshclust2 (James and Girgis, 2018), DNACLUST (Ghodsi et al., 2011), CD-HIT (Fu et al., 2012), and SpClust (Matar et al., 2019). We used a variety of measurements to assess the accuracy and evenness of the clustering. We calculated two traditional measures of accuracy, purity and normalized mutual information (NMI), used in Bonder et al. (2012). The purity of clusters is calculated as:

$$\mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_{j} |\omega_k \cap c_j| \tag{1}$$

where $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k\}$ is the set of clusters, $C = \{c_1, c_2, \ldots, c_j\}$ is the set of taxonomic classes (in these formulas C denotes the set of classes, not the number of initial clusters), and N is the total number of sequences. NMI is calculated as:

$$\mathrm{NMI}(\Omega, C) = \frac{I(\Omega, C)}{[H(\Omega) + H(C)]/2} \tag{2}$$

where $I(\Omega, C)$ is the mutual information gain and H is the entropy function.
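Equations (1) and (2) can be computed directly from two parallel lists of labels, one giving the cluster of each sequence and one giving its taxonomic class. The sketch below is an illustrative Python implementation, not the authors' evaluation code; the function names, the use of natural logarithms, and the convention of returning 0 when both entropies are zero are assumptions of this example.

```python
import math
from collections import Counter


def purity(clusters, classes):
    """Eq. (1): fraction of sequences in the majority class of their cluster."""
    best = {}
    for (k, _), count in Counter(zip(clusters, classes)).items():
        best[k] = max(best.get(k, 0), count)
    return sum(best.values()) / len(clusters)


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(clusters, classes):
    """Mutual information I(Omega, C) between the two labelings."""
    n = len(clusters)
    size_k = Counter(clusters)
    size_j = Counter(classes)
    return sum(
        (c / n) * math.log(c * n / (size_k[k] * size_j[j]))
        for (k, j), c in Counter(zip(clusters, classes)).items()
    )


def nmi(clusters, classes):
    """Eq. (2): mutual information divided by the mean of the two entropies."""
    denom = (entropy(clusters) + entropy(classes)) / 2
    if denom == 0:
        return 0.0  # assumed convention when both labelings are constant
    return mutual_information(clusters, classes) / denom
```

For example, `purity([0, 0, 1, 1], ["a", "a", "b", "b"])` evaluates to 1.0, since every cluster contains a single class.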
To measure the evenness of the clusters, we used the coefficient of variation, which is calculated as:

$$\mathrm{CV} = \frac{\sqrt{\sum_{i=1}^{j} (n_i - m)^2 / j}}{m} \tag{3}$$

where $n_i$ is the number of sequences in cluster i, j is the total number of clusters, and m is the mean size of the clusters. We also used a taxonomic incompatibility measure to assess the accuracy of the clusters. Let a,b be a pair of species found in cluster i. Incompatibility at a given taxonomic rank is calculated by first identifying the number of times a and b exist in clusters other than cluster i. The total incompatibility is calculated by summing over all pairs of sequences (a,b) and all i.

Both NMI and taxonomic incompatibility are very sensitive to the number of clusters and also to unevenness of cluster sizes. To allow fair comparison when the number of clusters and the evenness of cluster sizes vary, we therefore calculate the relative NMI and the relative incompatibility. These measures are calculated by scaling the raw values relative to their expected values under random assignment, given the number of clusters and the cluster sizes. We estimated relative NMI by dividing the raw NMI score by the average NMI of 10 clusterings in which sequences have been assigned at random with equal probability to clusters, such that the cluster sizes are the same as those produced in the original clustering. The same procedure was used to convert the taxonomic incompatibility measure into relative incompatibility.

3 Results

To first assess the performance of clustering methods on divergent nucleotide sequences, we used 100 random samples of 10,000 sequences from three metabarcode reference databases (16S, 18S, and Cytochrome Oxidase I (COI)) from the CALeDNA project (Meyer et al., 2019). We chose to compare our method on this dataset against UCLUST because it is the most widely used clustering program and it performs better than CD-HIT on low identity thresholds (Chen et al., 2018).
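The evenness measure in Eq. (3) and the random-assignment scaling used for the relative measures can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the names `coefficient_of_variation` and `relative_score`, the `seed` parameter, and the use of label shuffling to draw random assignments with unchanged cluster sizes are assumptions of this example.

```python
import math
import random
from collections import Counter


def coefficient_of_variation(clusters):
    """Eq. (3): population standard deviation of cluster sizes over their mean."""
    sizes = list(Counter(clusters).values())
    j = len(sizes)
    m = sum(sizes) / j
    return math.sqrt(sum((n - m) ** 2 for n in sizes) / j) / m


def relative_score(score, clusters, classes, n_random=10, seed=0):
    """Raw score divided by its mean over random clusterings of equal sizes.

    `score` is any function score(clusters, classes) -> float.
    Shuffling the cluster labels reassigns sequences at random while
    keeping every cluster size identical to the original clustering.
    """
    rng = random.Random(seed)
    shuffled = list(clusters)
    baseline = []
    for _ in range(n_random):
        rng.shuffle(shuffled)
        baseline.append(score(shuffled, classes))
    mean_baseline = sum(baseline) / n_random
    if mean_baseline == 0:
        raise ValueError("random baseline is zero; relative score undefined")
    return score(clusters, classes) / mean_baseline
```

Any function with the signature `score(clusters, classes)`, such as an NMI or incompatibility function, can be passed as `score`; the default of 10 random clusterings follows the number stated in the text.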
We first compared AncestralClust against UCLUST using relative NMI and coefficient of variation (Figure 2). We used k = 300 random initial sequences, which is 3% of the total number of sequences in each sample, and C = 16 cuts in the initial phylogenetic tree. The relative NMI tends to be higher, with a lower coefficient of variation, for AncestralClust across all barcodes. This suggests that, for these divergent eDNA sequences, AncestralClust provides clusterings that are more even in size and more consistent with conventional taxonomic assignment.

As a second measure of accuracy, we measured relative incompatibility and coefficient of variation for AncestralClust and UCLUST on the same datasets under the same running conditions. As shown in Figure 3, AncestralClust tends to create balanced clusters with lower relative taxonomic incompatibilities than UCLUST at all taxonomic levels. Similar results are seen for metabarcode 18S (Fig S1). However, for metabarcode 16S (Fig S2), AncestralClust performs noticeably better than UCLUST at the species, genus, and family levels, but at the order, class, and phylum levels it performs either the same or worse. Also, at the species, genus, and family levels, it is apparent that as the UCLUST clusters approach a lower coefficient of variation, the relative incompatibility increases dramatically.
Next, we analyzed two datasets with different properties: one dataset of diverse species from the same gene and another dataset of homologous genes from species of the same phyla. In the first dataset, we expect the sequences to cluster according to species. In the second dataset, we expect the sequences to cluster according to gene. We compared AncestralClust to four commonly used clustering programs (UCLUST, meshclust2, CD-HIT, and DNACLUST) and one clustering program designed for divergent sequences, SpClust. The first dataset contained 13,043 sequences from the COI CALeDNA database from 11 divergent species belonging to 7 different phyla and 11 different classes, and the second dataset contained sequences from 6 different genes from taxonomically similar species.

First, we compared all methods using the 13,043 COI sequences from the 11 different species (Table 1). We expect these sequences to form 11 different clusters, each including all the sequences from one species. We chose identity thresholds to enforce the expected number of clusters for each method. We were unable to form 11 clusters using CD-HIT because the program does not allow clustering of sequences with identity thresholds < 80% at default parameters. For SpClust, we used the three precision modes available for the method. In this analysis, AncestralClust achieved a perfect clustering (the purity was 1 and the relative incompatibility was 0), although it was the second slowest method and had the second lowest memory requirements. UCLUST was one of the fastest methods and used the least memory, but it had the second lowest purity and the third highest relative NMI. meshclust2 had no incompatibilities and the second highest purity and relative NMI values, but it was the third slowest method. DNACLUST had the most uneven clusters and the second lowest relative NMI value, with the highest relative incompatibility.
SpClust only identified one cluster, with a computational time of ~2 days. In comparison, AncestralClust took ~5 minutes and UCLUST used < 1 second.

Next, we analyzed ’genomic set 1’ from Matar et al. (2019), which consists of 39 sequences from 6 homologous genes (FCER1G, S100A1, S100A6, S100A8, S100A12, and SH3BGRL3; Table 2). We expect these sequences to form 6 clusters. We varied the identity thresholds for UCLUST and meshclust2 using thresholds 0.4, 0.6, and 0.8. For CD-HIT, we used the lowest identity threshold available with default parameters, which is 0.8. We were unable to use DNACLUST for this analysis because it cannot handle sequences longer than 4,500 bp (the average sequence length was 2,387.9 bp and the longest sequence was 5,363 bp). Since this dataset contained 6 different genes, we calculated relative NMI using genes as the classes and did not use incompatibility as an accuracy measure. Only AncestralClust, UCLUST, and meshclust2 produced the expected number of clusters, and among these methods AncestralClust had the highest purity value. AncestralClust was the second slowest method and had the highest memory requirements, which is due to the wavefront alignment algorithm, whose memory requirement is $O(s^2)$, where s is the alignment score. Since alignments were performed using 6 different genes that were longer than 1.5 kb, this resulted in a high value of s. SpClust had the highest relative NMI in all precision modes and the same purity as AncestralClust in its moderate and maximum precision modes; however, it failed to produce the expected number of clusters.

4 Conclusions

We developed a phylogenetic-based clustering method, AncestralClust, specifically to cluster divergent metabarcode sequences. We performed a comparative study between AncestralClust and widely used clustering programs such as UCLUST, CD-HIT, DNACLUST, and meshclust2, and, for divergent sequences, SpClust.
UCLUST and DNACLUST are substantially faster than AncestralClust and should be the preferred methods if computational speed is the main concern. However, AncestralClust tends to form clusters of more even size with lower taxonomic incompatibility and higher NMI than other methods for the relatively divergent sequences analyzed here. We recommend the use of AncestralClust when sequences are divergent, especially if a relatively even clustering is also desirable, for example for divide-and-conquer approaches in which the computational time of downstream analyses increases faster than linearly with cluster size.

Acknowledgements

This work used the Extreme Science and Engineering Discovery Environment (XSEDE) Bridges system at the Pittsburgh Supercomputing Center through allocation BIO180028.

References

Balaban, M., Moshiri, N., Mai, U., Jia, X., and Mirarab, S. (2019). Treecluster: Clustering biological sequences using phylogenetic trees. PloS one, 14(8), e0221068.

Bonder, M. J., Abeln, S., Zaura, E., and Brandt, B. W. (2012). Comparing clustering and pre-processing in taxonomy analysis. Bioinformatics, 28(22), 2891–2897.

Chen, Q., Wan, Y., Zhang, X., Lei, Y., Zobel, J., and Verspoor, K. (2018). Comparative analysis of sequence clustering methods for deduplication of biological databases. J. Data and Information Quality, 9(3).

Edgar, R. C. (2010). Search and clustering orders of magnitude faster than blast. Bioinformatics, 26(19), 2460–2461.

Fu, L., Niu, B., Zhu, Z., Wu, S., and Li, W. (2012). Cd-hit: accelerated for clustering the next-generation sequencing data. Bioinformatics, 28(23), 3150–3152.
Ghodsi, M., Liu, B., and Pop, M. (2011). Dnaclust: accurate and efficient clustering of phylogenetic marker genes. BMC bioinformatics, 12(1), 1–11.

Huang, Y., Niu, B., Gao, Y., Fu, L., and Li, W. (2010). CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics, 26(5), 680–682.

James, B. T. and Girgis, H. Z. (2018). Meshclust2: Application of alignment-free identity scores in clustering long dna sequences. bioRxiv, page 451278.

Lassmann, T. (2020). Kalign 3: multiple sequence alignment of large datasets.

Li, W., Jaroszewski, L., and Godzik, A. (2001). Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 17(3), 282–283.

Marco-Sola, S., Moure López, J. C., Moreto Planas, M., and Espinosa Morales, A. (2020). Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics, (btaa777), 1–8.

Matar, J., Khoury, H. E., Charr, J.-C., Guyeux, C., and Chrétien, S. (2019). Spclust: Towards a fast and reliable clustering for potentially divergent biological sequences. Computers in biology and medicine, 114, 103439.

Meyer, R. S., Curd, E. E., Schweizer, T., Gold, Z., Ramos, D. R., Shirazi, S., Kandlikar, G., Kwan, W.-Y., Lin, M., Freise, A., et al. (2019). The california environmental dna “caledna” program. bioRxiv, page 503383.

Ratnasingham, S. and Hebert, P. D. (2007). Bold: The barcode of life data system (http://www.barcodinglife.org). Molecular ecology notes, 7(3), 355–364.

Schoch, C. L., Ciufo, S., Domrachev, M., Hotton, C. L., Kannan, S., Khovanskaya, R., Leipe, D., Mcveigh, R., O’Neill, K., Robbertse, B., et al. (2020). Ncbi taxonomy: a comprehensive update on curation, resources and tools. Database, 2020.

Yang, Z. (2014). Molecular evolution: a statistical approach. Oxford University Press.
Zheng, W., Mao, Q., Genco, R. J., Wactawski-Wende, J., Buck, M., Cai, Y., and Sun, Y. (2018). A parallel computational framework for ultra-large-scale sequence clustering analysis. Bioinformatics, 35(3), 380–388.
Zou, Q., Lin, G., Jiang, X., Liu, X., and Zeng, X. (2018). Sequence clustering in bioinformatics: an empirical study. Briefings in Bioinformatics, 21(1), 1–10.

Figure 1. Overview of AncestralClust. In (1), k random sequences are chosen for the initial clusters. In (2), a distance matrix is constructed from the k sequences. In (3), a neighbor-joining tree is constructed from the distance matrix and C − 1 cuts are made to create C clusters. In (4), the sequences in each cluster are multiply aligned and the ancestral sequence is reconstructed at the root node of each tree. The remaining unassigned sequences are then aligned to the ancestral sequence of each cluster and the shortest distance to each ancestral sequence is calculated. The process is iterated until all sequences are assigned to a cluster.

Figure 2. Relative NMI against coefficient of variation for AncestralClust and UCLUST for 100 samples of 10,000 randomly chosen 16S, 18S, and COI reference sequences from the CALeDNA Project (Meyer et al., 2019). The similarity threshold for UCLUST was 0.58. For AncestralClust, we used 300 initial random sequences with 15 initial clusters. Relative NMI was calculated by dividing NMI by the average of 10 random samples of the same fixed cluster size.
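The two axes used in Figures 2 and 3 can be illustrated with a short sketch. This is not the authors' code: it is plain Python, the function names are hypothetical, and three choices are assumptions made for illustration only: NMI is normalised by the arithmetic mean of the two entropies, the random baseline is obtained by shuffling the cluster labels (which keeps every cluster size fixed), and the coefficient of variation uses the population standard deviation of the cluster sizes.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(clusters, taxa):
    """NMI = 2 * I(C;T) / (H(C) + H(T)); this normalisation is an assumption."""
    n = len(clusters)
    joint = Counter(zip(clusters, taxa))
    c_counts, t_counts = Counter(clusters), Counter(taxa)
    mutual_info = sum(
        (nij / n) * math.log(nij * n / (c_counts[c] * t_counts[t]))
        for (c, t), nij in joint.items()
    )
    denom = entropy(clusters) + entropy(taxa)
    return 2.0 * mutual_info / denom if denom > 0 else 0.0


def relative_nmi(clusters, taxa, n_random=10, seed=0):
    """NMI divided by the mean NMI of random clusterings with the same sizes."""
    rng = random.Random(seed)
    shuffled = list(clusters)
    baseline = []
    for _ in range(n_random):
        rng.shuffle(shuffled)  # a permutation keeps every cluster size fixed
        baseline.append(nmi(shuffled, taxa))
    mean_baseline = sum(baseline) / n_random
    if mean_baseline <= 0:
        return float("inf")
    return nmi(clusters, taxa) / mean_baseline


def coefficient_of_variation(clusters):
    """Population standard deviation of the cluster sizes divided by their mean."""
    sizes = list(Counter(clusters).values())
    mean = sum(sizes) / len(sizes)
    variance = sum((s - mean) ** 2 for s in sizes) / len(sizes)
    return math.sqrt(variance) / mean
```

A clustering that matches the taxonomy and has equally sized clusters gives an NMI of 1 and a coefficient of variation of 0; uneven cluster sizes push the coefficient of variation up.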
Figure 3. Relative incompatibility against coefficient of variation for AncestralClust and UCLUST for 100 samples of 10,000 randomly chosen COI reference sequences. COI reference sequences are from the CALeDNA Project (Meyer et al., 2019). The similarity threshold for UCLUST was 0.58. For AncestralClust, we used 300 initial random sequences with 15 initial clusters.

Table 1. Comparisons of clustering methods using 13,043 COI sequences from 11 different species. The list of species can be found in Table S1. Incompatibility was calculated at the taxonomic rank of species. For UCLUST, meshclust2, and DNACLUST, the identity thresholds were chosen to force the expected number of clusters (11). For CD-HIT, the lowest possible identity threshold, 0.8, was chosen. In the case of SpClust, the coefficient of variation cannot be calculated for 1 cluster. SpClust clusters were created with version 2.

Method                  # of clusters  Time (sec)  Mem (MB)  Purity  Relative Incompat. (species)  Relative NMI  Coeff. of Var.
AncestralClust          11             293.2       19.3      1       0                             551.09        0.8574
UCLUST                  11             <1          9.9       0.8717  0.0182                        474.63        0.8300
meshclust2              11             108.14      46.5      0.9855  0                             498.898       0.1053
CD-HIT                  24             5.86        43.9      0.8561  0                             241.66        1.2031
DNACLUST                11             <1          170.6     0.9455  0.0545                        24.21         1.8987
SpClust (fast)          1              152046.5    2678.9    1       0                             1             -
SpClust (moderate)      1              188172.9    6457.6    1       0                             1             -
SpClust (maxPrecision)  1              189577.1    6452.5    1       0                             1             -
Table 2. Comparisons of clustering methods using 39 sequences from 6 homologous genes from Matar et al. (2019). 'id' refers to the identity threshold used. We used identity thresholds of 0.4, 0.6, and 0.8 for UCLUST and meshclust2. We used precision levels of fast, moderate, and maximum for SpClust, using version 1 since version 2 only produced 1 cluster for all modes. DNACLUST has a maximum sequence length of 4500 bp and could not be used on this dataset.

Method                   # of clusters  Time (sec)  Memory (MB)  Purity  Relative NMI  Coefficient of Variation
AncestralClust           6              370.3       412.0        0.9487  1.8660        0.3982
UCLUST (id=0.4)          6              1           15.4         0.7436  1.5667        0.5396
UCLUST (id=0.6)          19             1           20.1         0.7179  1.4379        0.7166
UCLUST (id=0.8)          29             1.9         20.4         0.5641  1.1717        0.4565
meshclust2 (id=0.4)      6              1.1         7.7          0.8462  1.6625        1.2489
meshclust2 (id=0.6)      10             2.9         8.8          0.7949  1.9257        1.071
meshclust2 (id=0.8)      26             2.4         9.4          0.6410  1.2240        0.6325
SpClust (fast)           4              44.6        166.2        0.8718  2.2463        0.8432
SpClust (moderate)       4              112.5       166.1        0.9487  2.4335        0.6453
SpClust (max precision)  4              570.1       166.0        0.9487  2.9449        0.6809
CD-HIT (id=0.8)          31             0.48        39.9         0.4103  1.0950        0.4574
DNACLUST                 -              -           -            -       -             -
10_1101-674051 ----
Easy Kinetics: a novel enzyme kinetic characterization software

Gabriele Morabito 1,2 *
1 Department of Biology, University of Pisa, Pisa, Italy
2 Max Planck Institute for Biology of Ageing, Cologne, Germany
* Correspondence: g.morabito@age.mpg.de

Keywords: computational enzymology, enzyme kinetics
doi: 10.5281/zenodo.3242785
bioRxiv preprint: https://doi.org/10.1101/674051 (this version posted January 2, 2021; CC-BY 4.0 International license, http://creativecommons.org/licenses/by/4.0/)

Abstract

This paper presents Easy Kinetics, a publicly available graphical interface that allows rapid evaluation of the main kinetic parameters of an enzyme-catalyzed reaction. In contrast to similar commercial software, which uses algorithms based on non-linear regression models to reach these results, Easy Kinetics is based on a completely different, original algorithm that requires as input only the spectrophotometric measurements of ∆Abs/min taken in duplicate at two different substrate concentrations. The results nevertheless show significant concordance with those obtained with the most common commercial software used for enzyme kinetics characterization, GraphPad Prism 8©, suggesting that Easy Kinetics can be used as a valid alternative for routine tests in enzyme kinetics.

Introduction

The continuous and rapid evolution of modern biochemical methods makes the study of enzyme kinetics very useful both in academic research, to test how variations in a polypeptide chain of interest affect enzyme function, and in industrial processes, to optimize the production of molecules of interest in enzymatic reactors [2]. The Michaelis-Menten reaction mechanism was proposed almost a century ago to describe how the reaction rate of enzymes is affected by the substrate concentration [3], and it is still the core reference model for describing enzyme kinetics.
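For reference, the Michaelis-Menten rate law and its double-reciprocal (Lineweaver-Burk) form are written below in standard textbook notation. The notation is an assumption here, not a transcription from the manuscript: $V_0$ is the initial reaction rate, $[S]$ the substrate concentration, $V_{max}$ the maximal rate and $K_m$ the Michaelis constant.

```latex
V_0 = \frac{V_{max}\,[S]}{K_m + [S]}
\qquad\Longleftrightarrow\qquad
\frac{1}{V_0} = \frac{K_m}{V_{max}} \cdot \frac{1}{[S]} + \frac{1}{V_{max}}
```

In the reciprocal form the straight line has slope $K_m / V_{max}$, y-intercept $1 / V_{max}$ and x-intercept $-1 / K_m$.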
This model, however, requires a few parameters to fit the raw data, including Km and Vmax. Over the years biochemists have developed several methods to evaluate these parameters from the raw data; the most widely used, implemented in software such as GraphPad Prism 8© [1], apply linear or non-linear regression models [4]. Original alternative methods that determine Km and Vmax graphically have also been proposed [5], but like the previous ones they require multiple spectrophotometric measurements of ∆Abs/min (at least 6, conducted in duplicate) at different substrate concentrations to determine the main kinetic parameters precisely. This paper presents an alternative method, implemented in the software Easy Kinetics, which determines the main kinetic parameters of an enzyme-catalyzed reaction and the corresponding kinetic graphs from spectrophotometric measurements of ∆Abs/min taken in duplicate at only two different substrate concentrations.

Materials and methods

Algorithm used in evaluation of Km and Vmax: The evaluation of Km and Vmax from spectrophotometric measurements of ∆Abs/min taken in duplicate at only two different substrate concentrations is based on a trigonometric demonstration (Fig. 1). Briefly, the algorithm transforms the means of the duplicates at the two substrate concentrations into their reciprocal values, as in the Lineweaver-Burk reciprocal plot. Given two points of this graph, one and only one straight line passes through them. This line has an unknown inclination $\alpha$ and intersects the Cartesian axes at the points $\frac{1}{V_{max}}$ and $-\frac{1}{K_m}$, which are also unknown. However, by tracing the projections of the two known
points $(x_1, y_1)$ and $(x_2, y_2)$ on the Cartesian plane, it is evident that the parallel lines $y = y_2$ and $y = 0$ are both intersected by the straight line under study, whose inclination is $\alpha$. By the Alternate Interior Angles theorem [6], if two parallel lines are cut by a transversal, then the pairs of alternate interior angles are congruent: so, by Fig. 1, $\alpha = \alpha_1$. Considering instead the lines $y = y_2$ and $y = y_1$, which are also parallel and intersected by the same straight line, the same theorem shows that their alternate interior angles are congruent: so, by Fig. 1, $\alpha_1 = \alpha_2$. This implies that:

$$\tan(\alpha) = \frac{y_2 - y_1}{x_2 - x_1}$$

But also $\frac{z}{x_2} = \tan(\alpha)$, with $z = y_2 - \frac{1}{V_{max}}$, so:

$$\frac{1}{V_{max}} = y_2 - z = y_2 - (\tan(\alpha) \cdot x_2) = y_2 - x_2 \cdot \frac{y_2 - y_1}{x_2 - x_1}$$

Once $\frac{1}{V_{max}}$ has been calculated, the value of $\frac{1}{K_m}$ can be determined as follows:

$$\left| -\frac{1}{K_m} \right| = \frac{1 / V_{max}}{\tan(\alpha)}$$

Inverting the two previous values gives $V_{max}$ and $K_m$.

The equations used by the software are listed below; a formula marked [not recoverable] was lost when the text was extracted.

Equation used for the generation of the kinetic graph: [not recoverable]
Equation used for the evaluation of the V0 at a chosen substrate concentration: [not recoverable]
Equation used to switch the previously evaluated V0, expressed in ∆Abs/min, into a new V0 value expressed in μmol of reporter product generated per minute: [not recoverable]
Equation used for the evaluation of the enzymatic units in the sample: [not recoverable]
Equation used for the evaluation of the protein concentration during the Bradford assay: $C_p = \frac{Abs_{protein} - Abs_{blank}}{0.064 \cdot O}$
Equation used for the evaluation of the enzyme's specific activity: $\frac{U}{C_p}$
Equation used for the evaluation of the enzyme's Kcat: [not recoverable]
Equation used for the evaluation of the enzyme's catalytic efficiency: [not recoverable]
Equation used for the evaluation of the Hill coefficient: [not recoverable]
where [S] represents the substrate concentration; Si is 1 if substrate inhibition is present and 0 if it is absent; Ki represents the inhibition constant, which is evaluated at a very high substrate concentration when substrate inhibition is present (the expression is not recoverable from the extracted text) and is set to 1 when substrate inhibition is absent; Lf represents the final volume of the sample; Li represents the starting volume of the sample; ε represents the molar extinction coefficient of the product; O represents the optical path of the spectrophotometer; Abshigh represents the absorbance measured at a very high substrate concentration; Absprotein represents the absorbance of the protein solution; Absblank represents the absorbance of the same solution without protein; P.M. represents the molecular weight of the reporter product.

Enzyme's ∆Abs/min raw data for several concentrations of tested limiting substrates:

Tab.1 Experimentally measured ∆Abs/min values for several substrate concentrations in the enzyme-catalyzed reactions tested.

10_1101-2021_01_08_425952 ----
rdrugtrajectory: An R Package for the Analysis of Drug Prescriptions in Electronic Health Care Records

JSS Journal of Statistical Software
MMMMMM YYYY, Volume VV, Issue II.
doi: 10.18637/jss.v000.i00
bioRxiv preprint: https://doi.org/10.1101/2021.01.08.425952 (this version posted January 9, 2021; CC-BY-ND 4.0 International license, http://creativecommons.org/licenses/by-nd/4.0/)

Anthony Nash (University of Oxford), Tingyee E. Chang (University of Oxford), Benjamin Wan (King's College London), M. Zameel Cader (University of Oxford)

Abstract

Primary care electronic health care records are rich with patient and clinical information. Studying electronic health care records has resulted in marked improvements to national health care processes and patient-care decision making, and is a powerful supplementary source of data for drug discovery efforts. We present the R package rdrugtrajectory, designed to yield demographic and patient-level characteristics of drug prescriptions in the UK Clinical Practice Research Datalink dataset. The package operates over Clinical Practice Research Datalink Gold clinical, referral and therapy datasets and includes features such as first drug prescriptions analysis, cohort-wide prescription information, cumulative drug prescription events, the longitudinal trajectory of drug prescriptions, and a survival analysis timeline builder to identify risks related to drug prescription switching. The rdrugtrajectory package has been made freely available via the GitHub repository.

Keywords: EHR, electronic health care records, CPRD, Clinical Practice Research Datalink, prescriptions, R, therapeutics, drug discovery, clinical epidemiology.

1. Introduction

The UK Clinical Practice Research Datalink (CPRD) service offers high quality longitudinal data on 50 million patients with up to 20 years of follow-up for 25% of those patients. The service provides drug treatment patterns, feasibility studies and health care resource use studies. Patient electronic health care records (EHR) are stored as coded and anonymised data and sourced from over 1,800 primary care practices across England.
CPRD holds information on consultation events, medical diagnoses, symptoms, prescriptions, vaccination history, laboratory tests, and referrals. CPRD can provide routine linkage to other health-related patient datasets, for example: small area level data, such as patient and/or practice postcode linked deprivation measures; data from NHS Digital, which includes Hospital Episode Statistics, outpatient and accident and emergency data; and cancer data from Public Health England.

Evidence from EHRs is making an impact on primary care decision-making and best practice (Oyinlola et al. 2016). With nationwide longitudinal datasets more readily available, the evaluation of treatments over long timescales can contribute to clinical decision-making (Hepp et al. 2017). For example, adverse events caused by prescription medication can be studied using retrospective data in situations where randomized clinical trials may prove impractical (Ghosh et al. 2019; Bally et al. 2017).
This publication serves as an introduction to the rdrugtrajectory R package. Whilst it is by no means a complete tutorial, we expand on some of the main package features, such as how to: isolate patients by first drug prescriptions at given clinical events; calculate time-invariant prescriptions; construct survival analysis timelines (compatible with Cox proportional hazard regression and Kaplan Meier curves); and visualise patient prescription switching. For a comprehensive list of functions please visit the Github repository https://github.com/acnash/rdrugtrajectory. Almost all features can be controlled by covariates or stratified by some variable, for example, by gender, age, medical codes or treatment product codes. The example code, figures and data structures presented here mimic a small fraction of our own research. In the interest of patient confidentiality, the clinical data used in the analysis have been fabricated. We present a brief tour of some of the functions available, starting with a discussion on the CPRD data structure and how records must be formatted. A glossary of terms has been provided (Table 1) to assist the reader.

2. rdrugtrajectory package and data structures

2.1. rdrugtrajectory availability and installation

rdrugtrajectory is free to download from the Github repository https://github.com/acnash/rdrugtrajectory and holds an MIT license. Fabricated CPRD clinical and CPRD prescription records, in addition to age, gender and index of multiple deprivation scores, are included for test and tutorial purposes. Before installing the package, the following R dependencies are required: plyr, dplyr, foreach, doParallel, data.table, parallel, splus2R, rlist, reda, ggplot2, ggalluvial, stats, utils and useful. The latest rdrugtrajectory release is installed from the downloaded archive using:

install.packages("path/to/tar/file", repos = NULL, type = "source")

rdrugtrajectory was developed and tested on R version 4.0.1.
Please consult the Github page for release notes, the latest version and up to date installation instructions.

Table 1: Table of frequently used terms.

Term               Description
rdrugtrajectory    An R package designed for the management of CPRD prescription data.
clinical           The ClinicalNNN.txt dataset presented in a rdrugtrajectory dataframe.
referral           The ReferralNNN.txt dataset presented in a rdrugtrajectory dataframe.
therapy            The TherapyNNN.txt dataset presented in a rdrugtrajectory dataframe.
AdditionalNNN.txt  The CPRD dataset of additional clinical information, for example, patient smoking status and alcohol consumption. Data can be retrieved using CPRDLookups.R.
medcode            A CPRD identifier that denotes medical conditions, diagnoses and complaints made by a patient. medcodes are recorded in the ClinicalNNN.txt and ReferralNNN.txt files.
prodcode           A CPRD identifier that denotes treatment products, including drugs, foods, and medical apparatus. prodcodes are recorded in the TherapyNNN.txt files.
patid              A unique CPRD patient identifier. Used to link datasets.
event              Any prodcode or medcode in a patient's EHR.
eventdate          The date of an event recorded by a general practitioner. Present in all three datasets and the corresponding rdrugtrajectory dataframes.
IMD                Index of Multiple Deprivation score, a UK Government socioeconomic measurement based on the postcode of the clinic or a patient's registered address.
Prescription       A general term for any prodcode prescribed for treatment.
medical history    Indicates a combination of one or more sets of CPRD data, for example, the collection of all clinical and therapy EHR for patients with a medcode for migraine.
product.txt        A plain text file that contains all prodcodes with a description and comes bundled with the CPRD Data Dictionary. The file is used to link a prodcode with a description.

2.2. CPRD product description

Several rdrugtrajectory functions use the CPRD product.txt file for assigning a text description to a prescription prodcode. The product.txt file (and medical.txt for medcode descriptions) is available in the CPRD Data Dictionary Windows software. It is important that the file remains in plain text, with columns tab-delimited. The files can be simplified by removing all non-essential products. Finally, all the eleven columns that make up the product.txt file must be available, with the first column containing all prodcodes and the fourth column containing the product description. A simplified product.txt file, presented below, can be downloaded from the Github page.

> library(rdrugtrajectory)
> productDF <- read.csv("../RDrugTrajectory_Data/product.txt",
+                       sep="\t",
+                       header=FALSE)
> head(productDF)
  V1       V2       V3                         V4                          V5
1  5 60153020 14958680      Atenolol 50mg tablets                    Atenolol
2 24 60152020  5354283     Atenolol 100mg tablets                    Atenolol
3 26 67920020  6869099      Atenolol 25mg tablets                    Atenolol
4 49 58950020  4920857 Amitriptyline 25mg tablets Amitriptyline hydrochloride
5 65 68572020  4771731    Lisinopril 10mg tablets                  Lisinopril
6 78 68571020  4006669     Lisinopril 5mg tablets                  Lisinopril
     V6     V7   V8                         V9
1  50mg Tablet Oral                    2040000
2 100mg Tablet Oral                    2040000
3  25mg Tablet Oral                    2040000
4  25mg Tablet Oral 04030100/04070300/04070402
5  10mg Tablet Oral                    2050501
6   5mg Tablet Oral                    2050501
  V10
1 Beta-adrenoceptor Blocking Drugs
2 Beta-adrenoceptor Blocking Drugs
3 Beta-adrenoceptor Blocking Drugs
4 Tricyclic And Related Antidepressant Drugs/Neuropathic Pain/Prophylaxis Of Migraine
5 Angiotensin-converting Enzyme Inhibitors
6 Angiotensin-converting Enzyme Inhibitors
     V11     V12
1 Feb-09 3059002
2 Feb-09 3059001
3 Feb-09 5070002
4 Feb-09 2776002
5 Feb-09 5250003
6 Feb-09 5250002

2.3. rdrugtrajectory package structure

rdrugtrajectory contains three R files: (1) all functions related to data curating and searching reside within CPRDDrugTrajectory.R; (2) analysis tools and timeline construction reside within CPRDDrugTrajectoryStats.R; and (3) all utilities, including input/output operations, reside within CPRDDrugTrajectoryUtils.R. The package contains several fabricated CPRD datasets: testClinicalDF, testTherapyDF, ageGenderDF, imdDF, and drugListDF. A description of each, along with information on data types and structures, is given below.

2.4. The CPRD EHR data structure

The structure of CPRD Gold data may depend on whether the CPRD license holder performs intermediate data management steps before releasing data to the user. Typically, however, CPRD Gold data follows the CPRD Gold specification https://cprdcw.cprd.com/_docs/CPRD_GOLD_Full_Data_Specification_v2.0.pdf.
Currently, rdrugtrajectory supports EHR data from the flat files ClinicalNNN.txt, ReferralNNN.txt, and TherapyNNN.txt. The Additional Clinical Details files (AdditionalNNN.txt) are currently supported using our released R script CPRDLookups.R (https://github.com/acnash/CPRD_Additional_Clinical). Patients are assigned a unique numerical patid value. The operations performed by rdrugtrajectory require the patid to identify patients and subset patient groups. We recommend that patid, medcode and prodcode are kept as character data throughout any preliminary data curating steps. Medical events are recorded as codes and stored in the ClinicalNNN.txt and ReferralNNN.txt files under the column header medcode. Prescription events, such as drug prescriptions, are also recorded as codes and stored in the TherapyNNN.txt file under the column header prodcode, and the sequences of repeat prescriptions are stored under the issueseq column header. Dates associated with medical and prescription events, recorded by the General Practitioner, are stored under the column header eventdate.

2.5. Essential data types and data structures

rdrugtrajectory can operate over CPRD Gold EHR clinical, referral and prescription data, provided each dataset is presented as a separate R dataframe or combined into a rdrugtrajectory medical history dataframe.
The construction of clinical, referral and prescription dataframes requires, as a minimum, a patid and eventdate column, and either medcode or prodcode (for therapy data, issueseq is also necessary), presented in that order. Every record of medcode or prodcode must be accompanied by an eventdate entry (encoded as a Date class of the form YYYY-MM-DD). Patients can have duplicate events within the same dataset and between datasets. Medical and prescription codes can be retrieved from the corresponding medical.txt and product.txt files, which come bundled with the CPRD Data Dictionary Windows application. rdrugtrajectory comes packaged with fabricated EHR data in the structure of:

> library(rdrugtrajectory)
> #fabricated clinical data (referral data follows the same format)
> names(testClinicalDF)
[1] "patid" "eventdate" "medcode" "consid"
> #fabricated prescription data
> names(testTherapyDF)
[1] "patid" "eventdate" "prodcode" "consid" "issueseq"

Users can check whether the structure of an EHR dataframe meets the requirements for this package by calling checkCPRDRecord; additional columns, such as the consultation identification number (consid), are not considered. In the following instance, a prescription dataset with the required columns and the optional consultation identification number is presented.

> library(rdrugtrajectory)
> #check the structure of testTherapyDF, specify that it is therapy data
> checkCPRDRecord(df=testTherapyDF, dataType="therapy")
[1] "The data.frame is appropriately formatted. Returning TRUE."
[1] TRUE
> #display the rdrugtrajectory EHR therapy dataframe
> str(testTherapyDF, strict.width="wrap")
'data.frame': 91647 obs. of 5 variables:
$ patid : int 3515 3515 3515 3515 3515 3515 3515 3653 3653 3653 ...
$ eventdate: Date, format: "2005-02-24" "2006-01-26" ...
$ prodcode : int 83 83 83 707 707 707 707 297 297 297 ...
$ consid : int 540850 540865 540892 541108 541114 541118 541133 571336 571345 571357 ...
$ issueseq : int 0 0 0 0 0 0 0 0 1 2 ...

Users can combine the rdrugtrajectory EHR dataframes with any number of patient and EHR variables to act as covariates and stratifying variables; typically this can be done using the R cbind operation. For example, BMI and smoking status, both of which can be retrieved from the AdditionalNNN.txt dataset files using CPRDLookups.R, can be linked by searching for and binding with the record patid values. The rdrugtrajectory package contains several utility functions to retrieve CPRD data, including patient year of birth, gender (male or female) and either patient-level or clinic-level index of multiple deprivation score (IMD). The patient age can be determined by adding 1800 to the value in the yob column of the Patient CPRD EHR dataset and then subtracting that value (birth year) from the year of the CPRD database release. This data requires preliminary treatment before it is presented to the rdrugtrajectory package. Patient age, gender and IMD score must be presented in a dataframe with the linked patient column patid, along with the columns age, gender, and score. Provided the patid column is preserved, patient characteristics can be presented in separate dataframes, for example:

> library(rdrugtrajectory)
> #patient age and gender as one dataframe
> str(ageGenderDF, strict.width="wrap")
'data.frame': 3838 obs. of 3 variables:
$ patid : int 1 2 3 4 5 6 7 8 9 10 ...
$ yob : num 45 35 33 42 63 57 34 51 51 22 ...
$ gender: int 2 2 1 2 2 1 2 2 2 1 ...
> #clinic-level IMD score as one dataframe
> str(imdDF, strict.width="wrap")
'data.frame': 2126 obs. of 3 variables:
$ patid : int 6 11 16 34 42 44 54 60 63 79 ...
$ pracid: int 184 31 66 344 66 47 18 90 379 317 ...
$ score : int 1 3 1 4 1 2 1 5 1 2 ...

The patid patient identifier is fundamental to every operation performed by rdrugtrajectory. The examples presented here and those in the reference manual rely on searching and subsetting EHR data using a list or vector of patient identifiers. The function getUniquePatidList will retrieve an R List of patient identification numbers from any dataframe with a patid column.

The aforementioned rdrugtrajectory EHR dataframes, clinical, referral and therapy, can be combined into a single dataframe. We refer to this dataset instance as the patient's medical history; it can be constructed using constructMedicalHistory. The function expects events to be in chronological order and introduces two new columns, code and codetype, to denote each of the combined events. The code (medcode and/or prodcode) can be distinguished by a codetype value of c (clinical events), r (referral events), and t (prescription events). Events are returned in chronological order using the eventdate data.
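The age calculation and the medical history layout described above can be sketched as follows. This is an illustration only, written in Python rather than R, and it is not the package's implementation of constructMedicalHistory: the function names are hypothetical, records are plain dictionaries, and sorting by patid and then by eventdate is an assumed reading of "chronological order".

```python
from datetime import date


def patient_age(yob, release_year):
    """Age at database release; the yob column stores the birth year minus 1800."""
    return release_year - (1800 + yob)


def build_medical_history(clinical=(), referral=(), therapy=()):
    """Merge event records into one list with 'code' and 'codetype' fields.

    Each record is a dict with 'patid', 'eventdate' and either 'medcode'
    (clinical and referral records) or 'prodcode' (therapy records).
    """
    history = []
    sources = (
        (clinical, "medcode", "c"),   # clinical events
        (referral, "medcode", "r"),   # referral events
        (therapy, "prodcode", "t"),   # prescription events
    )
    for records, code_key, codetype in sources:
        for rec in records:
            history.append(
                {
                    "patid": rec["patid"],
                    "eventdate": rec["eventdate"],
                    "code": rec[code_key],
                    "codetype": codetype,
                }
            )
    # Assumed ordering: by patient, then by event date.
    history.sort(key=lambda event: (event["patid"], event["eventdate"]))
    return history


# Fabricated records, with identifiers kept as character data.
clinical = [{"patid": "1", "eventdate": date(2005, 7, 25), "medcode": "5767"}]
therapy = [
    {"patid": "1", "eventdate": date(2006, 1, 26), "prodcode": "707"},
    {"patid": "1", "eventdate": date(2002, 6, 7), "prodcode": "707"},
]
history = build_medical_history(clinical=clinical, therapy=therapy)
for event in history:
    print(event["patid"], event["eventdate"], event["code"], event["codetype"])
```

Keeping a single code column with a separate codetype flag means clinical, referral and prescription events can be filtered with one comparison, which is what the base R subsetting on codetype relies on.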
The following code demonstrates how to retrieve a list of patient identifiers from a prescription dataframe and from a medical history dataframe, followed by how to subset using base R operations and, finally, the medical history dataframe structure.

> library(rdrugtrajectory)
> #Retrieve patids from therapy data.
> idList <- getUniquePatidList(testTherapyDF)
> medHistoryDF <- constructMedicalHistory(testClinicalDF, NULL, testTherapyDF)
[1] "Using clinical data."
[1] "Using therapy data."
[1] "Building with clinical and therapy data."
> #Retrieve patid from medical history.
> medHistoryIDList <- getUniquePatidList(medHistoryDF)
> numOfPatients <- length(medHistoryIDList)
> #Subset using the first 100 patients.
> smallMedHistoryDF <- subset(medHistoryDF,
+     medHistoryDF$patid %in% medHistoryIDList[1:100])
> #Separate out the clinical records of the first 100 patients.
> smallClinicalOnlyDF <- subset(smallMedHistoryDF,
+     smallMedHistoryDF$codetype == "c")
> #Separate out the therapy records of the first 100 patients.
> smallTherapyOnlyDF <- subset(smallMedHistoryDF,
+     smallMedHistoryDF$codetype == "t")
> #Subset only those patient records beyond 31st Jan 2010.
> laterMedHistoryDF <- subset(medHistoryDF,
+     medHistoryDF$eventdate > as.Date("2010-01-31"))
> #Medical history dataframe structure
> str(medHistoryDF, strict.width="wrap")
'data.frame': 103336 obs. of 4 variables:
 $ patid    : int 1 1 1 1 1 1 1 2 2 3 ...
 $ eventdate: Date, format: "2002-06-07" "2005-07-25" ...
 $ code     : int 5767 5767 5767 707 707 707 707 5767 769 5767 ...
 $ codetype : chr "c" "c" "c" "t" ...

The patid data can also be used to retrieve patient characteristics, for example, the gender of the patient using getGenderOfPatients:
> library(rdrugtrajectory)
> idList <- getUniquePatidList(testTherapyDF)
> #Only use half of the cohort.
> idList <- idList[1:(length(idList)/2)]
> #Get gender data by specific gender.
> maleCode <- 1
> femaleCode <- 2
> malePatientsDF <- getGenderOfPatients(idList, ageGenderDF, maleCode)
> femalePatientsDF <- getGenderOfPatients(idList, ageGenderDF, femaleCode)
> #Get all gender data
> allPatientsDF <- getGenderOfPatients(getUniquePatidList(testTherapyDF),
+     ageGenderDF)
> #Structure of the patient gender data.
> str(allPatientsDF, strict.width="wrap")
'data.frame': 3838 obs. of 2 variables:
 $ patid : int 1 2 3 4 5 6 7 8 9 10 ...
 $ gender: int 2 2 1 2 2 1 2 2 2 1 ...

IMD data can be retrieved by combining the getUniquePatidList and getIMDOfPatients functions:

> library(rdrugtrajectory)
> idList <- getUniquePatidList(testTherapyDF)
> #Get patients with an IMD score of 1 or 2
> onePatientsDF <- getIMDOfPatients(idList, imdDF, 1)
> twoPatientsDF <- getIMDOfPatients(idList, imdDF, 2)
> #Get all IMD scores for all patients in testTherapyDF
> allPatientsDF <- getIMDOfPatients(getUniquePatidList(testTherapyDF), imdDF)
> #Structure of the patient IMD data.
> str(allPatientsDF, strict.width="wrap")
'data.frame': 2123 obs. of 2 variables:
 $ patid: int 6 11 16 34 42 44 54 60 63 79 ...
 $ score: int 1 3 1 4 1 2 1 5 1 2 ...

The final example of EHR dataframe manipulation presented here demonstrates how to retrieve all prescription records for patients prescribed a specific prescription treatment. For example, such an operation can be used to retrieve all prescription records for any patient prescribed amitriptyline. In addition, it is also possible to return only prescription records matching specific prescription treatments.
Importantly, prescription prodcodes can be grouped into lists and used to collect those patients with at least one record that matches an element of that list. This approach is useful if the dose is not relevant to the study or the prescription is dispensed under multiple product names.

> library(rdrugtrajectory)
> #It is easy to retrieve a list of all unique prodcodes in the cohort.
> prodCodesVector <- unique(testTherapyDF$prodcode)
> reducedProdCodesVector <- prodCodesVector[1:10]
> #All records are maintained for those patients with a matching prodcode.
> therapyOfInterestDF <- getPatientsWithProdCode(testTherapyDF,
+     reducedProdCodesVector)
> #Only those records that match are retained.
> reducedTherapyOfInterestDF <- getPatientsWithProdCode(testTherapyDF,
+     reducedProdCodesVector,
+     removeExcessDrugs=TRUE)

3. EHR drug prescription results and discussion

Having briefly demonstrated some basic operations for retrieving patient records by matching EHR dataframes against sets of patid values, we move on to showcase several operations available to the user. We begin by presenting examples of cohort prescription summary statistics, followed by methods of curating datasets and stratifying by patient groups. We then present examples of how to search for patients prescribed a first-line treatment, followed by presenting some of these patient groups as sequences of prescriptions. Finally, we demonstrate several examples of building timelines. For further examples, please see the GitHub page and reference manual.
3.1. Cohort summary statistics

getEventdateSummaryByPatient

rdrugtrajectory can return summary statistics on patient-level and cohort-level prescription data with getEventdateSummaryByPatient and getPopulationDrugSummary, respectively. For example, for a single patient's prescription history (selected via getUniquePatidList and [] dataframe subsetting), the function returns the patient patid, the number of prescription events, the median number of days between events, the fewest days between events, the greatest number of days between events (maxTime and longestDuration are the same), and the record duration (number of days between the first and last prescription event on record):

> library(rdrugtrajectory)
> idList <- getUniquePatidList(testTherapyDF)
> resultList <- getEventdateSummaryByPatient(
+     testTherapyDF[testTherapyDF$patid==idList[[1]],])
> str(resultList, strict.width="wrap")
List of 2
 $ TimeSeriesList: num [1:6] 336 652 2540 34 42 44
 $ SummaryDF :'data.frame': 1 obs. of 7 variables:
  ..$ patid          : int 3515
  ..$ numberOfEvents : int 7
  ..$ medianTime     : num 190
  ..$ minTime        : num 34
  ..$ maxTime        : num 2540
  ..$ longestDuration: num 2540
  ..$ recordDuration : int 3648
 - attr(*, "class")= chr "EventdateSummaryObj"

getPopulationDrugSummary

This approach can be extended across the cohort of patients with getPopulationDrugSummary. The returned PopulationEventdateSummary S3 object is a list of two elements.
The first element is the SummaryDF dataframe derived from calling getEventdateSummaryByPatient per patient, with the set of statistics retrievable through the accompanying patid. The second element is the TimeSeriesList, which holds a vector per patient of the number of days between consecutive prescription events. Vectors can be accessed using the patid element name:

> library(rdrugtrajectory)
> resultList <- getPopulationDrugSummary(df = testTherapyDF,
+     prodCodesVector = NULL)
> str(resultList, strict.width="wrap", list.len = 5)
List of 2
 $ SummaryDF :'data.frame': 3838 obs. of 7 variables:
  ..$ patid          : int [1:3838] 3515 3653 3756 3813 435 553 731 891 1781 1991 ...
  ..$ numberOfEvents : int [1:3838] 7 21 1 1 13 2 15 2 23 79 ...
  ..$ medianTime     : num [1:3838] 190 60 0 0 28.5 ...
  ..$ minTime        : num [1:3838] 34 34 0 0 11 ...
  ..$ maxTime        : num [1:3838] 2540 1623 0 0 322 ...
  .. [list output truncated]
 $ TimeSeriesList:List of 3838
  ..$ 3515: num [1:6] 336 652 2540 34 42 44
  ..$ 3653: num [1:20] 890 222 182 301 539 ...
  ..$ 3756: num 0
  ..$ 3813: num 0
  ..$ 435 : num [1:12] 26 23 24 24 32 322 31 29 11 51 ...
  .. [list output truncated]
 - attr(*, "class")= chr "PopulationEventdateSummary"
> #Get all patids for patients younger than 40.
> ageIDList <- getUniquePatidList(ageGenderDF[ageGenderDF$yob < 40,])
> timeSeriesList <- resultList[[2]]
> #Get all patids of available data.
> recordPatids <- names(timeSeriesList)
> #Get time data for the intersect of those patids of patients < 40 and the patids
> #of available data.
> subTimeList <- timeSeriesList[intersect(ageIDList, recordPatids)]
> str(subTimeList, strict.width="wrap", list.len = 5)
List of 640
 $ 2 : num 0
 $ 3 : num 0
 $ 7 : num 25
 $ 10 : num 0
 $ 15 : num 0
 [list output truncated]

3.2. Curating drug prescription records

There is no direct link between a prescription event and a medcode in the CPRD data. The relationship between the two can be inferred from the event dates of the prescription and clinical events, in addition to information provided by the consultation ID and the prescription issue number.

matchDrugWithDisease

rdrugtrajectory provides several methods for curating prescription datasets with the aim of establishing a relationship between prescription and clinical events. The matchDrugWithDisease function returns a subset of all prescription events with an established relationship between therapy and clinical event. The degree to which patients are included in the search is controlled with a function argument. There are three scenarios: all patients with a record of a specific prescription event and a specific clinical event, at any point; all patients with a record of a specific prescription event on the same date as a specific clinical event; and all patients with a record of a specific prescription event on the same date as a specific clinical event and clear of additional clinical events on that day.
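The three scenarios can be illustrated with a short base R sketch on toy data. This is an illustration of the matching logic as described in the text, not the package's implementation of matchDrugWithDisease; the dataframes, codes and the names level1 to level3 are hypothetical.

```r
# Toy data (hypothetical): medcode 10 is the condition of interest,
# medcode 99 is an unrelated clinical event, prodcode 83 is the drug.
clinicalDF <- data.frame(
  patid     = c(1L, 2L, 2L, 3L, 3L, 4L),
  eventdate = as.Date(c("2001-01-10", "2001-02-01", "2001-02-01",
                        "2001-03-05", "2001-06-01", "2001-04-04")),
  medcode   = c(10L, 10L, 99L, 10L, 99L, 99L)
)
therapyDF <- data.frame(
  patid     = c(1L, 2L, 3L, 4L),
  eventdate = as.Date(c("2001-01-10", "2001-02-01",
                        "2001-05-20", "2001-04-04")),
  prodcode  = c(83L, 83L, 83L, 83L)
)
medcodes  <- 10L
prodcodes <- 83L

drugDF    <- therapyDF[therapyDF$prodcode %in% prodcodes, ]
diseaseDF <- clinicalDF[clinicalDF$medcode %in% medcodes, ]
otherDF   <- clinicalDF[!(clinicalDF$medcode %in% medcodes), ]

# Scenario 1: drug and condition both on record, at any point.
level1 <- intersect(drugDF$patid, diseaseDF$patid)

# Scenario 2: drug and condition recorded on the same date.
sameDayDF <- merge(drugDF, diseaseDF, by = c("patid", "eventdate"))
level2 <- unique(sameDayDF$patid)

# Scenario 3: as scenario 2, with no other clinical event on that date.
sameDayKey <- paste(sameDayDF$patid, sameDayDF$eventdate)
otherKey   <- paste(otherDF$patid, otherDF$eventdate)
level3 <- unique(sameDayDF$patid[!(sameDayKey %in% otherKey)])

print(list(level1 = level1, level2 = level2, level3 = level3))
```

In this toy cohort each scenario returns a subset of the previous one, which is the behaviour one would expect from increasingly strict matching.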
One would expect fewer patients as the stringency of the search criteria is increased:

> library(rdrugtrajectory)
> prodcodes <- unique(testTherapyDF$prodcode)
> amitriptylineCodes <- prodcodes[1:5]
> propranololCodes <- prodcodes[6:11]
> medcodeList <- unique(testClinicalDF$medcode)
> headacheCodes <- medcodeList[1:10]
> amitriptylineResult1 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = amitriptylineCodes,
+     severity = 1)
> amitriptylineResult2 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = amitriptylineCodes,
+     severity = 2)
> amitriptylineResult3 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = amitriptylineCodes,
+     severity = 3)
> propranololResult1 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = propranololCodes,
+     severity = 1)
> propranololResult2 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = propranololCodes,
+     severity = 2)
> propranololResult3 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = propranololCodes,
+     severity = 3)

getGenderOfPatients

The example presented demonstrates how to identify patients prescribed amitriptyline and patients prescribed propranolol (there is patient overlap, easily controlled for by subsetting) whilst controlling for clinical overlap, with or without consideration for off-topic clinical events.
With the identified patients, we can, for example, stratify by gender:

> library(rdrugtrajectory)
> library(ggplot2)
> ami1Gender <- getGenderOfPatients(amitriptylineResult1, ageGenderDF)
> ami2Gender <- getGenderOfPatients(amitriptylineResult2, ageGenderDF)
> ami3Gender <- getGenderOfPatients(amitriptylineResult3, ageGenderDF)
> prop1Gender <- getGenderOfPatients(propranololResult1, ageGenderDF)
> prop2Gender <- getGenderOfPatients(propranololResult2, ageGenderDF)
> prop3Gender <- getGenderOfPatients(propranololResult3, ageGenderDF)
> amiDF <- data.frame(Freq=c(nrow(ami1Gender[ami1Gender$gender==1, ]),
+         nrow(ami2Gender[ami2Gender$gender==1, ]),
+         nrow(ami3Gender[ami3Gender$gender==1, ]),
+         nrow(ami1Gender[ami1Gender$gender==2, ]),
+         nrow(ami2Gender[ami2Gender$gender==2, ]),
+         nrow(ami3Gender[ami3Gender$gender==2, ])
+     ),
+     Search=c("Prescribed","With headache","No comorbidities",
+         "Prescribed","With headache","No comorbidities"),
+     Drug="Amitriptyline",
+     Gender=c("Male","Male","Male",
+         "Female","Female","Female")
+ )
> propDF <- data.frame(Freq=c(nrow(prop1Gender[prop1Gender$gender==1, ]),
+         nrow(prop2Gender[prop2Gender$gender==1, ]),
+         nrow(prop3Gender[prop3Gender$gender==1, ]),
+         nrow(prop1Gender[prop1Gender$gender==2, ]),
+         nrow(prop2Gender[prop2Gender$gender==2, ]),
+         nrow(prop3Gender[prop3Gender$gender==2, ])
+     ),
+     Search=c("At any time","With clinical","Clinical & No comorbidities",
+         "At any time","With clinical","Clinical & No comorbidities"),
+     Drug="Propranolol",
+     Gender=c("Male","Male","Male",
+         "Female","Female","Female")
+ )
> drugPrescriptionDF <- rbind(amiDF, propDF)
> ggPrescriptionAmi <- ggplot(drugPrescriptionDF[
+         drugPrescriptionDF$Drug=="Amitriptyline",],
+         aes(x=Search, y=Freq, fill=Gender)) +
+     geom_bar(stat="identity", position=position_dodge()) +
+     theme_bw() + xlab("Search criteria (severity)") + ylab("Patient count") +
+     theme(axis.text.x = element_text(angle=45,hjust=1)) +
+     ggtitle("Amitriptyline")
> ggPrescriptionProp <- ggplot(drugPrescriptionDF[
+         drugPrescriptionDF$Drug=="Propranolol",],
+         aes(x=Search, y=Freq, fill=Gender)) +
+     geom_bar(stat="identity", position=position_dodge()) +
+     theme_bw() + xlab("Search criteria (severity)") + ylab("Patient count") +
+     theme(axis.text.x = element_text(angle=45,hjust=1)) +
+     ggtitle("Propranolol")

Filtering through prescription events can also be controlled by a date range. For example, to calculate the number of patients prescribed amitriptyline per year from 2000 to 2004 and matched to a headache event, one can apply a date range:

> library(rdrugtrajectory)
> library(ggplot2)
> prodcodes <- unique(testTherapyDF$prodcode)
> amitriptylineCodes <- prodcodes[1:5]
> #Clinical events of interest are headaches.
> medcodeList <- unique(testClinicalDF$medcode)
> #Medcodes can be refined further.
> headacheCodes <- medcodeList[1:10]
> #Dataframes defined for binned dates are constructed by providing all the
> #patients to consider and the binned start and stop date.
> date2000DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+     start=as.Date(as.character("2000-01-01")),
+     stop=as.Date(as.character("2000-12-31")))

[Figure 1 image not reproduced. Panels: (A) Amitriptyline, (B) Propranolol. x-axis: Search criteria (severity); y-axis: Patient count; bars grouped by Gender (Female, Male).]

Figure 1: The number of patients prescribed (A) amitriptyline or (B) propranolol. The criteria used to match against clinical data are indicated: at any time, with a clinical record, and with a clinical record clear of off-topic clinical events.
> date2001DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+     start=as.Date(as.character("2001-01-01")),
+     stop=as.Date(as.character("2001-12-31")))
> date2002DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+     start=as.Date(as.character("2002-01-01")),
+     stop=as.Date(as.character("2002-12-31")))
> date2003DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+     start=as.Date(as.character("2003-01-01")),
+     stop=as.Date(as.character("2003-12-31")))
> date2004DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+     start=as.Date(as.character("2004-01-01")),
+     stop=as.Date(as.character("2004-12-31")))
> #Retrieve prescription frequencies per binned range
> amitResult2000 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = amitriptylineCodes,
+     severity = 1,
+     dateDF = date2000DF)
> amitResult2001 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = amitriptylineCodes,
+     severity = 1,
+     dateDF = date2001DF)
> amitResult2002 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = amitriptylineCodes,
+     severity = 1,
+     dateDF = date2002DF)
> amitResult2003 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = amitriptylineCodes,
+     severity = 1,
+     dateDF = date2003DF)
> amitResult2004 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medcodeList = headacheCodes,
+     drugcodeList = amitriptylineCodes,
+     severity = 1,
+     dateDF = date2004DF)
> #The number of patids returned by matchDrugWithDisease is equal to the number
> #of patients with a drug - disease match per year
> dataDF <- data.frame(Year=c("2000","2001","2002","2003","2004"),
+     Count=c(length(amitResult2000),length(amitResult2001),
+         length(amitResult2002),length(amitResult2003),
+         length(amitResult2004)))
> ggPrescriptionYear <- ggplot(dataDF, aes(x=Year, y=Count)) +
+     geom_bar(stat = "identity") + theme_bw()

getPatientsWithFirstDrugWithDisease

Unlike matchDrugWithDisease, which retrieves patients with a prescription event matching clinical criteria at any time within a CPRD EHR record, getPatientsWithFirstDrugWithDisease identifies patients with a first prescription event that matches a desired clinical event. Please note, care must be taken when searching for medication with off-label uses.
For example, beta-blockers are frequently prescribed to treat hypertension and arrhythmia; however, the beta-blocker propranolol is also prescribed to treat migraine. Without in-depth analysis of the patient history, patients prescribed propranolol with records for hypertension or arrhythmia in addition to migraine, on an eventdate matching the first propranolol prescription, could result in a misleading disease-drug association.

[Figure 2 image not reproduced. Bar chart; x-axis: Year (2000 to 2004); y-axis: Count.]

Figure 2: The number of patients prescribed amitriptyline from the start of the year 2000 to the end of 2004, stratified in year intervals.

In cases where a health care professional suggests a change in the patient's lifestyle choices, that patient may have several clinical events free from prescriptions before the first prescription of interest is prescribed. Using basic subsetting, one can calculate the number of clinical events before the patient's first prescription intervention (Figure 3 A). Furthermore, we can stratify patients into subgroups (Figure 3 B):

> library(rdrugtrajectory)
> library(ggplot2)
> #A vector of prescriptions of interest.
> drugList <- unique(testTherapyDF$prodcode)
> sampleDrugs <- drugList[1:8]
> #A vector of clinical events to match prescriptions against.
> medCodes <- unique(testClinicalDF$medcode)
> sampleMedCodes <- medCodes[1:30]
> #Returns the subset of the first prescription event prescribed on the same
> #eventdate as those clinical events of interest
> firstDF <- getPatientsWithFirstDrugWithDisease(clinicalDF = testClinicalDF,
+     therapyDF = testTherapyDF,
+     medCodesVector = sampleMedCodes,
+     drugCodesVector = sampleDrugs)
> #Ensure the only clinical data are for those with an assumed first drug-disease match
> firstClinicalDF <- subset(testClinicalDF,
+     testClinicalDF$patid %in% getUniquePatidList(firstDF))
> #Only keep the diseases of interest
> firstClinicalDF <- subset(firstClinicalDF,
+     firstClinicalDF$medcode %in% sampleMedCodes)
> #Only keep the prescriptions of interest
> firstDF <- subset(firstDF, firstDF$prodcode %in% sampleDrugs)
> idList <- getUniquePatidList(firstClinicalDF)
> beforeResultDF <- data.frame(patid=unlist(idList), Freq=0)
> for(id in idList) {
+     #Retrieve the clinical/therapy data for each patient, one by one.
+     indClinicalDF <- subset(firstClinicalDF, firstClinicalDF$patid == id)
+     indTherapyDF <- subset(firstDF, firstDF$patid == id)
+     #Get the first event date on record; this will match a clinical date.
+     firstEventDate <- indTherapyDF$eventdate[1]
+     clinicalBeforeTherapyDF <- subset(indClinicalDF,
+         indClinicalDF$eventdate < firstEventDate)
+     #Number of clinical complaints before first prescription.
+     nComplaints <- nrow(clinicalBeforeTherapyDF)
+     beforeResultDF[beforeResultDF$patid==id,]$Freq <- nComplaints
+ }
> ggBefore <- ggplot(beforeResultDF, aes(x=Freq)) +
+     geom_histogram(binwidth=1, color="black", fill="white") +
+     ylab("Patients") + xlab("Clinical events before prescription") +
+     theme_bw()
> #Note: not every patient will have a clinical IMD score.
> imdIDsDF <- getIMDOfPatients(idList = idList,
+     imdDF = imdDF)
> #Only work with those with an IMD score.
> imdResultsDF <- subset(beforeResultDF,
+     beforeResultDF$patid %in% getUniquePatidList(imdIDsDF))
> imdResultsDF <- imdResultsDF[order(imdResultsDF$patid),]
> imdIDsDF <- imdIDsDF[order(imdIDsDF$patid),]
> imdResultsDF <- cbind(imdResultsDF, IMD_score=as.factor(imdIDsDF$score))
> ggBeforeIMD <- ggplot(imdResultsDF,
+         aes(x=Freq, fill=IMD_score)) +
+     geom_histogram(binwidth=1) + theme_bw() +
+     ylab("Patients") + xlab("Clinical events before prescription")

[Figure 3 image not reproduced. Panels A and B are histograms; x-axis: Clinical events before prescription; y-axis: Patients; panel B is filled by IMD_score (1 to 5).]

Figure 3: The number of clinical events before the first treatment across the whole cohort (A), and by IMD score (B).

getMultiPrescriptionSameDayPatients

The function getMultiPrescriptionSameDayPatients returns all prescription events for those patients prescribed more than two prescriptions on the same date.
All events of those patients without a prescription prodcode event can be removed. Combining getMultiPrescriptionSameDayPatients with getPatientsWithFirstDrugWithDisease or matchDrugWithDisease is useful for filtering patients for specific prescription patterns. For example, to retrieve all patient prescription records if specific prescriptions are (a) never recorded together on the same date and (b) are used as a first-line treatment for a given complaint:

> library(rdrugtrajectory)
> prodcodesVector = unique(testTherapyDF$prodcode)[1:8]
> #Ensure only patients with specific prescriptions are returned, provided a
> #patient is prescribed those drugs on different dates, never on the same date.
> uniqueTherapyDF <- getMultiPrescriptionSameDayPatients(df = testTherapyDF,
+     prodCodesVector = prodcodesVector,
+     removePatientsWithoutDrugs = TRUE)
> #Ensure that the patients (patid) in the therapy and clinical dataframes
> #are the same. Subsetting might not be enough.
> reducedClinicalDF <- subset(testClinicalDF,
+     testClinicalDF$patid %in% getUniquePatidList(uniqueTherapyDF))
> #Specific medcodes have not been provided. All medcodes in the clinical
> #dataframe are considered. This is possible if either one is not interested
> #in the nature of the clinical complaint or the clinical dataframe has been
> #adjusted to only include clinical complaints of interest.
> firstDF <- getPatientsWithFirstDrugWithDisease(clinicalDF = reducedClinicalDF,
+     therapyDF = uniqueTherapyDF,
+     drugCodesVector = prodcodesVector)

In the above example, patients with more than one prescription on the same date or without a prescription at all (from the set of desired prescription prodcodes) were removed from the cohort. This reduced the number of patients from 3838 to 2930. Next, only those patients with a first-line treatment (first prescription event on the same date as a clinical event) were kept, reducing the sample size to 587 patients.

removePatientsByDuration

Longitudinal EHR cohort studies often require careful time-related consideration. Currently, rdrugtrajectory presents two functions that identify prescription records of patients that match two time constraints. The first, removePatientsByDuration, removes all patients with prescription events that are no more than n years between consecutive events, or removes patients if the duration between the first and last prescription event on record is less than n years.

> library(rdrugtrajectory)
> df <- removePatientsByDuration(minObsYr = 5,
+     minBreakYr = 2,
+     therapyDF = testTherapyDF)

getBurnInPatients

The second time-related function, getBurnInPatients, identifies all patient prescription records with at least n days free from prescription events before a specific prescription event.
This is useful if one requires a period of time free from prescription intervention before a given prescription event:

> library(rdrugtrajectory)
> drugOfInterestVector <- c(83,49,297,1888,940,5)
> patientList <- getBurnInPatients(df = testTherapyDF,
+     startCodesVector = drugOfInterestVector,
+     periodDaysBefore = 172)
> burnInTherapyDF <- subset(testTherapyDF,
+     testTherapyDF$patid %in% patientList)

In the above example, from a cohort of 3838 patients, 426 patients had a period of at least 172 days free from prescription events before the first prescription prodcode specified via the startCodesVector argument. The functionality relies on the patient having prescription events before the burn-in period (required to define whether the patient had a CPRD record early enough before the burn-in period began). For example, this patient had over three years of prescription events before the prescription of interest (from 2003-05-29 to 2007-10-17), with over 172 days free from exposure before the prescription event of interest, prodcode 297:

> head(burnInTherapyDF[burnInTherapyDF$patid == 332412,], n=9)
[1] patid eventdate prodcode consid issueseq
<0 rows> (or 0-length row.names)

3.3. First drug prescriptions

getFirstDrugPrescription

A patient's first prescription event on CPRD record can be identified by supplying getFirstDrugPrescription with a list of prescription prodcodes. The function returns FirstDrugObject, an R S3 object of type List.
Only the first prescription event to match any one of the prescription prodcodes provided is identified. The first element of FirstDrugObject contains a named list of patid vectors. Each vector contains the patids of all those patients that share the same first prescription prodcode. The list element is named after the corresponding prescription prodcode. The second element in FirstDrugObject is, like the first, a named list: a list of Date vectors, each named after the corresponding prescription prodcode. Each Date vector contains the eventdate of the prescription event for the patient identified by the patid in the identical position of the preceding List. The third list element contains a table of prescription frequencies for each first prescription prodcode on record. The prodcode is accompanied by a product description, provided that a file of CPRD prescription products has been supplied. Below we demonstrate how to retrieve information on first-line treatment:

> library(rdrugtrajectory)
> library(ggplot2)
> #An adjusted data dictionary file.
> fileLocation <- "product.txt"
> #Without supplying a vector of prodcodes, all prodcodes in the therapy
> #dataset are considered.
> resultFDO <- getFirstDrugPrescription(df = testTherapyDF,
+                                       idList = NULL,
+                                       prodCodesVector = NULL,
+                                       descriptionFile = fileLocation)
> patidList <- resultFDO[[1]]
> eventdateList <- resultFDO[[2]]
> drugFrequencyDF <- resultFDO[[3]]
> drugFrequencyDF <- drugFrequencyDF[order(drugFrequencyDF$Frequency,
+                                          decreasing = TRUE), ]
> ggFreq <- ggplot(data=drugFrequencyDF, aes(x=description, y=Frequency)) +
+   geom_bar(stat="identity") + theme_bw() +
+   theme(axis.text.x = element_text(angle=45, hjust=1)) +
+   xlab("Drug product description")
> #The structure of the FirstDrugObject.
> str(resultFDO, strict.width="wrap", list.len = 5)

[Bar chart; x-axis: Drug product description (one bar per product, from Amitriptyline 10mg tablets to Venlafaxine 75mg modified-release tablets); y-axis: Frequency.]

Figure 4: The frequency of first line treatment prescription.

getAgeGroupByEvents

In the next example we explore stratifying first-line prescription events by patient characteristics, such as age, gender, IMD, and number of medcodes (for instance, by comorbidities) or prodcodes (for instance, to separate those patients by additional prescriptions), or by any additional clinical event retrieved using CPRDLookups.R. rdrugtrajectory provides several utility functions to stratify patients (see reference manual for further information). The function getAgeGroupByEvents calculates the number of first-line prescription events by patient age.
By specifying a set of patids and eventdates from the FirstDrugObject, we can calculate the number of first-line prescriptions by age-group for patients linked with a specified medical condition:

> library(rdrugtrajectory)
> fileLocation <- "product.txt"
> resultFDO <- getFirstDrugPrescription(df = testTherapyDF,
+                                       idList = NULL,
+                                       prodCodesVector = NULL,
+                                       descriptionFile = fileLocation)
> patidList <- resultFDO[[1]]
> eventdateList <- resultFDO[[2]]
> names(ageGenderDF) <- c("patid","age","gender")
> #The age-groups: [18,25), [25,30), [30,35), ..., [55,60), 60+.
> ageGroupVector <- c(18,25,30,35,40,45,50,55,60)
> #CPRD database release year.
> ageAtYear <- "2017"
> ageGroupList <- getAgeGroupByEvents(idList = as.list(patidList[1:2]),
+                                     eventdateList = eventdateList[1:2],
+                                     ageDF = ageGenderDF,
+                                     ageGroupVector = ageGroupVector,
+                                     ageAtYear = ageAtYear)
> ageGroupList
[[1]]
  18-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60+
1   103    94   106   131   165   182   153   185 240

[[2]]
  18-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60+
1    45    39    35    23    43    34    32    18  25

In the above example, the age of each patient (ageDF) was provided using year-of-birth calculated against the release year of the CPRD Gold database (explained above). By providing the database release year (in ageAtYear) and the first prescription eventdate (in eventdateList), the age of each patient is adjusted against the prescription eventdate year.
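The age adjustment and age-group binning described here can be illustrated with a small base R sketch. It is not the package's implementation: the vectors `age_at_release` and `event_year` hold made-up values, and subtracting the gap between the release year and the event year is an assumed reading of how the age is adjusted.

```r
# Hypothetical inputs: ages recorded at the database release year and the
# year of each patient's first prescription event (made-up values).
age_at_release <- c(30, 62, 45, 19, 58)
release_year   <- 2017
event_year     <- c(2010, 2015, 2017, 2016, 2001)

# Assumed adjustment: age at the time of the first prescription event.
age_at_event <- age_at_release - (release_year - event_year)

# Left-closed age-groups [18,25), [25,30), ..., [55,60), 60+.
breaks <- c(18, 25, 30, 35, 40, 45, 50, 55, 60, Inf)
labels <- c("18-24", "25-29", "30-34", "35-39", "40-44",
            "45-49", "50-54", "55-59", "60+")
age_group <- cut(age_at_event, breaks = breaks, labels = labels, right = FALSE)

# Number of first prescription events per age-group.
group_counts <- table(age_group)
print(group_counts)
```

Using `right = FALSE` keeps the intervals closed on the left, which matches the [18,25), [25,30), ... notation in the comment of the package example.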
Finally, by using a list slice on idList and eventdateList (individual prescriptions can be specified using their prodcode, for example, eventdateList$`105`), first-prescription frequencies by age-group are retrievable (Figure 5).

> library(ggplot2)
> ageGroupDrugDF <- data.frame(Age=names(ageGroupList[[1]]),
+                              Count=unlist(ageGroupList[[1]]),
+                              Drug="Amitriptyline 10mg")
> ggAmitriptyline <- ggplot(ageGroupDrugDF, aes(x=Age, y=Count)) +
+   geom_bar(stat="identity") +
+   theme_bw() + ggtitle("Amitriptyline 10mg") +
+   theme(axis.text.x = element_text(angle=45, hjust=1)) +
+   xlab("Age-group") + ylab("Frequency")

[Bar chart titled "Amitriptyline 10mg"; x-axis: Age-group (18-24 to 60+); y-axis: Frequency.]

Figure 5: The distribution of Amitriptyline 10mg as a first-line treatment by age-group.

3.4. Prescription sequences

mapDrugTrajectory

Identifying patient prescription trajectories in longitudinal EHRs remains our biggest motivator behind the development of rdrugtrajectory. Therefore, we developed mapDrugTrajectory to identify the chronological order of patient prescription events. We restrict the calculation to only look for prescription prodcodes supplied to groupingList as a named list (named prodcode vectors). The required number of grouped-prescription events is defined by minDepth, and the maximum number of those changes to display is controlled by maxDepth.
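As a rough illustration of how one patient's prodcodes might be reduced to a sequence of grouped prescription events, the following base R sketch maps prodcodes to named groups, drops prodcodes outside every group, and counts consecutive prescriptions from the same group once. It does not call mapDrugTrajectory; the `prodcodes` vector, the `min_depth` and `max_depth` variables, and the filtering rule are hypothetical stand-ins for the behaviour described in the text.

```r
# Hypothetical grouping of prodcodes into named treatments.
grouping <- list(Amitriptyline = c(83, 49, 1888),
                 Propranolol   = c(707, 297, 769),
                 Topiramate    = c(11237))

# One patient's prodcodes in chronological order (made-up values).
prodcodes <- c(83, 49, 101, 707, 707, 11237, 83)

# Lookup vector: names are prodcodes, values are group names.
code_to_group <- setNames(rep(names(grouping), lengths(grouping)),
                          unlist(grouping))

groups <- unname(code_to_group[as.character(prodcodes)])
groups <- groups[!is.na(groups)]      # ignore prodcodes outside every group
trajectory <- rle(groups)$values      # consecutive repeats count once

# Assumed depth rule: keep the patient if enough grouped events exist,
# and display at most max_depth of them.
min_depth <- 3
max_depth <- 3
keep_patient <- length(trajectory) >= min_depth
shown <- head(trajectory, max_depth)
print(shown)
```

Setting `min_depth` and `max_depth` to the same value mirrors the configuration in which only patients with the required number of changes are displayed.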
By keeping minDepth and maxDepth the same, only patients with a valid number of prescription changes are displayed (Figure 6 (A) and (C)). Patient records with fewer than minDepth changes to prescription sequences are ignored (Figure 6 (B)). For further information please refer to the reference manual.

Figure 6: The distribution of grouped prodcodes across three patients. (A) Five groups of valid prescription prodcodes, (B) only three groups, (C) five valid groups, in addition to prodcodes 101 and 1, which are ignored.

In the code below, mapDrugTrajectory returns patients with at least the first five grouped prescriptions. Any prodcodes that have not been grouped are ignored. Duplicated prodcodes (those from the same group) do not count as a change in treatment:
> library(ggplot2)
> library(ggalluvial)
> structureList <- list(Amitriptyline = c(83,49,1888),
+                       Propranolol = c(707,297,769),
+                       Topiramate = c(11237),
+                       Venlafaxine = c(470,301,39359),
+                       Lisinopril = c(78,65,277),
+                       Atenolol = c(5,24,26),
+                       Candesartan = c(531)
+ )
> resultList <- mapDrugTrajectory(df = testTherapyDF,
+                                 minDepth = 5,
+                                 maxDepth = 5,
+                                 groupingList = structureList,
+                                 removeUndefinedCode = TRUE)
> df <- resultList[[3]]
> ggSwitch <- ggplot(df,
+                    aes(y = Freq, axis1 = FirstDrug, axis2 = Switch1,
+                        axis3 = Switch2, axis4 = Switch3, axis5 = Switch4)) +
+   geom_alluvium(aes(fill = FirstDrug), width = 1/12) +
+   geom_stratum(width = 1/12, fill = "black", color = "grey") +
+   geom_label(stat = "stratum", infer.label = TRUE) +
+   scale_fill_brewer(type = "qual", palette = "Set1") +
+   theme_bw() + theme(legend.position = "none") +
+   scale_x_discrete(limits = c("First Drug", "1st Switch", "2nd Switch",
+                               "3rd Switch","4th Switch"),
+                    expand = c(.05, .05)) +
+   ggtitle("Migraine Preventative Switching Among Patients")
[Alluvial plot titled "Migraine Preventative Switching Among Patients"; x-axis: First Drug, 1st Switch, 2nd Switch, 3rd Switch, 4th Switch; y-axis: Freq; strata: Amitriptyline, Atenolol, Candesartan, Lisinopril, Propranolol, Topiramate, Venlafaxine.]

Figure 7: Prescription pattern switching of seven different migraine preventatives. A patient required a minimum of five changes in prescriptions (including the initial prescription) and, equally, the display was set to five changes in prescription.

3.5. Prescription timeline construction

rdrugtrajectory contains several functions that transform patient data into a format compatible with mean cumulative function (MCF) semi-parametric estimates, prescription persistence, prescription incidence, and survival analysis.

generateMCFOneGroup

Prescription events are binned into weekly units to increase the statistical power at each time point. The user presents one group at a time, for example, all clinical events of male patients with a first-line prescription of amitriptyline for a migraine. The clinical data has already been refined using the steps for first-line prescription, as described above. The function generateMCFOneGroup accepts a dataframe of events, the MCF start date (eventdates are adjusted so all patient records in the dataset begin at the same time), and the minimum number of events per patient (by default this is two events).
The following example presents the calculation of first prescription events, the assignment of gender and the calculation of MCF of prescription (therapy dataframe) burden of amitriptyline and propranolol:

> library(rdrugtrajectory)
> fileLocation <- "product.txt"
> resultList <- getFirstDrugPrescription(df = testTherapyDF,
+                                        idList = NULL,
+                                        prodCodesVector = NULL,
+                                        descriptionFile = fileLocation)
> patidList <- resultList[[1]]
> eventdateList <- resultList[[2]]
> drugFrequencyDF <- resultList[[3]]
> drugFrequencyDF <- drugFrequencyDF[order(drugFrequencyDF$Frequency,
+                                          decreasing = TRUE), ]
> amitriptylinePatid <- patidList$`83`
> propranololPatid <- patidList$`707`
> maleCode <- 1
> malePatidsDF <- getGenderOfPatients(idList = getUniquePatidList(testTherapyDF),
+                                     genderDF = ageGenderDF,
+                                     genderCodeVector = maleCode)
> amitriptylineMalePatids <- subset(amitriptylinePatid,
+                                   amitriptylinePatid %in% malePatidsDF$patid)
> propranololMalePatids <- subset(propranololPatid,
+                                 propranololPatid %in% malePatidsDF$patid)
> amiMaleTherapyDF <- subset(testTherapyDF,
+                            testTherapyDF$patid %in% amitriptylineMalePatids)
> propMaleTherapyDF <- subset(testTherapyDF,
+                             testTherapyDF$patid %in% propranololMalePatids)
> amiMaleMCFDF <- generateMCFOneGroup(therapyDF = amiMaleTherapyDF,
+                                     startDateCharVector = "2000-01-01",
+                                     minRecords = 2)
> propMaleMCFDF <- generateMCFOneGroup(therapyDF = propMaleTherapyDF,
+                                      startDateCharVector = "2000-01-01",
+                                      minRecords = 2)
> amiMaleMCFDF <- cbind(amiMaleMCFDF, Drug = "Amitriptyline")
> propMaleMCFDF <- cbind(propMaleMCFDF, Drug = "Propranolol")
> drugMCFDF <- rbind(amiMaleMCFDF, propMaleMCFDF)
> resultMCF <- reda::mcf(reda::Recur(week, id, No.) ~ Drug, data = drugMCFDF)
> mcfPlot <- reda::plot(resultMCF, conf.int=TRUE) +
+   ggplot2::xlab("Weeks") + ggplot2::theme_bw() + ggplot2::ggtitle("")

[Line plot; x-axis: Weeks; y-axis: MCF Estimates; legend "Drug": Amitriptyline, Propranolol.]

Figure 8: MCF of drug prescriptions of patients with a first drug prescription for either amitriptyline or propranolol, stratified by gender. The dotted lines indicate a 95% confidence interval.

getFirstDrugIncidenceRate

Prescription incidence can be calculated with getFirstDrugIncidenceRate. The following code demonstrates how to use a FirstDrugObject to calculate incidence rates for a set of prodcodes. The study observation starts from the enrollmentDate and ends at the studyEndDate:

> library(rdrugtrajectory)
> fileLocation <- "product.txt"
> drugList <- unique(testTherapyDF$prodcode)
> requiredProds <- drugList[1:10]
> firstDrugObject <- getFirstDrugPrescription(df = testTherapyDF,
+                                             idList = NULL,
+                                             prodCodesVector = requiredProds,
+                                             descriptionFile = fileLocation)
> medhistoryDF <- constructMedicalHistory(testClinicalDF, NULL, testTherapyDF)
> patidList <- unlist(firstDrugObject$patidList)
> resultMatrix <- getFirstDrugIncidenceRate(firstDrugObject = firstDrugObject,
+                                           medHistoryDF = medhistoryDF,
+                                           enrollmentDate = as.Date("2000-01-01"),
+                                           studyEndDate = as.Date("2016-12-31"))
> incidenceDF <- as.data.frame(t(resultMatrix), stringsAsFactors = TRUE)

The above example returns an incidence rate of 0.11 per 17 person years over a cohort of 3838 patients. For a detailed description please see Detail for getFirstDrugIncidenceRate in the reference manual.

getDrugPersistence

Prescription persistence is calculated as the fraction of patients with a prescription for a specific treatment N days after the first prescription event.
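The definition of persistence can be made concrete with a small base R sketch. It is a simplified reading of the text and not the code behind getDrugPersistence: the `therapy` data frame and the helper `persistence_fraction` are hypothetical, and the window is assumed to run from `duration - buffer` to `duration` days after the first event, which corresponds to the 335 to 395 day range obtained with a duration of 395 days and a buffer of 60 days.

```r
# Hypothetical therapy records: one row per prescription event.
therapy <- data.frame(
  patid     = c(1, 1, 1, 2, 2, 3),
  eventdate = as.Date(c("2010-01-01", "2010-06-01", "2011-01-05",
                        "2012-03-01", "2012-04-01",
                        "2013-05-01"))
)

# Fraction of patients with at least one event between (duration - buffer)
# and duration days after their first recorded event (assumed window).
persistence_fraction <- function(df, duration, buffer) {
  persistent <- vapply(split(df$eventdate, df$patid), function(dates) {
    days_since_first <- as.numeric(dates - min(dates))
    any(days_since_first >= duration - buffer & days_since_first <= duration)
  }, logical(1))
  mean(persistent)
}

frac <- persistence_fraction(therapy, duration = 395, buffer = 60)
print(frac)
```

In this toy cohort only patient 1 has an event inside the window (369 days after the first event), so one of three patients counts as persistent. Shrinking `buffer` narrows the window and can only lower the fraction.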
For example, to calculate the fraction of patients with a prescription 365 days after their first prescription, with a 30-day buffer either side, one specifies a duration of 395 days and a preceding buffer of 60 days (therefore capturing the range 335 to 395 days, 30 days either side of one calendar year):

> library(rdrugtrajectory)
> patientList <- getDrugPersistence(therapyDF = testTherapyDF,
+                                   idList = NULL,
+                                   prodcodeList = NULL,
+                                   duration = 395,
+                                   buffer = 60,
+                                   endOfRecordDate = "2017-12-31")

Of 3838 patient therapy records, 954 patients had a prescription 365 (+/- 30) days after the first prescription event on record, resulting in a crude fraction of only 0.25. On its own, getDrugPersistence only observes events recorded precisely duration days after the first prescription. The buffer can be used to identify patients who received a prescription shortly after the end of the duration but, more importantly, to ensure that patients actively undergoing treatment (indicated by a prescription shortly before the desired duration days) are included. As the buffer is reduced, the fraction of prescription persistence is reduced until the algorithm attempts to identify only patients with a prescription exactly duration days after the first prescription. Future software updates will incorporate repeat prescription data to increase the accuracy of the calculation.

4. Closing remarks and future work

rdrugtrajectory is an R package which has the potential for exciting applications such as improving clinical decision-making, identifying possible new treatments and analysing outcomes from existing treatments. We have demonstrated several functions, some of which detail sorting and matching records whilst others demonstrate fundamental statistical analysis.
We used fabricated clinical and prescription dataframes, along with the age, gender and index of multiple deprivation score of each patient, and presented analyses of cohort-wide prescription patterns, first-line treatment distributions, how to stratify by patient characteristics, and some basic tools to assist longitudinal analysis of prescriptions. The descriptions presented in this publication are not substitutes for the material in the reference manual. We recommend the reader consults the R ? help command or reference manual before running a function. In particular, functions related to the construction of timelines for survival analysis (time dependent/independent Cox regression, Kaplan Meier survival curves and mean cumulative function) or a matrix for drug incidence rate require fine tuning of several parameters.

[Line plot; x-axis: Buffer size (n days before 365); y-axis: Fraction of prescription persistence.]

Figure 9: The fraction of prescription persistence adjusted by a buffer number of days before a calendar year. As the buffer approaches the value of duration the fraction approaches 1.

The latest release of rdrugtrajectory, along with source code and reference manual, is available for download from https://github.com/acnash/rdrugtrajectory. Whilst we remain active members of the scientific research community, we will continue to add new features to rdrugtrajectory and make necessary improvements to existing features.
Acknowledgements

Oxford Science Innovation, NIHR Oxford Biomedical Research Centre and NIHR Oxford Health Biomedical Research Centre (Informatics and Digital Health theme, grant BRC-1215-20005). Thanks to Dr Michelle Hardy for assistance with the article.

References

Bally M, Dendukuri N, Rich B, Nadeau L, Helin-Salmivaara A, Garbe E, Brophy JM (2017). "Risk of Acute Myocardial Infarction with NSAIDs in Real World Use: Bayesian Meta-Analysis of Individual Patient Data." British Medical Journal, 357, j1909. doi:10.1136/bmj.j1909.

Ghosh RE, Crellin E, Beatty S, Donegan K, Myles P, Williams R (2019). "How Clinical Practice Research Datalink data are used to support pharmacovigilance." Therapeutic Advances in Drug Safety, 10, 1–7. doi:10.1177/2042098619854010.

Hepp Z, Dodick DW, Varon SF, Chia J, Matthew N, Gillard P, Hansen RN, Devine EB (2017). "Persistence and Switching Patterns of Oral Migraine Prophylactic Medications Among Patients with Chronic Migraine: A Retrospective Claims Analysis." Cephalalgia, 37(5), 470–485. doi:10.1177/0333102416678382.

Oyinlola JO, Campbell J, Kousoulis AA (2016). "Is Real World Evidence Influencing Practice? A Systematic Review of CPRD Research in NICE Guidance." BMC Health Services Research, 16(299), 1–12. doi:10.1186/s12913-016-1562-8.
Affiliation:
Nuffield Department of Clinical Neurosciences
Medical Sciences Division
University of Oxford
Oxford UK OX3 9DU
E-mail: anthony.nash@ndcn.ox.ac.uk

Journal of Statistical Software http://www.jstatsoft.org/
published by the Foundation for Open Access Statistics http://www.foastat.org/

10_1101-2021_01_08_425967 ----

Partition Quantitative Assessment (PQA): A quantitative methodology to assess the embedded noise in clustered omics and systems biology data

Camacho-Hernández, Diego A.1,2†, Nieto-Caballero, Victor E.1,2†, León-Burguete, José E.1,2, and Freyre-González, Julio A.1,*

1 Regulatory Systems Biology Research Group, Laboratory of Systems and Synthetic Biology and 2 Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México (UNAM), Morelos, Mexico.

† These authors contributed equally to this work.
* Corresponding author: jfreyre@ccg.unam.mx

Abstract: Identifying groups that share common features among datasets through clustering analysis is a typical problem in many fields of science, particularly in post-omics and systems biology research. In this respect, quantifying how well a measure can cluster or organize intrinsic groups is important, since currently there is no statistical evaluation of how ordered the resulting clustered vector is, or of how much noise is embedded in it. Much of the literature focuses on how well the clustering algorithm orders the data, with several external and internal statistical measures; but no measure has been developed to statistically quantify the noise in an arranged vector after a clustering algorithm has been applied, i.e., how much of the clustering is due to randomness. Here, we present a quantitative methodology, based on autocorrelation, to assess this problem.

Keywords: omics data; hierarchical clustering; noise quantification.

1. Introduction

A common task in today's research is the identification of specific markers as predictors of a classification yielded by clustering analysis of the data. For instance, this approach is particularly useful after high-throughput experiments to compare gene expression or methylation profiles among different cell lines [1]. This task is proving useful in the nascent field of single-cell sequencing, where clustering cells is an important step towards further classification or serves as a qualifying metric of the sequencing process [2]. Regarding the widely used gene expression assays, the vector of profiles for each marker across different cell lines is recorded using hierarchical clustering algorithms. These algorithms yield a dendrogram and a heat map representing the vector of marker profiles, illustrating the arrangement of the clusters.
To assess how well the clustering is segregating different cell lines, a class stating the desired partitioning of each cell line is provided a posteriori. Then, a simple visual inspection of the vector of classes is used to estimate whether the clustering is providing a good partition. Such a partition vector is colored according to the classification that each item is associated with, and it is expected that similar items will be contiguous, so the formed groups are assessed qualitatively on the biological background of each item. This procedure should not be confused with "supervised clustering", which provides a vector of classes stating the desired partitioning a priori. That vector is then used to guide the clustering algorithms by allowing the learning of the metric distances that optimize the partitioning [3]. Additionally, the procedure may be confused with the metric assessment of the clustering algorithms, especially with external cluster evaluation. For this, various metrics have been developed to qualify the clustering algorithm itself, such as intrinsic and extrinsic measures. These metrics are used for clustering algorithm validation. The extrinsic validation compares the clustering to a goal to say whether it is a good clustering or not. The internal validation compares the elements within the cluster and their differences [4]. PQA involves characteristics of both kinds of validation, by using both the crafted goal standard and the yielded signal itself (clustered vector).
However, PQA gathers these elements not to qualify the clustering algorithm itself but to quantify the noise embedded in the cluster; this noise may be due to the intrinsic metric or marker used to order the data set.

A possible caveat of the qualitative assessment discussed above is that humans tend to perceive meaningful patterns within random data, leading to a cognitive bias known as apophenia [5]. While interpreting the partitions obtained from unsupervised clustering analysis, researchers attempt to visually assess how close the classifications are to each other, finding patterns that are not well supported by the data. This effect arises because the adjacency between items may give a notion of the dissimilarity distance in the dendrogram leaves. Unfortunately, as far as we know, there is no method to quantitatively assess the quality of the groups of classifications from the clustering or, at least, there is no attempt to quantify whether a certain configuration or order of the items may be due to randomness. This is a serious caveat, since the insertion of noise can lead to false conclusions or misleading results. Furthermore, purging this noise can lead to more efficient descriptions of markers and their phenomena, accelerating the advance in many fields.

In statistics, serial correlation (SC) is a term used to describe the relationship between observations of the same variable over specific periods. It was originally used in engineering to determine how a signal, for instance, a radio wave, varies with itself over time. Later, SC was adapted to econometrics to analyze economic data over time, principally to predict stock prices and, in other fields, to model independent random variables [6]. We applied the SC to propose a way to quantify how well a posterior classification is grouped, using only the results of unsupervised clustering analysis.
Thus, we propose a novel relative score, PQA, to solve the subjectivity of the visual inspection and to statistically quantify how much noise is embedded in the results of clustering analysis.

2. Methodology

2.1. Assigning numeric labels to classifications

A vector denoting the putative similarities among the variables in a study is usually obtained after a clustering analysis. Each variable is classified to generate a vector of profiles (VP). Such a vector of classifications is usually translated into a vector of colors, in which each color represents a classification. It is common to inspect this vector to find groups that make sense according to the analyzed data. For the method presented in this work, the VP may be as simple as a vector of strings or numbers that represent the input.

Figure 1. The pipeline of the PQA methodology.

Whatever the representation of the classifications may be, it is necessary to transform the classifications into a vector of numeric labels, in which a number represents a classification, to be able to calculate SC. To accomplish this, we assign the first numeric label (number 1) to the first item in the vector, which usually lies at one of the vector's extremes. Then, if the classification of the next item is different from the previous one, the next number in the sequence is assigned, and so on.
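The labeling rule can be sketched in a few lines of base R. This is one reading of the rule and not the authors' implementation: the helper `assign_numeric_labels` and the example classes are hypothetical, and a classification that reappears later in the vector is assumed to keep the number it received at its first appearance, since each number represents one classification.

```r
# Assign numeric labels in order of first appearance along the clustered
# vector (assumed reading of the labeling rule).
assign_numeric_labels <- function(classes) {
  as.integer(factor(classes, levels = unique(classes)))
}

# Hypothetical vector of classifications in dendrogram order.
vp_classes <- c("lung", "lung", "breast", "lung", "skin", "breast")
vp_labels  <- assign_numeric_labels(vp_classes)
print(vp_labels)
```

Passing `levels = unique(classes)` is what ties the numbers to the order of appearance; the default alphabetical factor levels would make the labels depend on the class names rather than on the clustering order.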
This way of labeling ensures that changes in the SC values are due to the order of the numbers, that is to say, the grouping of the classifications resulting from the clustering, and are not an artifact of the labeling itself (Figure 1).

2.2. PQA score

Because the order of the VP can be interpreted as the grouping of the classifications, we measure how well the same classifications are held together in the VP through an SC shifted one position. This correlation is defined as the Pearson product-moment correlation between the VP discarding the first item and the VP discarding the last (Equation 1, where $x_i$ is the i-th position of the ordered vector, $n$ is the length of $x$, and $\rho_x$ is the resulting SC).

$$\rho_x = \frac{\sum_{i=2}^{n}\left(x_i - \bar{x}_{2:n}\right)\left(x_{i-1} - \bar{x}_{1:n-1}\right)}{\sqrt{\sum_{i=2}^{n}\left(x_i - \bar{x}_{2:n}\right)^2}\;\sqrt{\sum_{i=1}^{n-1}\left(x_i - \bar{x}_{1:n-1}\right)^2}}, \qquad \bar{x}_{2:n} = \frac{\sum_{j=2}^{n} x_j}{n-1}, \quad \bar{x}_{1:n-1} = \frac{\sum_{j=1}^{n-1} x_j}{n-1} \tag{1}$$

We then define the PQA as the SC of the VP after removing background noise, normalized by the SC of the perfectly grouped partition (defined as the sorted vector in ascending order). Thus, the more similar the VP is to its sorted vector, the higher the score (Equation 2, where $\rho_x$ is the SC of the VP, $\overline{\rho_{Rand_x}}$ is the mean of the SC of one thousand randomizations, and $\rho_{Perfect_x}$ is the SC of the sorted vector in ascending order).

$$PQA_x = \frac{\rho_x - \overline{\rho_{Rand_x}}}{\rho_{Perfect_x}} \tag{2}$$

2.3. Background-noise correlation factor in the PQA score

To compute the background-noise correlation factor in the PQA score definition, we randomly sample the indexes of the VP and swap the corresponding items.
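Equations 1 and 2, together with the shuffled background, can be illustrated with a short base R sketch. It is a minimal sketch under stated assumptions and not the authors' code: the function names `serial_correlation` and `pqa_score`, the example vectors, and the use of `sample()` to shuffle the whole vector are all choices made here for illustration.

```r
# Lag-1 serial correlation (Equation 1): Pearson correlation between the
# vector without its first item and the vector without its last item.
serial_correlation <- function(x) {
  n <- length(x)
  cor(x[-1], x[-n])
}

# PQA score (Equation 2): SC of the vector minus the mean SC of shuffled
# copies, divided by the SC of the vector sorted in ascending order.
pqa_score <- function(x, n_rand = 1000) {
  rho_x       <- serial_correlation(x)
  rho_rand    <- replicate(n_rand, serial_correlation(sample(x)))
  rho_perfect <- serial_correlation(sort(x))
  (rho_x - mean(rho_rand)) / rho_perfect
}

# Hypothetical numeric-label vectors: a perfectly grouped partition and the
# same partition with two items swapped between groups.
vp_sorted <- rep(1:4, each = 10)
vp_noisy  <- vp_sorted
swap <- c(5, 25)
vp_noisy[swap] <- vp_noisy[rev(swap)]

set.seed(1)
pqa_sorted <- pqa_score(vp_sorted)
pqa_noisy  <- pqa_score(vp_noisy)
print(c(sorted = pqa_sorted, noisy = pqa_noisy))
```

The perfectly grouped vector is expected to score close to 1, and the vector with misplaced items scores lower, because the misplaced items break runs of identical labels. Note that `cor()` returns `NA` for a vector with a single classification, so the sketch assumes at least two classifications.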
This background correction is intended to remove inherent noise in the data, although the score may still be subject to noise from the chosen clustering algorithm or to discrepancies in the posterior classification.

2.4. Statistical significance of the PQA score

To quantify the statistical significance of the PQA score, we calculate a Z-score (Equation 3),

$$z_x = \frac{PQA_x - \overline{PQA_{Rand}}}{SD_{PQA_{Rand}}} \tag{3}$$

where $PQA_x$ is the PQA score of the VP, $\overline{PQA_{Rand}}$ is the mean of the PQA scores of one thousand randomizations of the VP, and $SD_{PQA_{Rand}}$ is the standard deviation of those scores. These randomizations generate a solid random background against which the real signal is compared. The number of randomizations does not depend on the size of the VP. It is worth noting that there are two randomization processes: one generates the population of random vectors whose PQA scores are used to calculate the Z-score, and the other represents the noise in Equation 2.

2.5. Defining noise proportions

To quantify the noise embedded in the VP, we calculate Z-scores from the distribution of PQA values of the randomized vectors. The shuffling is produced by scrambling the vector. The Z-score of the VP is then interpolated to retrieve the estimated noise in the VP cluster.

2.6. Effect of the length and number of partitions of the vector on the Z-score distributions

Since we want to compare the PQA with the noise, we randomized the VP 1000 times. We opted to describe the dynamics of the Z-score for different percentages of noise and numbers of partitions. For this, we synthetically crafted vectors in which both the number of elements and the number of classifications ranged from 0 to 100. The Z-scores were retrieved from the crafted vectors using the formulas described above.

3. Results and Discussion

3.1. Effects of permuted numeric labels on the partition

We wondered whether our way of assigning numeric labels alters the SC calculations as little as possible, so we analyzed how the SC changes over synthetic partitions with permuted labels. We began by generating synthetic partitions in ascending and descending order, increasing both the number of classifications and the number of items up to 100. It is important to highlight that the number of items belonging to each classification was kept constant. Because trying all possible permutations of each vector would be infeasible, we created a subset of 1000 permutations of each vector and then calculated the mean SC (Figure 1, see Methodology). We observed that the mean SC was high when the number of items in the VP was at least twice the number of classifications; nevertheless, we obtained the highest SC when the numeric labels were assigned in sequential order, either ascending or descending (Figure 2).

Figure 2. Z-scores of the PQA scores from partitions varying in the number of classifications and the length of the partition.

3.2. Length of partitions as a proxy of the number of classifications

We wondered whether the number of classifications and the length of the VP may change the statistical significance of the PQA score, because the fewer items there are in the VP, the greater the chance that the items are grouped in any given order.
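The significance test of Equation 3, on which this analysis relies, can be sketched as below. It is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, randomization is a full shuffle, and the sample standard deviation is used because the text does not say which estimator was chosen. The two nested randomizations (the background inside each PQA score and the null population of PQA scores) are kept separate, as the methodology describes.

```python
import math
import random
import statistics


def shifted_correlation(x):
    """Pearson correlation between x[1:] and x[:-1]; 0.0 if undefined (assumption)."""
    a, b = x[1:], x[:-1]
    if len(a) < 2:
        return 0.0
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def pqa(x, n_background, rng):
    """PQA score of x with a shuffled background (Equation 2)."""
    rho_perfect = shifted_correlation(sorted(x))
    if rho_perfect == 0:
        raise ValueError("the sorted vector has no measurable serial correlation")
    background = []
    for _ in range(n_background):
        shuffled = list(x)
        rng.shuffle(shuffled)
        background.append(shifted_correlation(shuffled))
    return (shifted_correlation(x) - statistics.mean(background)) / rho_perfect


def pqa_z_score(x, n_random=1000, n_background=1000, seed=0):
    """Z-score of the PQA of x against PQA scores of shuffled copies (Equation 3)."""
    rng = random.Random(seed)
    observed = pqa(x, n_background, rng)
    null_scores = []
    for _ in range(n_random):
        shuffled = list(x)
        rng.shuffle(shuffled)
        null_scores.append(pqa(shuffled, n_background, rng))
    return (observed - statistics.mean(null_scores)) / statistics.stdev(null_scores)


# Small randomization counts keep this demonstration fast; the text uses 1000.
ordered = [1] * 10 + [2] * 10 + [3] * 10
print(pqa_z_score(ordered, n_random=100, n_background=100))
```

With the defaults the nested loops perform about one million shuffles, so smaller counts are convenient when exploring how the Z-score changes with vector length.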
We then tested this effect by calculating a Z-score from ordered synthetic partitions, increasing both the number of classifications and the number of items up to 100. We also kept the number of classifications constant for the sake of this analysis. We noticed that only the length of the partition has a true effect on the Z-score; the number of classifications does not. We observed that every partition shorter than 13 items could be considered pure noise, given that we use a Z-score cutoff of greater than 3 (p-value of 0.002). We also observed Z-score values still greater than 2 for lengths of 12, 11, and 10, but lower values for lengths between 2 and 9 (Figure 2). If we were more flexible, we could have set the length cutoff at those values without losing statistical significance, since a Z-score of 2 corresponds roughly to a p-value of 0.05. The results of this analysis were expected intuitively, because the probability of an item occupying a given position in the VP by chance falls as the number of items grows.

3.3. Proof of concept: Quantifying real noise

After a literature review, we noticed that some datasets had been assessed by visual inspection in their respective papers, so we applied our method to quantify the proportion of noise embedded in those datasets and to test whether they may lead to apophenia. We chose two datasets from the literature for two main reasons: first, the data should have a number of items well above the length threshold for a significant Z-score (>13); second, we wanted contrasting orderings of the partitions, that is, one dataset that looks very disordered and another that looks somewhat ordered, in order to compare their noise proportions. Lastly, we assessed the behavior of the metric on highly ordered data, which also satisfies the threshold mentioned above.

3.3.1. Cancer methylation signatures

The first dataset consists of methylation profiles of 242 different cancerous and non-cancerous samples [7] (Figure 3).
Although the classifications look very sparse and the groups are torn apart into many subgroups distributed along the data's VP, we detected 25.1% noise and a PQA score of 0.53 (Figure 4; Z-score of 8.2, p-value of $9.6 \times 10^{-17}$). Both numbers imply that, even though there may be disorder in the VP, there is neither a very high noise proportion nor a high PQA score. These results suggest that, as with any other statistical test, the larger the number of items in the partition, the more diluted the effect of disorder in the VP and the greater the statistical significance, as shown in the analysis of the number of items and classifications. Although the authors concluded that their clustering results made sense given their molecular and biological background and their perspectives on the analyzed profiles, they assessed the grouping only by visual inspection and concluded that it was well done. However, understanding the noise in the clusters can help in the pursuit of better markers, since it can narrow the search space in these kinds of studies.

Figure 3. Visual representation of clustered data used to assess the method. (a) Dataset from Jie Shen et al. (b) Dataset from Toyooka et al.

3.3.2. Distribution of microRNAs in cancer

The second dataset consists of 103 expression profiles of microRNAs from three classes of samples: invasive breast cancer, ductal carcinoma in situ (DCIS), and healthy controls (Figure 3) [8].
The authors visually identified three clusters, although selecting the right cutting-height threshold is difficult. Moreover, one of the clusters is a mix of classes in different proportions, which led the authors to conclude, arguably, that the DCIS and control sample profiles are not different. Here, the PQA score and the proportion of noise are 0.62 and 30.2%, respectively (Figure 4; Z-score of 6.2, p-value of $3.9 \times 10^{-10}$), providing a quantitative assessment to support the grouping that the authors claimed. Furthermore, in comparison with the methylation profiles discussed above, a partition that appears less fuzzy has an even higher noise ratio, supporting the idea that visual inspection can lead to misleading results.

Figure 4. Z-score distribution by percentage of randomized items. (a) Dataset from Jie Shen et al. (b) Dataset from Toyooka et al. The red dots represent the Z-score interpolation of the corresponding datasets.

3.3.3. Comparison of genetic regulatory networks with theoretical models

Finally, to assess the PQA methodology using systems biology data, we clustered 210 networks according to their pairwise dissimilarity [9]. First, 42 curated biological networks were retrieved from Abasy Atlas (v2.2) [10]. For each biological network, we then constructed four networks, each according to a theoretical model (Barabási-Albert, Erdős-Rényi, Scale-free, and Hierarchical-modular). We estimated the parameters of each theoretical model from the properties of the corresponding biological network.
The models used reproduce one or more intrinsic characteristics of the biological networks, such as power-law distribution, hubs, scale-free degrees, and hierarchical modular structure [11]. Visual inspection suggested that the classification yielded a highly ordered VP that distinguishes the networks according to their nature (Figure 5). The PQA score for this VP is 0.92 (p-value = $2.5 \times 10^{-40}$, Z-score = 13.2) and the proportion of noise is 5.8% (Figure 6). In contrast to the previous examples, here we obtained a highly ordered clustering and a very low proportion of noise, which suggests that although the models recapitulate some of the properties of genetic regulatory networks, none of them alone is sufficient to capture their structural properties.

Figure 5. Cluster analysis of distance among gene regulatory networks and theoretical network models. The abbreviations and colors used in the posterior classification are as follows: Barabási-Albert (BA, red), Erdős-Rényi (ER, blue), Scale-free (SF, green), Hierarchical modularity (HM, purple), and biological networks (Bi, orange).

Figure 6. Z-score distribution by percentage of randomized items of the VP from genetic regulatory networks. The red dot represents the Z-score interpolation of the actual dataset.

4. Conclusions

In this work, we presented a novel method to quantify the proportion of noise embedded in the grouping of associated classes of the elements in hierarchical clustering.
We proposed a relative score derived from an SC of the VP from the dendrogram of any clustering analysis, and calculated Z-statistics as well as an interpolation to deliver an estimate of the noise in the VP. We explained how the method is formulated and showed the tests we made to systematically refine it. We additionally provided a proof of concept using clustering data from two works that, in our view, illustrate overfitting by apophenia. We also added an example from network biology in which clustered networks are separated by their intrinsic characteristics. Although in this work we focused on examples where hierarchical clustering is performed, this framework can apply to any partition algorithm in which the elements are identified and a vector of the order can be acquired. We concluded that the clustered sets of biological data have a high measure of noise, despite looking well grouped. We showed what minimum number of classifications should be considered in this sort of clustering analysis to obtain a significant reduction of noise. We also permuted the labels of the associated classes and concluded that the effect is negligible. We showed that randomness still plays an important role by biasing the results, though it may not be evident through visual inspection. The PQA could be used as a benchmark to test which clustering algorithm is appropriate for the analyzed dataset by minimizing the noise proportion, and to guide omics experimental designs. Nevertheless, a word of caution: the PQA score alone can be subject to subjective interpretation if not used properly, since it depends on the characteristics of the analyzed data. Thus, the PQA score is intended as a quantification of noise in clustered data and should be used with discretion.
Author Contributions: Conceptualization, J.A.F.G.; methodology, J.A.F.G.; software, D.A.C.H., V.E.N.C., and J.A.F.G.; validation, D.A.C.H., V.E.N.C., and J.A.F.G.; formal analysis, D.A.C.H., V.E.N.C., and J.A.F.G.; investigation, D.A.C.H., V.E.N.C., J.R.L.B., and J.A.F.G.; resources, J.A.F.G.; data curation, D.A.C.H., V.E.N.C., and J.E.L.B.; writing - original draft preparation, D.A.C.H., V.E.N.C., J.E.L.B., and J.A.F.G.; writing - review and editing, D.A.C.H., V.E.N.C., and J.A.F.G.; visualization, D.A.C.H., V.E.N.C., J.E.L.B., and J.A.F.G.; supervision, J.A.F.G.; project administration, J.A.F.G.; funding acquisition, J.A.F.G. All authors have read and agreed to the published version of the manuscript.

Funding: This work was supported by the Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica (PAPIIT-UNAM) [IN205918 to J.A.F.G.].

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Kang, S., Kim, B., Park, S.-B., et al. 2013. Stage-specific methylome screen identifies that NEFL is downregulated by promoter hypermethylation in breast cancer. International Journal of Oncology 43(5), pp. 1659–1665, doi:10.3892/ijo.2013.2094.
2. Kiselev, V. Y., Andrews, T. S., & Hemberg, M. (2019). Challenges in unsupervised clustering of single-cell RNA-seq data. Nature Reviews Genetics, 20(5), 273-282, doi:10.1038/s41576-018-0088-9.
3. Al-Harbi, S.H. and Rayward-Smith, V.J. 2006. Adapting k-means for supervised clustering. Applied Intelligence 24(3), pp. 219–226, doi:10.1007/s10489-006-8513-8.
4. Hassani, M., & Seidl, T.
(2017). Using internal evaluation measures to validate the quality of diverse stream clustering algorithms. Vietnam Journal of Computer Science, 4(3), 171-183, doi:10.1007/s40595-016-0086-9.
5. Fyfe, S., Williams, C., Mason, O.J. and Pickup, G.J. 2008. Apophenia, theory of mind and schizotypy: perceiving meaning and intentionality in randomness. Cortex 44(10), pp. 1316–1325, doi:10.1016/j.cortex.2007.07.009.
6. Getmansky, M., Lo, A.W. and Makarov, I. 2004. An econometric model of serial correlation and illiquidity in hedge fund returns. Journal of financial economics 74(3), pp. 529–609, doi:10.1016/j.jfineco.2004.04.001.
7. Shen, J., Hu, Q., Schrauder, M., et al. 2014. Circulating miR-148b and miR-133a as biomarkers for breast cancer detection. Oncotarget 5(14), pp. 5284–5294, doi:10.18632/oncotarget.2014.
8. Toyooka, S., Toyooka, K. O., Maruyama, R., Virmani, A. K., Girard, L., Miyajima, K., ... & Brambilla, E. (2001). DNA Methylation Profiles of Lung Tumors. Molecular cancer therapeutics, 1(1), 61-67.
9. Schieber, T. A., Carpi, L., Díaz-Guilera, A., Pardalos, P. M., Masoller, C., & Ravetti, M. G. (2017). Quantification of network structural dissimilarities. Nature communications, 8(1), 1-10.
10. Escorcia-Rodríguez, J. M., Tauch, A., & Freyre-González, J. A. (2020). Abasy Atlas v2.2: The most comprehensive and up-to-date inventory of meta-curated, historical, bacterial regulatory networks, their completeness and system-level characterization. Computational and Structural Biotechnology Journal, doi:10.1016/j.csbj.2020.05.015.
11. Barabasi, A. L., & Oltvai, Z. N. (2004). Network biology: understanding the cell's functional organization. Nature reviews genetics, 5(2), 101-113, doi:10.1038/nrg1272.

10_1101-2021_01_08_425918 ---- 94119894

A global cancer data integrator reveals principles of synthetic lethality, sex disparity and immunotherapy.

Christopher Yogodzinski1,2,#*, Abolfazl Arab1-3, Justin R. Pritchard4, Hani Goodarzi1-3, Luke A. Gilbert1,2,5*

1 Department of Urology, University of California, San Francisco, San Francisco, CA, USA
2 Helen Diller Family Comprehensive Cancer Center, San Francisco, San Francisco, CA, USA
3 Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
4 Department of Biomedical Engineering, Pennsylvania State University, University Park, PA
5 Department of Cellular & Molecular Pharmacology, University of California, San Francisco, CA, USA
# Current Address: University of North Carolina Chapel Hill School of Medicine, Chapel Hill, NC, USA
*Corresponding authors

Correspondence: cyogodzi@unc.edu (C.Y.), luke.gilbert@ucsf.edu (L.A.G)

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.08.425918; this version posted January 9, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract

Advances in cancer biology are increasingly dependent on integration of heterogeneous datasets. Large scale efforts have systematically mapped many aspects of cancer cell biology; however, it remains challenging for individual scientists to effectively integrate and understand this data. We have developed a new data retrieval and indexing framework that allows us to integrate publicly available data from different sources and to combine publicly available data with new or bespoke datasets. Beyond a database search, our approach generated testable hypotheses about new synthetic lethal gene pairs, genes associated with sex disparity, and immunotherapy targets in cancer. Our approach is straightforward to implement, well documented and continuously updated, which should enable individual users to take full advantage of efforts to map cancer cell biology.

Introduction

Large scale but often independent efforts have mapped phenotypic characteristics of more than one thousand human cancer cell lines. Despite this, static lists of univariate data generally cannot identify the underlying molecular mechanisms driving a complex phenotype. We hypothesized that a global cancer data integrator that could incorporate many types of publicly available data, including functional genomics, whole genome sequencing, exome sequencing, RNA expression data, protein mass spectrometry, DNA methylation profiling, ChIP-seq, ATAC-seq, and metabolomics data, would enable us to link disease features to gene products 1–15. We set out to build a resource that enables cross-platform correlation analysis of multi-omic data, as this analysis is in and of itself a high-resolution phenotype. Multi-omic analysis of functional genomics data with genomic, metabolomic or transcriptomic profiling can link cell state or specific signaling pathways to gene function 2,3,13,15–18. Lastly, co-essentiality profiling across large panels of cell lines has revealed protein complexes and co-essential modules that can assign function to uncharacterized genes 19.

Problematically, in many cases publicly available data are poorly integrated when considering information on all genes across different types of data, and the existing data portals are inflexible. For example, lists of genes cannot be queried against groups of cell lines stratified by mutation status or disease subtype. Furthermore, one cannot integrate new data derived from individual labs or other consortia. We created the Cancer Data Integrator (CanDI), which is a series of Python modules designed to seamlessly integrate genomic, functional genomic, RNA, protein and metabolomic data into one ecosystem. Our Python framework operates like a relational database without the overhead of running MySQL or Postgres and enables individual users to easily query this vast dataset and add new data in flexible ways. This was achieved by unifying the indices of these datasets via index tables that are automatically accessed through CanDI's biologically relevant Python classes. We highlight the utility of CanDI through four types of analysis to demonstrate how complex queries can reveal previously unknown molecular mechanisms in synthetic lethality, sex disparity and immunotherapy. These data nominate new small molecule and immunotherapy anti-cancer strategies in KRAS-mutant colon, lung and pancreatic cancers.

Results

CanDI is a global cancer data integrator.
We set out to integrate three types of data by creating programmatic and biologically relevant abstractions that allow for flexible cross-referencing across all datasets. Data from the Cancer Cell Line Encyclopedia (CCLE) for RNA expression, DNA mutation, DNA copy number and chromosome fusions across more than 1000 cancer cell lines were integrated into our database with the functional genomics data from the Cancer Dependency Map (DepMap) (Fig. 1a,b and Supplementary Fig. 1) 1,12,20. We also integrated protein-protein interaction data from the CORUM database along with three additional distinct protein localization databases 4,7,11,21. By default, CanDI accesses the most recent release of data from DepMap, although users can also specify both the release and the data type that is accessed. The key advantage of this approach is that CanDI enables one to easily input user-defined queries with multi-tiered conditional logic into this large integrated dataset to analyze gene function, gene expression, protein localization and protein-protein interactions.

CanDI identifies genes that are conditionally essential in BRCA-mutant ovarian cancer.

The concept that loss-of-function tumor suppressor gene mutations can render cancer cells critically reliant on the function of a second gene is known as synthetic lethality. Despite the promise of synthetic lethality, it has been challenging to predict or identify genes that are synthetic lethal with commonly mutated tumor suppressor genes. While there are many underlying reasons for this challenge, we reasoned that data integration through CanDI could identify synthetic lethal interactions missed by others.
A paradigmatic example of synthetic lethality emerged from the study of DNA damage repair (DDR)22. Somatic mutations in the DNA double-strand break (DSB) repair genes, BRCA1/2, create an increased dependence on DNA single-strand break (SSB) repair. This dependence can be exploited through small molecule inhibition of PARP1-mediated SSB repair. Inhibition of PARP provides significant clinical responses in advanced breast and ovarian cancer patients, but they ultimately progress22. Thus, new synthetic lethal associations with BRCA1/2 are a potential path towards therapeutic development for PARP-refractory patients.

To illustrate the flexibility of CanDI to mine context-specific synthetic sick lethal (SSL) genetic relationships, we hypothesized that the genes that modulate response to a PARP1 inhibitor might be enriched for genes that are selectively essential for the proliferation or survival of BRCA1/2-mutant cancer cells. To test this hypothesis, we integrated the results of an existing CRISPR screen that identified genes that modulate response to the PARP inhibitor olaparib23. We then tested whether any of these genes are differentially essential for cell proliferation or survival in ovarian cancer and in breast cancer cell models that are either BRCA1/2 proficient or deficient (Fig. 1c,d). This query revealed that the Fanconi Anemia pathway is selectively essential in BRCA1/2-mutated ovarian cancer models but not in BRCA1/2-wildtype ovarian cancer, BRCA1/2-mutated breast cancer or BRCA1/2-wildtype breast cancer models (Fig. 1e and Supplementary Table 1).
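The kind of stratified comparison described above can be sketched in plain Python. This sketch deliberately does not use CanDI's API: the record layout, the function names, the cell line names and every essentiality value below are hypothetical, chosen only to show how cell lines might be grouped by tissue and BRCA1/2 status before comparing a gene's mean essentiality. It also assumes that a higher Bayes Factor means a gene is more essential.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (cell line, tissue, BRCA1/2 mutant?, gene, Bayes Factor).
# All values are made up for illustration.
RECORDS = [
    ("OV_A", "ovary", True, "FANCM", 40.0),
    ("OV_B", "ovary", True, "FANCM", 30.0),
    ("OV_C", "ovary", False, "FANCM", -10.0),
    ("OV_D", "ovary", False, "FANCM", -20.0),
    ("BR_A", "breast", True, "FANCM", -5.0),
    ("BR_B", "breast", False, "FANCM", -15.0),
]


def group_means(records):
    """Mean essentiality per (tissue, mutation status, gene) group."""
    groups = defaultdict(list)
    for _line, tissue, is_mutant, gene, score in records:
        groups[(tissue, is_mutant, gene)].append(score)
    return {key: mean(values) for key, values in groups.items()}


def mutant_minus_wildtype(records, gene):
    """Per-tissue difference in mean essentiality: mutant minus wildtype."""
    means = group_means(records)
    tissues = sorted({t for _l, t, _m, g, _s in records if g == gene})
    result = {}
    for tissue in tissues:
        mutant = means.get((tissue, True, gene))
        wildtype = means.get((tissue, False, gene))
        if mutant is not None and wildtype is not None:
            result[tissue] = mutant - wildtype
    return result


print(mutant_minus_wildtype(RECORDS, "FANCM"))  # {'breast': 10.0, 'ovary': 50.0}
```

A large positive difference in one tissue but not another is the pattern that would flag a gene as selectively essential in that mutation and tissue context; a real analysis would also need a statistical test and enough cell lines per group.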
To our knowledge, an SSL phenotype between FANCM and BRCA1/2 has never been reported, although a recent paper nominated a role for FANCM and BRCA1 in telomere maintenance24. Importantly, FANCM is a helicase/translocase and thus considered to be a druggable target for cancer therapy25. Clinical genomics data support this SSL hypothesis, although this remains to be tested in ovarian cancer patient samples26. Because the DepMap currently only allows single genes to be queried and does not enable users to easily stratify cell lines by mutation, such analysis would normally take a user several days to complete manually. Our approach enabled this analysis to be completed using a desktop computer in less than two hours, which includes the visualization of data presented here (Fig. 1e).

Figure 1. (A) A schematic showing human cell models integrated by CanDI. (B) A schematic illustrating types of data integrated by CanDI. (C) A cartoon of a genome-scale CRISPRi screen to identify genes that modulate response to PARP inhibition by olaparib. (D) A schematic depicting data feature inputs parsed by CanDI. (E) Essentiality of Fanconi Anemia genes in ovarian and breast cancer cell lines separated by BRCA mutation status. A Bayes Factor score of gene essentiality is displayed by a heat map. N=4 BRCA1/2-mutant ovarian cancer, N=27 BRCA-wildtype ovarian cancer, N=5 BRCA1/2-mutant breast cancer, N=19 BRCA1/2-wildtype breast cancer.

Conditional genetic essentiality in KRAS- and EGFR-mutant NSCLC cells.

Beyond TSGs, many common driver oncogenes such as KRASG12D are currently undruggable, which motivates the search for oncogene-specific conditional genetic dependencies.
We reasoned that CanDI enables us to rapidly search functional genomics data for genes that are conditionally essential in lung cancer cells driven by KRAS and EGFR mutations. We stratified non-small cell lung cancer (NSCLC) cell models by EGFR and KRAS mutations and then looked at the average gene essentiality for all genes within each of these 4 subtypes of NSCLC. We observed that KRAS is conditionally self-essential in KRAS-mutant cell models but that no other genes are conditionally essential in KRAS-mutant, EGFR-mutant, KRAS-wildtype or EGFR-wildtype cell models (Fig. 2a,b and Supplementary Table 2). This finding demonstrates that very few genes, if any, are synthetic lethal with KRAS or EGFR in KRAS- and EGFR-mutant lung cancer cell lines. It may be that these experiments are underpowered, or it may be that when the genetic dependencies of diverse cell lines representing a disease subtype are averaged across a single variable (e.g. a KRAS mutation) very few common synthetic lethal phenotypes are observed27. CanDI provides potential solutions for both of these hypotheses.

CanDI enables a global analysis of conditional essentiality in cancer.

It is thought that data aggregation across vast landscapes of unknown covariates does not necessarily increase the statistical power to identify rare associations27. Thus, the value of global analyses of aggregated cancer data sometimes lies in systematically subsetting the data based on key covariates after aggregation. This has been observed in driver gene identification28.
Inspired by our analysis of TSG and oncogene conditional essentiality above, we next used CanDI to identify genes that are conditionally essential in the context of several hundred cancer driver mutations. We first grouped driver mutations (e.g. nonsense or missense) for each driver gene. For this analysis, we selected several thousand genes that are in the 85-90th percentile of essentiality within the DepMap data and therefore conditionally essential, meaning these genes are required for cell growth or survival in a subset of cell lines. Importantly, it is not known why these several thousand genes are conditionally essential. We then tested whether each of these conditionally essential genes has a significant association with individual driver mutations. Our analytic approach does not weight the number of cell models representing each driver mutation, nor does it give information on phenotype effect sizes. Our analysis nominates a large number of conditionally dependent genetic relationships with both TSGs and oncogenes (Fig. 2c,d and Supplementary Table 3). A number of the conditional genetic dependencies identified in our independent variable analysis above are represented by a limited number of cell models, so further investigation is needed to validate these conditional dependencies, but these data further suggest that averaging genetic dependencies across diverse cell lines with un-modeled covariates obscures conditional SSL relationships. To further investigate this hypothesis, we analyzed these same conditional genetic relationships with a second analytic approach that weights the number of cell models representing each driver mutation.
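The first, unweighted association test described above can be illustrated with a 2x2 chi-square test of independence between mutation status and gene essentiality (the Figure 2 legend reports Chi2 tests). The following is a minimal standard-library sketch, not the authors' pipeline: the function name and the counts are hypothetical, no continuity correction is applied, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    a: mutant lines where the gene is essential
    b: mutant lines where it is not essential
    c: wildtype lines where the gene is essential
    d: wildtype lines where it is not essential
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for one gene/driver-mutation pair.
stat, p_value = chi2_2x2(8, 2, 5, 35)
print(f"chi2 = {stat:.2f}, p = {p_value:.2e}")
```

As the text notes, a test like this ignores how many cell models carry each mutation and says nothing about effect size; the chi-square approximation is also unreliable when expected counts are small, which is one reason pairs supported by few cell lines need further validation.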
We observed a limited number of conditional genetic dependencies that largely consist of oncogene self-essential dependencies, as previously highlighted for KRAS-mutant cell lines (Fig. 2e-g and Supplementary Table 4)13,29. Thus, analysis that averages each conditional phenotype across diverse panels of cell lines with unknown covariates masks interesting conditional genetic dependencies.

Figure 2. (A) Average gene essentiality for KRAS and EGFR in groups of NSCLC cell lines stratified by KRAS mutation status or by both KRAS and EGFR mutation status. N=38 for KRAS-wildtype shown in blue, N=19 for KRAS-mutant shown in blue. N=30 for KRAS-wildtype EGFR-wildtype shown in grey and N=16 for KRAS-mutant EGFR-wildtype shown in grey. Gene essentiality is an averaged Bayes Factor score for each group of cell lines. (B) Average gene essentiality for KRAS and EGFR in groups of NSCLC cell lines stratified by EGFR mutation status or by both EGFR and KRAS mutation status. N=46 for EGFR-wildtype shown in blue, N=11 for EGFR-mutant shown in blue. N=30 for EGFR-wildtype KRAS-wildtype shown in grey and N=8 for EGFR-mutant KRAS-wildtype shown in grey. Gene essentiality is an averaged Bayes Factor score for each group of cell lines. (C) P-values from Chi2 tests of gene essentiality and nonsense mutations. (D) P-values from Chi2 tests of gene essentiality and missense mutations. (E) A scatter plot showing effect size of the change in gene essentiality with select missense mutations and the -Log10(P-value) of each essentiality/mutation pair. (F) A scatter plot showing effect size of the change in gene essentiality with select nonsense mutations and the -Log10(P-value) of each essentiality/mutation pair. (G) A scatter plot showing effect size of the change in gene essentiality with all mutations and the -Log10(P-value) of each essentiality/mutation pair.

CanDI reveals female and male context specific essential genes in colon, lung and pancreatic cancer.
Cancer functional genomics data is often analyzed without consideration for fundamental biological properties such as the sex of the tumor from which each cell line is derived. It is well established that biological sex influences cancer predisposition, cancer progression and response to therapy30. We hypothesized that individual genes may be differentially essential across male and female cell lines. This hypothesis, to our knowledge, has never been tested in an unbiased large-scale manner. To maximize our statistical power to identify such differences, we chose to test this hypothesis in a disease setting with a large number of relatively homogeneous cell lines and fewer unknown covariates. Using CanDI, we stratified all KRAS-mutant NSCLC, pancreatic adenocarcinoma (PDAC), and colorectal cancer (CRC) by sex and then tested for conditional gene essentiality. This analysis identified a number of genes that are differentially essential in male or female KRAS-mutant NSCLC, PDAC and CRC models (Fig. 3a-f and Supplementary Table 5).
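The per-gene comparison behind this analysis (described in Methods as a Mann-Whitney U test between two groups of cell lines, with the magnitude shown as the difference in mean Bayes factor) can be sketched as follows. This is a minimal illustration rather than CanDI's own code: the function name, the gene-by-cell-line table layout and the toy values are assumptions, and the mutual information pre-filter described in Methods is omitted.

```python
import pandas as pd
from scipy.stats import mannwhitneyu


def differential_essentiality(bayes_factors, group_a, group_b, alpha=0.01):
    """Compare per-gene Bayes factors between two groups of cell lines.

    bayes_factors: DataFrame indexed by gene, one column per cell line
                   (hypothetical layout, not CanDI's internal format).
    group_a, group_b: lists of column names, e.g. male and female lines.
    """
    rows = []
    for gene, scores in bayes_factors.iterrows():
        a = scores[group_a].astype(float)
        b = scores[group_b].astype(float)
        # Two-sided Mann-Whitney U test between the two groups.
        _, p_value = mannwhitneyu(a, b, alternative="two-sided")
        rows.append(
            {
                "gene": gene,
                # Magnitude shown as the difference in mean Bayes factor.
                "delta_mean_bf": a.mean() - b.mean(),
                "p_value": p_value,
            }
        )
    result = pd.DataFrame(rows).set_index("gene")
    result["significant"] = result["p_value"] < alpha
    return result


# Toy Bayes factors, invented for illustration only.
male = ["m1", "m2", "m3", "m4", "m5", "m6"]
female = ["f1", "f2", "f3", "f4", "f5", "f6"]
toy = pd.DataFrame(
    [
        [50, 60, 70, 80, 90, 100, -10, -20, -30, -40, -50, -60],
        [1, 3, 5, 7, 9, 11, 2, 4, 6, 8, 10, 12],
    ],
    index=["GENE_A", "GENE_B"],
    columns=male + female,
)
table = differential_essentiality(toy, male, female)
```

In this toy table GENE_A is fully separated between the two groups and GENE_B is interleaved, so only the first passes the 0.01 threshold.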
The genes that we identify are not common across all three disease types, suggesting, as one might expect, that the biology of the tumor in part also determines gene essentiality. To test whether any association between differentially essential genes could be identified from expression data (e.g. essential genes encoded on the Y chromosome), we first used CanDI to identify genes that are differentially expressed between male and female cell lines within each disease31. We then plotted the set of differentially essential genes against the differentially expressed genes in KRAS-mutant NSCLC, PDAC and CRC models (Fig. 3a,c,e and Supplementary Table 6) and found little overlap between these gene lists. A number of genes that are more essential in male cells, such as AHCYL1, ENO1, GPI and PKM, regulate cellular metabolism. This finding is consistent with previous literature on sex and metabolism32. Our analysis demonstrates that stratifying groups of heterogeneous cancer models by three variables, in this case tumor type, KRAS mutation status and sex, reveals differentially essential genes. CanDI enables biologically principled stratification of data in the CCLE and DepMap by any feature associated with a group of cell models. This stratification allows us to identify genes associated with sex, which is not possible with other covariates included.

Figure 3. (A) Differential gene expression and differential gene essentiality in male and female CRC cell lines. N=7 male cell lines and N=3 female cell lines. (B) The distribution of Bayes factor gene essentiality scores in male and female CRC cell lines. The top seven and bottom three differentially essential genes are shown in violin plots split by the sex of the cell lines. (C) Differential gene expression and differential gene essentiality in male and female NSCLC cell lines. N=9 male cell lines and N=5 female cell lines. (D) The distribution of Bayes factor gene essentiality scores in male and female NSCLC cell lines. The top seven and bottom three differentially essential genes are shown in violin plots split by the sex of the cell lines. (E) Differential gene expression and differential gene essentiality in male and female PDAC cell lines. N=13 male cell lines and N=5 female cell lines. (F) The distribution of Bayes factor gene essentiality scores in male and female PDAC cell lines. The top seven and bottom three differentially essential genes are shown in violin plots split by the sex of the cell lines.

CanDI enables rapid integration of external datasets to reveal new immunotherapy targets.
An emerging challenge in cancer biology is how to robustly integrate larger "resource" datasets like CCLE with the vast amount of published data from individual laboratories. For example, a big challenge in antibody discovery is identifying specific surface markers on cancer cells. To approach these big questions we utilized CanDI's ability to rapidly take new datasets, such as raw RNA-seq counts data in a disparate study of interest, then normalize and integrate this data into the CCLE, DepMap and protein localization databases previously described.
Specifically, we rapidly integrated an RNA-seq expression dataset that measured the set of transcribed genes in primary lung bronchial epithelial cells from 4 donors33. Classes within CanDI enable rapid application of DESeq2 to assess the differential expression between outside datasets and the CCLE. We used this feature to identify genes that are differentially expressed between primary lung bronchial epithelial cells and KRAS-mutant NSCLC, EGFR-mutant NSCLC or all NSCLC models in CCLE. We then used CanDI to identify genes that are upregulated in cancer cells over normal lung bronchial epithelial cells with protein products that are localized to the cell membrane. This analysis of KRAS-mutant, EGFR-mutant and pan-NSCLC generated highly similar lists of differentially expressed surface proteins (Fig. 4a-f and Supplementary Table 7). Notably, overexpression of several of these genes, such as CD151 and CD44, has been observed in lung cancer and is associated with poor prognosis34–36. These proteins represent potential new immunotherapy targets in KRAS-driven NSCLC.

Figure 4. (A) A graph showing genes that are upregulated in KRAS-mutant NSCLC cell lines relative to primary human bronchial epithelial cells. A cell membrane protein localization score is shown for each gene. Higher protein localization scores indicate higher confidence annotations. (B) A scatter plot showing gene expression for genes that encode cell surface proteins in KRAS-mutant NSCLC cell lines and primary human bronchial epithelial cells. N=46 for KRAS-mutant NSCLC cell lines and N=4 for primary human bronchial epithelial cells. (C) A graph showing genes that are upregulated in EGFR-mutant NSCLC cell lines relative to primary human bronchial epithelial cells. A cell membrane protein localization score is shown for each gene. Higher protein localization scores indicate higher confidence annotations. (D) A scatter plot showing gene expression for genes that encode cell surface proteins in EGFR-mutant NSCLC cell lines and primary human bronchial epithelial cells. N=21 for EGFR-mutant NSCLC cell lines and N=4 for primary human bronchial epithelial cells. (E) A graph showing genes that are upregulated in NSCLC cell lines relative to primary human bronchial epithelial cells. A cell membrane protein localization score is shown for each gene. Higher protein localization scores indicate higher confidence annotations. (F) A scatter plot showing gene expression for genes that encode cell surface proteins in NSCLC cell lines and primary human bronchial epithelial cells. N=141 for NSCLC cell lines and N=4 for primary human bronchial epithelial cells.

Discussion
Data integration is a critical requirement in biology research in the era of genomics and functional genomics. Large scale efforts such as the CCLE have revealed genomic features of more than 1000 cell line models. This data has not, to our knowledge, previously been integrated with functional genomics data in a manner that lets individual users enter batched queries that are stratified by disease subtype or mutation status. This is not just a small improvement in functionality, but rather it is an enabling format that makes possible the types of conditional genomics analyses that drive discovery. Moreover, it fills a fundamental gap in the cancer research community by integrating large scale projects with investigator initiated studies. Our data framework enables biologists without specialized expertise in bioinformatics to use the full spectrum of data in the CCLE and DepMap in a higher-throughput and more precise manner.

Using CanDI, we identified genes that are selectively essential in male versus female KRAS-mutant NSCLC, PDAC and CRC models. To our knowledge, such analysis has never been performed to begin to query the biologic basis of sex disparity in cancer or cancer therapy. We illustrate another feature of our framework by analyzing a list of hit genes nominated by a bespoke CRISPR drug screen for gene essentiality in BRCA1/2-wild type and BRCA1/2-mutated breast and ovarian cancer. In a third application, we analyzed the principle of synthetic lethality for 17427 genes in 19 KRAS-mutant and 11 EGFR-mutant NSCLC models. We then used CanDI to globally identify genes that are conditionally essential in the context of common cancer driver mutations. Finally, we nominated 12 potential new immunotherapy targets in KRAS-mutant, EGFR-mutant and pan-NSCLC models by using CanDI to identify genes that are differentially expressed in NSCLC models versus normal bronchial epithelial cells and whose protein products are localized at the plasma membrane.

Our data reveal a wealth of new hypotheses that can be rapidly generated from publicly available cancer data. By sharing data flows and use cases with a CanDI community we illustrate the ways in which individual research groups can interact with massive cancer genomics projects without reinventing tools or relying upon DepMap tool releases.
We anticipate that CanDI will be widely used in cell biology, immunology and cancer research.

Methods

CanDI
The CanDI data integrator is available at https://github.com/Yogiski/CanDI.

CanDI Module Structure
The CanDI data integrator is a Python library built on top of pandas that specializes in integrating the publicly available data from The Cancer Dependency Map (DepMap Release: 2019 Quarter 3)12, The Cancer Cell Line Encyclopedia (CCLE Release: 2019 Quarter 3)1, The Pooled In-Vitro CRISPR Knockout Essentiality Screens Database (PICKLES Library: Avana 2018 Quarter 4)20, The Comprehensive Resource of Mammalian Protein Complexes (CORUM)8 and protein localization data from The Cell Atlas4, The Map of the Cell11, and The In Silico Surfaceome7,21. Data from DepMap and CCLE used in the following analyses are from the 2019Q3 release. Data from PICKLES is from the 2018 Quarter 4 release of DepMap using the Avana library. Access to all datasets is controlled via a Python class called Data. Upon import, the Data class reads the config file established during installation, defines unique paths to each dataset, and automatically loads the cell line index table and the gene index table. Installation of CanDI, configuration, and data retrieval are handled by a manager class that is accessed indirectly through installation scripts and the Data class. Interactions with this data are controlled through a parent Entity class and several handlers. The biologically relevant abstraction classes (Gene, CellLine, Cancer, Organelle, GeneCluster, CellLineCluster) inherit their methods from Entity. Entity methods are wrappers for hidden data handler classes that perform specific transformations, such as data indexing and high throughput filtering.

Differential Expression
In all cases where it is mentioned, differential expression was evaluated using the DESeq2 R package (Release 3.10)31. Significance was considered to be an adjusted p-value of less than 0.01.

Differential Essentiality
Essentiality scores are taken from the PICKLES database (Avana 2018Q4). To reduce the number of hypotheses posed during this analysis, the mutual information of gene essentiality was calculated using the mutual information metric from the Python package scikit-learn (Version 0.22.0). Genes with mutual information scores greater than one standard deviation above the median were removed from consideration. Differential essentiality was evaluated by performing a Mann-Whitney U test between two groups on every gene that passed the mutual information filter. Significance was considered to be a p-value of less than 0.01. Magnitude of differential essentiality of a given gene was shown as the difference in mean Bayes factors between two groups of cell lines.

Protein Localization Confidence
Protein localization data was assembled from The Cell Atlas4, The Map of the Cell11, and The In Silico Surfaceome7,21. Confidence annotations were taken from the supplemental data of each paper, put on a number scale from 0 to 4, and summed across all three papers to give a total confidence score for each localization annotation for every gene.
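The summation of per-source confidence annotations described above can be illustrated with a short pandas sketch. The column names ("gene", "location", "confidence"), the function name and the toy annotations are assumptions made for illustration; they are not taken from CanDI or from the three source datasets.

```python
import pandas as pd


def total_localization_confidence(sources):
    """Sum per-source confidence (0 to 4) for each gene/location pair.

    sources: list of DataFrames with columns "gene", "location" and
             "confidence" (hypothetical column names).
    """
    combined = pd.concat(sources, ignore_index=True)
    return combined.groupby(["gene", "location"])["confidence"].sum()


# Toy annotations, invented for illustration only.
cell_atlas = pd.DataFrame(
    {
        "gene": ["GENE_A", "GENE_B"],
        "location": ["plasma membrane", "nucleus"],
        "confidence": [4, 3],
    }
)
map_of_the_cell = pd.DataFrame(
    {"gene": ["GENE_A"], "location": ["plasma membrane"], "confidence": [3]}
)
surfaceome = pd.DataFrame(
    {"gene": ["GENE_A"], "location": ["plasma membrane"], "confidence": [2]}
)
totals = total_localization_confidence([cell_atlas, map_of_the_cell, surfaceome])
```

With three sources each scored from 0 to 4, a gene/location pair can reach a total of at most 12, and a pair annotated by only one source keeps that single score.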
The analysis shown in Figure 4 represents a gene list that was further manually curated to remove the genes that are localized to the intracellular space at the cell membrane, revealing cell surface protein targets that are highly expressed in NSCLC models over normal lung bronchial epithelial cells4,7,11,21.

DepMap Creative Commons License
When an individual user runs CanDI they are downloading DepMap data and thus are agreeing to a CC Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).

Synthetic Lethality of Fanconi Anemia Genes in Ovarian and Breast Cancer Models
We made a list of the top 50 gene hits that confer sensitivity to PARP inhibition in HeLa cells23. Using CanDI, the essentiality scores of these top hits were visualized across all ovarian cancer cell models in PICKLES (Avana 2018Q4). FANCA and FANCE showed selective essentiality in the BRCA1/2 mutant ovarian cancer cell lines. Following this observation, CanDI was used to gather the gene essentiality for all FANC genes in the Fanconi anemia pathway. CanDI was then used to visualize these data across all ovarian and breast cancer cell lines, sorting by BRCA1/2 mutation status.

Synthetic Lethality in KRAS and EGFR mutant Cell Lines
CanDI was leveraged to bin NSCLC cell lines present in both CCLE (Release: 2019Q3) and PICKLES (Avana 2018Q4) into 8 groups: KRAS mutant and KRAS wild type cell lines, with and without EGFR mutants removed, as well as EGFR mutant and EGFR wild type cell lines, with and without KRAS mutants removed. The mean essentiality score for every gene in the genome was calculated for every group of cell lines.
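As an illustration of the group-wise averaging, the sketch below computes a mean Bayes factor per gene for each named group of cell lines and then takes the difference between a mutant and a wild type group. The function names, table layout and toy values are hypothetical, and the sign convention (mutant minus wild type) is an assumption, since the text only states that the score is the change in mean essentiality between the two groups.

```python
import pandas as pd


def mean_essentiality_by_group(bayes_factors, groups):
    """Average Bayes factor per gene within each named group.

    bayes_factors: DataFrame indexed by gene, one column per cell line.
    groups: dict mapping a group name to a list of cell line columns.
    """
    return pd.DataFrame(
        {name: bayes_factors[lines].mean(axis=1) for name, lines in groups.items()}
    )


def essentiality_change(group_means, mutant, wildtype):
    """Change in mean essentiality (assumed sign: mutant minus wild type)."""
    return group_means[mutant] - group_means[wildtype]


# Toy Bayes factors, invented for illustration only.
toy = pd.DataFrame(
    {
        "line_a": [40.0, -10.0],
        "line_b": [60.0, -20.0],
        "line_c": [-5.0, -15.0],
        "line_d": [5.0, -15.0],
    },
    index=["KRAS", "GENE_X"],
)
groups = {
    "KRAS_mutant": ["line_a", "line_b"],
    "KRAS_wildtype": ["line_c", "line_d"],
}
means = mean_essentiality_by_group(toy, groups)
scores = essentiality_change(means, "KRAS_mutant", "KRAS_wildtype")
```

In the toy table the KRAS row is essential only in the mutant group, so it receives a large score, while GENE_X has the same mean in both groups and scores zero.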
Synthetic lethality score per gene is defined as the change in mean essentiality from the mutant groups to the wild type groups.

Pan Cancer Synthetic Lethality Analysis
A set of 299 core oncogenes and tumor suppressor driver mutations was chosen for analysis37. To test the effect of these genes' mutations on gene essentiality, CanDI was leveraged to split them into two groups: a nonsense mutation group containing genes annotated as tumor suppressors (N=153) and a missense mutation group containing genes annotated as oncogenes with specific driver protein changes (N=53). CanDI was then used to collect a core set of genes with highly variable essentiality. To do this, the Bayes factors from the PICKLES database (Avana 2018Q4) were converted to binary numeric variables. Bayes factors over 5 were assigned 1 (essential) and Bayes factors under 5 were assigned 0 (non-essential). Genes were then sorted by their variance across cell lines, and genes between the 85th and 95th percentile were used for this analysis (N=2340). To determine a short list of genes for follow-up, Chi2 tests were applied to the 95940 gene pairs in the missense group and the 603720 gene pairs in the tumor suppressor group. Three new groups were formed for further analysis: the first consisted of the significant gene/mutation pairs from the oncogenic group, the second consisted of the significant gene/mutation pairs from the tumor suppressor group, and the third was a combination of the significant pairs from both groups with no discrimination on the type of mutations considered.
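The binarization, variance window and Chi2 test described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used for the analysis: the function names and table layout are hypothetical, a Bayes factor of exactly 5 is treated as non-essential because the text does not say how ties are handled, and SciPy's default continuity correction for 2x2 tables is left on.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def binarize_essentiality(bayes_factors, threshold=5.0):
    """1 = essential (Bayes factor above threshold), 0 = non-essential."""
    return (bayes_factors > threshold).astype(int)


def variable_genes(binary, lower=0.85, upper=0.95):
    """Genes whose variance across cell lines falls in a percentile window."""
    variance = binary.var(axis=1)
    low, high = variance.quantile(lower), variance.quantile(upper)
    return variance[(variance >= low) & (variance <= high)].index


def chi2_essentiality_vs_mutation(essential, mutated):
    """Chi-squared test of independence on the contingency table."""
    contingency = pd.crosstab(essential, mutated)
    result = chi2_contingency(contingency)
    return result[0], result[1]  # test statistic, p-value


# Toy data, invented for illustration only.
lines = [f"line_{i}" for i in range(8)]
toy = pd.DataFrame(
    [[20, 15, 30, 12, -3, 0, 2, -8]], index=["GENE_A"], columns=lines
)
mutated = pd.Series([1, 1, 1, 1, 0, 0, 0, 0], index=lines, name="driver_mutated")

binary = binarize_essentiality(toy)
selected = variable_genes(binary)
statistic, p_value = chi2_essentiality_vs_mutation(binary.loc["GENE_A"], mutated)
```

Each gene/mutation pair yields one such test, so in practice the function would be applied across every combination of a selected gene and a driver mutation.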
These groups were further analyzed for differential essentiality via the Mann-Whitney method described above, and the Cohen's d effect size was calculated to measure the extent of the phenotype.

Differential Expression and Essentiality of Male and Female KRAS driven cancers
We used CanDI to gather all cell lines that are present in both PICKLES (Avana 2018Q4) and CCLE (Release 2019Q3). CanDI was then leveraged to put these cell lines into the following tissue groups: KRAS mutant Colon/Colorectal, PDAC, and NSCLC. Each tissue group was then split into male and female sub-groups. Differential expression was analyzed by applying the methods described above to raw RNA-seq counts data from CCLE (Release: 2019Q3). Genes with adjusted p-values less than 0.01 were considered significantly differentially expressed. Differential essentiality was analyzed using the methods described above on the previously described sex sub-groups for each tissue type. Genes with p-values less than 0.01 were considered significantly differentially essential between male and female cell models. For each tissue type, the distributions of the top 7 significantly differentially essential genes were highlighted in comparison with the bottom 3 as a negative control.

Differential expression of benign and malignant cancer cell lines
We downloaded human bronchial epithelial (HBE) RNA-seq data from Gillen et al via the European Nucleotide Archive to use as a benign lung tissue model33. This data set contains gene expression data for primary HBE cells cultured from three different donors and also NHBE cells (Lonza CC-2541, a mixture of HBE and human tracheal epithelial cells).
We then used CanDI to put NSCLC models into three different groups: KRAS mutant, EGFR mutant, and all cell lines. For our benign model raw counts were quantified via kallisto38. Raw counts for our malignant cell lines were queried via CanDI. DESeq2 was then applied to evaluate the differential expression between our normal lung tissue model and our three malignant lung tissue groups. The results from DESeq2 were then filtered by significance (adjusted p-value < 0.01). To filter based on potential immunotherapy targets we removed all genes not annotated as being (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425918doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425918 localized to the plasma membrane, and genes with localization confidence scores lower than six. Genes that were obviously mis-annotated as surface proteins were also manually removed. Supplementary Figure/Table Legends Supplementary Figure 1. Supplementary Figure 1. An Object-oriented schema diagram showing core structure of CanDI software. Supplementary Table 1. A table containing raw PICKLES Bayes factors displayed in the heat map of Fig. 1e. Supplementary Table 2. A table containing mean PICKLES Bayes factors for each series displayed in Fig. 2a,b. A (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425918doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425918 Supplementary Table 3. A table containing the data for all chi2 tests performed to generate Fig. 2c,d. Supplementary Table 4. A table containing the data for scatter plots shown in Fig. 2e,f,g. Supplementary Table 5. 
A table containing the data from the differential essentiality analysis for all three tissues in Fig. 3a-f. Supplementary Table 6. A table containing the data from the differential expression analysis for all three tissues in Fig. 3a,c,e. Supplementary Table 7. A table containing the differential expression analysis data merged with the location data for all three tissues shown in Fig. 4. Acknowledgements We thank everyone in the Gilbert lab for helpful comments and discussion. LAG is supported by K99/R00 CA204602 and DP2 CA239597 as well as the Goldberg-Benioff Endowed Professorship in Prostate Cancer Translational Biology. Conflicts of Interest None Bibliography 1. Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019). 2. Li, H. et al. The landscape of cancer cell line metabolism. Nat. Med. 25, 850–860 (2019). 3. Tsherniak, A. et al. Defining a Cancer Dependency Map. Cell 170, 564-576.e16 (2017). (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425918doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425918 4. Thul, P. J. et al. A subcellular map of the human proteome. Science 356, (2017). 5. Cancer Cell Line Encyclopedia Consortium & Genomics of Drug Sensitivity in Cancer Consortium. Pharmacogenomic agreement between two cancer cell line data sets. Nature 528, 84–87 (2015). 6. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012). 7. Bausch-Fluck, D. et al. The in silico human surfaceome. PNAS 115, E10988–E10997 (2018). 8. Giurgiu, M. et al. CORUM: the comprehensive resource of mammalian protein complexes- 2019. Nucleic Acids Res. 47, D559–D563 (2019). 9. Nusinow, D. P. et al. 
Quantitative Proteomics of the Cancer Cell Line Encyclopedia. Cell 180, 387-402.e16 (2020). 10. Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362–D368 (2017). 11. Itzhak, D. N., Tyanova, S., Cox, J. & Borner, G. H. Global, quantitative and dynamic mapping of protein subcellular localization. Elife 5, (2016). 12. Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017). 13. Behan, F. M. et al. Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens. Nature 568, 511–516 (2019). 14. Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015). 15. Hart, T. et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype- Specific Cancer Liabilities. Cell 163, 1515–1526 (2015). (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425918doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425918 16. Wang, T. et al. Gene Essentiality Profiling Reveals Gene Networks and Synthetic Lethal Interactions with Oncogenic Ras. Cell 168, 890-903.e15 (2017). 17. Chan, E. M. et al. WRN helicase is a synthetic lethal target in microsatellite unstable cancers. Nature 568, 551–556 (2019). 18. Adamson, B. et al. A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response. Cell 167, 1867-1882.e21 (2016). 19. Wainberg, M. et al. A genome-wide almanac of co-essential modules assigns function to uncharacterized genes. http://biorxiv.org/lookup/doi/10.1101/827071 (2019) doi:10.1101/827071. 20. Lenoir, W. F., Lim, T. L. & Hart, T. 
PICKLES: the database of pooled in-vitro CRISPR knockout library essentiality screens. Nucleic Acids Res 46, D776–D780 (2018). 21. Bausch-Fluck, D. et al. A Mass Spectrometric-Derived Cell Surface Protein Atlas. PLoS One 10, (2015). 22. O’Connor, M. J. Targeting the DNA Damage Response in Cancer. Mol. Cell 60, 547–560 (2015). 23. Zimmermann, M. et al. CRISPR screens identify genomic ribonucleotides as a source of PARP-trapping lesions. Nature 559, 285–289 (2018). 24. Pan, X. et al. FANCM, BRCA1, and BLM cooperatively resolve the replication stress at the ALT telomeres. PNAS 114, E5940–E5949 (2017). 25. Lou, K., Gilbert, L. A. & Shokat, K. M. A Bounty of New Challenging Targets in Oncology for Chemical Discovery. Biochemistry 58, 3328–3330 (2019). 26. Narayan, G. et al. Promoter Hypermethylation of FANCF: Disruption of Fanconi Anemia- BRCA Pathway in Cervical Cancer. Cancer Res 64, 2994–2997 (2004). (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425918doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425918 27. Ideker, T., Dutkowski, J. & Hood, L. Boosting signal-to-noise in complex biology: prior knowledge is power. Cell 144, 860–863 (2011). 28. Chang, M. T. et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat. Biotechnol. 34, 155–163 (2016). 29. Lou, K. et al. KRASG12C inhibition produces a driver-limited state revealing collateral dependencies. Sci Signal 12, (2019). 30. Cancer Disparities - National Cancer Institute. https://www.cancer.gov/about- cancer/understanding/disparities (2016). 31. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, 550 (2014). 32. Rubin, J. B. et al. Sex differences in cancer mechanisms. 
Biol Sex Differ 11, (2020). 33. Gillen, A. E. et al. Molecular characterization of gene regulatory networks in primary human tracheal and bronchial epithelial cells. J. Cyst. Fibros. 17, 444–453 (2018). 34. Mj, K. et al. Prognostic Significance of CD151 Overexpression in Non-Small Cell Lung Cancer. Lung cancer (Amsterdam, Netherlands) vol. 81 https://pubmed.ncbi.nlm.nih.gov/23570797/ (2013). 35. Ko, Y. H. et al. Prognostic significance of CD44s expression in resected non-small cell lung cancer. BMC Cancer 11, 340 (2011). 36. Penno, M. B. et al. Expression of CD44 in Human Lung Tumors. Cancer Res 54, 1381–1387 (1994). 37. Bailey, M. H. et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173, 371-385.e18 (2018). (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425918doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425918 38. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34, 525–527 (2016). (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425918doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425918 Count sgRNAs abundance by deep sequencing to measure gene/drug phenotypes T0 SampleCRISPR Hela cell line Lentiviral transduction of genome-scale CRISPR sgRNA library Olaparib Untreated 1 1 3 2 Hela Cell Line CAL51 Cell Line KPL1 Cell Line ZR751 Cell Line ... COV362 Cell Line JHOS2 Cell Line TOV31G Cell Line ... Breast cancer Cervical cancer Ovarian cancer CA B D E CanDI Integration Cancer Data Integrator Essentiality Mutation ... 
[Figure, continued: data types integrated by CanDI: cellular genomics, functional genomics, transcriptomics, proteomics.]

[Figure: sex differences in gene essentiality and expression in colon, lung, and pancreas cancer cell lines. Panels A–F. Scatter plots: x-axis, Differential Essentiality (Δ Average BF), more essential in male vs. female cell lines; y-axis, Differential Expression (Log2(FC)), upregulated in male vs. female cell lines. Scatter legend: non-significant, differentially expressed, differentially essential, shown in violin plots. Violin plots: x-axis, Gene; y-axis, Bayes Factor, more essential vs. less essential. Violin legend: top hit female, top hit male, negative control female, negative control male, essential gene threshold. Labeled genes, by panel pair: PPP1R15B, CFLAR, NXT1, CTNNB1, SLC4A7, MANSC1, AHCYL1, ARHGEF10L, MRPL20, EFCAB11; BCL2L1, GPI, ENO1, RTCB, PKM, WAC, PCID2, ARHGAP12, SLC19A2, GPR137; CHMP3, CHMP5, HAUS6, WLS, KATNB1, ID1, ACSL3, KCNE1, RUFY1, KRT16.]

[Figure: differential expression in lung cancer cell lines. Panels A–F, for KRAS mutant, EGFR mutant, and all lung cancer cell lines. Volcano plots: x-axis, Log2(Fold Change); y-axis, -Log10(Q Value); legend, Location Confidence (6 to 10). Expression plots: x-axis, Gene; y-axis, Log2(TPM + 1); legend, Cell Line Type: benign bronchial, malignant. Labeled genes, KRAS mutant: CD151, SLC4A2, B2M, ITGA3, SLC3A2, HLA-C, CD44, LRPAP1, DDR1, VDAC2, SLC29A1, SLCO4A1. EGFR mutant: B2M, SLC4A2, CD151, ITGA3, ATP1A1, SLC3A2, CD44, DDR1, HLA-C, LRPAP1, ITGA5, TFPI. All lung cancer: B2M, CD151, THY1, SLC3A2, SLC4A2, LRPAP1, HLA-C, DDR1, SLC29A1, ITGA3, PTGFRN, VDAC2.]

[Figure: gene essentiality by KRAS and EGFR mutation status, and essentiality/mutation associations. Panels A–G. Scatter plots: Gene Essentiality in KRAS MT Cell Lines (Average BF) vs. Gene Essentiality in KRAS WT Cell Lines (Average BF), with EGFR MT included and EGFR MT removed; Gene Essentiality in EGFR MT Cell Lines (Average BF) vs. Gene Essentiality in EGFR WT Cell Lines (Average BF), with KRAS MT included and KRAS MT removed. KRAS and EGFR are labeled and the essential gene threshold is marked; axes run from less essential to more essential. Remaining panels: essentiality/mutation pairs plotted by Effect Size and P-value for nonsense, missense, and all mutations, grouped as oncogenes, tumor suppressor genes, and context-specific. Labeled pairs: BRAF/BRAF, NRAS/NRAS, KRAS/KRAS, HRAS/HRAS, NRAS/KRAS. Legend: non-hit, significant hit.]